ZeST: Zero-Shot Material Transfer from a Single Image
Conference: ECCV 2024
arXiv: 2404.06425
Code: Project Page
Area: 3D Vision
TL;DR
ZeST is a zero-shot, training-free method for material transfer. It combines three parallel branches: IP-Adapter extracts a material representation from the exemplar, ControlNet provides geometric guidance, and a foreground grayscale image supplies lighting cues. Together they transfer the material from a single exemplar image onto an object in a 2D target image.
Background & Motivation
Background
Editing the materials of objects in images (e.g., transforming marble to steel) is valuable in applications such as game design and e-commerce.
Limitations of Prior Work
Traditional methods require explicit 3D geometry, illumination estimation, and material parameter specification, which makes them complex to use.
Key Challenge
Text-driven methods struggle to describe the fine texture details of materials precisely.
Core Idea
Existing approaches such as TextureDreamer require 3-5 material images for DreamBooth fine-tuning, which is time-consuming and does not scale. ZeST instead avoids fine-tuning and works from a single exemplar image.
Core Problem
How to transfer a material onto an object in a target image from a single material exemplar image, without any training.
Method
Overall Architecture
Three parallel branches feed into Stable Diffusion XL Inpainting:
1. Material Encoding Branch: IP-Adapter encodes the material exemplar to extract the material latent representation \(z_M\).
2. Geometric Guidance Branch: DPT estimates the depth map of the input image, which ControlNet uses as a structural constraint.
3. Lighting Guidance Branch: a foreground grayscale image \(I_{init}\) is created and used to initialize the inpainting model.
Key Designs
Material Encoding (IP-Adapter):
- Utilizes a CLIP image encoder to extract the feature representation of the material exemplar.
- Injects it into the diffusion model via cross-attention.
- Requires no DreamBooth fine-tuning; a single image is sufficient.
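The cross-attention injection above can be illustrated with a minimal single-head sketch. It assumes the decoupled cross-attention design that IP-Adapter is known for, where image tokens get their own key/value pair and the result is added to the text cross-attention with a scale factor. The projection layers are omitted, and all names (`decoupled_cross_attention`, `scale`) are illustrative rather than taken from the ZeST code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: q is (n_q, d), k is (n_k, d), v is (n_k, d_v).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def decoupled_cross_attention(q, k_text, v_text, k_img, v_img, scale=1.0):
    # Text and image conditions are attended to separately and summed.
    # `scale` weights the image (material exemplar) contribution;
    # scale=0 falls back to text-only cross-attention.
    return attention(q, k_text, v_text) + scale * attention(q, k_img, v_img)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))        # 4 latent queries, dimension 8
k_text = rng.normal(size=(3, 8))   # 3 text tokens
v_text = rng.normal(size=(3, 8))
k_img = rng.normal(size=(1, 8))    # 1 image token (hypothetical count)
v_img = rng.normal(size=(1, 8))
out = decoupled_cross_attention(q, k_text, v_text, k_img, v_img, scale=1.0)
```

With a single image token, the image branch simply adds that token's value vector to every query, which makes the effect of `scale` easy to inspect.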
Geometric Guidance (ControlNet):
- Depth-based ControlNet obtains structural information from the depth map of the input image.
- Overrides the geometric information in the material encoding \(z_M\), ensuring the generated object retains its original shape.
- Key finding: IP-Adapter + Img2Img alone fails to preserve the original geometry.
Lighting Guidance via Foreground Grayscale (Core Design Choice):
- Directly using the original image: the original object's color acts as a strong prior that interferes with the material color (e.g., an orange pumpkin).
- Random noise initialization: loses lighting and shading direction information.
- Foreground grayscale image (optimal): removes color priors while preserving lighting and shading information.
- The initialization image is \(I_{init} = F \odot I_{gray} + (1-F) \odot I\), where \(I\) is the input image, \(I_{gray}\) its grayscale version, \(F\) the foreground mask, and \(\odot\) element-wise multiplication.
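The composite \(I_{init} = F \odot I_{gray} + (1-F) \odot I\) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the BT.601 luma weights are an assumption (the source does not name a grayscale conversion), and the function names are hypothetical.

```python
import numpy as np

def to_gray(image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 float RGB image in [0, 1] to 3-channel grayscale."""
    # ITU-R BT.601 luma weights: an assumption, not specified by the source.
    weights = np.array([0.299, 0.587, 0.114], dtype=image.dtype)
    luma = image @ weights                         # shape (H, W)
    return np.repeat(luma[..., None], 3, axis=2)   # shape (H, W, 3)

def foreground_grayscale_init(image: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Compute I_init = F * I_gray + (1 - F) * I.

    image:   HxWx3 float RGB image in [0, 1] (I)
    fg_mask: HxW mask, 1/True on the object, 0/False elsewhere (F)
    """
    mask = fg_mask.astype(image.dtype)[..., None]  # broadcast over channels
    return mask * to_gray(image) + (1.0 - mask) * image

# Tiny example: a pure-red 2x2 image with only the top-left pixel as foreground.
img = np.zeros((2, 2, 3))
img[..., 0] = 1.0
mask = np.array([[1, 0], [0, 0]], dtype=bool)
init = foreground_grayscale_init(img, mask)
```

The foreground pixel loses its hue but keeps its brightness, while background pixels pass through unchanged, which is the property the lighting branch relies on.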
Implementation Details:
- DPT is used for depth estimation and Rembg for foreground extraction.
- Built on SDXL Inpainting with the corresponding versions of ControlNet and IP-Adapter.
- Generating one image takes about 15 seconds on a single A10 GPU.
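Depth-conditioned ControlNets typically take the depth map as an ordinary image, so a raw depth prediction usually has to be rescaled first. The sketch below shows one common preprocessing step (min-max normalisation to an 8-bit, 3-channel image). It is assumed here for illustration; the source does not state how ZeST prepares the DPT output, and `depth_to_control_image` is a hypothetical helper.

```python
import numpy as np

def depth_to_control_image(depth: np.ndarray) -> np.ndarray:
    """Min-max normalise an HxW depth map to an HxWx3 uint8 image."""
    depth = depth.astype(np.float64)
    lo, hi = depth.min(), depth.max()
    if hi - lo < 1e-12:
        # Constant depth map: avoid division by zero, return all zeros.
        norm = np.zeros_like(depth)
    else:
        norm = (depth - lo) / (hi - lo)
    img8 = np.round(norm * 255.0).astype(np.uint8)
    # Replicate the single channel so the result looks like an RGB image.
    return np.stack([img8] * 3, axis=-1)

control = depth_to_control_image(np.array([[0.0, 10.0], [10.0, 0.0]]))
```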
Loss & Training
No training is involved; the entire process is completed during the inference stage of the pre-trained models.
Key Experimental Results
Main Results
Quantitative comparison on the synthetic dataset (9 materials \(\times\) 10 meshes = 90 pairs):
| Method | PSNR↑ | LPIPS↓ | CLIP↑ |
|---|---|---|---|
| IP-Adapter + InstructPix2Pix | 16.92 | 0.096 | 0.745 |
| DreamBooth + Geo/Illum Guidance | 25.46 | 0.053 | 0.893 |
| ZeST | 25.82 | 0.046 | 0.899 |
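For readers unfamiliar with the PSNR column, the standard definition is \(\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})\). The sketch below implements that definition for images scaled to \([0, 1]\); it is a generic illustration and makes no claim about the exact evaluation script behind the table.

```python
import numpy as np

def psnr(pred, target, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
reference = np.zeros((4, 4, 3))
value = psnr(reference + 0.1, reference)
```

Because the scale is logarithmic, a gap such as 25.82 vs. 25.46 dB corresponds to a fairly small difference in mean squared error.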
User study (real images, 1-5 scale):
| Method | Material Fidelity↑ | Realism↑ |
|---|---|---|
| IP-Adapter + InstructPix2Pix | 1.48 | 3.23 |
| DreamBooth + Geo/Illum Guidance | 3.25 | 3.41 |
| ZeST | 4.05 | 3.78 |
Ablation Study
Comparison of lighting guidance options: using the original image preserves the object's base color, which interferes with the material color (Setting 1); random noise leads to an incorrect lighting direction (Setting 2); the foreground grayscale image performs best (Setting 3).
Robustness testing:
- Changing the illumination direction and rotation angle of the material exemplar: the generated results remain highly consistent.
- Scaling the material exemplar image: the model automatically adjusts the texture scale to fit the target object.
Key Findings
- ZeST leads clearly in user ratings for material fidelity (4.05 vs. 3.25 for the DreamBooth baseline), suggesting that the zero-shot method can outperform a fine-tuning-based one in this setting.
- Foreground grayscale is the best of the three lighting guidance options tested, which underlines the value of this core design choice.
- ZeST is robust to changes in lighting, rotation, and scaling of the material exemplar.
- The DreamBooth encoding process loses material information and causes color shifts (especially in real-world scenes).
- Can be extended to multi-object editing (iterating with SAM) and illumination-aware material transfer.
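The multi-object extension in the last item can be pictured as a loop over per-object masks, applying one material transfer per object and pasting each result back. The sketch below is an assumption about how such a loop might be organised, not the authors' implementation: `transfer_fn` is a hypothetical stand-in for a single-object ZeST call, and the masks are assumed to come from a segmenter such as SAM.

```python
from typing import Callable, Sequence
import numpy as np

# Hypothetical signature: (image, object_mask, exemplar) -> edited image.
TransferFn = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]

def transfer_per_object(
    image: np.ndarray,
    masks: Sequence[np.ndarray],
    exemplars: Sequence[np.ndarray],
    transfer_fn: TransferFn,
) -> np.ndarray:
    """Apply one material exemplar per object mask, one object at a time."""
    if len(masks) != len(exemplars):
        raise ValueError("need exactly one exemplar per mask")
    result = image.copy()
    for mask, exemplar in zip(masks, exemplars):
        # Each pass sees the edits made by earlier passes.
        edited = transfer_fn(result, mask, exemplar)
        region = mask.astype(bool)
        result[region] = edited[region]  # keep edits inside the mask only
    return result

def fill_with_mean_color(image, mask, exemplar):
    # Stub used only for demonstration: paints the exemplar's mean color.
    out = image.copy()
    out[...] = exemplar.reshape(-1, 3).mean(axis=0)
    return out

scene = np.zeros((2, 2, 3))
left = np.array([[1, 0], [1, 0]], dtype=bool)
right = ~left
red = np.tile([1.0, 0.0, 0.0], (4, 4, 1))
blue = np.tile([0.0, 0.0, 1.0], (4, 4, 1))
edited_scene = transfer_per_object(scene, [left, right], [red, blue], fill_with_mean_color)
```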
Highlights & Insights
- An engineering-driven but elegant pipeline: each of the three branches has a distinct role, and the whole system is training-free.
- The insight of foreground grayscaling is simple yet key: removing color \(\rightarrow\) removing color priors, while preserving grayscale \(\rightarrow\) preserving lighting and shading.
- It introduces a new problem (2D-to-2D material transfer) and proposes both synthetic and real-world evaluation datasets for it.
- Can be combined with 3D texturing methods like Text2Tex to bring material-exemplar-driven texturing into 3D.
Limitations & Future Work
- Sometimes transfers material only to the most "plausible" regions of the object (partial transfer issue).
- Multiple materials contained within the material exemplar may get blended.
- IP-Adapter lacks region-level material extraction capabilities.
- The latent space control of diffusion models is sometimes unpredictable.
Rating
- Novelty: ⭐⭐⭐⭐ — New problem definition + clever training-free scheme
- Effectiveness: ⭐⭐⭐⭐ — Significant lead in user studies
- Practicality: ⭐⭐⭐⭐⭐ — Zero-shot, training-free, 15-second generation
- Recommendation: ⭐⭐⭐⭐