Origin Identification for Text-Guided Image-to-Image Diffusion Models¶

Conference: ICML 2025
arXiv: 2501.02376
Code: Yes (OriPID Dataset)
Area: Image Generation
Keywords: Image Source Identification, Diffusion Model Safety, VAE Embedding, Linear Transformation, Generalizability

TL;DR¶

This paper proposes the ID2 task (origin identification for text-guided image-to-image diffusion models), constructs OriPID, the first dataset for this task, and shows that a linear transformation applied to VAE embeddings generalizes across diffusion models when identifying the original source of a generated image, outperforming similarity-based methods by 31.6 percentage points in mAP.

Background & Motivation¶

Background: Text-guided image-to-image (I2I) diffusion models (e.g., SD2, SDXL, SD3) can creatively modify input images based on text prompts and are widely used in digital art and content creation.

Limitations of Prior Work: Such powerful editing capabilities can be abused to spread disinformation (e.g., tampering with news images), infringe copyrights (e.g., editing after removing watermarks), and evade content tracking. Currently, effective methods to identify the original source of generated images are lacking.

Key Challenge: Images generated by different diffusion models exhibit a "visual discrepancy" — each model has its unique visual style. Similarity-based retrieval methods trained on one model fail to generalize to images generated by other models, limiting their practical deployment.

Goal: Propose the ID2 task and address its core challenge — cross-model generalizable origin identification.

Key Insight: Apply a linear transformation to embeddings from the diffusion model's own VAE encoder; the paper proves theoretically that such a transformation exists and that it generalizes across models.

Core Idea: There exists a linear transformation matrix \(\mathbf{W}\) such that the VAE embeddings of the generated image and the original image are close enough after this transformation, and this transformation can generalize across different diffusion models.

Method¶

Overall Architecture¶

Input: Query image (generated by some diffusion model) + Large-scale reference image library (1 million images)
Process: Extract embeddings using a VAE encoder \(\rightarrow\) Apply linear transformation \(\rightarrow\) Perform nearest neighbor retrieval
Output: Identify the original source image of the query image
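
The three-step pipeline above can be sketched as follows. This is a minimal illustration, not the authors' code: random vectors stand in for flattened VAE embeddings, `W` is an identity placeholder for the learned matrix, and ranking by cosine similarity is an assumption suggested by the CosFace training objective. The helper names `l2_normalize` and `retrieve_origin` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def retrieve_origin(query_emb, ref_emb, W, top_k=1):
    """Return indices of the top_k reference images for each query.

    query_emb: (Q, D) flattened VAE embeddings of the query images
    ref_emb:   (N, D) flattened VAE embeddings of the reference library
    W:         (D, d) learned linear transformation
    """
    q = l2_normalize(query_emb @ W)
    r = l2_normalize(ref_emb @ W)
    sims = q @ r.T                      # (Q, N) cosine similarities
    return np.argsort(-sims, axis=1)[:, :top_k]

# Toy stand-ins (hypothetical data): random vectors play the role of VAE embeddings.
rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(1000, 64))
true_idx = np.array([3, 141, 592])
query_emb = ref_emb[true_idx] + 0.1 * rng.normal(size=(3, 64))  # mild perturbation
W = np.eye(64)                          # identity placeholder for the learned matrix

top1 = retrieve_origin(query_emb, ref_emb, W)[:, 0]
print(top1)
```

At the scale of a 1-million-image library, the exhaustive `q @ r.T` product would likely be replaced by a batched or approximate nearest-neighbor search; the ranking logic stays the same.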

Key Designs¶

Existence Theorem (Theorem 1):
- Statement: For a trained diffusion model \(\mathcal{F}_1\), there exists a linear transformation matrix \(\mathbf{W}\) such that \(\mathcal{E}_1(g_1) \cdot \mathbf{W} = \mathcal{E}_1(o) \cdot \mathbf{W}\).
- Here \(g_1\) is the image generated by \(\mathcal{F}_1\), \(o\) is the original image, and \(\mathcal{E}_1\) is the VAE encoder of \(\mathcal{F}_1\).
- Proof sketch: the noise estimate \(\epsilon_\theta\) of a trained diffusion model is close to the real noise \(\epsilon\), which is used to show that \((\mathcal{E}_1(g_1) - \mathcal{E}_1(o)) \cdot \mathbf{W} \approx 0\) for a suitable \(\mathbf{W}\).
- Design Motivation: Provide theoretical guarantees for the proposed method.
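
A toy linear-algebra illustration of the existence claim (not the paper's proof): assume, purely for illustration, that editing moves embeddings only within a low-dimensional subspace. Then any \(\mathbf{W}\) whose columns are orthogonal to that subspace satisfies the equality in Theorem 1 while remaining non-trivial. The dimensions and the random data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, k, n = 32, 4, 200   # embedding dim, rank of the edit subspace, number of pairs

# Hypothetical setup: edits move embeddings only within a k-dimensional subspace.
basis = rng.normal(size=(k, D))
z_o = rng.normal(size=(n, D))                  # stand-in "original" embeddings
z_g = z_o + rng.normal(size=(n, k)) @ basis    # stand-in "generated" embeddings

# Rows of vt beyond the first k span the orthogonal complement of the differences.
_, s, vt = np.linalg.svd(z_g - z_o, full_matrices=True)
W = vt[k:].T                                   # shape (D, D - k), rank D - k

residual = np.abs(z_g @ W - z_o @ W).max()
raw_gap = np.abs(z_g - z_o).max()
print(f"max |z_g W - z_o W| = {residual:.2e}, max |z_g - z_o| = {raw_gap:.2f}")
```

Because this `W` keeps `D - k` directions, embeddings of different originals generally remain distinguishable after the transformation, which is what retrieval needs; the trivial choice `W = 0` would satisfy the equality but carry no information.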
Generalizability Theorem (Theorem 2):
- Statement: The matrix \(\mathbf{W}\) learned for \(\mathcal{F}_1\) generalizes to a different diffusion model \(\mathcal{F}_2\), i.e., \(\mathcal{E}_1(g_2) \cdot \mathbf{W} = \mathcal{E}_1(o) \cdot \mathbf{W}\), where \(g_2\) is the image generated by \(\mathcal{F}_2\).
- Key Observation: Although \(\mathbf{W}\) differs across different models, the cosine similarity of their singular value vectors is extremely high (\(>0.99\)).
- Design Motivation: In practical scenarios, it is impossible to predict beforehand which diffusion model has been abused.
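
The key observation above compares the singular values of matrices learned on different models. One plausible way to compute such a similarity is sketched below; how the paper computes it exactly is not specified here, so treat this as an assumption. `W_a` and `W_b` are hypothetical stand-ins, constructed so that their entries differ while their spectra agree.

```python
import numpy as np

def singular_value_cosine(W_a, W_b):
    """Cosine similarity between the singular-value vectors of two matrices."""
    s_a = np.linalg.svd(W_a, compute_uv=False)
    s_b = np.linalg.svd(W_b, compute_uv=False)
    return float(s_a @ s_b / (np.linalg.norm(s_a) * np.linalg.norm(s_b)))

# Hypothetical stand-ins for matrices learned on two different diffusion models:
# W_b is W_a rotated by an orthogonal matrix, so entries differ but the spectrum does not.
rng = np.random.default_rng(0)
W_a = rng.normal(size=(64, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
W_b = Q @ W_a

print(np.allclose(W_a, W_b))             # the matrices themselves differ
print(singular_value_cosine(W_a, W_b))   # the spectra agree
```

Both matrices must have the same shape for the two singular-value vectors to be comparable.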
Implementation: Metric Learning to Train the Linear Transformation:
- Optimize CosFace loss via gradient descent to learn the theoretical matrix \(\mathbf{W}\).
- For a triplet \((g, o, n)\) (generated image, original image, negative sample) with VAE embeddings \(\mathbf{z}_g\), \(\mathbf{z}_o\), and \(\mathbf{z}_n\), optimize \(\mathcal{L} = \mathcal{L}_{mtr}(\mathbf{z}_g \cdot \mathbf{W}, \mathbf{z}_o \cdot \mathbf{W}, \mathbf{z}_n \cdot \mathbf{W})\).
- Training takes roughly 1/8.6 of the time required by deep-network methods.
- Design Motivation: Convert theoretical guarantees into an optimizable practical method.
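
As a rough sketch of the objective above, the following computes a CosFace-style loss for one generated image, its original, and a set of negatives. It is a forward pass only and makes several assumptions: the scale `s` and margin `m` are illustrative values, the function name `cosface_loss` is hypothetical, and the paper's exact formulation (for example, how classes or negatives are defined) may differ.

```python
import numpy as np

def cosface_loss(z_g, z_o, z_n, W, s=30.0, m=0.35):
    """CosFace-style loss for one (generated, original, negatives) group.

    z_g: (D,)   embedding of the generated image
    z_o: (D,)   embedding of its original image (positive)
    z_n: (K, D) embeddings of unrelated images (negatives)
    W:   (D, d) linear transformation being learned
    """
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = unit(z_g @ W)
    cos_pos = unit(z_o @ W) @ q            # scalar cosine with the positive
    cos_neg = unit(z_n @ W) @ q            # (K,) cosines with the negatives
    # The margin m is subtracted from the positive cosine only.
    logits = s * np.concatenate(([cos_pos - m], cos_neg))
    logits -= logits.max()                 # numerical stability
    return float(-logits[0] + np.log(np.exp(logits).sum()))

# Toy check with hypothetical data: the loss is lower when the true original
# is used as the positive than when an unrelated image takes its place.
rng = np.random.default_rng(0)
D = 64
z_o = rng.normal(size=D)
z_g = z_o + 0.1 * rng.normal(size=D)
z_n = rng.normal(size=(8, D))
W = np.eye(D)

loss_matched = cosface_loss(z_g, z_o, z_n, W)
loss_swapped = cosface_loss(z_g, z_n[0], np.vstack([z_o, z_n[1:]]), W)
print(loss_matched < loss_swapped)
```

In actual training, `W` would be the only trainable parameter and the loss would be minimized with an automatic-differentiation framework rather than evaluated once as here.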

Loss & Training¶

CosFace metric loss: brings positive pairs closer and pushes negative pairs further apart.
Training Data: 2 million images generated by SD2 (100k original images \(\times\) 20 prompts/original image).
Testing: 5,000 query images for each of SD2 and 6 unseen models, retrieved from a reference set of 1 million images.
Images are uniformly resized to \(256 \times 256\); training uses a learning rate of 3.5e-4 on 8 \(\times\) A100 GPUs.
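
The testing protocol above is a retrieval problem, so the reported mAP and Acc can be computed from a query-by-reference similarity matrix. The sketch below assumes that each query has exactly one origin in the reference set (in which case average precision reduces to the reciprocal rank) and that Acc means top-1 accuracy; both are assumptions, and the function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(sims, gt_index):
    """Compute mAP and top-1 accuracy for single-origin retrieval.

    sims:     (Q, N) similarity between each query and each reference image
    gt_index: (Q,)   index of the true origin for each query
    With exactly one relevant item per query, AP reduces to 1 / rank.
    """
    gt_sim = sims[np.arange(len(gt_index)), gt_index]
    # Rank of the true origin = 1 + number of references scored strictly higher.
    ranks = 1 + (sims > gt_sim[:, None]).sum(axis=1)
    return float(np.mean(1.0 / ranks)), float(np.mean(ranks == 1))

sims = np.array([
    [0.9, 0.2, 0.1, 0.3],   # origin (index 0) ranked 1st
    [0.4, 0.8, 0.6, 0.1],   # origin (index 2) ranked 2nd
    [0.7, 0.5, 0.2, 0.6],   # origin (index 2) ranked 4th
])
gt = np.array([0, 2, 2])
m_ap, acc = evaluate(sims, gt)
print(round(m_ap, 4), round(acc, 4))   # 0.5833 0.3333
```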

Key Experimental Results¶

Main Results¶

Diffusion Model	Type	mAP↑	Acc↑
SD2	Seen	88.8%	86.6%
SDXL	Unseen	81.5%	78.0%
OpenDalle	Unseen	87.3%	85.7%
ColorfulXL	Unseen	89.3%	87.1%
Kandinsky-3	Unseen	85.7%	84.5%
SD3	Unseen	85.7%	82.0%
Kolors	Unseen	90.3%	88.5%
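
As a quick arithmetic check on the table above, the per-model numbers for the six unseen models can be averaged. The mean unseen mAP comes out at 86.6%, which matches the unseen-model mAP listed for the linear transformation in the comparison table below.

```python
# (mAP, Acc) in percent for the six unseen models, copied from the table above.
unseen = {
    "SDXL": (81.5, 78.0),
    "OpenDalle": (87.3, 85.7),
    "ColorfulXL": (89.3, 87.1),
    "Kandinsky-3": (85.7, 84.5),
    "SD3": (85.7, 82.0),
    "Kolors": (90.3, 88.5),
}

mean_map = sum(m for m, _ in unseen.values()) / len(unseen)
mean_acc = sum(a for _, a in unseen.values()) / len(unseen)
print(f"mean unseen mAP = {mean_map:.1f}%, mean unseen Acc = {mean_acc:.1f}%")
# mean unseen mAP = 86.6%, mean unseen Acc = 84.3%
```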

Ablation Study¶

Configuration	mAP (Seen)	mAP (Unseen)	Description
Best Pre-trained Feature (AnyPattern)	29.1%	-	Directly using public model
Similarity Method (CosFace Fine-tuning)	87.1%	55.0%	Poor generalization
Domain Generalization Method (QAConv-GS)	83.4%	75.8%	Generalization improved but slow
Ours (Linear Transformation)	88.8%	86.6%	Strongest generalization
Replacing Linear with MLP	91.4%↑	80.3%↓	Overfitting!

Key Findings¶

Significant Generalization Advantage: On unseen models, the method outperforms the best similarity-based method by 31.6 percentage points in mAP (86.6% vs. 55.0%) and the best domain generalization method by 10.8 points (86.6% vs. 75.8%).
Linear Transformation is Key: Replacing it with a more expressive MLP raises seen-model mAP (91.4% vs. 88.8%) but lowers unseen-model mAP (80.3% vs. 86.6%), i.e., the MLP overfits; this supports the theoretical framework, in which a linear transformation suffices.
Efficiency Advantage: Training is 8.6 times faster, and matching is 875 times faster (vector matching vs. feature map matching).
Although the VAE encoder parameters and embeddings of different diffusion models indeed differ, the linear transformation still generalizes.
mAP only decreases by 3.7% and 0.3% under Gaussian blur (\(\sigma=3\)) and JPEG compression (\(30\%\)), respectively.

Highlights & Insights¶

Valuable Task Definition: ID2 is an important and timely security task.
Elegant Theory: Derives the existence and generalizability of the linear transformation starting from the denoising principles of diffusion models.
Simple yet Effective: The linear transformation turns out to outperform deep network methods.
Overfitting Insight: The MLP experiment perfectly demonstrates why the theoretically guaranteed linear method performs better.
High Practicality: Extremely efficient training and retrieval speeds make it suitable for large-scale deployment.

Limitations & Future Work¶

It only applies to I2I models based on the "adding noise + denoising" paradigm; other paradigms like InstructPix2Pix are outside the theoretical guarantees.
It does not apply to I2I methods that condition on CLIP image encodings (e.g., IP-Adapter); these require a new theoretical framework.
Hard negative sample issue: Visually highly similar unrelated images can lead to false matches.
Currently validated on only 7 diffusion models; generalizability on more models needs further validation.

Related Work & Insights¶

Image Copy Detection (ICD) is the most relevant task, but text-guided I2I translation is far more complex than the manual transformations that ICD targets.
Diffusion Image Generation Detection (e.g., DIRE) focuses on "whether an image is AI-generated," whereas this work focuses on "identifying which image is the original source."
Insight: Structural properties of the VAE embedding space may also be valuable for other safety-related tasks in diffusion models.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ Extremely novel task definition and theoretical framework.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ 7 diffusion models + extensive baselines + robustness testing + detailed ablation.
Writing Quality: ⭐⭐⭐⭐⭐ Clear problem definition, rigorous theoretical derivation, and well-designed experiments.
Value: ⭐⭐⭐⭐⭐ Important security task coupled with an elegant solution.