FADE: Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models¶
Conference: CVPR 2025
arXiv: 2503.19783
Code: https://iab-rubric/unlearning/FG-Un
Area: Image Generation
Keywords: Concept Erasure, Fine-Grained Erasure, Adjacent-Aware, Diffusion Model Safety, LoRA
TL;DR¶
Proposes FADE (Fine-grained Attenuation for Diffusion Erasure), the first method to address the adjacency problem in concept erasure for text-to-image diffusion models: it erases the target concept while preserving the model's ability to generate semantically adjacent concepts, improving adjacent-concept preservation by at least 12% over SOTA.
Background & Motivation¶
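Background: Text-to-image diffusion models need a way to selectively remove specific concepts (such as harmful content). Existing erasure methods (ESD, UCE, SPM) focus on locality, i.e., leaving unrelated concepts intact, but ignore adjacency, i.e., preserving concepts that are semantically close to the target.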
Limitations of Prior Work: After erasing "Golden Retriever", the generation of other retriever breeds is also compromised; erasing a specific flower species prevents similar species from being correctly generated.
Key Challenge: The target concept and its adjacent concepts lie close together in feature space, so coarse-grained erasure spills over onto the adjacent concepts.
Core Idea: Identify the set of semantically adjacent concepts via a Concept Neighborhood, and achieve precise erasure while protecting adjacent concepts using Mesh Modules (LoRA).
Method¶
Overall Architecture¶
Three components: (1) Concept Neighborhood, which uses image embedding similarity to find the top-K adjacent concepts of the target concept; (2) Mesh Modules (LoRA adapters), which precisely modify the model via three loss functions; (3) ERB evaluation metric, which measures both the erasure performance and adjacent preservation.
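The neighborhood step in component (1) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy 3-dimensional "embeddings", and the noise level are all assumptions, and in practice the per-image vectors would come from a pre-trained image encoder applied to generated images.

```python
import numpy as np

def mean_embedding(image_embeddings: np.ndarray) -> np.ndarray:
    """Average the per-image embeddings (shape [m, d]) of one concept."""
    return image_embeddings.mean(axis=0)

def concept_neighborhood(target, embeddings, k=5):
    """Return the k concepts whose mean embedding is most cosine-similar
    to the target's mean embedding (hypothetical helper, not from the paper)."""
    means = {name: mean_embedding(e) for name, e in embeddings.items()}
    t = means[target] / np.linalg.norm(means[target])
    scores = {}
    for name, v in means.items():
        if name == target:
            continue  # the target is not its own neighbor
        scores[name] = float(t @ (v / np.linalg.norm(v)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Toy data: 8 noisy "image embeddings" per concept around a made-up center.
rng = np.random.default_rng(0)
centers = {
    "golden retriever": [1.0, 0.0, 0.0],
    "labrador retriever": [0.9, 0.1, 0.0],
    "flat-coated retriever": [0.8, 0.2, 0.0],
    "tabby cat": [0.0, 1.0, 0.0],
    "sports car": [0.0, 0.0, 1.0],
}
toy = {
    name: np.array(c) + 0.01 * rng.standard_normal((8, 3))
    for name, c in centers.items()
}
neighbors = concept_neighborhood("golden retriever", toy, k=2)
print(neighbors)
```

In this toy setup the two other retriever breeds rank ahead of the unrelated concepts, which is the behavior the adjacency set is meant to capture.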
Key Designs¶
- Concept Neighborhood:
  - Function: Automatically identify the set of concepts semantically adjacent to the target concept.
  - Mechanism: Use the original model to generate \(m\) images for each concept, encode them with a pre-trained image encoder, average the features into one mean vector per concept, and select the top-K concepts most similar to the target by cosine similarity. The paper proves that, under certain conditions, the k-NN classifier in this latent space converges to the Bayes-optimal classifier.
  - Design Motivation: Automatically construct the adjacent set when semantic annotations (such as WordNet) are unavailable.
- Mesh Modules with Three Loss Functions:
  - Function: Balance the erasure of the target concept and the preservation of adjacent concepts.
  - Mechanism: Erasing Loss uses a triplet form to push the target concept's noise prediction away from those of its adjacent concepts; Guidance Loss steers the target concept's noise prediction toward that of the null concept; Adjacency Loss constrains the noise predictions of adjacent concepts to remain unchanged. The total objective is \(\mathcal{L}_{FADE} = \lambda_{er}\mathcal{L}_{er} + \lambda_{adj}\mathcal{L}_{adj} + \lambda_{guid}\mathcal{L}_{guid}\).
  - Design Motivation: The three losses handle erasing, guiding, and preserving respectively, which gives a fine-grained balance between them.
- ERB Evaluation Metric:
  - Function: Jointly evaluate erasing effectiveness and adjacency preservation.
  - Mechanism: The Erasing-Retention Balance (ERB) score quantifies, in a single number, both how thoroughly the target concept is erased and how well adjacent concepts are retained.
  - Design Motivation: Existing metrics focus only on erasing performance and neglect the crucial dimension of adjacency preservation.
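To make the idea of a balance score concrete, the sketch below combines the two quantities with a harmonic mean. The summary above does not state the ERB formula, so the harmonic-mean form, the function name `erb_score`, and the use of classifier accuracies as inputs are assumptions made for illustration only.

```python
def erb_score(target_acc: float, adjacent_acc: float) -> float:
    """Balance erasure against retention with a harmonic mean.

    ASSUMPTION: erasure is measured as 1 - target_acc (accuracy of a
    classifier on images generated for the erased concept) and retention
    as adjacent_acc (accuracy on adjacent concepts). The exact form used
    in the paper may differ.
    """
    erasure = 1.0 - target_acc
    denom = erasure + adjacent_acc
    if denom == 0.0:
        return 0.0  # nothing erased and nothing retained
    return 2.0 * erasure * adjacent_acc / denom

# A method that erases the target but also destroys its neighbors scores
# lower than one that erases the target and keeps the neighbors intact.
over_erased = erb_score(target_acc=0.0, adjacent_acc=0.1)
balanced = erb_score(target_acc=0.05, adjacent_acc=0.9)
print(over_erased, balanced)
```

A harmonic mean is a natural choice for this kind of score because it stays low whenever either component is low, so a method cannot score well by erasing everything or by erasing nothing.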
Loss & Training¶
Only the LoRA parameters (Mesh Modules) are trained; the original model weights stay frozen. The training data is generated by the model itself.
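The three-term objective \(\mathcal{L}_{FADE}\) can be sketched on toy noise predictions as below. This is an illustration under stated assumptions, not the paper's definition: the choice of anchor, positive, and negative in the triplet term, the `margin` parameter, the mean-squared distance, and the default weights of 1.0 are all hypothetical. In actual training, only the adapter parameters would receive gradients from this loss.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared distance between two noise predictions."""
    return float(np.mean((a - b) ** 2))

def fade_style_objective(eps_tar, eps_null, eps_adj, eps_adj_frozen,
                         lam_er=1.0, lam_adj=1.0, lam_guid=1.0, margin=1.0):
    """Weighted sum of erasing, adjacency, and guidance terms.

    eps_tar        : adapted model's prediction for the target prompt
    eps_null       : frozen model's prediction for the null prompt
    eps_adj        : adapted model's predictions for the K adjacent prompts
    eps_adj_frozen : frozen model's predictions for the same K prompts
    """
    d_pos = mse(eps_tar, eps_null)  # target should move toward null
    d_neg = float(np.mean([mse(eps_tar, a) for a in eps_adj_frozen]))
    # ASSUMED triplet form: close to the null prediction, far from neighbors.
    l_er = max(0.0, d_pos - d_neg + margin)
    l_guid = d_pos
    # Adjacent predictions should match what the frozen model produced.
    l_adj = float(np.mean([mse(a, b) for a, b in zip(eps_adj, eps_adj_frozen)]))
    total = lam_er * l_er + lam_adj * l_adj + lam_guid * l_guid
    return {"er": l_er, "adj": l_adj, "guid": l_guid, "total": total}

# Toy tensors standing in for U-Net noise predictions.
shape = (4, 8, 8)
eps_null = np.zeros(shape)
frozen_adj = [np.full(shape, 2.0), np.full(shape, 3.0)]

# Target fully redirected to null, neighbors untouched: every term vanishes.
ideal = fade_style_objective(np.zeros(shape), eps_null, frozen_adj, frozen_adj)
# Target still coincides with one neighbor: erasing and guidance terms fire.
unerased = fade_style_objective(np.full(shape, 2.0), eps_null, frozen_adj, frozen_adj)
print(ideal["total"], unerased["total"])
```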
Key Experimental Results¶
Main Results¶
On Stanford Dogs, Oxford Flowers, CUB, I2P, Imagenette, and ImageNet-1k:
- Adjacent preservation performance improves by \(\geq 12\%\) compared to SOTA.
- Target concept erasure performance is comparable to or better than SOTA.
Key Findings¶
- Adjacency Loss is crucial for protecting adjacent concepts.
- A neighborhood set size of \(K=5\) performs best in most scenarios.
- The advantages are more pronounced on fine-grained datasets (such as Stanford Dogs).
Highlights & Insights¶
- Formally defines and addresses the "adjacency" problem of concept erasure for the first time.
- The Concept Neighborhood method is simple yet theoretically grounded.
- The proposed ERB metric fills the gap in evaluation dimensions.
Limitations & Future Work¶
- Concept Neighborhood relies on the quality of the image encoder.
- The expressive capacity of LoRA adapters might limit complex erasure scenarios.
- Handling compositional concepts (such as "a Golden Retriever wearing a skirt") remains to be investigated.
Rating¶
- Novelty: 8/10 — First formalization of adjacency-aware erasure.
- Technical Depth: 8/10 — The three-loss design is theoretically grounded.
- Experimental Thoroughness: 8/10 — Extensively validated across 6 datasets.
- Writing Quality: 8/10 — Clear problem definition.