ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
Conference: CVPR 2025
arXiv: 2503.10358
Code: Not mentioned
Area: Diffusion Models / Image Generation
Keywords: Continual Learning, Personalized Generation, Catastrophic Forgetting, Concept Confusion, LoRA Fine-Tuning
TL;DR
ConceptGuard mitigates catastrophic forgetting and concept confusion in continual personalized T2I generation through four strategies: shift embeddings, concept-binding prompts, memory-preserving regularization, and priority queue replay. On the multi-concept benchmark it outperforms the compared baselines (multi-concept IA 69.8 vs. 57.1 for Continual Diffusion).
Background & Motivation
Background: Personalized T2I generation (such as DreamBooth and Textual Inversion) enables diffusion models to learn new concepts. However, users typically need to teach multiple concepts sequentially—a continual learning scenario.
Limitations of Prior Work: (1) Catastrophic Forgetting: The ability to generate old concepts is lost when learning new concepts. (2) Concept Confusion: Features of different concepts are mixed (e.g., mixing the features of user A's dog and user B's cat). Existing continual diffusion methods (such as Continual Diffusion) partially mitigate forgetting but suffer from severe concept confusion (FI = 4.1).
Key Challenge: Shared model weights lead to mutual interference between old and new concepts—fine-tuning new concepts overwrites the LoRA weights of old concepts (forgetting), while embeddings of different concepts overlap in the weight space (confusion).
Goal: Concurrently mitigate forgetting (maintaining the quality of old concepts) and confusion (distinguishing identity features of different concepts) when sequentially adding new personalized concepts.
Key Insight: A multi-faceted approach—dynamically adjusting old embeddings to adapt to weight changes (shift embeddings), introducing concept-binding prompts for disambiguation, constraining the magnitude of weight changes (regularization), and intelligently choosing concepts for replay (priority queue).
Core Idea: Simultaneously mitigate both forgetting and confusion in continual personalized T2I through four complementary strategies: shift embeddings, concept-binding prompts, LoRA weight regularization, and adaptive replay.
Method
Overall Architecture
Based on SDXL + LoRA (fine-tuning only K/V self-attention layers). When learning a new concept: (1) Use shift embeddings to update all old concept embeddings to adapt to model changes. (2) Generate a trainable binding prompt and a global binding prompt for each concept to disambiguate. (3) Apply regularization to LoRA weight increments to avoid excessive shifting. (4) Use a priority queue to select replay concepts based on time and importance.
Key Designs
- Shift Embeddings
  - Function: Dynamically adjust the embeddings of old concepts to restore generation quality after model weights are updated due to learning new concepts.
  - Mechanism: After model weights change from \(\theta_t\) to \(\theta_{t+1}\), the old embedding \(e_i\) may no longer produce correct outputs under the new weights. A lightweight embedding shift \(\Delta e_i\) is trained to allow \(e_i + \Delta e_i\) to restore the generation of the old concept under the new model.
  - Design Motivation: Simply replaying old concepts may be insufficient because the model "landscape" has changed, and the "position" corresponding to the old embedding is no longer optimal.
- Concept-Binding Prompts (CBP)
  - Function: Introduce unique learnable identifiers for each concept to prevent mutual confusion between concepts.
  - Mechanism: Each concept is assigned a trainable importance weight and a binding prompt token. A global binding prompt unifies the representation space of different concepts. Ablation shows that CBP is the most critical component: removing it drops multi-concept IA from 69.8 to 59.3 and degrades FI from 1.9 to 3.9.
  - Design Motivation: Standard tokens (e.g., "[V1]", "[V2]") are semantically ambiguous during continual learning. Binding prompts provide stronger signals for concept differentiation.
- Priority Queue Adaptive Replay
  - Function: Intelligently select which old concepts to replay and how much to replay them.
  - Mechanism: Maintain a priority queue sorted by the learning time of concepts (recent concepts require more consolidation) and their learned importance weights. When learning a new concept, top-K concepts are selected from the queue to generate replay images (using SAM to segment the background to improve diversity).
  - Design Motivation: Replaying all old concepts is computationally prohibitive, while random selection is inefficient. The priority queue balances efficiency and performance.
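The top-K selection described for the priority queue can be sketched as follows. The notes do not say how learning time and importance weight are combined, so the score used here (importance plus a recency bonus) is an assumption for illustration, and the names `select_replay`, `learned_step`, `importance`, and `recency_weight` are hypothetical.

```python
import heapq


def select_replay(concepts, current_step, k, recency_weight=1.0):
    """Return the names of the k concepts with the highest priority score.

    Assumed score (not taken from the paper):
        importance + recency_weight / (1 + age),
    where age = current_step - learned_step, so recently learned
    concepts get a larger bonus.
    """
    def score(concept):
        age = current_step - concept["learned_step"]
        return concept["importance"] + recency_weight / (1 + age)

    # heapq.nlargest returns the k best entries, highest score first.
    return [c["name"] for c in heapq.nlargest(k, concepts, key=score)]


# Hypothetical concept pool; importance values stand in for learned weights.
concepts = [
    {"name": "dog_A", "learned_step": 0, "importance": 0.9},
    {"name": "cat_B", "learned_step": 1, "importance": 0.2},
    {"name": "toy_C", "learned_step": 2, "importance": 0.4},
    {"name": "mug_D", "learned_step": 3, "importance": 0.1},
]

replay = select_replay(concepts, current_step=4, k=2)
print(replay)
```

Under this assumed score, a concept with a high importance weight can still be selected even if it was learned long ago, while low-importance concepts are replayed mainly when they are recent.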
Loss & Training
Standard diffusion denoising loss + L2 regularization on LoRA weight increments. LoRA fine-tunes only the K/V self-attention layers.
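One way to write this objective is given below. It is a sketch under stated assumptions rather than the paper's exact formulation: the coefficient \(\lambda\), the choice of the Frobenius norm, and the reading of the "LoRA weight increment" as a matrix \(\Delta W\) are all assumed here. The diffusion timestep is written \(\tau\) to avoid clashing with the concept index \(t\) used in \(\theta_t\).

\[
\mathcal{L} \;=\; \mathbb{E}_{x_0,\,\epsilon,\,\tau}\!\left[\,\lVert \epsilon - \epsilon_{\theta}(x_\tau, \tau, c) \rVert_2^2\,\right] \;+\; \lambda\,\lVert \Delta W \rVert_F^2
\]

Here \(x_\tau\) is the noised image at timestep \(\tau\), \(\epsilon\) is the sampled noise, \(\epsilon_{\theta}\) is the denoiser, and \(c\) is the text conditioning that contains the concept embedding. The second term penalizes large changes to the LoRA weights, which is the mechanism the notes describe for limiting drift away from previously learned concepts.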
Key Experimental Results
Main Results
| Method | TA-Single Concept↑ | TA-Multi-Concept↑ | IA-Single Concept↑ | IA-Multi-Concept↑ | Forgetting FT↓ | Confusion FI↓ |
|---|---|---|---|---|---|---|
| Textual Inversion | 40.1 | 35.1 | 71.1 | 45.3 | 0.0 | 0.0 |
| Continual Diffusion | 42.3 | 37.8 | 77.5 | 57.1 | 1.7 | 4.1 |
| ConceptGuard | 43.1 | 40.3 | 81.3 | 69.8 | 0.9 | 1.9 |
Ablation Study
| Component | IA-Multi-Concept↑ | FI↓ |
|---|---|---|
| Full Model | 69.8 | 1.9 |
| w/o CBP | 59.3 | 3.9 |
| w/o Shift Embedding | 65.1 | 2.4 |
| w/o Regularization | 67.2 | 2.1 |
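The per-component impact can be read off the table with simple arithmetic. The snippet below only recomputes differences from the numbers in the table above; the variable names are illustrative.

```python
# Values copied from the ablation table: (IA-Multi-Concept, FI).
full = (69.8, 1.9)
ablations = {
    "w/o CBP": (59.3, 3.9),
    "w/o Shift Embedding": (65.1, 2.4),
    "w/o Regularization": (67.2, 2.1),
}

# For each removed component: (IA drop, FI increase), rounded to one decimal.
impact = {
    name: (round(full[0] - ia, 1), round(fi - full[1], 1))
    for name, (ia, fi) in ablations.items()
}

for name, (ia_drop, fi_rise) in impact.items():
    print(f"{name}: IA -{ia_drop}, FI +{fi_rise}")
```

With the table's numbers, removing CBP costs 10.5 IA points and adds 2.0 to FI, compared with 4.7 / 0.5 for shift embeddings and 2.6 / 0.2 for regularization.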
Key Findings
- CBP is the most critical component: removing it drops multi-concept IA by 10.5 points (69.8 → 59.3) and roughly doubles FI (1.9 → 3.9).
- The improvement over Continual Diffusion is largest in multi-concept scenarios (IA 57.1 → 69.8), indicating that the method substantially reduces concept confusion.
- Both forgetting and confusion are simultaneously mitigated (FT 0.9 vs 1.7, FI 1.9 vs 4.1).
Highlights & Insights
- Dual analysis of forgetting and confusion is more comprehensive than focusing strictly on forgetting; concept confusion can be more severe than forgetting in practical applications.
- The idea of shift embeddings can be transferred to other continual learning scenarios, where old "interfaces" need recalibration after model updates.
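The recalibration idea behind shift embeddings can be illustrated with a toy example. This is not the paper's training procedure: a frozen 2×2 linear map stands in for the model, plain gradient descent on a squared error replaces the diffusion loss, and all numbers and function names (`apply_model`, `fit_shift`) are hypothetical.

```python
def apply_model(weights, emb):
    """Stand-in model: a matrix-vector product."""
    return [sum(w * x for w, x in zip(row, emb)) for row in weights]


def fit_shift(new_weights, old_emb, target, lr=0.1, steps=500):
    """Fit delta so that new_weights @ (old_emb + delta) approximates target.

    Only delta is updated; new_weights and old_emb stay frozen.
    """
    delta = [0.0] * len(old_emb)
    for _ in range(steps):
        shifted = [e + d for e, d in zip(old_emb, delta)]
        output = apply_model(new_weights, shifted)
        residual = [o - t for o, t in zip(output, target)]
        # Gradient of 0.5 * ||residual||^2 with respect to delta is W^T residual.
        grad = [
            sum(new_weights[r][c] * residual[r] for r in range(len(residual)))
            for c in range(len(delta))
        ]
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta


def squared_error(output, target):
    return sum((o - t) ** 2 for o, t in zip(output, target))


old_weights = [[1.0, 0.0], [0.0, 1.0]]    # stands in for theta_t
new_weights = [[1.2, 0.1], [-0.1, 0.9]]   # stands in for theta_{t+1}
old_emb = [0.5, -0.3]                     # e_i learned under theta_t
target = apply_model(old_weights, old_emb)  # behaviour to restore

delta = fit_shift(new_weights, old_emb, target)
shifted_emb = [e + d for e, d in zip(old_emb, delta)]

error_before = squared_error(apply_model(new_weights, old_emb), target)
error_after = squared_error(apply_model(new_weights, shifted_emb), target)
print(error_before, error_after)
```

The old embedding alone no longer reproduces the old output under the new weights, while the shifted embedding does, without touching the weights themselves. In the actual method the target would presumably come from the denoising objective on the old concept's data rather than from a stored output.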
Limitations & Future Work
- Generating replay images (including SAM segmentation) for old concepts is required at each learning step, which incurs high computational overhead.
- Performance still degrades as the number of concepts increases, albeit at a slower rate.
- The method relies on LoRA fine-tuning; adaptation to other fine-tuning strategies remains unexplored.
Rating
- Novelty: ⭐⭐⭐⭐ The combined design of the four strategies is comprehensive, with CBP as the core contribution.
- Experimental Thoroughness: ⭐⭐⭐⭐ Main experiments + detailed ablation + multi-metric evaluation.
- Writing Quality: ⭐⭐⭐⭐ Clearly defined problem.
- Value: ⭐⭐⭐⭐ Direct value to practical personalized T2I applications.