SALMUBench: A Benchmark for Sensitive Association-Level Multimodal Unlearning
Conference: CVPR2026
arXiv: 2603.26316
Code: cvc-mmu.github.io/salmubench
Area: Multimodal VLM
Keywords: machine unlearning, CLIP, privacy protection, association-level unlearning, benchmark
TL;DR
The authors propose SALMUBench, the first benchmark for association-level machine unlearning in CLIP-like models. It consists of a synthetic dataset of \(60\text{K}\) image-text pairs that link a person to sensitive attributes, a Compromised/Clean model pair trained from scratch, and a structured holdout evaluation protocol. The study systematically reveals three failure modes in existing unlearning methods: catastrophic collapse, over-generalized unlearning, and ineffective unlearning.
Background & Motivation
Vision-language models (VLMs) like CLIP, trained on massive web-scale data, may inadvertently memorize sensitive personal information (e.g., associating faces with phone numbers). The "Right to be Forgotten" in the GDPR requires models to selectively delete learned sensitive associations.
Limitations of Prior Work: (1) Unimodal unlearning methods are difficult to transfer to embedding-based models using contrastive learning; (2) Existing multimodal unlearning benchmarks (MLLMU-Bench, FIUBench) mainly target VQA evaluation for generative MLLMs and are unsuitable for CLIP's embedding space; (3) Current evaluations inject sensitive knowledge via fine-tuning, failing to isolate unlearning effects from pre-training artifacts; (4) Crucially, existing simple "forget-retain" evaluation frameworks cannot detect over-generalized unlearning, where a method might successfully erase target information while inadvertently wiping related knowledge that should be preserved.
Method
Overall Architecture
SALMUBench is an evaluation infrastructure designed to objectively determine the success of unlearning. The core problem it addresses is confirming that when a CLIP model is requested to forget a sensitive association (e.g., "Face A \(\leftrightarrow\) Phone Number X"), it truly forgets it without collateral damage to unrelated knowledge. The framework consists of three layers: a \(60\text{K}\) synthetic dataset with person-attribute pairs, a comparison between "Compromised" (exposed to sensitive data) and "Clean" models trained from scratch, and a structured holdout protocol to quantify both unlearning efficacy and side effects.
```mermaid
%%{init: {'flowchart': {'rankSpacing': 24, 'nodeSpacing': 28, 'padding': 6, 'wrappingWidth': 400, 'subGraphTitleMargin': {'top': 8, 'bottom': 16}}}}%%
flowchart TD
subgraph DATA["Synthetic Dataset SALMU (5-Stage Pipeline)"]
direction TB
A1["SFHQ Anchor Faces<br/>1000 Identity Anchors"] --> A2["IP-Adapter Identity-Preserving Generation<br/>~100 Diverse Images per Person"]
A2 --> A3["CLIP Filtering + Demographic Verification<br/>Converged to 774 Coherent Identities"]
A3 --> A4["Assigned Fictional PII<br/>Global Unique Name/Phone/Email/IBAN"]
A4 --> A5["Gemma3-12B Label Rewriting<br/>5 Linguistic Styles"]
end
DATA --> B["≈60K Image-Text Pairs<br/>retain set (400M real) + sensitive set (60K sensitive)"]
B --> C1["Clean Model<br/>Trained on retain only = Gold Standard"]
B --> C2["Compromised Model<br/>retain + sensitive = Starting Point"]
C2 --> D["Unlearning Algorithm<br/>Goal: Revert Compromised to Clean"]
C1 -.Reference.-> D
D --> E
subgraph E["Structured Holdout Evaluation Protocol"]
direction TB
E1["forget set<br/>Unlearning Efficacy: RetFail↓ / AssocStr / ACS"]
E2["holdout_identity<br/>Inter-identity Collateral Damage"]
E3["holdout_association<br/>Intra-identity Collateral Damage"]
E4["retain set<br/>Utility Preservation: GenKnow↑"]
end
E --> F["Diagnosing Three Failure Modes<br/>Catastrophic / Over-generalized / Ineffective"]
```
Key Designs
1. Synthetic Dataset SALMU: Replacing Real Privacy with Controllable Fictional Identities
Using real personal privacy data is neither compliant nor reproducible. The paper utilizes a 5-stage pipeline to create "realistic but fictional" identities. It starts with 1,000 synthetic faces from SFHQ as identity anchors; generates approximately 100 diverse images per person using IP-Adapter-FaceID Plus; filters these through CLIP zero-shot labeling and consistency checks to reach 774 coherent identities; assigns globally unique, culturally consistent fictional PII (name, city, phone, etc.); and uses Gemma3-12B to rewrite labels into five styles. The final dataset contains \(60\text{K}\) pairs covering 65 countries, providing clear "to-be-forgotten" associations for public release.
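The "globally unique" requirement of the PII assignment stage can be illustrated with a small sketch. Everything in it is hypothetical and chosen only for illustration (the function name `assign_fictional_pii`, the phone format, the `example.invalid` domain); this summary does not describe the paper's actual generator, and cultural consistency is not modeled here.

```python
import random

def assign_fictional_pii(identity_ids, seed=0):
    """Give each identity a phone number and email that no other identity shares.

    Hypothetical helper: formats and field names are assumptions, not the paper's.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    used_phones = set()
    records = {}
    for ident in identity_ids:
        # Resample until the phone number has not been handed out before.
        while True:
            digits = "".join(rng.choice("0123456789") for _ in range(7))
            phone = "+00-555-" + digits
            if phone not in used_phones:
                used_phones.add(phone)
                break
        # Embedding the identity id makes the email unique by construction.
        email = f"person{ident:04d}@example.invalid"
        records[ident] = {"phone": phone, "email": email}
    return records

# 774 matches the number of coherent identities reported above.
records = assign_fictional_pii(range(774))
```

Uniqueness matters because each attribute value must point to exactly one identity; otherwise forgetting one person's association would be confounded with another's.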
2. Trained-from-scratch Compromised/Clean Model Pairs: Isolating Unlearning Targets from Artifacts
Traditional benchmarks often inject sensitive knowledge via fine-tuning, making it impossible to distinguish between the intended unlearning of the injection and the erasure of pre-existing training artifacts. SALMUBench trains two ViT-B/16 CLIP models from scratch: the Clean model is trained only on the retain set (\(\approx 400\text{M}\) real pairs), while the Compromised model is trained on both retain and sensitive (\(60\text{K}\)) sets using identical seeds and configurations (32 epochs on 128 H100s). The Clean model serves as the literal gold standard for the ideal unlearning result.
3. Structured Holdout Evaluation Protocol: Quantifying Over-generalized Unlearning
To address the blind spot of "forget/retain" splits, the paper divides the 774 sensitive identities into subsets: the forget set (seen during unlearning), holdout_identity (identities never seen by the unlearning algorithm, used to measure inter-identity damage), and holdout_association (other associations for the same person, used to measure intra-identity damage). Metrics include:
- Unlearning Efficacy: RetFail (retrieval failure, reported as MRR; lower is better), AssocStr (average cosine similarity), ACS (accuracy of a logistic regression classifier in distinguishing correct vs. shuffled pairs), and IdZSC (identity classification).
- Utility Preservation: GenKnow (Zero-shot ImageNet-1K accuracy), InterIdSim/IntraIdSim (holdout set similarity), and VisIdInt (visual identity integrity).
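
Two of the efficacy metrics above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the benchmark's implementation: it assumes AssocStr is the mean cosine similarity over matched image-text embedding pairs, and that the MRR behind RetFail ranks all candidate texts for each image. The function names and the toy embeddings are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def assoc_strength(image_embs, text_embs):
    """Mean cosine similarity over matched (image k, text k) pairs."""
    sims = [cosine(i, t) for i, t in zip(image_embs, text_embs)]
    return sum(sims) / len(sims)

def mean_reciprocal_rank(image_embs, text_embs):
    """For each image, rank all texts by similarity; text k is the match for image k."""
    total = 0.0
    for k, img in enumerate(image_embs):
        sims = [cosine(img, t) for t in text_embs]
        # Rank = 1 + number of texts scoring strictly higher than the true match.
        rank = 1 + sum(s > sims[k] for s in sims)
        total += 1.0 / rank
    return total / len(image_embs)

# Toy 2-D embeddings: image 0 matches its text exactly,
# image 1 is closer to text 0 than to its own text 1.
images = [[1.0, 0.0], [0.8, 0.6]]
texts = [[1.0, 0.0], [0.0, 1.0]]

print(round(assoc_strength(images, texts), 3))
print(round(mean_reciprocal_rank(images, texts), 3))
```

On the toy data the second image ranks its own caption second, so the MRR is (1 + 1/2) / 2 = 0.75. After successful unlearning, both quantities on the forget set should move toward the Clean model's values.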
Key Experimental Results
Main Results (\(5 \times\) Budget)
| Method | RetFail \(\downarrow\) | GenKnow \(\uparrow\) | InterIdSim | IntraIdSim |
|---|---|---|---|---|
| Clean (Gold Standard) | 0.001 | 0.633 | 0.143 | 0.143 |
| Compromised | 0.236 | 0.638 | 0.321 | 0.321 |
| CLIPErase | 0.001 | 0.634 | 0.024 | 0.024 |
| DELETE | 0.001 | 0.632 | 0.023 | 0.023 |
| VLUnlearn | 0.001 | 0.638 | 0.210 | 0.210 |
| Finetuning | 0.003 | 0.638 | 0.209 | 0.209 |
| Neg. Gradient | 0.009 | 0.630 | 0.063 | 0.061 |
| Shuffled Captions | 0.004 | 0.548 | 0.212 | 0.212 |
| Direct Sim. Min. | 0.001 | 0.615 | -0.420 | -0.425 |
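
To make the table easier to read, the sketch below converts a few rows into relative changes against the Compromised model. The values are copied from the table; treating "leakage reduction" as the relative drop in RetFail and "GenKnow drop" as the relative drop in GenKnow is an assumption of this example, not a definition taken from the paper.

```python
# RetFail and GenKnow values copied from the main results table (5x budget).
compromised = {"RetFail": 0.236, "GenKnow": 0.638}
methods = {
    "CLIPErase": {"RetFail": 0.001, "GenKnow": 0.634},
    "DELETE": {"RetFail": 0.001, "GenKnow": 0.632},
    "Shuffled Captions": {"RetFail": 0.004, "GenKnow": 0.548},
}

def relative_change(method):
    """Return (leakage reduction %, GenKnow drop %) relative to the Compromised model."""
    leak_red = 100 * (compromised["RetFail"] - method["RetFail"]) / compromised["RetFail"]
    gk_drop = 100 * (compromised["GenKnow"] - method["GenKnow"]) / compromised["GenKnow"]
    return leak_red, gk_drop

summary = {name: relative_change(vals) for name, vals in methods.items()}
for name, (leak_red, gk_drop) in summary.items():
    print(f"{name}: leakage -{leak_red:.1f}%, GenKnow -{gk_drop:.1f}%")
```

Under this reading, DELETE and CLIPErase reduce leakage by more than 99% while losing less than 1% of GenKnow, whereas Shuffled Captions gives up more than 10% of GenKnow. These two columns alone say nothing about the holdout sets, which is why InterIdSim and IntraIdSim are reported alongside them.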
Analysis of Three Failure Modes
| Failure Type | Representative Method | Characteristics |
|---|---|---|
| Catastrophic Collapse | Shuffled Captions, Direct Sim. Min. | Effective unlearning, but GenKnow drops from 0.638 (Compromised) to 0.548 for Shuffled Captions and to 0.615 for Direct Sim. Min. |
| Over-generalized Unlearning | DELETE, CLIPErase | Precise unlearning and GenKnow preservation, but severe damage to holdout sets. |
| Ineffective Unlearning | Generic Captions | Low collateral damage but fails to erase target associations. |
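
The taxonomy can be read as a decision rule over the reported metrics. The sketch below is only an illustration: the function `diagnose` and all three thresholds are hypothetical, and it looks at just RetFail, GenKnow, and InterIdSim, so it cannot catch failures that show up only in other metrics such as AssocStr or ACS.

```python
# Reference values copied from the main results table.
CLEAN = {"RetFail": 0.001, "GenKnow": 0.633, "InterIdSim": 0.143}
COMPROMISED = {"RetFail": 0.236, "GenKnow": 0.638, "InterIdSim": 0.321}

def diagnose(retfail, genknow, inter_id_sim, *,
             max_genknow_drop=0.02,    # hypothetical tolerance on absolute GenKnow loss
             min_leak_reduction=0.9,   # hypothetical minimum relative RetFail reduction
             sim_margin=0.05):         # hypothetical slack below the Clean InterIdSim
    """Map three metrics to one of the failure modes, checked in order of severity."""
    if COMPROMISED["GenKnow"] - genknow > max_genknow_drop:
        return "catastrophic collapse"
    leak_reduction = (COMPROMISED["RetFail"] - retfail) / COMPROMISED["RetFail"]
    if leak_reduction < min_leak_reduction:
        return "ineffective unlearning"
    if inter_id_sim < CLEAN["InterIdSim"] - sim_margin:
        # Holdout identities were pushed well below the Clean reference.
        return "over-generalized unlearning"
    return "no failure flagged by these three metrics"

print(diagnose(0.004, 0.548, 0.212))   # Shuffled Captions row
print(diagnose(0.001, 0.632, 0.023))   # DELETE row
print(diagnose(0.236, 0.638, 0.321))   # Compromised model, i.e. no unlearning applied
```

With these illustrative thresholds, the three calls return catastrophic collapse, over-generalized unlearning, and ineffective unlearning respectively. The ordering of the checks is a design choice: a method that destroys general utility is flagged as collapsed regardless of how well it forgets.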
Key Findings
- No single method simultaneously avoids all three failure modes; this remains the core open challenge in the field.
- High-efficiency unlearning (\(>99\%\) leakage reduction with \(<1\%\) `GenKnow` drop) is achievable, but existing methods (DELETE, CLIPErase) achieve this through over-generalization.
- When `AssocStr` is pushed below the Clean model baseline (\(0.142\)), over-generalization is triggered as the method over-corrects and erases related, unseen associations.
- Simple "forget-retain" evaluations are "blind" to over-generalization.
Highlights & Insights
- The structured holdout evaluation design is the most significant contribution; the distinction between `holdout_identity` and `holdout_association` makes over-generalized unlearning quantifiable for the first time.
- Training two full CLIP models from scratch (\(400\text{M}\) data points, 128 H100s) provides the cleanest possible baseline despite the high cost.
- The synthetic data pipeline (IP-Adapter identity preservation + CLIP filtering + LLM rewriting) is highly reusable.
- The taxonomy of three failure modes provides a clear target for future research: solving unlearning efficacy, utility preservation, and over-generalization avoidance simultaneously.
Limitations & Future Work
- The study focuses on CLIP dual-encoders; evaluating the propagation of sensitive information in diffusion models using CLIP backbones is a natural extension.
- It covers structured PII only; generalization to implicit sensitive concepts (artistic styles, political stances) is unknown.
- Recoverability diagnostics are missing: how quickly can a model relearn forgotten information through fine-tuning?
- While domain consistency was verified via KS tests, the diversity of 100 images per synthetic identity is still relatively limited.
Related Work & Insights
- vs. MultiDelete / CLIPErase: Previous methods did not target specific personal privacy and used fine-tuning for injection, making evaluation less rigorous.
- vs. TOFU / FIUBench: These focus on VQA-style evaluation of generative MLLMs, an approach that does not apply to CLIP's embedding space.
- Insight: Over-generalized unlearning likely exists in knowledge editing/unlearning for LLMs as well, warranting cross-domain validation.
Rating
- Novelty: ⭐⭐⭐⭐⭐ First association-level unlearning benchmark for CLIP with innovative holdout design.
- Experimental Thoroughness: ⭐⭐⭐⭐ 9 baseline methods and multi-budget comparisons, though limited to the ViT-B/16 architecture.
- Writing Quality: ⭐⭐⭐⭐⭐ Rigorous description of dataset construction and evaluation protocols.
- Value: ⭐⭐⭐⭐⭐ Establishes a new standard for multimodal machine unlearning with open-source data, models, and code.