# Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models
- Conference: CVPR 2026
- arXiv: 2603.25994
- Code: https://github.com/alirezafarashah/NLCE
- Area: Image Generation / AI Safety
- Keywords: Concept Erasure, Diffusion Models, Neighbor Preservation, Training-Free, Localized Erasure
## TL;DR
This paper proposes NLCE, a training-free three-stage concept erasure framework for text-to-image diffusion models. It achieves precise localized erasure of target concepts through spectrally-weighted representation modulation, attention-guided spatial gating, and gated feature scrubbing, while explicitly preserving semantically neighboring concepts. NLCE outperforms existing methods on Oxford Flowers, Stanford Dogs, celebrity identity, and sensitive content erasure benchmarks.
## Background & Motivation
- Background: Concept erasure methods for T2I diffusion models fall into two categories: training-based approaches (e.g., ESD, MACE, SPM, which require fine-tuning) and training-free approaches (e.g., UCE, RECE, GLoCE, which intervene only at inference time). Localized erasure methods such as GLoCE aim to restrict edits to target regions only.
- Limitations of Prior Work: The Neighbor Gap problem — erasing a fine-grained concept inadvertently degrades semantically adjacent concepts. For example, erasing one dog breed reduces generation quality for other breeds as well.
- Key Challenge: Concept representations are highly entangled in the embedding space, making it impossible for simple projection or suppression operations to precisely distinguish between target and neighboring concepts.
- Goal: To accurately erase a target concept without training, while preserving the semantic integrity of neighboring concepts.
- Key Insight: A three-stage progressive erasure pipeline — spectrally-weighted modulation in representation space to suppress the target and reinforce neighbors, followed by attention-based localization of residual activations, and finally hard scrubbing for complete removal.
- Core Idea: Explicitly modeling and protecting the "concept neighborhood" structure to enable precise rather than indiscriminate concept removal.
## Method
### Overall Architecture
NLCE intervenes in the cross-attention layers of the UNet at inference time across three stages, without modifying model weights (training-free). Stage 1 operates in the embedding space (modifying Key/Value projection matrices), while Stages 2 and 3 operate in the feature space at each denoising step.
### Key Designs
- **Stage 1: Representation-Space Modulation (Spectrally-Weighted Suppression + Neighbor Enhancement)**
    - Function: Attenuate the target concept's semantics at the embedding level while recovering the representations of neighboring concepts.
    - Mechanism: SVD is applied to the target concept embedding to obtain an orthonormal basis \(U_{F_c}\), from which a spectrally-weighted projection \(P_{F_c} = U_{F_c}\Lambda_{F_c}U_{F_c}^T\) is constructed; the weights \(\lambda_i\) are modulated by singular-value importance, so more important directions are suppressed more strongly. An analogous projection \(P_{\mathcal{N}_c}\) is constructed for the neighbor concepts. The final operator \(P_c = (I - \beta P_{F_c}) + \gamma P_{\mathcal{N}_c} P_{F_c}\) is applied globally to \(W_K\) and \(W_V\). Neighbors are identified via Wikipedia retrieval, filtered by RoBERTa-based specificity scoring, and ranked by CLIP visual similarity.
    - Design Motivation: GLoCE's gated low-rank adapter can miss concept reactivation through indirect attention paths; applying the operator globally to the projection matrices is more reliable. Spectral weighting makes the suppression intensity proportional to semantic importance.
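The Stage 1 operator can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and weighting each direction by its normalized singular value (exponent `tau`) is an assumption about how \(\lambda_i\) is modulated.

```python
import numpy as np

def spectral_projection(F, tau=1.0):
    """Spectrally-weighted projector P = U diag(lambda) U^T for a concept
    embedding matrix F of shape (tokens, dim). Normalized-singular-value
    weighting is an assumed stand-in for the paper's exact modulation."""
    U, S, _ = np.linalg.svd(F.T, full_matrices=False)  # U: (dim, r)
    lam = (S / S.max()) ** tau        # heavier weight on dominant directions
    return (U * lam) @ U.T            # (dim, dim)

def erase_operator(F_target, F_neighbors, beta=0.8, gamma=0.5):
    """Stage 1 operator: P_c = (I - beta * P_Fc) + gamma * P_Nc @ P_Fc."""
    d = F_target.shape[1]
    P_F = spectral_projection(F_target)
    P_N = spectral_projection(F_neighbors)
    return (np.eye(d) - beta * P_F) + gamma * (P_N @ P_F)

# Toy usage: edit W_K globally (embeddings treated as column vectors).
rng = np.random.default_rng(0)
d = 16
F_t = rng.standard_normal((4, d))   # target concept token embeddings
F_n = rng.standard_normal((8, d))   # mined neighbor embeddings
P_c = erase_operator(F_t, F_n)
W_K = rng.standard_normal((d, d))
W_K_edited = W_K @ P_c              # keys now see modulated embeddings
```

Directions dominant for the target are attenuated (the residual \(1-\beta\lambda_i\) shrinks as \(\lambda_i\) grows), while the \(\gamma\)-term redirects part of the suppressed signal onto the neighbor subspace.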
- **Stage 2: Attention-Guided Spatial Gating**
    - Function: Localize the spatial regions where residual activations of the target concept persist.
    - Mechanism: Each denoising step runs two forward passes (a dry pass followed by a real pass). The dry pass extracts attention maps from DownBlock-2; tokens whose overlap with the target subspace exceeds a threshold (\(s_j = \|P_{F_c}x_j\|_2 > \delta_{\text{token}}\)) are flagged as "live tokens," and their attention maps are aggregated into a spatial gating map \(G_t(x,y)\). In the real pass, attention from live tokens is suppressed within gated regions: \(A^\ell(x,y,j) \leftarrow (1-G_t(x,y))\cdot A^\ell(x,y,j)\).
    - Design Motivation: Because Stage 1 is a global operation, residuals may remain; Stage 2 uses spatial attention to localize exactly where the target concept is still active.
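The live-token detection and gating step can be sketched as follows (a NumPy toy with hypothetical helper names; max-aggregation over live tokens is an assumption about how the maps are combined):

```python
import numpy as np

def live_token_mask(X, P_F, delta_token=0.5):
    """Flag "live" tokens via s_j = ||P_Fc x_j||_2 > delta_token.
    X: (n_tokens, dim) prompt embeddings; P_F: (dim, dim) target projector."""
    scores = np.linalg.norm(X @ P_F.T, axis=1)
    return scores > delta_token

def gating_map(attn, live):
    """Aggregate dry-pass attention of live tokens into a spatial gate
    G_t in [0, 1]. attn: (H, W, n_tokens). Max + normalization assumed."""
    if not live.any():
        return np.zeros(attn.shape[:2])
    G = attn[..., live].max(axis=-1)
    return G / (G.max() + 1e-8)

def gate_attention(attn, G, live):
    """Real-pass suppression: A(x,y,j) <- (1 - G(x,y)) * A(x,y,j)
    applied to live tokens only."""
    out = attn.copy()
    out[..., live] *= (1.0 - G)[..., None]
    return out
```

Only tokens that still project strongly onto the target subspace are touched; attention from all other tokens passes through unchanged, which is what keeps the erasure localized.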
- **Stage 3: Gated Feature Hard Scrubbing**
    - Function: Completely eliminate residual target signals within the gated spatial regions.
    - Mechanism: The gating map from Stage 2 is upsampled to each UNet layer's resolution and binarized with threshold \(\delta_{\text{scrub}}\). Hidden features at positions where the mask equals 1 are set to zero: \(h_t^\ell(x,y) \leftarrow \mathbf{0}\). This is an irreversible hard erasure.
    - Design Motivation: Projection-based suppression can in principle be inverted; hard zeroing guarantees strict removal.
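The scrubbing step amounts to an upsample, a threshold, and a masked zero-fill. A minimal sketch (nearest-neighbor upsampling via `np.kron` is an assumed choice; the paper does not specify the interpolation):

```python
import numpy as np

def hard_scrub(h, G, delta_scrub=0.5):
    """Stage 3: upsample the gate G (Hg, Wg) to the feature map h's
    resolution (H, W, C), binarize at delta_scrub, and zero hidden
    features where the mask is 1 -- an irreversible hard erasure."""
    H, W = h.shape[:2]
    fy, fx = H // G.shape[0], W // G.shape[1]
    mask = np.kron(G, np.ones((fy, fx))) > delta_scrub  # (H, W) binary mask
    out = h.copy()
    out[mask] = 0.0    # h(x, y) <- 0 inside the gated region, all channels
    return out
```

Because the features are zeroed rather than projected, no downstream layer can recover the target signal from the gated positions.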
### Loss & Training

No training is involved; all operations are performed at inference time. Key hyperparameters:

- \(\beta, \gamma \in [0,1]\): strength of target suppression and neighbor enhancement, respectively.
- \(\delta_{\text{token}}\): threshold for live-token detection.
- \(\delta_{\text{scrub}}\): threshold for the hard-scrubbing gate.

In multi-concept scenarios, the operators of the concepts detected in each prompt are composed as \(P_{\text{multi}} = \prod_{c\in\mathcal{A}} P_c\).
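The multi-concept composition is just a matrix product over the active operators. A small sketch (`compose_operators` is a hypothetical name; since matrix products generally do not commute, the right-to-left application order here is an assumption):

```python
import numpy as np

def compose_operators(operators):
    """P_multi = prod_{c in A} P_c over the operators of the concepts
    detected in the prompt, applied right-to-left."""
    P = np.eye(operators[0].shape[0])
    for P_c in operators:
        P = P_c @ P
    return P
```

With idealized rank-1-removal operators, the composition erases every constituent concept direction while leaving unrelated directions intact.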
## Key Experimental Results
### Main Results
Oxford Flowers / Stanford Dogs Fine-Grained Erasure:
| Method | Alpine Sea Holly (Acc_t↓ / Acc_r↑ / Ho↑) | Bluetick (Acc_t↓ / Acc_r↑ / Ho↑) |
|---|---|---|
| GLoCE | 32.0/78.91/73.05 | 28.0/73.59/72.79 |
| RECE | 0.0/64.85/78.68 | 0.0/73.33/84.62 |
| NLCE | 0.0/82.06/90.15 | 0.0/75.91/86.31 |
Celebrity Identity Erasure:
| Method | Anna Kendrick (Acc_t↓ / Ho↑) | Elon Musk (Acc_t↓ / Ho↑) |
|---|---|---|
| SLD | 0.0/96.55 | 3.33/94.28 |
| GLoCE | 1.33/96.63 | 0.67/97.29 |
| NLCE | 0.0/96.91 | 0.0/96.55 |
### Ablation Study
Progressive effect of adding each stage (derived from Figure 9 of the paper):
| Configuration | Effect |
|---|---|
| Stage 1 only | Basic erasure with possible residuals |
| Stage 1 + 2 | More thorough erasure with spatial precision |
| Stage 1 + 2 + 3 | Complete erasure with no residuals |
The degree of reliance on all three stages varies by dataset: Stage 1 alone suffices for simple scenarios, while the full pipeline is required for complex cases.
### Key Findings
- NLCE achieves the highest Acc_r and Harmonic score (Ho) across all fine-grained datasets, demonstrating superior neighbor preservation.
- GLoCE still exhibits relatively high Acc_t (e.g., 32%), indicating that its lightweight edits leave the target concept only partially erased; NLCE reduces Acc_t to nearly 0% in almost all cases.
- On I2P sensitive content erasure, NLCE detects the least nudity while maintaining a relatively high CLIP Score of 29.70.
- Under simultaneous multi-concept erasure (10 breeds), NLCE maintains high Acc_r, whereas methods such as MACE, UCE, and RECE suffer severe collapse in retention accuracy.
- NLCE consistently achieves the lowest KID values, indicating the best preservation of visual quality.
## Highlights & Insights
- The identification and formalization of the "Neighbor Gap" problem clearly explains why existing methods fail in fine-grained settings. This insight has broad implications for the concept erasure field.
- The three-stage progressive erasure design transforms concept erasure from a blunt operation into a precise surgical procedure: weakening in representation space, localizing in attention space, and purging in feature space — with a well-defined objective at each stage.
- The neighbor mining pipeline (Wikipedia retrieval → RoBERTa specificity filtering → CLIP visual ranking) provides a practical method for constructing semantic neighborhoods, reusable in other tasks that require concept boundary delineation.
## Limitations & Future Work
- Two forward passes per denoising step (dry pass + real pass) double inference time.
- Neighbor mining relies on external resources (Wikipedia, RoBERTa), which may fail to retrieve appropriate neighbors for rare concepts.
- Hard scrubbing (zeroing) may introduce localized visual artifacts in certain cases.
- \(\beta\) and \(\gamma\) require manual tuning depending on the desired erasure strength, with optimal values varying across scenarios.
## Related Work & Insights
- vs. GLoCE: Both are localized erasure methods, but GLoCE uses a gated low-rank adapter without explicitly protecting neighbors. NLCE significantly outperforms GLoCE in neighbor retention (Acc_r gap of 3–7%) while achieving lower Acc_t.
- vs. RECE: Achieves thorough erasure but suffers from severe neighbor forgetting (Acc_r frequently 10–15% lower than NLCE) due to the absence of a neighbor protection mechanism.
- vs. AdaVD: Employs spectral suppression but lacks spatial localization and neighbor enhancement, making it less robust in multi-concept scenarios.
## Rating
- Novelty: ⭐⭐⭐⭐ — Neighbor-aware concept erasure is a well-motivated problem formulation, and the three-stage design is principled; however, the individual techniques (SVD projection, attention gating) are not novel in themselves.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Covers four distinct settings (fine-grained categories, celebrity identity, sensitive content, artistic style), includes multi-concept extension, and provides complete ablations.
- Writing Quality: ⭐⭐⭐⭐ — Problem motivation is clearly presented, but the three-stage description is notation-heavy; the algorithmic flow could be conveyed more intuitively.
- Value: ⭐⭐⭐⭐ — Directly relevant to the safe deployment of T2I models, particularly for fine-grained concept control applications.