Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!¶
Conference: ICCV 2025
arXiv: 2603.00150
Code: https://github.com/zzzucf/Neural-Plagiarism
Area: Image Generation
Keywords: Neural Plagiarism, Diffusion Models, Copyright Protection, Watermark Removal, Attention Perturbation
TL;DR¶
This paper exposes the threat of "neural plagiarism"—diffusion models can readily replicate copyright-protected images (including watermarked ones). It proposes a universal attack framework based on "anchors and shims," searching for perturbations in the cross-attention mechanism to achieve coarse-to-fine semantic modification, bypassing copyright protections ranging from visible trademarks to invisible watermarks.
Background & Motivation¶
The growing generative capability of diffusion models has raised serious concerns about data copyright infringement. This threat, termed "neural plagiarism," manifests in two forms:
Forgery Attack: Generating visually similar copies while removing watermarks, causing copyright verification to fail: \(\mathcal{V}(x^*) \neq w\)
Ambiguity Attack: Replacing watermarks with new ones to fabricate ownership disputes: \(\mathcal{V}(x^*) = w^*\)
Existing protective measures include legal frameworks (GDPR, copyright law) and technical methods (visible/invisible watermarks). However, existing watermark removal methods (e.g., Regen, Rinse) rely on simple noise-then-denoise processes, yielding limited effectiveness and often introducing visible artifacts.
Directly optimizing the objective \(\min_{x_T^*} d_{visual}(x_0^w, x_0^*) - \gamma d_{latent}(x_T, x_T^*)\) faces three challenges:
Memory Explosion: backpropagating through the chained Jacobians \(\frac{\partial x_{t-1}^*}{\partial x_t^*}\) across timesteps requires >100 GB of VRAM for just 10 steps
Over-smoothing: Skip-step gradient estimation causes image blurring
Noisy Output: High perturbations produce noise rather than meaningful semantic changes
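The memory cost behind the first challenge follows from unrolling the chain rule over the sampling trajectory (a sketch in the paper's notation; the product form is the standard backpropagation-through-time expansion, not a formula quoted from the paper):

```latex
\frac{\partial x_0^*}{\partial x_T^*}
  = \prod_{t=1}^{T} \frac{\partial x_{t-1}^*}{\partial x_t^*}
```

Each Jacobian factor requires keeping that step's full U-Net activations in memory, so the cost of a naive end-to-end gradient grows linearly with the number of unrolled timesteps.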
Core Idea: Anchors (inverted latent sequences) maintain the trajectory, while shims (learned perturbations) fine-tune semantics at specific timesteps—analogous to using shims when installing a door to adjust local spacing while keeping the frame aligned.
Method¶
Overall Architecture¶
The attack pipeline consists of two stages:
1. Anchor Acquisition: the copyrighted image is encoded via the VAE into \(\hat{x}_0\); the DPM solver inverts the generative process to obtain the anchor sequence \(\{\hat{x}_1, ..., \hat{x}_T\}\)
2. Shim Search: starting from a selected timestep \(K\), reverse sampling is performed while optimizing shims \(\delta_t\) (injected into the text embeddings of cross-attention) at selected timesteps \(\mathcal{T}_{select}\), so that the generated image stays visually similar to the original while its latent deviates from the anchors
Key Designs¶
- Anchors and Shims Optimization:
- Function: Decouples the long-chain dependencies of latents, enabling independent per-timestep perturbation optimization
- Mechanism: Saves the complete inverted latent trajectory as anchors \(\{\hat{x}_t\}\); defines shims \(\delta_t\) as search variables: \(x_t^* = \Delta_{\delta_t \in \mathcal{S}}(x_t, \delta_t)\)
- Norm Constraint: \(\mathcal{L}_{norm}(t) = \max(0, \hat{\varepsilon}_t - \|\delta_t\|)\), ensuring shims are large enough to deviate from anchors (thereby breaking watermarks)
- Design Motivation: Decouples timestep chains by saving anchors; each step requires only single-step backpropagation memory, resolving the >100 GB VRAM issue
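The decoupling can be illustrated with a runnable toy (an assumed 1-D stand-in, not the paper's code): a linear noise predictor `eps(x) = A*x` replaces the U-Net, so the DPM-style step is exactly invertible and the whole inverted trajectory can be cached as anchors. Because re-denoising from any anchor lands back on the cached trajectory, a shim at timestep \(t\) only ever needs the single step \(t \to t-1\), never the full chain.

```python
# Toy 1-D illustration of anchors (assumed stand-in, not the paper's code).
A, ZETA = 0.1, 0.5            # stand-in noise-predictor slope and step size

def denoise_step(x):
    return x - ZETA * A * x   # x_{t-1} = x_t - zeta * eps(x_t)

def invert_step(x):
    return x / (1.0 - ZETA * A)   # exact inverse of denoise_step

def build_anchors(x0, T):
    """Invert the sampler T times and cache every latent (the anchors)."""
    anchors = [x0]
    for _ in range(T):
        anchors.append(invert_step(anchors[-1]))
    return anchors            # [x̂_0, x̂_1, ..., x̂_T]

x0 = 1.0
anchors = build_anchors(x0, T=5)

# Re-denoising from the last anchor returns to x̂_0, so each shim can be
# optimized against its local anchor with single-step memory only.
x = anchors[5]
for _ in range(5):
    x = denoise_step(x)
print(abs(x - x0) < 1e-9)     # True: the trajectory is anchored
```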
- Semantic Search via Attention Perturbation:
- Function: Enables controllable semantic modification search in the semantic space
- Mechanism: Rather than directly perturbing latents, shims \(\delta_t\) are injected into the text embeddings within cross-attention—applied to the null-string embedding \(\mathbf{e}_\emptyset\)
- Semantic Preservation Loss: \(\mathcal{L}_{semantic}(t) = -\frac{\mathbf{e}_\emptyset \cdot (\mathbf{e}_\emptyset + \delta_t)}{\|\mathbf{e}_\emptyset\| \|\mathbf{e}_\emptyset + \delta_t\|}\) (maximizing cosine similarity)
- Alignment Loss: \(\mathcal{L}_{align}(t) = d(x_{t-1}, \hat{x}_{t-1})\), ensuring perturbed outputs remain close to anchors
- Key Insight: Shims at different timesteps control semantic changes at different granularities—large timesteps alter global semantics (e.g., color, shape), while small timesteps modify fine-grained texture and details
- Design Motivation: Cross-attention is the core semantic control mechanism in diffusion models; precise semantic manipulation is achieved by perturbing the inputs to \(K\) and \(V\)
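A minimal sketch of the injection point (toy shapes and numbers are assumptions, not the paper's implementation): the shim \(\delta\) is added to the null-text embedding \(\mathbf{e}_\emptyset\), which is the input to the \(K\) and \(V\) projections of cross-attention, while \(Q\) comes from the latent. Perturbing the embedding therefore shifts the attention output, i.e. the semantics, without touching the latent directly.

```python
# Toy cross-attention with identity K/V projections (illustrative only).
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(query, embeddings):
    # K = V = embeddings keeps the toy readable.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, emb))
              for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings))
            for d in range(dim)]

query = [1.0, 0.0]                    # one latent "pixel" as the query
e_null = [[0.5, 0.5], [0.5, -0.5]]    # stand-in null-text embedding tokens
delta = [[0.3, 0.0], [0.0, 0.0]]      # shim injected into the embeddings

clean = cross_attention(query, e_null)
shimmed = cross_attention(query, [[e + d for e, d in zip(tok, dt)]
                                  for tok, dt in zip(e_null, delta)])
# The attention output shifts even though the query (latent) is unchanged.
print(clean != shimmed)               # True
```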
- Iterative Search Process:
- Function: Jointly optimizes shims over selected timesteps
- Joint Objective: \(\min_{\delta_t} \mathcal{L}_{norm}(t) + \gamma_1 \mathcal{L}_{semantic}(t) + \gamma_2 \mathcal{L}_{align}(t)\)
- Hyperparameters: \(\gamma_1 = 10^5, \gamma_2 = 0.1, \hat{\varepsilon}_t = 10\)
- Adam optimizer (learning rate 0.01), weight decay \(10^{-3}\), gradient clipping (max norm 1.0)
- DPM solver with 50 timesteps
- Two initialization modes:
- Late-start + Noise Initialization (\(K=140\), shims at steps 100 and 60): Small perturbations, suitable for invisible watermark removal
- Early-start + Inversion Initialization (\(K=1000\), shims at steps 600 and 200): Large perturbations for substantial semantic modification
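The three loss terms combine as follows; a hedged sketch using the paper's stated hyperparameters (\(\gamma_1 = 10^5\), \(\gamma_2 = 0.1\), \(\hat{\varepsilon}_t = 10\)), with toy vectors and a Euclidean alignment distance as illustrative assumptions:

```python
# Joint objective sketch: L_norm + gamma1 * L_semantic + gamma2 * L_align.
import math

GAMMA1, GAMMA2, EPS_HAT = 1e5, 0.1, 10.0   # hyperparameters from the paper

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def l_norm(delta):
    # Hinge: penalize shims too SMALL to push the latent off its anchor.
    return max(0.0, EPS_HAT - norm(delta))

def l_semantic(e_null, delta):
    # Negative cosine similarity between e_0 and e_0 + delta.
    shifted = [e + d for e, d in zip(e_null, delta)]
    dot = sum(a * b for a, b in zip(e_null, shifted))
    return -dot / (norm(e_null) * norm(shifted))

def l_align(x, x_anchor):
    # Euclidean distance keeps the perturbed latent near its anchor.
    return norm([a - b for a, b in zip(x, x_anchor)])

def joint_loss(delta, e_null, x, x_anchor):
    return (l_norm(delta)
            + GAMMA1 * l_semantic(e_null, delta)
            + GAMMA2 * l_align(x, x_anchor))

e_null = [1.0, 0.0, 0.0]
delta = [12.0, 0.0, 0.0]   # parallel to e_null: cosine similarity stays 1
loss = joint_loss(delta, e_null, [0.9, 0.1], [1.0, 0.0])
print(round(loss, 2))      # -99999.99: hinge satisfied, cosine term dominates
```

Note how the scales interact: with \(\gamma_1 = 10^5\), even a tiny loss of cosine similarity dwarfs the other terms, which is why shims stay nearly parallel to \(\mathbf{e}_\emptyset\) while growing in magnitude.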
Attack Algorithm (Algorithm 1)¶
Input: copyrighted image x^w
1. VAE encoding → x̂_0
2. DPM-solver inversion → anchor sequence {x̂_1, ..., x̂_T}
3. Initialize x*_K = x̂_K (inversion init) or a noisy latent (noise init)
4. For t = K down to 1:
     If t ∈ T_select:   # timesteps requiring perturbation
       Initialize δ_t = 0
       While not converged:
         x*_{t-1} = x*_t - ζ_t · ε_θ(x*_t, t, e_∅ + δ_t)
         δ_t ← δ_t - η · ∇_{δ_t} L(t, δ_t, x*_{t-1}, x̂_{t-1})
     Else:              # ordinary denoising
       x*_{t-1} = x*_t - ζ_t · ε_θ(x*_t, t, e_∅)
5. Decode and output x* = D(x*_0)
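The inner shim-search loop can be made concrete with a runnable toy (scalar latent, an assumed linear noise predictor `eps(x, e) = A*x + B*e`, and a finite-difference gradient; `EPS_HAT`, `GAMMA2`, and the learning rate are illustrative choices, not the paper's values except where stated above):

```python
# Toy version of Algorithm 1's "While not converged" shim search.
A, B, ZETA = 0.1, 1.0, 0.5
EPS_HAT, GAMMA2, LR = 10.0, 0.1, 0.1

def eps_theta(x, e):
    return A * x + B * e               # stand-in for the U-Net

def step(x, e):
    return x - ZETA * eps_theta(x, e)  # x_{t-1} = x_t - zeta * eps_theta

def shim_loss(delta, x_t, anchor_prev):
    x_prev = step(x_t, delta)          # denoise with shimmed conditioning
    l_norm = max(0.0, EPS_HAT - abs(delta))       # push |delta| up
    l_align = GAMMA2 * (x_prev - anchor_prev) ** 2  # stay near the anchor
    return l_norm + l_align

def search_shim(x_t, anchor_prev, iters=300, h=1e-4):
    delta = 0.1                        # small nonzero init
    for _ in range(iters):
        grad = (shim_loss(delta + h, x_t, anchor_prev)
                - shim_loss(delta - h, x_t, anchor_prev)) / (2 * h)
        delta -= LR * grad             # gradient-descent update on the shim
    return delta

x_t = 2.0
anchor_prev = step(x_t, 0.0)           # anchor: denoise with null conditioning
delta = search_shim(x_t, anchor_prev)
# The shim grows until the norm constraint is met, then the alignment
# term pins it near the threshold EPS_HAT.
print(9.8 < abs(delta) < 10.2)         # True
```

The equilibrium at the hinge boundary mirrors the paper's design intent: the norm constraint forces the latent off its anchor (breaking the watermark), while the alignment term prevents the deviation from growing unboundedly.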
Key Experimental Results¶
Main Results (Invisible Watermark Removal, MS-COCO)¶
| Method | DwtDctSvd BA↓ | DwtDctSvd ACC↓ | RivaGAN BA↓ | RivaGAN ACC↓ | PSNR↑ | SSIM↑ |
|---|---|---|---|---|---|---|
| Regen | 0.64 | 0.15 | 0.60 | 0.05 | 26.21 | 0.75 |
| Rinse | 0.54 | 0.04 | 0.52 | 0.01 | 23.68 | 0.68 |
| Ours (Late) | 0.52 | 0.01 | 0.56 | 0.02 | 25.27 | 0.73 |
| Ours (Early) | 0.52 | 0.03 | 0.52 | 0.00 | 21.16 | 0.70 |
T@1%F denotes the detection rate (TPR) at a 1% false positive rate:

| Method | Stable Signature T@1%F↓ | Tree-Ring T@1%F↓ |
|---|---|---|
| Regen | 0.00 | 1.00 |
| Ours | 0.00 | 0.98 |
Ablation Study (Degree of Semantic Modification vs. Timestep Selection)¶
| Initialization Mode | Shim Timesteps | Effect | Applicable Scenario |
|---|---|---|---|
| Late-start (K=140) + Noise | 100, 60 | Minor semantic change, preserves image quality | Invisible watermark removal |
| Early-start (K=1000) + Inversion | 600, 200 | Major semantic change (hairstyle, clothing, etc.) | Visible copyright bypass |
| — | Large timestep (single step) | Alters global semantics: color, shape | IP character modification |
| — | Small timestep (single step) | Alters local semantics: texture, details | Fine-grained forgery |
Key Findings¶
- Bit accuracy drops to near 50% (random chance), indicating effective watermark removal
- Compared to Regen/Rinse, the proposed method removes watermarks more thoroughly while better preserving image quality
- Effective against Stable Signature (T@1%F reduced to 0), but fails to remove Tree-Ring watermarks—because Tree-Ring embeds local patterns in Fourier space, which are immune to global latent shifts
- Ambiguity attack succeeds: new watermarks can be embedded into already-attacked images, with both watermarks simultaneously detectable (Tree-Ring T@1%F: 1.00/0.99 for both parties)
- Among 100 generated Elon Musk variants, a GPT-based arbiter could not definitively identify any as the real person
- Visible copyright attack: successfully modifies Disney Elsa's iconic hairstyle and clothing, Monet's painting style, and Van Gogh's signature
Highlights & Insights¶
- The "anchors and shims" analogy is intuitive, and the framework design is elegant—simultaneously resolving memory issues and providing multi-granularity semantic control
- Attention perturbation is a low-cost and precise semantic manipulation approach that modifies only text embeddings rather than the entire network
- The method is purely gradient-based search requiring no additional training or fine-tuning, and is applicable to any pretrained diffusion model
- The work clearly exposes the fragility of existing copyright protections, providing an important attack baseline for defensive research
- Timestep selection enables coarse-to-fine semantic controllability
Limitations & Future Work¶
- Cannot remove Tree-Ring watermarks (local Fourier-domain patterns are immune to global latent shifts)
- The early-start mode yields higher FID; excessive semantic changes degrade image quality
- Enhancements such as ControlNet, additional image encoders, and negative prompts are not incorporated (explicitly noted as future work in the paper)
- From a defensive perspective, developing watermarks robust to attention-based perturbation search is an urgent research direction
- Evaluation is limited to Stable Diffusion; effectiveness on other diffusion architectures (e.g., DiT) remains unknown
Related Work & Insights¶
- vs. Regen: Regen's simple noise-then-denoise strategy is insufficient for thorough watermark removal and introduces visible noise; the proposed method achieves precise semantic modification through shim optimization
- vs. Rinse: Rinse iterates the Regen process multiple times, improving watermark removal at the cost of severe image quality degradation (FID 82–87)
- vs. Tree-Ring Watermark: Tree-Ring's strategy of embedding watermarks in Fourier space proves naturally robust to latent perturbations—a valuable insight for defensive methods
- vs. Stable Signature: Despite being a state-of-the-art watermarking method, Stable Signature embeds watermarks in the VAE decoder, making any latent deviation sufficient to bypass it
Rating¶
- Novelty: ⭐⭐⭐⭐ The "anchors and shims" framework is elegantly designed; the attention perturbation search perspective is novel
- Experimental Thoroughness: ⭐⭐⭐⭐ Covers multiple watermarking methods, visible/invisible copyright, forgery/ambiguity attacks, and real-world copyrighted images
- Writing Quality: ⭐⭐⭐⭐ Problem formalization is clear and attack type definitions are rigorous, though notation and terminology are occasionally dense
- Value: ⭐⭐⭐⭐⭐ Exposes an urgent AI security threat, serving as both a warning to the copyright protection community and a baseline for defensive research