Editing Away the Evidence: Diffusion-Based Image Manipulation and the Failure Modes of Robust Watermarking

Conference: CVPR 2026
arXiv: 2603.12949
Code: None
Area: AI Security / Digital Watermarking / Generative Models
Keywords: robust watermarking, diffusion editing, watermark degradation, SNR attenuation, content provenance

TL;DR

This paper systematically analyzes, both theoretically (SNR attenuation, mutual information upper bounds, denoising contraction) and empirically, how non-adversarial diffusion editing (instruction-based, drag-based, and composition-based) inadvertently destroys robust invisible watermarks. It shows that robustness to traditional post-processing does not carry over to generative transformations.

Background & Motivation

Background: Robust invisible watermarking is core infrastructure for copyright protection and content provenance. Deep-learning watermarking systems (StegaStamp, TrustMark, VINE) reach 99%+ bit accuracy under conventional post-processing because they are trained end to end with differentiable noise layers (JPEG, scaling, cropping).

Limitations of Prior Work: Diffusion-based editing methods (InstructPix2Pix, DragDiffusion, TF-ICON, etc.) apply a fundamentally different kind of transformation: they inject substantial Gaussian noise and then progressively denoise it with a powerful generative prior. Because watermarks are low-amplitude structured perturbations, the denoiser treats them as "unnatural residuals" and removes them, even when the user has no intention of erasing them.

Key Challenge: Watermarking requires the signal to persist in the pixel or frequency domain, yet diffusion denoising works precisely by contracting perturbations that deviate from the natural image manifold. The two goals are in fundamental, information-theoretic conflict.

Goal: Determine under what conditions non-adversarial diffusion editing inadvertently destroys robust watermark recovery, and identify the underlying theoretical mechanisms.

Key Insight: Diffusion editing is modeled as a Markov kernel, and a closed-loop theoretical explanation is built across four levels (SNR attenuation → mutual information upper bound → Fano's inequality → denoising contraction), complemented by a standardized evaluation protocol, DEW-ST.

Core Idea: Diffusion editing acts as an information bottleneck. Forward noising attenuates the watermark SNR exponentially, and reverse denoising contracts the watermark residuals that deviate from the manifold; once the noise injection is strong enough, reliable recovery becomes information-theoretically impossible.

Method

Overall Architecture

This work is a theoretical analysis combined with empirical evaluation; it does not propose a new watermarking scheme. The core pipeline is: (1) model diffusion editing as a Markov kernel \(K_\mathcal{T}(\tilde{x}|x_w, y)\); (2) derive SNR attenuation and mutual information upper bounds under an additive watermark signal model; (3) analyze the contraction effect of denoising steps; (4) design the DEW-ST standardized evaluation protocol to assess multiple editors and watermarking systems.
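The discrete SNR expression and the Theorem 6.1 bound in the key designs can be traced to the forward noising kernel. The sketch below assumes that the additive model has the form \(x_w = x + \gamma w\) with a unit-variance watermark pattern \(w\), and that a DDPM-style kernel noises \(x_w\) to step \(t^*\). The symbol \(w\) and this normalization are our assumptions, not taken from the paper.

```latex
\begin{aligned}
x_w &= x + \gamma w, \qquad \mathbb{E}[w_i^2] = 1,\\
X_{t^*} &= \sqrt{\bar{\alpha}_{t^*}}\,x_w + \sqrt{1-\bar{\alpha}_{t^*}}\,\varepsilon
        = \sqrt{\bar{\alpha}_{t^*}}\,x
        + \underbrace{\sqrt{\bar{\alpha}_{t^*}}\,\gamma w}_{\text{watermark}}
        + \underbrace{\sqrt{1-\bar{\alpha}_{t^*}}\,\varepsilon}_{\text{noise}},
        \qquad \varepsilon \sim \mathcal{N}(0, I_d),\\
\text{SNR}_{t^*} &= \frac{\gamma^2\,\bar{\alpha}_{t^*}}{1-\bar{\alpha}_{t^*}},
        \qquad \bar{\alpha}_{t} = \prod_{s=1}^{t}(1-\beta_s) \le \exp\Big(-\sum_{s=1}^{t}\beta_s\Big).
\end{aligned}
```

If the host term \(\sqrt{\bar{\alpha}_{t^*}}\,x\) is treated as known to the decoder (an optimistic assumption), each of the \(d\) coordinates becomes a Gaussian channel with this SNR. That is where the \(\frac{d}{2}\log(1+\text{SNR})\) form of the mutual information bound comes from.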

Key Designs

SNR Attenuation and Mutual Information Upper Bound Derivation
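Discrete case: \(\text{SNR}_t = \gamma^2 \bar{\alpha}_t / (1-\bar{\alpha}_t)\), where \(\gamma\) is the watermark amplitude and \(\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)\); the SNR decays roughly exponentially in the cumulative noise \(\sum_{s\le t}\beta_s\) as the diffusion timestep \(t\) grows
Continuous SDE case: in the variance-preserving SDE, the watermark residual is scaled by \(\exp\big(-\frac{1}{2}\int_0^t \beta(u)\,du\big)\)
Mutual information upper bound (Theorem 6.1): \(I(M; X_{t^*}) \leq \frac{d}{2}\log\big(1 + \gamma^2\bar{\alpha}_{t^*}/(1-\bar{\alpha}_{t^*})\big)\), where \(M\) is the embedded message and \(d\) is the signal dimension
Core conclusion: as the noise-injection level \(t^*\) increases, the bound approaches zero. When the edited output is computed from \(X_{t^*}\) (plus conditioning independent of \(M\)), the data processing inequality carries the bound over to the edited image, and Fano's inequality then keeps the error probability of any decoder bounded away from zero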
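To see how quickly the bound bites, the chain SNR → mutual information → Fano can be evaluated numerically. This is a minimal sketch under several assumptions. It uses the linear \(\beta\) schedule from DDPM (1e-4 to 0.02 over 1000 steps). The amplitude `gamma = 0.01`, the dimension `d` of a 256×256 RGB image, and the 100-bit uniformly distributed payload `k` are all hypothetical choices, not values from the paper. Fano's inequality is used in the form \(P_e \ge 1 - (I + 1)/k\).

```python
import numpy as np

def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t for a DDPM-style linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)  # alpha_bar[t - 1] corresponds to timestep t

def snr(alpha_bar_t: float, gamma: float) -> float:
    """Discrete-case watermark SNR: gamma^2 * alpha_bar_t / (1 - alpha_bar_t)."""
    return gamma**2 * alpha_bar_t / (1.0 - alpha_bar_t)

def mi_upper_bound_bits(alpha_bar_t: float, gamma: float, d: int) -> float:
    """Theorem 6.1 bound (d/2) * log(1 + SNR), converted from nats to bits."""
    return 0.5 * d * np.log1p(snr(alpha_bar_t, gamma)) / np.log(2.0)

def fano_error_lower_bound(mi_bits: float, k: int) -> float:
    """Fano for a uniform k-bit message: P_e >= 1 - (I(M; X) + 1) / k, clipped to [0, 1]."""
    return float(np.clip(1.0 - (mi_bits + 1.0) / k, 0.0, 1.0))

alpha_bar = linear_alpha_bar()
gamma, d, k = 0.01, 3 * 256 * 256, 100  # hypothetical amplitude, dimension, payload length
for t_star in (50, 200, 500, 800):
    i_bits = mi_upper_bound_bits(alpha_bar[t_star - 1], gamma, d)
    print(f"t*={t_star:4d}  I <= {i_bits:9.2f} bits  P_e >= {fano_error_lower_bound(i_bits, k):.3f}")
```

For small \(t^*\) the bound exceeds the payload and says nothing. As \(t^*\) grows, it collapses below one bit, and the Fano bound forces the decoding error toward 1. That is the qualitative behaviour the theorem describes.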
Denoising Contraction Effect Analysis
Under a local contraction assumption, each denoising step shrinks watermark residuals that deviate from the natural image manifold by a factor \(\rho < 1\), so composing \(n\) steps suppresses them exponentially, as \(\rho^n\)
Different editors (instruction/drag/composition) correspond to different conditioning parameters of the Markov kernel, but the contraction effect is universal
Explains why even mild local edits can destroy globally distributed watermarks
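The contraction argument can be illustrated with a toy linear model. This sketch assumes that the "manifold" is a random linear subspace and that each denoising step keeps the on-subspace component while scaling the off-subspace component by a hypothetical factor `rho`. It is not the paper's denoiser; it only instantiates the local contraction assumption so that the \(\rho^n\) decay can be checked directly.

```python
import numpy as np

def make_subspace(D: int, r: int, rng: np.random.Generator) -> np.ndarray:
    """Orthonormal basis U (D x r) of a linear subspace: a toy stand-in for the image manifold."""
    q, _ = np.linalg.qr(rng.standard_normal((D, r)))
    return q

def denoise_step(x: np.ndarray, U: np.ndarray, rho: float) -> np.ndarray:
    """One contractive step: keep the on-manifold part, scale the off-manifold part by rho."""
    on = U @ (U.T @ x)
    return on + rho * (x - on)

rng = np.random.default_rng(0)
D, r, rho, n = 256, 32, 0.8, 20          # hypothetical sizes and contraction factor
U = make_subspace(D, r, rng)
x_clean = U @ rng.standard_normal(r)     # a point on the toy manifold
w = rng.standard_normal(D)
w -= U @ (U.T @ w)                       # keep only the off-manifold part of the watermark
x = x_clean + 0.05 * w                   # watermarked input

for _ in range(n):
    x = denoise_step(x, U, rho)

residual_ratio = np.linalg.norm(x - x_clean) / np.linalg.norm(0.05 * w)
print(f"residual ratio {residual_ratio:.3e} vs rho**n {rho**n:.3e}")
```

Any component of the watermark that lies inside the subspace would pass through these steps untouched. This gives a toy analogue of the guideline that watermarks should aim for semantic rather than pixel-level invariance.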
Frequency-Domain Analysis and DEW-ST Evaluation Protocol
Spectral retention rate \(\rho_\Omega\) is defined to quantify how much of the watermark energy in frequency band \(\Omega\) survives editing
High-frequency watermark energy is most severely suppressed by diffusion editing (\(\rho_{\text{high}}\) as low as 0.09–0.19)
The DEW-ST protocol standardizes the full pipeline: dataset, instruction set, edit strength, watermark embedding, and recovery metrics
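A band-wise retention measurement can be sketched with NumPy's FFT, assuming \(\rho_\Omega\) is the watermark-residual energy in band \(\Omega\) after editing divided by the energy in that band before editing. The radial cut-offs and the Gaussian low-pass "editor" are hypothetical. Because the stand-in editor is linear, \(\mathcal{T}(x_w) - \mathcal{T}(x)\) isolates the edited watermark residual. With a real, nonlinear diffusion editor, one would instead need to edit \(x\) and \(x_w\) with a shared seed and prompt.

```python
import numpy as np

def band_masks(shape, low_cut=0.125, high_cut=0.25):
    """Radial frequency masks on the FFT grid; cut-offs in cycles/pixel are hypothetical choices."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    r = np.sqrt(fx**2 + fy**2)
    return {"low": r < low_cut, "mid": (r >= low_cut) & (r < high_cut), "high": r >= high_cut}

def spectral_retention(res_before: np.ndarray, res_after: np.ndarray) -> dict:
    """rho_Omega = band energy of the watermark residual after editing / band energy before."""
    e_before = np.abs(np.fft.fft2(res_before)) ** 2
    e_after = np.abs(np.fft.fft2(res_after)) ** 2
    return {band: float(e_after[mask].sum() / e_before[mask].sum())
            for band, mask in band_masks(res_before.shape).items()}

def toy_lowpass_edit(img: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Hypothetical stand-in editor: a Gaussian low-pass filter applied in the Fourier domain."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    gain = np.exp(-0.5 * (fx**2 + fy**2) / sigma**2)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * gain))

rng = np.random.default_rng(0)
x = rng.random((64, 64))
x_w = x + 0.02 * rng.standard_normal((64, 64))  # white-noise watermark residual (assumption)
retention = spectral_retention(x_w - x, toy_lowpass_edit(x_w) - toy_lowpass_edit(x))
print(retention)
```

A low-pass editor reproduces the qualitative ordering (low > mid > high retention). Its absolute values are not comparable to the paper's \(\rho_{\text{high}}\) of 0.09–0.19.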

Loss & Training

A conceptual template for diffusion-augmented watermark training is proposed: include multiple diffusion editors \(\{\mathcal{T}_j\}\) in the training noise layer and jointly optimize \(\min_{E,D} \mathbb{E}_{x,m,j}[\ell_{\text{rec}}(D(\mathcal{T}_j(E(x,m))), m)] + \lambda \mathbb{E}_{x,m}[\ell_{\text{qual}}(E(x,m), x)]\). Here \(E\) and \(D\) are the watermark encoder and decoder, and \(\lambda\) trades recovery against image quality. In the paper's (illustrative) experiments, this strategy raises bit accuracy under mild editing from about 74% to 85.7%. Recovery still degrades toward failure under strong editing, however, which is consistent with the information-theoretic limits predicted by the theory.
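The objective can be made concrete with a single-sample Monte Carlo estimate in which the editor index \(j\) is drawn uniformly per sample. This is a NumPy sketch under stated assumptions. BCE-with-logits stands in for \(\ell_{\text{rec}}\) and pixel MSE for \(\ell_{\text{qual}}\). The spread-spectrum encoder/decoder and the two placeholder "editors" are hypothetical toys. The outer \(\min_{E,D}\) would be handled by an autodiff framework and is not shown.

```python
import numpy as np
from typing import Callable, Sequence

Array = np.ndarray

def bce_with_logits(logits: Array, bits: Array) -> float:
    """Mean binary cross-entropy between decoder logits and {0, 1} message bits (stable form)."""
    return float(np.mean(np.logaddexp(0.0, logits) - bits * logits))

def augmented_objective(
    x: Array,
    m: Array,
    encode: Callable[[Array, Array], Array],
    decode: Callable[[Array], Array],
    editors: Sequence[Callable[[Array], Array]],
    lam: float,
    rng: np.random.Generator,
) -> float:
    """Single-sample estimate of l_rec(D(T_j(E(x, m))), m) + lam * l_qual(E(x, m), x), j uniform."""
    x_w = encode(x, m)
    editor = editors[rng.integers(len(editors))]   # sample T_j into the noise layer
    rec = bce_with_logits(decode(editor(x_w)), m)  # BCE as l_rec (assumption)
    qual = float(np.mean((x_w - x) ** 2))          # pixel MSE as l_qual (assumption)
    return rec + lam * qual

# Hypothetical toy components: spread-spectrum encoder/decoder and two placeholder editors.
rng = np.random.default_rng(0)
D, k, gamma = 1024, 32, 0.05
carriers = rng.choice([-1.0, 1.0], size=(k, D))

def encode(x: Array, m: Array) -> Array:
    return x + gamma * ((2 * m - 1) @ carriers) / np.sqrt(k)

def decode(y: Array) -> Array:
    return 10.0 * (y @ carriers.T) / D  # logits from carrier correlations

editors = [
    lambda y: y,                                       # identity (no edit)
    lambda y: y + 0.1 * rng.standard_normal(y.shape),  # noise placeholder, not a diffusion editor
]
x = rng.standard_normal(D)
m = rng.integers(0, 2, size=k).astype(float)
print(augmented_objective(x, m, encode, decode, editors, lam=1.0, rng=rng))
```

In a real setup, the entries of `editors` would wrap (differentiable approximations of) InstructPix2Pix-, drag-, or composition-style editors. The structure of the loss stays the same.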

Key Experimental Results

Main Results

| Transformation type | Strength | StegaStamp bit accuracy | TrustMark bit accuracy | VINE bit accuracy |
|---|---|---|---|---|
| None (clean watermark) | – | 99.4% | 99.7% | 99.8% |
| JPEG (quality 50) | – | 96.1% | 98.2% | 98.9% |
| InstructPix2Pix | mild | 86.7% | 89.2% | 93.5% |
| InstructPix2Pix | strong | 53.2% | 55.0% | 60.7% |
| DragDiffusion | moderate | 63.4% | 67.9% | 78.6% |
| TF-ICON composition | – | 58.9% | 63.2% | 74.8% |
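Bit accuracies like these can be turned into a detection decision with a one-sided binomial test. Under the null hypothesis, an unwatermarked image matches each bit with probability 0.5. This sketch assumes a hypothetical 100-bit payload, since the table does not state the message length. It also treats the table's averages as if they were a single image's accuracy, so it is illustrative only.

```python
from math import comb

def detection_p_value(matches: int, n_bits: int) -> float:
    """P[Binomial(n_bits, 0.5) >= matches]: chance that an unwatermarked image matches this well."""
    return sum(comb(n_bits, i) for i in range(matches, n_bits + 1)) / 2**n_bits

n_bits = 100  # hypothetical payload length
for label, ba in [("IP2P mild, StegaStamp", 0.867), ("IP2P strong, StegaStamp", 0.532),
                  ("DragDiffusion, VINE", 0.786)]:
    matches = round(ba * n_bits)
    print(f"{label:24s} {matches}/{n_bits} bits  p = {detection_p_value(matches, n_bits):.2e}")
```

Under these assumptions, the mild-edit accuracies remain overwhelmingly significant, while the strong-edit accuracies cannot be distinguished from chance. The exact threshold depends on the payload length.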

Ablation Study

| Configuration | Result | Notes |
|---|---|---|
| Diffusion-augmented training (mild edit) | BA 85.7% | Improvement over the ~74% baseline |
| Diffusion-augmented training (strong edit) | BA ~55% | Still degrades toward failure under strong editing |
| Multi-seed voting (3 seeds) | BA +0.5% | Negligible gain: degradation is systematic, not random |
| Diffusion-native watermark (same-model editing) | AUC 0.89–0.92 | Acceptable |
| Diffusion-native watermark (cross-model editing) | AUC 0.58–0.65 | Severe degradation |
| ECC decoding (strong editing) | <3% recovery rate | Errors are non-i.i.d.; ECC is ineffective |
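One way to see why error correction cannot rescue strong edits is a channel-capacity estimate. The sketch makes the optimistic assumption that bit errors are i.i.d., which turns the extracted bits into a binary symmetric channel with crossover \(1 - \text{BA}\) and capacity \(1 - H_2(1 - \text{BA})\). The BA values are taken from the tables above; the i.i.d. model is our simplification.

```python
from math import log2

def binary_entropy(p: float) -> float:
    """H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(bit_accuracy: float) -> float:
    """Capacity per raw bit of a binary symmetric channel with crossover 1 - bit_accuracy."""
    return 1.0 - binary_entropy(1.0 - bit_accuracy)

for ba in (0.99, 0.867, 0.60, 0.55):
    c = bsc_capacity(ba)
    print(f"BA={ba:.3f}  capacity={c:.4f} bits/raw bit  raw bits per payload bit >= {1 / c:.1f}")
```

At BA = 55%, the capacity is roughly 0.007 bits per raw bit, so even under the i.i.d. assumption well over a hundred raw bits would be needed per payload bit. This is consistent with the ablation's finding that ECC recovers almost nothing.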

Key Findings

Under strong diffusion editing, the bit accuracy of StegaStamp/TrustMark approaches random guessing (50%), indicating systematic watermark erasure
High-fidelity editing (high PSNR/SSIM, low LPIPS) does not imply watermark preservation; an edit can score well on all three metrics and still erase the watermark completely
High-frequency watermark energy is most aggressively suppressed by diffusion denoising, with \(\rho_{\text{high}}\) as low as 0.09
The paper states that its experimental data are illustrative/hypothetical; the magnitudes and trends are nevertheless consistent with existing literature
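The fidelity-versus-erasure finding can be reproduced in a toy setting. This sketch assumes a hypothetical "edit" that removes a ±γ Rademacher watermark and substitutes fresh fine texture of the same amplitude. PSNR stays high while the correlation with the original pattern drops to zero. Only PSNR is computed here, because LPIPS requires a pretrained network. The non-blind correlation detector is also a simplification.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return float(10.0 * np.log10(peak**2 / mse))

def residual_correlation(img: np.ndarray, host: np.ndarray, pattern: np.ndarray) -> float:
    """Normalized correlation between (img - host) and the watermark pattern (non-blind toy detector)."""
    r = (img - host).ravel()
    p = pattern.ravel()
    return float(r @ p / (np.linalg.norm(r) * np.linalg.norm(p)))

rng = np.random.default_rng(0)
shape, gamma = (256, 256), 0.01
host = rng.uniform(0.1, 0.9, size=shape)
w = rng.choice([-1.0, 1.0], size=shape)   # watermark pattern (assumed Rademacher)
x_w = host + gamma * w
# Hypothetical edit: the watermark is gone, replaced by independent texture of equal amplitude.
edited = host + gamma * rng.choice([-1.0, 1.0], size=shape)

print(f"PSNR(watermarked, edited) = {psnr(x_w, edited):.1f} dB")
print(f"watermark correlation before: {residual_correlation(x_w, host, w):.3f}")
print(f"watermark correlation after:  {residual_correlation(edited, host, w):.3f}")
```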

Highlights & Insights

The theoretical analysis forms a complete closed loop from SNR → mutual information → Fano's inequality → denoising contraction, formalizing the intuition that diffusion denoisers strip out off-manifold perturbations such as watermarks
This is the first systematic treatment of multiple diffusion editor paradigms as watermark stress tests, covering instruction-based, drag-based, and composition-based approaches
The counterintuitive finding that high-fidelity editing does not imply watermark safety is clearly demonstrated
The DEW-ST standardized evaluation protocol has potential to be adopted as a watermark security benchmark
Design guidelines are pragmatic: watermarks should pursue semantic invariance rather than pixel-level invariance

Limitations & Future Work

The paper explicitly states that its experimental data are hypothetical rather than the results of real experimental runs; this is its most significant limitation
The theoretical analysis relies on an additive watermark model and idealized manifold contraction assumptions, which may diverge from practical nonlinear encoders
No concrete, deployable watermark defense scheme is provided; the contribution remains at the level of a conceptual template
Both editors and watermarking systems are rapidly evolving, limiting the longevity of any fixed benchmark
Video watermarking and multi-frame consistency scenarios are not addressed

Related Work & Insights

vs. Zhao et al. (NeurIPS 2024): The latter shows that watermarks are provably removable under active regeneration attacks, while this paper addresses inadvertent destruction by non-adversarial editing; the two perspectives are complementary
vs. VINE (ICLR 2025): VINE proposes W-Bench and diffusion-aware watermarking; this paper provides a more systematic theoretical analysis framework building on that foundation
vs. Tree-Ring/Stable Signature: Diffusion-native methods are not immune either; under cross-model editing their detection AUC falls to 0.58–0.65
The duality between watermark signals and diffusion denoising resembles information bottleneck theory, suggesting that future watermarks should be embedded within the generation pipeline or aligned to semantic space
Metadata schemes such as C2PA can complement watermarking to form a hybrid provenance system

Rating

Novelty: ⭐⭐⭐⭐ First systematic theoretical and empirical analysis of diffusion editing's impact on watermarking, with a closed-loop theoretical derivation
Experimental Thoroughness: ⭐⭐⭐ Theoretical analysis is excellent, but experimental data are hypothetical rather than real runs
Writing Quality: ⭐⭐⭐⭐ Well-structured paper with clear theoretical derivations and comprehensive related work coverage
Value: ⭐⭐⭐⭐ Provides important warnings and guidance for the watermarking community and the content provenance ecosystem