StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models

Conference: NeurIPS 2025 arXiv: 2509.17993 Code: GitHub Area: Image Generation Keywords: Latent Diffusion Models, Watermark Embedding, Tamper Localization, Mixture of Experts, Self-Supervised Learning

TL;DR

StableGuard embeds global binary watermarks into the LDM generation pipeline (via MPW-VAE) and leverages changes in watermark perturbation patterns for tamper localization (via MoE-GFN), achieving the first end-to-end unified framework for copyright protection and tamper detection.

Background & Motivation

Images generated by latent diffusion models (LDMs) are increasingly photorealistic, giving rise to two major security requirements: (1) copyright protection—verifying whether an image was produced by a specific model; and (2) tamper localization—detecting and localizing regions that have been maliciously edited. Existing approaches suffer from the following limitations:

  • Traditional watermarking methods (HiDDeN, SepMark, etc.) operate in a post-hoc manner, embedding watermarks after image generation, which incurs additional computational overhead and degrades image quality.
  • Diffusion-native watermarking methods (Stable Signature, WOUAF, WaDiff) integrate watermarks into the generation process but do not support tamper localization.
  • Unified methods (EditGuard, OmniGuard, WAM) attempt to address both problems simultaneously but remain post-hoc, with generation and forensics optimized separately, precluding mutual enhancement.

The authors identify two core insights: (1) globally distributed watermarks are inherently robust to local tampering due to spatial redundancy; and (2) the subtle perturbation patterns introduced by watermarking are absent in tampered regions, and this absence serves as a reliable cue for tamper localization. These two complementary properties make global watermarks an ideal foundation for simultaneously addressing copyright verification and fine-grained tamper analysis.

Method

Overall Architecture

StableGuard consists of two core components: (1) MPW-VAE, which embeds watermarks within the VAE decoder of the LDM; and (2) MoE-GFN, which performs forensic analysis by exploiting watermark perturbation patterns. Both components are jointly trained end-to-end, enabling mutual enhancement between embedding quality and forensic accuracy.
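The end-to-end training flow can be sketched as follows. This is a minimal illustration, not the authors' code: `decode` and `forensic_net` are hypothetical stand-ins for MPW-VAE and MoE-GFN, and the signatures are assumptions.

```python
import torch

def training_step(decode, forensic_net, z, bits, mask):
    """One self-supervised step: the same latent z is decoded twice, with
    the watermark adapters on (Y) and off (X_hat); a binary mask splices
    non-watermarked content into Y to simulate tampering, and the mask
    itself serves as the ground-truth tamper map."""
    y = decode(z, bits=bits)        # watermarked image Y (adapters on)
    x_hat = decode(z, bits=None)    # non-watermarked reconstruction (adapters off)
    # Tampered regions lack the watermark perturbation pattern.
    tampered = mask * x_hat + (1 - mask) * y
    bits_pred, mask_pred = forensic_net(tampered)
    return bits_pred, mask_pred, mask  # compared against bits and mask in the loss
```

Because the mask used for splicing is also the localization target, no manual tamper annotations are ever needed.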

Key Designs

  1. Multi-Purpose Watermarking VAE (MPW-VAE): Lightweight watermark adapters are inserted after each block of the pre-trained VAE decoder. Each adapter encodes the watermark bits through two fully connected layers, reshapes the output, and concatenates it with the decoder features; two convolutional layers and a residual connection then fuse the result. The key design is a switchable mechanism: with the watermark adapters toggled on or off, the same latent code yields a visually indistinguishable pair, the watermarked image \(Y\) and the non-watermarked image \(\hat{X}\). These paired images are fused with a random binary mask \(M\): with 50% probability, either the real image \(X\) or the VAE reconstruction \(\hat{X}\) serves as the spliced region, simulating manual editing and AI-generated tampering respectively. The masking strategy mixes random binary masks with SAM-generated semantic masks to increase diversity.

  2. Mixture-of-Experts Guided Forensic Network (MoE-GFN): Built upon a UNet architecture, MoFE (Mixture of Forensic Experts) modules are inserted in the decoder, comprising three specialized expert branches:

    • Watermark Extraction Expert (WEE): Transformer-based, capturing global long-range dependencies to recover the global watermark pattern even when local regions are tampered.
    • Tamper Localization Expert (TLE): Employs a sub-patch Transformer that partitions feature maps into \(n \times n\) patches for local attention, detecting fine-grained manipulation traces.
    • Boundary Enhancement Expert (BEE): Performs Transformer operations in the Fourier domain, capturing high-frequency anomalies at boundary regions via FFT.

The outputs of the three experts are adaptively fused by a Dynamic Soft Router (DSR): \(x_{\text{unified}} = \sum_{n=1}^{3} R_n \odot \text{Expert}_n(x)\), where \(R = \text{Softmax}(f(x))\) adaptively adjusts the fusion weights based on the input.

  3. Self-Supervised End-to-End Training: No manual tamper annotations are required during training. The switchable mechanism of MPW-VAE automatically generates paired training data, and MoE-GFN learns to distinguish watermarked from non-watermarked regions. The pre-trained VAE decoder parameters are frozen; only the watermark adapters and MoE-GFN are trained, preserving image generation quality.
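The expert-fusion step above can be sketched in a few lines. This is an illustrative module, not the paper's implementation: the real experts are Transformer branches (global attention, sub-patch attention, FFT-domain), for which plain convolutions stand in here, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DynamicSoftRouter(nn.Module):
    """Sketch of the MoE-GFN fusion: expert outputs blended by
    input-dependent softmax weights,
    x_unified = sum_n R_n * Expert_n(x), with R = Softmax(f(x))."""
    def __init__(self, channels, n_experts=3):
        super().__init__()
        # Stand-ins for the WEE / TLE / BEE branches.
        self.experts = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_experts)
        )
        # f(x): per-pixel routing logits, one channel per expert.
        self.router = nn.Conv2d(channels, n_experts, 1)

    def forward(self, x):
        weights = torch.softmax(self.router(x), dim=1)       # (B, 3, H, W)
        outs = torch.stack([e(x) for e in self.experts], 1)  # (B, 3, C, H, W)
        return (weights.unsqueeze(2) * outs).sum(dim=1)      # (B, C, H, W)
```

The softmax over the expert dimension makes the fusion weights compete per location, so the router can lean on the boundary expert near edges and on the global expert elsewhere.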

Loss & Training

The total loss comprises three components:

\[\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{sim}} + \mathcal{L}_{\text{wm}} + \mathcal{L}_{\text{tamper}}\]
  • Similarity loss \(\mathcal{L}_{\text{sim}} = \|\hat{X} - Y\|_1 + \text{PS}(\hat{X}, Y)\): L1 distance plus perceptual similarity, maintaining visual consistency between the watermarked image \(Y\) and the non-watermarked reconstruction \(\hat{X}\).
  • Watermark loss \(\mathcal{L}_{\text{wm}}\): Binary cross-entropy supervising accurate extraction of watermark bits.
  • Tamper loss \(\mathcal{L}_{\text{tamper}} = \lambda_0 \mathcal{L}_{\text{wbce}} + (1-\lambda_0) \mathcal{L}_{\text{dice}}\): Weighted binary cross-entropy plus Dice loss; the former addresses foreground/background imbalance with class weights \(\lambda_1=2\) and \(\lambda_2=0.5\), and the latter optimizes region overlap.
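The three loss terms can be sketched as below. The perceptual term PS is omitted (LPIPS would be a typical choice), and \(\lambda_0\) plus the assignment of the class weights 2 / 0.5 to tampered / untampered pixels are assumptions; only the weight values come from the paper.

```python
import torch
import torch.nn.functional as F

def stableguard_losses(x_hat, y, bits_pred, bits_gt, mask_pred, mask_gt,
                       lam0=0.5, w_fg=2.0, w_bg=0.5):
    """Sketch of L_total = L_sim + L_wm + L_tamper."""
    # Similarity: L1 between non-watermarked and watermarked images
    # (the paper adds a perceptual term PS(X_hat, Y), omitted here).
    l_sim = F.l1_loss(x_hat, y)
    # Watermark bits: binary cross-entropy on raw logits.
    l_wm = F.binary_cross_entropy_with_logits(bits_pred, bits_gt)
    # Tamper map: weighted BCE + Dice. Per-pixel weights handle
    # foreground/background imbalance.
    pix_w = mask_gt * w_fg + (1 - mask_gt) * w_bg
    l_wbce = F.binary_cross_entropy(mask_pred, mask_gt, weight=pix_w)
    inter = (mask_pred * mask_gt).sum()
    l_dice = 1 - 2 * inter / (mask_pred.sum() + mask_gt.sum() + 1e-6)
    l_tamper = lam0 * l_wbce + (1 - lam0) * l_dice
    return l_sim + l_wm + l_tamper
```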

The framework is built on SD 2.1, trained with the Adam optimizer at a learning rate of \(1 \times 10^{-4}\), on 2× RTX 4090D GPUs, with batch size 16 for 10 epochs.

Key Experimental Results

Main Results

Watermark Extraction Performance (COCO + T2I datasets)

| Method | Bit Length | PSNR↑ | SSIM↑ | FID↓ | Bit Acc↑ |
|---|---|---|---|---|---|
| HiDDeN | 32 | 31.95 | 0.879 | 20.0 | 98.80 |
| EditGuard | 64 | 32.75 | 0.937 | 20.0 | 99.77 |
| WAM | 32 | 38.20 | 0.951 | 19.9 | 98.17 |
| OmniGuard | 100 | 37.54 | 0.950 | 20.1 | 98.11 |
| StableGuard | 32 | 40.50 | 0.970 | 19.5 | 99.97 |

PSNR exceeds the best baseline (WAM) by 2.3 dB, and Bit Acc reaches 99.97%, surpassing all baselines across the board.

AIGC Tamper Localization (SD Inpainting / SDXL / Kandinsky / ControlNet / LAMA)

| Method | F1↑ | AUC↑ | IoU↑ |
|---|---|---|---|
| MVSS-Net | 0.862 | 0.934 | 0.791 |
| EditGuard | 0.937 | 0.977 | 0.911 |
| WAM | 0.924 | 0.977 | 0.868 |
| StableGuard | 0.980 | 0.993 | 0.962 |

Ablation Study

| Configuration | F1↑ | AUC↑ | Bit Acc↑ | Note |
|---|---|---|---|---|
| Full model (Dec position) | 0.980 | 0.992 | 99.98 | Default |
| w/o MPW-VAE (replaced by WOUAF) | 0.811 | 0.796 | 99.13 | Diffusion-native watermarking is critical |
| w/o entire MoFE module | 0.931 | 0.920 | 95.12 | Substantial contribution of expert modules |
| w/o Watermark Extraction Expert | 0.969 | 0.958 | 98.69 | Global perception degraded |
| w/o Tamper Localization Expert | 0.952 | 0.940 | 98.90 | Local detection degraded |
| w/o Boundary Enhancement Expert | 0.962 | 0.950 | 99.11 | Boundary precision reduced |
| w/o Dynamic Soft Router | 0.966 | 0.955 | 98.97 | Simple summation inferior to adaptive fusion |
| w/o Joint Optimization | 0.921 | 0.919 | 99.14 | Decoupled training causes large performance drop |

Robustness (Gaussian noise σ=5 / JPEG Q=70)

| Method | Noise σ=5 (Bit Acc / F1) | JPEG Q=70 (Bit Acc / F1) |
|---|---|---|
| EditGuard | 98.11 / 0.866 | 96.77 / 0.577 |
| StableGuard | 99.69 / 0.928 | 99.73 / 0.908 |

Key Findings

  • The diffusion-native design of MPW-VAE is the largest single contributor to performance gains—it not only generates higher-fidelity paired images, but joint optimization also yields synergistic benefits.
  • Removing the three experts individually results in limited degradation (as the model approaches its performance ceiling), but removing the entire MoFE module causes F1 to drop sharply from 0.980 to 0.931, demonstrating that the experts are complementary rather than redundant.
  • Under JPEG compression at Q=70, StableGuard maintains Bit Acc at 99.73%, whereas EditGuard drops to 96.77%.
  • Performance degrades gracefully as the watermark bit length scales from 32 to 256 (Bit Acc: 99.97→99.83).

Highlights & Insights

  • Self-supervised closed loop: The switchable watermarking mechanism eliminates the need for annotated tamper data while simultaneously simulating both manual editing and AI-generated tampering scenarios.
  • Elegant frequency-domain expert design: A Transformer operating in the FFT domain captures high-frequency anomalies at tamper boundaries, complementing the spatial-domain experts.
  • High practical value: StableGuard integrates directly into the SD generation pipeline with no additional processing steps and minimal impact on generation quality (FID of only 19.5).

Limitations & Future Work

  • Validation is currently limited to SD 2.1; adaptability to newer architectures such as SDXL and SD3 has not been tested.
  • The additional parameters and inference overhead introduced by the watermark adapters are not discussed in detail.
  • The framework is not applicable to images generated by non-LDM models (e.g., GANs).
  • Watermark security (e.g., attacks specifically targeting the watermark) has not been explored.
  • The paradigm of global watermarking combined with absence-based detection could be extended to frame-level tamper detection in video diffusion models.
  • The application of MoE in forensic tasks warrants further investigation, as different experts may correspond to different types of tampering.
  • The self-supervised paired data generation strategy may inspire other visual forensics tasks that require paired supervision.

Rating

  • Novelty: ⭐⭐⭐⭐ The end-to-end unified framework is a novel contribution; the switchable watermarking mechanism and MoFE module are elegantly designed.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers 5 AIGC tampering methods, multiple image degradations, and comprehensive ablation studies.
  • Writing Quality: ⭐⭐⭐⭐ Well-structured, though certain sections are notation-heavy.
  • Value: ⭐⭐⭐⭐⭐ Addresses practical pain points in diffusion model content security with substantial performance gains and strong practical utility.