
T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models

Conference: NeurIPS 2025 arXiv: 2510.22366 Code: GitHub Area: Diffusion Models / Image Watermarking Keywords: Image Watermarking, Diffusion Models, Noise-as-Watermark, Tail-Truncated Sampling, Robustness-Diversity Trade-off

TL;DR

This paper proposes T2SMark, a two-stage watermarking scheme for diffusion models based on Tail-Truncated Sampling (TTS). By embedding watermark bits in the tail regions of the Gaussian distribution while sampling the central region randomly, T2SMark achieves high robustness while preserving generation diversity, resolving the trade-off that limits prior Noise-as-Watermark methods.

Background & Motivation

  • Background: High-fidelity images generated by diffusion models have created an urgent need for copyright protection and provenance tracing of AI-generated content. Noise-as-Watermark (NaW) is a promising family of watermarking techniques that encodes watermark information into a specific standard Gaussian noise vector used as the initial generation noise; extraction recovers the initial noise via diffusion inversion to decode the watermark.

  • Limitations of Prior Work: Existing NaW methods face a fundamental robustness–diversity trade-off:

  • Gaussian Shading (GS): Achieves high robustness via simple repetition codes, but each user employs a fixed codeword, severely limiting generation diversity (LPIPS only 0.6446).
  • PRC-Watermark (PRCW): Maintains diversity (LPIPS 0.7074) using pseudorandom error-correcting codes, but exhibits extremely weak robustness, with TPR dropping to only 29.4% under adversarial conditions.

  • Key Challenge: Gaussian samples near the origin are highly susceptible to sign flips under noise perturbation, leading to bit errors. Rather than naively mapping bits to positive/negative signs, embedding information exclusively in the more reliable tail regions is preferable.

  • Goal: To simultaneously achieve high watermark robustness and high generation diversity within the NaW paradigm, without requiring any additional training.

Method

Overall Architecture

T2SMark introduces two core innovations:

  1. Tail-Truncated Sampling (TTS): Partitions the Gaussian distribution into three regions (a bit-0 zone, a bit-1 zone, and an uncertain central zone) and embeds watermark bits only in the tails.
  2. Two-Stage Architecture: The first stage encodes a random session key with a fixed master key; the second stage encodes the actual watermark bits with the session key, introducing controllable randomness.

Key Designs

  1. Tail-Truncated Sampling (TTS): Given noise dimension \(n\), truncation threshold \(\tau\), and watermark length \(m\), the core operations of TTS are as follows (a minimal code sketch appears after the design-motivation note below):

    • Compute the number of tail-sampled dimensions \(k = 2\Phi(-\tau)n\) and per-bit subspace dimension \(r = \lfloor k/m \rfloor\).
    • Use key \(K\) to generate \(m\) pseudorandom orthogonal support vectors \(\{v_j\}_{j=1}^m\) as normal vectors for bit-encoding hyperplanes.
    • Watermark subspace (\(w_i=1\)): sampled from the tail \(\mathcal{TN}(0,1; (-\infty,-\tau]\cup[\tau,\infty))\).
    • Random subspace (\(w_i=0\)): sampled from the center \(\mathcal{TN}(0,1; [-\tau,\tau])\).
    • Final watermarked noise: \(z^w = w \odot |z| \odot \sum_j b_j v_j + (1-w) \odot z\).

Design Motivation: Tail samples are farther from the decision boundary and have a lower sign-flip probability under AWGN perturbation — effectively trading fewer encoding dimensions for higher signal-to-noise ratio.
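
To make the steps above concrete, here is a minimal NumPy sketch of TTS embedding under stated assumptions: the function name, the key-seeded permutation layout, the inverse-CDF truncated sampling, and the default latent size (4×64×64) are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def tts_embed(bits, key, n=4 * 64 * 64, tau=0.674):
    """Embed m = len(bits) watermark bits into an n-dim Gaussian noise vector."""
    m = len(bits)
    k = int(round(2 * norm.cdf(-tau) * n))      # tail-sampled dims, k = 2*Phi(-tau)*n
    r = k // m                                  # dims per bit, r = floor(k / m)
    rng = np.random.default_rng(key)            # key-seeded pseudorandom layout

    # Key-dependent layout: which dims carry which bit, plus each bit's +/-1
    # support vector v_j (disjoint supports make the v_j orthogonal).
    perm = rng.permutation(n)
    wm_idx = perm[:m * r].reshape(m, r)         # r dims for each of the m bits
    rest_idx = perm[m * r:]
    v = rng.choice([-1.0, 1.0], size=(m, r))    # support vectors v_j

    z = np.empty(n)
    # Watermark dims: magnitudes drawn from the two-sided tail |z| >= tau via
    # inverse-CDF sampling; signs set by b_j * v_j with b_j in {-1, +1}.
    mags = norm.ppf(1.0 - rng.uniform(size=(m, r)) * norm.sf(tau))
    b = np.where(np.asarray(bits) == 1, 1.0, -1.0)
    z[wm_idx] = b[:, None] * v * mags
    # Remaining dims: drawn from the central region (-tau, tau), so the
    # overall marginal stays approximately standard normal.
    u = rng.uniform(size=rest_idx.shape)
    z[rest_idx] = norm.ppf(norm.cdf(-tau) + u * (1.0 - 2.0 * norm.cdf(-tau)))
    return z
```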

  2. Projection-Based Decoding: During extraction, the same key reconstructs the set of normal vectors; bits are recovered via dot-product projection: \(p_j = \langle \hat{z}^w, v_j \rangle\), \(\hat{b}_j = \text{sign}(p_j)\). The detection statistic is the \(L_1\) norm of the projections, \(l = \|p\|_1\). TTS yields larger projection magnitudes, improving detection reliability. (A minimal decoding sketch appears after the next item.)

  3. Two-Stage Watermark Structure: The \(n\)-dimensional noise is split into two segments of lengths \(n_k\) and \(n_b\). The first segment encodes a random session key \(K_r\) using the master key \(K\); the second segment encodes the actual watermark bits \(b\) using \(K_r\). Since a different random \(K_r\) is used for each generation, the entire noise vector remains stochastic, ensuring generation diversity. Detection relies on the first segment (avoiding error propagation from the second stage), while attribution requires full two-stage decoding.
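
A companion sketch of the projection-based decoder and the two-stage layout, under the same assumptions as the encoder above. The decoder regenerates the key-derived layout, so it matches the hypothetical `tts_embed`; `two_stage_decode`, the segment sizes, and the bits-to-seed conversion are illustrative simplifications rather than the paper's exact construction.

```python
import numpy as np
from scipy.stats import norm

def tts_decode(z_hat, key, m, n=4 * 64 * 64, tau=0.674):
    """Recover m bits and an L1 detection score from an inverted latent z_hat."""
    k = int(round(2 * norm.cdf(-tau) * n))
    r = k // m
    rng = np.random.default_rng(key)            # same key => same layout as embedding
    perm = rng.permutation(n)
    wm_idx = perm[:m * r].reshape(m, r)
    v = rng.choice([-1.0, 1.0], size=(m, r))    # support vectors v_j

    p = np.einsum('ij,ij->i', z_hat[wm_idx], v) # projections p_j = <z_hat, v_j>
    bits_hat = (p > 0).astype(int)              # bit decision by sign
    score = np.abs(p).sum()                     # L1 detection statistic
    return bits_hat, score

def two_stage_decode(z_hat, master_key, n_k, n_b):
    """Illustrative two-stage decoding: session key first, then the payload."""
    z_key, z_payload = z_hat[:n_k], z_hat[n_k:n_k + n_b]
    # Stage 1: the fixed master key decodes the 16-bit session key; its L1
    # score alone is used for detection (no error propagation from stage 2).
    session_bits, det_score = tts_decode(z_key, master_key, m=16, n=n_k)
    # Stage 2: the recovered session key seeds the payload decoder.
    session_key = int("".join(map(str, session_bits)), 2)
    payload_bits, _ = tts_decode(z_payload, session_key, m=256, n=n_b)
    return payload_bits, det_score
```

Embedding mirrors this flow: the first segment would be produced by `tts_embed(session_key_bits, master_key, n=n_k)` and the second by `tts_embed(payload_bits, session_key, n=n_b)`, so every generation draws a fresh session key and therefore fresh noise.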

Loss & Training

T2SMark is training-free: no loss is optimized and no fine-tuning of the diffusion model is required. Key hyperparameters:

  • Truncation threshold \(\tau = 0.674\) (the 75th percentile of the standard Gaussian, so each tail carries 25% of the probability mass and roughly half of all dimensions are tail-sampled)
  • Session key: 16 bits
  • Watermark capacity: 256 bits
  • DDIM inversion steps: 10
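
A quick numerical check of these settings; the 4×64×64 latent shape and the even split between the session-key and payload segments are assumptions made only for this illustration.

```python
import math

tau = 0.674
tail_fraction = math.erfc(tau / math.sqrt(2.0))  # = 2*Phi(-tau), about 0.50
n = 4 * 64 * 64                                  # assumed SD v2.1 latent dims (16384)
n_k, n_b = n // 2, n // 2                        # assumed even key/payload split

dims_per_key_bit = int(tail_fraction * n_k) // 16       # ~256 under these assumptions
dims_per_payload_bit = int(tail_fraction * n_b) // 256  # ~16 under these assumptions
print(tail_fraction, dims_per_key_bit, dims_per_payload_bit)
```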

Key Experimental Results

Main Results

Comprehensive Evaluation on SD v2.1

| Method | TPR (Clean/Adv) | Bit Acc (Clean/Adv) | Diversity (LPIPS)↑ | CLIP Score↑ | FID↓ |
|---|---|---|---|---|---|
| GS | 1.000 / 0.998 | 1.000 / 0.9548 | 0.6446 | 0.3242 | 58.14 |
| PRCW | 1.000 / 0.294 | 0.6494 / 0.5024 | 0.7074 | 0.3218 | 56.90 |
| TRW | 1.000 / 0.907 | – | 0.6943 | 0.3210 | 58.27 |
| T2SMark | 1.000 / 0.998 | 1.000 / 0.9754 | 0.7069 | 0.3227 | 56.93 |

Generalization on SD v3.5M (DiT backbone)

| Method | TPR (Clean/Adv) | Bit Acc (Clean/Adv) | Det. Acc (ideal ≈ 0.5) | Diversity↑ |
|---|---|---|---|---|
| GS | 1.000 / 0.990 | 0.9994 / 0.9663 | 0.991 | 0.5176 |
| PRCW | 0.998 / 0.279 | 0.9920 / 0.6067 | 0.516 | 0.6096 |
| T2SMark | 1.000 / 0.985 | 1.000 / 0.9768 | 0.518 | 0.6102 |

Ablation Study

| Configuration | TPR (Clean/Adv) | Bit Acc (Clean/Adv) | Diversity↑ | Note |
|---|---|---|---|---|
| w/o TTS | 1.000 / 0.996 | 0.9988 / 0.9307 | 0.6743 | |
| w/ TTS | 1.000 / 0.998 | 1.000 / 0.9754 | 0.6746 | |
| Capacity = 256 bit | – | 1.000 / 0.9754 | – | |
| Capacity = 512 bit | – | 1.000 / 0.9437 | – | |
| Capacity = 1024 bit | – | 0.9968 / 0.8789 | – | |
| Session key = 16 bit | 1.000 / 0.998 | 1.000 / 0.9754 | – | Best balance |
| Session key = 32 bit | 1.000 / 0.991 | 1.000 / 0.9481 | – | Still acceptable |

Key Findings

  • TTS yields a substantial robustness improvement (adversarial Bit Acc from 93.07% to 97.54%) with negligible impact on diversity (difference < 0.001).
  • T2SMark and PRCW are the only methods that satisfy the "no-degradation criterion" on image quality (t-tests on CLIP score and FID are both non-significant).
  • In terms of undetectability, T2SMark's Det. Acc (0.578) is close to chance (0.5), far superior to GS (0.994).
  • Varying inversion steps from 5 to 100 has minimal impact on performance, enabling faster extraction via fewer steps.

Highlights & Insights

  • The three-region partitioning design is intuitively compelling: the tails of the Gaussian distribution naturally offer higher SNR and are ideal locations for information embedding.
  • The two-stage key hierarchy is elegantly constructed: the session key simultaneously serves as the payload of the first stage and the encryption key of the second stage.
  • Multi-dimensional projection rather than element-wise sign decisions fully exploits the continuous structural information of Gaussian vectors.
  • The evaluation methodology is rigorous: t-tests rather than simple comparisons are used to assess visual quality degradation — a statistical practice rarely seen in comparable work.

Limitations & Future Work

  • The method is vulnerable to Gaussian noise attacks (\(\sigma=0.1\) causes severe degradation), a common weakness shared by NaW-class methods.
  • It relies on ODE-invertible samplers (DDIM) and is inapplicable to diffusion models that do not support inversion.
  • No mechanism is provided to resist geometric transformations (rotation, perspective distortion).
  • The distributional anomaly introduced by tail sampling may be detectable by a purpose-trained classifier.
  • Potential conflicts with conditioned generation methods (e.g., ControlNet) are not addressed.
Comparison & Outlook

  • T2SMark positions itself directly against Gaussian Shading and PRC-Watermark, sitting on a better robustness–diversity Pareto frontier than either.
  • Tree-Ring embeds robust patterns in the frequency domain but introduces distributional bias; T2SMark preserves distributional consistency.
  • The TTS idea may inspire generalizations to other information-hiding scenarios based on Gaussian sampling.

Rating

  • Novelty: ⭐⭐⭐⭐ — Tail-Truncated Sampling is a novel and theoretically motivated idea; the two-stage key architecture is elegantly designed.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Covers two backbone architectures (UNet and DiT), 9 attack types, and multiple hyperparameter ablations; evaluation is comprehensive and systematic.
  • Writing Quality: ⭐⭐⭐⭐ — Method presentation is clear, mathematical derivations are rigorous, and figures are intuitive.
  • Value: ⭐⭐⭐⭐ — Provides a practical and theoretically sound solution for diffusion model watermarking, though applicability is confined to the NaW paradigm.