RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification¶

Conference: ECCV 2024
arXiv: 2404.14055
Code: https://github.com/showlab/RingID
Area: AI Security
Keywords: Diffusion Model Watermarking, Tree-Ring Watermarking, Multi-Key Identification, Frequency Domain Watermarking, Distribution Shift

TL;DR¶

This paper analyzes where the robustness of Tree-Ring watermarking actually comes from, finding that a distribution shift acts as an unintended helper in its verification task. It then shows that Tree-Ring is severely limited in multi-key identification and proposes RingID, a multi-channel heterogeneous watermarking framework. Through discretization, lossless embedding, and a more circular ring design, RingID improves identification accuracy for 2048 keys, averaged over the seven test conditions, from 0.07 to 0.82.

Background & Motivation¶

Background: With the widespread dissemination of high-quality images generated by diffusion models, watermarking technology has become a key means for copyright protection and source tracing. Tree-Ring watermarking is a method that embeds circular patterns into the frequency domain center of the initial noise in diffusion models, showing strong robustness against attacks like rotation and JPEG compression in the watermarking verification task (distinguishing between watermarked and unwatermarked images).

Limitations of Prior Work: Previous studies only evaluated Tree-Ring in verification scenarios, never examining its performance in multi-key identification (matching the correct key from multiple candidate keys) scenarios. However, multi-key identification is crucial for user tracking and copyright attribution.

Key Challenge: The authors discover for the first time that the robustness of Tree-Ring does not entirely stem from the ring pattern design, but rather a significant portion comes from the distribution shift caused by discarding the imaginary part during the watermark embedding process. This distribution shift helps distinguish watermarked and unwatermarked images in verification (separating the two distributions), but is entirely useless in identification (different keys undergo the same shift).

Goal: (a) Reveal the true source of Tree-Ring's robustness; (b) expose its severe defects in multi-key identification; (c) design a systematic approach to enhance multi-key identification capability.

Key Insight: Mathematically prove that the distribution shift factor is \(\frac{\sqrt{3}}{2}\), and experimentally verify its varying contributions under different attacks.
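
One possible reading of where the \(\frac{\sqrt{3}}{2}\) factor comes from can be checked numerically. The sketch below is not the paper's proof. It assumes (our assumptions, not stated in this summary) that frequency bins of the noise and of each key are circular complex Gaussians with the same variance, that only the real part of an embedded key survives embedding, and that distance is measured in the RMS sense against an unrelated reference key. Under these assumptions the squared distance drops from \(2\sigma^2\) to \(\tfrac{3}{2}\sigma^2\), a ratio of \(\frac{\sqrt{3}}{2}\). The function name `rms_shift_ratio` is hypothetical.

```python
import numpy as np

def rms_shift_ratio(num_bins=200_000, sigma=1.0, seed=0):
    """Estimate RMS distance ratio (watermarked / unwatermarked) to an unrelated key."""
    rng = np.random.default_rng(seed)

    def complex_gaussian():
        # Circular complex Gaussian: real and imaginary parts each have variance sigma^2 / 2.
        scale = sigma / np.sqrt(2.0)
        return rng.normal(0.0, scale, num_bins) + 1j * rng.normal(0.0, scale, num_bins)

    noise = complex_gaussian()          # frequency bins of an unwatermarked latent
    embedded_key = complex_gaussian()   # key written into the watermarked latent
    reference_key = complex_gaussian()  # unrelated key used for the comparison

    # Assumption: discarding the imaginary part in the spatial domain leaves
    # only the real part of the embedded key in the frequency domain.
    surviving = embedded_key.real

    rms_unmarked = np.sqrt(np.mean(np.abs(noise - reference_key) ** 2))
    rms_marked = np.sqrt(np.mean(np.abs(surviving - reference_key) ** 2))
    return rms_marked / rms_unmarked

ratio = rms_shift_ratio()
print(ratio, np.sqrt(3) / 2)
```

Because the shift does not depend on which key was embedded, it separates watermarked from unwatermarked images but gives no information about key identity.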

Core Idea: Integrate the complementary advantages of different types of watermarks through a multi-channel heterogeneous watermarking framework, combined with improvements such as discretization, lossless embedding, and a more circular ring design, to systematically solve the failure of Tree-Ring in multi-key identification.

Method¶

Overall Architecture¶

RingID is based on Tree-Ring but with systematic enhancements. Overall process: within the 4-channel initial noise of StableDiffusion, the improved ring watermark is embedded in channel 3, while a Gaussian noise watermark is embedded in channel 0. During detection, the initial noise is recovered through DDIM inversion, the watermark is extracted in the frequency domain, and minimum distance matching is performed against all candidate keys.

Key Designs¶

Multi-Channel Heterogeneous (MCH) Watermarking Framework:
- Function: Embed different types of watermarks in different channels to leverage complementary advantages.
- Mechanism: During matching, calculate the minimum weighted \(\ell_1\) distance to the reference keys across all watermarking channels: \(\text{ID}(\hat{w}) = \arg\min_i \{ \min_{c \in C_w} [\lambda_c \|\hat{w}^c - w_i^c\|_1] \}\).
- Design Motivation: Gaussian noise watermarks are robust against non-geometric attacks, while ring watermarks are robust against certain geometric attacks; they complement each other. Experiments demonstrate that the combination can adaptively select the most robust watermarking channel under the current attack.
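
The matching rule above can be sketched as follows. This is a minimal illustration, not the released implementation: the dictionary layout (channel index mapped to an array), the function name `identify_key`, and the toy key shapes are all assumptions made for the example.

```python
import numpy as np

def identify_key(extracted, keys, weights):
    """Return argmin_i of min_c lambda_c * ||w_hat^c - w_i^c||_1.

    extracted: dict mapping channel -> extracted watermark array
    keys:      list of dicts with the same layout, one per candidate key
    weights:   dict mapping channel -> lambda_c
    """
    best_index, best_distance = -1, np.inf
    for index, key in enumerate(keys):
        # Per key, keep only the channel that matches best.
        distance = min(
            weights[c] * np.abs(extracted[c] - key[c]).sum()
            for c in weights
        )
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index

rng = np.random.default_rng(0)
# Toy keys: Gaussian noise in channel 0, binary +/-64 pattern in channel 3.
keys = [
    {0: rng.normal(size=16),
     3: np.where((i >> np.arange(4)) & 1 == 1, 64.0, -64.0)}
    for i in range(8)
]
# Simulate an attack that destroys channel 0 but leaves channel 3 intact.
extracted = {0: 10.0 * rng.normal(size=16), 3: keys[5][3].copy()}
identified = identify_key(extracted, keys, {0: 1.0, 3: 1.0})
print(identified)
```

Taking the minimum over channels, rather than a sum, is what lets the surviving channel decide the match when the other one has been destroyed.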
Discretization to Enhance Distinctiveness:
- Function: Change the values of the ring pattern from continuous Gaussian sampling to discrete \(\pm\alpha\).
- Mechanism: With \(n\) rings the key capacity is \(2^n\). \(\alpha\) is set to 64, the standard deviation of the initial noise in the frequency domain.
- Design Motivation: Continuous Gaussian sampling makes the differences between different keys extremely small and hard to distinguish; discretization greatly widens the distance between keys, significantly improving the effective capacity.
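
A small sketch of binary ring keys, assuming one value per ring and a key index whose bits select the sign of each ring. The function name `ring_values` and the bit-to-sign mapping are illustrative choices, not taken from the paper.

```python
import numpy as np

def ring_values(key_index, n_rings, alpha=64.0):
    """Map the bits of key_index to one value of +alpha or -alpha per ring."""
    bits = np.right_shift(key_index, np.arange(n_rings)) & 1
    return np.where(bits == 1, alpha, -alpha)

n_rings = 4
all_keys = np.stack([ring_values(i, n_rings) for i in range(2 ** n_rings)])

# L1 distance between every pair of keys (counting one value per ring).
pairwise = np.abs(all_keys[:, None, :] - all_keys[None, :, :]).sum(axis=-1)
min_gap = pairwise[~np.eye(len(all_keys), dtype=bool)].min()
print(len(all_keys), min_gap)
```

Any two distinct keys differ in at least one ring, so their per-ring distance is at least \(2\alpha\), whereas two continuous Gaussian samples can be arbitrarily close. With 11 rings this scheme yields \(2^{11} = 2048\) keys, which matches the largest setting in the results table; mapping the default radius range to exactly 11 rings is an inference here.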
Lossless Embedding:
- Function: Eliminate the pattern loss caused by discarding the imaginary part during the embedding process.
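- Mechanism: Embed the ring pattern only in the real part of the frequency domain and leave the imaginary part at zero. Because the pattern \(X\) is then real and symmetric about the origin, it equals its conjugate-symmetric part, \(X[u,v] = X_{cs}[u,v]\) with \(X_{cs}[u,v] = \tfrac{1}{2}\left(X[u,v] + \overline{X[-u,-v]}\right)\), which is the component that survives when the imaginary part is discarded in the spatial domain. The pattern therefore remains unchanged after IFFT and a subsequent FFT.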
- Design Motivation: The actual pattern carried by the original Tree-Ring after discarding the imaginary part is inconsistent with the design, which destroys rotational symmetry.
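
The round-trip property can be checked with a short NumPy sketch. It assumes an unshifted FFT layout (DC at index `[0, 0]`), rings defined by rounding the frequency radius, and hypothetical helper names `ring_pattern` and `embed_and_reextract`; it leaves out the surrounding noise and the diffusion model entirely.

```python
import numpy as np

def ring_pattern(size, values_by_radius):
    """Real-valued ring pattern in an unshifted FFT layout."""
    freq = np.fft.fftfreq(size) * size  # integer frequencies, DC first
    radius = np.rint(np.hypot(freq[:, None], freq[None, :]))
    pattern = np.zeros((size, size))
    for r, value in values_by_radius.items():
        pattern[radius == r] = value
    return pattern

def embed_and_reextract(freq_pattern):
    """IFFT, discard the imaginary part in the spatial domain, then FFT again."""
    spatial = np.fft.ifft2(freq_pattern)
    return np.fft.fft2(spatial.real)

real_key = ring_pattern(64, {3: 64.0, 4: -64.0, 5: 64.0})
complex_key = real_key * (1.0 + 1.0j)  # same rings, but with an imaginary part

lossless = np.allclose(embed_and_reextract(real_key), real_key)
lossy = not np.allclose(embed_and_reextract(complex_key), complex_key)
print(lossless, lossy)
```

In this toy case the complex key comes back as its real part only, which illustrates why a pattern designed with an imaginary component is not the pattern that actually ends up in the image.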
Spatial Domain Shift to Prevent Rotational Cropping:
- Function: Move watermark energy from the four corners to the center to prevent it from being cropped during rotation.
- Mechanism: Perform a cyclic shift of N/2 pixels in the spatial domain, which is equivalent to multiplying by a checkerboard \((-1)^{u+v}\) in the frequency domain, and then multiplying by an attenuation factor \(\eta \in [0.8, 0.9]\) to suppress center artifacts.
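
The stated equivalence between the spatial shift and the checkerboard multiplication follows from the DFT shift theorem and can be verified numerically for even \(N\). The sketch below checks only that identity on a random array; the attenuation factor \(\eta\) is not modeled, and the variable names are illustrative.

```python
import numpy as np

size = 8  # any even N works
rng = np.random.default_rng(0)
latent = rng.normal(size=(size, size))

# Cyclic shift by N/2 along both spatial axes.
shifted = np.roll(latent, shift=(size // 2, size // 2), axis=(0, 1))

# Checkerboard (-1)^(u+v) over frequency indices u, v.
u = np.arange(size)
checkerboard = (-1.0) ** (u[:, None] + u[None, :])

lhs = np.fft.fft2(shifted)
rhs = np.fft.fft2(latent) * checkerboard
shift_matches = np.allclose(lhs, rhs)
print(shift_matches)
```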
Circular Ring Design:
- Function: Improve the circularity of the ring pattern at low resolutions.
- Mechanism: Draw more circular rings by rotating a single pixel 360 degrees on a low-resolution canvas to record the trajectory, eliminating aliasing and asymmetry.
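
One way to read "rotating a single pixel and recording its trajectory" is to sweep an angle over a full turn and mark every grid cell the rotated point lands in. The sketch below follows that reading; the function name `trace_ring`, the number of angular steps, and the rounding rule are assumptions, and the paper's actual procedure may differ.

```python
import numpy as np

def trace_ring(size, radius, steps=3600):
    """Mark the grid cells visited by a point rotated 360 degrees around the center."""
    center = size // 2
    angles = np.linspace(0.0, 2.0 * np.pi, steps, endpoint=False)
    rows = np.rint(center + radius * np.sin(angles)).astype(int)
    cols = np.rint(center + radius * np.cos(angles)).astype(int)
    mask = np.zeros((size, size), dtype=bool)
    mask[rows, cols] = True
    return mask

ring = trace_ring(size=32, radius=5)
marked_rows, marked_cols = np.nonzero(ring)
distances = np.hypot(marked_rows - 16, marked_cols - 16)
print(ring.sum(), distances.min(), distances.max())
```

Every marked cell lies within about 0.71 pixels (half a cell diagonal) of the ideal circle, since each cell is obtained by rounding a point that lies exactly on it.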

Loss & Training¶

Watermark embedding and detection involve no training and no loss function; the core of the method is signal processing: FFT/IFFT transforms plus \(\ell_1\) distance matching.

Key Experimental Results¶

Main Results¶

Using StableDiffusion-V2, default ring radius 3-14, 1000 watermarked/unwatermarked images.

Method	#Keys	Clean	Rotate	JPEG	C&S	Blur	Noise	Bright	Avg(w/o C&S)
Tree-Ring	32	0.790	0.020	0.420	0.040	0.610	0.530	0.420	0.465
Tree-Ring	2048	0.200	0.000	0.040	0.000	0.090	0.070	0.060	0.077
RingID	32	1.000	1.000	1.000	0.530	0.990	1.000	0.960	0.992
RingID	2048	1.000	0.860	1.000	0.080	0.970	0.950	0.870	0.942
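
The two averages quoted for 2048 keys (0.82 in the TL;DR and 0.942 in the last column) differ only in whether Crop & Scale is included. The short check below recomputes both from the per-condition values in the table; the variable names are illustrative.

```python
import numpy as np

conditions = ["Clean", "Rotate", "JPEG", "C&S", "Blur", "Noise", "Bright"]
rows = {
    "Tree-Ring (2048 keys)": [0.200, 0.000, 0.040, 0.000, 0.090, 0.070, 0.060],
    "RingID (2048 keys)": [1.000, 0.860, 1.000, 0.080, 0.970, 0.950, 0.870],
}

cs_index = conditions.index("C&S")
summary = {}
for name, values in rows.items():
    values = np.array(values)
    avg_all = values.mean()                           # all seven conditions
    avg_without_cs = np.delete(values, cs_index).mean()  # C&S excluded
    summary[name] = (avg_all, avg_without_cs)
    print(f"{name}: all={avg_all:.3f}, w/o C&S={avg_without_cs:.3f}")
```

For RingID this gives roughly 0.819 over all seven conditions and 0.942 without Crop & Scale; for Tree-Ring, roughly 0.066 and 0.077.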

Verification task AUC: RingID (0.995) vs Tree-Ring (0.975), while maintaining similar CLIP scores (0.365 vs 0.364) and FID (26.13 vs 25.93).

Ablation Study¶

Configuration	Clean	Rotate	JPEG	Blur	Noise	Avg (all 7 conditions, incl. C&S and Bright)
Full RingID	1.000	0.860	1.000	0.970	0.950	0.819
w/o Spatial Shift	1.000	0.000	1.000	0.990	0.930	0.701
w/o Lossless Embedding	1.000	0.010	0.970	0.950	0.980	0.700
w/o Circular Ring	1.000	0.620	0.990	0.890	0.970	0.774
w/o Discretization	0.980	0.120	0.380	0.450	0.650	0.427
w/o Heterogeneous Watermark	1.000	0.820	0.940	0.960	0.710	0.740

Key Findings¶

Discretization contributes the most: Removing it causes the average accuracy to plummet from 0.819 to 0.427, indicating that key distinctiveness is the core of multi-key identification.
Spatial shift and lossless embedding are crucial for rotation: Removing either drops the accuracy under rotation to near 0.
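Crop & Scale is a shared weakness: at 2048 keys, accuracy falls to 0.080 for RingID and 0.000 for Tree-Ring. The attack rescales the frequency-domain pattern and thereby destroys it, which the current scheme does not compensate for.
Heterogeneous watermarking combines the strengths of the noise watermark (robust to non-geometric attacks such as JPEG and noise) and the ring watermark (robust to rotation): removing it lowers accuracy under Noise from 0.950 to 0.710 and under JPEG from 1.000 to 0.940.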

Highlights & Insights¶

Discovery of distribution shift is highly insightful: It reveals for the first time that the distribution shift introduced by discarding the imaginary part acts as a "hidden helper" for Tree-Ring's verification robustness, mathematically proving the shift factor is \(\frac{\sqrt{3}}{2}\). This finding changes the understanding of this method.
Transferable multi-channel heterogeneous design: The idea of placing different types of watermarks in different channels can be generalized to any watermarking system that needs to resist multiple attacks.
Simple and effective discretization trick: Changing from continuous Gaussian to binary \(\pm\alpha\) greatly enhances identification capability with minimal cost.

Limitations & Future Work¶

The identification accuracy under Crop & Scale attack is still very low (0.08 @ 2048 keys), which is an inherent challenge for frequency-domain watermarks.
The capacity of the ring pattern is bounded by the number of rings (\(2^n\) keys for \(n\) rings, set by the radius range); supporting far more users would require spreading keys across multiple channels.
Verified only on StableDiffusion-V2; the applicability to other diffusion model architectures remains unknown.

Rating¶

Novelty: ⭐⭐⭐⭐ The distribution shift analysis is a novel contribution, although individual improvement modules are relatively engineering-oriented.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Evaluation on both verification and identification tasks, across multiple attacks, with thorough ablation studies.
Writing Quality: ⭐⭐⭐⭐ Problem definitions are clear, with rigorous analytical logic.
Value: ⭐⭐⭐⭐ Significantly drives the field of diffusion model watermarking, serving as a pioneering work in multi-key identification.