Watermarking Autoregressive Image Generation
Conference: NeurIPS 2025 · arXiv: 2506.16349 · Code: https://github.com/facebookresearch/wmar
Area: Image Generation / AI Watermarking
Keywords: Autoregressive image generation, watermarking, reverse cycle consistency, token-level watermarking, LLM watermark adaptation
TL;DR
This paper is the first to adapt LLM watermarking (KGW green/red scheme) to the token level of autoregressive image generation models. It identifies and addresses the key challenge of insufficient Reverse Cycle Consistency (RCC) through tokenizer–detokenizer fine-tuning and a watermark synchronization layer, achieving robust image watermark detection with theoretical guarantees.
Background & Motivation
Autoregressive image generation models (DALL-E, Chameleon, RAR, etc.) discretize images into token sequences and generate them with Transformers, making them a significant alternative to diffusion models. However, no effective provenance tracking solution exists for their outputs.
Limitations of existing watermarking schemes:
- Post-processing watermarks (pixel modification): model-agnostic, but vulnerable to adversarial attacks and lacking theoretical p-value guarantees.
- Diffusion model watermarks: designed specifically for diffusion-based generation and inapplicable to autoregressive models.
- LLM watermarks (KGW): effective on text tokens but never adapted to image tokens.
Core challenge — Reverse Cycle Consistency (RCC): LLM watermark detection requires re-tokenizing generated content and checking the proportion of green tokens. For text, BPE tokenizers achieve near-perfect RCC (token match, TM ≈ 0.995). For image VQ tokenizers, however, the cycle of generated tokens → decoded image → re-encoded tokens changes approximately one-third of tokens (TM ≈ 0.66). TM further degrades to 0.31 under JPEG compression and approaches zero under geometric transformations (flip, rotation). The root causes are: (1) VQ tokenizers are trained only for forward cycle consistency (FCC), leaving decoded images off-manifold; and (2) spatial sensitivity causes even semantics-preserving edits to alter the majority of tokens.
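The TM metric is simple to reproduce; a minimal sketch with toy token sequences and a ~1/3 corruption rate (the codebook size, sequence length, and flip model are illustrative, not the paper's tokenizer):

```python
import numpy as np

def token_match(tokens_a: np.ndarray, tokens_b: np.ndarray) -> float:
    """TM: fraction of positions where two token sequences agree."""
    assert tokens_a.shape == tokens_b.shape
    return float((tokens_a == tokens_b).mean())

# Toy cycle: corrupt roughly one-third of 256 tokens, mimicking what the
# decode / re-encode cycle of a VQ tokenizer does in the paper's measurements.
rng = np.random.default_rng(0)
generated = rng.integers(0, 1024, size=256)    # tokens sampled by the AR model
reencoded = generated.copy()
flip = rng.random(256) < 1 / 3                 # cycle corrupts ~1/3 of positions
reencoded[flip] = rng.integers(0, 1024, size=int(flip.sum()))
print(f"TM after cycle: {token_match(generated, reencoded):.2f}")  # close to the paper's 0.66
```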
Method
Overall Architecture
- Generation: apply KGW watermarking directly to the autoregressive token sequence (adding \(\delta\) to the logits of green tokens).
- Detection: image → re-tokenize → count green tokens → compute p-value.
- Core improvements: (a) fine-tune the detokenizer/encoder to improve RCC; (b) apply a watermark synchronization layer to handle geometric transformations.
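This generate/detect loop can be sketched in a few lines, assuming a toy codebook of 1024 entries and a hash-based green-list seeding in the spirit of KGW (the key, vocabulary size, and seeding scheme below are illustrative, not the paper's exact implementation):

```python
import hashlib
import numpy as np

VOCAB = 1024   # image codebook size (illustrative)
GAMMA = 0.25   # fraction of the vocabulary marked "green"
DELTA = 2.0    # logit bias added to green tokens

def green_mask(prev_token: int, key: int = 42) -> np.ndarray:
    """Pseudo-random green/red split of the codebook, seeded from the previous
    token (a simplified stand-in for KGW's hash-based seeding)."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16) % 2**32
    rng = np.random.default_rng(seed)
    mask = np.zeros(VOCAB, dtype=bool)
    mask[rng.choice(VOCAB, int(GAMMA * VOCAB), replace=False)] = True
    return mask

def watermarked_sampling(logits: np.ndarray, prev_token: int) -> int:
    """Generation: add delta to green-token logits, then sample the next token."""
    biased = logits + DELTA * green_mask(prev_token)
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(VOCAB, p=probs))

def count_green(tokens: list[int]) -> int:
    """Detection: count green tokens in the re-tokenized image (the statistic
    fed into the binomial p-value test)."""
    return sum(green_mask(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
```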
Key Designs
- RCC Fine-tuning (Section 3.1):
  - The encoder \(E\), quantizer \(Q_C\), and codebook \(C\) are kept frozen (to avoid retraining the autoregressive model).
  - Only the decoder \(D\) and a separate encoder copy \(E'\) (used exclusively at detection time) are fine-tuned.
  - RCC loss: \(\mathcal{L}_{RCC}(s) = \mathbb{E}_{a \sim \mathcal{A}} \| \hat{z} - E'(a(D(\hat{z}))) \|_2^2\), aligning the soft latents after the decode–encode cycle with the original hard latents \(\hat{z} = C_s\).
  - Data augmentations (JPEG, brightness, slight rotations, etc.) are sampled randomly during training to make RCC robust to valuemetric transformations.
  - Regularization: \(\mathcal{L}_{reg} = \|D(\hat{z}) - D_0(\hat{z})\|_2^2 + \mathcal{L}_{LPIPS}\), preserving decoding quality relative to the frozen original decoder \(D_0\).
  - Total loss: \(\mathcal{L} = \mathcal{L}_{RCC} + \lambda \cdot \mathcal{L}_{reg}\)
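The three loss terms can be written down directly; a numpy sketch with toy latents (the encoder/decoder networks, the LPIPS term, and the default \(\lambda = 1\) are stand-ins supplied from outside, not the paper's training code):

```python
import numpy as np

def rcc_loss(z_hat: np.ndarray, z_soft: np.ndarray) -> float:
    """L_RCC: squared L2 distance between the quantized latents z_hat = C_s and
    the soft latents E'(a(D(z_hat))) recovered after the augmented cycle."""
    return float(np.sum((z_hat - z_soft) ** 2))

def reg_loss(x_dec: np.ndarray, x_dec0: np.ndarray, lpips_term: float = 0.0) -> float:
    """L_reg: keep the fine-tuned decoder D close to the frozen original D_0;
    the LPIPS perceptual term comes from an external network and is passed in."""
    return float(np.sum((x_dec - x_dec0) ** 2)) + lpips_term

def total_loss(z_hat, z_soft, x_dec, x_dec0,
               lam: float = 1.0, lpips_term: float = 0.0) -> float:
    """L = L_RCC + lambda * L_reg (lambda = 1.0 here is illustrative)."""
    return rcc_loss(z_hat, z_soft) + lam * reg_loss(x_dec, x_dec0, lpips_term)
```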
- Watermark Synchronization Layer (Section 3.2):
  - Geometric transformations (flip, rotation) completely disrupt token correspondence and cannot be addressed by RCC fine-tuning alone.
  - Solution: leverage localized watermarking [Sander et al.] to embed four fixed 32-bit synchronization messages in the four image quadrants.
  - At detection time: search over a grid of rotation angles for the pair of orthogonal lines that best separates the four messages, thereby estimating and inverting the geometric transformation.
  - The token-level watermark detector is then applied to the rectified image to compute the p-value.
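The detection-time search amounts to a grid search over candidate angles; in this sketch, `score_fn` is a hypothetical stand-in for decoding the four quadrant messages at a given orientation (the angle grid and step are illustrative):

```python
import numpy as np

def estimate_rotation(image, score_fn, angles=None):
    """Grid search over candidate rotation angles: the angle whose orthogonal
    quadrant split best matches the four known sync messages wins.
    score_fn(image, angle) stands in for decoding the localized watermark at
    that orientation (hypothetical helper, not the paper's exact API)."""
    if angles is None:
        angles = np.arange(-45.0, 45.5, 0.5)
    scores = [score_fn(image, a) for a in angles]
    return float(angles[int(np.argmax(scores))])

# Toy usage: a score function that peaks at the true rotation of 17 degrees.
est = estimate_rotation(image=None, score_fn=lambda img, a: -abs(a - 17.0))
print(est)  # 17.0
```

The rectified image (rotated back by the estimated angle) is then handed to the ordinary token-level detector.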
- Cross-modal Joint Detection:
  - For mixed-modality outputs (e.g., interleaved text and images from Chameleon), the scores \(S^{(i)}\), \(T^{(i)}\), and \(h^{(i)}\) across samples are summed, deduplicated, and used to compute a unified p-value.
  - Joint detection across text and image tokens further improves detection confidence.
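Under the null hypothesis each token is green with probability \(\gamma\), so the pooled statistic admits an exact binomial test; a minimal sketch (omitting the paper's deduplication of repeated seeds \(h^{(i)}\)):

```python
from math import comb

GAMMA = 0.25  # green-list fraction, matching the paper's watermark parameter

def binomial_pvalue(green: int, total: int, gamma: float = GAMMA) -> float:
    """P[Binomial(total, gamma) >= green]: the chance of observing at least
    this many green tokens in unwatermarked content."""
    return sum(comb(total, k) * gamma**k * (1 - gamma)**(total - k)
               for k in range(green, total + 1))

def joint_pvalue(scores):
    """Pool (green, total) pairs across text and image outputs, then run one
    binomial test on the summed counts (simplified cross-modal detection)."""
    green = sum(g for g, _ in scores)
    total = sum(t for _, t in scores)
    return binomial_pvalue(green, total)
```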
Loss & Training
Training is performed on 50,000 ImageNet training images for 10 epochs. Training times: Taming: 22 h / 16 V100s; Chameleon: 2.5 h / 8 H200s; RAR-XL: 0.5 h / 8 H200s. Watermark parameters: \(\delta=2\), \(\gamma=0.25\).
Key Experimental Results
Main Results (TPR @ 1% FPR)
| Variant | No Transform | Valuemetric | Geometric | Adversarial | Neural Compression |
|---|---|---|---|---|---|
| Base | 0.99 | 0.26 | 0.01 | 0.43 | 0.48 |
| FT | 1.00 | 0.45 | 0.01 | 0.70 | 0.71 |
| FT+Augs | 1.00 | 0.92 | 0.01 | 0.70 | 0.79 |
| FT+Augs+Sync | 0.98 | 0.83 | 0.82 | 0.69 | 0.80 |
RCC fine-tuning improves valuemetric robustness from 0.26 to 0.92; the synchronization layer improves geometric robustness from 0.01 to 0.82.
Ablation Study (Token Match and Generation Quality)
| Configuration | Token Match (original) | Token Match (JPEG Q=25) | FID |
|---|---|---|---|
| Original tokenizer | 0.66 | 0.31 | 16.7 |
| FT | >0.80 | ~0.55 | ≤16.7 |
| FT+Augs | >0.80 | ~0.70 | ≤16.7 |
| FT+Augs+Sync | >0.80 | ~0.70 | 17.3 |
Fine-tuning substantially improves token match with negligible change in FID, confirming that watermarking does not degrade generation quality.
Key Findings
- RCC is the central bottleneck for watermark robustness: the original VQ tokenizer reaches a TM of only 0.66, which rises above 0.80 after fine-tuning.
- Fine-tuning not only improves valuemetric robustness but also unexpectedly enhances robustness against neural compression and diffusion purification attacks.
- The synchronization layer resolves the fundamental challenge of geometric transformations, albeit with a slight trade-off in valuemetric robustness.
- Compared to post-processing methods (CIN, MBRS, Trustmark, WAM), the proposed method is more robust to diffusion purification and neural compression.
- Consistent conclusions across three models (Taming, Chameleon, RAR-XL) demonstrate the generality of the approach.
Highlights & Insights
- The identification and resolution of the RCC problem constitutes the paper's most significant contribution: it precisely diagnoses the core obstacle to transferring LLM watermarking to image tokens.
- The fine-tuning scheme is extremely lightweight — only the decoder and an encoder copy are updated, with no need to retrain the autoregressive model.
- The p-value computation for cross-modal joint detection maintains theoretical rigor (binomial hypothesis test).
- The synchronization layer paradigm — using an auxiliary signal to estimate the transformation, invert it, and then detect the watermark — is broadly generalizable.
Limitations & Future Work
- The synchronization layer assumes that cropping preserves at least one corner; handling arbitrary crops would require more sophisticated synchronization patterns.
- A trade-off exists between the synchronization layer and valuemetric robustness, as synchronization signal corruption can cause incorrect inversion.
- Only zero-bit watermarking (presence/absence detection) is studied; multi-bit message embedding is not explored.
- Applicability to non-standard autoregressive architectures such as VAR remains to be verified.
Related Work & Insights
- vs. KGW (LLM watermarking): directly adapted but the paper identifies and resolves the RCC challenge, enabling cross-modal extension of watermarking from text to image tokens.
- vs. diffusion model watermarks (Tree-Ring, etc.): different paradigms — diffusion models inject watermarks in latent space, whereas this work injects them into token sequences.
- vs. post-processing methods (Trustmark, WAM): post-processing methods offer stronger valuemetric robustness but are highly vulnerable to diffusion purification and neural compression.
Rating
- Novelty: ⭐⭐⭐⭐⭐ First study of watermarking for autoregressive image generation; both the identification of the RCC problem and its solution are original.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers 3 models and multiple attack types (valuemetric / geometric / adversarial / compression), with comparisons against post-processing baselines.
- Writing Quality: ⭐⭐⭐⭐⭐ Problem formulation is clear, challenge analysis is thorough, and experiments are comprehensive.
- Value: ⭐⭐⭐⭐⭐ Fills an important gap in watermark-based provenance tracking for the rapidly growing field of autoregressive image generation.