Transferable Multi-Bit Watermarking Across Frozen Diffusion Models via Latent Consistency Bridges

Conference: ICML 2026
arXiv: 2603.20304
Code: To be confirmed
Area: AI Security / Diffusion Model Watermarking
Keywords: Multi-bit watermarking, Frozen diffusion models, Latent space perturbation, LCM differentiable bridge, Cross-model transferable

TL;DR

DiffMark injects a learned latent perturbation \(\delta\) before every denoising step of a frozen diffusion model, so the watermark signal accumulates in the final latent \(z_0\). A Latent Consistency Model (LCM) serves as a differentiable training bridge, which avoids backpropagating through 50-step DDIM. The detector extracts 64 bits in 16.4 ms with a single forward pass, and the same encoder-decoder pair works plug-and-play across models without retraining.

Background & Motivation

Background: Mainstream diffusion model watermarking follows two paths. Sampling-based methods (Tree-Ring / RingID / Shallow Diffuse) embed the watermark in the initial noise \(z_T\) or in intermediate latents and rely on 50-step DDIM inversion for detection. Fine-tuning-based methods (Stable Signature / AquaLoRA) bind the watermark to model weights by fine-tuning part of the model (the VAE decoder in Stable Signature, LoRA adapters on the UNet in AquaLoRA), then use a lightweight decoder for single-pass multi-bit extraction.

Limitations of Prior Work: Inversion-based detection in sampling-based methods runs the UNet for \(N=50\) steps per image, which is impractical at platform-level throughput. Most of these methods support only 0-bit (presence/absence) detection, so they cannot attribute an image to a user, and noise patterns must be regenerated for each unique key. Fine-tuning-based methods do support single-pass multi-bit decoding, but the watermark is tied to a specific checkpoint, so every new SD variant requires retraining, which makes unified governance across the open-source ecosystem impractical.

Key Challenge: Regulators require a "cross-model, attributable, and platform-scale verifiable" watermarking infrastructure. Existing methods compromise between "latency/bits" and "model transferability." The root cause lies in anchoring the watermark to \(z_T\) (forcing inversion) or UNet weights (forcing retraining).

Goal: (i) Extract \(L\) bits in a single forward pass at the detector, avoiding \(N\)-step inversion; (ii) Ensure the watermark is transparent to the frozen UNet, allowing a single set of encoder-decoder weights to work across the SD family; (iii) Enable arbitrary keys for each image without retraining.

Key Insight: The authors observe that since cumulative perturbations are amplified along the denoising trajectory and ultimately reside in \(z_0\), the watermark does not need to be hidden in \(z_T\). Instead, it can be injected as a "constant additive perturbation \(\delta\)" before each denoising step. Consequently, the decoder only needs to examine \(z_0\), making inversion unnecessary. Since \(\delta\) depends only on a lightweight encoder \(E_\phi(s)\), it is decoupled from UNet weights, inherently supporting cross-model use and per-image keys.

Core Idea: The watermark is redefined from a "pattern in noise" or a "fingerprint in weights" to a "persistent latent perturbation \(\delta\) added at every step." An LCM is then used to compress the 50-step DDIM into a 4-step differentiable path to backpropagate gradients for \(\delta\), enabling end-to-end learning of an encoder-decoder pair while the UNet remains completely frozen.

Method

Overall Architecture

The inference side of DiffMark is straightforward: given an \(L=64\) bit key \(s\), a lightweight encoder \(E_\phi\) maps it to a latent perturbation \(\delta = E_\phi(s) \in \mathbb{R}^{4 \times h \times w}\). Standard 50-step DDIM sampling proceeds as usual, with one modification: \(\delta\) is added to the current latent before each UNet call, \(\tilde z_{t_k} = z_{t_k} + \delta\). The perturbed latent \(\tilde z_{t_k}\) is fed into the frozen \(\epsilon_\theta\) to predict the noise used in the DDIM update, and the final \(z_0\) is decoded into an image. During detection, the image is passed through the VAE encoder to recover \(z_0\), and a lightweight decoder \(D_\psi\) produces \(\hat s = D_\psi(z_0) \in \mathbb{R}^{L \times 2}\) in a single forward pass. A bit-wise argmax over the two logits per position recovers the key.
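
The sampling loop and the bit-wise argmax described above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: `toy_eps` is a hypothetical stand-in for the frozen \(\epsilon_\theta\), the `abar` schedule is made up, latents are flat lists instead of \(4 \times h \times w\) tensors, and the mapping of logit index 0/1 to bit 0/1 is an assumption.

```python
import math

def ddim_step(z_tilde, eps_hat, abar_t, abar_next):
    """One deterministic DDIM update applied to the perturbed latent z_tilde."""
    out = []
    for z, e in zip(z_tilde, eps_hat):
        z0_pred = (z - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
        out.append(math.sqrt(abar_next) * z0_pred + math.sqrt(1.0 - abar_next) * e)
    return out

def sample_with_delta(z_T, delta, eps_model, abar):
    """Run DDIM over the schedule `abar`, adding delta before every model call."""
    z = list(z_T)
    for k in range(len(abar) - 1):
        z_tilde = [zi + di for zi, di in zip(z, delta)]  # persistent injection
        eps_hat = eps_model(z_tilde, k)                  # frozen predictor, never updated
        z = ddim_step(z_tilde, eps_hat, abar[k], abar[k + 1])
    return z

def bits_from_logits(logits):
    """Bit-wise argmax over an L x 2 list of logits (index 0 -> bit 0, assumed)."""
    return [0 if l0 >= l1 else 1 for l0, l1 in logits]

def toy_eps(z, k):
    """Hypothetical noise predictor; a real system would call the frozen UNet here."""
    return [0.1 * zi for zi in z]

N = 50
# Hypothetical cumulative-alpha schedule, increasing toward 1 as denoising proceeds.
abar = [0.01 + (0.999 - 0.01) * k / N for k in range(N + 1)]
z_T = [0.5, -1.0, 0.25, 2.0]
delta = [0.02, -0.02, 0.02, -0.02]

z0_clean = sample_with_delta(z_T, [0.0] * 4, toy_eps, abar)
z0_marked = sample_with_delta(z_T, delta, toy_eps, abar)
shift = [m - c for m, c in zip(z0_marked, z0_clean)]  # accumulated watermark signal in z_0
```

In this toy setting `shift` has the same sign pattern as `delta`, which illustrates how a small per-step perturbation accumulates into a displacement of \(z_0\) that a decoder could read.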

The training side employs two parallel paths. A short, differentiable LCM path (\(K=4\) steps) backpropagates gradients to the encoder. A long, high-fidelity DDIM path (\(N=50\) steps, with stop-gradient on \(\delta\) and per-step injection scaled by \(1/N\) to match the cumulative perturbation) gives the decoder the \(z_0^{ddim}\) it will see during real inference. In the LCM path the UNet passes gradients through but its weights are never updated, so the resulting \((E_\phi, D_\psi)\) can be reused with any SD-family model without retraining.

Key Designs

  1. Persistent Delta Injection:

    • Function: Encodes an \(L\)-bit key into an additive latent space perturbation \(\delta\), which is added to the current latent before each DDIM denoising step, allowing the watermark signal to accumulate into \(z_0\) along the sampling trajectory.
    • Mechanism: Replaces the original latent with \(\tilde z_{t_k} = z_{t_k} + \delta\) before feeding it to the UNet (including CFG), followed by standard DDIM updates: \(z_{t_{k+1}} = \sqrt{\bar\alpha_{t_{k+1}}}\frac{\tilde z_{t_k} - \sqrt{1-\bar\alpha_{t_k}}\hat\epsilon_{t_k}}{\sqrt{\bar\alpha_{t_k}}} + \sqrt{1-\bar\alpha_{t_{k+1}}}\hat\epsilon_{t_k}\). To ensure \(\tilde z_{t_k}\) stays within the UNet's training distribution, a magnitude loss \(\mathcal{L}_{mag} = (\sigma(\delta) - \sigma_{target})^2\) and KL divergence \(\mathcal{L}_{KL}\) are applied to pull the encoder's variational output toward a standard Gaussian, forcing \(\|\delta\| \ll \|z_T\|\).
    • Design Motivation: Completely bypasses DDIM inversion by anchoring the watermark to \(z_0\) rather than \(z_T\), requiring only one forward pass for decoding. Since \(\delta\) depends on the key rather than image content, it naturally supports per-image keys.
  2. Dual-Path with LCM Bridge:

    • Function: Provides backpropagatable gradient signals for the encoder under the constraint of a frozen UNet, while supervising the decoder on the actual 50-step DDIM output.
    • Mechanism: The LCM path uses \(K=4\) LCM forward steps to push the perturbation \(\delta\) to \(z_0^{lcm}\). The backward chain \(\mathcal{L}_{lcm} \to D_\psi \to z_0^{lcm} \to 4\,\text{LCM steps} \to \delta \to E_\phi\) is fully differentiable, with gradients passing through but not modifying the UNet. The parallel DDIM path uses \(N=50\) standard steps to obtain high-fidelity \(z_0^{ddim}\), applying stop-gradient to \(\delta\) and scaling per-step injection to \(\delta/N\) to match the LCM path's accumulation. The loss \(\mathcal{L}_{ddim} = \mathcal{L}_{CE}(D_\psi(z_0^{ddim}), s)\) updates only the decoder.
    • Design Motivation: The computational graph of 50-step DDIM is prohibitive in terms of memory and gradient stability. The 4-step LCM distillation serves as a "short differentiable approximation." However, since LCM fidelity is lower than DDIM, relying solely on LCM would cause the decoder to fail on the inference distribution. The dual-path approach decouples the two goals: "encoder learning where to place" and "decoder learning how to read."
  3. Multi-Stage Curriculum:

    • Function: Addresses the conflict between reconstruction (requiring large \(\|\delta\|\)) and imperceptibility (requiring \(\|\delta\| \to 0\)) during joint optimization.
    • Mechanism: Groups losses into a reconstruction set \(\mathcal{G}_{rec} = \{\mathcal{L}_{lcm}, \mathcal{L}_{ddim}\}\) and an imperceptibility set \(\mathcal{G}_{imp} = \{\mathcal{L}_{lafid}, \mathcal{L}_{prvl}, \mathcal{L}_{freq}, \mathcal{L}_{neg}\}\). Each loss is assigned an activation gate \(g_i(t) = \mathbb{1}[t \geq \tau_i]\), with the schedule satisfying \(\max_{i \in \mathcal{G}_{rec}} \tau_i \leq \min_{j \in \mathcal{G}_{imp}} \tau_j\). Total loss is \(\mathcal{L}(t) = \sum_i g_i(t) \cdot w_i(t) \cdot \mathcal{L}_i\). Here, \(\mathcal{L}_{lafid}\) constrains \(z_0\) displacement, \(\mathcal{L}_{prvl}\) spreads watermark energy, \(\mathcal{L}_{freq}\) pushes perturbations to high frequencies, and \(\mathcal{L}_{neg}\) applies negative entropy to non-watermarked images to force uniform decoder output, suppressing false positives.
    • Design Motivation: If both loss groups are enabled simultaneously, imperceptibility terms would push \(\delta\) to 0 before the watermark signal is established. Strictly establishing a decodable watermark before refining imperceptibility is the only stable path for such adversarial objectives.
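
The gated loss in the curriculum design can be sketched as follows. The activation steps in `TAUS` are hypothetical, since this summary does not list the real schedule, and the weights \(w_i(t)\) default to 1 for simplicity.

```python
REC = ("lcm", "ddim")                    # reconstruction group G_rec
IMP = ("lafid", "prvl", "freq", "neg")   # imperceptibility group G_imp

# Hypothetical activation steps tau_i (not taken from the paper).
TAUS = {"lcm": 0, "ddim": 0, "lafid": 3000, "prvl": 3000, "freq": 5000, "neg": 5000}

def gate(step, tau):
    """Hard gate g_i(t) = 1[t >= tau_i]."""
    return 1.0 if step >= tau else 0.0

def schedule_is_valid(taus):
    """Check that every reconstruction loss switches on no later than any imperceptibility loss."""
    return max(taus[n] for n in REC) <= min(taus[n] for n in IMP)

def total_loss(step, losses, taus, weights=None):
    """L(t) = sum_i g_i(t) * w_i(t) * L_i, with w_i(t) = 1 when no weights are given."""
    total = 0.0
    for name, value in losses.items():
        w = 1.0 if weights is None else weights[name](step)
        total += gate(step, taus[name]) * w * value
    return total

losses = {name: 1.0 for name in REC + IMP}  # dummy per-term loss values
early = total_loss(0, losses, TAUS)         # only the reconstruction terms are active
late = total_loss(10_000, losses, TAUS)     # all six terms are active
```

With unit loss values, `early` counts the two reconstruction terms and `late` counts all six, which makes the "decode first, hide later" ordering easy to check.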

Loss & Training

Training has two stages. In pre-training, the encoder-decoder pair is trained jointly for 50,000 steps without the diffusion model, using per-bit cross-entropy \(\mathcal{L}_{CE}\) plus an orthogonality loss \(\mathcal{L}_{orth} = \frac{1}{B(B-1)}\sum_{i \neq j} \frac{\langle\delta_i, \delta_j\rangle_F}{\|\delta_i\|_F \|\delta_j\|_F}\) (batch 64, AdamW, encoder lr \(3 \times 10^{-4}\), decoder lr \(1 \times 10^{-4}\)). This pushes distinct keys toward mutually orthogonal perturbations that remain decodable under additive noise \(\delta + \epsilon\). The second stage is 10,000 steps of joint fine-tuning with SD v1.5 + LCM_Dreamshaper_v7 (\(K=4\)) (batch 16, encoder lr \(5 \times 10^{-5}\), decoder lr \(3 \times 10^{-4}\)), with curriculum gating over \(\mathcal{G}_{rec}\) and \(\mathcal{G}_{imp}\). Inference uses standard 50-step DDIM with no additional UNet calls compared to the base model.
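
The orthogonality term can be computed directly from its formula. The sketch below follows the expression exactly as written above (a mean of signed pairwise cosine similarities over flattened perturbations); whether the actual training code applies an absolute value or a square to each term is not stated in this summary.

```python
import math

def orth_loss(deltas):
    """Mean pairwise cosine similarity over all ordered pairs i != j.

    `deltas` is a list of B flattened perturbations (lists of floats), so the
    Frobenius inner product reduces to an ordinary dot product.
    """
    B = len(deltas)
    norms = [math.sqrt(sum(x * x for x in d)) for d in deltas]
    total = 0.0
    for i in range(B):
        for j in range(B):
            if i != j:
                dot = sum(a * b for a, b in zip(deltas[i], deltas[j]))
                total += dot / (norms[i] * norms[j])
    return total / (B * (B - 1))

orthogonal = orth_loss([[1.0, 0.0], [0.0, 2.0]])  # perpendicular perturbations
aligned = orth_loss([[1.0, 1.0], [3.0, 3.0]])     # parallel perturbations
```

Perpendicular perturbations give a loss of 0 and parallel ones give 1, so minimizing the term discourages different keys from mapping to similar \(\delta\).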

Key Experimental Results

Main Results

Comparison with 6 baselines (StegaStamp / Stable Signature / AquaLoRA / Tree-Ring / RingID / Shallow Diffuse) on DiffusionDB across detection accuracy, consistency, and quality:

| Method | Type | Plug&Play | Bits | Bit Acc | TPR@0.1%FPR | PSNR↑ | FID↓ | CLIP-FID↓ |
|---|---|---|---|---|---|---|---|---|
| StegaStamp | Post-proc | | 100 | 0.9994 | 1.0 | 11.34 | 54.82 | 10.26 |
| Stable Signature | FT | | 48 | 0.9950 | 0.9900 | 16.23 | 46.81 | 4.61 |
| AquaLoRA | FT | | 48 | 0.9355 | 0.9910 | 20.59 | 32.32 | 1.98 |
| Tree-Ring | Sampling | | 0 | | 1.0 | 11.02 | 47.09 | 4.65 |
| RingID | Sampling | | 11 | | 1.0 | 10.74 | 47.18 | 4.77 |
| Shallow Diffuse | Sampling | | 0 | | 1.0 | 11.01 | 43.37 | 4.10 |
| DiffMark | Sampling | | 64 | 0.9381 | 1.0 | 11.01 | 38.07 | 2.20 |
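
To make the TPR@0.1%FPR column concrete, one common way to set a detection threshold for a multi-bit watermark is a binomial test on the number of matching bits. The sketch below assumes that bits decoded from a non-watermarked image behave like independent fair coin flips; this is an illustrative assumption and not necessarily the evaluation protocol used in the paper.

```python
from math import comb

def null_fpr(n_bits, tau):
    """P(at least tau of n_bits match the key) when each bit matches with probability 1/2."""
    return sum(comb(n_bits, k) for k in range(tau, n_bits + 1)) / 2 ** n_bits

def min_threshold(n_bits, target_fpr):
    """Smallest match count tau whose false positive rate is at most target_fpr, or None."""
    for tau in range(n_bits + 1):
        if null_fpr(n_bits, tau) <= target_fpr:
            return tau
    return None

# Threshold on matching bits for a 64-bit key at a 0.1% false positive rate.
tau_64 = min_threshold(64, 1e-3)
fpr_at_tau = null_fpr(64, tau_64)
```

An image would then be flagged as watermarked when at least `tau_64` of its 64 decoded bits match the queried key.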

Cross-model Transferability: Trained only on SD 1.5 and tested zero-shot on SD-2.1 / DreamShaper 8 / Realistic Vision 5.1 / OpenJourney v4, DiffMark keeps bit accuracy between 93.3% and 95.5%, which supports generalization across the SD family.

Detection Latency: DiffMark takes 16.4 ms/img on an L40S GPU, compared to 754.9 ms for Tree-Ring and 753.2 ms for RingID, a speedup of more than \(45\times\).

Ownership Attribution: The 64-bit capacity achieves 100% Top-1 accuracy for \(10^6\) users and \(\geq 99.97\%\) for \(10^8\) users.
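
Top-1 attribution can be pictured as a nearest-neighbor search in Hamming distance over registered user keys. The sketch below is an assumption about how such a lookup might work, not the paper's procedure: the user count, the random keys, and the four flipped bits (about 6% bit error, in line with the reported bit accuracy) are all illustrative.

```python
import random

def hamming(a, b):
    """Number of differing bits between two keys stored as Python ints."""
    return bin(a ^ b).count("1")

def attribute(decoded, user_keys):
    """Index of the registered key closest to the decoded bit string."""
    return min(range(len(user_keys)), key=lambda u: hamming(decoded, user_keys[u]))

rng = random.Random(0)
user_keys = [rng.getrandbits(64) for _ in range(1000)]  # hypothetical user registry

true_user = 123
noisy = user_keys[true_user]
for pos in rng.sample(range(64), 4):  # flip 4 of the 64 bits to mimic decoding errors
    noisy ^= 1 << pos

predicted = attribute(noisy, user_keys)
```

Because random 64-bit keys are typically about 32 bits apart, a decoded key with a few bit errors still lands far closer to its owner than to any other user, which is why attribution can stay accurate even when bit accuracy is below 100%.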

Ablation Study

| Config | Key Finding | Description |
|---|---|---|
| K=2 / 4 / 8 LCM Steps | K=4 for main results; K=2 is faster with similar accuracy. | K=2 is likely the optimal default; increasing K provides no extra benefit. |
| L=48 / 64 / 128 / 256 bits | Training collapses at L=128; \(\mathcal{L}_{orth}\) fails diversity. | L=64 is the sweet spot; L=256 causes LPIPS to spike to 0.508. |
| Random vs Fixed key (DiffMark) | BER distributions are nearly identical. | Generalizes to \(2^{64}\) key space without local interpolation. |
| Random vs Fixed key (AquaLoRA) | BER jumps from 6.42% to 28.16%. | Highlights overfitting of fine-tuning methods to training keys. |
| Robustness (13 Attacks) | Avg 0.70 TPR@0.1%FPR; 1.00 for brightness/JPEG. | Fails (0.00) on rotation/cropping/Adv-KLVAE8 due to latent corruption. |

Key Findings

  • The LCM path acts as a "gradient bridge" rather than a "sampler." Ablation shows \(K=2\) is sufficient, and more steps only slow down training.
  • Curriculum training, specifically delaying imperceptibility losses until after decodability is achieved, is critical for stability.
  • Latent-space watermarking systematically fails against geometric transformations (rotation/cropping) and gray-box adversarial attacks on VAE, which is a structural limitation of this paradigm.
  • A 64-bit capacity maintains at least 99.97% Top-1 attribution at a \(10^8\) user scale, a critical threshold for usable governance infrastructure.

Highlights & Insights

  • Anchoring the watermark to \(z_0\) instead of \(z_T\) addresses latency, key flexibility, and cross-model transferability at the same time. This shows that hiding the watermark in the initial noise is not a necessary assumption.
  • The dual-path paradigm (LCM for "differentiable short path" + DDIM for "high-fidelity inference path") is a general template for training small modules within frozen large models.
  • The "decode first, hide later" curriculum reveals a rule: when joint-training adversarial objectives, a stable attractor for the "hard goal" must be established before introducing "soft goals."
  • Section 5 maps technical capabilities directly to regulatory requirements (EU AI Act / SB-53), showing how single-pass 64-bit decoding translates to "user attribution" and "auditability."

Limitations & Future Work

  • Limitations: The watermark fails structurally under rotation, blur, and random cropping, because these disrupt the latent representation \(z_0 = \mathcal{E}(x) \cdot f_s\) that the decoder reads.
  • Further Constraints: (i) Only tested on SD 1.x/2.x; architecture differences in SDXL/Flux/SD3 remain unverified. (ii) Efficacy of "same \(\delta\) at every step" on ultra-short trajectories (1-4 steps) is unknown. (iii) Fixed 64-bit key space requires retraining for variable lengths.
  • Future Work: Injecting \(\delta\) using joint space-frequency transforms for geometric invariance; utilizing time-dependent \(\delta_t = E_\phi(s, t)\); and training a robust VAE adapter to defend against gray-box attacks.

Related Work & Insights

  • vs Sampling-based (Tree-Ring, etc.): These write the watermark into \(z_T\), so detection requires 50-step inversion. DiffMark carries the watermark in \(\delta\), injected at each step and read from \(z_0\), so detection needs only one VAE encoder pass and one decoder forward pass, giving \(45\times\) acceleration and multi-bit capacity.
  • vs Fine-tuning-based (Stable Signature, etc.): These modify weights, which binds the watermark to a specific checkpoint. DiffMark is cross-model (93.3–95.5% bit accuracy on unseen models) and avoids the key-overfitting seen in methods like AquaLoRA.
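  • vs StegaStamp (Post-generation): StegaStamp processes pixels after generation, which costs distribution-level quality (FID 54.82, CLIP-FID 10.26). DiffMark embeds during generation and reports better FID (38.07) and CLIP-FID (2.20), although its PSNR (11.01) is comparable to StegaStamp's (11.34).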
  • Insight: Using LCM as a "differentiable gradient bridge" can be extended to any task requiring end-to-end learning of lightweight condition modules (e.g., conditional encoders or reward heads) on frozen diffusion models while avoiding memory explosion.

Rating

  • Novelty: ⭐⭐⭐⭐ (Combining \(z_0\) anchoring with LCM bridges is a clean, effective shift.)
  • Experimental Thoroughness: ⭐⭐⭐⭐ (Covers many baselines, models, and attacks.)
  • Writing Quality: ⭐⭐⭐⭐⭐ (Excellent policy-to-technology argumentation.)
  • Value: ⭐⭐⭐⭐⭐ (\(45\times\) speedup and transferability make this a viable infrastructure candidate.)