PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints¶
Conference: ICLR 2026 · arXiv: 2509.21057 · Code: coming soon · Area: AI Safety / Watermarking · Keywords: LLM watermarking, semantic-level watermarking, distortion-free, multi-channel constraints, robustness theory
TL;DR¶
PMark is a theoretically distortion-free and paraphrase-robust semantic-level watermarking method for LLMs. It applies cascaded binary filtering to candidate sentences using multiple orthogonal pivot vectors, with median-based sampling to guarantee distortion-freeness. The multi-channel design increases watermark evidence density and thereby robustness. Under paraphrase attacks, TP@FP1% reaches 95%+, outperforming prior SWM methods by 14.8%.
Background & Motivation¶
Background: LLM watermarking falls into two categories: token-level (e.g., Green-Red watermarking) and semantic-level (SWM). SWM embeds watermark signals in the sentence semantic space to improve robustness against paraphrase attacks.
Limitations of Prior Work:

- Existing SWM methods (SemStamp/k-SemStamp) rely on rejection sampling, introducing distributional distortion.
- Single-channel watermarks have sparse evidence density, making them easy to break under paraphrase attacks.
- No rigorous theoretical framework exists for analyzing watermark properties (distortion-free conditions, robustness bounds).
Key Challenge: A fundamental trade-off between distortion-freeness (preserving generation quality) and robustness (resisting paraphrase attacks).
Goal: Simultaneously achieve theoretical distortion-free guarantees and strong practical robustness against paraphrase attacks.
Key Insight: Multi-channel orthogonal pivot vectors — each sentence embeds multiple independent watermark bits, multiplying evidence density.
Core Idea: Distortion-free median sampling + cascaded multi-orthogonal-channel filtering = high-density watermark evidence → robustness.
Method¶
Overall Architecture¶
During generation: for each sentence to be generated, sample \(N\) candidates → apply \(b\) orthogonal pivot vectors to successively bisect the candidate set → uniformly sample from the final subset. During detection: re-sample \(N\) candidates per sentence to reconstruct the median → apply a soft z-test for statistical hypothesis testing. The offline variant simplifies to a zero-median prior, eliminating the need for re-sampling at detection time.
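The generation-time pipeline above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the LLM sampler and sentence encoder are left out, and the functions operate on pre-computed candidate embeddings; `make_pivots` and `pmark_step` are hypothetical names.

```python
import numpy as np

def make_pivots(b, dim, seed=0):
    """Generate b orthonormal pivot vectors via QR decomposition."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, b)))
    return q.T  # (b, dim); rows are mutually orthogonal unit vectors

def pmark_step(embeddings, pivots, key_bits, rng):
    """One PMark generation step: cascaded median bisection.

    embeddings: (N, dim) unit-norm embeddings of N candidate sentences
    pivots:     (b, dim) orthogonal pivot vectors
    key_bits:   b key bits; each selects one half at its channel
    Returns the index of a candidate sampled uniformly from the
    final subset of size N / 2**b.
    """
    idx = np.arange(len(embeddings))
    for v, bit in zip(pivots, key_bits):
        proxy = embeddings[idx] @ v              # cosine similarities
        order = np.argsort(proxy)                # sort by proxy value
        half = len(idx) // 2                     # median split point
        idx = idx[order[half:] if bit else order[:half]]
    return int(rng.choice(idx))                  # uniform final sampling
```

With \(N = 16\) and \(b = 3\), three bisections leave a final subset of two candidates, matching the \(N/2^b\) count described above.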
Key Designs¶
- Proxy Function Theoretical Framework:
- Function: Unifies the theoretical analysis of semantic-level watermarking.
- Core Theorem: The watermarked distribution is distortion-free if and only if \(q(u) = 1/M\) (i.e., the proxy value distribution is uniform), which is difficult to satisfy in practice.
- Design Motivation: Provides a theoretical tool for analyzing the sources of distortion in existing methods.
- Single-Channel Distortion-Free Sampling:
- Function: Ensures that single-channel watermark sampling introduces no distributional shift.
- Mechanism: Given pivot \(v\), compute cosine similarities \(\langle v, \mathcal{T}(s) \rangle\) for \(N\) candidates and find the median to split them into two halves. A key bit selects one half, from which a candidate is sampled uniformly. Since each candidate has selection probability \(1/N\), it follows that \(P_M^w(s|\pi) = P_M(s|\pi)\).
- Theoretical Guarantee (Theorem 3): Strictly distortion-free.
- Multi-Channel Cascaded Filtering (Online PMark):
- Function: Uses \(b\) orthogonal pivot vectors to multiply evidence density by a factor of \(b\).
- Mechanism: \(b\) orthogonal pivots are generated via QR decomposition. For \(N\) candidates, median-based bisection is applied sequentially at each channel, retaining one half per channel (determined by a key bit): \(V^{(0)} \to V^{(1)} \to \cdots \to V^{(b)}\). The final candidate is sampled uniformly from \(V^{(b)}\) (containing \(N/2^b\) candidates).
- Robustness Theory (Theorem 7): If attacks corrupt per-channel evidence with probability \(\epsilon\), the SNR satisfies \(\text{SNR} \geq \frac{(1-2\epsilon)\sqrt{bT}}{2\sqrt{\epsilon(1-\epsilon)}}\), growing with channel count \(b\) and sentence count \(T\).
- Design Motivation: Single-channel embeds only 1 bit of evidence per sentence; multi-channel embeds \(b\) bits per sentence, multiplying evidence density.
- Offline PMark (Simplified Variant):
- Function: An efficient variant that requires no re-sampling at detection time.
- Mechanism: In high-dimensional space, random vectors are nearly orthogonal and proxy-function values concentrate in a small interval \([-\epsilon, \epsilon]\) around zero, so the true median is close to zero. Zero is therefore used directly as the prior median, eliminating re-sampling overhead at detection.
- Distortion Bound (Theorem 8): \(\delta_{TV} \leq \epsilon\), with \(\epsilon \leq 0.08\) in practice.
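The Offline variant's detection side can be sketched as follows, assuming the zero-median prior so that a key bit simply predicts the sign of each channel's proxy value. The per-sentence key-bit schedule is simplified here to one fixed bit per channel, whereas the actual scheme derives bits from a secret key; `detect_offline` is a hypothetical name.

```python
import numpy as np

def detect_offline(embeddings, pivots, key_bits):
    """z-score over b*T binary evidence bits.

    Under the zero-median prior, a watermarked sentence should land on
    the key-selected sign of every channel; unwatermarked text matches
    each bit with probability 1/2.

    embeddings: (T, dim) one embedding per detected sentence
    pivots:     (b, dim) orthogonal pivot vectors
    key_bits:   (b,) expected sign per channel (1 -> positive half)
    Returns a z-score against the fair-coin null hypothesis.
    """
    proxy = embeddings @ pivots.T              # (T, b) proxy values
    observed = (proxy > 0).astype(int)         # zero-median bisection
    matches = int((observed == np.asarray(key_bits)).sum())
    n = observed.size                          # n = b * T evidence bits
    return (matches - n / 2) / np.sqrt(n / 4)  # null: mean n/2, var n/4
```

If all \(bT\) evidence bits survive, the score is \(\sqrt{bT}\); with \(T = 10\) sentences and \(b = 4\) channels that is already about 6.3, which is the evidence-density effect the robustness theorem quantifies.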
Loss & Training¶
- No training required — purely a sampling-based algorithm.
- Generation requires \(N\) samples per sentence (\(N = 16\)–\(64\)); the Online variant requires re-sampling at detection time to estimate the median.
Key Experimental Results¶
Main Results: TP@FP1% Under Paraphrase Attack¶
| Method | No Attack | Doc-P (GPT Paraphrase) | Gain over SemStamp (Doc-P) |
|---|---|---|---|
| SemStamp (C4/Mistral) | ~99% | 73.5% | — |
| k-SemStamp | 100% | ~80% | — |
| PMark Online | 100% | 97.8% | +24.3% |
| PMark Offline | 99.7% | 92.6% | +19.1% |
Ablation Study: Channel Count \(b\) and Sample Count \(N\)¶
| N \\ b | b=1 | b=2 | b=3 | b=4 |
|---|---|---|---|---|
| N=8 (Online) | 81.0 | 97.0 | 98.0 | — |
| N=16 | 84.0 | 100.0 | 100.0 | 100.0 |
| N=64 | 99.0 | 100.0 | 100.0 | 100.0 |
Key Findings¶
- Multi-channel is the key: Detection rate jumps from 81% to 97% when increasing from \(b=1\) to \(b=2\).
- Text quality improves rather than degrades: PMark achieves lower PPL (4.37) than k-SemStamp (~5.0), as distortion-free sampling introduces no distributional shift.
- Robust to GPT-level paraphrase: Even under heavy GPT-based paraphrasing (Doc-P), TP@FP1% remains above 95%.
Highlights & Insights¶
- Elegant unification of theory and practice: The paper rigorously proves distortion-free conditions and a robustness bound where SNR grows as \(\sqrt{bT}\) — a rare contribution in the watermarking literature. Theory directly drives method design.
- Core intuition of multi-channel evidence density: Analogous to redundancy in error-correcting codes — embedding multiple independent bits per sentence allows overall signal recovery even when some bits are corrupted by attacks.
- The offline simplification is remarkably clever: It exploits the near-orthogonality of high-dimensional random vectors to approximate the median as zero, eliminating re-sampling overhead at detection time.
Limitations & Future Work¶
- Sampling overhead: Each sentence requires \(N\) samples (\(N = 16\)–\(64\)), which introduces latency for real-time applications.
- Dependency on semantic encoder: A fixed encoder (e.g., RoBERTa) is used; encoder quality directly affects watermarking performance.
- Sentence-level embedding only: Reliable detection requires texts of at least ~10 sentences; short texts are not well supported.
- Future directions: A hybrid scheme combining token-level and semantic-level watermarking — token-level for short texts, PMark for long texts — is a promising direction.
Related Work & Insights¶
- vs. SemStamp/k-SemStamp: These methods introduce distortion via rejection sampling; PMark achieves strict distortion-freeness via median-based sampling, with a 14.8% robustness improvement.
- vs. Green-Red token-level watermarking: Token-level methods are fragile under paraphrasing (each token substitution represents information loss); PMark embeds watermarks at the semantic level, providing robustness to synonym-level paraphrasing.
- vs. UPV (best token-level method): PMark improves paraphrase robustness by 44.6%.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Both the theoretical framework and the multi-channel distortion-free design are significant contributions.
- Experimental Thoroughness: ⭐⭐⭐⭐ Covers multiple models, datasets, and attack types, though experiments at larger LLM scales are lacking.
- Writing Quality: ⭐⭐⭐⭐⭐ Theoretical derivations are rigorous and method descriptions are clear.
- Value: ⭐⭐⭐⭐⭐ Addresses two core challenges in semantic watermarking (distortion and robustness) with high theoretical and practical value.