Skip to content

Efficiently Identifying Watermarked Segments in Mixed-Source Texts

Conference: ACL 2025
arXiv: 2410.03600
Code: https://github.com/XuandongZhao/llm-watermark-location
Area: AI Safety / Text Watermark Detection
Keywords: watermark detection, mixed-source text, geometric cover, online learning, watermark localization

TL;DR

Proposes two efficient methods (Geometric Cover Detector and Adaptive Online Locator) to detect and precisely locate watermarked segments in long mixed-source texts, reducing the time complexity from \(O(n^2)\) to \(O(n \log n)\), significantly outperforming baselines across three mainstream watermarking techniques.

Background & Motivation

  • Watermarking techniques for LLM-generated texts are increasingly used to detect synthetic text, preventing abuses such as fake news and academic misconduct.
  • Existing watermark detection methods primarily focus on document-level classification (whether it contains a watermark), but overlook a common and important scenario: identifying which specific segments are generated by an LLM within longer, mixed-source documents.
  • Practical application scenario: Malicious actors might use LLMs to modify certain paragraphs of news articles to spread misinformation, requiring functionality similar to Turnitin plagiarism detection systems to locate these segments.
  • Key Challenge:
    • The watermark signal is "diluted" by the unwatermarked parts in long texts, making document-level detection methods ineffective.
    • Brute-force searching all possible subsequence intervals requires an \(O(n^2)\) time complexity, which is computationally prohibitive.
    • The watermark score of a single token is highly noisy, making direct threshold-based determination unreliable.

Method

Overall Architecture

The paper proposes two complementary methods: 1. Geometric Cover Detector (GCD): Determines whether the document contains watermarked text (classification task). 2. Adaptive Online Locator (AOL): Precisely locates the start and end positions of the watermarked text (localization task).

Both methods are based on the Geometric Cover technique, with a time complexity of \(O(n \log n)\).

Key Designs

Three Target Watermarking Schemes

  • KGW-Watermark: Splits the vocabulary into green/red groups based on prefix hashes, boosting the logits of green tokens during generation.
  • Unigram-Watermark: Fixes the green/red lists independently of prefixes, offering stronger robustness.
  • Gumbel-Watermark: Performs deterministic sampling based on Gumbel trick, achieving distortion-free watermarking.

Geometric Cover Detector (GCD)

  • Utilizes the Geometric Cover (GC) technique to partition long text into multi-scale interval sets.
  • GC Definition: \(\mathcal{I} = \bigcup_{k \in \mathbb{N} \cup 0} \mathcal{I}^{(k)}\), where \(\mathcal{I}^{(k)}\) is the set of all contiguous intervals of length \(2^k\).
  • Each token belongs to \(\lfloor \log n \rfloor + 1\) different intervals, totaling \(O(n)\) intervals.
  • Key guarantee (Daniely et al., 2015, Lemma 5): For any unknown watermarked interval, there always exists an interval in the GC that is completely contained within it and has at least one-quarter of its length.
  • Performs watermarking detection independently for each interval; if any interval is detected, the entire document is classified as containing watermarked text.
  • Controls the Family-Wise Error Rate (FWER) via Bonferroni correction: segment-level FPR \(\tau \rightarrow\) document-level FWER \(\le n\tau\).
  • In practice, it starts from higher-order intervals (e.g., \(\mathcal{I}^{(5)}\), intervals with 32+ tokens) to avoid unreliable judgments from overly short intervals.

Adaptive Online Locator (AOL)

  • Formulates the watermark localization problem as an online sequence denoising problem (online denoising / nonparametric regression).
  • Core Idea: The watermark score of each token is a noisy observation of the expected score, requiring estimation of the expected score sequence.
    • Red-Green Watermark: \(s_t = \mathbf{1}(y_t \in \text{Green})\), with the expected value in the watermarked segment \(> \gamma\), and in the non-watermarked segment \(= \gamma\).
    • Gumbel Watermark: \(s_t = \log(1/(1-r_{y_t}))\), with the expected value in the watermarked segment \(> 1\), and in the non-watermarked segment \(= 1\).
  • Employs the Aligator algorithm (Baby et al., 2021) for denoising.
    • Aligator internally utilizes a Geometric Cover structure.
    • Provides a minimax optimal estimation guarantee: competing with an oracle that knows the watermarked segment locations in terms of mean squared error.
    • Time complexity is \(O(n \log n)\).
  • Circular Aligator: Designed to address the boundary effects of online learning.
    • Treats the text as a circular buffer, randomly selecting a starting point to traverse the entire sequence each time.
    • Performs \(m\) iterations with different starting points, and finally takes the average of all predictions for each token.
    • Effectively eliminates prediction inaccuracies at the start and end boundaries.
  • Finally, applies a threshold \(\zeta\) to the denoised average scores; tokens with scores exceeding the threshold are identified as watermarked text.

Loss & Training

This paper presents a detection method and does not involve model training. The key aspects are: - The threshold for GCD is determined by the FPR calibration function \(F\). - The threshold \(\zeta\) for AOL is set according to the watermarking scheme (e.g., \(\zeta = 1.3\) in Gumbel watermarking). - The number of iterations for Circular Aligator is \(m = 10\) random starting points.

Key Experimental Results

Experimental Setup

  • Datasets: C4 and Arxiv (real text serves as the non-watermarked part, with LLM-generated watermarked text embedded within).
  • Models: LLaMA-7B and Mistral-7B.
  • Watermarking Schemes: KGW-Watermark, Unigram-Watermark, Gumbel-Watermark.
  • Evaluation Metrics: TPR (classification), IoU (localization).

Main Results — Classification Task (C4 + LLaMA-7B)

Method KGW TPR Unigram TPR Gumbel TPR
Vanilla (Document-level) 0.602-0.692 0.006-0.058 0.650-0.918
GCD 0.912-0.934 0.874-0.958 1.000
  • The Vanilla method almost completely fails on Unigram watermarking (TPR only 0.006) because the watermark signal is diluted by the long text.
  • GCD significantly outperforms the Vanilla method across all watermarking schemes.
  • Gumbel watermarking is the easiest to detect, with GCD achieving 100% TPR.
  • The improvement for Unigram watermarking is the most significant: rising from near-random (0.006) to 0.874+.

Main Results — Localization Task

  • AOL achieves an average IoU \(> 0.55\) across the three watermarking schemes, performing far exceeding baseline methods.
  • The Circular initialization strategy (\(m=10\) random starting points) significantly improves localization accuracy at the boundaries.
  • A single Aligator pass exhibits noticeable boundary artifacts, which are effectively eliminated by multiple circular passes.

Key Findings

  1. Document-level detection severely fails in mixed-source scenarios: Particularly for Unigram watermarking, where all signals are overwhelmed by non-watermarked text.
  2. Multi-scale analysis is key: The multi-scale interval partitioning of GC ensures that watermarked segments of varying lengths can be effectively captured.
  3. Online denoising is superior to simple thresholding: Individual token scores are highly noisy and must be denoised through appropriate window averaging.
  4. The Circular strategy is crucial for eliminating boundary effects: A single linear pass yields inaccurate predictions for the start and end positions.
  5. The method is highly generalizable across watermarking schemes: The same framework is applicable to three watermarking schemes with completely different mechanisms.

Highlights & Insights

  1. The problem formulation is highly practical: Advancing from "whether the entire document contains a watermark" to "which paragraphs contain a watermark" directly addresses real-world demands such as plagiarism detection.
  2. Elegant algorithmic design: Borrowing mature tools from Geometric Cover and online learning fields reduces the time complexity from \(O(n^2)\) to \(O(n \log n)\).
  3. Clever design of Circular Aligator: It simply but effectively solves the boundary effect problem inherent in online algorithms.
  4. Solid theoretical guarantees: Aligator's estimation error has a clear theoretical upper bound and possesses strong adaptivity at the segment level.
  5. Highly versatile framework: Does not rely on a specific watermarking scheme, making it extensible to future watermarking technologies.

Limitations & Future Work

  1. Assumption of contiguous watermarked segments: In practice, there may be multiple non-contiguous watermarked segments, and the current framework might require post-processing for segmentation.
  2. Threshold \(\zeta\) requires manual setting based on the watermarking scheme: The optimal thresholds vary across different watermarking schemes.
  3. Robustness under watermark attacks (e.g., paraphrasing) is not considered: In real-world scenarios, the watermarked text might be paraphrased.
  4. Evaluated only on 7B models: The effectiveness on watermarked texts generated by larger models (e.g., 70B, GPT-4 level) has not been verified.
  5. FPR control uses a conservative union bound: Tighter control methods might exist.
  6. Localization of multiple scattered watermarked segments is not discussed: Performance when there are multiple dispersed watermarked segments in a document is unaddressed.
  • Watermarking Schemes: KGW (Kirchenbauer et al., 2023), Unigram (Zhao et al., 2023), Gumbel (Aaronson, 2023, Kuditipudi et al., 2023).
  • Partial Watermark Detection: Aaronson's plausibility score (\(O(n^{3/2})\)), Christ et al.'s Substring Completeness (\(O(n^2)\)), WinMax (\(O(n^2)\)).
  • Online Learning: Strong adaptivity guarantees of Aligator (Baby et al., 2021).
  • Insights: The framework can be extended to (1) localization and detection of code watermarks; (2) integration with plagiarism detection systems; (3) fine-grained annotation of AI-generated content; (4) adversarial evaluation of watermark robustness.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First to achieve efficient watermarked segment localization in mixed-source texts.
  • Technical Depth: ⭐⭐⭐⭐⭐ — Elegantly combines GC and online learning theory with solid theoretical guarantees.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Covers three watermarking schemes and two datasets, though the model size is limited.
  • Practical Value: ⭐⭐⭐⭐⭐ — Directly addresses the needs of plagiarism detection and AI-generated content regulation.
  • Writing Quality: ⭐⭐⭐⭐⭐ — Clear logic, exquisite illustrations, and excellent alignment between theory and experiments.
  • Overall Rating: 9.0/10