MOSAIC: Multiple Observers Spotting AI Content

Conference: ACL 2025 (Findings)
arXiv: 2409.07615
Code: GitHub
Area: Other
Keywords: AI-generated text detection, ensemble LLM, information theory, Binoculars, zero-shot detection

TL;DR

Based on the universal compression principle in information theory, this paper proposes MOSAIC, an AI-generated text detection method that ensembles multiple LLMs. It uses the Blahut-Arimoto algorithm to compute optimal combination weights for multiple detector LLMs and constructs a mixture distribution that acts as the observer. It determines whether text is AI-generated by comparing the actual surprisal of the text under the mixture model with the expected cross-entropy between a reference model and the mixture, robustly outperforming single-model and two-model (such as Binoculars) approaches across multiple domains, languages, and generators.

Background & Motivation

Background: Detecting LLM-generated text has become an urgent need (e.g., for fake news, academic plagiarism, and harmful content). Zero-shot detection methods (such as GPTZero and Binoculars) distinguish text based on its perplexity/surprisal under a detector LLM, but they rely heavily on the choice of a single detector model or a fixed detector pair.

Limitations of Prior Work: (a) The detection performance of a single detector fluctuates significantly across different domains, languages, and generators; (b) two-model methods such as Binoculars require searching for the optimal model pair, and different validation sets yield different optimal pairs; (c) as the number of models increases, the number of candidate pairs grows quadratically (\(K(K-1)\) ordered pairs for \(K\) models), so exhaustive pair search quickly becomes costly.

Key Challenge: How to robustly detect outputs from multiple generators without relying on the selection of a specific detector?

Goal: To design a theoretically grounded multi-LLM ensemble detection method that automatically allocates optimal weights to each LLM.

Key Insight: Connect the detection problem to universal compression—the optimal multi-model mixture is the one that can best "compress" (i.e., describe with the lowest perplexity) human text. The Blahut-Arimoto algorithm is used to solve this information-theoretic optimization problem.

Core Idea: \(q^*(y_t|\mathbf{y}_{<t}) = \sum_{m \in \mathcal{M}} \mu^*(m|\mathbf{y}_{<t}) p_m(y_t|\mathbf{y}_{<t})\), where the BA algorithm is first used to find the optimal weights \(\mu^*\), and then the mixture model \(q^*\) is used to replace the single detector \(q\) in Binoculars.
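
The mixture in the Core Idea can be illustrated at a single token position. The following is a minimal sketch, assuming each detector's next-token distribution is already available as a row of a NumPy array over a shared vocabulary; the function name `mixture_next_token_dist` and the toy numbers are hypothetical and do not come from the paper. The weights here are fixed by hand; finding the optimal weights \(\mu^*\) is the job of the BA algorithm.

```python
import numpy as np

def mixture_next_token_dist(probs, weights):
    """Return q(y | y_<t) = sum_m mu(m | y_<t) * p_m(y | y_<t).

    probs:   (K, V) array, row m is detector m's next-token distribution.
    weights: (K,) array of mixture weights mu(m | y_<t).
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make sure the weights sum to 1
    return weights @ probs             # weighted sum over the K detectors

# Toy example (made-up numbers): two detectors, vocabulary of three tokens.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
weights = np.array([0.25, 0.75])
q = mixture_next_token_dist(probs, weights)
print(q)
```

Because every row of `probs` is a probability distribution and the weights sum to one, the result is again a valid distribution over the vocabulary.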

Method

Overall Architecture

Input: Text to be detected \(\mathbf{y}\), a set of detector LLMs \(\mathcal{M} = \{m_1, ..., m_K\}\), and a reference model \(m^*\). Output: The MOSAIC score \(S_\mathcal{M}(\mathbf{y})\) (low = AI-generated, high = human-written).

Key Designs

  1. Optimal Mixture Distribution (Blahut-Arimoto)

    • Function: Compute position-adaptive optimal weights \(\mu^*(m|\mathbf{y}_{<t})\) for \(K\) detector LLMs.
    • Mechanism: Run the Blahut-Arimoto (BA) algorithm, whose alternating updates maximize the mutual information \(\mathcal{I}(\mathbb{M}; Y_t|\mathbf{y}_{<t})\) over the weights, so that the mixture model \(q^*\) is the most discriminative model combination for the current context.
    • Design Motivation: Derived from universal compression theory: the optimal mixture distribution is the one that encodes the observed text with the shortest code length, i.e., describes it with the lowest perplexity.
  2. MOSAIC Scoring

    • Function: Calculate the difference between surprisal and expected cross-entropy for each token.
    • Formula: \(s_t(\mathbf{y}) = \mathcal{L}_{q^*}(y_t|\mathbf{y}_{<t}) - \sum_{y \in \Omega} p_{m^*}(y|\mathbf{y}_{<t}) \mathcal{L}_{q^*}(y|\mathbf{y}_{<t})\)
    • Intuition: AI-generated text has low surprisal under the mixture model (because some LLM "predicted" it well), at or below the cross-entropy the reference model expects \(\rightarrow\) the difference is small or negative \(\rightarrow\) detected as AI. Human text tends to be more surprising than expected, giving a larger difference.
    • Final Score: \(S_\mathcal{M}(\mathbf{y}) = \frac{1}{T}\sum_t s_t(\mathbf{y})\)
  3. Reference Model Selection

    • Function: Select the model with the lowest perplexity on human text from the ensemble as \(m^*\).
    • Mechanism: \(m^* = \arg\min_{m \in \mathcal{M}} -\sum_t \log p_m(y_t|\mathbf{y}_{<t})\) on human text.
    • Practice: Usually the largest LLM in the ensemble (such as Llama-3-70B).
  4. Unified Perspective with Binoculars/FastDetectGPT

    • Binoculars = Special case of MOSAIC (\(|\mathcal{M}|=1\), fixed \(q\)).
    • FastDetectGPT = Differenced version of Binoculars (token-level normalization instead of whole-sentence averaging).
    • MOSAIC generalizes this to an ensemble of an arbitrary number of detectors with adaptive weights.
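
The weight computation and scoring steps above can be sketched together. This is a minimal illustration, not the authors' implementation: it assumes the per-position next-token distributions of all detectors are already available as `(K, V)` NumPy arrays over a shared vocabulary, and the function names, the `EPS` guard, the iteration count, and the tolerance are all assumptions made for this sketch. The update rule is the standard Blahut-Arimoto iteration for channel capacity, with the detectors playing the role of channel inputs.

```python
import numpy as np

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch

def blahut_arimoto_weights(probs, n_iter=200, tol=1e-10):
    """Weights mu over K detectors that maximize I(M; Y_t) at one position.

    probs: (K, V) array; row m is p_m(. | y_<t).
    """
    probs = np.asarray(probs, dtype=float)
    mu = np.full(probs.shape[0], 1.0 / probs.shape[0])  # start uniform
    for _ in range(n_iter):
        q = mu @ probs  # current mixture distribution
        # KL(p_m || q) for every detector m (zero-probability terms contribute 0)
        kl = np.sum(probs * (np.log(probs + EPS) - np.log(q + EPS)), axis=1)
        new_mu = mu * np.exp(kl)  # upweight detectors far from the mixture
        new_mu /= new_mu.sum()
        converged = np.max(np.abs(new_mu - mu)) < tol
        mu = new_mu
        if converged:
            break
    return mu

def mosaic_token_score(probs, ref_index, token_id):
    """s_t = surprisal of the observed token under q* minus the
    cross-entropy between the reference model and q*."""
    probs = np.asarray(probs, dtype=float)
    q_star = blahut_arimoto_weights(probs) @ probs
    surprisal = -np.log(q_star + EPS)  # L_{q*}(y | y_<t) for every y
    expected = float(np.sum(probs[ref_index] * surprisal))
    return float(surprisal[token_id] - expected)

def mosaic_score(per_position_probs, ref_index, token_ids):
    """Average of the token-level scores over the T positions."""
    scores = [mosaic_token_score(p, ref_index, y)
              for p, y in zip(per_position_probs, token_ids)]
    return float(np.mean(scores))

# Toy example (made-up numbers): two detectors, vocabulary of three tokens.
toy = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1]])
weights = blahut_arimoto_weights(toy)
predictable = mosaic_token_score(toy, ref_index=0, token_id=0)
surprising = mosaic_token_score(toy, ref_index=0, token_id=2)
print(weights, predictable, surprising)
```

In the toy case the two detectors are symmetric, so the weights stay uniform; a token that the reference model finds likely gets a negative score, while an unlikely token gets a positive one, matching the "low = AI-generated, high = human-written" convention. In practice the BA iteration would be run once per token position on real model outputs, which is where the computational cost comes from.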

Key Experimental Results

Multi-Domain Multi-Generator Detection (AUROC)

| Method | ChatGPT | GPT-4 | Llama | Mistral | Average |
|---|---|---|---|---|---|
| Log-likelihood | 0.82 | 0.78 | 0.85 | 0.83 | 0.82 |
| Binoculars (best pair) | 0.996 | 0.969 | 1.000 | 0.999 | 0.99 |
| Binoculars (fixed pair) | 0.94 | 0.91 | 0.97 | 0.95 | 0.94 |
| MOSAIC | 0.995 | 0.972 | 0.999 | 0.998 | 0.99 |

Ablation Study: Number of Ensemble Models

| Ensemble Size | Average AUROC |
|---|---|
| 1 Model | 0.91 |
| 2 Models | 0.95 |
| 4 Models | 0.98 |
| 8 Models (MOSAIC) | 0.99 |
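
For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen human text receives a higher score than a randomly chosen AI text. The sketch below computes it directly from that definition, assuming the convention stated earlier that higher scores indicate human-written text. The function name `auroc` and the toy scores are hypothetical and are not results from the paper.

```python
def auroc(human_scores, ai_scores):
    """Fraction of (human, AI) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for h in human_scores:
        for a in ai_scores:
            if h > a:
                wins += 1.0
            elif h == a:
                wins += 0.5
    return wins / (len(human_scores) * len(ai_scores))

# Toy scores (made up for illustration only).
human = [0.9, 0.4, 0.7]
ai = [0.1, 0.5, 0.2]
print(auroc(human, ai))  # 8 of the 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of texts, so it is only meant to make the definition concrete; a rank-based implementation would be used on real evaluation sets.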

Key Findings

  • MOSAIC matches Binoculars with the best pair (0.99 average AUROC for both) without requiring any search for an optimal pair, which significantly improves robustness.
  • Binoculars with a fixed pair degrades severely: its AUROC drops to 0.91 on GPT-4, whereas MOSAIC stays at 0.972 or higher on every generator (0.99 on average).
  • Incremental ensembling keeps helping: average AUROC rises monotonically with ensemble size (0.91, 0.95, 0.98, and 0.99 for 1, 2, 4, and 8 models), so more "observers" lead to greater robustness.
  • Robust across domains and languages: Consistently effective across news, academic, and social media domains, as well as in English, French, and German.
  • Interpretable BA weights: The weights automatically concentrate on the detectors most relevant to the current text, so model selection comes built in.

Highlights & Insights

  • Elegant information-theoretic foundation: The theoretical pipeline runs from universal compression to the BA algorithm and then to token-level scoring. The ensemble is derived from theory rather than discovered by trial and error.
  • Validation-set-free automatic model selection: MOSAIC automatically finds the optimal detector combination for each text segment using the BA algorithm, without requiring a prior search on a validation set.
  • Natural generalization from Binoculars to MOSAIC: Demonstrates a clear, theoretically unified perspective that runs from a single model to two models to many models.
  • Novel NLP application of Blahut-Arimoto: The BA algorithm, a classic tool from communications theory for computing channel capacity, is repurposed here for text detection.

Limitations & Future Work

  • High computational cost: Requires computing logits for all detectors at each token and executing BA iterations.
  • Requirement of open-weight models: Requires access to each model's logit distributions, so API-only models cannot be used as detectors.
  • Shared tokenizer assumption: Currently requires all detectors to share a tokenizer, limiting cross-family model ensembling.
  • Robustness to sampling strategies: Text sampled at high temperature may make the detector overly cautious.
  • Lack of consideration for mixed text: Text that mixes human writing with AI edits is a more realistic scenario and is not addressed.

Comparison with Related Work

  • vs Binoculars (Hans et al., 2024): Binoculars uses a fixed model pair, while MOSAIC generalizes to an optimal multi-model ensemble, eliminating the fragility of model pair selection.
  • vs FastDetectGPT (Bao et al., 2024): FastDetectGPT is also based on comparing surprisal with cross-entropy but uses a single model; MOSAIC's multi-model ensemble is more robust.
  • vs DetectGPT (Mitchell et al., 2023): DetectGPT is a perturbation-based method with high computational cost that depends on perturbation quality; MOSAIC does not require perturbations.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Elegant theory deriving multi-LLM ensemble detection from information-theoretic universal compression.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive analysis across multiple generators, domains, and languages, alongside thorough ablation and robustness validation.
  • Writing Quality: ⭐⭐⭐⭐⭐ Rigorous theoretical derivation, clear presentation of Algorithm 1, and an ingenious unified perspective with Binoculars.
  • Value: ⭐⭐⭐⭐⭐ Addresses the key pain point of generator-agnostic detection robustness with strong practical applicability.