MOSAIC: Multiple Observers Spotting AI Content
Conference: ACL 2025 (Findings)
arXiv: 2409.07615
Code: GitHub
Area: Other
Keywords: AI-generated text detection, ensemble LLM, information theory, Binoculars, zero-shot detection
TL;DR
Building on the universal compression principle from information theory, this paper proposes MOSAIC, an AI-generated text detection method that ensembles multiple LLMs. The Blahut-Arimoto algorithm computes optimal combination weights for the detector LLMs, and the resulting mixture distribution serves as the observer. MOSAIC decides whether text is AI-generated by comparing the text's actual surprisal under the mixture with the expected cross-entropy between a reference model and the mixture. Across multiple domains, languages, and generators, it is more robust than single-model and fixed two-model (e.g., Binoculars) approaches.
Background & Motivation
Background: The detection of LLM-generated text has become an urgent need (e.g., fake news, academic plagiarism, harmful content). Zero-shot detection methods (such as GPTZero, Binoculars) distinguish text based on the perplexity/surprisal of a detector LLM, but heavily rely on the choice of a single or fixed detector model pair.
Limitations of Prior Work: (a) The detection performance of a single detector fluctuates significantly across different domains, languages, and generators; (b) two-model methods such as Binoculars require searching for the optimal model pair, and different validation sets yield different optimal pairs; (c) as the number of models increases, the number of candidate pairs grows quickly (\(K(K-1)\) ordered pairs for \(K\) models), so exhaustively enumerating them becomes costly.
Key Challenge: How to robustly detect outputs from multiple generators without relying on the selection of a specific detector?
Goal: To design a theoretically grounded multi-LLM ensemble detection method that automatically allocates optimal weights to each LLM.
Key Insight: Connect the detection problem to universal compression—the optimal multi-model mixture is the one that can best "compress" (i.e., describe with the lowest perplexity) human text. The Blahut-Arimoto algorithm is used to solve this information-theoretic optimization problem.
Core Idea: \(q^*(y_t|\mathbf{y}_{<t}) = \sum_{m \in \mathcal{M}} \mu^*(m|\mathbf{y}_{<t}) p_m(y_t|\mathbf{y}_{<t})\), where the BA algorithm is first used to find the optimal weights \(\mu^*\), and then the mixture model \(q^*\) is used to replace the single detector \(q\) in Binoculars.
Method
Overall Architecture
Input: Text to be detected \(\mathbf{y}\), a set of detector LLMs \(\mathcal{M} = \{m_1, ..., m_K\}\), and a reference model \(m^*\). Output: The MOSAIC score \(S_\mathcal{M}(\mathbf{y})\) (low = AI-generated, high = human-written).
Key Designs
Optimal Mixture Distribution (Blahut-Arimoto)
- Function: Compute position-adaptive optimal weights \(\mu^*(m|\mathbf{y}_{<t})\) for the \(K\) detector LLMs.
- Mechanism: Run the iterative Blahut-Arimoto (BA) algorithm to find the weights that maximize the mutual information \(\mathcal{I}(\mathbb{M}; Y_t|\mathbf{y}_{<t})\) between the model index and the next token, so that the mixture model \(q^*\) is the most discriminative model combination for the current context.
- Design Motivation: Derived from universal compression theory: the optimal mixture is the distribution that encodes the observed text with the shortest code length (equivalently, the lowest perplexity) without knowing in advance which model fits best.
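The weight computation can be illustrated with the textbook Blahut-Arimoto update for channel capacity, treating each detector's next-token distribution as one row of a channel. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `blahut_arimoto`, the plain-list representation of distributions, and the stopping parameters `n_iter` and `tol` are all hypothetical choices made for illustration.

```python
import math


def blahut_arimoto(dists, n_iter=200, tol=1e-9):
    """Textbook Blahut-Arimoto iteration (illustrative sketch).

    dists: list of K next-token distributions p_m(. | context), each a list
           of V probabilities over the same vocabulary.
    Returns (mu, q): the mixture weights and the mixture distribution.
    """
    k = len(dists)
    vocab = len(dists[0])
    mu = [1.0 / k] * k  # start from uniform weights

    def mixture(weights):
        return [sum(weights[m] * dists[m][y] for m in range(k)) for y in range(vocab)]

    for _ in range(n_iter):
        q = mixture(mu)
        # KL(p_m || q) for every model; terms with p = 0 contribute nothing.
        kl = [
            sum(p * math.log(p / q[y]) for y, p in enumerate(dists[m]) if p > 0)
            for m in range(k)
        ]
        # Multiplicative update: models far from the mixture gain weight.
        unnorm = [mu[m] * math.exp(kl[m]) for m in range(k)]
        z = sum(unnorm)
        new_mu = [u / z for u in unnorm]
        converged = max(abs(a - b) for a, b in zip(new_mu, mu)) < tol
        mu = new_mu
        if converged:
            break
    return mu, mixture(mu)


# Toy example with a 2-token vocabulary and three hypothetical detectors.
weights, q_star = blahut_arimoto([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In the toy example the third distribution is already the average of the other two, so it adds no information and its weight shrinks toward zero. In a real detector this update would have to run once per token position, over the full vocabulary.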
MOSAIC Scoring
- Function: Calculate the difference between surprisal and expected cross-entropy for each token.
- Formula: \(s_t(\mathbf{y}) = \mathcal{L}_{q^*}(y_t|\mathbf{y}_{<t}) - \sum_{y \in \Omega} p_{m^*}(y|\mathbf{y}_{<t}) \mathcal{L}_{q^*}(y|\mathbf{y}_{<t})\)
- Intuition: AI-generated text has low surprisal under the mixture model (because some LLM in the ensemble "predicted" it well), so its surprisal sits at or below the cross-entropy that the reference model expects \(\rightarrow\) the difference is small \(\rightarrow\) detected as AI. Human text tends to be more surprising than the reference model expects, which gives a larger difference.
- Final Score: \(S_\mathcal{M}(\mathbf{y}) = \frac{1}{T}\sum_t s_t(\mathbf{y})\)
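The scoring formula above can be sketched directly in code. This is an illustrative sketch, assuming the mixture distribution \(q^*\) and the reference distribution \(p_{m^*}\) are already available at each position as plain probability lists; the function names `token_score` and `mosaic_score` are hypothetical and \(\mathcal{L}_q(y) = -\log q(y)\) is taken as the surprisal.

```python
import math


def token_score(q_star, p_ref, token_id):
    """s_t = surprisal of the observed token under q*
             minus the cross-entropy of p_ref against q*."""
    surprisal = -math.log(q_star[token_id])
    cross_entropy = -sum(p * math.log(q) for p, q in zip(p_ref, q_star) if p > 0)
    return surprisal - cross_entropy


def mosaic_score(q_stars, p_refs, token_ids):
    """Average of the token-level scores over the T positions of the text."""
    scores = [token_score(q, p, y) for q, p, y in zip(q_stars, p_refs, token_ids)]
    return sum(scores) / len(scores)


# Toy 2-token vocabulary: the same distribution serves as mixture and reference.
dist = [0.9, 0.1]
predictable = token_score(dist, dist, 0)  # observed token is the likely one
surprising = token_score(dist, dist, 1)   # observed token is the unlikely one
```

In the toy case, the predictable token gets a negative score and the surprising token a positive one, which matches the convention that low scores indicate AI-generated text.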
Reference Model Selection
- Function: Select the model with the lowest perplexity on human text from the ensemble as \(m^*\).
- Mechanism: \(m^* = \arg\min_{m \in \mathcal{M}} -\sum_t \log p_m(y_t|\mathbf{y}_{<t})\) on human text.
- Practice: Usually the largest LLM in the ensemble (such as Llama-3-70B).
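The selection rule amounts to an argmin over summed negative log-likelihoods. A minimal sketch follows, assuming per-token log-probabilities on a human-written sample have already been computed for each model; the function name `select_reference` and the model names in the example are hypothetical.

```python
def select_reference(token_log_probs):
    """Pick the model with the lowest negative log-likelihood on human text.

    token_log_probs: dict mapping a model name to the list of
                     log p_m(y_t | y_<t) values on the same human-written text.
    """
    nll = {name: -sum(log_probs) for name, log_probs in token_log_probs.items()}
    return min(nll, key=nll.get)


# Hypothetical log-probabilities for two models on the same three tokens.
example = {
    "small_model": [-2.1, -3.0, -1.7],
    "large_model": [-1.2, -2.4, -0.9],
}
reference = select_reference(example)
```

The comparison is only meaningful when all models score the same token sequence, which is consistent with the shared-tokenizer requirement noted in the limitations.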
Unified Perspective with Binoculars/FastDetectGPT
- Binoculars = Special case of MOSAIC (\(|\mathcal{M}|=1\), fixed \(q\)).
- FastDetectGPT = Differenced version of Binoculars (token-level normalization instead of whole-sentence averaging).
- MOSAIC generalizes this to an ensemble of an arbitrary number of detectors with adaptive weights.
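The special case can be written out by substituting a one-model ensemble into the formulas of this summary. The following is a worked reduction under that assumption only; it restates the definitions above and does not reproduce the exact normalization used in the original Binoculars or FastDetectGPT papers.

```latex
% Assumption: the ensemble holds a single detector m, so all weight falls on it.
\begin{align*}
\mathcal{M} = \{m\}
  &\;\Rightarrow\; \mu^*(m|\mathbf{y}_{<t}) = 1,
  \qquad q^*(y_t|\mathbf{y}_{<t}) = p_m(y_t|\mathbf{y}_{<t}) \\
s_t(\mathbf{y})
  &= \mathcal{L}_{p_m}(y_t|\mathbf{y}_{<t})
   - \sum_{y \in \Omega} p_{m^*}(y|\mathbf{y}_{<t})\, \mathcal{L}_{p_m}(y|\mathbf{y}_{<t})
\end{align*}
```

With one detector, the score compares the surprisal of a fixed detector \(p_m\) with its cross-entropy against the reference \(p_{m^*}\), which is the fixed two-model setup that the summary attributes to Binoculars.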
Key Experimental Results
Multi-Domain Multi-Generator Detection (AUROC)
| Method | ChatGPT | GPT-4 | Llama | Mistral | Average |
|---|---|---|---|---|---|
| Log-likelihood | 0.82 | 0.78 | 0.85 | 0.83 | 0.82 |
| Binoculars (best pair) | 0.996 | 0.969 | 1.000 | 0.999 | 0.99 |
| Binoculars (fixed pair) | 0.94 | 0.91 | 0.97 | 0.95 | 0.94 |
| MOSAIC | 0.995 | 0.972 | 0.999 | 0.998 | 0.99 |
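For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen human text receives a higher score than a randomly chosen AI text (low scores meaning AI-generated, as in the score convention above). The sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy scores are hypothetical and unrelated to the numbers in the table.

```python
def auroc(human_scores, ai_scores):
    """Fraction of (human, AI) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for h in human_scores:
        for a in ai_scores:
            if h > a:
                wins += 1.0
            elif h == a:
                wins += 0.5
    return wins / (len(human_scores) * len(ai_scores))


# Toy detector scores: one AI text (0.4) outranks one human text (0.3).
toy_auroc = auroc([0.8, 0.3, 0.9], [0.1, 0.4, -0.2])
```

The pairwise loop is quadratic in the number of texts, which is fine for illustration; rank-based implementations are the usual choice for large evaluation sets.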
Ablation Study: Number of Ensemble Models
| Ensemble Size | Average AUROC |
|---|---|
| 1 Model | 0.91 |
| 2 Models | 0.95 |
| 4 Models | 0.98 |
| 8 Models (MOSAIC) | 0.99 |
Key Findings
- MOSAIC achieves performance close to Binoculars-best-pair without requiring an optimal pair search—significantly improving robustness.
- Binoculars-fixed-pair degrades noticeably: The AUROC of a fixed model pair drops to 0.91 on GPT-4 and averages 0.94, whereas MOSAIC averages 0.99 and stays at or above 0.972 on every generator.
- Adding observers helps: Average AUROC rises monotonically with ensemble size, from 0.91 (1 model) to 0.99 (8 models), so more "observers" lead to greater robustness.
- Robust across domains and languages: Consistently effective across news, academic, and social media domains, as well as English, French, and German languages.
- Interpretable BA weights: The weights automatically concentrate on the detectors most relevant to the current text, which provides a built-in form of model selection.
Highlights & Insights
- Elegant information-theoretic foundation: The theory flows from universal compression to the BA algorithm and then to token-level scoring. The ensemble is derived from first principles rather than discovered by trial and error.
- Validation-set-free automatic model selection: MOSAIC automatically finds the optimal detector combination for each text segment using the BA algorithm, without requiring a prior search on a validation set.
- Natural generalization from Binoculars to MOSAIC: Demonstrates a clear theoretically unified perspective from a single model to dual models to multi-models.
- Novel NLP application of Blahut-Arimoto: The BA algorithm, a classic in communications, is applied in a novel way to NLP.
Limitations & Future Work
- High computational cost: Requires computing logits for all detectors at each token and executing BA iterations.
- Requirement of open-weight models: Requires access to the logit distributions of the models, which is inapplicable to API-only models.
- Shared tokenizer assumption: Currently requires all detectors to share a tokenizer, limiting cross-family model ensembling.
- Robustness to sampling strategies: Text randomly sampled with high temperature may make the detector overly cautious.
- Mixed text not considered: Text that is human-written and then AI-edited is a more realistic scenario, but it is not addressed.
Related Work & Insights
- vs Binoculars (Hans et al., 2024): Binoculars uses a fixed model pair, while MOSAIC generalizes to an optimal multi-model ensemble—eliminating the fragility of model pair selection.
- vs FastDetectGPT (Bao et al., 2024): FastDetectGPT is also based on comparing surprisal with cross-entropy but uses a single model; the multi-model MOSAIC is more robust.
- vs DetectGPT (Mitchell et al., 2023): DetectGPT is a perturbation-based method with high computational cost that depends on perturbation quality; MOSAIC requires no perturbations.
Rating
- Novelty: ⭐⭐⭐⭐⭐ Elegant theory deriving multi-LLM ensemble detection from information-theoretic universal compression.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive analysis across multiple generators, domains, and languages, alongside thorough ablation and robustness validation.
- Writing Quality: ⭐⭐⭐⭐⭐ Rigorous theoretical derivation, clear representation in Algorithm 1, and an ingenious unified perspective with Binoculars.
- Value: ⭐⭐⭐⭐⭐ Addresses the key pain point of generator-agnostic detection robustness with strong practical applicability.