🔎 AIGC Detection

🧪 ICML2026 · 4 paper notes

📌 Same area in other venues: 💬 ACL2026 (8) · 📷 CVPR2026 (1) · 🔬 ICLR2026 (6) · 🤖 AAAI2026 (3) · 🧠 NeurIPS2025 (8)

Black-Box Detection of LLM-Generated Text Using Generalized Jensen-Shannon Divergence

SurpMark reformulates "AI text detection" as likelihood-free hypothesis testing: it computes token surprisal using a proxy LM, discretizes it into \(k\) states via k-means, estimates a first-order Markov transition matrix, and compares it with pre-built "human-written/machine-written" reference matrices using Generalized Jensen-Shannon Divergence (GJS). This approach provides black-box detection scores without retraining or per-instance resampling, requiring only a single forward pass.
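
The pipeline above (surprisal, discretization, transition matrix, divergence to references) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation. It assumes the per-token surprisals have already been produced by the proxy LM. The helper names (`kmeans_1d`, `matrix_divergence`, `detect_score`), the add-one smoothing, the row weighting, and the particular weighted Jensen-Shannon form used for `gjs` are all assumptions made for illustration; the paper's exact GJS definition and aggregation may differ.

```python
import math

def nearest(v, centers):
    """Index of the center closest to scalar v."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's algorithm on scalars; returns k sorted centers."""
    s = sorted(values)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            buckets[nearest(v, centers)].append(v)
        updated = [sum(b) / len(b) if b else centers[i]
                   for i, b in enumerate(buckets)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

def transition_counts(states, k):
    """Count first-order transitions state[t] -> state[t+1]."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return counts

def smooth_row(row, eps=1.0):
    """Additive smoothing (assumed) so every probability is positive."""
    total = sum(row) + eps * len(row)
    return [(c + eps) / total for c in row]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def gjs(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence (one common generalized form)."""
    m = [w * pi + (1 - w) * qi for pi, qi in zip(p, q)]
    return w * kl(p, m) + (1 - w) * kl(q, m)

def matrix_divergence(test_counts, ref_counts, w=0.5):
    """Row-wise GJS, weighted by how often the test text visits each state."""
    total = sum(sum(r) for r in test_counts)
    score = 0.0
    for t_row, r_row in zip(test_counts, ref_counts):
        score += (sum(t_row) / total) * gjs(smooth_row(t_row), smooth_row(r_row), w)
    return score

def detect_score(surprisals, centers, human_counts, machine_counts):
    """Positive score: the text's transitions sit closer to the machine reference."""
    states = [nearest(s, centers) for s in surprisals]
    counts = transition_counts(states, len(centers))
    return (matrix_divergence(counts, human_counts)
            - matrix_divergence(counts, machine_counts))
```

In this sketch the centers and the two reference count matrices would be built once from reference corpora, so scoring a new text needs only its surprisal sequence from a single forward pass of the proxy LM.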

DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection

DGS-Net targets the catastrophic forgetting of transferable priors that occurs when CLIP is fine-tuned for AI-generated image detection. The gradient of the classification loss is decomposed coordinate-wise into harmful positive components \(g^+\) and beneficial negative components \(g^-\). The image gradient of the network being trained is first projected onto the orthogonal complement of the harmful direction of the frozen CLIP text gradient (Orthogonal Suppression, which removes task-irrelevant semantics), and then aligned with the beneficial direction of the frozen CLIP image gradient (Prior Alignment, which preserves pre-trained priors). As a result, average detection accuracy across 50 generative models surpasses the previous SOTA by 6.6%.
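
The sign decomposition and the Orthogonal Suppression step can be illustrated with a small sketch on flat gradient vectors. It assumes gradients are plain lists of floats and that the "harmful direction" is the positive part of the frozen text gradient, which is one reading of the summary. The function names are hypothetical, and the Prior Alignment step is left out because the summary does not specify its exact form.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split_by_sign(g):
    """Coordinate-wise split: g = g_pos + g_neg."""
    g_pos = [max(x, 0.0) for x in g]   # positive components (treated as harmful)
    g_neg = [min(x, 0.0) for x in g]   # negative components (treated as beneficial)
    return g_pos, g_neg

def project_out(g, h, eps=1e-12):
    """Remove from g its component along h: g - (<g,h>/<h,h>) h."""
    hh = dot(h, h)
    if hh < eps:                       # no direction to remove
        return list(g)
    c = dot(g, h) / hh
    return [gi - c * hi for gi, hi in zip(g, h)]

def orthogonal_suppression(image_grad, frozen_text_grad):
    """Project the trainable image gradient onto the orthogonal
    complement of the harmful part of the frozen text gradient."""
    harmful, _ = split_by_sign(frozen_text_grad)
    return project_out(image_grad, harmful)
```

After the projection, the returned gradient has zero inner product with the harmful direction, so an update along it does not move the parameters along that direction to first order.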

Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators

This paper systematically exposes how fragile AI-text detectors are under cross-dataset and cross-generator shift when evaluated under a fixed single-threshold protocol, and proposes fusing handcrafted linguistic features, weighted by learnable attention, with the transformer [CLS] representation. With a DeBERTa-v3 backbone, the method reaches 85.9% balanced accuracy on the M4 multi-domain, multi-generator benchmark, outperforming strong zero-shot baselines (Fast-DetectGPT, RADAR, Log-Rank) by up to 7.22 points.
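
A minimal sketch of the two ideas in this note: attention-weighted feature fusion and evaluation at one fixed threshold. The fusion shown here (softmax weights over the handcrafted features, concatenated to the [CLS] vector) is only one plausible reading of the summary, and `fuse`, `attn_logits`, and `balanced_accuracy` are hypothetical names. Balanced accuracy itself is the standard mean of the true-positive and true-negative rates.

```python
import math

def softmax(xs):
    mx = max(xs)                                  # subtract max for stability
    exps = [math.exp(x - mx) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def fuse(cls_vec, features, attn_logits):
    """Scale each handcrafted feature by its attention weight, then
    append the result to the [CLS] vector (assumed fusion scheme)."""
    weights = softmax(attn_logits)                # learnable logits in a real model
    weighted = [w * f for w, f in zip(weights, features)]
    return list(cls_vec) + weighted

def balanced_accuracy(scores, labels, threshold):
    """Mean of TPR and TNR at one fixed threshold (label 1 = AI-written)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are required")
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    return 0.5 * (tp / pos + tn / neg)
```

Under a fixed-threshold protocol the same `threshold` would be reused across every dataset and generator rather than tuned per test set, which is what makes distribution shift visible in the metric.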

PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection

The authors introduce DF-R5, a 115k-sample dataset with reasoning annotations, replace CLIP ViT with ConvNeXT in the DX-LLaVA architecture, and propose PRPO, a paragraph-level variant of GRPO. Each paragraph is rewarded based on CLIP text-image similarity (VCR) and on majority-vote consistency between reasoning and conclusion (PCR). This approach raises cross-domain deepfake detection F1 from the previous SOTA of 75.26% to 89.91%, and reasoning quality from 4.2/5 to 4.55/5.
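
To make the paragraph-level idea concrete, here is a rough sketch of how per-paragraph rewards could feed a GRPO-style group-relative advantage. It assumes the VCR and PCR values for each paragraph are already computed. The mixing weight `w`, the function names, and the choice to normalize over all paragraphs of all sampled responses in a group are illustrative assumptions, not details taken from the paper.

```python
import math

def paragraph_rewards(vcr, pcr, w=0.5):
    """Combine the two per-paragraph signals with an assumed weight w."""
    return [w * v + (1 - w) * p for v, p in zip(vcr, pcr)]

def group_relative_advantages(group, eps=1e-8):
    """group[i][j] is the reward of paragraph j in sampled response i.
    Rewards are standardized against the whole group, GRPO-style,
    but kept at paragraph granularity instead of one value per response."""
    flat = [r for response in group for r in response]
    mean = sum(flat) / len(flat)
    var = sum((r - mean) ** 2 for r in flat) / len(flat)
    std = math.sqrt(var)
    return [[(r - mean) / (std + eps) for r in response] for response in group]
```

With this layout, a response can contain both positively and negatively weighted paragraphs, so credit is assigned to the specific reasoning paragraphs that score well rather than to the response as a whole.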