🔎 AIGC Detection

🧪 ICML2026 · 11 paper notes

📌 Same area in other venues: 📷 CVPR2026 (10) · 🔬 ICLR2026 (30) · 💬 ACL2026 (17) · 🤖 AAAI2026 (2) · 🧠 NeurIPS2025 (9) · 💬 ACL2025 (15)

🔥 Top topics: LLM ×5 · Adversarial Robustness ×3 · Multimodal/VLM ×2

AutoBaxBuilder: Bootstrapping Code Security Benchmarking: AUTOBAXBUILDER utilizes an LLM agent pipeline to automatically generate web backend security evaluation scenarios, functional tests, and end-to-end security tests. It reduces the cost of manually constructing BAXBENCH-style tasks by approximately 12x and constructs AUTOBAXBENCH, comprising 40 new scenarios, to evaluate the gap between functional correctness and security in contemporary code models.
Black-Box Detection of LLM-Generated Text Using Generalized Jensen-Shannon Divergence: SurpMark reformulates AI text detection as a likelihood-free hypothesis test: it uses a proxy LM to compute token surprisals, discretizes them into \(k\) states via k-means, estimates a first-order Markov transition matrix, and compares it with pre-built "human-written" and "machine-written" reference matrices using the Generalized Jensen-Shannon Divergence (GJS). The resulting detection score works in a black-box setting and needs a single forward pass, with no retraining and no per-instance resampling.
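The pipeline in the SurpMark entry above (surprisals, discrete states, transition matrix, divergence against two references) can be illustrated with a minimal sketch. Several parts are assumptions rather than the paper's method: the surprisals are taken as given instead of being computed by a proxy LM, the k-means centers are supplied rather than fitted, the weighted Jensen-Shannon form used for `gjs` and the row weighting by state frequency are illustrative choices, and all function names are hypothetical.

```python
import math
from collections import Counter

def assign_states(surprisals, centers):
    """Map each surprisal to its nearest center (the assignment step of 1-D k-means)."""
    return [min(range(len(centers)), key=lambda j: abs(s - centers[j]))
            for s in surprisals]

def transition_matrix(states, k, smoothing=1.0):
    """Row-normalized first-order transition counts with additive smoothing."""
    counts = [[smoothing] * k for _ in range(k)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def kl(p, q):
    """Kullback-Leibler divergence; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def gjs(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence (assumed form of the GJS)."""
    m = [w * pi + (1 - w) * qi for pi, qi in zip(p, q)]
    return w * kl(p, m) + (1 - w) * kl(q, m)

def matrix_divergence(test, ref, states, k, w=0.5):
    """Row-wise GJS, weighted by how often each state is a transition source."""
    freq = Counter(states[:-1])
    total = sum(freq.values())
    return sum(freq[i] / total * gjs(test[i], ref[i], w) for i in range(k))

def detect(surprisals, centers, human_ref, machine_ref):
    """Positive score: the text is closer to the machine reference (needs >= 2 tokens)."""
    k = len(centers)
    states = assign_states(surprisals, centers)
    test = transition_matrix(states, k)
    d_human = matrix_divergence(test, human_ref, states, k)
    d_machine = matrix_divergence(test, machine_ref, states, k)
    return d_human - d_machine

# Toy references: "human" text alternates between states, "machine" text stays in state 0.
human_ref = transition_matrix([0, 1] * 10, 2)
machine_ref = transition_matrix([0] * 20, 2)
score = detect([0.9, 1.1, 1.2, 0.8, 1.0, 1.05], [1.0, 5.0], human_ref, machine_ref)
print(score > 0)
```

The toy references only exist to make the example self-contained; in practice they would be estimated from corpora of known human and machine text.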
CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection: This work redefines "multimodal fake news detection" as a task of "explicitly capturing conflicts between modalities or with world knowledge." The authors construct CAC, a corpus of 14k samples with fine-grained conflict annotations, and propose the CORE framework. CORE reshapes the conceptual boundaries of MLLMs through Conflict-Perception Training (CPT), enabling the model to significantly outperform dedicated SOTA methods on four datasets (DGM4, MDSM, MMFakeBench, NewsCLIPpings) using only 100–750 samples.
Deep Residual Injection for Full-Spectrum Forensic Signal Perception in Multimodal Large Language Models: This paper discovers that directly fine-tuning MLLMs to learn low-level artifacts left by generators damages their early-formed semantic representations (catastrophic forgetting). To address this, the authors propose Deep-VRM, which freezes the early and middle layers to preserve semantics while utilizing a LoRA-based bypass to "residually inject" artifact features into the deep layers of the LLM. This allows a single MLLM to achieve SOTA performance on most AIGI benchmarks without relying on any external expert detectors.
Dissect and Prune: Enhancing Robustness in AI-Generated Image Detection: Addressing the "prediction asymmetry" issue where existing AI-generated image (AIGI) detectors appear accurate but primarily classify images as real, this paper proposes DEAR. By using inpainting images as probes and "dissecting" the model based on the Regional Activation Discrepancy (RAD) between channel activations and generated areas, the method prunes extreme channels on both sides and retrains only the linear classification head. This forces the detector to discard fragile shortcut features, significantly enhancing robustness against unseen generators and post-processing.
Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook: DOVE utilizes rate-distortion variational optimization to automatically construct a compact "Value Codebook" from 10,000 human texts. It then uses Unbalanced Optimal Transport (UOT) to measure distribution differences between human and LLM long-form texts in the value space, improving the "Evaluation-Downstream Task" correlation from \(\le 24\%\) in baselines to \(31.56\%\) across 12 LLMs.
Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators: This paper systematically exposes the vulnerability of AI text detectors under cross-dataset and cross-generator shifts using a "single-threshold fixed protocol." It proposes fusing hand-crafted linguistic features, weighted by learnable dynamic attention, with transformer [CLS] representations. Built on a DeBERTa-v3 backbone, the method achieves 85.9% balanced accuracy on the M4 multi-domain multi-generator benchmark, outperforming strong zero-shot baselines (Fast-DetectGPT, RADAR, Log-Rank) by up to 7.22 points.
ForensicConcept: Transferable Forensic Concepts for AIGI Detection: Addressing the issues where AI-Generated Image (AIGI) detectors are "highly accurate within the training distribution but fail on unseen generators" and remain entirely black-box, this paper explicitly extracts dispersed evidence relied upon by detectors into a "forensic concept codebook." It uses diffusion features (CleanDIFT) as external generative trace references and employs the neighborhood-structure consistency metric CKNNA to measure the geometric alignment between backbone evidence and diffusion traces. By injecting the diffusion codebook into a target backbone, cross-generator transfer is achieved; the average accuracy on GenImage reaches 92.0%, and higher CKNNA correlates with greater transfer gains.
Generating Robust Portfolios of Optimization Models using Large Language Models: This paper proposes a lightweight, training-free algorithm in which a single LLM acts simultaneously as a "stochastic generator" and a "scoring evaluator." Candidate optimization models are packaged into a portfolio until the cumulative generation probability reaches \(1-\alpha\). The authors prove that as long as either the generator or the evaluator aligns with human preferences, the portfolio will contain high-quality models. Experiments on NL4LP using GPT verify that the portfolio consistently outperforms random sampling even in worst-case scenarios.
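The stopping rule in the portfolio entry above, adding candidates until the cumulative generation probability reaches \(1-\alpha\), might look roughly like the sketch below. The ordering by descending generation probability, the tuple layout, and the names `build_portfolio` and `best_by_evaluator` are assumptions made for illustration; the summary does not state how candidates are ordered or how the evaluator score is used.

```python
def build_portfolio(candidates, alpha):
    """candidates: list of (name, generation_probability, evaluator_score).

    Adds candidates in descending generation probability (an assumed order)
    until the accumulated probability mass reaches 1 - alpha.
    """
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    portfolio, mass = [], 0.0
    for cand in ordered:
        if mass >= 1.0 - alpha:
            break
        portfolio.append(cand)
        mass += cand[1]
    return portfolio, mass

def best_by_evaluator(portfolio):
    """Pick the portfolio member with the highest evaluator score."""
    return max(portfolio, key=lambda c: c[2])

# Hypothetical candidates: name, estimated generation probability, evaluator score.
candidates = [("A", 0.5, 0.2), ("B", 0.3, 0.9), ("C", 0.15, 0.6), ("D", 0.05, 0.99)]
portfolio, mass = build_portfolio(candidates, alpha=0.1)
print([c[0] for c in portfolio], best_by_evaluator(portfolio)[0])
```

With these numbers the rare candidate "D" is left out even though the evaluator rates it highest, which illustrates why the guarantee is stated in terms of either the generator or the evaluator being aligned.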
LLM Self-Recognition: Steering and Retrieving Activation Signatures: Instead of watermarking at the token level, this paper injects a random sparse steering vector into the LLM residual stream during generation, creating a detectable "activation signature." The signature is retrieved by re-feeding the text into the same model and calculating cosine similarity or using a lightweight classifier, achieving over 98% accuracy across multiple detection settings with negligible impact on text quality.
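The retrieval step in the self-recognition entry above can be illustrated with a toy example: a random sparse vector is added to a hidden state, and cosine similarity against the known signature separates steered from unsteered activations. This is only a sketch under strong assumptions: a list of Gaussian values stands in for a real residual stream, no language model is involved, and `sparse_signature`, `steer`, and the chosen dimensions and strength are hypothetical.

```python
import math
import random

def sparse_signature(dim, nonzero, seed):
    """Random sparse vector with `nonzero` entries set to +1 or -1."""
    rng = random.Random(seed)
    vec = [0.0] * dim
    for i in rng.sample(range(dim), nonzero):
        vec[i] = rng.choice([-1.0, 1.0])
    return vec

def steer(hidden, signature, strength):
    """Add the scaled signature to a hidden state (stand-in for residual-stream steering)."""
    return [h + strength * s for h, s in zip(hidden, signature)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

dim = 256
signature = sparse_signature(dim, nonzero=16, seed=7)
noise = random.Random(0)
hidden = [noise.gauss(0.0, 1.0) for _ in range(dim)]  # toy unsteered activation
steered = steer(hidden, signature, strength=4.0)

# The steered activation aligns more strongly with the signature than the plain one.
print(cosine(steered, signature) > cosine(hidden, signature))
```

A threshold on this similarity, or a lightweight classifier as the entry mentions, would then turn the score into a decision.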
On the Salience of Low-Probability Tokens for AI-Generated Text Detection: A Multiscale Uncertainty Perspective: To address the chronic issues of "high-frequency boilerplate signal dilution" and "brittle point estimates" in zero-shot AI-generated text detection, the authors propose the Uncertainty / Uncertainty++ detectors. These detectors aggregate log-probs only on low-probability tokens at the bottom \(\rho\)-percentile of each text segment and overlay Rényi entropy from the same positions as a distribution shape signal. This approach improves the average AUROC from 86.49 (Lastde) to 88.74 across 12 generators and 7 datasets, demonstrating significantly greater stability under perturbations such as paraphrasing or modified decoding strategies.
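The selection step in the entry above, keeping only the lowest-probability tokens in each segment and reading Rényi entropy at the same positions, can be sketched as follows. The fixed-length segmentation, the entropy order of 2, the choice to return the two averages separately, and the name `low_prob_features` are assumptions; the summary does not say how the paper segments text or combines the two signals.

```python
import math

def renyi_entropy(dist, order=2.0):
    """Rényi entropy of a probability vector for order != 1."""
    return math.log(sum(p ** order for p in dist if p > 0)) / (1.0 - order)

def low_prob_features(logprobs, dists, rho=0.25, segment_len=4, order=2.0):
    """Average log-prob and Rényi entropy over the lowest rho fraction of each segment.

    logprobs: per-token log-probabilities of the observed tokens (non-empty).
    dists:    per-token predictive distributions at the same positions.
    """
    picked_lp, picked_ent = [], []
    for start in range(0, len(logprobs), segment_len):
        idx = range(start, min(start + segment_len, len(logprobs)))
        n_keep = max(1, math.ceil(rho * len(idx)))
        lowest = sorted(idx, key=lambda i: logprobs[i])[:n_keep]
        picked_lp.extend(logprobs[i] for i in lowest)
        picked_ent.extend(renyi_entropy(dists[i], order) for i in lowest)
    return sum(picked_lp) / len(picked_lp), sum(picked_ent) / len(picked_ent)

# Toy input: eight tokens, uniform predictive distributions over four symbols.
logprobs = [-0.1, -3.0, -0.2, -0.3, -0.5, -0.4, -5.0, -0.1]
dists = [[0.25, 0.25, 0.25, 0.25]] * 8
mean_lp, mean_ent = low_prob_features(logprobs, dists)
print(mean_lp, mean_ent)
```

Restricting the average to the lowest-probability tokens is what keeps high-frequency boilerplate from diluting the score.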