🔎 AIGC Detection
🔬 ICLR2026 · 30 paper notes
📌 Same area in other venues: 📷 CVPR2026 (7) · 💬 ACL2026 (17) · 🧪 ICML2026 (11) · 🤖 AAAI2026 (2) · 🧠 NeurIPS2025 (9)
🔥 Top topics: LLM ×5 · Adversarial Robustness ×4 · Watermarking ×2 · Diffusion Models ×2 · Reasoning ×2
- A Rich Knowledge Space for Scalable Deepfake Detection
-
This paper integrates 11 deepfake and real face sources into the MMI-DD dataset, scaling to 3.6 million images. It proposes SD2, which utilizes CLIP's hierarchical visual features, fine-grained textual forgery labels, and VLM-generated descriptions for joint training. This ensures that the deepfake detector gains stronger cross-domain and AIGC generalization capabilities on large-scale heterogeneous data instead of suffering from performance degradation.
- All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
-
This paper proposes the detection principles "All Patches Matter, More Patches Better," identifying that existing AI-generated image (AIGI) detectors suffer from a "Few-Patch Bias": they focus on only a minimal set of patches. It designs a Panoptic Patch Learning (PPL) framework that uses Randomized Patch Reconstruction (RPR) and Patch-wise Contrastive Learning (PCL) to spread discriminative power across all patches. This significantly improves cross-generator generalizability and robustness on GenImage, DRCT-2M, AIGCDetectBenchmark, and the real-world Chameleon dataset (e.g., the CLIP backbone achieves 97.2% mAcc on GenImage with a standard deviation of only 1.7).
- Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
-
This paper proposes PAI, a training-free, plug-and-play inherent watermarking framework for diffusion models. By combining "initialization embedding" and "key-guided denoising trajectory deflection," it deeply entangles user identity with image semantics. The "initialization bias" obtained via DDIM inversion serves as a unified forensic signal for copyright verification, attack detection, and semantic-level tamper localization. PAI achieves an average verification accuracy of 98.43% under 12 types of attacks, outperforming the prior SOTA by 37.25%.
- Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection
-
This paper argues that token-level scores of mainstream "metric-based" machine-generated text (MGT) detectors are contaminated by the randomness of LLM sampling. It utilizes Markov Random Fields (MRF) to characterize two patterns: "neighbor similarity" and "initial instability." Through mean-field approximation, this is implemented as a lightweight iterative component with only 2x2 parameters that can be layered onto any existing detector. It significantly boosts the AUROC of various baseline detectors with almost no additional overhead (e.g., increasing DetectGPT's AUROC on the Essay dataset from 44% to 92%).
- Calibrating Verbalized Confidence with Self-Generated Distractors
-
This paper proposes DiNCo, which exposes "suggestibility bias" by having the LLM independently evaluate automatically generated distractors (plausible but incorrect alternative answers). By normalizing with the total confidence across distractors and fusing two complementary dimensions, generation consistency and verification consistency, it significantly improves confidence calibration in short-form QA and long-form generation tasks.
- CLARC: C/C++ Benchmark for Robust Code Search
-
This paper constructs CLARC, the first compilable C/C++ code retrieval benchmark (6,717 query-code pairs), using an automated pipeline that extracts code from GitHub and generates and validates queries with an LLM and hypothesis testing. It covers four retrieval scenarios (Standard, Anonymized, Assembly, and WebAssembly) and reveals that existing code embedding models rely excessively on lexical features (NDCG@10 drops from 0.89 to 0.67 after anonymization) and significantly underperform in binary-level retrieval.
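For readers unfamiliar with the metric quoted above, the following is a minimal sketch of NDCG@10 for a single query. It assumes binary or graded relevance labels listed in ranked order and that every relevant item appears somewhere in the list; the function names and the toy rankings are illustrative and not taken from the CLARC code.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels (rank 1 first)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ordering.

    Assumption: the ideal ordering is built from the same list, so all relevant
    items are expected to be present in `ranked_relevances`.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical query with one relevant snippet, retrieved at rank 1 vs. rank 3.
top_hit = ndcg_at_k([1, 0, 0, 0, 0])
third_hit = ndcg_at_k([0, 0, 1, 0, 0])
```

A benchmark score such as 0.89 would then be the mean of this per-query value over all queries.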
- Data Provenance for Image Auto-Regressive Generation
-
Without altering the generation process or requiring watermarks, this paper leverages the "features left by Image Autoregressive (IAR) models in the codebook quantization space." By utilizing a trained inverse decoder and two complementary signals—QuantLoss and EncLoss—it achieves nearly 100% TPR@1%FPR for post-hoc provenance detection across mainstream IAR models including VAR, RAR, LlamaGen, and Infinity.
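The TPR@1%FPR figure can be read as "what fraction of generated images is caught when at most 1% of real images may be falsely flagged." The sketch below computes it from two lists of detector scores, assuming a higher score means "more likely generated"; the function name and the toy scores are hypothetical and unrelated to the paper's QuantLoss or EncLoss values.

```python
def tpr_at_fpr(real_scores, generated_scores, max_fpr=0.01):
    """True positive rate at the strictest threshold whose FPR stays <= max_fpr.

    Assumption: a sample is flagged as generated when its score is strictly
    greater than the threshold.
    """
    ranked = sorted(real_scores, reverse=True)
    # Number of real samples that may be flagged without exceeding max_fpr.
    allowed = int(max_fpr * len(ranked) + 1e-9)
    # At most `allowed` real scores are strictly above ranked[allowed].
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    hits = sum(score > threshold for score in generated_scores)
    return hits / len(generated_scores)

# Toy scores: 100 real samples spread over [0, 0.99], four generated samples.
real = [i / 100 for i in range(100)]
generated = [0.985, 0.99, 0.5, 1.2]
tpr = tpr_at_fpr(real, generated)
```

With these toy values the threshold lands at 0.98, so exactly one real sample (0.99) is flagged and three of the four generated samples are detected.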
- Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
-
Through close-reading annotations of 8,618 expressions by 26 professional writers, this study reveals that n-gram novelty is insufficient to measure textual creativity: approximately 91% of expressions with high n-gram novelty are not considered creative, and in open-source LLMs higher n-gram novelty correlates with lower pragmaticality.
- DMAP: A Distribution Map for Text
-
This paper proposes DMAP (Distribution Map), a mathematical framework that maps text to i.i.d. samples in the range \([0,1]\) via the next-token probability ranking of a language model. It proves that pure sampling produces a uniform distribution, which enables \(\chi^2\) tests to verify generation parameters, explains why "probability curvature" detectors fail under pure sampling, and visualizes the statistical fingerprints that post-training (SFT/RLHF) leaves in downstream models.
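The uniformity claim can be illustrated with a small simulation. The sketch below is not the paper's exact construction: it assumes the map is a randomized probability integral transform over tokens ranked by probability (the mass of higher-ranked tokens plus a uniform jitter inside the sampled token's own mass), and it uses a fixed toy distribution in place of a real language model. All names are hypothetical.

```python
import random

def rank_map(probs, token, rng):
    """Map a sampled token to [0, 1] using the probability ranking of all tokens."""
    order = sorted(range(len(probs)), key=lambda t: -probs[t])
    mass_before = 0.0
    for t in order:
        if t == token:
            # Uniform jitter inside this token's own probability mass.
            return mass_before + rng.random() * probs[t]
        mass_before += probs[t]
    raise ValueError("token not in vocabulary")

def chi_square_uniform(samples, bins=10):
    """Pearson chi-square statistic of the samples against a uniform histogram."""
    expected = len(samples) / bins
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(0)
probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution (assumption)

# Pure sampling: tokens drawn from the model's own distribution.
pure_tokens = rng.choices(range(len(probs)), weights=probs, k=5000)
pure_samples = [rank_map(probs, t, rng) for t in pure_tokens]
pure_stat = chi_square_uniform(pure_samples)

# Greedy decoding, used here only as a contrasting generation setting.
greedy_samples = [rank_map(probs, 0, rng) for _ in range(5000)]
greedy_stat = chi_square_uniform(greedy_samples)
```

Under pure sampling the statistic stays near its degrees of freedom (9 for 10 bins), while the greedy run piles all samples into the lower half of the interval and produces a very large statistic, which is the kind of deviation a \(\chi^2\) test would flag.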
- D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call
-
D&R randomly shuffles the text to be tested within local chunks separated by punctuation (Within-Chunk Shuffling) and calls a black-box LLM only once to recover it. It then measures the semantic and structural similarity between the recovered text and the original. AI-generated text is more likely to be "recovered" almost identically, while human-written text remains more dispersed. Feeding this similarity gap into a lightweight classifier enables detection, achieving an AUROC of 0.96 for long texts and 0.87 for short texts, without requiring probability access and using only a single call.
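The shuffling step can be sketched as follows. This is an illustrative approximation only: the punctuation set, the word-level shuffle, and the use of `difflib.SequenceMatcher` as a stand-in for the paper's semantic and structural similarity measures are all assumptions, and the black-box LLM recovery call is deliberately left out.

```python
import difflib
import random
import re

PUNCTUATION = r"[,.;:!?]"  # assumed chunk delimiters

def within_chunk_shuffle(text, seed=0):
    """Shuffle words inside each punctuation-delimited chunk, keeping chunk order."""
    rng = random.Random(seed)
    result = ""
    # The capturing group keeps the punctuation marks as separate pieces.
    for piece in re.split(f"({PUNCTUATION})", text):
        if re.fullmatch(PUNCTUATION, piece):
            result += piece
            continue
        words = piece.split()
        if words:
            rng.shuffle(words)
            result += (" " if result else "") + " ".join(words)
    return result

def recovery_similarity(original, recovered):
    """Stand-in similarity in [0, 1] between the original and the recovered text."""
    return difflib.SequenceMatcher(None, original, recovered).ratio()

sentence = "The quick brown fox jumps, then it rests. Done"
shuffled = within_chunk_shuffle(sentence)
```

In the pipeline described above, `shuffled` would be sent to the LLM once, and the similarity between its reconstruction and `sentence` would become a feature for the lightweight classifier.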
- EditLens: Quantifying the Extent of AI Editing in Text
-
EditLens moves beyond binary "Human vs. AI" classification by using lightweight similarity metrics (cosine distance, soft n-grams) as intermediate supervision to fine-tune a regression model. It continuously predicts "how much the text was edited by AI," achieving SOTA performance in both binary (F1=95.6%) and ternary (macro-F1=90.4%) classification tasks.
- Enabling Your Forensic Detector Know How Well It Performs on Distorted Samples
-
DACOM (Distortion-Aware Confidence Model) is proposed to enable AI-generated image detectors to output sample-level reliability scores. This allows detectors to actively refuse decisions or route inputs to more reliable detectors when distortions are severe, addressing the "silent failure" problem in wild deployments.
- Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
-
Starting from the physical principles of facial imaging, this paper notes that the "specular reflection" component in the Phong illumination model has the most parameters and the strongest nonlinearity, and is the hardest for forgery methods to replicate. Consequently, it employs Retinex texture estimation to accurately isolate specular reflection and uses a two-stage cross-attention network, SRI-Net, to model the inconsistencies among "specular reflection \(\leftrightarrow\) texture \(\leftrightarrow\) direct light." This approach achieves SOTA results on both traditional deepfakes and diffusion-generated faces.
- FakeXplain: AI-Generated Image Detection via Human-Aligned Grounded Reasoning
-
By constructing the FakeXplained dataset with human-annotated bounding boxes and descriptions and fine-tuning an MLLM using SFT + progressive GRPO, the model detects AI-generated images while providing spatially grounded, human-aligned explanations of "where and why" it is fake, achieving 98.2% detection accuracy and 36.0% IoU.
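As a reference for the grounding metric, here is a minimal sketch of intersection over union between a predicted and an annotated box. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form; whether the reported 36.0% is computed over boxes exactly this way is not stated in the summary, and the function name is illustrative.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping on half their width.
half_overlap = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
disjoint = box_iou((0, 0, 10, 10), (20, 20, 30, 30))
```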
- HLD: Approximate Hierarchical Linguistic Distribution Modeling for LLM-Generated Text Detection
-
HLD uses n-grams to estimate the distributions of Human-Written Text (HWT) and Machine-Generated Text (MGT) across three linguistic levels: lexical, syntactic, and semantic. By feeding Bayesian log-likelihood ratios of these hierarchical differences into XGBoost for classification, the method avoids reliance on proxy LLMs to approximate the token distributions of black-box source models. It proves more robust than single-level methods and achieves SOTA results on the DetectRL benchmark.
- HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
-
To address the challenges of synthetic image detectors failing to generalize across generators and the need to continuously expand with new generation paradigms, this paper introduces an HSIC Information Bottleneck loss on intermediate CLIP ViT features to suppress "authentication-irrelevant" vision-language alignment semantics. Combined with an HSIC-Guided Rehearsal sampling strategy (HGR), it achieves mutual transfer between diffusion↔GAN while incrementally adapting to 3DGS rendered faces.
- Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
-
This paper constructs the largest AI-generated peer review dataset to date (788,984 reviews), systematically evaluates 18 AI text detection methods in peer review scenarios, and proposes Anchor, a detection method that leverages the original paper as context and significantly outperforms all baselines at low false positive rates.
- Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text
-
This paper explains the effectiveness of "rewrite-based" LLM text detection methods from a geometric projection perspective and proposes L2D. Instead of using a fixed distance to measure the difference between the original and rewritten text, L2D adaptively learns a distance function, achieving an average improvement of 41.5% to 75.4% over the strongest baselines across 100+ settings.
- Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language via Adversarial Training
-
To address the sharp drop in robustness of Machine-Generated Text (MGT) detectors on unseen languages, this paper proposes the TASTE framework: it uses translation dictionaries to perform "code-switching" on MGTs to generate multilingual adversarial samples. Combined with a gradient-reversal language discriminator (LAAL loss), it forces the detector to learn language-agnostic features. Using only single-language annotations and translation dictionaries, it improves the average F1 on zero-shot languages to 0.773 and suppresses the average Attack Success Rate to 18.0%.
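The dictionary-based code-switching step might look roughly like the sketch below. The replacement rate, the lowercase lookup, and the toy English-to-Spanish dictionary are assumptions made for illustration; the gradient-reversal discriminator and the LAAL loss are not shown.

```python
import random

def code_switch(text, dictionary, rate=0.5, seed=0):
    """Replace a random subset of words with dictionary translations."""
    rng = random.Random(seed)
    switched = []
    for word in text.split():
        translations = dictionary.get(word.lower())
        # Only words present in the dictionary can be switched.
        if translations and rng.random() < rate:
            switched.append(rng.choice(translations))
        else:
            switched.append(word)
    return " ".join(switched)

# Hypothetical toy dictionary with a single translation per entry.
toy_dictionary = {"cat": ["gato"], "house": ["casa"]}
source_text = "the cat sleeps in the house"
fully_switched = code_switch(source_text, toy_dictionary, rate=1.0)
unchanged = code_switch(source_text, toy_dictionary, rate=0.0)
```

Varying `rate` and the target-language dictionary yields a family of perturbed copies of each machine-generated text, which is the role the adversarial samples play in the training described above.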
- No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
-
The authors propose HiDA-Net: a dual-path input architecture using "global thumbnail + full-resolution patches covering the whole image," combined with feature aggregation, token-level forgery localization, and JPEG quality factor estimation. It achieves "no pixel left behind" and significantly advances the SOTA in high-resolution AIGI detection.
- Omni-IML: Towards Unified Interpretable Image Manipulation Localization
-
This paper proposes Omni-IML—the first universal model capable of achieving SOTA performance across four major Image Manipulation Localization (IML) tasks (natural images, documents, faces, and scene text) using a single model. It addresses the performance degradation in joint training via three sample-adaptive modules: the Modal Gated Encoder, Dynamic Weight Decoder, and Anomaly Enhancement. Additionally, it constructs the Omni-273k dataset and an interpretability module to provide natural language descriptions of manipulation traces.
- PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
-
The PoliCon benchmark was constructed based on 2,225 high-quality deliberation records from the European Parliament (2009-2022) to evaluate the ability of LLMs to draft consensus resolutions under diverse voting mechanisms, power structures, and political objectives. Results indicate that frontier models perform reasonably well on simple majority tasks but fall significantly short on 2/3 majority and security-related issues.
- Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
-
Addressing the issue where existing AI-generated video detectors destroy critical forgery artifacts by scaling or cropping input frames to a fixed low resolution (e.g., 224×224), this paper proposes a "native-scale" detection framework. Based on Qwen2.5-VL, the visual Transformer directly processes videos at arbitrary original resolutions and durations. The work also constructs a 140k training set covering 15 generators and a high-fidelity benchmark, Magic Videos, achieving new SOTA results across multiple benchmarks.
- RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
-
RelayFormer partitions images/videos of arbitrary resolutions into fixed-size sub-images and utilizes a small set of [GLR] relay tokens to propagate scene-level global consistency cues across sub-images. Without interpolation or dense full-resolution attention, this unified architecture achieves SOTA performance on both image and video manipulation localization benchmarks, with FLOPs that scale dynamically with the input.
- Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
-
Focusing on "looks real but defies logic" semantic anomalies in AI-generated images (violating physics, common sense, or anatomy), this paper formalizes the task of "detection + explanation + scoring." By utilizing the multi-agent pipeline AnomAgent and lightweight human verification, the authors curate the AnomReason benchmark with 21.5K images and hundreds of thousands of structured quadruplet annotations. They propose semantic matching metrics SemAP/SemF1; the fine-tuned AnomReasonor-7B outperforms all open-source VLMs in semantic detection, approaching the performance of GPT-4o.
- Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
-
This paper proposes Spherical Watermark: an encryption-free, lossless watermarking framework for diffusion models. It mixes binary watermarks into high-entropy codes, which are then precisely transformed into standard Gaussian noise via "projection to the unit sphere → orthogonal rotation → Chi-squared radius scaling" to serve as the initial noise. This method requires no weight modification or per-image key storage, outperforming both lossy and lossless baselines in fidelity, provenance accuracy, computational efficiency, and robustness against attacks.
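The three-step transform can be illustrated with a toy round trip. This sketch makes several simplifying assumptions that differ from the paper: the bits are treated as already high-entropy (the mixing step is omitted), the orthogonal rotation is a product of a few key-seeded Householder reflections rather than whatever construction the authors use, and extraction is done directly on the noise vector instead of after diffusion inversion. Because this rotation is not uniformly random, the output here is only loosely Gaussian-like. All names are hypothetical.

```python
import math
import random

def reflection_vectors(key, dim, count=4):
    """Key-seeded vectors defining a product of Householder reflections."""
    rng = random.Random(key)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(count)]

def reflect(x, v):
    """Householder reflection of x across the hyperplane orthogonal to v."""
    scale = 2 * sum(a * b for a, b in zip(x, v)) / sum(b * b for b in v)
    return [a - scale * b for a, b in zip(x, v)]

def embed(bits, key, seed=0):
    dim = len(bits)
    # 1) Project the +/-1 code onto the unit sphere.
    point = [(1.0 if b else -1.0) / math.sqrt(dim) for b in bits]
    # 2) Apply a key-dependent orthogonal transform (norm-preserving).
    for v in reflection_vectors(key, dim):
        point = reflect(point, v)
    # 3) Scale by a radius whose square is chi-squared with `dim` degrees of freedom.
    rng = random.Random(seed)
    radius = math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(dim)))
    return [radius * p for p in point]

def extract(noise, key):
    # Reflections are self-inverse, so undo them in reverse order.
    point = noise
    for v in reversed(reflection_vectors(key, len(noise))):
        point = reflect(point, v)
    # A positive radius does not change signs, so the bits are the signs.
    return [1 if p > 0 else 0 for p in point]

bit_rng = random.Random(1)
watermark = [bit_rng.randint(0, 1) for _ in range(64)]
noise = embed(watermark, key=42)
recovered = extract(noise, key=42)
```

The round trip recovers the bits exactly because every step is invertible given the key, which mirrors the "lossless" property claimed above; no per-image key or weight change is involved in the sketch either.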
- Tell me Habibi, is it Real or Fake?
-
This paper introduces ArEnAV, the first large-scale audio-visual deepfake dataset targeting "Arabic-English intra-sentential code-switching (CSW)" (387k videos, 765+ hours). Utilizing an integrated generation pipeline with 4 TTS paths and 2 lip-sync models, the authors perform "content-driven" semantic manipulation of real YouTube videos. They systematically demonstrate that existing SOTA detection/localization models and human evaluators almost entirely fail in these multilingual, code-switching scenarios.
- TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
-
The authors argue that existing Machine-Generated Text (MGT) detection benchmarks rely on free-form prompts like "write an article about machine learning." In contrast, real Wikipedia editing involves constrained task-specific generation such as summarization, continuation, and neutralization. Such texts are more similar to human-written text. The authors constructed TSM-Bench, covering 3 languages, 4 tasks, 6 generators, and 12 detectors with 152,910 parallel texts. Results show that SOTA detectors experience a 10–40% accuracy drop on task-specific data compared to generic data, revealing a "generalization asymmetry" where task-specific data generalizes to generic data, but not vice-versa.
- Untraceable DeepFakes via Traceable Fingerprint Elimination
-
This paper points out that existing attacks to evade attribution are "additive"—they only obscure but cannot eliminate the model fingerprints left by generative models in images, making them vulnerable to adversarial training. The authors propose a "multiplicative attack" that uses an adversarial network trained solely on real data to eliminate fingerprints at the source. It achieves an average Attack Success Rate (ASR) of 97.08% across 12 generative models and 6 attribution models, exceeding 72.39% even when facing defenses.
- Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection
-
To address the fact that existing AI-generated image (AIGI) detectors output only binary "real/fake" labels without justification, this paper constructs X-AIGD, a benchmark of paired real-fake images with pixel-level annotations across three levels and seven categories of artifacts. It systematically shows that current detectors "hardly look at perceptual artifacts" and proposes a training method that explicitly aligns classification attention with artifact regions, resulting in significant gains in cross-dataset generalization.