👻 Hallucination Detection

🤖 AAAI2026 · 15 paper notes

📌 Same area in other venues: 📷 CVPR2026 (33) · 🔬 ICLR2026 (40) · 💬 ACL2026 (28) · 🧪 ICML2026 (21) · 🧠 NeurIPS2025 (17) · 📹 ICCV2025 (5)

🔥 Top topics: LLM ×4 · Agents ×2 · Multimodal/VLM ×2

Beyond Hallucinations: A Composite Score for Measuring Reliability in Open-Source Large Language Models: This paper proposes the Composite Reliability Score (CRS), which unifies calibration, robustness, and uncertainty quantification into a single interpretable metric. A systematic evaluation of 10 open-source LLMs across 5 QA datasets finds that Mixtral-8x22B achieves the highest overall reliability (CRS=0.81) and that model size does not directly determine reliability.
Bridging Day and Night: Target-Class Hallucination Suppression in Unpaired Image Translation: This paper is the first to systematically address the "target-class hallucination" problem in unpaired day-to-night image translation. It combines a dual-head discriminator (style head + SAM2 pseudo-label segmentation head) for hallucination detection with class-prototype contrastive learning for suppression. On BDD100K day-to-night domain-adaptive object detection, mAP improves from 15.08 to 17.40 (+15.5%), and traffic light AP improves by 31.7%.
Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs: This paper proposes Owl, a framework that models visual and textual attention as mediating variables within a structural causal model, introduces the VTACR metric to quantify cross-modal attention imbalance, and designs VTACR-guided adaptive attention modulation combined with a dual-path contrastive decoding strategy, achieving state-of-the-art hallucination mitigation on POPE and CHAIR benchmarks.
Does Less Hallucination Mean Less Creativity? An Empirical Investigation in LLMs: This paper systematically investigates how three hallucination mitigation methods (CoVe, DoLa, RAG) affect LLM creativity. Their effects on divergent creativity differ sharply: CoVe enhances it, DoLa suppresses it, and RAG has no significant impact. Convergent creativity remains largely unaffected. These patterns hold consistently across model families and parameter scales.
ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation: This paper constructs ESG-Bench — 270 manually annotated QA pairs from 94 real ESG reports (2020–2024) — and proposes a three-stage hallucination mitigation pipeline: SFT (with grounded answers + "Not Provided" abstention labels) → CoT Prompting (2/4-step prompt templates) → CoT Fine-tuning (with human-annotated reasoning chains). The 4-step CoT fine-tuned Llama-3 achieves 92.52% with-answer (WA) accuracy and 99.37% without-answer (WoA) accuracy (balanced 96%), with generalization gains on HaluEval and BioASQ.
Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization: This paper identifies three root causes of hallucination in RL-based MLLM training—visual misinterpretation, limited exploration diversity, and sample conflict—and addresses each with Caption Reward, reward-variance-guided sample selection, and NTK-similarity-based InfoNCE regularization, achieving significant hallucination reduction across multiple benchmarks.
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models: This paper proposes ABCA (Aspect-Based Causal Abstention), a pre-generation abstention framework that employs dual-agent debate to identify "aspect variables" (e.g., discipline, legal context, temporal frame) for activating distinct knowledge branches within LLMs. It applies the AIPW doubly robust estimator to compute causal effects and uses Centroid Angular Deviation (CAD) to detect knowledge conflicts (Type-1) or knowledge insufficiency (Type-2), achieving 91.4% accuracy on TruthfulQA and 96.4% unanswerable question identification rate—far surpassing the baseline of 44%.
Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models: This paper employs computational complexity theory to demonstrate that the per-step inference complexity of Transformer-based LLMs is \(O(N^2 \cdot d)\). Grounded in the Hartmanis–Stearns Time Hierarchy Theorem, it proves that any computational task exceeding this complexity bound—such as \(O(n^3)\) matrix multiplication, \(O(n^k)\) token enumeration, or TSP verification—necessarily causes hallucination. Furthermore, LLM agents are shown to be incapable of verifying the correctness of such tasks.
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration: This paper proposes InEx, a framework that iteratively verifies and corrects MLLM outputs via internal introspective reasoning (TVER-driven uncertainty-aware visual augmentation) and external cross-modal multi-agent collaboration (textual self-reflection + image editing verification + visual self-reflection), achieving an 8.9% improvement on POPE and consistently outperforming OPERA/VCD/ICD across multiple hallucination and general benchmarks.
Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation: A two-stage framework is proposed: Adaptive Layer Attention (ALA) fuses multi-layer representations from the Whisper encoder to enhance noise robustness, while Multi-Objective Knowledge Distillation (MOKD) aligns the semantic and attention distributions of a clean-speech teacher with a noisy-speech student — achieving significant reductions in hallucination rate and WER on multilingual noisy ASR benchmarks.
LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction: LLM-CAS is the first work to formulate real-time LLM hallucination correction as a hierarchical reinforcement learning (HRL) problem. It trains an RL agent to dynamically select optimal neuron perturbation strategies at inference time — the high-level policy selects a functional network category, while the low-level policy selects perturbation type and magnitude. Combined with adaptive masking and causal tracing for precise neuron localization, LLM-CAS achieves a 10.98% improvement on StoryCloze, outperforming static and dynamic baselines such as ITI, CAA, and SADI.
MUG: Multi-agent Undercover Gaming — Hallucination Removal via Counterfactual Test for Multimodal Reasoning: MUG reframes Multi-Agent Debate (MAD) as a "Who's Undercover" social reasoning game — by introducing information asymmetry through counterfactual image editing (modifying the reference image), one agent is assigned the edited image \(I^-\) as the "undercover," while other agents hold the original image \(I^+\) and identify the undercover (i.e., the hallucination source) via reasoning and voting. On HallusionBench, Qwen2.5VL-7B improves from 46.4% to 53.8%.
PASE: Leveraging the Phonological Prior of WavLM for Low-Hallucination Generative Speech Enhancement: This paper proposes PASE, a framework that leverages robust phonological priors embedded in pretrained WavLM via Denoising Representation Distillation (DRD) to suppress linguistic hallucinations, while employing a dual-stream representation (high-level phonetic + low-level acoustic) to eliminate acoustic hallucinations, simultaneously achieving state-of-the-art performance in both perceptual quality and content fidelity.
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models: This paper presents the first systematic study of verb concept hallucinations in multimodal large language models (MLLMs), constructs a multi-dimensional benchmark, demonstrates that existing hallucination mitigation methods are ineffective against verb hallucinations, and proposes a fine-tuning baseline enriched with verb knowledge that significantly alleviates verb hallucinations.
When Hallucination Costs Millions: Benchmarking AI Agents in High-Stakes Adversarial Financial Markets: This paper introduces the CAIA benchmark, which leverages cryptocurrency markets as a natural adversarial laboratory to evaluate 17 state-of-the-art LLMs on agent capabilities in high-stakes adversarial environments. Results reveal that frontier models achieve only 67.4% accuracy (GPT-5) compared to a human baseline of 80%, and expose systematic tool selection failures.
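
The "balanced 96%" figure in the ESG-Bench note above can be checked with a line of arithmetic. The sketch below assumes that "balanced" means the unweighted mean of the with-answer (WA) and without-answer (WoA) accuracies; the note does not define it. The function name `balanced_accuracy` is hypothetical and used only for illustration.

```python
def balanced_accuracy(with_answer_acc: float, without_answer_acc: float) -> float:
    """Unweighted mean of the two per-subset accuracies, in percent.

    Assumption: this is how the "balanced" figure is formed; the note
    itself does not spell out the definition.
    """
    return (with_answer_acc + without_answer_acc) / 2.0


# Figures quoted in the ESG-Bench note (4-step CoT fine-tuned Llama-3).
wa_acc = 92.52   # accuracy on questions the report can answer
woa_acc = 99.37  # accuracy on questions that call for "Not Provided"

balanced = balanced_accuracy(wa_acc, woa_acc)

# The mean is just under 96, consistent with the rounded figure in the note.
print(f"balanced accuracy: {balanced:.2f}%")
```

Averaging the two subsets equally keeps a model from scoring well just by always abstaining or never abstaining, which is presumably why the note reports both numbers alongside the combined one.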