👻 Hallucination Detection

💬 ACL2026 · 28 paper notes

📌 Same area in other venues: 📷 CVPR2026 (33) · 🔬 ICLR2026 (40) · 🧪 ICML2026 (21) · 🤖 AAAI2026 (15) · 🧠 NeurIPS2025 (17) · 📹 ICCV2025 (5)

🔥 Top topics: LLM ×7 · Multimodal/VLM ×4 · Speech & Audio ×3 · RAG ×3 · Reasoning ×2

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs: This paper proposes the AVES-DPO framework, which uses consensus-based multi-model verification (YOLO, Grounding DINO, and Qwen3-VL) to detect fine-grained hallucinations (object, attribute, and relation) in responses generated by the LVLM itself. The target LVLM then performs self-correction and detail enrichment, so the resulting preference pairs stay within the model's own output distribution. With only 5.2K samples, it outperforms SOTA methods that rely on GPT-4V teachers across multiple hallucination benchmarks (roughly 25× data efficiency).
Benchmarking Deflection and Hallucination in Large Vision-Language Models: This paper proposes VLM-DeflectionBench, a multimodal benchmark with 2775 samples that systematically evaluates the deflection vs. hallucination behaviors of Large Vision-Language Models (LVLMs) when evidence is insufficient or misleading across four evaluation scenarios (Parametric/Oracle/Realistic/Adversarial). Experiments covering 20 SOTA LVLMs reveal that nearly all models fail to reliably deflect under noisy evidence.
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps: This paper proposes four audio-attention-based metrics (AudioRatio, AudioConsistency, AudioEntropy, TextEntropy) and uses them to train a lightweight logistic regression classifier that detects hallucinations in SpeechLLMs during inference, achieving a PR-AUC improvement of up to +0.23 on in-domain data.
Dialectic-Med: Mitigating Diagnostic Hallucinations via Counterfactual Adversarial Multi-Agent Debate: This paper proposes Dialectic-Med, a multi-agent medical diagnostic framework inspired by Popper’s falsificationism. Through adversarial dialectical reasoning among a Proposer (diagnostic hypothesis), an Opponent (a Visual Falsification Module that actively retrieves contradictory evidence), and a Mediator (decision via a weighted consensus graph), it achieves SOTA on MIMIC-CXR-VQA, VQA-RAD, and PathVQA, improving explanation faithfulness by 12.5% and significantly mitigating diagnostic hallucinations.
Distorted or Fabricated? A Survey on Hallucination in Video LLMs: This paper provides the first systematic classification of hallucinations in Video Large Language Models (Vid-LLMs), proposing a mechanism-driven taxonomy comprising "Dynamic Distortion" (errors in spatiotemporal relations and reference consistency) and "Content Fabrication" (driven by statistical priors and audio-visual conflicts), while surveying evaluation benchmarks, mitigation strategies, and root causes.
Enhancing Hallucination Detection via Future Context: This paper proposes utilizing sampled "future context" (subsequent sentences) to enhance hallucination detection in black-box scenarios. By leveraging the "snowball effect"—where hallucinations tend to propagate once they occur—the method consistently improves performance across various sampling-based approaches such as SelfCheckGPT and SC.
FaithLens: Detecting and Explaining Faithfulness Hallucination: This paper proposes FaithLens, an 8B parameter faithfulness hallucination detection model. It undergoes cold-start SFT using high-quality data synthesis combined with three-dimensional filtering (label correctness, explanation quality, and data diversity), followed by further optimization via rule-based reinforcement learning (prediction correctness reward + explanation quality reward). It surpasses GPT-5.2 and o3 across 12 tasks while providing high-quality explanatory outputs.
FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification: FinGround is a three-stage "verify-then-ground" pipeline for financial document QA: (1) finance-aware hybrid retrieval; (2) decomposing answers into atomic claims and verifying them using a type-routed strategy across a six-category taxonomy (Numerical, Temporal, Entity Property, Comparative, Regulatory, Computational—where computational claims use formula reconstruction and arithmetic re-verification); (3) grounded rewriting of unsupported claims with paragraph/cell-level citations. By distilling GPT-4o into an 8B detector, it achieves a 91.4% F1 score with 18× acceleration, reducing the hallucination rate by 78% compared to GPT-4o+CoT.
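The "arithmetic re-verification" step for computational claims in the FinGround entry above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `GrowthClaim` structure, the `verify_growth_claim` helper, and the tolerance are hypothetical and are not taken from the paper, which may reconstruct and check formulas quite differently.

```python
from dataclasses import dataclass


@dataclass
class GrowthClaim:
    """Hypothetical atomic claim: 'the value grew by stated_pct percent from old to new'."""
    old: float
    new: float
    stated_pct: float


def verify_growth_claim(claim: GrowthClaim, tol_pct: float = 0.1) -> bool:
    """Recompute the growth rate from the cited values and compare it to the stated one."""
    if claim.old == 0:
        # Growth rate is undefined; treat the claim as unsupported.
        return False
    recomputed = (claim.new - claim.old) / claim.old * 100.0
    return abs(recomputed - claim.stated_pct) <= tol_pct


# Toy values (not from any real filing).
supported = verify_growth_claim(GrowthClaim(old=200.0, new=230.0, stated_pct=15.0))
unsupported = verify_growth_claim(GrowthClaim(old=200.0, new=230.0, stated_pct=18.0))
```

A claim that fails this check would be a candidate for the grounded-rewriting stage rather than being accepted as stated.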
Generating Effective CoT Traces for Mitigating Causal Hallucination: This paper proposes the Causal Hallucination Rate (CHR) metric to quantify the tendency of small LLMs to over-predict causal relationships in Event Causality Identification. Through systematic experiments, two key criteria for effective CoT data are identified (sufficient semantic explanation length + distribution alignment with the target model). A low-cost CoT data generation pipeline is designed, reducing the CHR of Qwen2.5-1.5B from 83.54% to 6.26% while improving average accuracy to 66.00%.
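A rough sketch of how a metric like CHR in the entry above could be computed. The definition used here is an assumption: the fraction of gold non-causal event pairs that the model labels as causal. The paper's exact formula may differ, and the function name and toy labels are hypothetical.

```python
from typing import Sequence


def causal_hallucination_rate(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Assumed CHR: share of gold non-causal pairs that were predicted causal."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Keep predictions only for pairs whose gold label is non-causal.
    preds_on_non_causal = [p for g, p in zip(gold, pred) if not g]
    if not preds_on_non_causal:
        return 0.0  # no non-causal pairs, so nothing to over-predict
    return sum(preds_on_non_causal) / len(preds_on_non_causal)


# Toy example: four non-causal pairs, three of them predicted causal.
gold_labels = [True, False, False, False, False]
pred_labels = [True, True, True, False, True]
chr_value = causal_hallucination_rate(gold_labels, pred_labels)
```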
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models: This paper introduces HalluAudio, the first large-scale cross-domain (speech/ambient/music) benchmark for audio hallucination detection. It features 5,000+ human-verified QA pairs and systematic adversarial prompt designs. By evaluating mainstream LALMs using multi-dimensional metrics (Accuracy, Hallucination Rate, Yes-No Bias, Refusal Rate, and Error Types), the study reveals significant deficiencies in current models regarding acoustic anchoring, temporal reasoning, and music attribute understanding.
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs: TOHA treats the LLM attention matrix as a weighted graph and uses Manifold Topology Divergence, a tool from topological data analysis (TDA), to measure the "topological novelty of the response subgraph relative to the prompt subgraph." It also identifies "hallucination-aware heads" that are stable across datasets. Averaging over only 10 such heads yields a training-free detector for RAG scenarios that is 70× faster than SelfCheckGPT and leads the compared methods in ROC-AUC by a clear margin.
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments: This work treats an LLM's self-judgment ("Does it think its previous answer was correct?") as another potentially hallucinated generation. It first trains a "meta-judgment detector" using intrinsic features to estimate self-judgment credibility. Then, by applying the logical rule "if self-judgment says True → labels are identical" and "if False → labels are opposite," the response detector and meta-judgment detector are jointly trained via Huber loss and confidence-weighted mutual learning. At inference time, only the response detector is used, absorbing knowledge from self-judgment with zero additional inference cost while achieving dual-perspective gains.
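The label constraint in the Logical Consistency entry above can be sketched as a consistency penalty. This is a simplified illustration under stated assumptions: both detectors are taken to output a probability of being truthful, the function names are hypothetical, and the confidence weighting and mutual-learning parts of the method are omitted.

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic near zero, linear in the tails."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def consistency_loss(p_response: float, p_judgment: float,
                     judged_true: bool, delta: float = 1.0) -> float:
    """Penalize disagreement with the logical rule linking the two labels.

    judged_true=True  -> the two labels should be identical.
    judged_true=False -> the two labels should be opposite.
    """
    target = p_response if judged_true else 1.0 - p_response
    return huber(p_judgment - target, delta)


# Self-judgment says "True": identical probabilities incur no penalty.
loss_same = consistency_loss(0.9, 0.9, judged_true=True)
# Self-judgment says "False": identical probabilities violate the rule.
loss_violation = consistency_loss(0.9, 0.9, judged_true=False)
```

Because the penalty is only a training signal, dropping the meta-judgment detector at inference time, as the entry describes, leaves the response detector's cost unchanged.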
Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models: This study presents the first systematic comparison of hallucination patterns between diffusion Large Language Models (dLLMs) and their autoregressive (AR) counterparts. It reveals that current dLLMs exhibit a higher propensity for hallucination and identifies three diffusion-specific failure modes: Premature Termination, Incomplete Denoising, and Contextual Intrusion.
MeasHalu: Mitigation of Scientific Measurement Hallucinations for LLMs: This paper proposes the MeasHalu framework, which mitigates LLM hallucinations in scientific measurement extraction through a fine-grained measurement hallucination taxonomy and a two-stage optimization (reasoning-aware SFT + hallucination-targeted GRPO reward), significantly outperforming baselines on MeasEval.
Mechanisms of Prompt-Induced Hallucination in Vision–Language Models: In controlled object counting tasks, prompt-induced hallucination (PIH), where the model follows the prompt rather than the image, is localized to 3–10 attention heads in the early layers (layers 0–1) of LLaVA-OneVision, Qwen-VL, and Janus-Pro. Mean ablation of these heads, which requires no retraining, reduces prompt-following from 42–64% to <11% and restores true counting rates to 70–78%. The intervention also transfers zero-shot to color identification (PIH suppression of 40–95%).
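Mean ablation, as mentioned in the entry above, replaces a head's output with its average over a reference set instead of zeroing it. The NumPy sketch below shows only that replacement step on toy arrays. The array layout, the function name, and the way the means are obtained are assumptions; hooking this into an actual VLM is model-specific and not shown.

```python
import numpy as np


def mean_ablate_heads(head_outputs: np.ndarray,
                      mean_outputs: np.ndarray,
                      heads_to_ablate: list[int]) -> np.ndarray:
    """Replace selected heads' outputs with their reference means.

    head_outputs: (n_heads, seq_len, d_head) outputs for one input.
    mean_outputs: (n_heads, d_head) per-head means over a reference dataset.
    """
    ablated = head_outputs.copy()  # leave the caller's array untouched
    for h in heads_to_ablate:
        # The (d_head,) mean broadcasts across all sequence positions.
        ablated[h] = mean_outputs[h]
    return ablated


rng = np.random.default_rng(0)
outputs = rng.normal(size=(4, 6, 8))   # toy: 4 heads, 6 tokens, 8 dims
means = rng.normal(size=(4, 8))        # toy stand-in for dataset means
ablated = mean_ablate_heads(outputs, means, heads_to_ablate=[1, 3])
```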
Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation: This paper proposes the MPD framework, which decouples hallucination components through semantic-aware orthogonal subspace projection and selectively updates a small number of parameters most relevant to hallucinations. While reducing hallucinations by 23.4%, it maintains 97.4% of general generation capability without introducing additional inference overhead.
MultiHaluDet: Multilingual Hallucination Detection via LLM Hidden State Probing: MultiHaluDet utilizes the full-layer hidden state trajectories of frozen LLMs for multi-scale sequence modeling, subsequently discriminating hallucinations through out-of-fold representations and an ensemble meta-learner. It achieves approximately 98% AUROC on HaluEval / TriviaQA and transfers effectively to French, Bangla, and Amharic.
Parametric Knowledge is Not All You Need: Toward Honest Large Language Models via Retrieval of Pretraining Data: The authors argue that existing "LLM Honesty" benchmarks fail to consider what knowledge the model actually encountered during pretraining. Using Pythia, whose training data is fully public, they define knowledge boundaries based on "retrievability of answers in pretraining data" and construct the more reliable TIP-TriviaQA benchmark. They further propose RETAIN, a tri-agent method that retrieves from the model’s own pretraining corpus to decide whether to answer or refuse, raising honest EM-F1 from roughly 40 for the strongest baseline to 58.57.
Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights: This paper redefines seven requirements for RAG hallucination detection benchmarks and constructs Trivia++, a new benchmark featuring long contexts, multi-round human annotations, and realistic noisy labels. The study finds that existing detectors perform significantly below ideal levels on organic RAG hallucinations.
Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation: This paper proposes DaID (Dual-Anchor Introspective Decoding), which exploits differences in visual perception across an MLLM's internal layers: it amplifies visual signals via the "Spotlight" layer and suppresses linguistic inertia via the "Shadow" layer, mitigating hallucinations within a single forward pass.
Stable-RAG: Mitigating Retrieval-Permutation-Induced Hallucinations in Retrieval-Augmented Generation: This work reveals the high sensitivity of RAG systems to the permutation order of retrieved documents and proposes Stable-RAG. By applying spectral clustering to hidden states generated by document permutations to identify dominant reasoning patterns, and subsequently employing DPO alignment to guide hallucinatory outputs toward correct answers, Stable-RAG achieves dual improvements in accuracy and reasoning consistency across three QA datasets.
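The permutation sensitivity that motivates the Stable-RAG entry above can be measured with a simple probe: answer the same question under every ordering of the retrieved documents and check how often the answers agree. The sketch below covers only this measurement, not the spectral clustering or DPO steps. `permutation_consistency` and the toy `first_doc_reader` (a deliberately position-biased stand-in for a RAG model) are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Sequence


def permutation_consistency(question: str,
                            docs: Sequence[str],
                            answer_fn: Callable[[str, list[str]], str]) -> tuple[str, float]:
    """Return the majority answer and the share of document orderings that produce it."""
    answers = [answer_fn(question, list(order)) for order in permutations(docs)]
    majority_answer, majority_count = Counter(answers).most_common(1)[0]
    return majority_answer, majority_count / len(answers)


def first_doc_reader(question: str, docs: list[str]) -> str:
    """Toy reader that only trusts the first document (extreme position bias)."""
    return docs[0].split(":")[1].strip()


toy_docs = ["d1: Paris", "d2: Paris", "d3: Lyon"]
answer, agreement = permutation_consistency("Capital of France?", toy_docs, first_doc_reader)
```

Enumerating all orderings grows factorially with the number of documents, so a real probe would sample permutations instead.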
The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination: This work systematically reveals the "Reasoning Trap" paradox: enhancing LLM reasoning capabilities (whether through RL, distillation, or switchable reasoning modes) systematically amplifies tool hallucinations. This effect is inherently associated with the reasoning process itself rather than RL training, and existing mitigation strategies (Prompt Engineering, DPO) face an inevitable reliability-capability trade-off.
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding: This paper proposes Perception Magnifier (PM), a visual decoding method that iteratively identifies key visual regions using multi-layer attention during each autoregressive decoding step and adaptively magnifies them. By enhancing the effective resolution of key regions, it mitigates VLM visual hallucinations while maintaining spatial structural integrity and reasoning capabilities.
TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG: This paper proposes the TPA framework, which mathematically decomposes the generation probability of each LLM token into contributions from seven sources (Query, RAG Context, Past Token, Self Token, FFN, Final LayerNorm, and Initial Embedding). By aggregating these features with Part-of-Speech (POS) tagging, it achieves state-of-the-art performance in RAG hallucination detection.
Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations: This paper discovers that LLMs encode truthfulness signals through two distinct information pathways: Question-Anchored (dependent on information flow from question to answer) and Answer-Anchored (extracting self-contained evidence from the generated answer itself). These pathways are closely linked to knowledge boundaries. Based on this, the authors propose Mixture-of-Probes and Pathway Reweighting, achieving up to a 10% improvement in AUC for hallucination detection.
Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation: This paper systematically analyzes factual hallucinations caused by learning new knowledge during the SFT phase using a controlled synthetic dataset, Biography-Reasoning. It discovers that the fundamental mechanism of hallucination is the weakened attention of the model towards key entities and proposes KnownPatch—injecting a small amount of known knowledge at the end of training to restore attention patterns, effectively mitigating hallucinations.
Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination: This paper discovers that certain inert visual tokens in LVLMs consistently decode into a set of irrelevant words and hijack attention. It therefore proposes HABI to locate these tokens, NHAR to identify reliable visual heads, and HAVAE to enhance these heads during inference to mitigate hallucinations.
Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of the Reasoning Process: This paper reveals the internal failure mechanisms of LLMs when processing linearized structured knowledge through two mechanistic indicators: Structural Shortcut Reliance (SSR) and Semantic Alignment Score (SAS). Based on these signals, a lightweight hallucination detector is constructed.