
👻 Hallucination Detection

💬 ACL2026 · 27 paper notes

📌 Same area in other venues: 🧪 ICML2026 (19) · 📷 CVPR2026 (18) · 🔬 ICLR2026 (9) · 🤖 AAAI2026 (15) · 🧠 NeurIPS2025 (17) · 📹 ICCV2025 (4)

🔥 Top topics: LLM ×6 · Multimodal/VLM ×4 · Speech & Audio ×3 · RAG ×3 · Reasoning ×2

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs

This paper proposes AVES-DPO, which uses consistency-based multi-model verification (YOLO/GroundingDINO/Qwen3-VL) to detect fine-grained object-, attribute-, and relation-level hallucinations in an LVLM's own responses. The same LVLM then corrects these errors and enriches the details, so the preference pairs remain within the target model's "internal distribution." With only 5.2K samples, it surpasses SOTA methods that depend on GPT-4V teachers across multiple benchmarks, achieving approximately 25× data efficiency.
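
A minimal sketch of the object-level part of this idea, assuming a simple majority vote across verifiers (the paper's actual consistency rule, and its attribute and relation checks, are not reproduced). The verifier names, `flag_unverified`, and `build_preference_pair` are hypothetical; the point is that both sides of the pair come from the target model itself.

```python
from collections import Counter

def flag_unverified(claims, verifier_outputs, min_votes=2):
    """Return claimed objects confirmed by fewer than `min_votes` verifiers (assumed rule)."""
    votes = Counter()
    for detected in verifier_outputs.values():
        for obj in set(detected):  # one vote per verifier per object
            votes[obj] += 1
    return [c for c in claims if votes[c] < min_votes]

def build_preference_pair(prompt, original, corrected):
    # Rejected = the model's own hallucinated answer; chosen = its own corrected
    # rewrite, so both responses come from the target model's distribution.
    return {"prompt": prompt, "chosen": corrected, "rejected": original}

# Hypothetical verifier outputs standing in for YOLO, GroundingDINO and Qwen3-VL.
verifiers = {
    "detector_a": ["dog", "frisbee"],
    "detector_b": ["dog", "frisbee", "tree"],
    "vlm_judge": ["dog"],
}
print(flag_unverified(["dog", "frisbee", "cat"], verifiers))  # ['cat']
```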

Benchmarking Deflection and Hallucination in Large Vision-Language Models

This paper introduces VLM-DeflectionBench, a multimodal benchmark containing 2,775 samples, which systematically evaluates the deflection vs. hallucination behavior of Large Vision-Language Models (LVLMs) when evidence is insufficient or misleading across four evaluation scenarios (Parameterized/Oracle/Realistic/Adversarial). Experiments across 20 SOTA LVLMs reveal that almost all models fail to reliably deflect under noisy evidence.

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

This paper proposes four audio-attention-based metrics (AudioRatio, AudioConsistency, AudioEntropy, TextEntropy) and trains a lightweight logistic regression classifier to detect SpeechLLM hallucinations at inference time, achieving up to a +0.23 PR-AUC improvement on in-domain data.
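
As a rough illustration, the sketch below computes three of the four feature types for a single generated token, assuming AudioRatio is the share of attention mass on audio positions and the entropies are taken over attention renormalized within the audio or text positions. These definitions are assumptions, and AudioConsistency is omitted; per-token features would then be aggregated over the response and fed to a logistic regression classifier.

```python
import math

def audio_ratio(attn_row, audio_mask):
    """Share of one generated token's attention mass that lands on audio positions."""
    total = sum(attn_row)
    audio = sum(a for a, is_audio in zip(attn_row, audio_mask) if is_audio)
    return audio / total if total > 0 else 0.0

def entropy(weights):
    """Shannon entropy (nats) of attention weights renormalized over the given positions."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

def token_features(attn_row, audio_mask):
    """Per-token features; a detector would aggregate these over the whole response."""
    audio_w = [a for a, m in zip(attn_row, audio_mask) if m]
    text_w = [a for a, m in zip(attn_row, audio_mask) if not m]
    return {
        "audio_ratio": audio_ratio(attn_row, audio_mask),
        "audio_entropy": entropy(audio_w),
        "text_entropy": entropy(text_w),
    }

feats = token_features([0.25, 0.25, 0.25, 0.25], [True, True, False, False])
print(feats["audio_ratio"])  # 0.5
```

A low audio ratio combined with diffuse audio attention is the kind of pattern such features are meant to expose, although which direction signals hallucination is learned by the classifier rather than fixed by hand.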

Dialectic-Med: Mitigating Diagnostic Hallucinations via Counterfactual Adversarial Multi-Agent Debate

The paper proposes Dialectic-Med, a multi-agent medical diagnostic framework inspired by Popper’s falsificationism. Three agents engage in adversarial dialectical reasoning: a Proposer that forms diagnostic hypotheses, an Opponent (a visual falsification module) that actively retrieves contradictory visual evidence, and a Mediator that decides through a weighted consensus graph. The framework achieves SOTA performance on MIMIC-CXR-VQA, VQA-RAD, and PathVQA, improves explanation faithfulness by 12.5%, and significantly mitigates diagnostic hallucinations.

Distorted or Fabricated? A Survey on Hallucination in Video LLMs

This paper presents the first systematic classification of hallucination phenomena in Video Large Language Models (Vid-LLMs), proposing a mechanism-driven taxonomy consisting of "Dynamic Distortion" (errors in spatio-temporal relations and reference consistency) and "Content Fabrication" (driven by statistical priors and audio-visual conflicts), while surveying evaluation benchmarks, mitigation strategies, and root cause analyses.

Enhancing Hallucination Detection via Future Context

This paper proposes using sampled "future context" (subsequent sentences) to enhance hallucination detection in black-box scenarios. It exploits the "snowball effect": once a hallucination occurs, it tends to persist in later sentences. The method consistently improves detection performance across various sampling-based detectors, such as SelfCheckGPT and SC.
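
A toy sketch of the idea, assuming a unigram-overlap consistency check in the spirit of SelfCheckGPT's n-gram variant. Here the future context is simply appended to the sentence before it is scored against independent samples. How the paper actually combines future context with each detector may differ, and `hallucination_score` is a hypothetical name.

```python
import re

def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support(text, sample):
    """Fraction of `text`'s unique tokens that also appear in one sampled passage."""
    toks = _tokens(text)
    return len(toks & _tokens(sample)) / len(toks) if toks else 1.0

def hallucination_score(sentence, samples, future_context=""):
    """1 - average support across samples; higher means less consistent."""
    # Optionally append the sampled continuation of `sentence`: if an error
    # snowballs, the continuation tends to carry it forward, adding tokens
    # that the independent samples fail to support.
    target = f"{sentence} {future_context}".strip()
    return 1.0 - sum(support(target, s) for s in samples) / len(samples)

samples = ["Marie Curie was born in Warsaw in 1867."]
sentence = "Marie Curie was born in Paris."
future = "Paris remained her home city."
print(hallucination_score(sentence, samples))
print(hallucination_score(sentence, samples, future))
```

In this toy, appended text also raises the score of correct sentences. A real detector would therefore calibrate thresholds with future context included, rather than compare raw scores with and without it.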

FaithLens: Detecting and Explaining Faithfulness Hallucination

This paper proposes FaithLens, an 8B-parameter faithfulness hallucination detection model. It is first trained with cold-start SFT on high-quality synthetic data filtered along three dimensions (label correctness, explanation quality, and data diversity), then further optimized with rule-based reinforcement learning (a prediction correctness reward plus an explanation quality reward). It outperforms GPT-5.2 and o3 across 12 tasks while providing high-quality explanations.
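
A minimal sketch of how a "prediction correctness reward + explanation quality reward" could be combined into one scalar for RL. The additive form, the 0.5 weight, and the external explanation judge that produces `explanation_score` are all assumptions, not FaithLens's published reward.

```python
def faithlens_style_reward(pred_label, gold_label, explanation_score, weight=0.5):
    """Correctness reward plus a weighted explanation-quality reward (hypothetical form).

    explanation_score: a quality score in [0, 1] from some external judge
    (hypothetical here); values outside the range are clipped.
    """
    correctness = 1.0 if pred_label == gold_label else 0.0
    quality = min(max(explanation_score, 0.0), 1.0)
    return correctness + weight * quality

print(faithlens_style_reward("hallucinated", "hallucinated", 0.8))
```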

FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification

FinGround is a three-stage "verify-then-ground" pipeline for financial document QA: (1) finance-aware hybrid retrieval; (2) decomposition of answers into atomic claims, each verified with a type-routed strategy over a six-category taxonomy (numerical, temporal, entity-attribute, comparative, regulatory, and computational, where computational claims are checked by formula reconstruction plus arithmetic re-verification); (3) grounded rewriting of unsupported claims with paragraph- or cell-level citations. By distilling GPT-4o into an 8B detector, it achieves a 91.4% F1 score with an 18× speedup and reduces the end-to-end hallucination rate by 78% compared to GPT-4o+CoT.
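
The type routing and the arithmetic re-verification of computational claims can be sketched as follows. The claim dictionary layout, the relative tolerance, and the `VERIFIERS` table are hypothetical. Only the computational checker is filled in; the other five claim types would each need their own verifier.

```python
def verify_computational(claim_value, operands, formula, rel_tol=1e-3):
    """Recompute a computational claim from its reconstructed formula and compare."""
    recomputed = formula(*operands)
    return abs(recomputed - claim_value) <= rel_tol * max(1.0, abs(recomputed))

# Hypothetical routing table: claim type -> checker. Numerical, temporal,
# entity-attribute, comparative and regulatory checkers are omitted here.
VERIFIERS = {
    "computational": lambda c: verify_computational(c["value"], c["operands"], c["formula"]),
}

def route(claim):
    """Dispatch a claim to its type-specific checker; None means no checker in this sketch."""
    checker = VERIFIERS.get(claim["type"])
    return checker(claim) if checker else None

# "Revenue grew 12.5%, from 800 to 900": reconstructed as (new - old) / old * 100.
growth = lambda new, old: (new - old) / old * 100
claim = {"type": "computational", "value": 12.5, "operands": (900, 800), "formula": growth}
print(route(claim))  # True
```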

Generating Effective CoT Traces for Mitigating Causal Hallucination

This paper proposes the Causal Hallucination Rate (CHR) metric to quantify how often small LLMs over-predict causal relationships in Event Causality Identification (ECI). Systematic experiments identify two key criteria for effective CoT data: sufficiently long semantic explanations and distribution alignment with the target model. The authors then design a low-cost CoT data generation pipeline that reduces the CHR of Qwen2.5-1.5B from 83.54% to 6.26% while raising average accuracy to 66.00%.
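
To make "over-predicting causal relationships" concrete, here is a sketch that assumes CHR is the share of gold non-causal event pairs that the model labels as causal. This definition is an assumption for illustration and may not match the paper's exact formula.

```python
def causal_hallucination_rate(preds, golds):
    """Share of gold non-causal event pairs predicted as causal (assumed definition).

    preds, golds: booleans, True = "causal".
    """
    non_causal_preds = [p for p, g in zip(preds, golds) if not g]
    if not non_causal_preds:
        return 0.0
    return sum(non_causal_preds) / len(non_causal_preds)

# Three gold non-causal pairs, two of them wrongly labeled causal -> 2/3.
print(causal_hallucination_rate([True, True, False, True], [True, False, False, False]))
```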

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models

This paper proposes HalluAudio, the first large-scale cross-domain (speech/environmental sound/music) benchmark for audio hallucination detection. It comprises over 5,000 human-verified QA pairs with systematic adversarial prompt designs. By evaluating mainstream LALMs across multidimensional metrics (Accuracy, Hallucination Rate, Yes-No Bias, Rejection Rate, and Error Types), the study reveals significant deficiencies in current models regarding acoustic anchoring, temporal reasoning, and music attribute understanding.
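
A sketch of how metrics of this kind can be computed for yes/no audio questions, under assumed definitions. A "yes" to a sound that is absent counts as a hallucination. The yes-ratio among answered questions serves as a bias indicator, and explicit refusals count toward the rejection rate. HalluAudio's exact metric definitions and error-type analysis are not reproduced.

```python
def audio_qa_metrics(preds, golds):
    """preds items: 'yes', 'no', or 'refuse'; golds items: 'yes' or 'no' (assumed format)."""
    n = len(preds)
    answered = [(p, g) for p, g in zip(preds, golds) if p != "refuse"]
    absent = [p for p, g in zip(preds, golds) if g == "no"]
    return {
        "accuracy": sum(p == g for p, g in zip(preds, golds)) / n,
        # Saying "yes" to a sound/event that is not in the clip counts as a hallucination here.
        "hallucination_rate": sum(p == "yes" for p in absent) / len(absent) if absent else 0.0,
        # Yes-ratio among answered questions; far above the gold yes-ratio suggests yes-bias.
        "yes_ratio": sum(p == "yes" for p, _ in answered) / len(answered) if answered else 0.0,
        "rejection_rate": (n - len(answered)) / n,
    }

print(audio_qa_metrics(["yes", "yes", "no", "refuse"], ["yes", "no", "no", "yes"]))
```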

Hallucination Detection in LLMs with Topological Divergence on Attention Graphs

TOHA treats the LLM attention matrix as a weighted graph and utilizes Manifold Topology Divergence from Topological Data Analysis (TDA) to measure the "topological novelty of the response subgraph relative to the prompt subgraph." It identifies "hallucination-aware heads" that are stable across datasets. Averaging only 10 such heads enables a training-free detector in RAG scenarios that is 70× faster than SelfCheckGPT and achieves significantly higher ROC-AUC.
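
A simplified, 0-dimensional proxy for the "topological novelty of the response relative to the prompt" can help build intuition. Prompt-prompt edges are set to distance 0 so all prompt tokens merge first. The total persistence of 0-dimensional bars then equals the minimum spanning tree weight, which measures how expensive it is to attach response tokens to the prompt. This sketch assumes distance = 1 − symmetrized attention. It omits higher-dimensional features, normalization, and TOHA's head selection, so it is not the paper's exact MTop-Div.

```python
def mtopdiv_h0(attn, prompt_idx, response_idx):
    """H0-only proxy: MST weight of the attention graph after collapsing prompt nodes.

    attn: square attention matrix (list of lists) for one head.
    """
    nodes = list(prompt_idx) + list(response_idx)
    prompt = set(prompt_idx)
    edges = []
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            # Prompt-prompt edges cost nothing; other edges cost 1 - symmetrized attention.
            d = 0.0 if (i in prompt and j in prompt) else 1.0 - max(attn[i][j], attn[j][i])
            edges.append((d, i, j))

    parent = {v: v for v in nodes}

    def find(v):  # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    total = 0.0
    for d, i, j in sorted(edges):  # Kruskal: each merge closes one H0 bar of length d
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            total += d
    return total

# Two prompt tokens (0, 1) and one response token (2) under causal attention.
attn = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.6, 0.3, 0.1]]
print(mtopdiv_h0(attn, [0, 1], [2]))
```

Lower values mean the response tokens attach cheaply to the prompt. Higher values indicate a response subgraph that is "novel" relative to the prompt, which is the signal TOHA associates with hallucination.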

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

This paper treats an LLM's self-judgment (whether it believes its own answer is correct) as a generation that may itself be hallucinated. It first trains a "meta-judgment detector" on intrinsic features to estimate the credibility of the self-judgment. It then exploits an inherent logical rule (if the self-judgment is correct, the two labels are identical; if it is wrong, they are opposite) and uses a Huber loss to constrain the response detector and the meta-judgment detector through confidence-weighted mutual learning. At inference time only the response detector is used, so the gains from both perspectives come at zero additional inference cost.
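
One direction of the label constraint can be sketched like this, assuming probabilities for all signals. The meta-detector's credibility `q` turns the self-judgment into an implied correctness target for the response detector. Confidence weighting uses |2q − 1|, and a Huber loss penalizes disagreement. The weighting scheme and function names are assumptions. The paper's mutual learning also constrains the meta-judgment detector in the other direction, which is not shown.

```python
def huber(x, delta=1.0):
    """Standard Huber loss on a residual x."""
    ax = abs(x)
    return 0.5 * x * x if ax <= delta else delta * (ax - 0.5 * delta)

def consistency_loss(p_resp_correct, self_judgment_says_correct, q_judgment_credible, delta=1.0):
    """
    p_resp_correct: response detector's probability that the answer is correct.
    self_judgment_says_correct: the LLM's own verdict on its answer (bool).
    q_judgment_credible: meta-judgment detector's probability that this verdict is right.
    """
    s = 1.0 if self_judgment_says_correct else 0.0
    # Logical rule: a credible verdict keeps its label, a non-credible one flips it.
    implied = q_judgment_credible * s + (1.0 - q_judgment_credible) * (1.0 - s)
    # Assumed confidence weight: 0 when the meta-detector is unsure (q = 0.5), 1 at q in {0, 1}.
    weight = abs(2.0 * q_judgment_credible - 1.0)
    return weight * huber(p_resp_correct - implied, delta)

print(consistency_loss(0.8, True, 1.0))
```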

Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models

This work provides the first systematic comparison of hallucination patterns between diffusion large language models (dLLMs) and their autoregressive (AR) counterparts. It reveals that current dLLMs have a higher propensity for hallucinations and identifies three diffusion-specific failure modes: Premature Termination, Incomplete Denoising, and Contextual Invasion.

MeasHalu: Mitigation of Scientific Measurement Hallucinations for LLMs

This paper proposes the MeasHalu framework, which mitigates LLM hallucinations in scientific measurement extraction through a fine-grained taxonomy of measurement hallucinations and a two-stage optimization process (reasoning-aware SFT followed by GRPO with hallucination-targeted rewards). It significantly surpasses baselines on MeasEval.
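
The GRPO stage standardizes rewards within each group of sampled completions for the same input. The sketch below shows that group-relative advantage together with a hypothetical hallucination-targeted reward that penalizes fabricated values and wrong units. The reward terms and weights are illustrative assumptions, not MeasHalu's published reward design.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each completion's reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def measurement_reward(pred, gold):
    """Hypothetical hallucination-targeted reward for one extracted measurement."""
    reward = 0.0
    reward += 1.0 if pred.get("value") == gold["value"] else -1.0  # fabricated or wrong number
    reward += 0.5 if pred.get("unit") == gold["unit"] else -0.5    # unit hallucination
    return reward

gold = {"value": 3.2, "unit": "mm"}
group = [{"value": 3.2, "unit": "mm"}, {"value": 3.2, "unit": "cm"}, {"value": 5.0, "unit": "mm"}, {}]
rewards = [measurement_reward(p, gold) for p in group]  # [1.5, 0.5, -0.5, -1.5]
print(group_relative_advantages(rewards))
```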

Mechanisms of Prompt-Induced Hallucination in Vision–Language Models

In controlled object-counting tasks, prompt-induced hallucination (PIH), where the model follows the prompt instead of the image, is localized to 3–10 attention heads in the early layers (primarily layers 0–1) of LLaVA-OneVision, Qwen-VL, and Janus-Pro. Mean-ablating these heads without any retraining reduces prompt-following from 42–64% to below 11%, raises correct-count rates to 70–78%, and transfers zero-shot to color recognition tasks (40–95% PIH suppression).
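
Mean ablation replaces a head's output with its average output over a reference dataset, rather than zeroing it, so downstream layers still receive typical activations. The sketch below shows this on plain vectors keyed by hypothetical `(layer, head)` pairs. In a real PyTorch model, one common option is a forward pre-hook (`register_forward_pre_hook`) on the attention output projection, where per-head outputs are still separable.

```python
def mean_over_dataset(per_example_outputs):
    """Element-wise average of equal-length vectors (one per reference example)."""
    n = len(per_example_outputs)
    return [sum(col) / n for col in zip(*per_example_outputs)]

def mean_ablate(head_outputs, reference_means, heads_to_ablate):
    """
    head_outputs: {(layer, head): output vector} for the current input.
    reference_means: {(layer, head): mean output vector over a reference dataset}.
    Returns a copy where each selected head's output is replaced by its mean.
    """
    return {
        key: (list(reference_means[key]) if key in heads_to_ablate else list(vec))
        for key, vec in head_outputs.items()
    }

outs = {(0, 3): [1.0, 2.0], (0, 4): [5.0, 5.0]}
means = {(0, 3): [0.0, 0.0], (0, 4): [1.0, 1.0]}
print(mean_ablate(outs, means, {(0, 3)}))  # {(0, 3): [0.0, 0.0], (0, 4): [5.0, 5.0]}
```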

Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation

This paper proposes the MPD framework, which decouples hallucination components via semantic-aware orthogonal subspace projection and selectively updates a small number of parameters most relevant to hallucinations. It reduces hallucinations by 23.4% while maintaining 97.4% of general generation capability without introducing additional inference overhead.
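
The core linear-algebra step behind "decoupling hallucination components via orthogonal subspace projection" can be sketched as follows. A vector is split into its projection onto a subspace spanned by given directions and the orthogonal remainder. How MPD builds its semantic-aware subspace and selects parameters to update is not reproduced, and the basis used here is a made-up example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(basis):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis (drops dependent ones)."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:
            ortho.append([wi / norm for wi in w])
    return ortho

def split_component(x, halluc_basis):
    """Split x into its projection onto the (assumed) hallucination subspace and the orthogonal rest."""
    proj = [0.0] * len(x)
    for q in orthonormalize(halluc_basis):
        c = dot(x, q)
        proj = [p + c * qi for p, qi in zip(proj, q)]
    rest = [a - p for a, p in zip(x, proj)]
    return proj, rest

print(split_component([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0]]))  # ([3.0, 0.0, 0.0], [0.0, 4.0, 5.0])
```

Removing `proj` from a representation, or restricting updates to act only on it, are two ways such a split can be used. Which one MPD applies is described in the paper, not here.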

MultiHaluDet: Multilingual Hallucination Detection via LLM Hidden State Probing

MultiHaluDet utilizes full-layer hidden state trajectories of frozen LLMs for multi-scale sequence modeling, further identifying hallucinations through out-of-fold representations and ensemble meta-learners. It achieves approximately 98% AUROC on HaluEval / TriviaQA and generalizes to French, Bengali, and Amharic.
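
"Out-of-fold representations" refers to the standard stacking trick: each training sample's base-model output comes from a model that never saw that sample, so a meta-learner trained on these outputs does not inherit the base models' overfitting. The sketch below shows out-of-fold prediction with a hypothetical nearest-centroid base learner on a 1-D probe score. MultiHaluDet's actual sequence models and meta-learners are not reproduced.

```python
def kfold_indices(n, k):
    """Deterministic round-robin folds: sample i goes to fold i % k."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]

def out_of_fold_predictions(X, y, fit, predict, k=5):
    """Each sample's prediction comes from a model trained without that sample."""
    oof = [None] * len(X)
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            oof[i] = predict(model, X[i])
    return oof

# Hypothetical base learner: nearest class centroid on a 1-D probe score.
def fit_centroid(xs, ys):
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == 0]
    return (sum(pos) / len(pos), sum(neg) / len(neg))

def predict_centroid(model, x):
    pos_c, neg_c = model
    return 1 if abs(x - pos_c) < abs(x - neg_c) else 0

X = [0.9, 0.8, 0.85, 0.1, 0.2, 0.15, 0.95, 0.05, 0.7, 0.3]
y = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
oof = out_of_fold_predictions(X, y, fit_centroid, predict_centroid, k=5)
print(oof)  # these OOF outputs (from several base models) would train the meta-learner
```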

Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights

This paper proposes seven desiderata that a hallucination detection benchmark for RAG should satisfy and constructs Trivia++, a long-context dataset with multi-round human annotations and realistic label noise. The study finds that existing detectors still perform well below ideal levels on organic RAG hallucinations.

Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation

This paper proposes DaID (Dual-Anchor Introspective Decoding), which exploits differences in visual perception across MLLM layers, amplifying visual signals via the Spotlight layer and suppressing linguistic inertia via the Shadow layer, to mitigate hallucinations within a single forward pass.
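
A generic contrastive-decoding sketch of a "dual-anchor" adjustment. All three inputs are next-token logits from the same forward pass, for example by applying the output head to intermediate layers. The final logits are pushed toward the Spotlight readout and away from the Shadow readout. The additive rule and the `alpha`/`beta` weights are assumptions. DaID's actual layer selection and combination rule may differ.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dual_anchor_logits(final, spotlight, shadow, alpha=0.5, beta=0.5):
    """Contrastive-style mix: add the visually grounded 'spotlight' readout and
    subtract the language-prior 'shadow' readout (hypothetical weights alpha/beta)."""
    return [f + alpha * sp - beta * sh for f, sp, sh in zip(final, spotlight, shadow)]

# Two candidate tokens: token 0 is image-supported, token 1 is a language-prior guess.
adjusted = dual_anchor_logits([1.0, 1.0], [2.0, 0.0], [0.0, 2.0])
probs = softmax(adjusted)
print(adjusted, probs.index(max(probs)))  # [2.0, 0.0] 0
```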

Stable-RAG: Mitigating Retrieval-Permutation-Induced Hallucinations in Retrieval-Augmented Generation

This paper shows that RAG systems are highly sensitive to the order of retrieved documents and proposes Stable-RAG: it identifies dominant reasoning patterns by spectral clustering of hidden states induced by different document permutations, then uses DPO alignment to steer hallucinated outputs toward correct answers, improving both accuracy and reasoning consistency across multiple QA datasets.
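
Permutation sensitivity can be measured by re-running the same question over reordered retrieval results and checking how often the answer agrees with the majority. The sketch below uses a hypothetical, deliberately position-biased `answer_fn` in place of a real RAG model. It illustrates the problem Stable-RAG targets, not its spectral-clustering or DPO stages.

```python
from collections import Counter
from itertools import permutations

def permutation_consistency(answer_fn, question, docs, max_perms=24):
    """Share of document orderings whose answer matches the most common answer."""
    answers = [answer_fn(question, list(p)) for _, p in zip(range(max_perms), permutations(docs))]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def first_doc_answer(question, docs):
    """Hypothetical position-biased 'model': always trusts the first retrieved document."""
    return docs[0]

print(permutation_consistency(first_doc_answer, "q", ["A", "A", "B"]))
```

A perfectly order-invariant system would score 1.0. The toy model scores lower because its answer flips whenever document "B" is placed first.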

The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination

This paper systematically reveals the "Reasoning Trap" paradox: enhancing LLM reasoning (whether via RL, distillation, or switchable reasoning modes) systematically amplifies tool hallucination. The effect is tied to reasoning itself rather than to RL training, and existing mitigation strategies (prompt engineering, DPO) face an inevitable reliability-capability trade-off.

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

This paper proposes Perception Magnifier (PM), a visual decoding method that iteratively identifies key visual regions based on multi-layer attention at each auto-regressive decoding step and adaptively magnifies them. By increasing the effective resolution of key regions while maintaining spatial structural integrity and reasoning capabilities, PM mitigates visual hallucinations in VLMs.
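
The region-selection step can be sketched as averaging patch attention across layers and keeping the top-k patches. The input layout (one head-averaged vector per layer) and `k` are assumptions. PM's structure-preserving magnification of those regions is not shown here.

```python
def select_key_patches(attn_per_layer, k=3):
    """
    attn_per_layer: per-layer attention vectors over image patches
    (one vector per layer, already averaged over heads; assumed layout).
    Returns indices of the k patches with the highest layer-averaged attention.
    """
    n_layers = len(attn_per_layer)
    avg = [sum(col) / n_layers for col in zip(*attn_per_layer)]
    return sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)[:k]

print(select_key_patches([[0.1, 0.5, 0.2, 0.2], [0.3, 0.3, 0.1, 0.3]], k=2))  # [1, 3]
```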

TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG

This paper proposes the TPA framework, which mathematically decomposes the generation probability of each token in LLMs into contributions from seven sources (Query, RAG Context, Past Token, Self Token, FFN, Final LayerNorm, and Initial Embedding). By aggregating these features with Part-of-Speech (POS) tagging, it achieves SOTA hallucination detection performance in RAG scenarios.
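
Because a token's logit is linear in the final residual stream, it splits into one dot product per component that wrote into that stream. The sketch below shows this per-source attribution and the POS-based aggregation. It works on logits and ignores the nonlinearity of the final LayerNorm, which TPA treats as its own source. The source names and vectors are made up for illustration.

```python
def logit_attribution(component_outputs, unembed_row):
    """
    component_outputs: {source_name: vector written into the final residual stream}.
    unembed_row: unembedding vector for the token being scored.
    Returns each source's additive share of that token's logit.
    """
    return {name: sum(a * b for a, b in zip(vec, unembed_row))
            for name, vec in component_outputs.items()}

def aggregate_by_pos(token_attrs, pos_tags):
    """Average each source's attribution over tokens that share a POS tag."""
    sums, counts = {}, {}
    for attrs, tag in zip(token_attrs, pos_tags):
        counts[tag] = counts.get(tag, 0) + 1
        bucket = sums.setdefault(tag, {})
        for name, val in attrs.items():
            bucket[name] = bucket.get(name, 0.0) + val
    return {tag: {name: v / counts[tag] for name, v in bucket.items()}
            for tag, bucket in sums.items()}

attrs = logit_attribution({"query": [1.0, 0.0], "rag_context": [0.0, 2.0]}, [1.0, 1.0])
print(attrs)  # {'query': 1.0, 'rag_context': 2.0}
```

Features of this kind, such as how much a noun's logit comes from the RAG context versus the FFN, are what a downstream hallucination classifier would consume.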

Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations

This paper discovers two distinct information pathways for truthfulness signals within LLMs: Question-Anchored (relying on the flow from question to answer) and Answer-Anchored (extracting self-contained evidence from the generated answer itself). These pathways are closely linked to knowledge boundaries. Based on this, two pathway-aware hallucination detection methods, Mixture-of-Probes and Pathway Reweighting, are proposed, achieving AUC improvements of up to 10%.
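
A minimal sketch of a mixture of probes, assuming one linear probe per pathway (question-anchored and answer-anchored) and a learned sigmoid gate that mixes their hallucination probabilities per example. The linear-probe form, the gate, and all weights are assumptions for illustration. In practice they would be trained on labeled hidden states.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mixture_of_probes(h, probe_q, probe_a, gate):
    """
    h: hidden-state features for one (question, answer) pair.
    probe_q / probe_a: (weights, bias) for the question- and answer-anchored probes.
    gate: (weights, bias) giving the mixing weight of the question-anchored probe.
    Returns P(hallucination).
    """
    g = sigmoid(linear(*gate, h))
    return g * sigmoid(linear(*probe_q, h)) + (1.0 - g) * sigmoid(linear(*probe_a, h))

zero = ([0.0, 0.0], 0.0)
print(mixture_of_probes([1.0, 2.0], zero, zero, zero))  # 0.5
```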

Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation

This paper uses a controlled synthetic dataset, Biography-Reasoning, to systematically analyze factual hallucinations caused by learning new knowledge during the SFT stage. It finds that the fundamental mechanism is a weakening of the model's attention to key entities. The authors propose KnownPatch, which injects a small amount of already-known knowledge at the end of training to restore these attention patterns, effectively mitigating hallucinations.

Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination

This paper finds that certain inert visual tokens in LVLMs consistently decode into a set of irrelevant words and hijack attention. It therefore proposes HABI to locate these tokens, uses NHAR to identify reliable visual heads, and enhances these heads with HAVAE during inference to reduce hallucinations.
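
The effect of excluding inert tokens when ranking visual heads can be shown with a small sketch. Positions whose (assumed) logit-lens decoding falls into a fixed "hijack" vocabulary are treated as inert, and heads are scored only by attention to the remaining visual tokens. This is a hypothetical stand-in for the idea; the actual HABI, NHAR, and HAVAE definitions are in the paper.

```python
def find_inert_tokens(decoded_words, hijack_vocab):
    """Visual-token positions whose decoded word falls in a fixed set of irrelevant words."""
    return {i for i, w in enumerate(decoded_words) if w in hijack_vocab}

def visual_head_score(attn_to_visual, inert):
    """Attention a head gives to visual tokens, counting only the non-inert ones."""
    return sum(a for i, a in enumerate(attn_to_visual) if i not in inert)

def rank_heads(head_attn, inert, top_k=2):
    """head_attn: {(layer, head): attention over visual tokens}. Highest scores first."""
    return sorted(head_attn, key=lambda h: visual_head_score(head_attn[h], inert), reverse=True)[:top_k]

inert = find_inert_tokens(["the", "dog", "the", "grass"], {"the"})
heads = {
    (5, 1): [0.4, 0.1, 0.4, 0.1],    # mostly attends to inert tokens
    (5, 2): [0.05, 0.3, 0.05, 0.3],
    (6, 0): [0.0, 0.2, 0.0, 0.2],
}
print(rank_heads(heads, inert))  # [(5, 2), (6, 0)]
```

Without the exclusion, head (5, 1) would rank first because of the attention it spends on hijacking tokens. This is the failure mode that filtering inert tokens is meant to avoid.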

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of the Reasoning Process

The paper reveals the internal failure mechanisms of LLMs when processing linearized structured knowledge through two mechanistic metrics (Structural Shortcut Reliance, SSR, and Semantic Alignment Score, SAS) and constructs a lightweight hallucination detector based on these signals.