👻 Hallucination Detection

💬 ACL2025 · 27 paper notes

🔥 Top topics: LLM ×10 · Multimodal/VLM ×6 · RAG ×3 · Alignment/RLHF ×2

Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention

This paper proposes ASD (Activation Steering Decoding), a training-free, inference-time hallucination mitigation method. By identifying hallucination direction patterns within the intermediate hidden states of LVLMs, it leverages bidirectional steering and contrastive decoding to suppress hallucinated outputs while preserving the model's performance on general visual understanding tasks.

Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering

This paper proposes the NOVA framework, which filters knowledge-aligned, high-quality instruction data by measuring the LLM's familiarity with instructions via Internal Consistency Probing (ICP) and familiarity with target responses via Semantic Equivalence Identification (SEI). Fine-tuning LLaMA-3-8B with only 5% of selected data achieves an 8.6-point improvement on BioGEN and a 7.2-point improvement on FollowRAG, while preserving instruction-following capability.

Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning

To address the hallucination issue in LLMs caused by knowledge misalignment (inconsistency between model parametric knowledge and reality), this paper proposes a Selective Abstention Learning method. This approach enables the model to actively refuse to answer when encountering questions outside its knowledge boundary instead of fabricating content, thereby reducing hallucinations.

Automated Explanation Generation and Hallucination Detection for Heritage Image Retrieval

This paper proposes a framework combining automated explanation generation and hallucination detection for cultural heritage image retrieval. It utilizes vision-language models to generate explainable text descriptions for retrieval results, while ensuring the factual accuracy of descriptions through a domain-knowledge-constrained hallucination detection mechanism, validating the effectiveness of the method on multiple cultural heritage datasets.

CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models

This paper proposes the first joint cross-lingual and cross-modal hallucination detection benchmark, CCHall, covering 9 languages and 4 types of multimodal datasets. It systematically evaluates the hallucination performance of 6 mainstream MLLMs in joint scenarios, revealing that the F1 score of current models in this joint scenario is 10.9% lower than that of cross-modal alone, and 3.4% lower than that of cross-lingual alone. Additionally, two mitigation paths are proposed: multilingual prompting and external tool assistance.

Correcting Hallucinations in News Summaries: Exploration of Self-Correcting LLM Methods with External Knowledge

This paper systematically evaluates two self-correction methods (CoVE and RARR) for correcting hallucinations in news summaries. Comparing three search engines, multiple retrieval settings, and several prompting strategies, it finds that Bing search snippets combined with few-shot RARR perform best, and that G-Eval aligns closely with human evaluations.

Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence

This paper proposes the Vision-aware Head Divergence (VHD) metric, which quantifies how sensitive each attention head's output is to the visual input. It finds that only a few attention heads are highly sensitive to visual information, and that the model's over-reliance on language priors is a key cause of hallucination. Based on this, it designs VHR, a training-free method that adaptively reinforces the contribution of vision-sensitive heads layer by layer (\(\alpha=2\)), reducing the CHAIR\(_S\) of LLaVA-1.5 on CHAIR from 49.68 to 33.32 with almost no additional inference overhead.
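
The head-level idea in the summary above can be illustrated with a rough sketch. This is not the paper's formulation: it assumes each head's output is a plain vector, measures sensitivity as the Euclidean distance between the outputs with and without the image, and reinforces the `top_k` most sensitive heads by scaling them with `alpha`. The function names and the `top_k` parameter are hypothetical.

```python
import math

def head_divergence(out_with_image, out_without_image):
    """Euclidean distance between one head's outputs with and without the image (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(out_with_image, out_without_image)))

def reinforce_vision_heads(heads_with, heads_without, top_k=1, alpha=2.0):
    """Score every head, then scale the top_k most vision-sensitive heads by alpha."""
    scores = [head_divergence(w, wo) for w, wo in zip(heads_with, heads_without)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:top_k])
    scaled = [[alpha * v for v in h] if i in top else list(h)
              for i, h in enumerate(heads_with)]
    return scores, scaled

# Toy example: head 0 ignores the image, head 1 depends on it.
scores, scaled = reinforce_vision_heads(
    heads_with=[[1.0, 0.0], [3.0, 4.0]],
    heads_without=[[1.0, 0.0], [0.0, 0.0]],
)
```

In this toy case only the second head changes when the image is removed, so it is the one that gets amplified.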

DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination

DRAG is a framework for distilling RAG capabilities from large language models (LLMs) into small language models (SLMs). It uses an LLM (e.g., GPT-4o) to generate evidence and knowledge graph triples for a given question. After ranking and filtering, these are fed to SLMs (2B-9B) as structured context, boosting SLM performance on ARC-C by up to 27.7% without fine-tuning while significantly mitigating hallucinations.

ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries

This paper proposes the Entity Tracing Framework (ETF), which detects hallucinations by extracting code entities via static program analysis and then using LLMs to verify whether those entities are correctly described in the generated summaries. Combined with the first-of-its-kind CodeSumEval dataset (~10K samples), it achieves a 73% F1 score in code summary hallucination detection.

FIHA: Autonomous Fine-grained Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs

This paper proposes FIHA, an automated, fine-grained hallucination evaluation framework that requires neither LLMs nor human annotations. By extracting entities, attributes, and relations from images and descriptions to generate Q&A pairs, and introducing Davidson Scene Graphs (DSG) to model inter-question dependencies, the authors construct the FIHA-v1 benchmark to comprehensively evaluate the hallucination levels of mainstream Large Vision-Language Models.

HalluLens: LLM Hallucination Benchmark

This paper proposes HalluLens, a hallucination benchmark that separates hallucination from factuality and defines a taxonomy of extrinsic hallucination (inconsistency with training data) and intrinsic hallucination (inconsistency with input context). It introduces three dynamically regenerable extrinsic hallucination evaluation tasks and analyzes the limitations of existing benchmarks.

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them

This paper proposes HALoGEN, a large-scale hallucination evaluation framework containing 10,923 prompts across 9 domains (including programming, scientific citation, translation, etc.), equipped with an atomic-level automated verifier. It systematically evaluates hallucinations on approximately 150,000 generation samples from 14 LLMs, discovering that even the best models can have hallucination rates of up to 86% in atomic facts within certain domains, and introduces a taxonomy of Type A/B/C errors.

HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs

This paper represents the first attempt to apply Neural Differential Equations (Neural DEs) to LLM hallucination detection. By modeling the continuous trajectories of token activations in the hidden space, the proposed method systematically evaluates the truthfulness of statements, outperforming the state-of-the-art (SOTA) on the True-False dataset by over 14% in AUC-ROC.

ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs

This paper proposes the ICR Score (Information Contribution to Residual Stream), which quantifies residual stream dynamics by measuring how consistently the MHSA and FFN modules contribute to hidden state updates. It then builds a lightweight ICR Probe with only 16K parameters that consistently outperforms baselines in hallucination detection AUROC across 4 datasets × 3 LLMs.

Beyond Facts: Evaluating Intent Hallucination in Large Language Models

This paper proposes the concept of "Intent Hallucination"—generations that deviate from the user's intent due to LLMs omitting or misinterpreting certain intent constraints when handling complex multi-condition queries. It constructs the FaithQA benchmark (20,068 questions) and the Constraint Score evaluation metric. Experiments demonstrate that intent hallucination is prevalent in SOTA models and intensifies as query complexity increases.

Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation

This work systematically investigates reference-free hallucination detection in open-domain long-form generation, discovering that the internal states (probabilities/entropy) of LLMs are insufficient to reliably distinguish factual and hallucinated content. It proposes RATE-FT (Rationale and Auxiliary Task Enhanced Fine-Tuning), which enhances fine-tuning by incorporating reasoning rationales and auxiliary QA tasks, achieving over 3% improvement on LongFact compared to standard fine-tuning.

Fine-grained Hallucination Detection and Mitigation in Long-form Question Answering

This paper constructs HaluQuestQA (698 QA pairs, 4.7k error annotations, 5 error types), the first LFQA hallucination dataset with span-level error annotations. It trains an automatic feedback model to detect incomplete information error spans and generate explanations, and finally proposes the Error-informed Refinement method to refine answers using feedback signals, reducing hallucinations by approximately 3%, with 84% of users preferring the refined answers in human evaluations.

Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucination in Multimodal LLMs

This paper proposes Mixture of Decoding (MoD), which uses JS divergence to measure the correctness of the model's attention to image tokens. When the attention is correct, complementary decoding amplifies key information; when it is incorrect, contrastive decoding suppresses misleading information. The strategy thereby adaptively mitigates hallucinations in multimodal large language models.
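
The switching rule described above can be sketched as follows. The summary does not say which distributions are compared or how the logits are combined, so this sketch assumes the JS divergence is taken between the next-token distribution from the full image and the one from only the attended image tokens, and that both modes are simple linear combinations. The `threshold` and `alpha` values, and all function names, are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    mx = max(logits)
    exps = [math.exp(v - mx) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mix_logits(logits_full, logits_attended, threshold=0.1, alpha=0.5):
    """Pick a decoding mode from the divergence between the two views (assumed rule)."""
    d = js_divergence(softmax(logits_full), softmax(logits_attended))
    if d < threshold:
        # Views agree: treat the attention as correct and add the attended view.
        mixed = [a + alpha * b for a, b in zip(logits_full, logits_attended)]
        return mixed, "complementary"
    # Views disagree: treat the attention as misleading and subtract the attended view.
    mixed = [(1 + alpha) * a - alpha * b for a, b in zip(logits_full, logits_attended)]
    return mixed, "contrastive"
```

The JS divergence is bounded by ln 2 under the natural log, so a fixed threshold between 0 and ln 2 is meaningful regardless of vocabulary size.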

Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation

A Monitoring Decoding (MD) framework is proposed to dynamically monitor the factuality of partial responses during generation. It identifies hallucination-prone tokens via a monitor function and selectively revises these key tokens using a tree-search strategy, significantly improving factual accuracy while maintaining efficiency.

On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation

This paper proposes RLFH (Reinforcement Learning for Hallucination), an on-policy self-alignment method in which the LLM itself acts as the judge: it decomposes responses into atomic facts, evaluates their truthfulness and informativeness, converts the judgments into token-level dense reward signals, and optimizes the policy via online PPO to mitigate hallucination.

ReefKnot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models

This paper proposes Reefknot, the first comprehensive benchmark to systematically evaluate relation-level hallucination in Multimodal Large Language Models (MLLMs), consisting of over 20k samples across three tasks. Based on confidence-entropy detection, a Detect-then-Calibrate mitigation strategy is proposed, which reduces the average hallucination rate by 9.75%.

REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models

The REFIND framework is proposed, which efficiently detects hallucinated spans in LLM outputs by calculating the Context Sensitivity Ratio (CSR)—the ratio of generation probability for each token with and without retrieved documents. It significantly outperforms baselines across 9 languages in SemEval-2025 Task 3.
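
A minimal sketch of the ratio described above, assuming per-token probabilities have already been obtained from the model with and without the retrieved documents. The summary does not state the decision rule, so the flagging step below (flag a token when its probability shifts by more than a factor of `threshold` in either direction) is an assumption for illustration, as are the function names.

```python
import math

def context_sensitivity_ratio(p_with_docs, p_without_docs, eps=1e-12):
    """Per-token ratio p(token | question, docs) / p(token | question)."""
    if len(p_with_docs) != len(p_without_docs):
        raise ValueError("probability lists must have the same length")
    return [pw / max(po, eps) for pw, po in zip(p_with_docs, p_without_docs)]

def flag_sensitive_tokens(tokens, p_with_docs, p_without_docs, threshold=2.0):
    """Return tokens whose probability shifts by more than `threshold`x (assumed rule)."""
    ratios = context_sensitivity_ratio(p_with_docs, p_without_docs)
    log_t = math.log(threshold)
    return [tok for tok, r in zip(tokens, ratios)
            if abs(math.log(max(r, 1e-12))) > log_t]

# Toy probabilities (made up for illustration, not model outputs).
tokens = ["Paris", "is", "in", "Spain"]
flagged = flag_sensitive_tokens(tokens, [0.9, 0.8, 0.7, 0.01], [0.85, 0.8, 0.7, 0.4])
```

Working in log space makes the rule symmetric: a token that becomes 40 times less likely and one that becomes 40 times more likely are equally far from a ratio of 1.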

Removal of Hallucination on Hallucination: Debate-Augmented RAG

DRAG (Debate-Augmented RAG) introduces a Multi-Agent Debate (MAD) mechanism into both the retrieval and generation stages of RAG systems. Through a structured process of proponent-opponent debate and judge arbitration, it addresses the "hallucination on hallucination" problem caused by erroneous retrieval, significantly improving factual accuracy across six QA benchmarks.

Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models

This paper proposes Retrieval Visual Contrastive Decoding (RVCD), which mitigates object hallucinations in Large Vision-Language Models (LVLMs) at the decoding stage by retrieving AI-generated, single-concept explicit images and using them to construct positive/negative logit sets. It requires no extra training and significantly outperforms existing decoding methods.

Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs

This paper explains the internal mechanisms of irrelevant context hallucinations in LLMs using behavioral analysis and mechanistic interpretability experiments. Models construct abstract class representations (e.g., "language") in the early layers; two circuits (query-based vs. context-based) then compete over feature selection, and their relative activation strength determines whether the model generalizes correctly or hallucinates.

TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation

This paper proposes TreeCut, a tree-structure-based synthetic dataset generation method. By systematically removing essential condition edges along tree paths, it can generate arbitrarily many unanswerable math word problems for evaluating the hallucination behavior of LLMs on unsolvable tasks.
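
The edge-removal idea above can be sketched on a toy tree. The sketch assumes each variable is a node, each edge `(parent, child)` stands for a stated condition that lets the child be computed from the parent, and the root's value is given. The variable names and helper functions are hypothetical and not taken from the dataset.

```python
def path_to_root(parent, node):
    """Edges (parent, child) on the path from the root down to `node`."""
    edges = []
    while parent.get(node) is not None:
        edges.append((parent[node], node))
        node = parent[node]
    return list(reversed(edges))

def is_answerable(conditions, root, target):
    """True if `target` can still be derived from `root` using the remaining edges."""
    known = {root}
    changed = True
    while changed:
        changed = False
        for p, c in conditions:
            if p in known and c not in known:
                known.add(c)
                changed = True
    return target in known

def cut(conditions, edge):
    """Drop one condition edge from the problem."""
    return [e for e in conditions if e != edge]

# Toy problem: apples is given; pears, plums and figs are defined via their parents.
parent = {"apples": None, "pears": "apples", "plums": "pears", "figs": "apples"}
conditions = [(p, c) for c, p in parent.items() if p is not None]
essential = path_to_root(parent, "plums")
unanswerable = cut(conditions, essential[0])
```

Cutting any edge on the path to the queried variable makes the question unanswerable, while cutting an off-path edge (here `("apples", "figs")`) leaves it solvable, which is why the removal has to target the path.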

Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models

This paper proposes Visual Evidence Prompting (VEP), which uses the outputs of small vision expert models (such as object detectors and scene graph generators) as textualized "visual evidence" input for LVLMs. This training-free approach significantly reduces hallucinations across 11 LVLMs—improving LLaVA-1.5 by 7.2% and Claude 3 by 12.1% on the POPE benchmark.
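
A minimal sketch of how expert-model outputs might be textualized into a prompt, as described above. The detection format (dictionaries with `label` and `score`), the relation triples, the `min_score` cutoff, and the prompt wording are all assumptions for illustration and are not the paper's template.

```python
def build_evidence_prompt(question, detections, relations=(), min_score=0.5):
    """Turn detector and scene-graph outputs into a text prefix for the question."""
    # Keep only confident detections and count instances per label.
    counts = {}
    for det in detections:
        if det["score"] >= min_score:
            counts[det["label"]] = counts.get(det["label"], 0) + 1
    objects = ", ".join(f"{n} {label}" for label, n in sorted(counts.items()))
    rels = "; ".join(f"{s} {p} {o}" for s, p, o in relations)

    lines = ["Visual evidence:", f"Objects: {objects or 'none detected'}"]
    if rels:
        lines.append(f"Relations: {rels}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

prompt = build_evidence_prompt(
    "Is there a cat?",
    detections=[
        {"label": "dog", "score": 0.9},
        {"label": "dog", "score": 0.8},
        {"label": "cat", "score": 0.2},  # below the cutoff, so it is dropped
    ],
    relations=[("dog", "on", "sofa")],
)
```

Because the evidence is plain text, the same prompt can be passed to any LVLM without changing its weights, which matches the training-free framing of the method.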