👻 Hallucination Detection
🔬 ICLR2026 · 9 paper notes
📌 Same area in other venues: 🧪 ICML2026 (19) · 💬 ACL2026 (27) · 📷 CVPR2026 (18) · 🤖 AAAI2026 (15) · 🧠 NeurIPS2025 (17) · 📹 ICCV2025 (4)
🔥 Top topics: LLM ×2 · Multimodal/VLM ×2
- Copy-Paste to Mitigate Large Language Model Hallucinations
This paper proposes a Copy-Paste generation paradigm that trains LLMs to preferentially copy spans directly from retrieved context rather than paraphrasing them freely. Combined with high-copy-preference DPO training, the approach improves faithfulness on counterfactual RAG benchmarks from 80.2% to 92.8%.
- Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
This paper proposes Dynamic Multimodal Activation Steering (DMAS), a training-free method that constructs a semantics-based truthfulness steering vector database and a visual perception steering vector, dynamically selecting the most relevant steering vectors at inference time to intervene on critical attention heads. DMAS significantly mitigates hallucinations in LVLMs, achieving a gain of 94.66 points on MME and a 20.2% reduction in hallucination rate on CHAIR.
- Enhancing Hallucination Detection through Noise Injection
This paper injects uniform noise into the MLP activations of intermediate LLM layers to approximate the Bayesian posterior, capturing epistemic uncertainty that complements the aleatoric uncertainty captured by sampling temperature. This raises hallucination detection AUROC on GSM8K from 71.56 to 76.14.
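As a rough illustration of the noise-injection idea, the sketch below perturbs a vector of activations with uniform noise, reads out an answer several times, and scores disagreement among the sampled answers with Shannon entropy. The function names, the symmetric noise range, the toy `readout` callable, and the entropy-based score are all assumptions made for illustration; the note above does not specify these details.

```python
import math
import random
from collections import Counter

def inject_uniform_noise(activations, magnitude, rng):
    """Add independent U(-magnitude, magnitude) noise to each activation.

    The symmetric range is an assumption; the note only says "uniform noise".
    """
    return [a + rng.uniform(-magnitude, magnitude) for a in activations]

def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution of answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def uncertainty_score(activations, readout, magnitude, num_samples=20, seed=0):
    """Sample answers under perturbed activations and score their disagreement.

    `readout` is a hypothetical stand-in for the rest of the model: it maps
    a (perturbed) activation vector to a final answer.
    """
    rng = random.Random(seed)
    answers = [
        readout(inject_uniform_noise(activations, magnitude, rng))
        for _ in range(num_samples)
    ]
    return answer_entropy(answers)
```

A higher score means the answer changes more under perturbation, which a detector could treat as a sign of a less reliable generation.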
- Hallucination Begins Where Saliency Drops
This paper proposes LVLMs-Saliency, a gradient-aware diagnostic framework that quantifies the visual grounding strength of each output token. Its key finding is that hallucinations arise when the saliency of previously generated tokens toward the next-token prediction drops. Building on this insight, the paper introduces an inference-time framework that combines two mechanisms, SGRS (Saliency-Guided Rejection Sampling) and LocoRE (Local Coherence Reinforcement), and achieves significant hallucination reduction across multiple LVLMs.
- Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
This paper proposes AIR (Adaptive vIsual Reinforcement), a training-free framework that reduces hallucinations in MLLMs at inference time by combining prototype-distance-based token reduction with optimal-transport-guided selective patch reinforcement. On LLaVA-1.5-7B, CHAIR_S drops from 22 to 18.4 and POPE accuracy improves by 5.3%, while general multimodal capabilities are preserved.
- LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals
This paper proposes LUMINA, a framework that detects hallucinations in RAG systems via "context-knowledge signals": MMD measures how much the model uses the external context, while the evolution of token predictions across layers measures how much it relies on internal knowledge. The method generalizes without hyperparameter tuning.
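Assuming MMD here denotes maximum mean discrepancy, the sketch below shows the generic statistic: a biased estimate of squared MMD between two samples of vectors under an RBF kernel. Which two distributions the paper compares, and which kernel it uses, are not stated in the note above, so `mmd_squared`, `rbf_kernel`, and the `gamma` parameter are illustrative choices only.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2) between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], with each expectation
    replaced by an average over all sample pairs.
    """
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy
```

The estimate is zero when the two samples are identical and grows as they move apart, which is what makes it usable as a distance between distributions.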
- SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
This work is the first to systematically trace object hallucinations in LVLMs back to the visual encoder, identifying three core issues: statistical bias (over-emphasis on high-frequency pattern tokens), inherent bias (residual representations of objects that dominate pre-training), and vulnerability (feature distortion under minimal perturbations). It proposes SHIELD, a fully training-free framework that addresses these issues jointly via token reweighting, token subtraction, and contrastive decoding, achieving comprehensive improvements over VCD and OPERA on LLaVA-1.5, InstructBLIP, and Qwen-VL.
- Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
This paper proposes Token-Guard, a token-level hallucination control method based on self-checking decoding, which detects and suppresses hallucinations during decoding via token-level/segment-level scoring in the hidden space and an iterative refinement mechanism, achieving an average F1 improvement of 16.3%.
- VeriTrail: Closed-Domain Hallucination Detection with Traceability
This paper proposes VeriTrail, the first closed-domain hallucination detection method that provides traceability for multi-generative-step (MGS) processes. It models the generation process as a DAG and performs layer-by-layer verification along its paths. The paper also introduces the first MGS datasets that include all intermediate outputs with human annotations.
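To make the DAG-tracing idea concrete, here is a minimal sketch that walks from a final output back toward its sources, following only the inputs that support a given claim. It is not VeriTrail's actual procedure: `trace_claim`, the `parents` and `texts` dictionaries, and the `supports` callable (a hypothetical stand-in for whatever verifier is used) are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

def trace_claim(
    claim: str,
    final_node: str,
    parents: Dict[str, List[str]],
    texts: Dict[str, str],
    supports: Callable[[str, str], bool],
) -> Tuple[bool, List[str]]:
    """Trace a claim from the final output back through the generation DAG.

    `parents` maps each node to the nodes it was generated from; source
    documents have no parents. Returns (grounded, trail), where `trail`
    lists the nodes found to support the claim, one layer at a time.
    """
    trail = [final_node]
    frontier = [final_node]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in trail and supports(texts[parent], claim):
                    trail.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    # The claim is grounded only if the trail reaches a source node.
    grounded = any(not parents.get(node) for node in trail[1:])
    return grounded, trail
```

When `grounded` is false, the last node on the trail is the earliest point at which the claim appears without support from its inputs, which is the kind of localization that traceability is meant to provide.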