👻 Hallucination Detection

📹 ICCV2025 · 5 paper notes

📌 Same area in other venues: 📷 CVPR2026 (33) · 🔬 ICLR2026 (40) · 💬 ACL2026 (28) · 🧪 ICML2026 (21) · 🤖 AAAI2026 (15) · 🧠 NeurIPS2025 (17)

ChartCap: Mitigating Hallucination of Dense Chart Captioning: This work constructs ChartCap, a large-scale dataset of 565K real chart–caption pairs. It uses type-specific caption schemas that exclude irrelevant information and emphasize structure and key insights, and it introduces a reference-free evaluation metric, the Visual Consistency Score (VCS), to mitigate hallucination in VLM-based chart captioning.
DASH: Detection and Assessment of Systematic Hallucinations of VLMs: This paper proposes DASH, a fully automated pipeline that systematically discovers clusters of false-positive object hallucinations in VLMs via two complementary strategies: LLM-based text query generation (DASH-LLM) and optimization-based image query generation with a diffusion model (DASH-OPT). Applied to ReLAION-5B, DASH uncovers 19k+ clusters and 950k+ images, which are used to construct DASH-B, a more challenging benchmark.
Mitigating Object Hallucinations via Sentence-Level Early Intervention: This paper proposes SENTINEL, a framework grounded in the key observation that hallucinations emerge early in generation and propagate forward. By combining in-domain candidate bootstrapping with dual-detector cross-validation to construct sentence-level preference data, and employing Context-aware DPO (C-DPO) for early intervention, SENTINEL reduces hallucinations on Object HalBench by 92% while preserving general capabilities.
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models: This paper proposes ONLY, a training-free decoding method that intervenes at a single layer. It selects text-biased attention heads via the Text-to-Visual Entropy Ratio (TVER) to generate textually enhanced logits, which are then combined with the original logits through adaptive contrastive or collaborative decoding. With only 1.07× inference overhead, ONLY outperforms VCD/M3ID by 3.14% on POPE and reduces CHAIR_S by 6.2 points on CHAIR.
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context: This work investigates why hallucinations are frequent in long-form LVLM generation and shows that the cause is not length itself but the demands of contextual coherence and completeness, which push the model to extrapolate and hallucinate. Based on this insight, the authors propose HalTrapper, a three-stage "induce-detect-suppress" framework.
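
Several of the notes above (ONLY, and the VCD baseline it compares against) rely on contrastive decoding, where logits from a deliberately biased forward pass are subtracted from the original logits. The following is a minimal sketch of a generic contrastive step with a plausibility cutoff, in the style of VCD. It is not ONLY's adaptive rule, which the summary does not specify. The function names and the parameters `alpha` and `beta` are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(original, biased, alpha=1.0, beta=0.1):
    """Generic contrastive decoding step (illustrative, not any paper's exact rule).

    original: logits from the normal forward pass
    biased:   logits from a pass that amplifies the unwanted prior
              (for example, a text-biased pass)
    alpha:    contrast strength (assumed hyperparameter)
    beta:     plausibility cutoff relative to the top original probability
              (assumed hyperparameter)
    """
    probs = softmax(original)
    cutoff = beta * max(probs)
    adjusted = []
    for logit_orig, logit_biased, prob in zip(original, biased, probs):
        if prob < cutoff:
            # Tokens the original model finds implausible are masked out,
            # so the subtraction cannot promote them.
            adjusted.append(float("-inf"))
        else:
            adjusted.append((1 + alpha) * logit_orig - alpha * logit_biased)
    return adjusted


original = [2.0, 1.0, -5.0]   # toy vocabulary of three tokens
biased = [0.5, 2.0, -5.0]     # the biased pass favors token 1
adjusted = contrastive_logits(original, biased)
next_token = max(range(len(adjusted)), key=adjusted.__getitem__)
```

In this toy case the biased pass prefers token 1, so the contrast widens the margin in favor of token 0, while the very unlikely token 2 is masked by the cutoff.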