👻 Hallucination Detection
📹 ICCV2025 · 4 paper notes
- ChartCap: Mitigating Hallucination of Dense Chart Captioning
This work constructs ChartCap, a large-scale dataset of 565K real chart–caption pairs. By adopting type-specific caption schemas that exclude irrelevant information while emphasizing structure and key insights, and by introducing a reference-free Visual Consistency Score (VCS) evaluation metric, the paper effectively mitigates hallucination in VLM-based chart captioning.
- DASH: Detection and Assessment of Systematic Hallucinations of VLMs
This paper proposes DASH, a fully automated pipeline that systematically discovers clusters of false-positive object hallucinations in VLMs via two complementary strategies: LLM-based text query generation (DASH-LLM) and optimization-based image query generation with a diffusion model (DASH-OPT). Applied to ReLAION-5B, DASH uncovers more than 19k clusters and more than 950k images, which are then used to construct the more challenging DASH-B benchmark.
- Mitigating Object Hallucinations via Sentence-Level Early Intervention
This paper proposes SENTINEL, a framework grounded in the key observation that hallucinations emerge early in generation and propagate forward. By combining in-domain candidate bootstrapping with dual-detector cross-validation to construct sentence-level preference data, and employing Context-aware DPO (C-DPO) for early intervention, SENTINEL reduces hallucinations on Object HalBench by 92% while preserving general capabilities.
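For orientation, the preference-optimization step can be illustrated with the standard DPO objective applied to a single sentence-level pair. This is a minimal sketch and not the paper's C-DPO: the context-aware conditioning is not reproduced here, and the function name `dpo_loss`, the default `beta`, and the example log-probabilities are all assumptions made for illustration.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the log-probability of a whole sentence, i.e. the sum
    of its token log-probs, under the policy or the frozen reference model.
    """
    # How much more the policy prefers the chosen sentence than the reference does.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(beta * margin)) written as a numerically stable softplus.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Made-up numbers: the policy already favors the non-hallucinated sentence.
loss_good = dpo_loss(-10.0, -14.0, -12.0, -12.0)
# Made-up numbers: the policy favors the hallucinated sentence instead.
loss_bad = dpo_loss(-14.0, -10.0, -12.0, -12.0)
```

In this sketch the loss is lower when the policy assigns relatively more probability to the non-hallucinated (chosen) sentence than the reference model does, which is the pressure a sentence-level preference pair is meant to apply.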
- ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
This paper proposes ONLY, a training-free decoding method that intervenes in a single layer. It selects text-biased attention heads via the Text-to-Visual Entropy Ratio (TVER) to generate textually enhanced logits, which are then combined with the original logits through adaptive contrastive or collaborative decoding. With only 1.07× inference overhead, ONLY outperforms VCD/M3ID by 3.14% on POPE and reduces CHAIR_S by 6.2 points on CHAIR.
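The two ways of combining the original and text-enhanced logits can be sketched as follows. This is an illustrative sketch only: the combination formulas are the generic contrastive and additive forms, the caller picks the mode by hand instead of using the paper's adaptive rule, and `combine_logits`, `alpha`, and the toy logits are assumptions, not values from the paper.

```python
def combine_logits(original, text_enhanced, alpha=1.0, mode="contrastive"):
    """Combine per-token logits from the original and text-biased passes.

    Hypothetical helper: 'contrastive' pushes the output away from what the
    text-biased pass prefers; 'collaborative' pulls the output toward it.
    """
    if len(original) != len(text_enhanced):
        raise ValueError("logit vectors must have the same vocabulary size")
    if mode == "contrastive":
        return [(1 + alpha) * o - alpha * t
                for o, t in zip(original, text_enhanced)]
    if mode == "collaborative":
        return [o + alpha * t for o, t in zip(original, text_enhanced)]
    raise ValueError(f"unknown mode: {mode!r}")


def argmax(values):
    """Index of the largest value (greedy token choice)."""
    return max(range(len(values)), key=values.__getitem__)


# Toy 3-token vocabulary: token 0 is favored mainly by the language prior.
original = [2.0, 1.0, 0.5]
text_enhanced = [2.5, 0.2, 0.1]

contrastive = combine_logits(original, text_enhanced, mode="contrastive")
collaborative = combine_logits(original, text_enhanced, mode="collaborative")
```

With these toy numbers, the contrastive mode demotes token 0, which the text-biased pass over-prefers, while the collaborative mode reinforces it. Deciding which mode to apply at each step is the adaptive part of the method and is not modeled here.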