👻 Hallucination Detection

📷 CVPR2026 · 33 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (40) · 💬 ACL2026 (28) · 🧪 ICML2026 (21) · 🤖 AAAI2026 (15) · 🧠 NeurIPS2025 (17) · 📹 ICCV2025 (5)

🔥 Top topics: Multimodal/VLM ×14 · LLM ×3 · Reasoning ×2

AdaIAT: Adaptively Increasing Attention to Generated Text to Alleviate Hallucinations in LVLM: Addressing the issue where "amplifying image attention suppresses hallucinations but leads to repetitive and wordy output," this paper discovers that real object tokens possess higher attention to the previously generated text \(T_p\) than hallucinated tokens. Consequently, the authors propose increasing attention specifically to \(T_p\) (IAT). By further employing layer-wise thresholds to control "when to intervene" and a head-wise amplification matrix to control "how much to amplify" (AdaIAT), they significantly reduce hallucination rates (CS/CI) on LLaVA-1.5, Janus-Pro, and Qwen2.5-VL with almost no loss in text diversity.
Beyond the Global Scores: Fine-Grained Token Grounding as a Robust Detector of LVLM Hallucinations: This paper proposes a patch-level LVLM hallucination detection framework, built on the finding that hallucinated tokens exhibit dispersed attention patterns and low semantic alignment. Based on these signatures, it designs two lightweight metrics, Attention Dispersion Score (ADS) and Cross-modality Grounding Consistency (CGC), and reaches a detection accuracy of 90%.
CausalLens: Sensitivity-Guided Multi-Head Causal Intervention for Hallucination Mitigation in Large Vision-Language Models: CausalLens decomposes each attention head of the decoder into three pathways—"visual, text, and system prompt"—and identifies heads that truly attend to the image using a visual sensitivity score. By amplifying visual contributions and applying projection alignment corrections in a single forward pass within the middle layers (L10–L20), it significantly reduces hallucinations in Large Vision-Language Models without retraining or multiple decoding iterations.
COPO: Causal-Oriented Policy Optimization for Hallucinations of MLLMs: The authors discovered that MLLMs, when post-trained with GRPO (using only outcome rewards based on final answer correctness), tend to over-focus on image backgrounds, forming spurious "background \(\to\) answer" correlations that lead to hallucinations. They propose COPO, which calculates a "causal completeness" reward (sufficiency + necessity) for each reasoning token and injects it into the GRPO advantage function. This forces the model to reward only those tokens that truly determine the answer's correctness, consistently reducing hallucination rates across multiple benchmarks such as CHAIR and POPE.
Cross-Modal Attention Calibration for LVLM Hallucination Mitigation: To mitigate hallucinations in LVLMs, this paper proposes CMAC, a training-free cross-modal attention calibration framework. It uses the IMD module to perform "surgical" masking of high cross-modal weight value vectors in the attention layer to construct a more accurate hallucination distribution for contrastive decoding. Additionally, the CMPC module scales the position indices of image tokens to alleviate the position bias introduced by RoPE. CMAC consistently outperforms existing contrastive decoding methods across POPE, CHAIR, and MME.
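
Contrastive decoding, which CMAC and several other entries on this page build on, can be illustrated with a minimal sketch. It assumes the commonly used form in which log-probabilities from a hallucination-prone branch are subtracted from those of the original branch with weight `alpha`. The toy logits and the function names are hypothetical, and the IMD masking that CMAC uses to build the hallucination branch is not reproduced here.

```python
import numpy as np

def log_softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over a 1-D logit vector."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def contrastive_logits(logits_orig, logits_halluc, alpha=1.0):
    """Score tokens as (1 + alpha) * original - alpha * hallucination-prone branch."""
    return (1.0 + alpha) * log_softmax(logits_orig) - alpha * log_softmax(logits_halluc)

# Toy vocabulary (assumed): 0 = object in the image, 1 = object favoured by the
# language prior, 2 = filler token.
logits_orig = np.array([2.0, 2.2, 0.0])    # original branch slightly prefers token 1
logits_halluc = np.array([0.5, 2.5, 0.0])  # stand-in for the hallucination-prone branch
cd = contrastive_logits(logits_orig, logits_halluc)

print(int(np.argmax(logits_orig)), int(np.argmax(cd)))
```

In this toy case the prior-driven token wins under plain decoding, while the contrastive score prefers the token that the hallucination-prone branch supports less.
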
Envision, Attend, Then Respond: Counterfactual Hallucination Mitigation in Large Vision-Language Models: EnAR is a training-free framework that utilizes a diffusion model to generate a "visual impression" of what an input image "should look like." By comparing the visual attention differences between the original image and this impression, it identifies counterfactual elements (e.g., a five-legged alpaca). These tokens are then masked for contrastive decoding, forcing the LVLM to anchor its response on real pixels rather than linguistic priors. This approach achieves a 10.82% improvement on the counterfactual benchmark VLMBias and an average 6.9% gain on the general hallucination benchmark POPE.
Evaluating and Easing Hallucinations for GUI Grounding: This paper presents the first systematic study of hallucinations in GUI grounding, categorizing them into "confusion hallucinations" (misidentifying similar elements) and "fabrication hallucinations" (inventing non-existent coordinates). The authors construct GUI-HalluBench, a bilingual dataset with dual subsets, to diagnose the correlation between hallucinations and parsing capabilities. They propose a training-free Parsing-guided Prompt (PGP) and a Hallucination-aware Fine-Tuning (HFT) solution. Experiments demonstrate that stronger parsing leads to fewer hallucinations, with HFT yielding an absolute improvement of approximately 7%.
Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression: This paper proposes CIPHER, a training-free test-time hallucination suppression method. In the offline stage, a diffusion model generates counterfactual images to construct the OHC-25K dataset, from which a visual hallucination subspace is extracted via SVD. During inference, hidden states are projected onto the orthogonal complement of this subspace, significantly reducing visual hallucinations in LVLMs without modifying model parameters or increasing inference overhead.
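
The two steps described for CIPHER, extracting a subspace via SVD and projecting hidden states onto its orthogonal complement, can be sketched in NumPy. This is only an illustration: the matrix `diffs` is random stand-in data, which matrix the paper actually decomposes is an assumption, and the function names and the rank `k` are hypothetical.

```python
import numpy as np

def hallucination_basis(diffs: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k right singular vectors of `diffs` as orthonormal columns.

    `diffs` has shape (n_samples, hidden_dim). Here it stands in for whatever
    feature matrix is collected offline (an assumption).
    """
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k].T  # shape (hidden_dim, k)

def project_out(h: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project rows of `h` onto the orthogonal complement of span(basis)."""
    return h - (h @ basis) @ basis.T

rng = np.random.default_rng(0)
diffs = rng.normal(size=(32, 8))   # stand-in offline features
U = hallucination_basis(diffs, k=2)
h = rng.normal(size=(5, 8))        # stand-in hidden states at inference time
h_clean = project_out(h, U)

# After projection, the hidden states have no component inside the subspace.
print(np.abs(h_clean @ U).max())
```

Because the basis is fixed offline, the inference-time cost in this sketch is two small matrix products per hidden state.
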
Fine-Grained Multi-Image Object Hallucination Benchmark: MIOH is the first fine-grained object hallucination diagnostic benchmark designed for multi-image scenarios. It creates a matrix of "4 object tasks × 3 multi-image reasoning modes" resulting in 26 question types, further overlaid with three controllable adversarial pressures: "number of images / perceptual difficulty / contextual bias." Evaluations of 29 models reveal that even GPT-5 and Gemini-2.5-Pro achieve overall accuracies of only 63.1% and 64.4%, respectively, with a global average of only 36.1%. The study identifies that hallucinations primarily originate from the cross-image integration stage rather than simple perceptual failure.
FINER: MLLMs Hallucinate under Fine-grained Negative Queries: The study identifies a sharp increase in MLLM hallucination rates under fine-grained negative queries (queries with a single subtle error among multiple objects/attributes/relations). It proposes the FINER benchmark and the FINER-Tuning method (based on DPO), achieving a maximum improvement of 24.2% on InternVL3.5-14B.
First Logit Boosting: Visual Grounding Method to Mitigate Object Hallucination in Large Vision-Language Models: Addressing the "long-term decay" problem in Large Vision-Language Models (LVLMs)—where models increasingly detach from images and fabricate objects later in the generation process—this paper proposes First Logit Boosting (FLB). The method stores the logit of the first generated token and adds it back to the logits of each subsequent step with a weight that increases over time. FLB is training-free, requires no external models, uses only a single forward pass, and significantly reduces object hallucinations on CHAIR/AMBER benchmarks with almost no added inference overhead.
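
The FLB update described above (store the first token's logits, then add them back with a weight that grows over time) can be sketched as follows. The capped linear schedule in `flb_weight`, its parameters, and the toy logits are assumptions for illustration. The summary does not specify the actual schedule.

```python
import numpy as np

def flb_weight(step: int, rate: float = 0.1, cap: float = 1.0) -> float:
    """Hypothetical schedule: weight grows linearly with the step, up to a cap."""
    return min(rate * step, cap)

def first_logit_boost(first_logits: np.ndarray, step_logits: np.ndarray, step: int) -> np.ndarray:
    """Add the stored first-step logits back into the current step's logits."""
    return step_logits + flb_weight(step) * first_logits

first_logits = np.array([0.0, 0.0, 2.0, 0.0])  # stored once, at the first generated token
step_logits = np.array([1.0, 0.0, 0.5, 0.0])   # logits at some later decoding step

early = first_logit_boost(first_logits, step_logits, step=1)
late = first_logit_boost(first_logits, step_logits, step=10)
print(int(np.argmax(early)), int(np.argmax(late)))
```

With these toy values the boost is too weak to change the early prediction, but later in generation it pulls the choice back toward the token favoured by the first-step logits.
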
HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image Restoration: HalluGen utilizes diffusion posterior sampling combined with masked gradient guidance to proactively inject "controllable type, location, and severity" realistic hallucinations into image restoration results. This enables the creation of the first hallucination dataset with ground-truth labels (4350 brain MRIs), establishment of a benchmark, proposal of the hallucination-sensitive SHAFE metric, and training of a no-reference detector that generalizes to real restoration failures.
HulluEdit: Single-Pass Evidence-Consistent Subspace Editing for Mitigating Hallucinations in Large Vision-Language Models: HulluEdit is proposed as a single-pass, reference-free subspace editing framework. By decomposing hidden states into orthogonal visual evidence, conflicting prior, and residual uncertainty subspaces, it selectively suppresses hallucination patterns without interfering with visual grounding. It achieves SOTA hallucination mitigation performance on POPE and CHAIR benchmarks.
KVSmooth: Mitigating Hallucination in Multi-modal Large Language Models through Key-Value Smoothing: This paper proposes KVSmooth, a training-free, plug-and-play method that smooths the KV cache with an adaptive Exponential Moving Average (EMA) guided by attention row entropy. It suppresses the semantic drift and hallucinations triggered by sink tokens during decoding in Multi-modal Large Language Models (MLLMs). On LLaVA-1.5, it reduces CHAIR_S from 41.8 to 18.2 (a 56% reduction) while improving F1 from 77.5 to 79.2.
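
An entropy-guided EMA over key/value vectors might look like the sketch below. It assumes that a more diffuse attention row (higher entropy) triggers stronger smoothing. That direction, the `max_beta` parameter, and the function names are illustrative assumptions rather than details confirmed by the summary. Attention rows are assumed to have at least two entries.

```python
import numpy as np

def row_entropy(attn_row: np.ndarray) -> float:
    """Shannon entropy (in nats) of one attention row."""
    p = attn_row / attn_row.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def smooth_kv(new_kv: np.ndarray, ema_kv: np.ndarray, attn_row: np.ndarray,
              max_beta: float = 0.9) -> np.ndarray:
    """Blend the new key/value vector with its running average.

    The blend factor scales with normalised attention entropy (assumed direction).
    """
    norm_ent = row_entropy(attn_row) / np.log(len(attn_row))
    beta = float(np.clip(max_beta * norm_ent, 0.0, max_beta))
    return beta * ema_kv + (1.0 - beta) * new_kv

ema_kv = np.zeros(4)
new_kv = np.ones(4)
focused = smooth_kv(new_kv, ema_kv, np.array([1.0, 0.0, 0.0, 0.0]))  # near-zero entropy
diffuse = smooth_kv(new_kv, ema_kv, np.full(4, 0.25))                # maximal entropy
print(focused, diffuse)
```

In this sketch a sharply focused row leaves the new vector untouched, while a uniform row keeps only a fraction `1 - max_beta` of it.
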
Locate-then-Sparsify: Attribution Guided Sparse Strategy for Visual Hallucination Mitigation: The authors propose the LTS-FS (Locate-Then-Sparsify for Feature Steering) framework, which utilizes a causal intervention attribution method to locate hallucination-related layers. It applies layer-wise sparse control of feature steering intensity based on attribution scores, effectively mitigating LVLM hallucinations while preserving model generalization.
Lyapunov Probes for Hallucination Detection in Large Foundation Models: (M)LLMs are viewed as high-dimensional dynamical systems evolving in representation space, and "hallucinations" are redefined as cases where inputs fall into unstable knowledge boundary regions rather than stable equilibrium points. A lightweight probe network with a Lyapunov monotonic decay constraint (taking multi-layer hidden states and perturbation information as input) is used for discrimination, achieving AUPRC scores that consistently outperform ordinary probes by 4–8% across multiple LLMs/MLLMs.
MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models: To address "cross-modal hallucinations" in audio-visual large language models—where one modality incorrectly influences the generation of another—this paper proposes MAD (Modality-Adaptive Decoding). MAD is a training-free method that first extracts modality weights by having the model identify which modality is required for a question, then uses these weights to adaptively weight a four-way contrastive decoding branch. This suppresses interference from irrelevant modalities, improving overall accuracy on CMM/AVHBench by several percentage points compared to baselines like AVCD.
Mitigating Multimodal Hallucinations via Gradient-based Self-Reflection: This paper proposes GACD (Gradient-based Influence-Aware Constrained Decoding), which uses first-order Taylor gradients to estimate the influence of each token on the output. At inference time, it simultaneously mitigates multimodal hallucinations caused by text-visual bias and co-occurrence bias, without requiring auxiliary models or fine-tuning.
MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization: MoD-DPO (Modality-Decoupled DPO) is proposed to decouple the contributions of various modalities in Omni LLMs through three mechanisms: invariance regularization, sensitivity regularization, and language prior debiasing. It effectively mitigates cross-modal hallucinations (such as using auditory information to answer visual questions) and derives a closed-form optimal policy.
One Token, Two Fates: A Unified Framework via Vision Token Manipulation Against MLLMs Hallucination: This paper redefines MLLM object hallucination as a "vision-language imbalance" problem and proposes a training-free framework that manipulates vision tokens solely in the intermediate representation layer. It enhances visual signals using vision tokens from augmented images (SVC) while constructing negative samples in the latent space using pruned vision tokens to purify internal model bias (CRC). On LLaVA-1.5, it improves average POPE accuracy by approximately 2% (absolute), with only a 1.06× latency overhead during inference.
PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models: This study discovers that when Large Vision-Language Models (LVLMs) generate object hallucinations, they tend to "ignore the image and instead rely on previously generated tokens (prelim)". Based on this, it proposes the Prelim Attention Score (PAS), a training-free method requiring no additional forward passes. By summing the attention weights of prelim tokens as the hallucination score, PAS achieves SOTA object hallucination detection performance across multiple models and datasets.
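
The PAS idea of summing attention weights on prelim tokens can be sketched in a few lines. The layout of the attention row (image and prompt positions first, previously generated tokens from `prelim_start` onward) and the averaging over heads are assumptions made for this illustration.

```python
import numpy as np

def prelim_attention_score(attn: np.ndarray, prelim_start: int) -> float:
    """Sum attention mass on prelim positions per head, then average over heads.

    `attn` has shape (num_heads, seq_len), and each row is the current token's
    attention over all earlier positions (assumed layout).
    """
    return float(attn[:, prelim_start:].sum(axis=-1).mean())

# Two heads over four positions. Positions 2 and 3 are previously generated tokens.
attn = np.array([
    [0.7, 0.1, 0.1, 0.1],  # head that mostly looks at the image/prompt
    [0.1, 0.1, 0.4, 0.4],  # head that mostly looks at the prelim tokens
])
score = prelim_attention_score(attn, prelim_start=2)
print(score)
```

A higher score means the token leaned more on earlier generated text, which the study associates with hallucinated objects. The attention weights are already produced during generation, so no extra forward pass is needed.
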
Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models: PTI shifts steering intervention to mitigate LVLM hallucinations from the "token-by-token decoding phase" forward to the "one-time prefill phase." By applying modality-aware and key/value-decoupled steering vectors to the initial KV cache, it corrects hallucination-prone representations at the source. It outperforms existing decoding-time methods across three LVLMs and five benchmarks and is compatible with them as a plug-and-play enhancement.
Reallocating Attention Across Layers to Reduce Multimodal Hallucination: A lightweight, training-free plugin method is proposed to alleviate hallucinations in Multimodal Large Reasoning Models (MLRMs) by identifying perceptual and reasoning attention heads and applying Class-Conditioned Rescaling. This approach rebalances cross-layer attention allocation, achieving an average improvement of 4.2% across five benchmarks with almost no additional inference overhead.
Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware Residual Guidance: Residual Decoding (ResDec) is proposed—a training-free, plug-and-play decoding strategy that discovers the semantic anchoring stage by analyzing U-shaped JSD patterns in historical token logit distributions. It effectively suppresses language prior hallucinations in LVLMs by aggregating logits from this stage as residual guidance for current decoding, incurring nearly zero additional inference overhead.
Same Attention, Different Truths: Put Logit-Lens over Visual Attention to Detect and Mitigate LVLM Object Hallucination: This paper revisits LVLM object hallucination using Logit-Lens and discovers that the "attention intensity" for real and hallucinated objects is nearly identical in mid-to-late layers. The key issue is not "how much" the model looks, but whether the high-attention regions decode into the target token. Based on this, hallucinations are categorized into "Visual Uncertainty" and "Contextual Prior." A training-free "Detect-and-Mitigate" framework (LLCC detection + HARM masking + VEED decoding enhancement) is proposed, achieving SOTA on multiple hallucination benchmarks.
SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models: SVHalluc is the first benchmark to systematically evaluate whether audio-visual large models can align speech content with corresponding visual signals. By designing 3 coarse-to-fine tasks for both semantic and temporal dimensions (6 tasks, 2405 samples total), experiments reveal that current open-source audio-visual LLMs perform near random guessing on most tasks, while Gemini 2.5 Pro leads significantly—the root cause is not poor unimodal perception, but a lack of cross-modal integration capability.
Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention: The authors propose Vision-Guided Attention (VGA), a training-free method that leverages the semantic features of visual tokens to construct precise visual localization. It guides the model's attention to relevant visual regions, effectively mitigating hallucinations in MLLMs while maintaining compatibility with FlashAttention.
Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding: This paper discovers that hallucinations in Multimodal Large Reasoning Models (MLRMs) are highly concentrated around transition words like because/however/wait, which correspond to high-entropy (high-uncertainty) steps. Consequently, a training-free LEAD decoding strategy is proposed: at high-entropy steps, the single sampled token is replaced with "probability-weighted continuous embeddings" to preserve multiple reasoning hypotheses and inject visual anchors to reinforce visual grounding. At low-entropy steps, the model reverts to standard discrete decoding, consistently reducing hallucinations across multiple MLRMs and benchmarks.
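
The entropy gate described for LEAD can be sketched as follows: above a threshold, the next input is a probability-weighted mix of token embeddings, and below it, a single token's embedding is used. Greedy selection replaces sampling here to keep the example deterministic, the threshold value is arbitrary, and the visual-anchor injection is omitted. All of these are simplifying assumptions.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p: np.ndarray) -> float:
    return float(-(p * np.log(p + 1e-12)).sum())

def next_input_embedding(logits: np.ndarray, emb_table: np.ndarray, threshold: float) -> np.ndarray:
    """Return the embedding fed to the next decoding step."""
    p = softmax(logits)
    if entropy(p) > threshold:
        # High uncertainty: keep several hypotheses alive as a soft mixture.
        return p @ emb_table
    # Low uncertainty: ordinary discrete decoding (greedy in this sketch).
    return emb_table[int(np.argmax(p))]

emb_table = np.eye(3)  # toy embedding table: one-hot rows for a 3-token vocabulary
confident = next_input_embedding(np.array([10.0, 0.0, 0.0]), emb_table, threshold=0.5)
uncertain = next_input_embedding(np.zeros(3), emb_table, threshold=0.5)
print(confident, uncertain)
```
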
TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection: This paper proposes TriDF, the first benchmark to comprehensively evaluate interpretable DeepFake detection across three dimensions: Perception, Detection, and Hallucination. Comprising 55K high-quality samples covering 16 DeepFake types and 3 modalities, it reveals the tripartite coupling relationship where accurate perception is the foundation of reliable detection, but hallucination severely undermines decision-making.
Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models: This paper systematically analyzes the root causes of hallucinations in Multimodal CoT (MCoT) models. It discovers that hallucinations most frequently occur during reasoning steps involving associative free play (termed "divergent thinking"). Consequently, the authors propose a training-free detection and decoding intervention strategy based on visual entropy. This approach reduces CHAIR_S by over 30% on Object HalBench while maintaining or even enhancing general reasoning capabilities.
Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models: This paper proposes the Hallucination-as-Cue analysis framework to systematically investigate the actual mechanism of RL post-training in multimodal reasoning models through three modality-specific corruption strategies (Blank Image, Random Image, and Textual Removal). It finds that GRPO training significantly improves reasoning performance even under 100% corrupted visual input, challenging the mainstream assumption that "RL training effectively utilizes visual information."
VES-RFT: Rewarding Visual Evidence Sensitivity to Mitigate Hallucinations in Large Vision-Language Models: VES-RFT defines the "change in model decision entropy before and after providing an image" as a label-free Visual Evidence Sensitivity (VES) reward. Combined with a verifiable reward that automatically checks whether generated objects actually exist in the image, the model is jointly optimized using critic-free GRPO. This allows the VLM to learn to be "confident because it saw the image" rather than "blindly confident based on language priors," significantly reducing object hallucinations on POPE / CHAIR / AMBER with minimal training data and zero additional inference overhead.
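
The "change in decision entropy before and after providing an image" can be made concrete with a small sketch. The sign convention (entropy without the image minus entropy with it, so that a positive value means the image made the model more certain) and the toy distributions are assumptions. The summary does not state how the reward is scaled or combined with the verifiable reward.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy (in nats) of a distribution over candidate answers."""
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def ves_reward(p_without_image: np.ndarray, p_with_image: np.ndarray) -> float:
    """Entropy drop caused by showing the image (assumed sign convention)."""
    return entropy(p_without_image) - entropy(p_with_image)

p_without = np.array([0.5, 0.5])   # yes/no answer from language priors alone
p_with = np.array([0.99, 0.01])    # the same question once the image is visible
reward = ves_reward(p_without, p_with)
print(reward)
```

Under this convention, a model that is equally confident with and without the image receives a reward near zero, which matches the stated aim of discouraging confidence that comes only from language priors.
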
Zina: Multimodal Fine-grained Hallucination Detection and Editing: Zina proposes a multimodal fine-grained hallucination detection and editing task, designing a two-stage system (detector MLLM + reviewer MLLM) that delegates token copying to a deterministic function to simplify the model burden. Additionally, the VisionHall dataset is constructed (6.9K manual annotations + 20K graph-structured synthetic data), exceeding GPT-4o by 15.8 points in detection F1.