🔗 Causal Inference
📷 CVPR2026 · 4 paper notes
- Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression
This paper proposes CIPHER, a training-free test-time hallucination suppression method. It uses a diffusion model to generate counterfactual images that are semantically altered yet structurally preserved, applies SVD to the differences between the LVLM hidden-layer representations of original and counterfactual images to extract a hallucination subspace, and then projects hidden states onto the orthogonal complement of this subspace during inference. The authors present CIPHER as the first method to localize and mitigate LVLM hallucinations by intervening on the visual modality.
- Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression
This paper proposes CIPHER, a training-free test-time hallucination suppression method. In an offline phase, a diffusion model generates counterfactual images to build the OHC-25K dataset, from which a visual hallucination subspace is extracted via SVD. During inference, hidden states are projected onto the orthogonal complement of this subspace, significantly reducing visual hallucinations in LVLMs without modifying model parameters or incurring additional inference overhead.
- MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
This paper proposes MaskDiME, a training-free diffusion framework that transforms global classifier guidance into decision-driven local editing via an adaptive dual-mask mechanism, enabling precise and efficient visual counterfactual explanations. MaskDiME runs inference more than 30× faster than DiME while requiring only one-tenth the GPU memory of ACE/RCSB.
- Retrieving Counterfactuals Improves Visual In-Context Learning
This paper proposes CIRCLES, a framework that retrieves counterfactual demonstrations via attribute-guided composed image retrieval. It constructs dual-channel in-context demonstrations that combine causality and correlation, substantially improving fine-grained visual reasoning in VLMs.
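
The subspace-projection step described in the CIPHER notes above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `hallucination_subspace` and `project_out`, the subspace rank `k`, the synthetic hidden states, and the choice to run SVD on raw (uncentered) differences are all assumptions made for illustration. The notes do not specify which layers are used or how the rank is chosen.

```python
import numpy as np


def hallucination_subspace(h_orig, h_cf, k):
    """Return an orthonormal basis (d, k) for the top-k directions of the
    original-vs-counterfactual representation differences.
    Assumption: SVD is applied to raw, uncentered differences."""
    diffs = h_cf - h_orig                      # shape (n_pairs, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k].T                            # top-k right singular vectors


def project_out(h, basis):
    """Project hidden states onto the orthogonal complement of span(basis)."""
    return h - (h @ basis) @ basis.T


# Synthetic stand-in for LVLM hidden states (hypothetical sizes).
rng = np.random.default_rng(0)
d, n, k = 16, 200, 2

# Plant a known 2-D "hallucination" subspace in the differences.
true_dirs = np.linalg.qr(rng.normal(size=(d, k)))[0]   # (d, k), orthonormal
h_orig = rng.normal(size=(n, d))
shift = 3.0 * rng.normal(size=(n, k)) @ true_dirs.T    # structured difference
h_cf = h_orig + shift + 0.01 * rng.normal(size=(n, d)) # plus small noise

basis = hallucination_subspace(h_orig, h_cf, k)
h_clean = project_out(h_orig, basis)
```

In this sketch the basis would be computed once offline, so the only work at inference time is the two matrix products in `project_out`. After projection, `h_clean` has no component along the estimated subspace, while components orthogonal to it are left unchanged.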