
Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats

Conference: NeurIPS 2025 | arXiv: 2511.17254 | Code: https://github.com/SooLab/AllPath | Area: Multimodal VLM / Hallucination Mitigation / Attention Head Intervention | Keywords: hallucination, attention head intervention, causal path, multi-path framework, training-free

TL;DR

This paper proposes AllPath, a multi-path hallucination intervention framework grounded in the Transformer causal architecture. It is the first to demonstrate that hallucinations in LVLMs do not stem from a single causal path but from the interaction of three paths — image-to-input-text, image-to-output-text, and text-to-text — and that models adaptively rely on different paths depending on the question-answer alignment format. By designing lightweight key-head identification methods for each path and performing adaptive intervention, AllPath consistently reduces hallucinations across four benchmarks covering different alignment formats: POPE, MCQ-POPE, CHAIR, and MME.

Background & Motivation

Background: LVLM hallucination mitigation methods fall broadly into two categories — contrastive decoding methods (VCD, ICD) that reduce language bias by calibrating output distributions, and intervention methods (PAI, AD-HH) that directly manipulate attention weights/heads to enhance visual grounding or suppress text-dominant behavior.

Limitations of Prior Work: Each existing method typically intervenes on only a single causal path — PAI targets only the image→output-text path, while AD-HH targets only the input-text→output-text path — causing each to excel on only a subset of benchmarks (PAI performs well on CHAIR but moderately on POPE; VCD exhibits the opposite pattern).

Key Challenge: Hallucinations arise not from a single path but from the interaction of multiple paths. More critically, LVLMs rely on different causal paths for different question-answer formats (binary / multiple-choice / open-ended description), making single-path intervention insufficient to cover all scenarios.

Key Insight: Starting from the Transformer causal architecture, the paper systematically analyzes all possible information propagation paths and designs targeted key-head identification and intervention methods for each.

Core Idea: A multi-path framework with adaptive path-selection intervention, requiring only a single forward pass to score all heads.

Method

Overall Architecture

AllPath proceeds in three steps: (1) identify text-to-text (T2T) and image-to-text (I2T) key attention heads using two lightweight methods; (2) analyze the roles of these heads across different paths and alignment formats; (3) adaptively select intervention paths based on question type, amplifying positive heads and suppressing negative ones.

Key Designs

  1. Text-to-Text Head Identification (LPI Score):

    • Function: Quantifies the degree to which each attention head promotes hallucinated vs. non-hallucinated tokens.
    • Mechanism: Defines the Log Probability Increase (LPI) score \(\text{logProb}_{\uparrow}^{(l,n)}(\mathcal{B}_t) = \log\sum_{b\in\mathcal{B}_t}\mathbb{P}(b\mid h_t^{(l-1)}+H_t^{(l,n)}) - \log\sum_{b\in\mathcal{B}_t}\mathbb{P}(b\mid h_t^{(l-1)})\), computed separately over the non-hallucination set \(\mathcal{B}_t^+\) (giving \(S_{\text{T2T}}^{(l,n),+}\)) and the hallucination set \(\mathcal{B}_t^-\) (giving \(S_{\text{T2T}}^{(l,n),-}\)); the T2T score is then \(S_{\text{T2T}}^{(l,n)} = S_{\text{T2T}}^{(l,n),+} - S_{\text{T2T}}^{(l,n),-}\), where a lower value indicates a stronger tendency to promote hallucinations.
    • Design Motivation: Compared to zero-out strategies (which require a full forward pass per head) and training-based methods (which require annotated data), LPI scores all heads in a single forward pass.
  2. Image-to-Text Head Identification (I2T Score):

    • Function: Identifies which attention heads exhibit semantically aligned attention patterns over visual tokens.
    • Mechanism: Target tokens \(\mathcal{T}_{\text{I2T}}\) (first occurrences of object words) are partitioned into those present in the image \(\mathcal{T}_{\text{I2T}}^+\) and those absent \(\mathcal{T}_{\text{I2T}}^-\). For \(\mathcal{T}_{\text{I2T}}^+\), the total attention of the head over the corresponding region \(M_r\) is computed; for \(\mathcal{T}_{\text{I2T}}^-\), the total attention over the entire image is used. \(S_{\text{I2T}}^{(l,n)} = S_{\text{I2T}}^{(l,n),+} - S_{\text{I2T}}^{(l,n),-}\).
    • Design Motivation: A good I2T head should concentrate attention on the relevant region when the object is present, and distribute attention diffusely when the object is absent.
  3. Key Finding — Strong Correlation between T2T Heads and Alignment Format:

    • T2T heads are highly correlated across different benchmarks sharing the same alignment format (e.g., Yes/No), with \(\rho=0.82\), but correlation drops sharply across formats (Yes/No vs. MCQ), with \(\rho=0.12\).
    • This indicates that T2T heads primarily govern instruction following and format alignment rather than visual understanding.
  4. Key Finding — Independence of Image-to-Input-Text and Image-to-Output-Text Paths:

    • Although both types of I2T heads attend to visual content more than average heads, they exhibit almost no correlation with each other.
    • This implies that intervening solely on the output path is insufficient; heads on the input path also require intervention.
  5. Adaptive Intervention Strategy:

    • Function: Selects the combination of paths to intervene based on question type, applying different scaling factors to selected heads.
    • Mechanism: The top-\(\xi\) and bottom-\(\xi\) T2T heads (\(Z_{\text{T2T}}^+\), \(Z_{\text{T2T}}^-\)) and top-\(\zeta\) I2T heads (\(Z_{\text{I2T}}^+\)) are selected. The MHA output is modified as \(\tilde{H}_{\leq t}^{(l)} = \sum_n \lambda^{(l,n)} H_{\leq t}^{(l,n)}\), where \(\lambda^{(l,n)} = \gamma^+\) if \((l,n)\in Z^+\), \(\gamma^-\) if \((l,n)\in Z^-\), and \(1\) otherwise.
    • Default settings: \(\gamma^+=2.0\), \(\gamma^-=0.0\) (negative heads are directly zeroed out).
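The LPI-based T2T scoring in design 1 can be sketched as follows. This is a minimal NumPy sketch under assumed tensor shapes; function and variable names are illustrative, not from the paper's code:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def lpi(h_prev, head_out, unembed, token_ids):
    """Log Probability Increase of a token set B_t when one head's output
    H_t^{(l,n)} is added back onto the residual stream h_t^{(l-1)}.
    h_prev, head_out: (d,); unembed: (V, d); token_ids: token ids in B_t."""
    p_base = softmax(unembed @ h_prev)[token_ids].sum()
    p_plus = softmax(unembed @ (h_prev + head_out))[token_ids].sum()
    return np.log(p_plus) - np.log(p_base)

def t2t_score(h_prev, head_out, unembed, pos_ids, neg_ids):
    """S^+ - S^-: lower values flag heads that promote hallucinated tokens."""
    return (lpi(h_prev, head_out, unembed, pos_ids)
            - lpi(h_prev, head_out, unembed, neg_ids))
```

Because `h_prev` and every head's output can be cached from one pass through the model, all heads are scored without re-running it, which is the efficiency claim behind the single-forward-pass design.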
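The I2T score of design 2 compares a head's attention mass on the annotated region \(M_r\) for present objects against its mass over the whole image for absent ones. A sketch under assumed data layouts (names are illustrative):

```python
import numpy as np

def i2t_score(attn, img_idx, region_mask, pos_toks, neg_toks):
    """attn: (T, T) attention map of one head; img_idx: indices of image tokens;
    region_mask[t]: boolean mask over image tokens marking object t's region M_r;
    pos_toks / neg_toks: positions of object words present in / absent from the image."""
    # S^+: attention concentrated on the ground-truth region for present objects
    s_pos = np.mean([attn[t, img_idx][region_mask[t]].sum() for t in pos_toks])
    # S^-: total attention spent on the image for absent (hallucinated) objects
    s_neg = np.mean([attn[t, img_idx].sum() for t in neg_toks])
    return s_pos - s_neg
```

A head that focuses on the right region when the object exists, and spreads attention thinly when it does not, scores high, which matches the stated design motivation.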
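Design 5's intervention amounts to reweighting per-head outputs before they are summed into the MHA output. A minimal sketch assuming a stacked (layers, heads, tokens, dim) tensor; a real implementation would hook the model's attention modules instead:

```python
import numpy as np

def intervene_mha(head_outs, pos_heads, neg_heads, gamma_pos=2.0, gamma_neg=0.0):
    """head_outs: (L, N, T, d) per-head outputs H^{(l,n)};
    pos_heads / neg_heads: sets of (layer, head) pairs in Z^+ / Z^-.
    Returns (L, T, d): sum_n lambda^{(l,n)} H^{(l,n)} for each layer."""
    lam = np.ones(head_outs.shape[:2])   # lambda^{(l,n)} = 1 by default
    for l, n in pos_heads:
        lam[l, n] = gamma_pos            # amplify positive heads
    for l, n in neg_heads:
        lam[l, n] = gamma_neg            # gamma^- = 0 zeroes out negative heads
    return (lam[:, :, None, None] * head_outs).sum(axis=1)
```

With the paper's defaults (\(\gamma^+=2.0\), \(\gamma^-=0.0\)), selected positive heads are doubled and negative heads are removed entirely, while all other heads pass through unchanged.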

Loss & Training

  • Training-free: The entire method requires no training; head identification is performed via a single forward pass at inference time, followed by scaling-based intervention.
  • Short-answer tasks: \(\xi=20, \zeta=10\); open-ended tasks: \(\xi=40, \zeta=50\).

Key Experimental Results

Main Results — POPE & MCQ-POPE & CHAIR

| Method | POPE-Random Acc | POPE-Advers. Acc | MCQ-POPE-Random Acc | CHAIR \(C_S\)↓ | CHAIR \(C_I\)↓ |
|---|---|---|---|---|---|
| Vanilla | 85.1 | 80.9 | 72.8 | 52.2 | 14.6 |
| VCD | 86.3 | 81.4 | 78.2 | 58.2 | 16.1 |
| PAI | 86.4 | 82.5 | 78.0 | 28.8 | 7.9 |
| AD-HH | 85.0 | 80.9 | 78.5 | 33.2 | 7.5 |
| AllPath | 87.2 | 82.8 | 80.5 | 26.6 | 7.2 |

MME Hallucination Subset

| Method | Existence↑ | Count↑ | Position↑ | Color↑ | Total↑ |
|---|---|---|---|---|---|
| Vanilla | 180.0 | 113.9 | 116.7 | 129.4 | 540.0 |
| VCD | 177.8 | 122.8 | 122.2 | 141.7 | 564.4 |
| PAI | 185.0 | 122.8 | 114.4 | 144.4 | 566.7 |
| AllPath | 188.3 | 126.1 | 132.2 | 153.3 | 600.0 |

Ablation Study — Number of Heads and Scaling Factors

| \(\xi\) (T2T heads) | \(\zeta\) (I2T heads) | POPE-Rand Acc | POPE-Advers. Acc |
|---|---|---|---|
| 0 | 10 | 86.2 | 81.6 |
| 20 | 0 | 86.4 | 82.5 |
| 20 | 10 | 87.2 | 82.8 |
| 30 | 10 | 88.3 | 82.9 |
| 20 | 15 | 88.2 | 83.0 |

Key Findings

  • VCD performs well on POPE but degrades on CHAIR (\(C_S\) worsens from 52.2 to 58.2), while PAI performs well on CHAIR but yields only limited improvement on POPE/MCQ-POPE; each intervenes on a single causal path, which validates the limitation of single-path intervention.
  • AllPath is the only method that consistently improves performance across all benchmarks spanning three distinct alignment formats.
  • Completely removing attention from output-text to image results in only ~2% degradation on POPE (indicating the text-to-text path is dominant for Yes/No formats) but a 10% drop on CHAIR (indicating the image-to-text path is critical for open-ended description).
  • Directly reusing the POPE head configuration for MME still yields consistent improvement, validating the generalizability of the method.

Highlights & Insights

  • The design of scoring all heads in a single forward pass is a core technical contribution, offering orders-of-magnitude efficiency gains over zero-out strategies that require one forward pass per head.
  • The finding that T2T heads are strongly correlated with alignment format has theoretical value — it explains why different methods perform inconsistently across benchmarks and provides structured guidance for future intervention method design.
  • The discovery that image-to-input-text and image-to-output-text paths are independent challenges the intuitive assumption that only the output path needs attention.
  • The plug-and-play nature of the method allows direct application to any Transformer-based LVLM.

Limitations & Future Work

  • Validation is primarily conducted on LLaVA-v1.5-7B; results on Qwen-VL and Qwen2.5-VL are included in the appendix but remain limited in scope.
  • The scaling factors \(\gamma^+, \gamma^-\) and head counts \(\xi, \zeta\) require manual tuning for different task formats.
  • The selection of T2T and I2T heads relies on annotated hallucinated/non-hallucinated tokens, necessitating a small labeled set.
  • Whether head re-identification is needed for entirely new question-answer formats (e.g., chain-of-thought) remains unexplored.
  • Whether higher-order path interactions (e.g., second-order effects) exist in multimodal large models warrants further investigation.

Comparison with Prior Methods

  • vs. VCD: VCD calibrates output distributions by contrasting distorted visual inputs, effectively intervening only on the image-to-output-text path, and even degrades on CHAIR; AllPath's multi-path intervention is more comprehensive.
  • vs. PAI: PAI enhances image attention (image-to-text path) and excels on CHAIR but yields limited improvement on POPE/MCQ-POPE; AllPath additionally covers the T2T path.
  • vs. AD-HH: AD-HH suppresses "lazy" text-dominant heads but addresses only the text-to-text path; AllPath demonstrates the necessity of simultaneously intervening across multiple paths.

Rating

  • Novelty: ⭐⭐⭐⭐ The multi-path framework is conceptually clear; the T2T/I2T head identification methods are concise and elegant; the finding that "format determines path" is valuable.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Four benchmarks spanning different alignment formats, correlation analysis, ablations, and multi-model validation — highly comprehensive.
  • Writing Quality: ⭐⭐⭐⭐ Logically clear; the narrative structure of analysis → finding → method → validation is well-organized.
  • Value: ⭐⭐⭐⭐ Training-free and applicable to diverse hallucination scenarios; provides meaningful insights into the internal mechanisms of LVLMs.