FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models

Conference: NeurIPS 2025
arXiv: 2510.11190
Code: github.com/ylhz/FlexAC
Area: Multimodal VLM
Keywords: hallucination control, creativity enhancement, associative reasoning, intermediate-layer intervention, steering vectors

TL;DR

FlexAC identifies that associative reasoning in MLLMs is primarily encoded in intermediate layers. By extracting steering vectors from hallucinated responses and injecting them into intermediate-layer representations at inference time, it enables flexible control over faithfulness and creativity—reducing hallucination rate by 29% (CHAIR) and improving creativity by 5.8× (Creation-MMBench), all without any training.

Background & Motivation

Background: Multimodal LLMs face an inherent tension between faithfulness (low association) and creativity (high association)—factual tasks require suppressing associative reasoning, while creative tasks require amplifying it.

Limitations of Prior Work: (1) Hallucination mitigation methods (e.g., contrastive decoding in VCD, preference optimization in Ha-DPO) broadly suppress associative capacity while reducing hallucinations, leading to degraded creativity (VDAT drops by 1.78); (2) No tunable mechanism exists—methods either fully suppress association or leave it unaddressed.

Key Challenge: Hallucination and creativity likely share the same associative mechanism, manifesting as either "harmful" or "beneficial" depending on the task, yet existing methods cannot distinguish between the two.

Goal: (1) Localize the layer at which associative behavior emerges in MLLMs; (2) Design a controllable mechanism for adjusting associative strength.

Key Insight: Drawing from cognitive science concepts of convergent thinking (fact-grounded association) and divergent thinking (atypical association), the paper hypothesizes that hallucination and creativity stem from a shared associative mechanism that can be regulated via directional interventions in intermediate-layer representations.

Core Idea: Hallucinated responses encode directional information about association. Extracting the representation difference between hallucinated and faithful outputs as a steering vector allows positive injection to enhance creativity and negative injection to suppress hallucination.

Method

Overall Architecture

The method proceeds in two phases: Phase I (offline) constructs general and task-specific associative steering vectors; Phase II (inference-time) injects steering vectors into intermediate layers with adaptive strength calibration. No retraining is required throughout.

Key Designs

  1. Intermediate-Layer Associative Behavior Analysis and Localization:

    • Function: Identify at which layers associative behavior emerges in the model.
    • Mechanism: Collect faithful responses \(f^{(n)}\) and hallucination-induced responses \(f^{(a)}\) for 1,000 COCO images, and compute cosine distance \(\mathcal{D}_\text{cos}\) and Euclidean distance \(\mathcal{D}_\text{Euc}\) layer by layer. A layer intervention experiment is further conducted—at layer \(m\), the associative feature is replaced with the non-associative counterpart \(f_m^{\text{modified}} = f_m^{(n)}\), and the effect on downstream layers is observed.
    • Key Findings: (1) Early layers (0–9) exhibit low distance, indicating shared low-level perception; (2) Cosine distance peaks at intermediate layers (10–15), indicating that associative direction is formed there; (3) Replacing intermediate-layer features substantially reduces downstream divergence, confirming that intermediate layers are the origin—not merely propagators—of associative behavior.
  2. Hallucination-Guided Associative Steering Vector Construction (Phase I):

    • Function: Extract directional vectors from hallucinated responses for association control.
    • Mechanism: For each sample, compute the directional difference at layer \(l\) as \(v_l = f_l^{(a)} - f_l^{(n)}\), select the Top-K sample pairs with the largest cosine distance, and average them to obtain a general steering vector: \(\mathcal{I} = \text{Top-K}(\mathcal{D}_\text{cos}(f_{l,i}^{(a)}, f_{l,i}^{(n)})); \quad v_l = \frac{1}{|\mathcal{I}|} \sum_{i \in \mathcal{I}} (f_{l,i}^{(a)} - f_{l,i}^{(n)})\)
    • Design Motivation: Top-K selection reduces noise; experiments show that randomly sampling 50 images from 2,000 suffices to construct an effective vector.
  3. Task-Specific Associative Vectors (Directional Integration):

    • Function: Construct dedicated steering directions for different creative tasks such as story generation and metaphor production.
    • Mechanism: GPT-4o is used to generate high-association outputs for the target task; the difference between their intermediate-layer features and those of the base model output defines a task-specific vector \(v_l^{\text{task}}\). At inference time, this is combined with the general vector: \(f_l^{\text{control}} = f_l + \alpha_\text{gen} \cdot v_l^{\text{gen}} + \alpha_\text{task} \cdot v_l^{\text{task}}\)
    • Design Motivation: Associative reasoning is multi-dimensional (event planning vs. literary creation require different associative directions), and a single vector is insufficient.
  4. Strength-adaptive Intervention Calibration (SIC):

    • Function: Prevent semantic drift caused by excessive steering.
    • Mechanism: The steering strength \(\alpha\) is adaptively adjusted based on the alignment between the current representation and the steering direction: \(\alpha = \text{sigmoid}\left(\max\left(-\frac{f_l \cdot v_l}{\|f_l\|\|v_l\|}, 0\right)\right)\) When the current representation is already aligned with the associative direction, \(\alpha\) is small (suppressing over-steering); when misaligned, \(\alpha\) is large (amplifying steering). Post-intervention normalization preserves feature scale: \(f_l^{\text{control}} \leftarrow f_l^{\text{control}} \cdot \frac{\|f_l\|}{\|f_l^{\text{control}}\|}\)
    • Design Motivation: Uniformly applying steering vectors causes excessive deviation for inputs that already exhibit strong associative tendencies.
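
The Phase I construction above (pairwise differences, Top-K selection by cosine distance, averaging) can be sketched as follows. This is a minimal illustration assuming the layer-\(l\) features have already been collected; the function and variable names (`build_steering_vector`, `f_assoc`, `f_faith`) are illustrative, not from the paper's code.

```python
import torch

def build_steering_vector(f_assoc: torch.Tensor,
                          f_faith: torch.Tensor,
                          top_k: int = 50) -> torch.Tensor:
    """f_assoc, f_faith: (N, d) layer-l features of hallucination-induced
    and faithful responses. Returns the general steering vector v_l."""
    # Cosine distance per sample pair: 1 - cos(f^(a), f^(n))
    cos_sim = torch.nn.functional.cosine_similarity(f_assoc, f_faith, dim=-1)
    cos_dist = 1.0 - cos_sim
    # Keep the Top-K pairs with the largest directional separation (noise reduction)
    idx = torch.topk(cos_dist, k=min(top_k, f_assoc.shape[0])).indices
    # Average the selected feature differences: v_l = mean(f^(a) - f^(n))
    return (f_assoc[idx] - f_faith[idx]).mean(dim=0)
```

Per the paper's observation that 50 images suffice, `top_k` defaults to 50 here; in practice the pool would be the COCO response pairs described above.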

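The SIC step (adaptive \(\alpha\) from the cosine between \(f_l\) and \(v_l\), followed by norm-preserving rescaling) can be sketched as below; `apply_sic` is an illustrative helper name, and the code assumes `f_l` holds one or more token features at the intervention layer.

```python
import torch

def apply_sic(f_l: torch.Tensor, v_l: torch.Tensor) -> torch.Tensor:
    """Inject v_l into f_l with adaptive strength, then renormalize.

    f_l: (..., d) features at layer l; v_l: (d,) steering vector.
    """
    # alpha = sigmoid(max(-cos(f_l, v_l), 0)): smaller when f_l already
    # points along the associative direction (avoid over-steering),
    # larger when it is misaligned.
    cos = torch.nn.functional.cosine_similarity(f_l, v_l, dim=-1)
    alpha = torch.sigmoid(torch.clamp(-cos, min=0.0))
    f_ctrl = f_l + alpha.unsqueeze(-1) * v_l
    # Post-intervention normalization preserves the original feature scale
    return f_ctrl * (f_l.norm(dim=-1, keepdim=True)
                     / f_ctrl.norm(dim=-1, keepdim=True))
```
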
Loss & Training

FlexAC is entirely training-free. It only requires offline construction of steering vectors (50 images + GPT-4o-generated samples) and injects them into intermediate layers at inference time.
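
Inference-time injection of this kind is commonly implemented with a forward hook on the chosen intermediate layer. The sketch below assumes a HuggingFace-style decoder whose layer modules return the hidden states first; the layer index, model path, and vector sources are placeholders, not the paper's actual configuration.

```python
import torch

def make_steering_hook(v_gen, v_task=None, a_gen=1.0, a_task=1.0):
    """Hook that adds the general (and optional task-specific) steering
    vector to a layer's hidden states, then renormalizes per token."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + a_gen * v_gen          # general associative direction
        if v_task is not None:
            steered = steered + a_task * v_task   # task-specific direction
        # Keep the original per-token feature scale
        steered = steered * (hidden.norm(dim=-1, keepdim=True)
                             / steered.norm(dim=-1, keepdim=True))
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return hook

# Usage (illustrative): negative a_gen suppresses hallucination,
# positive a_gen enhances creativity, e.g. on an intermediate layer:
# handle = model.model.layers[15].register_forward_hook(
#     make_steering_hook(v_gen, a_gen=-1.0))
```
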

Key Experimental Results

Main Results: Hallucination Benchmarks

| Model | Method | CHAIR_S ↓ | CHAIR_I ↓ | POPE F1 ↑ |
|---|---|---|---|---|
| Qwen-VL | Regular | 40.6 | 12.5 | 85.6 |
| Qwen-VL | VCD | 42.0 | 11.2 | 86.3 |
| Qwen-VL | FlexAC | 19.2 | 5.4 | 87.1 |
| LLaVA-1.5 | Regular | 50.8 | 14.3 | 86.5 |
| LLaVA-1.5 | Ha-DPO | 36.8 | 10.4 | 83.9 |
| LLaVA-1.5 | FlexAC | 36.6 | 10.4 | 87.9 |
| DeepSeek-VL2 | Regular | 32.6 | 9.2 | 88.5 |
| DeepSeek-VL2 | FlexAC | 28.6 | 8.1 | 88.6 |

Creativity Benchmarks

| Method | VDAT (Qwen) | VDAT (LLaVA) | Creation-MMBench Reward |
|---|---|---|---|
| Regular | 84.85 | 86.89 | 0.00 |
| Ha-DPO | — | 85.11 ↓ | — |
| VCD | 83.69 ↓ | 86.83 | −3.86 ↓ |
| FlexAC | 86.58 | 88.49 | 10.92 |

FlexAC is the only method that simultaneously improves both faithfulness and creativity; all other methods either degrade creativity when reducing hallucinations, or fail to substantially improve either.

Ablation Study

| Configuration | CHAIR_S ↓ | VDAT ↑ |
|---|---|---|
| FlexAC-P (full, α = −1) | 19.2 | — |
| FlexAC-C (full, α = 1) | — | 86.58 |
| FlexAC − IS − SIC | 30.4 | 85.05 |
| FlexAC − DI | ~20 | 85.8 |
| Regular | 40.6 | 84.85 |

Key Findings

  • Intermediate layers (Qwen: 15–17, LLaVA: 11–13, DeepSeek: 4–6) are the optimal intervention points; interventions at earlier or later layers are only marginally effective.
  • Instance Selection and SIC contribute most to hallucination mitigation (removing both raises CHAIR_S from 19.2 to 30.4).
  • Directional Integration is critical for creativity enhancement, reflecting the multi-dimensional nature of associative reasoning.
  • FlexAC does not degrade general benchmarks (MME/MMMU/MMStar) and in fact yields gains, particularly on OCR tasks due to enhanced text–visual association.

Highlights & Insights

  • Unified View of Hallucination as Association: Framing "harmful hallucination" and "beneficial creativity" as a continuous spectrum of associative strength, and regulating both bidirectionally with the same mechanism, is a novel and elegant conceptual contribution.
  • Layer Intervention Methodology: The feature-replacement experiment precisely localizes the layers at which association originates (rather than propagates), offering a generalizable analytical paradigm for dissecting other model behaviors.
  • Adaptive Design of SIC: A simple cosine-angle threshold achieves sample-level dynamic strength control, preventing over-steering; its empirical necessity is confirmed by ablation results.

Limitations & Future Work

  • Requires white-box access to model intermediate layers, making it inapplicable to black-box API models such as ChatGPT.
  • Steering vectors are constructed from COCO images; whether reconstruction is necessary for domain-shifted scenarios (e.g., medical imaging) remains to be validated.
  • The VDAT metric is based on CLIP embedding semantic distance and may not fully capture human perception of creativity.
  • Validation is limited to 7B-scale models; the intermediate-layer dynamics of larger models may differ.
Comparison with Related Methods

  • vs. VCD (contrastive decoding): VCD contrasts clear and distorted inputs at the decoding level; FlexAC directly manipulates associative direction at the representation level, enabling more precise and bidirectional control.
  • vs. Ha-DPO (preference optimization): Ha-DPO requires additional training and irreversibly suppresses association; FlexAC is training-free and allows flexible switching.
  • vs. CAA (contrastive activation addition): FlexAC can be viewed as an extension of CAA to multimodal settings, augmented with SIC adaptive calibration and task-specific directional integration.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ The unified hallucination–creativity perspective is a genuinely novel insight; the training-free bidirectional control framework offers strong practical utility.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers 3 categories of benchmarks (hallucination / creativity / general), 7 benchmarks, and 3 models, with comprehensive ablations.
  • Writing Quality: ⭐⭐⭐⭐ The logical chain from analysis to findings to method is clear and well-illustrated.
  • Value: ⭐⭐⭐⭐⭐ Highly practical—training-free, plug-and-play, and effective, with direct implications for MLLM deployment.