
Layer-Specific Fine-Tuning for Improved Negation Handling in Medical Vision-Language Models

Conference: ICML 2026
arXiv: 2602.12498
Code: https://github.com/healthylaife/NAST
Area: Multi-modal VLM / Medical Imaging / Interpretability-guided Training
Keywords: Medical CLIP, Negation Understanding, Causal Tracing, Layer-wise Fine-tuning, LoRA

TL;DR

NAST uses causal tracing to compute the Causal Tracing Effect (CTE) of each layer in the CLIP text encoder on negation understanding. These CTE scores then drive layer-wise, gradient-scaled LoRA fine-tuning, which markedly improves how well medical VLMs distinguish the presence from the absence of findings and narrows the affirmative-negation claim-accuracy gap from 21.6 to 4.2 percentage points.

Background & Motivation

Background: Medical VLMs such as MedCLIP, BioMedCLIP, and BioViL-T perform strongly on image-report alignment and zero-shot diagnosis, and have been explored for automated report generation, retrieval, and decision support.

Limitations of Prior Work: Negation is ubiquitous in radiology reports: "no pneumothorax," "no pleural effusion seen," "no consolidation in the right lower lobe." It is not just the absence of an object; it often operates on attributes ("no large effusion," "not right lower lobe consolidation"). Yet because medical VLMs are trained predominantly on affirmative descriptions during contrastive pre-training, negation remains a blind spot. Using controlled pairs of semantically equivalent affirmative and negated statements (e.g., "normal heart size" vs. "no cardiomegaly"), this study finds that all mainstream medical VLMs systematically prefer the affirmative phrasing and understand negation markedly worse.

Key Challenge: Simply adding negated samples for fine-tuning (the approach taken by NegCLIP, ConCLIP, and NegBench) provides only marginal relief. A likely reason is that negation signals are not spread uniformly across the model but concentrated in a few layers of the text encoder, so tuning all layers uniformly is inefficient and may degrade other capabilities.

Goal: (i) Provide a polarity-controlled diagnostic benchmark to disentangle poor negation understanding from poor adjective understanding; (ii) provide a fine-tuning dataset to inject "negation knowledge" into medical VLMs at the attribute level (existence/location/severity); (iii) use mechanistic interpretability tools to identify "which layers handle negation" and perform selective fine-tuning to improve negation capability while preserving non-negation performance.

Key Insight: Mechanistic interpretability tools (causal tracing, Meng et al.) are transferred from LLMs to the CLIP text encoder. The question of "which layer and which token is sensitive to negation" is transformed into a computable CTE score, which is then directly fed into the optimizer for layer-wise gradient scaling.

Core Idea: Calculate CTE via causal tracing → Normalize to layer weights \(\alpha_\ell\) → Scale gradients for each layer by \(\alpha_\ell^\beta\) during LoRA fine-tuning, concentrating training resources on the layers truly responsible for negation.
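The normalization and gradient-scaling steps of the Core Idea fit in a few lines of plain Python. This is a minimal illustration, not the authors' code: per-layer gradients are plain lists of floats, the names `cte_to_alphas` and `scale_gradients` are hypothetical, and falling back to uniform weights when all CTE scores are equal is an assumption.

```python
from typing import List


def cte_to_alphas(cte_per_layer: List[float]) -> List[float]:
    """Min-max normalize per-layer CTE scores to alpha_l in [0, 1]."""
    lo, hi = min(cte_per_layer), max(cte_per_layer)
    if hi == lo:
        # Degenerate case (assumption): every layer equally causal -> uniform weights.
        return [1.0] * len(cte_per_layer)
    return [(c - lo) / (hi - lo) for c in cte_per_layer]


def scale_gradients(grads: List[List[float]], alphas: List[float],
                    beta: float) -> List[List[float]]:
    """Apply g~_l = alpha_l ** beta * g_l layer by layer; larger beta concentrates more."""
    if len(grads) != len(alphas):
        raise ValueError("need one alpha per layer")
    return [[(a ** beta) * g for g in layer] for a, layer in zip(alphas, grads)]


# Hypothetical per-layer CTE values for a 4-layer encoder.
alphas = cte_to_alphas([1.0, 3.0, 5.0, 2.0])  # -> [0.0, 0.5, 1.0, 0.25]
scaled = scale_gradients([[1.0, -2.0]] * 4, alphas, beta=2.0)
# -> [[0.0, -0.0], [0.25, -0.5], [1.0, -2.0], [0.0625, -0.125]]
```

In PyTorch the same rule would typically be an in-place multiply of each LoRA parameter's gradient (e.g. `p.grad.mul_(scale)`) between `loss.backward()` and `optimizer.step()`. With plain SGD this is equivalent to a per-layer learning rate. With Adam-family optimizers such as AdamW, a constant per-layer gradient factor is largely cancelled because the first- and second-moment estimates scale together, so apart from layers with \(\alpha_\ell = 0\) and effects through \(\epsilon\) it barely changes the step size; how the released implementation handles this is worth checking in the repository.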

Method

Overall Architecture

NAST consists of three components: (i) the MedNega-CXR diagnostic benchmark, composed of affirmative-negated MCQ pairs generated by LLMs from MIMIC-CXR and reviewed by two radiologists; (ii) a contextual negation fine-tuning dataset built from CAD annotations, in which each structured fact \((\text{condition}, \text{existence}, \text{location}, \text{severity})\) undergoes a counterfactual perturbation of exactly one attribute, yielding approximately one million image-text pairs; (iii) the NAST algorithm, which first computes CTE for each layer and token position of the text encoder via causal tracing, then performs LoRA fine-tuning with layer-wise weighted gradient updates, minimizing a weighted sum of the contrastive loss and the claim-ranking loss.
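The one-attribute counterfactual perturbation behind the fine-tuning dataset can be illustrated with a small sketch. Everything here is hypothetical: the `Fact` record, the binary swap tables in `SWAPS`, and the `render` template stand in for the paper's CAD-derived vocabulary and radiology-style templates, which this summary does not spell out.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Fact:
    condition: str   # e.g. "pleural effusion"
    existence: str   # "present" | "absent"
    location: str    # e.g. "left" | "right"
    severity: str    # e.g. "small" | "large"


# Hypothetical swap tables; the real vocabulary comes from the CAD annotations.
SWAPS = {
    "existence": {"present": "absent", "absent": "present"},
    "location": {"left": "right", "right": "left"},
    "severity": {"small": "large", "large": "small"},
}


def counterfactuals(fact: Fact) -> List[Fact]:
    """One counterfactual per attribute, each differing from `fact` in exactly that attribute."""
    out = []
    for attr, table in SWAPS.items():
        value = getattr(fact, attr)
        if value in table:
            out.append(replace(fact, **{attr: table[value]}))
    return out


def render(fact: Fact) -> str:
    """Hypothetical radiology-style template."""
    if fact.existence == "absent":
        return f"No {fact.location} {fact.condition}."
    return f"{fact.severity.capitalize()} {fact.location} {fact.condition}."


def claim_set(fact: Fact) -> List[str]:
    """Correct claim first, followed by its single-attribute hard negatives."""
    correct = render(fact)
    negatives = []
    for cf in counterfactuals(fact):
        text = render(cf)
        # A swap can be invisible in text (e.g. severity of an absent finding): drop it.
        if text != correct and text not in negatives:
            negatives.append(text)
    return [correct] + negatives


claims = claim_set(Fact("pleural effusion", "present", "left", "small"))
# ['Small left pleural effusion.', 'No left pleural effusion.',
#  'Small right pleural effusion.', 'Large left pleural effusion.']
```

The deduplication step matters for absent findings: flipping the severity of "no right pneumothorax" produces the same sentence, which would otherwise appear as a "hard negative" identical to the correct claim.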

Key Designs

  1. MedNega-CXR Diagnostic Benchmark (polarity-controlled MCQ pairs):

    • Function: Directly compares description pairs that are "semantically equivalent but differ in polarity," isolating negation understanding from other confounding abilities.
    • Mechanism: Studies with \(\ge 2\) positive and \(\ge 3\) negative findings are selected from MIMIC-CXR/CheXpert. Affirmative equivalent descriptions are identified for each negative condition (e.g., "no cardiomegaly" \(\leftrightarrow\) "normal heart size"). The workflow involves: constructing contrastive label permutations (hard negatives) → LLM-generated explicit negation MCQs → LLM-based replacement of negation phrases with affirmative equivalents to maintain structure. This yields 6,965 MCQ pairs differing only in polarity.
    • Design Motivation: The medical domain offers a unique advantage—"no pneumonia" can be equivalently expressed as "lungs are well-aerated," allowing for clean contrastive pairs; in general domains, "no car" has no single affirmative equivalent. This controlled contrast ensures the evaluation focuses on negation understanding rather than adjective comprehension or visual perception.
  2. CAD-based Attribute-level Negation Fine-tuning Dataset:

    • Function: Ensures fine-tuning supervision covers realistic clinical negation forms across existence, location, and severity counterfactuals.
    • Mechanism: For each ground-truth fact \((c, e, l, s)\), a counterfactual is generated by altering one attribute (e.g., present \(\leftrightarrow\) absent, left \(\leftrightarrow\) right, small \(\leftrightarrow\) large), translated into natural language via radiology-style templates. Two supervision formats are used: (a) claim-based contrast sets with one correct claim and multiple hard negatives; (b) single negated captions for auxiliary contrastive training.
    • Design Motivation: Existing negation datasets (CC-Neg, NegBench) focus on object presence, lacking critical attribute-level negation for medicine. This study utilizes structured annotations and controlled perturbations to generate 1M pairs, providing sufficient scale and specificity.
  3. CTE-weighted Layer-wise LoRA Fine-tuning:

    • Function: Directly converts the interpretability-derived "layers handling negation" into "layers receiving more updates."
    • Mechanism: (i) Causal tracing is used as a causal probe on the CLIP text encoder. For length-matched pairs (correct caption, foil caption), the hidden states of the foil forward pass are recorded. During the correct caption forward pass, the \(p\)-th token of the \(\ell\)-th layer is replaced by the foil's hidden state to obtain \(S^{\ell,p}\). \(\text{CTE}(\ell, p) = (S^{\text{corr}} - S^{\ell,p}) / (S^{\text{corr}} - S^{\text{foil}})\). Results show negation signals concentrate in layers 1-4, peaking at layer 2. (ii) Token-level CTE is aggregated per layer to get \(\mathrm{CTE}_\ell\), then min-max normalized to \(\alpha_\ell \in [0,1]\). (iii) During LoRA fine-tuning, gradients are scaled as \(\tilde{g}_\ell = \alpha_\ell^\beta \cdot g_\ell\), where \(\beta\) controls concentration. Total loss: \(\mathcal{L}_{\text{total}} = \lambda \mathcal{L}_{\text{CLIP}} + (1-\lambda) \mathcal{L}_{\text{claim}}\).
    • Design Motivation: Uniform LoRA fine-tuning modifies all layers, which erodes pre-trained capabilities and dilutes negation learning. Concentrating updates on the layers actually responsible for negation is a more efficient and safer way to inject this knowledge. Scaling gradients by \(\alpha_\ell^\beta\), rather than applying per-layer learning-rate multipliers, keeps a single global learning rate, which according to the authors avoids training instability.
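The causal-tracing step of design 3 can be sketched on a toy text encoder. This is an illustrative stand-in under explicit assumptions: layers are plain Python functions over per-token vectors (`scale`, `shift`, `mix` are made up), the score \(S\) is taken to be the cosine similarity between the mean-pooled text embedding and an image embedding, and per-layer aggregation is a mean over token positions, since the exact score and aggregation used by NAST are not specified here.

```python
import math
from typing import Callable, List, Optional, Tuple

Vec = List[float]
Layer = Callable[[List[Vec]], List[Vec]]  # maps per-token hidden states to new ones


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def forward(tokens: List[Vec], layers: List[Layer],
            patch: Optional[Tuple[int, int, Vec]] = None) -> Tuple[List[List[Vec]], Vec]:
    """Run the toy encoder; if patch=(l, p, v), overwrite token p's output of layer l with v.

    Returns the hidden states after every layer and the mean-pooled text embedding."""
    h = [list(t) for t in tokens]
    states = []
    for l, layer in enumerate(layers):
        h = layer(h)
        if patch is not None and patch[0] == l:
            h = [list(v) for v in h]
            h[patch[1]] = list(patch[2])
        states.append(h)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    return states, pooled


def cte_map(correct: List[Vec], foil: List[Vec], layers: List[Layer],
            image: Vec) -> List[List[float]]:
    """CTE(l, p) = (S_corr - S_{l,p}) / (S_corr - S_foil) for every layer l and position p."""
    if len(correct) != len(foil):
        raise ValueError("causal tracing here assumes length-matched caption pairs")
    foil_states, foil_out = forward(foil, layers)  # record the foil's hidden states
    _, corr_out = forward(correct, layers)
    s_corr, s_foil = cosine(corr_out, image), cosine(foil_out, image)
    if s_corr == s_foil:
        raise ValueError("correct and foil captions score identically; CTE is undefined")
    table = []
    for l in range(len(layers)):
        row = []
        for p in range(len(correct)):
            # Re-run the correct caption with token p at layer l taken from the foil run.
            _, out = forward(correct, layers, patch=(l, p, foil_states[l][p]))
            row.append((s_corr - cosine(out, image)) / (s_corr - s_foil))
        table.append(row)
    return table


def cte_per_layer(table: List[List[float]]) -> List[float]:
    """Aggregate token-level CTE per layer (mean over positions; an assumption)."""
    return [sum(row) / len(row) for row in table]


# Toy stand-ins for transformer layers (hypothetical).
def scale(h: List[Vec]) -> List[Vec]:
    return [[2.0 * x for x in v] for v in h]


def shift(h: List[Vec]) -> List[Vec]:
    return [[x + 1.0 for x in v] for v in h]


def mix(h: List[Vec]) -> List[Vec]:
    """Lets information flow between token positions via the mean hidden state."""
    mean = [sum(col) / len(h) for col in zip(*h)]
    return [[x + 0.5 * m for x, m in zip(v, mean)] for v in h]


correct = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy token embeddings
foil = [[1.0, 0.0], [1.0, -1.0], [0.5, 0.5]]    # differs only at position 1
layer_cte = cte_per_layer(cte_map(correct, foil, [scale, mix, shift], image=[0.2, 1.0]))
```

Position-wise layers give a useful sanity check: if only one token differs between the captions and no layer mixes positions, patching that token at any layer reproduces the foil score exactly (CTE = 1), while patching any other token changes nothing (CTE = 0). Layers that mix positions, like `mix`, are what let the effect spread and differ from layer to layer.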

Loss & Training

\(\mathcal{L}_{\text{CLIP}}\) is the standard symmetric CLIP contrastive loss, applied to batches of single captions containing explicit negation. \(\mathcal{L}_{\text{claim}} = -\frac{1}{M}\sum_{i=1}^{M} \log \frac{\exp(\ell_{i, c_i})}{\sum_j \exp(\ell_{i, j})}\) is the claim-ranking loss over \(M\) claim sets, where \(\ell_{i,j}\) is the image-text similarity logit of claim \(j\) in set \(i\) and \(c_i\) indexes the correct claim; minimizing it pushes the correct claim's similarity above those of the hard negatives. Training uses AdamW with a fixed learning rate on a single RTX 4070; \(\lambda\) and \(\beta\) are the key hyperparameters.
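The two training objectives can be written out directly. A minimal pure-Python sketch, assuming precomputed similarity logits: `claim_ranking_loss` is the mean negative log-softmax probability of the correct claim in each set, `clip_symmetric_loss` is the usual symmetric InfoNCE with matched image-caption pairs on the diagonal, and the temperature of 0.07 (CLIP's initial value) is an assumption rather than a reported NAST setting.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def claim_ranking_loss(logits: List[List[float]], correct_idx: List[int]) -> float:
    """Mean negative log-probability of the correct claim c_i in each of M claim sets."""
    m = len(logits)
    return -sum(log_softmax(row)[c] for row, c in zip(logits, correct_idx)) / m


def clip_symmetric_loss(sim: List[List[float]], temperature: float = 0.07) -> float:
    """Symmetric image-text InfoNCE; sim[i][j] = similarity of image i and caption j."""
    n = len(sim)
    scaled = [[s / temperature for s in row] for row in sim]
    image_to_text = -sum(log_softmax(scaled[i])[i] for i in range(n)) / n
    columns = [list(col) for col in zip(*scaled)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)


def total_loss(l_clip: float, l_claim: float, lam: float) -> float:
    """L_total = lam * L_CLIP + (1 - lam) * L_claim."""
    return lam * l_clip + (1.0 - lam) * l_claim


# Toy example: one 2x2 image-caption batch and one claim set whose first claim is correct.
loss = total_loss(clip_symmetric_loss([[0.9, 0.1], [0.2, 0.8]]),
                  claim_ranking_loss([[2.0, 0.5, 0.1]], [0]), lam=0.5)
```

With uniform logits the claim loss equals \(\log K\) for \(K\) claims, a handy reference point: any value below it means the correct claim is already ranked above chance.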

Key Experimental Results

Main Results

Contextual negation task (Table 1, in %):

Model | R@1 ↑ | R@5 ↑ | Claim Acc. ↑
CLIP | 23.5 | 34.7 | 24.6
NegCLIP | 36.2 | 52.4 | 41.3
ConCLIP | 39.7 | 55.8 | 44.9
NegBench | 43.1 | 59.2 | 48.7
NAST (Ours) | 49.5 | 65.7 | 55.6

Each negation-specialized baseline improves on the previous one (NegCLIP < ConCLIP < NegBench), yet NAST adds a further 6.9 percentage points of claim accuracy over the strongest of them, NegBench (55.6 vs. 48.7).

Ablation Study

Affirmative-negation gap (Table 3; lower is better):

Model | Affirmative − Negation Gap (Claim Acc., percentage points)
CLIP | 21.6
NegCLIP | 12.8
ConCLIP | 10.7
NegBench | 10.2
NAST | 4.2

Update distribution (Table 4):

Method | Top-3 Update Share | Top-5 Update Share
Uniform FT | 28.4% | 41.7%
NAST (CTE-weighted) | 52.6% | 69.3%

CTE weighting concentrates updates in the most negation-sensitive layers (52.6% vs. 28.4% of the update in the top-3 layers), consistent with its gains in claim accuracy.
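This summary does not say how Table 4's "update share" is measured. One plausible reading, assumed here purely for illustration, is the fraction of total per-layer update magnitude (for instance, the norm of each layer's LoRA weight change) that falls in the \(k\) layers with the highest CTE. The function `top_k_update_share` is hypothetical.

```python
from typing import List


def top_k_update_share(update_norms: List[float], cte: List[float], k: int) -> float:
    """Fraction of the total update magnitude that lands in the k highest-CTE layers.

    update_norms[l] could be, e.g., the norm of layer l's LoRA weight change (assumption)."""
    if len(update_norms) != len(cte):
        raise ValueError("need one update norm and one CTE score per layer")
    total = sum(update_norms)
    if total == 0:
        return 0.0
    top_layers = sorted(range(len(cte)), key=lambda l: cte[l], reverse=True)[:k]
    return sum(update_norms[l] for l in top_layers) / total


share = top_k_update_share([1.0, 5.0, 3.0, 1.0], [0.1, 0.9, 0.6, 0.2], k=2)  # -> 0.8
```

As a reference point, perfectly even updates over a hypothetical 12-layer encoder give a top-3 share of 3/12 = 25% and a top-5 share of 5/12 ≈ 41.7%; the layer count of the encoder used in the paper is not stated here.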

Key Findings

  • Layer-wise Localization of Negation: CTE is concentrated in layers 1-4, peaking at layer 2. This is consistent with the LLM literature, where early layers tend to process syntactic function words and deeper layers handle semantics.
  • Source of Improvement: NAST’s gains primarily stem from increased negation accuracy rather than decreased affirmative accuracy—affirmative performance actually slightly improves (Table 2), indicating that CTE guidance does not compromise general alignment capabilities.
  • Sparse Adaptation: The finding that "few layers handle specific functions" suggests that general full-layer LoRA fine-tuning is wasteful. Interpretability-guided sparse fine-tuning could represent the next generation of parameter-efficient adaptation.

Highlights & Insights

  • The "Trace \(\rightarrow\) Prescribe" framework (calculating CTE scores to feed the optimizer as layer weights) serves as a template for advancing mechanistic interpretability from diagnosis to prescription. This paradigm can be adopted by future medical and general VLMs.
  • MedNega-CXR fully leverages the clinical convenience of "affirmative equivalence." While it is difficult to create clean polarity controls in general domains, the medical domain provides a unique testbed for interpretability research.
  • By keeping the backbone frozen and training only LoRA adapters, NAST narrows the gap from 21.6 to 4.2 percentage points. This suggests that medical VLMs are already close to handling negation correctly (only a few key layers need adjustment) rather than needing full retraining.

Limitations & Future Work

  • CTE is calculated based on a synthetic contrast set (e.g., "severe edema vs. no edema"); its transferability to other clinical scenarios (rare diseases, ambiguous expressions) remains unverified.
  • Both causal tracing and LoRA are applied only to the text encoder, leaving the vision encoder and cross-modal projections untouched. If the vision side also contains polarity-sensitive bias, this solution will not address it.
  • Evaluation is limited to MIMIC-CXR-style reports and the CheXpert ontology. Extending to other modalities (CT, MRI, pathology) and non-English clinical text would require recomputing CTE and re-validating the results.

Related Work

  • vs. NegCLIP / ConCLIP / NegBench: While these rely on "adding negated samples + contrastive loss," this work adds "layer-targeted optimization" to further improve performance.
  • vs. Causal Tracing for LLM (Meng et al.): Transfers ROME-style causal tracing from LLM knowledge localization to negation processing in CLIP text encoders, and for the first time, uses tracing results as optimizer inputs.
  • vs. Layer-wise Adaptive LR (LARS, LAMB): Those methods set per-layer step sizes from a trust ratio of the weight norm to the gradient norm (LARS) or update norm (LAMB); NAST instead scales updates by each layer's causal contribution to negation, a "semantics-aware" form of layer-wise adaptation.
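The contrast between the two kinds of layer-wise rule is easiest to see side by side. A hedged sketch: `lars_multipliers` follows the LARS local learning rate \(\eta \lVert w \rVert / (\lVert g \rVert + \text{wd}\,\lVert w \rVert)\), recomputed every step from the current norms, with the fallback of 1.0 for zero norms being a common implementation convention rather than part of the original rule; `cte_multipliers` is the fixed \(\alpha_\ell^\beta\) factor derived from causal tracing. Both function names are hypothetical.

```python
import math
from typing import List


def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def lars_multipliers(weights: List[List[float]], grads: List[List[float]],
                     eta: float = 0.001, weight_decay: float = 0.0) -> List[float]:
    """LARS-style local learning rate per layer: eta * ||w|| / (||g|| + weight_decay * ||w||).

    Depends on the current weights and gradients, so it changes every step."""
    multipliers = []
    for w, g in zip(weights, grads):
        w_norm, g_norm = l2_norm(w), l2_norm(g)
        denom = g_norm + weight_decay * w_norm
        # Fall back to 1.0 when a norm is zero (a common convention, assumed here).
        multipliers.append(eta * w_norm / denom if w_norm > 0 and denom > 0 else 1.0)
    return multipliers


def cte_multipliers(alphas: List[float], beta: float) -> List[float]:
    """NAST-style factor alpha_l ** beta: fixed once causal tracing has been run."""
    return [a ** beta for a in alphas]
```

The LARS factor reacts to the optimization state (how large the weights and gradients currently are), whereas the CTE factor encodes a one-off judgment about which layers matter for negation and stays constant throughout training.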

Rating

  • Novelty: ⭐⭐⭐⭐ (Clear path in converting causal tracing into layer-wise training rules.)
  • Experimental Thoroughness: ⭐⭐⭐⭐ (Multiple baselines, diverse tasks, and update distribution ablations.)
  • Writing Quality: ⭐⭐⭐⭐ (Tight pacing from diagnosis to data, method, and evaluation.)
  • Value: ⭐⭐⭐⭐ (Negation understanding in medical safety is a real pain point; CTE weighting is highly reusable.)