Minimizing False-Positive Attributions in Explanations of Non-Linear Models¶
Conference: NeurIPS 2025 | arXiv: 2505.11210 | Code: GitHub | Area: Explainable AI / Interpretability | Keywords: XAI, suppressor variables, local explanations, generative explanation, LIME
TL;DR¶
This paper proposes PatternLocal to address false-positive attributions caused by suppressor variables in XAI explanations of non-linear models. The method converts local discriminative surrogate weights into a generative representation, and significantly reduces false-positive feature attributions on three datasets: the XAI-TRIS benchmark, MRI artificial lesions, and EEG motor imagery.
Background & Motivation¶
Background: XAI methods such as LIME, KernelSHAP, and gradient-based approaches are widely used to explain the decision-making process of black-box models, and are particularly critical in high-stakes domains such as healthcare and finance.
Limitations of Prior Work: Existing studies have demonstrated that mainstream XAI methods including LIME and SHAP assign importance weights to suppressor variables — variables that influence model predictions but have no direct statistical dependence on the target variable. For example, when a model predicts epilepsy by exploiting noise probes in irrelevant brain regions, XAI methods may erroneously flag those regions as important.
Key Challenge: The Pattern method for linear models can distinguish discriminative weights from generative activation patterns to eliminate suppressor variable effects, but this method and its deep network extensions (PatternNet/PatternAttribution) perform poorly in non-linear settings and cannot effectively handle suppressor variables in local non-linear explanations.
Goal: Extend the removal of suppressor-variable effects from globally linear models to local explanations of non-linear models, addressing false-positive attribution at the instance level.
Key Insight: Local linear surrogate weights are first obtained via LIME, KernelSHAP, or gradient methods, and then converted from discriminative weights to a generative representation through a data-driven forward model.
Core Idea: Building on local linear surrogates produced by methods such as LIME, the discriminative weights are converted into generative activation patterns (Patterns) via kernel-weighted regression, thereby naturally eliminating the influence of suppressor variables.
Method¶
Overall Architecture¶
PatternLocal is a two-stage, model-agnostic XAI method:
1. Stage 1 (Local Linear Surrogate): A local linear surrogate is constructed for the target sample \(\mathbf{x}_\star\) using LIME, KernelSHAP, or gradient methods, yielding a discriminative weight vector \(\mathbf{w}\).
2. Stage 2 (Generative Conversion): Using training data within the neighborhood of \(\mathbf{x}_\star\), the simplified input \(\mathbf{h}(\mathbf{x})\) is regressed onto the surrogate prediction \(\tilde{y} = \mathbf{w}^\top \mathbf{h}(\mathbf{x})\), producing a generative activation pattern \(\mathbf{a}\).
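The two stages can be sketched in a few lines of NumPy. This is a minimal illustration assuming the identity simplification \(\mathbf{h}(\mathbf{x}) = \mathbf{x}\), a finite-difference gradient as the stage-1 surrogate, and a Gaussian locality kernel; the function names `grad_surrogate` and `pattern_local` are invented here, not the paper's implementation:

```python
import numpy as np

def grad_surrogate(f, x_star, eps=1e-4):
    """Stage 1: local linear surrogate via a central-difference gradient of f."""
    w = np.zeros_like(x_star)
    for i in range(x_star.size):
        e = np.zeros_like(x_star)
        e[i] = eps
        w[i] = (f(x_star + e) - f(x_star - e)) / (2 * eps)
    return w

def pattern_local(X, w, x_star, sigma=1.0, lam=1e-3):
    """Stage 2: convert discriminative weights w into a generative pattern a,

    a = Cov_Pi[x, y~] / (Var_Pi[y~] + lam), with a Gaussian locality kernel Pi
    and the identity simplification h(x) = x.
    """
    pi = np.exp(-np.sum((X - x_star) ** 2, axis=1) / (2 * sigma**2))
    pi /= pi.sum()
    y_t = X @ w                          # surrogate response y~ = w^T x
    x_bar = pi @ X                       # kernel-weighted feature means
    y_bar = pi @ y_t
    cov = pi @ ((X - x_bar) * (y_t - y_bar)[:, None])  # Cov_Pi[x, y~]
    var = pi @ (y_t - y_bar) ** 2                      # Var_Pi[y~]
    return cov / (var + lam)

# Toy data with a suppressor: x1 = signal + noise, x2 = noise only.
rng = np.random.default_rng(0)
s = rng.normal(size=20_000)
d = rng.normal(size=20_000)
X = np.column_stack([s + d, d])
f = lambda x: x[0] - x[1]            # model that cancels the noise via x2
x_star = X[0]
w = grad_surrogate(f, x_star)        # discriminative weights, ~[1, -1]
a = pattern_local(X, w, x_star, sigma=5.0, lam=1e-3)
```

The surrogate weights `w` put full weight on the suppressor `x2`, while the converted pattern `a` keeps only the signal-carrying feature.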
Key Designs¶
- Suppressor Variable Elimination (Core Principle): The central idea of the Pattern method is that a discriminative model weight \(\mathbf{W}\) corresponds to a unique forward model \(\mathbf{A} = \Sigma_\mathbf{X} \mathbf{W} \Sigma_\mathbf{M}^{-1}\). The activation patterns of the forward model retain only features statistically related to the target, naturally eliminating suppressor variables. PatternLocal generalizes this principle to the local non-linear setting.
- Kernel-Weighted Local Regression: The formal objective of PatternLocal is: \(\mathbf{a} = \arg\min_\mathbf{u} \mathbb{E}_{\mathbf{x} \sim \mathbb{P}_\mathcal{X}} \left[ \Pi_{\mathbf{x}'_\star}(\mathbf{h}(\mathbf{x})) \| \mathbf{h}(\mathbf{x}) - \mathbf{u} \tilde{y} \|_2^2 \right] + \lambda Q(\mathbf{u})\) where \(\Pi\) is a local kernel function that enforces locality of the explanation, and \(Q\) is a regularization term.
- Closed-Form Solution (Ridge Regression Form): When \(Q(\mathbf{u}) = \|\mathbf{u}\|_2^2\), a closed-form solution exists: \(\mathbf{a}_{\ell_2} = \frac{\text{Cov}_\Pi[\mathbf{h}(\mathbf{x}), \tilde{y}]}{\text{Var}_\Pi[\tilde{y}] + \lambda}\) That is, the kernel-weighted covariance between simplified features and the surrogate response, divided by the regularized variance.
- Input Simplification Schemes: Three input simplifications \(\mathbf{h}\) are supported: (a) identity mapping (raw features); (b) superpixel representation; (c) low-rank approximation. Different schemes are applicable to different scenarios.
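The suppressor-elimination principle behind these designs is easiest to see in the global linear case. The snippet below uses the classic two-variable suppressor construction from the linear-pattern literature (an illustrative toy, not the paper's data): the discriminative weights need the suppressor to cancel noise, but the covariance-based pattern assigns it zero importance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
s = rng.normal(size=n)           # signal, statistically tied to the target
d = rng.normal(size=n)           # distractor noise, independent of the target
X = np.column_stack([s + d, d])  # x1 carries signal + noise; x2 is a pure suppressor

# Discriminative weights: w = [1, -1] recovers s exactly (y~ = x1 - x2 = s),
# yet places a large weight on the suppressor x2.
w = np.array([1.0, -1.0])
y_t = X @ w

# Generative pattern: a ∝ Cov[x, y~]. In the single-output case the forward
# model A = Σ_X W Σ_M^{-1} reduces to this covariance up to scaling.
a = np.cov(X.T, y_t)[:2, 2]
```

Here `a ≈ [Var(s), 0]`: the pattern retains only the feature that actually covaries with the target, which is exactly the property PatternLocal transfers to the local non-linear setting.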
Loss & Training¶
- Regularization supports either L1 (Lasso) or L2 (Ridge); L1 induces sparsity while L2 admits a closed-form solution.
- Hyperparameters are tuned via Bayesian optimization (TPE algorithm) on a validation set using the EMD metric.
- The local kernel function \(\Pi\) ensures that explanations reflect only the behavior in the neighborhood of \(\mathbf{x}_\star\).
Key Experimental Results¶
Main Results — XAI-TRIS Benchmark (MLP, Identity Mapping)¶
| Method | LIN-WHITE EMD↓ | XOR-CORR EMD↓ | RIGID-CORR EMD↓ | XOR-CORR IME↓ |
|---|---|---|---|---|
| PatternLocal | Best | Significantly best | Comparable to filter methods | Significantly best |
| LIME | Second | High | High | High |
| KernelSHAP | Second | High | High | High |
| Gradient | Moderate | High | High | High |
| IntegratedGrad | Moderate | Moderate | Moderate | Moderate |
| Sobel (filter) | Lower | Moderate | Lower | Moderate |
| Laplace (filter) | Lower | Moderate | Lower | Moderate |
Toy Example Validation (XOR Problem, Mean Attribution Magnitude for Suppressor Variable x3)¶
| Method | Mean Attribution to x3 |
|---|---|
| LIME | ~0.18 (erroneous attribution) |
| KernelSHAP | ~0.17 (erroneous attribution) |
| Gradient | ~0.19 (erroneous attribution) |
| PatternLocal | ~0.01 (near zero) |
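The qualitative gap in the table can be reproduced with a hand-built XOR toy. The generative process and the model `f` below are illustrative assumptions (not the paper's exact setup), and the neighborhood is approximated by the full sampled cloud with uniform weights rather than a kernel:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# XOR data with a suppressor: x3 carries the shared noise that contaminates x1.
z1 = rng.integers(0, 2, n).astype(float)
z2 = rng.integers(0, 2, n).astype(float)
d = rng.normal(0.0, 0.3, n)
X = np.column_stack([z1 + d, z2, d])      # x3 is a pure suppressor

def f(x):
    """Perfect non-linear classifier: denoises x1 with x3, then computes XOR."""
    u = x[..., 0] - x[..., 2]
    return u + x[..., 1] - 2.0 * u * x[..., 1]

# Stage 1: local surrogate weights = analytic gradient of f at x_star.
x_star = np.array([1.0, 0.0, 0.0])
w = np.array([1.0 - 2 * x_star[1],
              1.0 - 2 * (x_star[0] - x_star[2]),
              -(1.0 - 2 * x_star[1])])

print(abs(w[2]))  # prints 1.0 — the gradient wrongly attributes to x3

# Stage 2: PatternLocal conversion, a = Cov[x, y~] / (Var[y~] + lam).
y_t = X @ w
lam = 1e-3
a = np.cov(X.T, y_t)[:3, 3] / (y_t.var() + lam)
```

The gradient surrogate assigns the suppressor `x3` an attribution of magnitude 1, while the converted pattern `a` drives it to approximately zero and keeps the two class-relevant features.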
EEG Motor Imagery Dataset (Physiological Plausibility Evaluation)¶
| Method | Dipole Fit Goodness (mean±std) |
|---|---|
| PatternLocal | 0.756 ± 0.090 |
| Raw instances | 0.738 ± 0.013 |
| LIME | 0.604 ± 0.013 |
Key Findings¶
- PatternLocal significantly outperforms all other XAI methods in XOR and RIGID scenarios (non-linear problems with suppressor variables).
- In the RIGID-CORR scenario, Sobel/Laplace filters perform well due to the rigid edge structure of XAI-TRIS images, but this advantage does not generalize to complex backgrounds such as MRI.
- On the MRI artificial lesion dataset, PatternLocal explanations align with true lesion locations more accurately than LIME, while filter methods fail due to the absence of clear edges.
- In EEG experiments, PatternLocal explanations are physiologically plausible in both time-frequency and source analysis domains, localizing feature patterns to the expected motor cortex regions (contralateral activity).
Highlights & Insights¶
- Theoretical Elegance: The unified perspective of discriminative-to-generative conversion is compelling; the toy example provides a mathematical proof that PatternLocal exactly eliminates the suppressor variable (\(a_3=0\)).
- Model-Agnostic: The method can serve as a plug-and-play post-processing module for any XAI method that produces local linear surrogates (LIME/SHAP/gradient families).
- Closed-Form Availability: The Ridge variant admits an analytic solution, enabling computational efficiency.
- Cross-Modal Validation: The method is validated on both images (synthetic and MRI) and EEG time-series signals, demonstrating generality beyond the visual domain.
- Experimental Rigor: Bayesian hyperparameter optimization ensures fair comparison; multiple input simplifications, regularization schemes, and model combinations are evaluated.
Limitations & Future Work¶
- Requires Access to Training Data: Sufficient training samples within the neighborhood of the explained instance are required; this limits applicability in privacy-sensitive or data-scarce settings.
- Spatial Alignment Assumption: The method assumes a degree of consistency or alignment across the input space, which may not hold for natural images or user-generated content.
- RIGID-CORR Scenario: The method does not outperform simple edge filters in scenes with rigid edges, indicating that it does not dominate across all structural types.
- Residual Misattribution: As with other XAI methods, saliency maps should be treated as indicative rather than definitive.
- Large-Scale Modern Architectures Untested: Experiments focus on MLP/CNN/ShallowNet; whether more complex models such as Transformers benefit equally remains to be verified.
Related Work & Insights¶
- vs. Pattern/PatternNet: The original Pattern method applies only to globally linear models; PatternNet extends it to deep networks but performs poorly on the non-linear suppressor variable benchmark. PatternLocal overcomes this limitation through local surrogates combined with kernel-weighted regression.
- vs. LIME/KernelSHAP: LIME and SHAP are inherently discriminative local linear surrogates. PatternLocal adds a generative conversion step on top of them, eliminating suppressor variables with virtually no additional assumptions.
- vs. Filter Methods (Sobel/Laplace): Filters happen to perform well on synthetic images with clear edges but are unsuitable for complex backgrounds such as MRI and carry no theoretical guarantees.
Rating¶
- Novelty: ⭐⭐⭐⭐ The idea of extending the Pattern method from globally linear to locally non-linear settings is concise and compelling, and the theoretical analysis via the toy example is convincing.
- Experimental Thoroughness: ⭐⭐⭐⭐ Three datasets across different modalities (synthetic images / MRI / EEG), with extensive hyperparameter search and ablation experiments.
- Writing Quality: ⭐⭐⭐⭐⭐ The theoretical derivation from linear to non-linear settings is clear, the toy example is intuitive, and the overall structure is rigorous.
- Value: ⭐⭐⭐⭐ Improving the reliability of XAI explanations has practical value in high-stakes domains such as healthcare, and the model-agnostic nature facilitates easy integration.