Provably Explaining Neural Additive Models¶
Conference: ICLR 2026
arXiv: 2602.17530
Code: None
Area: Interpretability / Formal Verification
Keywords: Neural Additive Models, Provable Explanations, Cardinally-minimal Explanations, Formal Verification, Explainable AI
TL;DR¶
Specialized efficient explanation algorithms are designed for Neural Additive Models (NAMs). By requiring only logarithmic verification queries, these algorithms generate cardinally-minimal explanations that outperform existing general-purpose subset-minimal explanation algorithms in terms of both speed and explanation quality.
Background & Motivation¶
The interpretability of neural networks is a core issue for AI safety and trustworthy deployment. Existing post-hoc explanation methods face several challenges:
Background: Most explanation methods (e.g., SHAP, LIME, Grad-CAM) are inherently heuristic and cannot guarantee the correctness of the explanations. For instance, the feature importance rankings provided by SHAP might not truly reflect the basis of the model's decision.
Limitations of Prior Work: The key to obtaining explanations with provable guarantees is finding a "cardinally-minimal subset"—the smallest number of input features sufficient to determine the model's prediction. For standard neural networks, this typically requires: - Exponential verification queries relative to the number of input features. - Each query itself being an NP-hard problem. - Thus, it is generally computationally infeasible.
Key Insight: Neural Additive Models are a family of inherently more interpretable neural networks. The core structure of a NAM is \(f(\mathbf{x}) = h_1(x_1) + h_2(x_2) + \cdots + h_n(x_n)\), where each \(h_i\) is an independent univariate neural network. This additive structure suggests that explanations should be easier to obtain, yet existing work has not fully exploited this structural property.
Key Challenge: Most existing algorithms can only find "subset-minimal" explanations (where no further features can be removed) but cannot guarantee finding "cardinally-minimal" explanations (the subset containing the minimum number of features). Cardinally-minimal explanations are more informative but significantly harder to compute.
Core Problem: Can the additive structure of NAMs be leveraged to efficiently generate provable cardinally-minimal explanations?
Method¶
Overall Architecture¶
The problem addressed is to find the smallest subset of features \(S\) for a trained NAM, a specific input \(\mathbf{x}\), and a perturbation radius \(\epsilon_p\), such that fixing the features in \(S\) keeps the model's prediction unchanged regardless of the values other features take within the perturbation ball \(B_{\epsilon_p}(x_i)\). This is defined as a "cardinally-minimal sufficient explanation." While this requires exponential search for general networks, this paper utilizes the additive structure \(f(\mathbf{x}) = \beta_0 + \sum_i f_i(x_i)\) to decompose the process into two stages: parallelizable offline preprocessing (Stage 1), which performs verification on small univariate components \(f_i\) to derive an importance order; and online selection (Stage 2), which uses binary search over the "top-\(m\) most important features" based on this order. This reduces the exponential search to \(O(\log n)\) full-model verification queries.
%%{init: {'flowchart': {'rankSpacing': 24, 'nodeSpacing': 28, 'padding': 6, 'wrappingWidth': 400}}}%%
flowchart TD
IN["Input: Trained NAM<br/>f = β₀ + Σ fᵢ(xᵢ), instance x, radius εₚ"]
D1["Additive Structure Decomposition<br/>Split into n univariate components fᵢ"]
S1["Stage 1: Parallel Interval Importance Sorting<br/>Binary verification for bounds [lᵢ, uᵢ] → Feature importance total order"]
S2["Stage 2: Binary Selection on Sorted Prefixes<br/>Binary search on top-m prefixes, O(log n) sufficiency queries"]
OUT["Output: Cardinally-minimal sufficient explanation S"]
V["Formal Sufficiency Verification<br/>suff(): Provable worst-case"]
IN --> D1 --> S1 --> S2 --> OUT
V -.-> S1
V -.-> S2
Key Designs¶
1. Mechanism: Decomposing the global sufficiency problem into univariate analysis
The difficulty in explaining general neural networks stems from feature entanglement; determining if a subset is sufficient requires treating the entire network as a black box. The structure of a NAM, \(f(\mathbf{x}) = \beta_0 + \sum_i f_i(x_i)\), naturally breaks this entanglement—each feature \(x_i\) contributes an independent term \(f_i(x_i)\). Consequently, the impact of perturbing feature \(i\) on the decision boundary can be analyzed independently for each component \(f_i : \mathbb{R} \to \mathbb{R}\) and compared directly, without considering interactions. This is the root of the efficiency: reasoning occurs across \(n\) one-dimensional functions instead of an \(n\)-dimensional joint space.
2. Design Motivation: Stage 1 Parallel Interval Importance Sorting
To determine which features to fix or discard, an importance ranking is required. Assuming a prediction \(f(\mathbf{x}) = 1\) (i.e., \(\beta_0 + \sum_i f_i(x_i) \ge 0\)), the algorithm computes how much a perturbation pulls the prediction toward the decision boundary:
Instead of calculating exact values, the algorithm uses binary verification queries to refine interval bounds \([l_i, u_i]\) (where \(l_i \le f_i(x_i^*) \le u_i\)) until a total order is established based on relative deviations \(\Delta l_i = f_i(x_i) - l_i\) and \(\Delta u_i = f_i(x_i) - u_i\). This process is parallelizable and performed on small univariate networks, requiring only logarithmic queries relative to precision.
3. Novelty: Stage 2 Binary Selection on \(O(\log n)\) Queries
Given the total order, the cardinally-minimal explanation must be a prefix of the "top-\(m\) most important features" (Prop 1: replacing a feature in an explanation with a more important one remains sufficient). Thus, binary search is performed on the prefix length \(m\): each step uses one sufficiency verification \(\text{suff}(f, \mathbf{x}, \{F[1],\dots,F[m]\}, \epsilon_p)\) to check if the prefix is sufficient. This locates the shortest sufficient prefix in \(O(\log n)\) queries, compared to \(O(n)\) for naive greedy removal.
4. Function: Provable Worst-case Sufficiency Verification
"Sufficiency" is strictly defined here: once features in \(S\) are fixed, any values taken by features outside \(S\) within the perturbation ball will not change the prediction. This worst-case judgment is provided by neural network verifiers (using interval propagation or linear relaxation), yielding a provable yes/no answer. Unlike sampling methods like SHAP, this ensures no adversarial values can overturn the explanation.
Loss & Training¶
Ours is a post-processing method and does not involve specific loss functions or training strategies. The complexity of Stage 1 is \(O\!\big(\tfrac{n}{\rho}\log(\max_i \tfrac{\beta_i-\alpha_i}{\xi_i})\big)\) with \(\rho\) processors, while Stage 2 requires \(O(\log n)\) full-model verification queries.
Key Experimental Results¶
Main Results¶
Comparison with existing subset-minimal explanation algorithms:
| Method | Explanation Type | Verification Queries | Explanation Size | Computation Time |
|---|---|---|---|---|
| Existing Universal Algorithms | Subset-minimal | Exponential | Larger | Slower |
| Ours (NAM-specific) | Cardinally-minimal | Logarithmic | Minimum | Fastest |
The results indicate that the proposed algorithm solves a harder task (cardinally-minimal vs. subset-minimal) while achieving better speed and explanation quality.
Ablation Study¶
| Configuration | Key Metrics | Description |
|---|---|---|
| No Preprocessing | Queries increase significantly | Significant contribution of preprocessing |
| Different Precision \(\epsilon\) | High precision is slower but improves quality | Trade-off between precision and efficiency |
| Feature Count \(n\) | Logarithmic growth in queries | Validates theoretical logarithmic complexity |
| Different NAM Architectures | Consistent performance | Generalizability of the algorithm |
Key Findings¶
- Cardinal Minimality ≠ Subset Minimality: Cardinally-minimal explanations can be significantly smaller than subset-minimal ones, providing more refined information.
- Formal Explanations vs. Sampling Explanations: Sampling methods (e.g., permutation sampling in SHAP) can yield different (and potentially incorrect) conclusions in certain cases.
- NAM-specific Interpretability Advantage: Beyond visualization, NAMs support efficient formally provable explanations.
- Practical Significance: In safety-critical sectors (medical, finance), unreliable explanations can be more dangerous than no explanations.
Highlights & Insights¶
- Qualitative Leap in Complexity: Reducing complexity from exponential to logarithmic is a fundamental breakthrough made possible by the deep exploitation of NAM structures.
- "Solving Harder Problems Faster": Cardinally-minimal tasks are harder, yet leveraging problem structure makes them more efficient—exemplifying "structure as efficiency."
- Integration of Theory and Application: Bridges formal verification tools (interval propagation, SMT solvers) with machine learning interpretability.
- New Value of NAMs: NAMs are not just visually interpretable; they are computationally more interpretable in a formal sense.
Limitations & Future Work¶
- Scope Limitation: The algorithm relies heavily on the additive structure and cannot be directly extended to general neural networks or models with feature interactions (e.g., NAM-I).
- Expressivity Trade-off: NAMs cannot model feature interactions, which may limit performance. The choice of NAM depends on the trade-offs between interpretability and performance.
- Precision Selection: The choice of \(\epsilon\) for preprocessing affects quality and cost, but no automated method for determining \(\epsilon\) is provided.
- Extension to GA2M: Extending the algorithm to Generalized Additive Models with interactions (GA2M) is a natural next step, though interactions will increase complexity.
Related Work & Insights¶
- Interpretability: SHAP/LIME face reliability issues; provable explanations provide a necessary alternative.
- Formal Verification: NNV, Marabou, and \(\alpha\)-\(\beta\)-CROWN provide the technical foundation for this work.
- Heuristic vs. Provable: This work encourages a shift from heuristic sampling toward worst-case guarantees in XAI.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — First logarithmic complexity cardinally-minimal explanation algorithm for NAMs.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Compares well against baselines, though datasets are of limited scale.
- Writing Quality: ⭐⭐⭐⭐ — Rigorous theory but high notation density.
- Value: ⭐⭐⭐⭐ — Important for safety-critical AI despite being restricted to NAMs.