Practical do-Shapley Explanations with Estimand-Agnostic Causal Inference¶
- Conference: NeurIPS 2025
- arXiv: 2509.20211
- Code: To be confirmed
- Area: Causal Inference / Explainability
- Keywords: Shapley values, causal inference, do-SHAP, structural causal models, identifiability
TL;DR¶
This paper proposes an Estimand-Agnostic (EA) approach and the Frontier-Reducibility Algorithm (FRA) for efficient computation of causal Shapley values (do-SV). A single structural causal model (SCM) trained on the observational distribution answers any identifiable causal query, and FRA cuts the number of coalitions that must be evaluated by roughly 90% on sparse causal graphs.
Background & Motivation¶
Background: do-SHAP integrates Shapley values with causal inference by replacing conditional expectations with causal intervention values \(\nu(S) = E[Y|\text{do}(X_S = x_S)]\), thereby eliminating spurious correlations inherent in conventional SHAP. However, computing do-SV requires evaluating \(2^{|X|}\) distinct causal queries.
Limitations of Prior Work: Existing Estimand-Based (EB) methods require manually deriving a causal estimand (e.g., via the backdoor or frontdoor criterion) for each coalition \(S\) and fitting a separate model for it, which is impractical at scale. For \(K=10\) features, there are \(2^{10} = 1{,}024\) coalitions, each potentially requiring a different estimand.
Key Challenge: A fundamental tension exists between the theoretical superiority of do-SV (freedom from spurious correlations) and its computational intractability—the number of coalitions grows exponentially, and each coalition demands independent causal inference.
Goal: (a) Eliminate the need to manually specify estimands for each coalition; (b) reduce the number of coalitions requiring explicit computation.
Key Insight: Directly learn the data-generating process via an SCM. Once the SCM is fitted, any identifiable causal query can be answered by simulating the do-operator, without deriving estimands in advance. The causal graph structure is further exploited to identify redundant coalitions.
Core Idea: Fit a single SCM to the observational distribution for estimand-agnostic causal inference, combined with the Frontier-Reducibility Algorithm for coalition reduction, to make do-SHAP practically viable.
Method¶
Overall Architecture¶
Input: causal graph \(G\), observational data → Step 1: Train an SCM (architecture options: linear / DCN / DCG / CNF) to fit \(\mathcal{P}(V)\) → Step 2: Apply the Frontier-Reducibility Algorithm to reduce \(2^K\) coalitions to an irreducible subset → Step 3: For each irreducible coalition, simulate the do-operator via the SCM to compute \(\nu(S)\) and cache results → Step 4: Aggregate Shapley values.
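Step 4 relies on the standard Shapley formula \(\phi_i = \sum_{S \subseteq X \setminus \{X_i\}} \frac{|S|!\,(K-|S|-1)!}{K!}\,[\nu(S \cup \{X_i\}) - \nu(S)]\). The sketch below shows one way the cached coalition values could be aggregated; the function name `shapley_values` and the additive toy value function are illustrative assumptions, and in the pipeline above \(\nu(S)\) would come from the SCM simulation rather than from `toy_nu`.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values from a coalition value function nu(S).

    value_fn receives a frozenset of feature names. Each coalition is
    evaluated at most once thanks to the cache.
    """
    k = len(features)
    cache = {}

    def nu(coalition):
        key = frozenset(coalition)
        if key not in cache:
            cache[key] = value_fn(key)
        return cache[key]

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(k):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                total += weight * (nu(set(subset) | {f}) - nu(subset))
        phi[f] = total
    return phi


# Hypothetical additive value function, used only to exercise the code.
effects = {"a": 1.0, "b": 2.0, "c": -0.5}


def toy_nu(coalition):
    return sum(effects[f] for f in coalition)


phi = shapley_values(list(effects), toy_nu)
```

For an additive value function each Shapley value equals the feature's own effect, which makes this a convenient sanity check before plugging in a simulated \(\nu\).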
Key Designs¶
- Estimand-Agnostic (EA) Causal Queries:
  - Function: Train a single SCM to answer arbitrary causal queries without deriving per-coalition estimands.
  - Mechanism: The SCM learns the generative mechanism \(V_i = f_i(\text{Pa}_i, U_i)\) for each variable. Under a do-intervention, the intervened variables \(X_S\) are fixed to the values \(x_S\) of the instance being explained, the noise variables \(U\) are sampled from their prior, and the remaining variables are generated in topological order. \(E[Y|\text{do}(x_S)]\) is then estimated via Monte Carlo simulation.
  - Design Motivation: EB methods require deriving a separate statistical estimand for each identifiable query, which is infeasible as the number of coalitions grows exponentially. The EA approach requires fitting the SCM only once to answer all queries.
- Frontier-Reducibility Algorithm (FRA):
  - Function: Identify coalitions that yield identical causal effects, thereby eliminating redundant computation.
  - Mechanism: For a coalition \(S\), the algorithm checks whether a reducible subset \(S' \subset S\) exists such that \(\nu(S) = \nu(S')\). The reduction condition holds when variables later in the topological order (within \(S\)) block all paths from earlier variables to \(Y\). The "frontier" concept formalizes this: the layer of variables in \(S\) closest to \(Y\) constitutes the irreducible core.
  - Design Motivation: In sparse causal graphs, a large fraction of coalitions are redundant (~90% reducible). FRA incurs minimal computational overhead (approximately 8 seconds for \(K=100\)) while substantially reducing the number of required causal queries.
- Support for Multiple SCM Architectures:
  - Function: Supports linear models, deep causal networks (DCN), deep causal graphs (DCG), and continuous normalizing flows (CNF).
  - Mechanism: Linear SCMs are used to validate theoretical correctness; DCN/DCG handle nonlinear settings; CNF accommodates continuous variables and complex distributions.
  - Design Motivation: Different data regimes require different SCM expressiveness, necessitating a flexible framework.
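The do-operator simulation described under the EA design can be sketched with a small hand-written linear SCM. This is a minimal illustration rather than the paper's implementation: the graph (Z → X → Y with Z → Y), the coefficients, the standard normal noise, and the names `MECHANISMS` and `do_expectation` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical structural equations, listed in topological order.
# Each mechanism maps (already generated values, noise) -> value.
MECHANISMS = {
    "Z": lambda v, u: u,
    "X": lambda v, u: 2.0 * v["Z"] + u,
    "Y": lambda v, u: 3.0 * v["X"] + v["Z"] + u,
}


def do_expectation(interventions, target="Y", n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[target | do(interventions)]."""
    rng = np.random.default_rng(seed)
    values = {}
    for name, mechanism in MECHANISMS.items():
        if name in interventions:
            # do-operator: ignore the mechanism and fix the value
            values[name] = np.full(n_samples, float(interventions[name]))
        else:
            # sample exogenous noise and apply the structural equation
            noise = rng.standard_normal(n_samples)
            values[name] = mechanism(values, noise)
    return float(values[target].mean())


nu_x = do_expectation({"X": 1.0})  # E[Y | do(X=1)], approximately 3.0
nu_empty = do_expectation({})      # E[Y], approximately 0.0
```

In this toy SCM the observational regression slope of Y on X is 3.4, because Z confounds both variables, while the simulated intervention recovers the causal coefficient 3. The same sampling routine answers every coalition query without a separate estimand.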
Loss & Training¶
- SCM training: maximize the likelihood of observational data \(\mathcal{P}(V)\).
- Supports both Markovian (no hidden variables) and Semi-Markovian (with hidden variables) causal graphs.
- Semi-Markovian settings require more expressive SCMs to model latent confounders.
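For the linear, Markovian case the maximum-likelihood fit has a simple form: with Gaussian noise, the likelihood factorizes over nodes, so each mechanism can be fitted by ordinary least squares on its parents. The sketch below assumes a linear-Gaussian SCM and a hypothetical three-node graph; `PARENTS` and `fit_linear_scm` are illustrative names, and the deep architectures listed above would replace the least-squares step with gradient-based likelihood training.

```python
import numpy as np

# Hypothetical causal graph: node -> list of parents.
PARENTS = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}


def fit_linear_scm(data, parents):
    """Per-node least squares, the MLE for a linear-Gaussian Markovian SCM."""
    params = {}
    for node, pa in parents.items():
        n = len(data[node])
        # design matrix: intercept column followed by parent columns
        design = np.column_stack([np.ones(n)] + [data[p] for p in pa])
        coef, *_ = np.linalg.lstsq(design, data[node], rcond=None)
        resid = data[node] - design @ coef
        params[node] = {
            "intercept": float(coef[0]),
            "weights": {p: float(w) for p, w in zip(pa, coef[1:])},
            "noise_std": float(resid.std()),  # MLE of the noise scale
        }
    return params


# Synthetic observational data from a known SCM, for checking the fit.
rng = np.random.default_rng(0)
n = 20_000
z = rng.standard_normal(n)
x = 2.0 * z + rng.standard_normal(n)
y = 3.0 * x + z + rng.standard_normal(n)
fitted = fit_linear_scm({"Z": z, "X": x, "Y": y}, PARENTS)
```

The fitted weights should land close to the generating coefficients (2 for Z → X, 3 for X → Y, 1 for Z → Y), which mirrors how a linear SCM can be used to check correctness against a known ground truth.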
Key Experimental Results¶
Main Results¶
| Dataset | SCM Architecture | EA Error | EB Error | FRA Coalition Reduction |
|---|---|---|---|---|
| Synthetic (Markovian) | Linear SCM | ~0.01 MSE | — | ~90% coalition reduction |
| Synthetic (Semi-Markovian) | DCN | ~0.03 MSE | ~0.02 MSE | ~85% reduction |
| Adult Income | DCG | Computable | Requires manual derivation | Significant |
| Earthquake | CNF | Computable | — | — |
Efficiency Comparison¶
| Method | Time (\(K=100\)) |
|---|---|
| do-SHAP (no caching) | 26m 01s |
| do-SHAP + FRA caching | 21m 14s |
| FRA reduction computation | ~8s |
Key Findings¶
- The EA approach achieves accuracy comparable to EB methods in Markovian settings without requiring per-coalition estimand derivation.
- FRA achieves the greatest reduction on sparse graphs (~90%); it provides no benefit when all variables are direct parents of \(Y\).
- SCM training error propagates to do-SV estimation, but remains controllable given sufficient data.
- Compared to observational SHAP, do-SHAP correctly identifies spuriously correlated variables (e.g., those induced by collider bias).
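The contrast between sparse graphs and graphs where every variable is a direct parent of \(Y\) can be illustrated with a simplified reduction rule: a variable in \(S\) is dropped when every directed path from it to \(Y\) passes through another member of \(S\). This is a sketch of the path-blocking idea under that assumption, not the paper's FRA; the helper names and the two toy graphs are hypothetical.

```python
from itertools import combinations


def reaches_target(graph, start, target, blocked):
    """True if some directed path start -> ... -> target avoids `blocked`."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child == target:
                return True
            if child not in blocked and child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def reduce_coalition(graph, coalition, target="Y"):
    """Keep only members that still influence the target under do(coalition)."""
    s = frozenset(coalition)
    return frozenset(x for x in s if reaches_target(graph, x, target, s - {x}))


def count_irreducible(graph, features, target="Y"):
    """Number of distinct reduced coalitions, i.e. queries to evaluate."""
    reduced = {
        reduce_coalition(graph, c, target)
        for size in range(len(features) + 1)
        for c in combinations(features, size)
    }
    return len(reduced)


# Graphs are given as node -> list of children.
chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["Y"]}
star = {"A": ["Y"], "B": ["Y"], "C": ["Y"], "D": ["Y"]}
features = ["A", "B", "C", "D"]

n_chain = count_irreducible(chain, features)  # 5 of 16 coalitions remain
n_star = count_irreducible(star, features)    # all 16 remain
```

On the chain only the member of \(S\) closest to \(Y\) matters, so 16 coalitions collapse to 5 distinct queries, whereas the star graph admits no reduction at all.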
Highlights & Insights¶
- Elegance of the Estimand-Agnostic paradigm: By fitting a single SCM, the approach bypasses estimand derivation, arguably the most difficult step in causal inference. This is what keeps the method feasible when the number of coalitions grows exponentially.
- Graph-theoretic ingenuity of FRA: Exploiting causal graph structure (topological ordering, path blocking) to identify computational redundancy incurs negligible cost while yielding substantial savings.
- Practical viability of do-SHAP: The framework bridges the gap between "theoretically correct but computationally intractable" and "deployable on real data."
Limitations & Future Work¶
- A known causal graph is required; errors from causal discovery algorithms propagate to the final estimates.
- The EA approach lacks the doubly-robust guarantees available to Estimand-Based (EB) methods.
- do-SV estimation quality is directly constrained by SCM training quality.
- Combinatorial explosion persists for large-scale settings (\(K \gg 100\)); FRA provides only a constant-factor improvement.
Related Work & Insights¶
- vs. Observational SHAP: do-SHAP eliminates spurious correlations at higher computational cost; this work substantially lowers that barrier.
- vs. Causal and Asymmetric SHAP: Asymmetric Shapley values (Frye et al.) incorporate causal ordering but do not perform do-interventions; the causal Shapley values of Heskes et al. do use interventional expectations.
- vs. Causal inference literature: The EA approach bridges the SCM learning community and the explainable AI community.
Rating¶
- Novelty: ⭐⭐⭐⭐ The EA + FRA combination enables do-SHAP to be practically used for the first time.
- Experimental Thoroughness: ⭐⭐⭐⭐ Covers synthetic and real datasets with multiple SCM architectures, though scale remains limited.
- Writing Quality: ⭐⭐⭐⭐ Problem motivation is clearly articulated and the methodology is systematically described.
- Value: ⭐⭐⭐⭐⭐ Advances do-SHAP from a theoretical concept to a practically deployable causal explanation tool.
Additional Technical Details¶
- SCM training is validated over 30 random seeds to assess estimation stability.
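- Feature importance is computed as \(FI_X = \frac{1}{N}\sum_{i\in[N]} \frac{|\phi_X^{(i)}|}{\sum_{X'\in X}|\phi_{X'}^{(i)}|}\), i.e., the average over instances of each feature's share of the total absolute attribution.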
- Semi-Markovian experiments (with latent variable \(U_{X,B}\)) further validate the effectiveness of the EA approach.
- Two real-world datasets are used: CDC Diabetes Health Indicators and bike-sharing demand prediction.
- The frontier-checking complexity of FRA caching scales linearly with coalition size rather than exponentially.
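The feature-importance formula listed above normalizes each instance's absolute Shapley values to sum to one and then averages over instances. A minimal NumPy sketch follows, assuming the Shapley values are stored in an array of shape (N, K) and that no instance has all-zero attributions; the function name and the example values are hypothetical.

```python
import numpy as np


def feature_importance(phi):
    """Mean per-instance share of absolute attribution.

    phi: array of shape (N, K) holding Shapley values for N instances
    and K features. Assumes every row has a non-zero absolute sum.
    """
    abs_phi = np.abs(phi)
    shares = abs_phi / abs_phi.sum(axis=1, keepdims=True)
    return shares.mean(axis=0)


# Two illustrative instances with three features each.
phi = np.array([[1.0, -1.0, 2.0],
                [0.5, 0.0, -0.5]])
fi = feature_importance(phi)  # -> [0.375, 0.125, 0.5]
```

Because each row of shares sums to one, the resulting importances also sum to one, which makes them comparable across datasets.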