🔗 Causal Inference

🧠 NeurIPS 2025 · 21 paper notes

A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning

This paper proposes a Targeted Intervention paradigm grounded in Multi-Agent Influence Diagrams (MAIDs): applying Pre-Strategy Intervention (PSI) to a single target agent suffices to steer the entire multi-agent system toward a preferred Nash equilibrium that satisfies additional desired outcomes, without requiring global intervention over all agents.

An Analysis of Causal Effect Estimation Using Outcome Invariant Data Augmentation

This paper presents the first systematic analysis of outcome invariant data augmentation (DA) for causal effect estimation. It proves that when DA operations preserve the outcome variable, they are equivalent to soft interventions on the treatment variable, thereby reducing confounding bias. The paper further proposes an IV-like (IVL) regression framework that treats DA parameters as "instrument-like" variables, and reduces bias further through adversarial DA composition.
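
The paper's IVL regression treats DA parameters as instrument-like variables; as background, the ordinary just-identified IV estimator it generalizes can be sketched on simulated confounded data (all variable names and coefficients here are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated confounded data: U confounds treatment T and outcome Y;
# Z is a valid instrument (affects T, has no direct path to Y).
U = rng.normal(size=n)
Z = rng.normal(size=n)
T = 0.8 * Z + 1.0 * U + rng.normal(size=n)
Y = 2.0 * T + 1.5 * U + rng.normal(size=n)   # true causal effect = 2.0

# Naive OLS regression of Y on T is biased by the confounder U.
beta_ols = (T @ Y) / (T @ T)

# Just-identified IV (Wald) estimator: beta = Cov(Z, Y) / Cov(Z, T).
beta_iv = (Z @ Y) / (Z @ T)

print(beta_ols, beta_iv)   # OLS overshoots 2.0; IV recovers ~2.0
```

The IVL framework plays the same trick, but with augmentation parameters standing in for the instrument `Z`.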

Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization

This paper proposes Bi-DFCL, a bilevel optimization framework that jointly leverages observational (OBS) data and randomized controlled trial (RCT) data to train marketing resource allocation models. The upper level trains a Bridge Network with unbiased decision loss on RCT data to dynamically correct the bias of the lower level trained on OBS data. The framework further introduces differentiable surrogate decision losses (PPL/PIFD) grounded in the primal problem and an implicit differentiation algorithm, addressing the predict-then-optimize inconsistency and the bias-variance dilemma of conventional two-stage methods. The system has been deployed at scale on Meituan.

Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features

CAPE learns the causal DAG structure among features from tabular data and embeds it into hyperbolic space to generate causality-aware rotary positional encodings (RoPE), enabling Transformers to process non-sequential but causally structured feature data; it achieves significant performance gains on downstream multi-omics tasks.
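
CAPE's hyperbolic, causality-aware variant aside, the plain rotary mechanism it builds on is standard and can be sketched in a few lines; the key property is that attention scores depend only on relative position:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Standard rotary positional encoding: rotate consecutive
    dimension pairs of x by angles proportional to `pos`."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair frequencies
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(1)
q, k = rng.normal(size=8), rng.normal(size=8)

# Relative-position property: shifting both positions by the same
# amount leaves the query-key dot product unchanged.
s1 = rope(q, 3) @ rope(k, 5)
s2 = rope(q, 13) @ rope(k, 15)   # same offset of 2
print(s1, s2)
```

CAPE's contribution is to replace the integer `pos` with positions derived from the learned causal DAG embedded in hyperbolic space.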

Conformal Prediction for Causal Effects of Continuous Treatments

This work is the first to construct conformal prediction intervals for causal effects of continuous treatments (e.g., drug dosage) by parameterizing intervention-induced propensity shifts via a tilting function family and employing quantile regression, providing finite-sample \(1-\alpha\) coverage guarantees under both known and unknown propensity score settings.
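
The tilting-function construction for propensity shifts is specific to the paper; the split conformal machinery underneath it, which the paper adapts, looks roughly like this under plain exchangeability (simulated data, hypothetical predictor):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                      # some point predictor (here assumed known)
    return 2.0 * x

# Simulated data: outcome = f(x) + noise.
x_cal = rng.uniform(0, 1, 1000)
y_cal = f(x_cal) + rng.normal(0, 0.3, 1000)
x_test = rng.uniform(0, 1, 5000)
y_test = f(x_test) + rng.normal(0, 0.3, 5000)

alpha = 0.1
scores = np.abs(y_cal - f(x_cal))        # calibration nonconformity scores
level = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
q = np.quantile(scores, level)           # finite-sample-corrected quantile

# Interval f(x) +/- q covers with probability >= 1 - alpha.
covered = np.abs(y_test - f(x_test)) <= q
print(covered.mean())                    # empirical coverage near 0.9
```

The paper's harder problem is that intervening on a continuous treatment breaks exchangeability, which is where the tilting family and quantile regression come in.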

Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models

This paper proposes the COUPLE framework, which constructs a Structural Causal Model (SCM) to model the dependencies and priorities among multi-dimensional values, and leverages counterfactual reasoning to achieve steerable alignment of LLMs toward arbitrary fine-grained pluralistic value objectives.

Cyclic Counterfactuals under Shift–Scale Interventions

This paper establishes a theoretical framework for counterfactual reasoning under shift–scale soft interventions in cyclic (non-DAG) structural causal models (SCMs). It proves that a global contraction condition guarantees unique solvability of cyclic SCMs and derives sub-Gaussian concentration inequalities for counterfactual distributions.
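
For intuition on the contraction condition, a toy linear cyclic SCM (not from the paper) shows why contraction gives a unique solution and how a shift–scale intervention re-solves the system:

```python
import numpy as np

# Linear cyclic SCM x = A x + u with a feedback loop x0 <-> x1.
A = np.array([[0.0, 0.4],
              [0.3, 0.0]])      # spectral radius < 1  -> contraction
u = np.array([1.0, -2.0])

# Under contraction the solution is unique: x = (I - A)^{-1} u.
x_closed = np.linalg.solve(np.eye(2) - A, u)

# Fixed-point iteration converges to that same solution.
x = np.zeros(2)
for _ in range(200):
    x = A @ x + u

# A shift-scale soft intervention on x0: rescale its mechanism by s
# and shift it by b, then re-solve the (still contractive) system.
s, b = 0.5, 1.0
A_int, u_int = A.copy(), u.copy()
A_int[0] *= s
u_int[0] = s * u[0] + b
x_int = np.linalg.solve(np.eye(2) - A_int, u_int)

print(x, x_int)
```

The paper's results concern the nonlinear, stochastic version of this picture, with concentration bounds on the resulting counterfactual distributions.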

Demystifying Spectral Feature Learning for Instrumental Variable Regression

This paper establishes rigorous generalization error bounds for spectral feature-based nonparametric instrumental variable (NPIV) regression, revealing that performance is jointly governed by two factors: spectral alignment between the structural function and the conditional expectation operator (approximation error) and the rate of singular value decay (estimation error). Based on this analysis, the paper proposes a Good-Bad-Ugly trichotomy of regimes along with data-driven diagnostic tools.

Differentiable Structure Learning and Causal Discovery for General Binary Data

This paper proposes a general differentiable structure learning framework based on the Multivariate Bernoulli Distribution (MVB) that makes no assumptions about the specific data-generating process, captures arbitrary higher-order dependencies among binary discrete variables, and proves that while DAGs are not identifiable in the general setting, the minimal equivalence class (Markov equivalence class) is recoverable.

Do-PFN: In-Context Learning for Causal Effect Estimation

This paper proposes Do-PFN, which extends Prior-data Fitted Networks (PFN) to causal effect estimation. A Transformer is pre-trained on large-scale synthetic SCM data to perform in-context causal reasoning, enabling prediction of causal intervention distributions (CID) and CATE from observational data alone—without requiring causal graph knowledge or the unconfoundedness assumption—achieving strong performance on both synthetic and semi-synthetic benchmarks.

Domain-Adapted Granger Causality for Real-Time Cross-Slice Attack Attribution in 6G Networks

This paper proposes a domain-adapted Granger causality framework for 6G network slicing that integrates enhanced Granger causality testing with network resource contention modeling to enable real-time cross-slice attack attribution, achieving 89.2% accuracy and 87 ms response time across 1,100 attack scenarios, substantially outperforming existing statistical, deep learning, and causal discovery methods.
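
Setting the 6G-specific adaptations aside, the Granger causality test at the core compares a restricted autoregression against one augmented with the candidate cause's past; a minimal bivariate sketch on simulated data (lag 1, hypothetical coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulate x Granger-causing y (but not vice versa).
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

def rss(X, target):
    """Residual sum of squares of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    r = target - X @ beta
    return r @ r

Y, Y1, X1 = y[1:], y[:-1], x[:-1]
ones = np.ones_like(Y)

rss_restricted = rss(np.column_stack([ones, Y1]), Y)      # y's own past only
rss_full = rss(np.column_stack([ones, Y1, X1]), Y)        # + x's past

# F-style statistic: large value => x's past improves prediction of y.
F = (rss_restricted - rss_full) / (rss_full / (len(Y) - 3))
print(F)
```

The paper's framework layers resource-contention modeling and domain adaptation on top of tests of this kind to attribute attacks across slices in real time.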

Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations

This paper proposes CoD (Counterfactual-explanation-infused Distillation), which injects counterfactual explanations into few-shot training sets to precisely map the teacher's decision boundary, achieving significant improvements over standard distillation methods across 6 datasets using only 8–512 samples.

From Black-box to Causal-box: Towards Building More Interpretable Models

This paper proposes a formal definition of causal interpretability, proves that both black-box models and concept bottleneck models fail to satisfy this property, establishes a complete graphical criterion for identifying which model architectures can consistently answer counterfactual queries, and reveals a fundamental tradeoff between causal interpretability and predictive accuracy.

GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding

This paper proposes GST-UNet, which integrates a U-Net spatiotemporal encoder with iterative G-computation to estimate location-specific conditional average potential outcomes (CAPOs) from a single spatiotemporal observational trajectory. The framework simultaneously handles interference, spatial confounding, temporal carry-over effects, and time-varying confounding, and is validated on a real-world causal analysis of wildfire smoke effects on respiratory hospitalization rates in California.
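
GST-UNet's iterative G-computation over spatiotemporal trajectories is beyond a snippet, but the one-step G-computation (standardization) idea it iterates can be sketched for a point treatment with a simple linear outcome regression (all coefficients hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Confounded data: X drives both treatment A and outcome Y.
X = rng.normal(size=n)
A = (rng.uniform(size=n) < 1 / (1 + np.exp(-X))).astype(float)
Y = 1.0 * A + 2.0 * X + rng.normal(size=n)   # true ATE = 1.0

# Naive difference in means is confounded.
naive = Y[A == 1].mean() - Y[A == 0].mean()

# G-computation: fit the outcome model E[Y | A, X], then average its
# predictions over the observed X under do(A=1) and do(A=0).
D = np.column_stack([np.ones(n), A, X])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
mu1 = np.column_stack([np.ones(n), np.ones(n), X]) @ beta
mu0 = np.column_stack([np.ones(n), np.zeros(n), X]) @ beta
ate = (mu1 - mu0).mean()

print(naive, ate)   # naive is biased upward; ate is close to 1.0
```

GST-UNet replaces the linear outcome model with a U-Net over space and applies this regress-then-average step iteratively backward through time to handle time-varying confounding.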

It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

This paper proves that Double Machine Learning (DML) is minimax optimal under Gaussian treatment noise (\(O(\epsilon^2 + n^{-1/2})\)), but becomes suboptimal under non-Gaussian noise. It proposes Agnostic Cumulant-based Estimation (ACE), which exploits higher-order cumulants to achieve \(r\)-th order insensitivity \(O(\epsilon^r + n^{-1/2})\).
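
The structural fact ACE exploits, that cumulants above order two vanish for Gaussian noise but not otherwise, is easy to check numerically (this is a background illustration, not the ACE estimator):

```python
import numpy as np

def cumulants_34(z):
    """Third and fourth cumulants of a sample (k4 is excess kurtosis
    in unnormalized form: m4 - 3*m2^2)."""
    z = z - z.mean()
    m2, m3, m4 = (z**2).mean(), (z**3).mean(), (z**4).mean()
    return m3, m4 - 3 * m2**2

rng = np.random.default_rng(0)
n = 1_000_000

gauss = rng.normal(size=n)
expo = rng.exponential(size=n)   # non-Gaussian: k3 = 2, k4 = 6

k3_g, k4_g = cumulants_34(gauss)
k3_e, k4_e = cumulants_34(expo)
print(k3_g, k4_g)   # both near 0 for Gaussian noise
print(k3_e, k4_e)   # clearly nonzero for exponential noise
```

Nonzero higher-order cumulants in the treatment noise are exactly the extra signal ACE uses to buy its \(O(\epsilon^r + n^{-1/2})\) insensitivity.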

LLM Interpretability with Identifiable Temporal-Instantaneous Representation

This paper proposes an identifiable temporal causal representation learning framework for the high-dimensional activation spaces of LLMs. By adopting a linearized formulation that jointly models time-lagged and instantaneous causal relationships, it resolves the computational bottleneck that prevents existing CRL methods from scaling to LLM-scale dimensions, while preserving theoretical identifiability guarantees.

Performative Validity of Recourse Explanations

This paper formally analyzes the "performative" effects of recourse explanations — when a large number of rejected applicants act on recourse recommendations, their collective behavior induces distribution shift that renders recourse invalid after model retraining — and proves that only Improvement-based Causal Recourse (ICR), which intervenes solely on causal variables, preserves "performative validity" under broad conditions.

Practical do-Shapley Explanations with Estimand-Agnostic Causal Inference

This paper proposes the Estimand-Agnostic (EA) approach and the Frontier-Reducibility Algorithm (FRA) for efficient computation of causal Shapley values (do-SV). By training a single SCM to learn the observational distribution, the framework answers arbitrary identifiable causal queries and reduces the number of coalitions requiring evaluation by approximately 90% via coalition reduction.
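
To see what FRA's ~90% coalition reduction is pruning, here is the baseline exact Shapley computation over all coalitions, with a made-up three-player value function (interventional values would come from the trained SCM in the paper's setting):

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Exact Shapley values: weighted marginal contributions
    of each player over all coalitions of the others."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += w * (v(frozenset(S) | {p}) - v(frozenset(S)))
    return phi

# Hypothetical value function: 'a' contributes 3, 'b' contributes 1,
# and 'a' and 'b' together add a synergy of 2; 'c' is irrelevant.
def v(S):
    return 3 * ('a' in S) + 1 * ('b' in S) + 2 * ('a' in S and 'b' in S)

phi = shapley(['a', 'b', 'c'], v)
print(phi)   # synergy is split evenly; values sum to v(all) = 6
```

The do-SV version replaces `v(S)` with an interventional query answered by the single learned SCM, and FRA's frontier reduction avoids evaluating most of these coalitions.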

Revealing Multimodal Causality with Large Language Models

This paper proposes MLLM-CD, the first framework for causal discovery from multimodal unstructured data (text + images). It identifies causal variables via contrastive factor discovery, infers causal structure through statistical methods, and resolves structural ambiguity via iterative multimodal counterfactual reasoning.

Root Cause Analysis of Outliers with Missing Structural Knowledge

Two simple and efficient algorithms are proposed for root cause analysis using only marginal anomaly scores: SMOOTH TRAVERSAL (known causal graph — finds the node with the largest score jump along causal paths) and SCORE ORDERING (unknown causal graph — ranks nodes by score and returns the top-\(k\)). Both algorithms provide nonparametric probabilistic guarantees under polytree structure and operate on a single anomalous sample.
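
Both algorithms are simple enough to sketch directly; here is a toy version on a four-node chain with hypothetical anomaly scores (the fault originates at B and propagates downstream with damping):

```python
def smooth_traversal(path, score):
    """Known causal path: return the node with the largest score jump
    relative to its parent (the root's jump is its own score)."""
    best, best_jump = path[0], score[path[0]]
    for parent, child in zip(path, path[1:]):
        jump = score[child] - score[parent]
        if jump > best_jump:
            best, best_jump = child, jump
    return best

def score_ordering(score, k):
    """Unknown causal graph: rank nodes by anomaly score, return top-k."""
    return sorted(score, key=score.get, reverse=True)[:k]

# Marginal anomaly scores for a single anomalous sample (made up).
score = {"A": 0.2, "B": 4.0, "C": 3.1, "D": 2.9}
path = ["A", "B", "C", "D"]     # causal order A -> B -> C -> D

print(smooth_traversal(path, score))   # 'B': largest jump along the path
print(score_ordering(score, 2))        # ['B', 'C']
```

Note that score ordering alone would also flag C and D, which merely inherit B's anomaly; the paper's guarantees bound how often such propagation fools each algorithm under polytree structure.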

Transferring Causal Effects using Proxies

This paper proposes a multi-domain causal effect transfer method based on proxy variables. Given that only proxy variable \(W\) is observed in the target domain, the method leverages multi-source domain data to identify and estimate the interventional distribution under unobserved confounders in the target domain, and provides two consistent estimators with asymptotic confidence intervals.