🔗 Causal Inference

🔬 ICLR2026 · 64 paper notes

🔥 Top topics: Adversarial Robustness ×4 · LLM ×3 · Time-Series Forecasting ×2 · Reasoning ×2

A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators

This paper proposes a relative-error-based framework for evaluating HTE estimators. By combining a carefully designed weighted least squares loss, balancing regularization, and a Dragonnet-style neural network, the relative error estimator remains \(\sqrt{n}\)-consistent and asymptotically normal, and yields valid confidence intervals, even when the outcome regression model is misspecified (provided the propensity score model is correct). This allows for reliable comparison of different HTE estimators and yields an aggregated HTE learning algorithm.

Action-Guided Attention for Video Action Anticipation

The authors propose the Action-Guided Attention (AGA) mechanism, which employs the model's own action prediction sequences as attention Query and Key (rather than pixel features). Combined with adaptive gated fusion of historical context and current frame features, it achieves robust generalization from validation to test sets on EPIC-Kitchens-100 while supporting post-training interpretability analysis.

ActiveCQ: Active Estimation of Causal Quantities

ActiveCQ unifies the task of "estimating a specific causal quantity (CATE/ATE/ATT/ATE under distribution shift) with minimal labeled samples" into a single active learning problem. It observes that most causal quantities can be expressed as the integral of a regression function over a specific distribution. By modeling the regression function with Gaussian Processes (GP) and representing the integral distribution via Conditional Mean Embeddings (CME) in an RKHS, the framework analytically derives acquisition functions (Information Gain / Total Variance Reduction) from the posterior uncertainty of the causal quantity. With fewer labels, it significantly outperforms baselines such as Random, BALD, and Coreset across multiple simulated and semi-synthetic datasets.

Adjusting Prediction Model Through Wasserstein Geodesic for Causal Inference

To address the issue where distributional imbalance between treated and control groups prevents prediction models from generalizing across groups, this paper proposes G-learner. Instead of aligning covariates (which leads to information loss and over-balancing), G-learner generates a sequence of intermediate populations along the Wasserstein geodesic between the two distributions. It then uses gradual self-training to migrate the prediction model from one group to the other step by step. On News/Twins/Jobs and synthetic datasets, it achieves state-of-the-art or competitive PEHE and ATE errors.

ALM-MTA: Front-Door Causal Multi-Touch Attribution Method for Creator-Ecosystem Optimization

Addressing the challenge of missing ground truth labels and systemic latent confounding in "consumption-driven production" (CDP) scenarios on short-video platforms, this paper identifies the causal uplift of each consumption touchpoint on "whether the user uploads" using the front-door criterion + an adversarially learned proxy mediator. Contrastive learning is employed to ensure overlap in large action spaces. Evaluated on Kuaishou's production system with 400M DAU, the method improves upload AUC to 0.907 (a relative +40% gain over SOTA) and increases per-exposure efficiency by 670%.

An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes

This paper systematically introduces semiparametric efficiency theory from causal inference into Q-function estimation in MDPs. It proves that classical Q-regression and FQE are essentially naive learners with plug-in bias and proposes the DRQQ-learner—a meta-learner characterized by double robustness, Neyman orthogonality, and quasi-oracle efficiency. By deriving the Efficient Influence Function (EIF), it constructs a debiased two-stage loss, significantly outperforming baseline methods in Taxi and Frozen Lake environments.

Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning

This paper points out that large-scale multimodal data do not follow the generation assumption of a single Directed Acyclic Graph (DAG). It proposes a Latent Partial Causal Model utilizing "undirected edges to connect two sets of latent coupled variables." On both spherical and convex latent spaces, it is proven that representations learned by Multimodal Contrastive Learning (MMCL), such as CLIP, differ from ground-truth latent variables by a linear orthogonal transformation and a permutation transformation, respectively. This provides the first theoretical guarantee for "component-wise decoupling" in MMCL and implements it via a plug-and-play decoupling pipeline (FastICA / PCA+FastICA), achieving improvements in few-shot learning and domain generalization.

CARL: Preserving Causal Structure in Representation Learning

CARL investigates the issue of causal structural drift in cross-modal representation learning. By employing three types of constraints—conditional independence preservation, Markov boundary retention, and monotonic alignment consistency—it maps multi-modal data into a shared representation space while preserving independence relations, mediator information, and causal effect identifiability conditions from the original causal graph.

CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions

This paper proposes the Causal Transformer (CaT), which injects the adjacency matrix of a pre-specified Directed Acyclic Graph (DAG) as a mask into the transformer's cross-attention. This allows the network to strictly adhere to the causal structure while retaining strong functional approximation capabilities, resulting in improved robustness to covariate shift, better interpretability, and the ability to directly estimate intervention effects.
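
The masking idea can be illustrated with a minimal, dependency-free sketch. It is not the CaT implementation: the function name `dag_masked_attention`, the single-head layout, and the choice to let every node attend to itself are assumptions made here for illustration. The adjacency matrix of a fixed DAG decides which positions a node may attend to, so information can only flow along causal edges.

```python
import math

def dag_masked_attention(queries, keys, values, adjacency):
    """Single-head attention in which node i attends only to itself and its DAG parents.

    adjacency[j][i] == 1 means the pre-specified DAG has an edge j -> i.
    Letting each node attend to itself is an assumption of this sketch.
    """
    n, d = len(queries), len(queries[0])
    outputs, attention = [], []
    for i in range(n):
        # Positions node i is allowed to read from: itself and its parents.
        allowed = [j for j in range(n) if j == i or adjacency[j][i] == 1]
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in allowed
        ]
        # Numerically stable softmax over the allowed positions only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [0.0] * n
        for j, e in zip(allowed, exps):
            row[j] = e / total
        attention.append(row)
        outputs.append(
            [sum(row[j] * values[j][c] for j in range(n)) for c in range(len(values[0]))]
        )
    return outputs, attention

# Chain DAG 0 -> 1 -> 2 with toy two-dimensional node features.
adjacency = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attention = dag_masked_attention(x, x, x, adjacency)
```

In this toy chain, node 2 places zero attention weight on node 0 because there is no direct edge between them; any influence of node 0 on node 2 would have to pass through node 1 in a deeper stack.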

Causal Discovery in the Wild: A Voting-Theoretic Ensemble Approach

This work treats multiple causal discovery algorithms as "fallible voting experts" and establishes a theoretically guaranteed weighted Bayesian voting framework for structural ensemble using voting theory. By decomposing graphs into edge-level substructures and estimating each expert's "ability matrix" via optimal transport, the approach is more robust and accurate than existing heuristic ensemble methods on both synthetic and real data, while providing explicit guidance on selecting ensemble size, ability, and diversity.

Causal Discovery via Quantile Partial Effect

This paper utilizes the Quantile Partial Effect (QPE) from conditional quantile regression as a shape statistic of the observed distribution. It establishes identifiability for bivariate causal directions under a finite basis function span assumption and further connects QPE with the score function and Fisher information to derive FICO, an efficient non-parametric algorithm for multivariate causal ordering.

Causal Imitation Learning under Expert-Observable and Expert-Unobservable Confounding

This paper proposes a unified causal imitation learning framework that simultaneously models two types of hidden confounding: "observable by the expert but not the imitator" and "unobservable by both." By utilizing \(k\)-step trajectory history as an instrumental variable (IV), the problem is reformulated as a Conditional Moment Restriction (CMR) problem. The authors introduce the DML-IL algorithm with imitation gap upper bound guarantees, which outperforms existing causal IL baselines on continuous control tasks such as MuJoCo under confounding.

Causal Score Conditioning for Multi-Resolution Latent Systems

This paper proposes SVGDM, which embeds score-based diffusion into causal directed graphs. By utilizing "causal score decomposition," it enables information propagation along causal edges across observations with different resolutions and noise levels. This allows for the joint inversion of multiple interdependent latent variables (e.g., earthquake → landslide → building damage) under heterogeneous and incomplete observations.

Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks

When a multivariate Hawkes process contains an unknown number of latent subprocesses at unknown locations, this paper first proves that the Hawkes process is equivalent to a discrete-time linear autoregressive causal model after discretizing continuous-time event sequences using minimal windows. It then utilizes the rank constraints of the cross-covariance matrix of observed counts to provide necessary and sufficient conditions for identifying latent confounding subprocesses and all causal edges. Based on this, a two-stage iterative algorithm is designed that does not require prior knowledge of the existence, number, or connections of latent variables.

Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data

This paper identifies a long-ignored challenge in interventional causal discovery: post-treatment selection (e.g., in single-cell experiments, only high-activity cells are retained according to quality control standards after intervention). This selection mimics causal responses, causing existing methods to misclassify the presence and absence of direct causal edges into the same equivalence class. The authors explicitly model selection variables using augmented DAGs, propose FI-Markov equivalence (finer than traditional classes) and a new graph representation F-PAG, and provide the provably sound and complete F-FCI algorithm. This approach simultaneously identifies causal relationships, latent confounders, and post-treatment selection from observational and interventional data.

Coarse-to-Fine Learning of Dynamic Causal Structures

This paper proposes DyCausal, which utilizes sliding convolutional windows to first capture "coarse-grained" causal structures of time series across multiple time steps. It then refines the causal matrices for each time step using first-order Taylor linear interpolation, coupled with an "always differentiable" acyclicity constraint \(h_\text{norm}\) based on matrix 1-norm scaling. This approach enables the stable and efficient recovery of fully dynamic (both instantaneous and lagged causality vary over time) time-varying causal graphs for the first time, significantly outperforming existing methods on both synthetic and real-world data.

Conditional Independent Component Analysis for Estimating Causal Structure with Latent Variables

This paper proposes the principle of Conditional Independent Component Analysis (CICA)—extracting components that are mutually independent given a set of latent variables—and proves that by selecting its sparsest solution and applying row permutation, one can identify latent variable positions and all causal edges in linear non-Gaussian acyclic models with latent confounders, thereby breaking the reliance on "purity assumptions" required by methods like GIN/TIN.

Conformalized Survival Counterfactuals Prediction for General Right-Censored Data

In clinical scenarios involving "general right-censoring + multiple treatment options," this paper utilizes the potential outcomes framework combined with weighted conformal prediction to construct a Lower Predictive Bound (LPB) for counterfactual survival times. It upgrades PAC-type approximate coverage from previous methods to exact marginal coverage and achieves double robustness against model misspecification.

Counterfactual Explanations on Robust Perceptual Geodesics

The PCG (Perceptual Counterfactual Geodesic) method is proposed to generate semantically faithful counterfactual explanations via geodesic optimization on a robust perceptual manifold. A two-stage optimization ensures paths are perceptually natural and reach the target class, achieving an FID of 8.3 on AFHQ, significantly outperforming RSGD's 12.9.

Counterfactual LLM-based Framework for Measuring Rhetorical Style

This paper proposes a counterfactual LLM measurement framework: while fixing the substantive content \(X\) (methods, experiments, results), different rhetorical personas generate counterfactual abstracts for the same paper. These abstracts are then calibrated into continuous "rhetorical strength" \(Z\) scores using LLM Judge pairwise comparisons and the Bradley-Terry model. Empirical analysis of 8,485 ICLR submissions shows that stronger visionary rhetoric significantly predicts citations and media attention, and post-2023 rhetorical intensification is highly correlated with the adoption of LLM writing assistance.
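
As a rough illustration of the calibration step, the sketch below fits Bradley-Terry strengths from a matrix of pairwise wins using the standard minorization-maximization update. The function name `fit_bradley_terry`, the toy win counts, and the use of log-strengths as the continuous score are assumptions for illustration; the paper's own fitting procedure may differ.

```python
import math

def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is how often item i was preferred over item j.
    Assumes every item wins at least once, so strengths stay positive.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        # Rescale so the strengths sum to n (the model is scale-invariant).
        scale = n / sum(updated)
        strengths = [s * scale for s in updated]
    return strengths

# Hypothetical judge outcomes for three abstracts, ten comparisons per pair.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
strengths = fit_bradley_terry(wins)
scores = [math.log(s) for s in strengths]  # continuous scale, one value per abstract
```

Working on the log scale turns the pairwise win probability into a logistic function of the score difference, which is why log-strengths are a natural choice for a continuous measure.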

Counterfactual Structural Causal Bandits

This paper elevates Structural Causal Bandits (SCB) from the L1/L2 (observational/interventional) layers of the Pearl Causal Hierarchy to the L3 (counterfactual) layer. It proposes the CTF-SCB framework, incorporating implementable counterfactual actions—such as "what if a downstream variable had received \(x'\) instead of \(x\)"—into the arm space. Using a set of graph-theoretic characterizations (CTF-MIS / CTF-POMIS / counterfactual regime graphs), the work prunes an exponential arm space down to a representative subset of "possibly optimal" arms, which, when paired with standard bandit solvers, achieves lower cumulative regret.

Debiased Front-Door Learners for Heterogeneous Effects

This paper transplants the mature DR-Learner and R-Learner from back-door settings to front-door identification scenarios. It proposes two debiased estimators, FD-DR-Learner and FD-R-Learner, ensuring that the conditional front-door effect \(\tau(C)\) achieves quasi-oracle rates even when nuisance functions converge at a slow rate of \(n^{-1/4}\).
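
For orientation, the sketch below evaluates the classical front-door adjustment formula, \(E[Y \mid do(x)] = \sum_m P(m \mid x) \sum_{x'} E[Y \mid m, x'] P(x')\), on hand-written discrete tables. It is only the plug-in identification formula without the covariates \(C\), not the FD-DR-Learner or FD-R-Learner; the function name and all probabilities are hypothetical.

```python
def front_door_mean(x, p_x, p_m_given_x, mean_y_given_m_x):
    """Plug-in front-door estimate of E[Y | do(X = x)] for discrete X and M.

    p_x[x']                  : marginal probability of treatment value x'
    p_m_given_x[x][m]        : probability of mediator value m given X = x
    mean_y_given_m_x[(m, x')]: mean outcome given M = m and X = x'
    """
    total = 0.0
    for m, p_m in p_m_given_x[x].items():
        # Average the outcome model over the marginal treatment distribution.
        inner = sum(mean_y_given_m_x[(m, x_prime)] * p for x_prime, p in p_x.items())
        total += p_m * inner
    return total

# Hypothetical binary example.
p_x = {0: 0.5, 1: 0.5}
p_m_given_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}
mean_y_given_m_x = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.7}

effect = (
    front_door_mean(1, p_x, p_m_given_x, mean_y_given_m_x)
    - front_door_mean(0, p_x, p_m_given_x, mean_y_given_m_x)
)
```

A plug-in of this kind inherits the errors of every estimated table, which is the sensitivity that debiased learners are designed to reduce.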

Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning

This paper addresses how to allocate treatments (new vs. old strategy) in time-series experiments so as to minimize the MSE of the ATE estimate. It proves an impossibility theorem stating that "allocation strategies ignoring full history are necessarily suboptimal." It then uses a Transformer to encode the full history into states and a double deep Q-network to learn the optimal allocation strategy, using the MSE directly as the (negative) reward. Across synthetic data, dispatch simulators, and real ride-hailing data, the MSE of this approach is consistently lower than that of various switchback and MDP-based designs.

Direct Doubly Robust Estimation of Conditional Quantile Contrasts

This paper proposes the first direct estimation method for the Conditional Quantile Comparator (CQC). By explicitly parameterizing the CQC and combining it with doubly robust gradient descent, the method retains theoretical double robustness while consistently outperforming existing indirect inversion methods in estimation accuracy, interpretability, and computational efficiency in experiments.

Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models

This work provides the first complete graphical criterion for distributional equivalence between causal graphs containing both latent variables and cycles under the linear non-Gaussian setting without relying on any structural assumptions. The core tool introduced is edge rank constraints. Based on this, algorithms for traversing equivalence classes and recovering causal models from data are developed—marking the first equivalence characterization and discovery method for parametric causal models without structural assumptions.

Efficient and Sharp Off-Policy Learning under Unobserved Confounding

This work derives a closed-form expression + semiparametrically efficient estimator for the sharp bounds of the value function in personalized off-policy learning under unobserved confounding. It simplifies the originally unstable minimax optimization into a standard minimization problem and proves that minimizing this estimator yields the optimal confounding-robust policy.

Efficient Ensemble Conditional Independence Test Framework for Causal Discovery

The E-CIT (Ensemble Conditional Independence Test) framework is proposed. By splitting data into subsets, executing tests independently, and merging results based on a p-value aggregation method for stable distributions, it reduces the computational complexity of any conditional independence test to linear with respect to sample size. Simultaneously, it maintains or even enhances test power in complex scenarios such as heavy-tailed noise and real-world data.
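
The split-test-merge pattern can be sketched as follows. The Cauchy combination rule is used here as one well-known aggregation based on a stable distribution; whether E-CIT uses this exact rule is not stated above, so treat it as a stand-in. The function names, the interleaved split, and the `base_test` callback (any function mapping a subset of samples to a p-value) are likewise assumptions.

```python
import math

def cauchy_combine(p_values, eps=1e-15):
    """Combine p-values with the Cauchy combination rule (equal weights)."""
    # Clip away exact 0 and 1 so the tangent stays finite.
    clipped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    statistic = sum(math.tan((0.5 - p) * math.pi) for p in clipped) / len(clipped)
    # Upper tail probability of a standard Cauchy variable at `statistic`.
    return 0.5 - math.atan(statistic) / math.pi

def ensemble_ci_test(samples, n_subsets, base_test):
    """Split the samples, run the base test on each subset, merge the p-values."""
    subsets = [samples[i::n_subsets] for i in range(n_subsets)]
    return cauchy_combine([base_test(subset) for subset in subsets])

combined = cauchy_combine([0.01, 0.02, 0.03])
```

With a single subset the combined value equals the original p-value, so the ensemble reduces to the base test in that case.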

Exploratory Causal Inference in SAEnce

This paper proposes the "Exploratory Causal Inference" paradigm: rather than requiring scientists to presuppose which effects to measure, it uses foundation models + Sparse Autoencoders (SAEs) to map high-dimensional raw observations (e.g., ant behavior videos) into interpretable neural channels. Then, a recursive hierarchical testing algorithm called Neural Effect Search automatically discovers, from the data, previously unknown outcome variables that are truly affected by the treatment in Randomized Controlled Trials (RCTs).

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models

This paper systematically investigates the over-reliance of preference models on five surface features (verbosity, structure, jargon, sycophancy, vagueness). By using causal counterfactual pairs, it quantifies that bias originates from distributional imbalances in training data and proposes Counterfactual Data Augmentation (CDA) as a post-training method, reducing the average miscalibration rate between model and human judgment from 39.4% to 32.5%.

Foundation Models for Causal Inference via Prior-Data Fitted Networks

CausalFM adapts the "tabular foundation model" PFN to causal inference: it uses Structural Causal Models (SCMs) to generate synthetic priors and pre-trains a Transformer on synthetic data. This enables the model to directly provide Bayesian-style CATE estimates for backdoor, front-door, and instrumental variable settings via in-context learning without retraining.

Frequency-Domain Better than Time-Domain for Causal Structure Recovery in Dynamical Systems on Networks

Addressing causal graph recovery for networked dynamical systems, this paper theoretically proves that frequency-domain Wiener filtering is faster than its time-domain counterpart (an \(O(L^2/\log N)\) speedup via FFT). It finds that the phase information unique to complex-valued estimates in the frequency domain can directly reveal skeletons and colliders across a large class of networks, leading to the proposed "Wiener-Phase" algorithm, which avoids combinatorial CI tests.

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

Through the counterfactual task of off-by-one addition (e.g., 1+1=3, 2+2=5), this study utilizes path patching to discover a function induction mechanism within Large Language Models—an attention head circuit capable of inductive reasoning at the function level, transcending token-level pattern matching—and demonstrates that this mechanism is reusable across tasks.

GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes

This paper proposes a suite of universal Neyman-orthogonal (doubly robust) generative learners, GDR-learners. These learners integrate any SOTA conditional generative model (NF / GAN / VAE / Diffusion) into a two-stage objective loss that is first-order insensitive to nuisance estimation errors. This allows for estimating the entire conditional distribution of potential outcomes (rather than just the expectation) with "quasi-oracle efficiency + rate double robustness."

Good Allocations from Bad Estimates

This paper proves a counter-intuitive conclusion: allocating limited treatment resources to target groups requires fewer samples than accurately estimating the treatment effect (CATE) for each group, by a factor of \(1/\epsilon\). Only \(O(M/\epsilon)\) samples are needed instead of \(O(M/\epsilon^2)\), because "coarse estimates are sufficient for making near-optimal allocation decisions."

Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference

The study proposes the CausalPitfalls benchmark, featuring 6 categories, 15 challenges, 75 questions, and 75 datasets generated by structural causal models. It systematically tests whether LLMs fall into classic statistical traps such as Simpson's Paradox and selection bias, revealing that even the strongest models possess a "causal reliability" of less than 45%.

IGC-Net for Conditional Average Potential Outcome Estimation Over Time

This paper proposes IGC-Net: the first neural network to estimate temporal Conditional Average Potential Outcomes (CAPO) through pure regression-based iterative G-computation end-to-end. It correctly adjusts for time-varying confounding while bypassing the zero-division instability of Inverse Probability Weighting (IPW) and the high-dimensional full distribution estimation of G-Net.

Independence Test for Linear Non-Gaussian Data and Applications in Causal Discovery

This paper proves that in linear non-Gaussian mixture models, constant conditional mean and constant conditional variance are sufficient to imply independence. Based on this, a kernel independence test named LiNGIC is proposed, which is simultaneously sensitive to first- and second-order conditional moments. LiNGIC demonstrates higher statistical power than general tests like HSIC in synthetic data and Direct-LiNGAM causal discovery.

Influence without Confounding: Causal Discovery from Temporal Data with Long-term Carry-over Effects

To address spurious causality caused by "ancient historical values directly influencing the present" (long-term carry-over confounding), this paper proves the equivalence between the "OLS score" and the "diagonal of the R matrix from QR decomposition" to identify the true topological order. It eliminates long-term confounding using residuals from limited-step historical regression and proposes the LEVER method, which utilizes a DQN to efficiently search for the optimal variable order using the R matrix as the state.

Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference

To address the high variance issue in GATE estimation during A/B testing under network interference, this paper proposes the Mean-in-Interior (MII) estimator, which averages results only for nodes inside clusters to significantly reduce variance. Furthermore, an augmented AMII estimator is developed using a counterfactual predictor for covariate shift correction, achieving both low bias and low variance.
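
The core idea of averaging over interior nodes can be sketched in a few lines. This sketch assumes cluster-level randomization and defines an interior node as one whose neighbors all lie in its own cluster; both are assumptions made here, and the helper names and toy graph are hypothetical. It omits the counterfactual-predictor correction of the augmented AMII estimator.

```python
def interior_nodes(neighbors, cluster_of):
    """Nodes whose neighbors all belong to the node's own cluster."""
    return [
        v
        for v, nbrs in neighbors.items()
        if all(cluster_of[u] == cluster_of[v] for u in nbrs)
    ]

def mean_in_interior(neighbors, cluster_of, cluster_treated, outcome):
    """Difference in mean outcome between treated and control interior nodes."""
    interior = interior_nodes(neighbors, cluster_of)
    treated = [outcome[v] for v in interior if cluster_treated[cluster_of[v]]]
    control = [outcome[v] for v in interior if not cluster_treated[cluster_of[v]]]
    if not treated or not control:
        raise ValueError("need interior nodes in both treated and control clusters")
    return sum(treated) / len(treated) - sum(control) / len(control)

# Path graph 0-1-2-3-4-5 split into cluster "A" (treated) and cluster "B" (control).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
cluster_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
cluster_treated = {"A": True, "B": False}
outcome = {0: 3.0, 1: 5.0, 2: 10.0, 3: 10.0, 4: 1.0, 5: 2.0}

estimate = mean_in_interior(neighbors, cluster_of, cluster_treated, outcome)
```

Nodes 2 and 3 sit on the cluster boundary and are excluded, so their outcomes, which mix treated and control exposure, do not enter the estimate.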

Learning Dynamic Causal Graphs Under Parametric Uncertainty via Polynomial Chaos Expansions

The strength of each causal edge is upgraded from a "static weight" to a "function of operating parameters \(\xi\)." This function is learned using Polynomial Chaos Expansion (PCE) to discover causal structures that change dynamically with operating conditions, providing provable identifiability and convergence guarantees.

Learning Exposure Mapping Functions for Inferring Heterogeneous Peer Effects

This paper proposes EGONETGNN, which uses Graph Neural Networks to automatically learn the "exposure mapping function" in network peer effects. It eliminates the need to manually specify counts of treated neighbors, enabling robust estimation of heterogeneous peer effects even when influence mechanisms are unknown or depend on local structures (triangles, clustering coefficients, attribute similarity).

Learning Robust Intervention Representations with Delta Embeddings

The authors propose the Causal Delta Embedding (CDE) framework, which represents interventions/actions as the vector difference between pre- and post-intervention states in a latent space. By applying three constraints—independence, sparsity, and invariance—the framework learns robust intervention representations. It significantly outperforms baselines in OOD generalization within the Causal Triplet challenge and automatically discovers the anti-parallel semantic structure of antonymous actions.

LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding

By controllably generating causal narratives across two axes—"world knowledge conflict" and "graph reasoning complexity"—the authors find that SOTA LLMs rely on two shortcuts in causal narrative understanding (event appearance order = causal order, and applying parametric common sense). Neither CoT nor ICL can resolve this; only a "Graph" strategy—where the model first extracts the entire causal graph and then answers via graph traversal—bypasses these shortcuts.

Matching without Group Barrier for Heterogeneous Treatment Effect Estimation

MOGA breaks the matching barrier of "searching for neighbors only within the target treatment group" by including all samples in the candidate pool. It employs a self-optimal transport model to learn matching weights, utilizes random walks on manifolds to propagate factual outcomes for counterfactual prediction, and finds sufficiently close neighbors even under sample sparsity or distribution shifts, significantly improving the precision of heterogeneous treatment effect estimation.

Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing

This paper reinterprets the differences between "gold-standard vs. preference-based" data sources as treatment assignment in causal inference. Consequently, the bias in preference data is proven to be exactly the Conditional Average Treatment Effect (CATE). By estimating and correcting this bias using R-/DR-learner meta-learners, a highly accurate and sample-efficient LLM router is trained.

Modeling Interference for Treatment Effect Estimation in Network Dynamic Environment

Addressing the dual challenges of "dynamic networks + neighbor interference," this paper defines a new identifiable estimator, CATE-ID, and proposes the DSPNET framework. It utilizes GCN+RNN to capture time-varying hidden confounders, models spillover effects with data-driven interference representations, and balances confounding representations via a Gradient Reversal Layer (GRL) to achieve unbiased estimation of individual treatment effects from observational dynamic network data.

Multiverse Mechanica: A Testbed for Learning Game Mechanics via Counterfactual Worlds

The study reformulates the ambiguous question of whether a game world model has truly learned game rules (mechanics), previously judged only via post hoc visual inspection, into a formal causal counterfactual inference task. It introduces Multiverse Mechanica, a playable game testbed that natively outputs "parallel world contrastive data + causal graphs for each mechanic," making "learning mechanics" (as opposed to "learning pixels") definable, supervisable, and reproducibly evaluable for the first time.

NextQuill: Causal Preference Modeling for Enhancing LLM Personalization

NextQuill reformulates LLM personalization as a causal problem—viewing both model predictions and ground-truth user responses as the joint result of "User History/Features \(\times\) Context." By using causal effects (do-calculus), it isolates the core preference-driven components and employs two alignment losses to learn only these parts, achieving deeper personalization than undifferentiated alignment.

On Measuring Influence in Avoiding Undesired Future

This paper proposes a new influence measure, influence power (InP), for the "Avoiding Undesired Future" (AUF) problem. It measures how much the probability of reaching a target is increased by "actively modifying an actionable variable" compared to "letting it occur naturally." The paper theoretically proves that influence is not equivalent to causal effect (weakly causal or even non-causal variables can be highly useful) and provides a practical algorithm to estimate this quantity from observational data using Monte Carlo Tree Search.

On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study

This paper proposes a decompositional evaluation framework based on Structural Causal Models (SCM), splitting LLM counterfactual reasoning into four stages (causal variable identification → causal graph construction → intervention identification → outcome reasoning). It systematically diagnoses ability bottlenecks across 11 multimodal datasets and suggests tool augmentation and advanced elicitation strategies to improve performance.

On the Identifiability of Causal Graphs with the Invariance Principle

This paper proves that under the conditions of invariant mechanisms and sufficient noise variance scaling across environments, the complete causal graph of any nonlinear invertible structural causal model (SCM) can be uniquely identified using one base environment and two auxiliary environments. This identifiability phenomenon is verified through synthetic experiments following the proof logic.

Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Addressing the persistent challenge of learning in "low-overlap regions" for Conditional Average Treatment Effect (CATE) estimation, this paper proposes Overlap-Adaptive Regularization (OAR). The regularization strength of the second-stage model in two-stage meta-learners varies inversely with the overlap weight \(\nu(x)\) (stronger regularization for lower overlap). It further introduces dOAR, a debiased version that maintains Neyman orthogonality, consistently outperforming "constant regularization" across multiple (semi-)synthetic datasets.

Overlap-Weighted Orthogonal Meta-Learner for Treatment Effect Estimation over Time

This paper proposes the WO-learner (overlap-weighted orthogonal meta-learner), which focuses estimation on samples that are truly likely to receive the target intervention sequences by applying an "overlap weight" to training samples. Combined with a Neyman-orthogonal weighted population risk function, it maintains stability in low-overlap scenarios where "overlap probability decays exponentially with the prediction horizon." It outperforms existing meta-learners across synthetic, semi-synthetic, and real-world datasets.

Privacy-Protected Causal Survival Analysis Under Distribution Shift

To address the challenge where "multi-center survival data cannot be pooled due to privacy constraints and distributions across sites are inconsistent," this paper utilizes influence function theory to construct a local estimator for each external source site anchored to the target site. It then adaptively weights source sites using convex optimization with an \(\ell_1\) penalty (up-weighting aligned sources and zeroing out biased ones). Transferring only summary statistics throughout, the method yields a doubly robust and strictly more efficient estimate of the target-population survival function, as long as at least one source is consistent.

Query-Specific Causal Graph Pruning under Tiered Knowledge

This paper proposes a method to prune edges from a causal graph using "tiered knowledge" while maintaining the identifiability of (conditional) causal effects. By reducing the identification problem to a smaller subgraph, it designs a query-specific causal discovery algorithm that achieves exponential speedup compared to existing methods.

Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement

This paper proposes Knowledgeable-R1, a reinforcement learning framework that enables LLMs to resist interference from misleading retrieval contexts in RAG scenarios while preserving the ability to utilize reliable context. This is achieved through joint sampling of parametric knowledge (PK) and contextual knowledge (CK) trajectories, combined with local/global advantage calculation and adaptive asymmetric advantage transformation.

Score-based Greedy Search for Structure Identification of Partially Observed Causal Models

This paper proposes LGES, the first score-based greedy search method for causal models with latent variables that provides identifiability guarantees. It establishes an "algebraic equivalence" criterion using likelihood score and minimum dimensionality, then tightens this equivalence to the Markov Equivalence Class (MEC) using a weak structural assumption called the Generalized N Factor Model (GNFM). Finally, it employs a two-phase greedy search driven by two edge-deletion operators to efficiently recover the entire structure including latent variables, outperforming existing constraint-based methods on small samples and real-world psychological data.

Self-Supervised Learning from Structural Invariance

This paper proposes AdaSSL, which models conditional uncertainty between positive pairs by introducing latent variables and deriving a variational lower bound on mutual information. This enables SSL to handle complex (multimodal, heteroscedastic) conditional distributions in naturally paired data, outperforming baselines in causal representation learning, fine-grained image understanding, and video world models.

SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?

The study proposes the SelfReflect metric, an information-theoretic distance that measures the discrepancy between an LLM's self-stated uncertainty summary and its true internal answer distribution. It finds that modern LLMs generally fail to reflect their internal uncertainty on their own, but can produce faithful uncertainty summaries when multiple sampled outputs are fed back into the context.

Stochastic Neural Networks for Causal Inference with Missing Confounders

This paper proposes CI-StoNet: a stochastic neural network (StoNet) that directly encodes the Markov decomposition of a causal DAG into the network architecture. It employs adaptive Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) to simultaneously impute missing latent confounders and estimate sparse network parameters. This provides causal effect estimation with model-level identifiability guarantees and strong nonlinear modeling capabilities for observational data where not all confounders are observed.

Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders

The paper proposes L-GMVAE (Label-Conditional Gaussian Mixture VAE) and the LAPACE algorithm. By learning multiple class-specific Gaussian cluster centroids in the latent space and performing linear interpolation from the input latent representation to these target centroids, the method generates path-based counterfactual explanations that simultaneously ensure validity, plausibility, diversity, and perfect robustness to input perturbations.
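
The interpolation step described above can be sketched in a few lines. This is only an illustration of a straight-line path in latent space from an input code to a target centroid. The function name, the 2-D latent vectors, and the step count are assumptions; encoding, decoding, and the validity check on each decoded point are omitted, and the actual LAPACE procedure may differ.

```python
def interpolate_path(z, centroid, steps=5):
    """Return steps + 1 points on the straight line from z to centroid,
    including both endpoints (hypothetical helper)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    path = []
    for k in range(steps + 1):
        t = k / steps  # t runs from 0.0 (input) to 1.0 (centroid)
        path.append([(1 - t) * a + t * b for a, b in zip(z, centroid)])
    return path


# Assumed latent code of the input and centroid of the target class.
z_input = [0.0, 2.0]
target_centroid = [4.0, -2.0]

path = interpolate_path(z_input, target_centroid, steps=4)
# In a full pipeline each point on the path would be decoded and passed
# to the classifier; the first decoded point that flips the label is a
# candidate counterfactual.
```

Because the class has several centroids, repeating the interpolation toward each one gives several distinct paths, which is one way to read the diversity claim in the summary.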

TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations

TCD-Arena proposes an extensible robustness testing suite for time series causal discovery, systematically evaluating 10 categories of methods through 33 types of progressively intensified real-world assumption violations and approximately 36 million causal discovery attempts. The study finds that different algorithms exhibit vastly different robustness profiles, and simple ensembles can further improve stability on both lag and summary graphs.
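
As a rough illustration of what a simple ensemble over summary graphs might look like, the sketch below combines binary adjacency matrices from several methods by majority vote. The summary does not say which ensembling rule TCD-Arena uses, so the voting rule, the threshold, and the function name are all assumptions.

```python
def majority_vote(graphs, threshold=0.5):
    """Keep edge i -> j if more than `threshold` of the input graphs
    contain it. Graphs are square 0/1 adjacency matrices (lists of lists)."""
    n = len(graphs[0])
    combined = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            votes = sum(g[i][j] for g in graphs)
            combined[i][j] = 1 if votes / len(graphs) > threshold else 0
    return combined


# Assumed outputs of three discovery methods on the same 3 variables.
g1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
g2 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
g3 = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]

ensemble = majority_vote([g1, g2, g3])
```

Here the edge 0 -> 2, found by only one method, is dropped, while 1 -> 2, found by two of three, is kept.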

Theoretical Guarantees for Causal Discovery on Large Random Graphs

This paper introduces the first finite-dimensional deviation concentration bounds (rather than asymptotic consistency or worst-case bounds) for causal orientation using random single-variable interventions. On sparse Erdős–Rényi and generalized Barabási–Albert random graphs, the false negative rate (FNR) of orientation concentrates more tightly, or even vanishes, as the dimension \(d\) increases. This proves that high dimensionality and heavy-tailed degree heterogeneity, often viewed as obstacles, inherently regularize causal discovery.

Topological Causal Effects

This paper defines causal treatment effects on the topological structure of outcomes. By using power-weighted silhouette functions of persistence diagrams to characterize "treatment-induced topological changes," the authors propose a fully non-parametric, \(\sqrt{n}\)-consistent doubly robust AIPW estimator. Furthermore, they construct a formal hypothesis test for the existence of topological effects based on functional weak convergence and silhouette stability bounds.
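
For readers unfamiliar with AIPW, the sketch below shows the standard augmented inverse probability weighting estimator of an average treatment effect for a scalar outcome. It is not the paper's estimator: the summary describes an estimator on function-valued silhouette summaries of persistence diagrams, whereas this sketch assumes a plain numeric outcome and takes the propensity and outcome-model predictions as given inputs. All names are illustrative.

```python
def aipw_ate(y, a, e_hat, mu1_hat, mu0_hat):
    """Standard AIPW estimate of the average treatment effect.

    y        observed outcomes
    a        binary treatment indicators (0 or 1)
    e_hat    estimated propensities P(A = 1 | X), strictly in (0, 1)
    mu1_hat  outcome-model predictions under treatment
    mu0_hat  outcome-model predictions under control
    """
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, mu1_hat, mu0_hat):
        total += (
            m1 - m0                            # outcome-model contrast
            + ai * (yi - m1) / ei              # correction on treated units
            - (1 - ai) * (yi - m0) / (1 - ei)  # correction on control units
        )
    return total / len(y)


# Toy data with a constant propensity of 0.5.
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]

# Outcome models that fit the observed arms exactly.
good_fit = aipw_ate(y, a, e, [3.0, 3.0, 4.0, 4.0], [1.0, 1.0, 2.0, 2.0])

# Deliberately wrong outcome models (all zeros); the weighting terms
# alone then carry the estimate.
bad_outcome_fit = aipw_ate(y, a, e, [0.0] * 4, [0.0] * 4)
```

On this toy data both calls return the same value, which illustrates the doubly robust structure: the residual corrections compensate for a poor outcome model when the propensities are right.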