
🔗 Causal Inference

🔬 ICLR2026 · 18 paper notes

Action-Guided Attention for Video Action Anticipation

This paper proposes an Action-Guided Attention (AGA) mechanism that uses the model's own action prediction sequences, rather than pixel-level features, as the Query and Key in attention, combined with adaptive gated fusion of historical context and current-frame features. AGA generalizes strongly from the validation set to the test set on EPIC-Kitchens-100 and supports post-hoc interpretability analysis.
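A minimal numpy sketch of the idea as summarized above: predicted action embeddings serve as Query and Key, frame features as Value, and a sigmoid gate mixes historical context with the current frame. All shapes, names, and the gate parameterization are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def action_guided_attention(action_emb, frame_feats):
    """Attention whose Q and K come from predicted action embeddings (T, d)
    rather than pixel features; V is the frame features (T, d)."""
    d = action_emb.shape[-1]
    scores = action_emb @ action_emb.T / np.sqrt(d)   # (T, T)
    return softmax(scores, axis=-1) @ frame_feats     # (T, d)

def gated_fusion(history_ctx, current_feat, W_g):
    """Adaptive gate: a sigmoid over the concatenated inputs decides,
    per dimension, how much history vs. current frame to keep."""
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([history_ctx, current_feat]) @ W_g)))
    return g * history_ctx + (1.0 - g) * current_feat

rng = np.random.default_rng(0)
T, d = 4, 8
actions = rng.normal(size=(T, d))                 # toy action prediction sequence
frames = rng.normal(size=(T, d))                  # toy frame features
ctx = action_guided_attention(actions, frames)    # history context per step
fused = gated_fusion(ctx[-1], frames[-1], rng.normal(size=(2 * d, d)))
```

The point of the sketch is only the routing: the attention map is computed entirely from action predictions, so frame features are re-weighted by action semantics rather than by pixel similarity.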

AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems

This paper proposes AgentTrace, a framework that constructs causal graphs from execution logs of multi-agent systems and localizes root cause nodes via backward tracing combined with lightweight feature-based ranking (a weighted linear combination of five feature groups). On 550 synthetic fault scenarios, AgentTrace achieves Hit@1 of 94.9% with a latency of 0.12 seconds—69× faster than LLM-based analysis.
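The pipeline described above can be sketched in a few lines: build a causal graph over agent events, backward-trace ancestors of the failed node, then rank candidates by a weighted linear score. The paper uses five feature groups; the two features and all weights below are hypothetical stand-ins.

```python
from collections import defaultdict

def backward_trace(edges, failure_node):
    """Collect all ancestors of the failure node in the causal graph."""
    parents = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)
    seen, stack = set(), [failure_node]
    while stack:
        node = stack.pop()
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def rank_root_causes(candidates, features, weights):
    """Score each candidate by a weighted linear combination of its
    per-node feature-group values, highest score first."""
    scores = {n: sum(w * features[n][k] for k, w in weights.items())
              for n in candidates}
    return sorted(scores, key=scores.get, reverse=True)

# Toy execution log: agent A's bad output propagates A -> B -> C (failure).
edges = [("A", "B"), ("B", "C"), ("D", "C")]
candidates = backward_trace(edges, "C")
features = {
    "A": {"error_keywords": 0.9, "timing_anomaly": 0.7},
    "B": {"error_keywords": 0.2, "timing_anomaly": 0.1},
    "D": {"error_keywords": 0.0, "timing_anomaly": 0.0},
}
weights = {"error_keywords": 0.6, "timing_anomaly": 0.4}
ranked = rank_root_causes(candidates, features, weights)
```

Because the ranking is a fixed linear score over precomputed features, it runs in microseconds per node, which is consistent with the sub-second latency the paper reports against LLM-based log analysis.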

Copy-Paste to Mitigate Large Language Model Hallucinations

This paper proposes a Copy-Paste generation paradigm that trains LLMs to preferentially copy spans directly from retrieved context rather than paraphrasing them freely. Combined with high-copy-preference DPO training, the approach improves faithfulness on counterfactual RAG benchmarks from 80.2% to 92.8%.
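One way to operationalize "copy preference" for building DPO pairs is a copy-coverage score: the fraction of output tokens that lie inside spans copied verbatim from the retrieved context. The `copy_coverage` helper below is a hypothetical illustration of such a signal, not the paper's exact metric.

```python
def _is_sublist(sub, seq):
    """True if token list `sub` occurs contiguously in token list `seq`."""
    m = len(sub)
    return any(seq[k:k + m] == sub for k in range(len(seq) - m + 1))

def copy_coverage(context_tokens, output_tokens, min_span=3):
    """Fraction of output tokens covered by spans of >= min_span tokens
    that appear verbatim in the retrieved context."""
    covered, i, n = 0, 0, len(output_tokens)
    while i < n:
        j = i
        # greedily extend the longest copied span starting at position i
        while j < n and _is_sublist(output_tokens[i:j + 1], context_tokens):
            j += 1
        if j - i >= min_span:
            covered += j - i
            i = j
        else:
            i += 1
    return covered / max(n, 1)

ctx = "the cat sat on the mat".split()
resp = "the cat sat happily".split()
score = copy_coverage(ctx, resp)   # 3 of 4 tokens inside a copied span
```

Given two candidate responses, the higher-coverage one would be marked as preferred when constructing the high-copy-preference DPO training pairs the summary mentions.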

Counterfactual Explanations on Robust Perceptual Geodesics

This paper proposes PCG (Perceptual Counterfactual Geodesic), a method that generates semantically faithful counterfactual explanations by optimizing geodesics on a robust perceptual manifold. A two-stage optimization ensures that the resulting path is both perceptually natural and reaches the target class. PCG achieves FID=8.3 on AFHQ, substantially outperforming RSGD (FID=12.9).

Direct Doubly Robust Estimation of Conditional Quantile Contrasts

This paper proposes the first direct estimation method for the conditional quantile contrast (CQC) by explicitly parameterizing the CQC and combining it with doubly robust gradient descent. The approach maintains theoretical double robustness while empirically outperforming existing indirect inversion methods across estimation accuracy, interpretability, and computational efficiency.

Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models

In the linear non-Gaussian setting, this work provides the first complete graphical criterion for distributional equivalence among causal graphs with latent variables and cycles, without any structural assumptions. The central technical tool is the newly proposed edge rank constraints, on which the authors build algorithms for enumerating equivalence classes and recovering causal models from data, yielding the first such equivalence characterization and discovery method for parametric causal models.

Efficient Ensemble Conditional Independence Test Framework for Causal Discovery

This paper proposes E-CIT (Ensemble Conditional Independence Test), a framework that partitions data into subsets, performs independent tests on each subset, and aggregates the resulting p-values via a stable distribution-based combination method. E-CIT reduces the computational complexity of any base CIT to linear in sample size, while maintaining or improving test power in challenging settings such as heavy-tailed noise and real-world data.
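The partition-and-combine scheme is easy to sketch. The stable-distribution combination with α = 1 reduces to the Cauchy combination test, used below as a concrete instance; the base test here is an unconditional Pearson independence test (Fisher z-transform), a stand-in assumption for whatever base CIT E-CIT wraps.

```python
import math
import random

def cauchy_combine(pvals):
    """Stable-distribution (alpha = 1, i.e. Cauchy) p-value combination:
    T = mean(tan((1/2 - p) * pi)), combined p = 1/2 - arctan(T) / pi."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvals) / len(pvals)
    return 0.5 - math.atan(t) / math.pi

def pearson_p(x, y):
    """Two-sided p-value for Pearson correlation via Fisher's z-transform."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = max(min(sxy / math.sqrt(sxx * syy), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def e_cit(x, y, base_test, k=4):
    """Split the sample into k blocks, test each block independently,
    and combine the k p-values; cost is linear in the sample size."""
    n = len(x) // k
    pvals = [base_test(x[i * n:(i + 1) * n], y[i * n:(i + 1) * n])
             for i in range(k)]
    return cauchy_combine(pvals)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [xi + random.gauss(0, 1) for xi in x]   # strongly dependent pair
p_dep = e_cit(x, y, pearson_p)
```

Because each block test sees only n/k samples, a base CIT with superlinear cost (e.g. kernel-based tests) becomes linear overall, while the Cauchy combination keeps the aggregated p-value calibrated under dependence between blocks.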

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models

This paper systematically investigates the over-reliance of preference models on five surface-level features (verbosity, structure, jargon, sycophancy, and vagueness). By constructing causal counterfactual pairs, it quantifies how biases originate from distributional imbalances in training data, and proposes a post-training method based on Counterfactual Data Augmentation (CDA) that reduces the average miscalibration rate relative to human judgments from 39.4% to 32.5%.

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

Using off-by-one addition (e.g., 1+1=3, 2+2=5) as a counterfactual task, this paper applies path patching to reveal a function induction mechanism within large language models — an attention head circuit that performs inductive reasoning at the function level, transcending token-level pattern matching — and demonstrates that this mechanism is reused across tasks.
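Path patching itself is simple to state: cache an activation from a run on the counterfactual input and splice it into a run on the clean input; the resulting output shift attributes causal influence to that component. The two-head linear "model" below is a toy assumption purely to show the mechanics, not the paper's circuit.

```python
import numpy as np

def toy_model(x, W_heads, W_out, patch=None):
    """Two 'attention heads' (linear maps) whose outputs are summed and read
    out; `patch = (head_idx, activation)` overrides one head's activation."""
    acts = [W @ x for W in W_heads]
    if patch is not None:
        idx, act = patch
        acts[idx] = act
    return W_out @ sum(acts)

def path_patch(x_clean, x_cf, W_heads, W_out, head):
    """Run the counterfactual input, cache the chosen head's activation,
    patch it into the clean run, and return the output shift."""
    act_cf = W_heads[head] @ x_cf
    patched = toy_model(x_clean, W_heads, W_out, patch=(head, act_cf))
    return patched - toy_model(x_clean, W_heads, W_out)

# Head 0 carries the task-relevant signal; head 1 is inert.
W_heads = [np.eye(2), np.zeros((2, 2))]
W_out = np.array([[1.0, 1.0]])
x_clean = np.array([1.0, 1.0])
x_cf = np.array([3.0, 1.0])          # counterfactual input (e.g. off-by-one task)
effect_0 = path_patch(x_clean, x_cf, W_heads, W_out, head=0)
effect_1 = path_patch(x_clean, x_cf, W_heads, W_out, head=1)
```

A large `effect_0` and a near-zero `effect_1` is exactly the kind of evidence used to isolate the function induction heads: only components whose patched activations move the output are credited with carrying the "+1" function.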

Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference

To address the high-variance problem in GATE estimation for A/B testing under network interference, this paper proposes the Mean-in-Interior (MII) estimator—which averages only over interior nodes within each cluster to substantially reduce variance—and further introduces a counterfactual predictor to correct for covariate shift, yielding the augmented AMII estimator that achieves low bias and low variance simultaneously.
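The MII estimator can be sketched directly from its definition: keep only interior nodes (all neighbours inside the node's own cluster), average outcomes per arm, and difference. The AMII covariate-shift correction is omitted here; graph, arms, and outcomes below are toy values.

```python
def mii_estimate(adj, cluster_of, arm_of, outcome):
    """Mean-in-Interior GATE estimate: average outcomes over interior nodes
    only, then difference treated and control arms."""
    def interior(v):
        return all(cluster_of[u] == cluster_of[v] for u in adj[v])
    sums = {0: [0.0, 0], 1: [0.0, 0]}
    for v in adj:
        if interior(v):
            a = arm_of[cluster_of[v]]      # randomisation at the cluster level
            sums[a][0] += outcome[v]
            sums[a][1] += 1
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]

# Two clusters; A and D sit on the boundary (cross-cluster edge A-D),
# so only B, C, E, F are interior and enter the estimate.
adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"],
       "D": ["A", "E"], "E": ["D", "F"], "F": ["E"]}
cluster_of = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
arm_of = {0: 1, 1: 0}                      # cluster 0 treated, cluster 1 control
outcome = {"A": 5.0, "B": 3.0, "C": 4.0, "D": 9.0, "E": 1.0, "F": 2.0}
est = mii_estimate(adj, cluster_of, arm_of, outcome)
```

Interior nodes have no neighbours in the other arm, so their outcomes are unaffected by cross-cluster interference; restricting the average to them is what removes the boundary-driven variance the summary refers to.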

Learning Robust Intervention Representations with Delta Embeddings

This paper proposes the Causal Delta Embedding (CDE) framework, which represents interventions/actions as vector differences between pre- and post-intervention states in the latent space. Three constraints—independence, sparsity, and invariance—are imposed on the delta vectors to learn robust intervention representations. The framework significantly surpasses baselines on the Causal Triplet benchmark in OOD generalization, and autonomously discovers anti-parallel semantic structures for antonymous actions.
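The core objects are easy to illustrate: a delta embedding is the latent difference between post- and pre-intervention states, and the sparsity/invariance constraints become simple penalties on those deltas. Loss forms and the one-hot "open" shift below are illustrative assumptions.

```python
import numpy as np

def delta_embedding(z_pre, z_post):
    """Represent an intervention as the latent-state difference."""
    return z_post - z_pre

def sparsity_loss(deltas):
    """L1 penalty: an intervention should move few latent dimensions."""
    return float(np.abs(deltas).mean())

def invariance_loss(deltas):
    """Deltas of the same action in different contexts should coincide."""
    return float(np.var(deltas, axis=0).sum())

rng = np.random.default_rng(0)
z_pre = rng.normal(size=(5, 16))           # five contexts, 16-d latent space
shift = np.zeros(16)
shift[3] = 1.0                             # toy "open" action: one latent axis
z_post = z_pre + shift
deltas = delta_embedding(z_pre, z_post)

# Anti-parallel structure for an antonymous action ("close" undoes "open"):
d_open, d_close = deltas[0], -deltas[0]
cos = float(d_open @ d_close
            / (np.linalg.norm(d_open) * np.linalg.norm(d_close)))
```

A cosine of -1 between `d_open` and `d_close` is the kind of anti-parallel semantic structure the paper reports emerging for antonymous actions, here constructed by hand only to show what the learned geometry looks like.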

On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study

This paper proposes a decompositional evaluation framework grounded in Structural Causal Models (SCMs), decomposing LLM counterfactual reasoning into four stages (causal variable identification → causal graph construction → intervention identification → outcome reasoning). It systematically diagnoses capability bottlenecks at each stage across 11 multimodal datasets, and introduces tool-augmented and advanced elicitation strategies to improve performance.

Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement

This paper proposes Knowledgeable-R1, a reinforcement learning-based framework that jointly samples trajectories from parametric knowledge (PK) and contextual knowledge (CK), combined with local/global advantage estimation and adaptive asymmetric advantage transformation, enabling LLMs to resist misleading retrieved contexts in RAG scenarios while preserving the ability to leverage reliable context.

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Perturbations

This paper proposes a formal framework for reasoning faithfulness (stance consistency + causal influence) and the RFEval benchmark (7,186 instances × 7 tasks). By applying output-level counterfactual interventions to evaluate 12 open-source LRMs, it finds that 49.7% of outputs are unfaithful and that accuracy is not a reliable proxy for faithfulness.

Self-Supervised Learning from Structural Invariance

This paper proposes AdaSSL, which introduces latent variables to model conditional uncertainty between positive pairs, derives a variational lower bound on mutual information, and enables SSL to handle complex (multimodal, heteroscedastic) conditional distributions in naturally paired data. AdaSSL outperforms baselines on causal representation learning, fine-grained image understanding, and video world models.

SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?

This paper proposes SelfReflect — an information-theoretic distance metric that measures the discrepancy between an LLM's self-reported uncertainty summary and its true internal answer distribution. The study finds that modern LLMs are broadly incapable of autonomously reflecting their internal uncertainty, but that faithful uncertainty summaries can be generated by sampling multiple outputs and feeding them back into the context.
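A simplified stand-in for the idea (not the paper's metric): build the empirical answer distribution from repeated samples, parse the distribution the uncertainty summary implies, and measure their divergence. KL divergence and the toy summaries below are illustrative assumptions.

```python
import math
from collections import Counter

def empirical_dist(samples):
    """Empirical distribution over answers from repeated sampling."""
    counts = Counter(samples)
    n = len(samples)
    return {a: k / n for a, k in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) with a small floor so unsupported answers stay finite."""
    keys = set(p) | set(q)
    return sum(p[a] * math.log((p[a] + eps) / (q.get(a, eps) + eps))
               for a in keys if p.get(a, 0) > 0)

# Model answers "Paris" 80% of the time and "Lyon" 20% of the time.
samples = ["Paris"] * 8 + ["Lyon"] * 2
p = empirical_dist(samples)

faithful = {"Paris": 0.8, "Lyon": 0.2}     # summary matching the samples
overconfident = {"Paris": 1.0}             # summary hiding the uncertainty
d_good = kl_divergence(p, faithful)
d_bad = kl_divergence(p, overconfident)
```

The finding in the paper maps onto this picture: summaries generated directly tend to look like `overconfident`, while feeding sampled answers back into the context lets the model produce something closer to `faithful`.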

Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders

This paper proposes L-GMVAE (Label-Conditional Gaussian Mixture VAE) and the LAPACE algorithm. By learning multiple Gaussian cluster centers per class in the latent space and performing linear interpolation from the input's latent representation to the target class center, the method generates path-based counterfactual explanations while guaranteeing validity, plausibility, diversity, and perfect robustness to input perturbations.
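The path-construction step is a one-liner once the latent geometry is in place: pick the nearest learned Gaussian centre of the target class and interpolate linearly toward it, decoding each point into a counterfactual. Centres and dimensions below are toy assumptions; the VAE encoder/decoder is elided.

```python
import numpy as np

def lapace_path(z, class_centres, target_class, steps=5):
    """Linear path in latent space from z to the nearest learned
    Gaussian-mixture centre of the target class; each point on the
    path would be decoded into one counterfactual example."""
    centres = class_centres[target_class]               # (K, d) per-class centres
    mu = centres[np.argmin(((centres - z) ** 2).sum(axis=1))]
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - ts) * z + ts * mu

z = np.zeros(2)                                         # encoded input
class_centres = {1: np.array([[2.0, 0.0], [10.0, 10.0]])}
path = lapace_path(z, class_centres, target_class=1)    # heads to [2, 0]
```

Because the endpoint is a fixed cluster centre rather than an adversarially optimized point, small perturbations of the input shift only the path's start, which is the intuition behind the robustness claim; multiple centres per class give the diversity.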

Validating Interpretability in siRNA Efficacy Prediction: A Perturbation-Based, Dataset-Aware Protocol

This paper proposes a standardized perturbation-based saliency faithfulness validation protocol for siRNA efficacy prediction, serving as a "pre-synthesis checkpoint" to assess the reliability of saliency maps. The authors further introduce BioPrior, a biologically informed regularization method to improve saliency faithfulness. Results show that 19/20 fold-dataset instances pass the validation, while cross-dataset transfer reveals two distinct failure modes.
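A generic version of such a perturbation check (a sketch under assumptions, not the paper's protocol): mask the top-k most salient positions, mask k random positions, and require the salient mask to hurt the prediction more. The linear toy model and saliency here are illustrative.

```python
import numpy as np

def perturbation_faithfulness(predict, x, saliency, k=3, trials=20, seed=0):
    """Compare the prediction drop from masking the k most salient positions
    against the average drop from masking k random positions; faithful
    saliency should produce the larger drop."""
    rng = np.random.default_rng(seed)
    base = predict(x)
    top = np.argsort(saliency)[-k:]
    x_top = x.copy()
    x_top[top] = 0.0
    drop_salient = base - predict(x_top)
    drops_random = []
    for _ in range(trials):
        idx = rng.choice(len(x), size=k, replace=False)
        x_r = x.copy()
        x_r[idx] = 0.0
        drops_random.append(base - predict(x_r))
    return drop_salient, float(np.mean(drops_random))

# Toy predictor: three positions dominate; gradient-style saliency |w * x|.
w = np.array([5.0, 4.0, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1])
x = np.ones(8)
def predict(v):
    return float(w @ v)
saliency = np.abs(w * x)
drop_sal, drop_rand = perturbation_faithfulness(predict, x, saliency)
```

A model-dataset instance "passes" when `drop_sal` exceeds the random baseline by a margin; running the same check after cross-dataset transfer is how the two failure modes in the summary would surface.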