Efficient Ensemble Conditional Independence Test Framework for Causal Discovery¶

Conference: ICLR 2026
arXiv: 2509.21021
Code: None
Area: Causal Inference
Keywords: conditional independence test, causal discovery, ensemble method, stable distribution, p-value combination

TL;DR¶

This paper proposes E-CIT (Ensemble Conditional Independence Test), a framework that partitions data into subsets, performs independent tests on each subset, and aggregates the resulting p-values via a stable distribution-based combination method. E-CIT reduces the computational complexity of any base CIT to linear in sample size, while maintaining or improving test power in challenging settings such as heavy-tailed noise and real-world data.

Background & Motivation¶

Background: Constraint-based causal discovery methods (e.g., the PC algorithm) rely on numerous conditional independence tests (CITs) to determine causal graph structure. KCIT (kernel-based CIT) is among the most popular approaches, but incurs $O(n^3)$ time complexity with respect to sample size.

Limitations of Prior Work:
- The high computational cost of individual CITs is the primary bottleneck in causal discovery, not the number of tests performed.
- Existing acceleration methods (RCIT, FastKCIT) are tailored specifically to KCIT and do not constitute general-purpose frameworks.
- Shah & Peters (2018) proved that no single CIT is uniformly powerful across all conditional dependence structures, which makes a general acceleration framework more valuable than improving any single method.

Key Challenge: Large samples are necessary to ensure test power, yet the high complexity of CITs renders large-sample computation infeasible.

Goal: Design a general, plug-and-play framework applicable to any CIT method to reduce computational overhead while preserving statistical power.

Key Insight: Drawing inspiration from ensemble learning, E-CIT partitions the data into fixed-size subsets, runs the base test independently on each subset, and aggregates the resulting p-values. The key innovation lies in the aggregation step: the closure property of stable distributions is exploited to design a p-value combination method with guaranteed consistency.

Core Idea: Divide-and-conquer + stable distribution p-value aggregation = a linear-complexity acceleration framework for arbitrary CITs.

Method¶

Overall Architecture¶

E-CIT follows a three-step pipeline (Figure 1):
1. Divide: Partition $n$ samples uniformly into $K$ subsets of fixed size $n_k$, where $K = n / n_k$.
2. Test: Apply the base CIT independently to each subset, yielding $K$ p-values $\{p_1, \ldots, p_K\}$.
3. Aggregate: Combine the $K$ p-values into a final p-value using a stable distribution-based method.
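
The Divide and Test steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `subset_pvalues` and `fisher_z_cit` are hypothetical, the toy base test (a Fisher-z partial-correlation test with a single conditioning variable) merely stands in for any CIT, and dropping the leftover samples when $n$ is not a multiple of $n_k$ is an assumption.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def fisher_z_cit(x, y, z):
    """Toy base CIT (stand-in only): Fisher-z test of the partial
    correlation of x and y given one conditioning variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    r = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    r = max(min(r, 0.999999), -0.999999)          # keep atanh finite
    stat = math.sqrt(len(x) - 1 - 3) * math.atanh(r)
    return 2.0 * (1.0 - NormalDist().cdf(abs(stat)))  # two-sided p-value


def subset_pvalues(x, y, z, n_k, base_cit, seed=0):
    """Divide: shuffle indices and cut them into K = n // n_k blocks.
    Test: run base_cit on each block and collect the K p-values.
    Assumption: samples beyond K * n_k are dropped."""
    n = len(x)
    K = n // n_k
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    pvals = []
    for k in range(K):
        block = idx[k * n_k:(k + 1) * n_k]
        pvals.append(base_cit([x[i] for i in block],
                              [y[i] for i in block],
                              [z[i] for i in block]))
    return pvals


# Synthetic data where X and Y are independent given Z.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]
pvals = subset_pvalues(x, y, z, n_k=400, base_cit=fisher_z_cit)
print(len(pvals), pvals)
```

The returned list is what the Aggregate step consumes; any callable with the same `(x, y, z) -> p-value` shape could replace the toy test.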

With $n_k$ fixed, each sub-test costs a constant $O(f(n_k))$, so the total cost of the base CIT becomes $K \times O(f(n_k)) = O(n)$ because $K = n / n_k$ grows linearly in $n$. This linearizes the original $O(f(n))$ cost.
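
As a leading-order illustration (a sketch that ignores constants and lower-order terms, not a measured runtime), take a cubic base test with cost $f(m) = c\,m^3$, as for KCIT. The constant $c$ is a placeholder introduced here:

$$
K \cdot f(n_k) = \frac{n}{n_k} \cdot c\,n_k^3 = c\,n_k^2\,n,
\qquad
\frac{f(n)}{K \cdot f(n_k)} = \frac{c\,n^3}{c\,n_k^2\,n} = \left(\frac{n}{n_k}\right)^2 = K^2 .
$$

For example, $n = 2000$ with $n_k = 400$ gives $K = 5$ and an operation-count ratio of $25$; the ratio grows quadratically in $n$ because the ensemble cost is linear while the original cost is cubic.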

Key Designs¶

Stable Distribution-Based P-value Aggregation (Definition 2):
- Function: Combines the p-values from $K$ sub-tests into a single final p-value with guaranteed statistical properties.
- Mechanism: Exploits the closure property of stable distributions — if $X_j \sim \mathbf{S}(\alpha, \beta, \gamma, \delta)$ are i.i.d., then $\frac{1}{K}\sum X_j \sim \mathbf{S}(\alpha, \beta, K^{1/\alpha - 1}\gamma, \delta)$. The test statistic is defined as: $$T_e = \frac{1}{K} \sum_{k=1}^K F_S^{-1}(p_k)$$ The final p-value is $p_e = F_{S'}(T_e)$, where $S' = \mathbf{S}(\alpha, \beta, K^{1/\alpha-1}\gamma, \delta)$.
- Design Motivation: The parameter $\alpha$ controls tail heaviness; $\alpha = 2$ recovers the Stouffer method (Gaussian), and $\alpha = 1$ corresponds to the Cauchy combination. Tuning $\alpha$ allows adaptation to different CITs and data characteristics.
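
The two special cases can be checked directly from the definition with $\beta = \delta = 0$. The derivation below assumes the common convention in which $\mathbf{S}(2, 0, \gamma, 0) = \mathcal{N}(0, 2\gamma^2)$ and $\mathbf{S}(1, 0, \gamma, 0)$ is the Cauchy distribution with scale $\gamma$; the final expressions do not depend on $\gamma$, and $\Phi$ denotes the standard normal CDF.

$$
\begin{aligned}
\alpha = 2:&\quad F_S^{-1}(p) = \sqrt{2}\,\gamma\,\Phi^{-1}(p), \qquad S' = \mathcal{N}\!\left(0,\ 2\gamma^2 / K\right), \\
&\quad p_e = \Phi\!\left(\frac{\sqrt{K}\,T_e}{\sqrt{2}\,\gamma}\right) = \Phi\!\left(\frac{1}{\sqrt{K}} \sum_{k=1}^K \Phi^{-1}(p_k)\right) \quad \text{(Stouffer)}, \\
\alpha = 1:&\quad F_S^{-1}(p) = \gamma \tan\!\big(\pi (p - \tfrac{1}{2})\big), \qquad S' = \mathbf{S}(1, 0, \gamma, 0), \\
&\quad p_e = \frac{1}{2} + \frac{1}{\pi} \arctan\!\left(\frac{1}{K} \sum_{k=1}^K \tan\!\big(\pi (p_k - \tfrac{1}{2})\big)\right) \quad \text{(Cauchy combination)}.
\end{aligned}
$$

In the Gaussian case the scale shrinks as $K^{-1/2}$, whereas in the Cauchy case $K^{1/\alpha - 1} = 1$, so the average of the transformed p-values has the same distribution as a single one.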
Theoretical Guarantees (Theorems 1 & 2):
- Function: Establishes validity, admissibility, unbiasedness, and consistency of the ensemble test.
- Mechanism:
  - Validity: Under the null hypothesis, $p_e$ is uniformly distributed on $[0,1]$ (for exact p-values).
  - Consistency (Theorem 2): Power approaches 1 as $K \to \infty$, requiring only: ① the expected p-value of sub-tests is $\le \alpha_e$; ② the p-value density on $[0, 1/2]$ is no less than its mirror value; ③ stable distribution parameters satisfy $\alpha \ge 1, \beta = \delta = 0$.
- Design Motivation: The consistency conditions impose no assumptions on the data-generating process, requiring only that sub-tests be reasonably valid. This enables E-CIT to provide consistency guarantees even in complex settings where the base CIT itself lacks them.
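
A small Monte Carlo check can make the validity and consistency statements concrete. The sketch below is illustrative only: it uses the closed-form $\alpha = 1$ and $\alpha = 2$ combinations, draws independent uniform p-values for the null, and uses a hypothetical alternative (each sub-test p-value is the square of a uniform draw) that is chosen here for convenience and is not taken from the paper. All function names are assumptions.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stouffer(pvals):
    """alpha = 2 case: average probit, rescaled by sqrt(K)."""
    k = len(pvals)
    t = sum(_STD_NORMAL.inv_cdf(p) for p in pvals) / k
    return _STD_NORMAL.cdf(t * math.sqrt(k))


def cauchy(pvals):
    """alpha = 1 case: average Cauchy quantile, no rescaling needed."""
    t = sum(math.tan(math.pi * (p - 0.5)) for p in pvals) / len(pvals)
    return 0.5 + math.atan(t) / math.pi


def rejection_rate(combine, draw_p, K, trials, level=0.05, seed=0):
    """Fraction of simulated ensembles whose combined p-value is <= level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Clip away exact 0 or 1 so the quantile functions stay finite.
        pvals = [min(max(draw_p(rng), 1e-12), 1 - 1e-12) for _ in range(K)]
        hits += combine(pvals) <= level
    return hits / trials


def null_p(rng):
    return rng.random()          # exact p-values under the null


def alt_p(rng):
    return rng.random() ** 2     # hypothetical alternative, skewed toward 0


size_cauchy = rejection_rate(cauchy, null_p, K=5, trials=4000)
size_stouffer = rejection_rate(stouffer, null_p, K=5, trials=4000)
power_by_K = {K: rejection_rate(stouffer, alt_p, K=K, trials=2000)
              for K in (1, 5, 50)}
print(size_cauchy, size_stouffer, power_by_K)
```

Under the null both rejection rates should sit near the nominal 0.05 level, and under this particular alternative the rejection rate should rise toward 1 as $K$ grows.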
Flexibility via the $\alpha$ Parameter:
- Function: Controls the degree of flexibility in p-value aggregation.
- Mechanism: By the Neyman–Pearson lemma, the optimal combination statistic is a monotone transformation of $-\sum \log f_1(p_k)$, where $f_1$ is the density of the sub-test p-values under the alternative. Since different CITs yield different alternative p-value distributions under different dependence structures, $\alpha$ enables adaptive tuning.
- Design Motivation: Classical methods such as Fisher's and Stouffer's each apply one fixed transformation to the p-values (Stouffer's coincides with $\alpha = 2$) and therefore lack flexibility; E-CIT provides a parsimonious one-dimensional control via $\alpha$.
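
To illustrate why the choice of $\alpha$ matters, the sketch below compares the two closed-form endpoints, $\alpha = 1$ (Cauchy) and $\alpha = 2$ (Stouffer), on two hand-made sets of sub-test p-values. The p-value sets and function names are invented for illustration and do not come from the paper; intermediate values such as $\alpha = 1.75$ have no closed form and would need a numerical stable-distribution routine, which is not shown.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def combine_alpha2(pvals):
    """Stouffer-style combination (alpha = 2, beta = delta = 0)."""
    k = len(pvals)
    t = sum(_STD_NORMAL.inv_cdf(p) for p in pvals) / k
    return _STD_NORMAL.cdf(t * math.sqrt(k))


def combine_alpha1(pvals):
    """Cauchy combination (alpha = 1, beta = delta = 0)."""
    t = sum(math.tan(math.pi * (p - 0.5)) for p in pvals) / len(pvals)
    return 0.5 + math.atan(t) / math.pi


# Sparse signal: one sub-test is extremely significant, the rest are neutral.
sparse = [1e-6] + [0.5] * 9
# Dense signal: every sub-test is mildly significant.
dense = [0.2] * 10

results = {
    "sparse": (combine_alpha1(sparse), combine_alpha2(sparse)),
    "dense": (combine_alpha1(dense), combine_alpha2(dense)),
}
for name, (p_cauchy, p_stouffer) in results.items():
    print(f"{name}: alpha=1 -> {p_cauchy:.4g}, alpha=2 -> {p_stouffer:.4g}")
```

In this toy setting the heavy-tailed $\alpha = 1$ combination rejects at the 0.05 level only for the sparse set, while the Gaussian $\alpha = 2$ combination rejects only for the dense set, which is the kind of trade-off a tunable $\alpha$ is meant to navigate.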

Loss & Training¶

E-CIT is an unsupervised method requiring no training.
Practical recommendations: $n_k = 400$ (empirically determined), $\alpha \in \{1.75, 2.0\}$, $\beta = \delta = 0$, $\gamma = 1$.

Key Experimental Results¶

Main Results¶

Data generation follows a post-nonlinear model, with $Z$ drawn from normal or Laplace distributions, and noise from Student-t, Cauchy, or Laplace distributions.

Computational Efficiency (Figure 2, KCIT acceleration):

| Method | Time Complexity | Runtime at n=2000 | Type I Error | Power |
|---|---|---|---|---|
| KCIT (original) | $O(n^3)$ | ~100s | ~0.05 | baseline |
| RCIT | $O(n)$ | ~0.1s | ~0.05 | slightly below KCIT |
| FastKCIT | $O(n \log n)$ | ~1s | ~0.05 | close to KCIT |
| E-KCIT | $O(n)$ | ~0.1s | ~0.05 | matches or exceeds KCIT (better under heavy tails) |

Cross-Method Generality (Table 2, n=1200, Normal Z, t-noise df=2):

| Method | Orig. Power | Ensemble Power (α=1.75) |
|---|---|---|
| RCIT | 0.548 | 0.623 |
| LPCIT | 0.422 | 0.447 |
| CMIknn | 0.982 | 0.988 |
| FisherZ | 0.510 | 0.561 |
| CCIT | 0.904 (Type I = 0.454) | 0.816 (Type I = 0.286, reduced) |

Ablation Study¶

Real Data: Flow-Cytometry (Table 3):

| Method | Orig. F1 | Ensemble F1 |
|---|---|---|
| KCIT | 0.624 | 0.695 |
| RCIT | 0.665 | 0.687 |
| LPCIT | 0.691 | 0.741 |
| CMIknn | 0.779 | 0.756 |
| FisherZ | 0.737 | 0.767 |

Key Findings¶

Significant speedup: E-KCIT reduces KCIT's $O(n^3)$ complexity to $O(n)$, achieving runtime comparable to RCIT.
Power gains, not losses: Under heavy-tailed noise (Student-t df=2, Cauchy), E-KCIT surpasses both KCIT and RCIT in power — subset-level estimation is more stable.
Generality: Effective across 6 distinct CIT methods (KCIT, RCIT, LPCIT, CMIknn, CCIT, FisherZ).
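Real-data advantage: On the Flow-Cytometry dataset (Table 3), E-CIT improves the F1-score for four of the five base methods, by roughly 2 to 7 percentage points (KCIT gains the most, from 0.624 to 0.695); CMIknn is the exception, dropping from 0.779 to 0.756.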
Unexpected finding for CCIT: E-CIT substantially reduces CCIT's inflated Type I error (from 0.45+ to 0.28–0.34), at the cost of a modest reduction in power, yielding better-calibrated tests.
Causal discovery application (Figure 3): On nonlinear additive noise causal graphs, E-KCIT outperforms both KCIT and RCIT in F1 and SHD.

Highlights & Insights¶

A general framework, not a specific method: E-CIT functions as an accelerator rather than a new CIT — it can be plugged into any existing method.
Extremely mild theoretical consistency conditions: No assumptions are placed on the data or model; only reasonable validity of sub-tests is required.
Elegant application of stable distributions: The closure property of stable distributions enables exact p-value aggregation, with $\alpha$ providing flexible control.
Practical insight: In complex settings (heavy tails, real data), ensemble aggregation can improve power — small-sample estimates are more stable, and aggregation compensates for individual weaknesses.

Limitations & Future Work¶

The theoretical analysis assumes sub-test p-values are i.i.d. — correlation may arise in practice (e.g., time-series data or distributional shift).
Optimal selection of $\alpha$ is context-dependent; only empirical recommendations ($\{1.75, 2.0\}$) are currently provided.
Subset size $n_k$ must be sufficiently large for valid sub-tests — the curse of dimensionality may persist for very high-dimensional conditioning sets $Z$.
Methods already exhibiting strong performance (e.g., CMIknn) benefit less, suggesting the framework is most effective for accelerating moderately powerful tests.
Future directions include handling correlated p-values, optimizing $\alpha$ for specific CITs, and developing adaptive subset size selection.

RCIT (Strobl et al., 2019): Accelerates KCIT via random Fourier features — applicable only to KCIT, whereas E-CIT is a general framework.
FastKCIT (Schacht & Huang, 2025): Partitions data using GMMs — conceptually similar but designed exclusively for KCIT.
Cauchy combination method (Liu & Xie, 2020): Combines p-values via the Cauchy distribution for whole-genome sequencing tests — E-CIT generalizes this to arbitrary stable distributions in the CIT context.
Insight: Divide-and-conquer with aggregation is a general paradigm for large-scale statistical testing, with potential applicability to other computationally demanding testing problems.

Rating¶

Novelty: ⭐⭐⭐⭐ The closure property of stable distributions is creatively applied to p-value aggregation for CIT, yielding a clear and practical framework; however, the divide-and-aggregate paradigm itself is not entirely novel.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Synthetic data + real data + causal discovery application; 6 CIT methods × multiple noise distributions × multiple sample sizes; comprehensive ablation.
Writing Quality: ⭐⭐⭐⭐ Well-structured with good integration of theory and experiments; Figure 1 provides an intuitive overview; proofs are deferred to the appendix, keeping the main text accessible.
Value: ⭐⭐⭐⭐ Highly practical — directly integrable into existing causal discovery pipelines; meaningful for large-scale causal discovery applications.
