Learning Subgroups with Maximum Treatment Effects without Causal Heuristics
Conference: AAAI 2026
arXiv: 2511.20189
Code: https://github.com/ylincen/causal-subgroup
Area: Causal Inference / Subgroup Discovery
Keywords: Treatment Effects, Subgroup Discovery, CART, Partition Model, Causal Inference
TL;DR
Under the SCM framework, the paper proves that the subgroup with the maximum treatment effect must exhibit homogeneous pointwise effects (Theorem 1); under the partition model assumption, it proves that optimal subgroup discovery reduces to standard supervised learning (Theorem 2), achievable with plain CART using the Gini index (or MSE). On 77 ACIC-2016 semi-synthetic datasets, the resulting method attains a mean subgroup treatment effect of 10.54 (vs. 7.84 for the runner-up, CausalTree) and ranks first on 51.9% of datasets.
Background & Motivation
Background: Discovering subgroups with maximum average treatment effect is a central problem in causal inference. Existing methods (CausalTree, QUINT, SIDES, etc.) rely on specialized, hand-crafted "causal heuristics" as their tree-splitting criteria.
Limitations of Prior Work: Causal heuristics are fragile — (1) imbalance between treatment and control groups degrades split quality during tree growth; (2) specialized splitting criteria lack theoretical optimality guarantees; (3) assumptions vary substantially across methods, leading to inconsistent results.
Key Challenge: Does causal subgroup discovery genuinely require specialized "causal" methods, or is standard supervised learning sufficient?
Goal: To prove that, under reasonable assumptions, maximum-effect subgroup discovery is equivalent to a standard classification/regression problem.
Key Insight: Theory-driven — first prove the homogeneity theorem (Theorem 1), then prove the reduction theorem under the partition model (Theorem 2), and finally implement the approach using the simplest possible CART.
Core Idea: The subgroup with maximum treatment effect must be one of the homogeneous partitions → learn partitions via CART → estimate treatment effect for each partition → select the one with the largest effect.
Method
Overall Architecture
(1) Learn data partitions using CART with the Gini index (classification) or MSE (regression); (2) honest inference: learn the tree on a training set and estimate treatment effects for each leaf on a held-out test set; (3) select the leaf with the largest estimated effect as the target subgroup.
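A minimal sketch of this pipeline with scikit-learn, assuming a continuous outcome and a binary, randomized treatment. This is an illustrative reconstruction, not the authors' released code; in particular, fitting the tree on \(Y\) given \(X\) alone, and the difference-in-means leaf estimator, are plausible instantiations rather than the paper's exact recipe.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def find_max_effect_subgroup(X, T, Y, max_leaf_nodes=8, seed=0):
    # (2) Honest inference: tree structure from one half, effect estimates from the other.
    X_tr, X_est, T_tr, T_est, Y_tr, Y_est = train_test_split(
        X, T, Y, test_size=0.5, random_state=seed)
    # (1) Plain supervised CART on the outcome; no causal splitting criterion.
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes,
                                 random_state=seed).fit(X_tr, Y_tr)
    # (3) Difference-in-means effect per leaf on the held-out half
    #     (valid as an effect estimate when treatment is randomized).
    leaves = tree.apply(X_est)
    best_leaf, best_ate = None, -np.inf
    for leaf in np.unique(leaves):
        in_leaf = leaves == leaf
        treated, control = in_leaf & (T_est == 1), in_leaf & (T_est == 0)
        if treated.sum() < 2 or control.sum() < 2:
            continue  # need both arms in the leaf to estimate an effect
        ate = Y_est[treated].mean() - Y_est[control].mean()
        if ate > best_ate:
            best_leaf, best_ate = leaf, ate
    return tree, best_leaf, best_ate

# Demo under a two-cell partition model: effect 2 in {x0 > 0}, effect 0 elsewhere.
rng = np.random.default_rng(0)
n = 4000
X = rng.uniform(-1, 1, (n, 5))
T = rng.integers(0, 2, n)
cell = (X[:, 0] > 0).astype(float)
Y = 2.0 * T * cell + 0.5 * cell + rng.normal(0, 1, n)
_, leaf, ate = find_max_effect_subgroup(X, T, Y)
print(f"leaf {leaf}: estimated effect {ate:.2f}")  # ≈ 2.0
```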
Key Designs
- Theorem 1 (Homogeneity): The maximum-effect subgroup \(Q^*\) must have homogeneous pointwise treatment effects: for every \(Q' \subset Q^*\), if \(A(Q') \leq A(Q^*)\), then \(A(Q^* \setminus Q') \geq A(Q^*)\) (see the averaging decomposition after this list).
- Theorem 2 (Reduction): Under the partition model \(Y = f_Y(T, \sum_i i \cdot \mathbf{1}_{K_i}(X), N_Y)\), the maximum-effect subgroup must correspond to some partition cell \(K_i\) → standard supervised learning suffices to recover the partition.
- Theorem 3 (Extension): The results extend to settings with hidden confounders \(U\).
- Honest Inference: A training/evaluation split; the tree structure is learned on one half of the data, and treatment effects are estimated on the other half, preventing overfitting in the effect estimates.
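The intuition behind Theorem 1 is an averaging argument: the effect of \(Q^*\) is a weighted average of the effects of the two parts of any split of it,

\[
A(Q^*) = w \, A(Q') + (1 - w) \, A(Q^* \setminus Q'), \qquad w = \frac{P(Q')}{P(Q^*)}.
\]

Hence if some \(Q' \subset Q^*\) had \(A(Q') < A(Q^*)\), then \(A(Q^* \setminus Q') > A(Q^*)\), so \(Q^* \setminus Q'\) would be a subgroup with a strictly larger effect, contradicting the maximality of \(Q^*\). (This is only the averaging sketch; the paper's proof is carried out formally in the SCM framework.)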
Loss & Training
CART: Gini index (classification) or MSE (regression), with cost-complexity pruning and cross-validation.
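One way to realize this recipe with scikit-learn, sketched for a continuous outcome (`fit_pruned_cart` and its parameters are illustrative, not from the paper's code): enumerate the cost-complexity pruning path and keep the penalty with the best cross-validated score.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def fit_pruned_cart(X, y, cv=5, seed=0):
    """Fit an MSE-based CART, choosing the cost-complexity penalty by cross-validation."""
    # Enumerate the nested subtrees produced by cost-complexity pruning.
    path = DecisionTreeRegressor(random_state=seed).cost_complexity_pruning_path(X, y)
    alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))  # guard tiny negative alphas
    # Cross-validate each candidate penalty; larger alpha yields a smaller tree.
    scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=seed),
                              X, y, cv=cv).mean() for a in alphas]
    best = alphas[int(np.argmax(scores))]
    return DecisionTreeRegressor(ccp_alpha=best, random_state=seed).fit(X, y)
```

For a classification outcome, `DecisionTreeClassifier` drops in directly; its default splitting criterion is the Gini index.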
Key Experimental Results
Main Results (77 ACIC-2016 Semi-Synthetic Datasets)
| Method | Mean Treatment Effect ↑ | Share of Datasets Ranked First |
|---|---|---|
| Ours (CART) | 10.540 | 51.9% |
| CausalTree | 7.843 | 14.3% |
| CURLS | 7.410 | 18.0% |
| DistillTree | 7.451 | 13.0% |
| InteractionTree | 6.280 | 3.9% |
| QUINT | 5.135 | 0.0% |
| SIDES | 4.622 | 1.3% |
Statistical Significance (Holm-corrected Wilcoxon)
| vs. Method | \(p_{\text{holm}}\) |
|---|---|
| QUINT | 7.75e-14 |
| SIDES | 7.75e-14 |
| InteractionTree | 8.68e-12 |
| CausalTree | 1.21e-04 |
| CURLS | 1.30e-05 |
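A sketch of this testing protocol with SciPy (`compare_to_baselines` is an illustrative helper, and the score arrays are assumed to be loaded elsewhere): each baseline is compared to the proposed method with a paired one-sided Wilcoxon signed-rank test over the 77 per-dataset scores, and the resulting p-values are Holm-adjusted.

```python
import numpy as np
from scipy.stats import wilcoxon

def holm_correct(pvals):
    """Holm step-down: sort ascending, scale the i-th by (m - i), enforce monotonicity."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    adj, running_max = np.empty_like(p), 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (len(p) - rank) * p[idx])
        adj[idx] = min(1.0, running_max)
    return adj

def compare_to_baselines(ours, baselines):
    """ours: per-dataset scores; baselines: {name: per-dataset scores, aligned with ours}."""
    names = list(baselines)
    # One-sided paired test: is `ours` systematically larger than each baseline?
    raw = [wilcoxon(ours, baselines[n], alternative="greater").pvalue for n in names]
    return dict(zip(names, holm_correct(raw)))
```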
Key Findings
- Standard CART ranks first on 51.9% of the 77 datasets, substantially outperforming all specialized causal methods.
- All improvements are statistically significant under Wilcoxon + Holm correction (\(p < 0.001\)).
- Causal-heuristic methods are fragile under treatment/control imbalance; Gini/MSE splitting never compares the two arms during tree growth and is therefore unaffected.
- Results generalize beyond the partition model assumption, remaining effective on semi-synthetic data.
Highlights & Insights
- The theoretical conclusion that "causal heuristics are unnecessary" is striking — under reasonable assumptions, the simplest CART outperforms all specialized causal methods.
- Honest inference (training/evaluation separation) is a critical implementation detail — it prevents the bias introduced by selecting subgroups and estimating effects on the same data.
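A toy simulation (not from the paper) makes this bias concrete: when every leaf has a true effect of zero, selecting the maximum-estimate leaf and reporting its effect from the same sample inflates the estimate (the winner's curse), while reporting from a held-out sample removes the bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_leaf_effect(n_leaves=8, n_per_leaf=50, honest=True):
    # Per-leaf effect estimates are noisy means around a true effect of 0.
    select = rng.normal(0.0, 1.0, (n_leaves, n_per_leaf)).mean(axis=1)
    if honest:
        # Honest: pick the leaf on one sample, report its effect from a fresh sample.
        report = rng.normal(0.0, 1.0, (n_leaves, n_per_leaf)).mean(axis=1)
    else:
        # Naive: select and report on the same sample (winner's curse).
        report = select
    return report[select.argmax()]

naive = np.mean([max_leaf_effect(honest=False) for _ in range(5000)])
honest = np.mean([max_leaf_effect(honest=True) for _ in range(5000)])
print(f"naive: {naive:+.3f}  honest: {honest:+.3f}")  # naive ≈ +0.20, honest ≈ 0
```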
Limitations & Future Work
- The partition model assumption may not fully hold in settings with continuous treatment effects.
- Axis-aligned splits in CART constrain the shape of discoverable subgroups.
- Performance in high-dimensional feature spaces (>50 features) has not been validated.
Related Work & Insights
- vs. CausalTree: CausalTree employs a specialized causal splitting criterion, yet attains a mean effect of only 7.84 vs. 10.54 for the proposed method.
- vs. CURLS: CURLS uses a regularized causal loss, yet still underperforms standard Gini.
Rating
- Novelty: ⭐⭐⭐⭐⭐ The theoretical conclusion that causal heuristics are unnecessary overturns conventional wisdom.
- Experimental Thoroughness: ⭐⭐⭐⭐ 77 semi-synthetic + synthetic datasets, 7 baselines, and statistical significance testing.
- Writing Quality: ⭐⭐⭐⭐⭐ Clear theorem–experiment correspondence and rigorous logic.
- Value: ⭐⭐⭐⭐⭐ Offers fundamental methodological insights for causal inference.