
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models

Conference: ICLR 2026 (Oral)
arXiv: 2603.04780
Code: MarkDana/Equiv-LiNG-Latent / Online Demo
Area: Causal Inference / Causal Discovery
Keywords: causal discovery, latent variables, cyclic causal models, distributional equivalence, edge rank constraints, non-Gaussian models

TL;DR

This work provides the first complete graphical criterion for distributional equivalence among linear non-Gaussian causal graphs with latent variables and cycles, without any structural assumptions. The central technical tool is the newly proposed edge rank constraints, on which the authors build algorithms for enumerating equivalence classes and recovering causal models from data. The authors present this as the first equivalence characterization and discovery method for parametric causal models that requires no structural assumptions.

Background & Motivation

Background: Causal discovery aims to infer causal relationships from observational data and is a foundational task in causal reasoning. In real-world settings, data frequently involve unobserved latent variables and causal cycles — such as feedback loops in gene regulatory networks or mutual causation in economic systems — rendering causal discovery particularly challenging.

Limitations of Prior Work:
1. Most methods require latent variables to follow specific indicator patterns (e.g., each latent variable affects at least two observed variables and is not shared).
2. Some methods restrict how latent variables interact with other variables (e.g., no direct latent-to-latent causal relationships).
3. Nearly all methods prohibit cycles in causal graphs, restricting attention to DAG structures.
4. These structural assumptions frequently fail in practice, severely limiting the applicability of existing approaches.

Key Challenge: The central obstacle to developing general, assumption-free methods is the absence of an equivalence characterization — without knowing what is identifiable (i.e., which graphs produce the same observational distribution), it is generally impossible to design identification procedures. Without an equivalence theory, the upper bound on achievable precision in causal discovery remains unknown.

Goal: To establish a complete equivalence theory under linear non-Gaussian models (LiNG-Latent). The core innovation is the introduction of edge rank constraints as a new tool — filling the gap left by independence constraints, which are incomplete in the latent-variable setting — thereby enabling a complete graphical criterion.

Method

Overall Architecture

The proposed framework advances across three levels, forming a complete chain from theory to application:

Input: Observational data (assumed to be generated by a linear non-Gaussian model), potentially involving latent variables and causal cycles.
Level 1 — Equivalence Characterization: Establish a graphical criterion to determine whether two causal graphs are distributionally equivalent.
Level 2 — Equivalence Class Enumeration: Given a graph, identify all distributionally equivalent graphs.
Level 3 — Learning from Data: Recover the causal model from observational data up to the equivalence class.

The model is formalized as a linear structural equation:

\[X = BX + \Lambda L + E\]

where \(X\) is the vector of observed variables, \(B\) is the matrix of causal coefficients among observed variables (cycles permitted), \(L\) denotes latent variables, \(\Lambda\) is the effect matrix from latent to observed variables, and \(E\) is non-Gaussian independent noise.
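Solving the structural equation for \(X\) gives the reduced form \(X = (I - B)^{-1}(\Lambda L + E) = A S\), where \(S\) stacks \(L\) and \(E\) and \(A = (I - B)^{-1}[\Lambda \;\; I]\) is the mixing matrix. This step assumes that \(I - B\) is invertible and that the entries of \(L\) are mutually independent exogenous sources; both are assumptions of this sketch rather than statements taken from the paper. The following minimal Python example, with purely illustrative coefficients and a Laplace source distribution chosen for convenience, simulates such a model with a 2-cycle and one latent variable and checks that the samples satisfy the structural equation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) coefficients: 3 observed variables, 1 latent.
# B[i, j] is the effect of X_j on X_i; X_0 and X_1 form a 2-cycle.
B = np.array([
    [0.0, 0.5, 0.0],
    [0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
# Lam[i, k] is the effect of latent L_k on X_i; L_0 affects X_1 and X_2.
Lam = np.array([[0.0], [0.8], [1.2]])

n_obs, n_lat = Lam.shape
I = np.eye(n_obs)

# Reduced form: X = (I - B)^{-1} (Lam L + E) = A S, with S = (L, E) stacked.
A = np.linalg.solve(I - B, np.hstack([Lam, I]))  # shape (n_obs, n_lat + n_obs)

# Non-Gaussian independent sources (Laplace is only an example choice).
n_samples = 1000
S = rng.laplace(size=(n_lat + n_obs, n_samples))
X = A @ S

# Check that the samples satisfy X = B X + Lam L + E.
L, E = S[:n_lat], S[n_lat:]
residual = X - (B @ X + Lam @ L + E)
print(np.abs(residual).max())  # should be at floating-point noise level
```

Only `X` would be available to a discovery method; `A`, `L`, and `E` are kept here solely to verify the algebra.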

Key Design 1: Edge Rank Constraints

Edge rank constraints are the central technical contribution. In latent-variable causal models, the mixing matrix encodes information about the causal structure. Classical methods rely on independence constraints (roughly, the absence of a connecting causal path implies statistical independence), but these are incomplete in the presence of latent variables: two variables may be statistically dependent through a latent variable even when neither directly causes the other.

Edge rank constraints extract information about the existence of edges and the latent variable structure by analyzing the rank of submatrices of the mixing matrix:

If all causal paths from variable set \(S\) to variable \(X_i\) can be "explained" by \(r\) latent variables, then the rank of the corresponding submatrix of the mixing matrix is at most \(r\).
Such rank constraints are more fine-grained than independence constraints — independence constraints are a special case (rank zero = independence).
Edge rank constraints provide additional discriminative power in settings where independence constraints are incomplete.
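To make the idea of submatrix ranks concrete, the sketch below computes numerical ranks of a few submatrices of a small mixing matrix. The matrix, the helper `submatrix_rank`, and the chosen row and column sets are hypothetical illustrations; the paper's formal definition of an edge rank constraint is not reproduced in this summary, so this shows only the generic linear-algebra operation involved.

```python
import numpy as np

# Hypothetical example: one latent L_0 drives X_0, X_1, X_2; no observed edges.
# Columns of A are ordered (L_0, E_0, E_1, E_2); rows are (X_0, X_1, X_2).
A = np.array([
    [0.7, 1.0, 0.0, 0.0],
    [1.1, 0.0, 1.0, 0.0],
    [0.9, 0.0, 0.0, 1.0],
])

def submatrix_rank(A, rows, cols, tol=1e-10):
    """Numerical rank of A restricted to the given rows and columns."""
    return int(np.linalg.matrix_rank(A[np.ix_(rows, cols)], tol=tol))

r_zero = submatrix_rank(A, [0], [2])           # X_0 vs. E_1: a single zero entry
r_latent = submatrix_rank(A, [0, 1], [0, 3])   # X_0, X_1 vs. L_0, E_2
r_full = submatrix_rank(A, [0, 1], [1, 2])     # X_0, X_1 vs. E_0, E_1
print(r_zero, r_latent, r_full)  # prints: 0 1 2
```

The rank-0 case corresponds to a vanishing entry of the mixing matrix, while the rank-1 case reflects that the selected sources reach \(X_0\) and \(X_1\) only through the single latent \(L_0\).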

Key Design 2: Complete Graphical Criterion for Distributional Equivalence

Two causal graphs with arbitrary latent variable structures and cycles are distributionally equivalent (i.e., generate the same set of observational distributions) if and only if they impose the same set of edge rank constraints.

The completeness of this criterion rests on:
- Necessity: Distinct sets of edge rank constraints imply distinct distribution sets (demonstrated via counterexample construction).
- Sufficiency: Identical sets of edge rank constraints imply that parameters can be constructed such that the two graphs generate identical distributions.

Non-Gaussianity plays a critical role here: compared to Gaussian models, non-Gaussian noise provides additional identifiability information, yielding finer equivalence classes (i.e., more causal graphs are distinguishable).
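The role of non-Gaussianity can be illustrated with a classical argument from the ICA literature (it is not specific to this paper): with unit-variance Gaussian sources, the distribution of \(X = AS\) depends on \(A\) only through \(AA^\top\), so \(A\) and \(AQ\) are indistinguishable for any orthogonal \(Q\). The sketch below, using a hypothetical mixing matrix, checks this numerically and compares the zero patterns of the two matrices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3x3 mixing matrix and a random orthogonal matrix Q.
A = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 1.0, 0.0],
    [0.3, 0.5, 1.0],
])
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

A_rot = A @ Q

# With unit-variance Gaussian sources, X = A S ~ N(0, A A^T). Both mixing
# matrices imply the same covariance, hence the same Gaussian distribution.
same_cov = np.allclose(A @ A.T, A_rot @ A_rot.T)

# Compare which entries are (numerically) zero in the two matrices; a generic
# rotation is expected to destroy the zero pattern of A.
same_support = np.array_equal(np.abs(A) > 1e-12, np.abs(A_rot) > 1e-12)
print(same_cov, same_support)
```

With non-Gaussian sources this rotational freedom disappears, which is the standard reason non-Gaussian models admit finer equivalence classes.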

Key Design 3: Equivalence Class Enumeration and Learning Algorithm

Equivalence Class Enumeration: Given a causal graph \(\mathcal{G}\), all equivalent graphs are systematically enumerated by:
- Adding, removing, or reversing edges among observed variables while preserving the set of edge rank constraints.
- Modifying the number and connectivity pattern of latent variables.
- Verifying that the modified graph imposes the same set of edge rank constraints.

Learning from Data:
1. Estimate the mixing matrix from data using principles from independent component analysis (ICA).
2. Determine edge rank constraints via matrix rank tests.
3. Search for causal graphs satisfying all constraints and output the equivalence class.
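Step 2 has to cope with estimation error: a submatrix that has low rank in the true mixing matrix will typically appear full rank once noise is added. A minimal sketch of one common workaround, thresholding singular values, is shown below. The helper `estimated_rank`, the noise level, and the threshold value are assumptions made for illustration; the summary does not describe the statistical rank test actually used by the authors.

```python
import numpy as np

def estimated_rank(M_hat, threshold):
    """Count singular values of M_hat that exceed the threshold."""
    singular_values = np.linalg.svd(M_hat, compute_uv=False)
    return int(np.sum(singular_values > threshold))

rng = np.random.default_rng(2)

# Hypothetical rank-1 submatrix of a true mixing matrix ...
M_true = np.outer([0.7, 1.1, 0.9], [1.0, 0.5])
# ... perturbed by small errors, standing in for an ICA-based estimate.
M_hat = M_true + 1e-3 * rng.normal(size=M_true.shape)

exact = int(np.linalg.matrix_rank(M_hat))  # default tolerance ignores only round-off
thresholded = estimated_rank(M_hat, threshold=0.05)
print(exact, thresholded)
```

In practice the threshold would need to be calibrated to the sample size and the accuracy of the mixing-matrix estimate rather than fixed by hand.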

Key Experimental Results

Main Results: Validation of the Equivalence Criterion

Experimental Setting	Evaluation Target	Core Result
Synthetic data (acyclic, no latent variables)	Reduces to LiNGAM setting	Fully consistent with known results, confirming correctness
Synthetic data (cyclic, no latent variables)	Cyclic causal models	Graphical criterion correctly identifies all equivalent/non-equivalent graph pairs
Synthetic data (acyclic, with latent variables)	Latent variable DAGs	Edge rank constraints yield finer equivalence classes than independence constraints
Synthetic data (cyclic + latent variables)	Most general setting	Criterion is complete — graphs within the same equivalence class are indeed indistinguishable
Online interactive demo equiv.cc	User-defined verification	Users can manually specify graphs; the system instantly displays the equivalence class

Ablation Study: Contribution of Constraint Types

Constraint Combination	Equivalence Class Granularity	Identifiability Strength
Independence constraints only	Coarse (non-equivalent graphs incorrectly merged)	Weak
Independence + edge rank constraints	Finest (exact equivalence classes)	Strongest
Gaussian model + all constraints	Intermediate	Moderate (non-Gaussianity provides additional information)
Non-Gaussian + rank constraints only (no independence)	Near finest	Strong (rank constraints subsume independence constraints)

Key Findings

Edge rank constraints are indispensable: Relying solely on independence constraints causes multiple non-equivalent graphs to be incorrectly grouped into the same equivalence class; adding edge rank constraints tightens equivalence classes precisely.
Non-Gaussianity provides substantial identifiability gains: Compared to Gaussian models, the non-Gaussian setting yields smaller equivalence classes (more graphs are distinguishable), consistent with the classical LiNGAM findings.
Cycles do not fundamentally preclude causal discovery: Cyclic causal relationships enlarge equivalence classes but do not render the problem unsolvable — meaningful causal inference remains feasible in the linear non-Gaussian setting.
Equivalence class size grows with graph complexity: The problem nevertheless remains tractable for moderately sized graphs (\(\sim\)10 variables).

Highlights & Insights

Significant theoretical breakthrough: The first complete equivalence characterization in any parametric setting without structural assumptions — a landmark contribution to the causal discovery literature.
ICLR 2026 Oral: Acceptance as an oral presentation reflects strong reviewer recognition of the theoretical contribution.
Edge rank constraints have independent value: Beyond this paper, they have broad applicability to latent-variable causal discovery problems more generally.
Precise problem framing: The epistemological insight that "equivalence characterization is a prerequisite for general methods" provides important guidance for the broader causal discovery community.
Strong practicality: Open-source code and an interactive online demo at equiv.cc are provided, lowering the barrier to adoption.

Limitations & Future Work

Restricted to linear model assumptions; nonlinear causal relationships (additive noise models, post-nonlinear models, etc.) are not addressed.
The non-Gaussianity assumption may not hold in certain domains (e.g., approximately Gaussian financial data).
The algorithm's computational complexity grows exponentially with the number of variables; scalability to large-scale problems (>20 variables) remains to be verified.
Equivalence class enumeration faces combinatorial explosion for very large equivalence classes.
Systematic evaluation on large-scale real data is lacking (experiments are primarily based on synthetic data and small-scale validation).
Future extensions to more general settings — mixed non-Gaussian/Gaussian, partially nonlinear models, etc. — remain open.

Rating

Novelty: ⭐⭐⭐⭐⭐
Experimental Thoroughness: ⭐⭐⭐
Writing Quality: ⭐⭐⭐⭐
Value: ⭐⭐⭐⭐⭐