UnCLe: Towards Scalable Dynamic Causal Discovery in Non-Linear Temporal Systems

Conference: NeurIPS 2025 · arXiv: 2511.03168 · Code: GitHub · Area: Causal Discovery / Time Series Analysis · Keywords: dynamic causal discovery, time series, Granger causality, temporal perturbation, nonlinear systems

TL;DR

This paper proposes UnCLe, a scalable dynamic causal discovery method based on TCN autoencoder disentanglement and autoregressive dependency matrices. It infers time-varying causal relationships by measuring per-timestep prediction error increments following temporal perturbation, achieving state-of-the-art performance on both static and dynamic causal discovery benchmarks.

Background & Motivation

Discovering causal relationships from observational time series is a fundamental problem in understanding complex systems. Real-world systems typically exhibit dynamic causality—causal relationships that evolve over time:

Predator–prey relationships shift with seasons

Gene regulatory networks change across developmental stages

Human biomechanics exhibit different inter-joint causal relationships across motion phases (takeoff → flight → landing)

Nevertheless, mainstream temporal causal discovery methods almost exclusively infer static causal graphs (time-averaged dependencies), disregarding the evolving nature of causal relationships.

Specific limitations of existing methods:

- NeuralGC / GVAR / TCDF: component-wise design with \(O(N^2)\) parameters, not scalable to large systems
- GVAR: capable of generating dynamic causal graphs but with limited accuracy (e.g., it cannot capture the direction reversals in TVSEM)
- JRNGC / CUTS+: address scalability via parameter sharing but do not support dynamic causal discovery
- No existing method has been rigorously evaluated on dynamic causal datasets

Method

Overall Architecture

UnCLe operates in two stages:

  1. Training stage: Learns a semantically disentangled representation of the time series along with autoregressive inter-variable dependency relationships
  2. Post-hoc analysis stage: Infers dynamic causal graphs via temporal perturbation, or infers static causal graphs via dependency matrix aggregation

Key Designs

  1. Uncoupler-Recoupler Network (Semantic Disentanglement Autoencoder)

Core Idea: Disentangles complex multivariate time series into multi-channel semantic representations, enabling inter-variable dependencies to be modeled linearly in the latent space.

- **Uncoupler**: A parameter-shared TCN encoder mapping each univariate time series \(x_i \in \mathbb{R}^T\) to a \(C\)-channel latent representation \(z_i \in \mathbb{R}^{T \times C}\)
- **Recoupler**: A parameter-shared TCN decoder reconstructing the original series \(\tilde{x}_i\) from the latent representations
- **Parameter sharing** is critical: all \(N\) variables share a single set of TCN parameters, substantially improving learning efficiency and generalization (ablations show significant performance degradation without it)
- Causal convolutions in the TCN ensure no information leaks from the future to the past
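To make the causality constraint and parameter sharing concrete, here is a minimal NumPy sketch of a parameter-shared causal convolution. This is a single-layer stand-in for the multi-layer TCN; all shapes and kernel sizes are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution: the output at time t depends only on x[:t+1]."""
    K = len(w)
    x_pad = np.concatenate([np.zeros(K - 1), x])  # left-pad so no future leakage
    return np.array([x_pad[t:t + K] @ w[::-1] for t in range(len(x))])

rng = np.random.default_rng(0)
N, T, C = 4, 50, 3
X = rng.standard_normal((N, T))        # N univariate series of length T
kernels = rng.standard_normal((C, 5))  # ONE shared kernel per latent channel

# Parameter sharing: every variable is encoded by the same kernels,
# so the parameter count is independent of N.
Z = np.stack([[causal_conv1d(X[i], kernels[c]) for c in range(C)]
              for i in range(N)])      # (N, C, T) latent representation
```

Because the kernels are shared across all \(N\) variables, the encoder's size does not grow with the system dimension, which is what allows scaling to 100+ variables.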

Theoretical Motivation: An appropriate coordinate transformation (learned by the Uncoupler) can approximate complex nonlinear dynamics as a linear system—the central insight of Koopman operator theory.

  2. Auto-regressive Dependency Matrices

Core Idea: In the disentangled latent space, simple linear matrices model inter-variable dependencies.

- Maintains \(C\) dependency matrices \(\Psi = \{\Psi^1, \ldots, \Psi^C\}\), where each \(\Psi^c \in \mathbb{R}^{N \times N}\)
- Autoregressive prediction: \(\hat{z}^c_{:,t+1} = \sigma(\Psi^c \cdot z^c_{:,t})\)
- The predicted latent representation is fed into the Recoupler to generate predictions in the original space
- \(L_1\) regularization encourages sparsity in the dependency matrices, suppressing spurious causal connections
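The prediction rule \(\hat{z}^c_{:,t+1} = \sigma(\Psi^c \cdot z^c_{:,t})\) can be sketched in a few lines of NumPy. The choice \(\sigma = \tanh\) and all shapes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
C, N, T = 3, 5, 100
Z = rng.standard_normal((C, N, T))           # disentangled latents from the Uncoupler
Psi = 0.1 * rng.standard_normal((C, N, N))   # one N x N dependency matrix per channel

# z_hat[c, :, t+1] = tanh(Psi[c] @ z[c, :, t]), vectorized over all t
Z_hat = np.tanh(np.einsum('cij,cjt->cit', Psi, Z[:, :, :-1]))  # predicts Z[:, :, 1:]

pred_mse = np.mean((Z_hat - Z[:, :, 1:]) ** 2)  # feeds the L_Pred loss term
l1_penalty = np.abs(Psi).sum()                  # feeds the L_L1 sparsity term
```

Entry \(\Psi^c_{i,j}\) directly encodes how variable \(j\) influences variable \(i\) in channel \(c\), which is why the matrices can later be aggregated into a static causal graph.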

  3. Dynamic Granger Causal Inference via Temporal Perturbation

Core Idea: If \(x_j\) is a true cause of \(x_i\), randomly permuting the temporal structure of \(x_j\) will substantially increase the model's prediction error for \(x_i\).

Procedure:

- Apply a random temporal permutation to variable \(j\) (preserving the marginal distribution while destroying temporal information)
- Compute the per-timestep prediction error increment before and after perturbation: \(\Delta\varepsilon^{\setminus j}_{i,t} = \max(0,\ \varepsilon^{\setminus j}_{i,t} - \varepsilon_{i,t})\)
- The error increment serves as the dynamic causal strength of \(x_j \to x_i\) at time \(t\)
- Repeating this for all variable pairs and timesteps yields the time-resolved causal graph \(\hat{G}^{\text{Pert}}\)

Why temporal permutation over other perturbations?

- Zero-value masking distorts the data distribution, destabilizing the model
- Noise injection cannot fully eliminate the original signal
- Temporal permutation both eliminates predictive temporal information and perfectly preserves the marginal distribution of each variable
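The perturbation procedure can be sketched end-to-end. Here a toy linear predictor stands in for the trained UnCLe model, and the planted \(x_0 \to x_1\) link is hypothetical; only the permute-and-compare logic mirrors the method:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 200
X = rng.standard_normal((N, T))
X[1, 1:] += 0.9 * X[0, :-1]          # planted ground truth: x0 -> x1 at lag 1

def predict(X):
    """Stand-in for the trained model's next-step predictor (hypothetical):
    a fixed linear rule that matches the planted dependency."""
    W = np.array([[0.0, 0.0, 0.0],
                  [0.9, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
    return W @ X[:, :-1]              # predicts X[:, 1:]

eps = (predict(X) - X[:, 1:]) ** 2    # baseline per-timestep squared error

# Perturb variable j: a random temporal permutation keeps the marginal
# distribution intact while destroying all temporal structure.
j = 0
X_pert = X.copy()
X_pert[j] = X[j, rng.permutation(T)]
eps_pert = (predict(X_pert) - X[:, 1:]) ** 2

# Per-timestep error increment = dynamic causal strength of x_j -> x_i at t
delta = np.maximum(0.0, eps_pert - eps)
```

`delta[1]` picks up mass because \(x_0\) genuinely drives \(x_1\), while `delta[2]` stays at zero; looping `j` over all variables fills in the time-resolved graph \(\hat{G}^{\text{Pert}}\).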

  4. Dependency Aggregation for Static Causal Graphs

    • Aggregates dependency matrices \(\Psi\) along the channel dimension via \(L_2\)-norm: \(A^{\text{Agg}}_{l,k} = \sqrt{\frac{1}{C} \sum_c (\Psi^c_{l,k})^2}\)
    • Efficiently obtains a static global causal graph without post-hoc perturbation analysis
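The aggregation formula is a one-liner in NumPy; the sizes below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
C, N = 4, 6                            # assumed number of channels and variables
Psi = rng.standard_normal((C, N, N))   # learned dependency matrices

# Channel-wise L2 aggregation: A[l, k] = sqrt((1/C) * sum_c Psi[c, l, k]^2)
A_agg = np.sqrt(np.mean(Psi ** 2, axis=0))
```

Since this uses only the already-learned matrices, the static graph comes essentially for free, with no extra forward passes.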

Loss & Training

  • Two-stage training:
    • Pre-training: optimize only the reconstruction loss \(L_{\text{Recon}}\) (MSE), training the Uncoupler + Recoupler
    • Full training: jointly optimize \(L_{\text{Total}} = L_{\text{Recon}} + \alpha \cdot L_{\text{Pred}} + L_{L_1}\)
  • \(L_{\text{Recon}}\): reconstruction MSE loss
  • \(L_{\text{Pred}}\): prediction MSE loss (autoregressive next-step prediction)
  • \(L_{L_1}\): \(L_1\) regularization on the dependency matrices to promote sparsity
  • Dropout (0.2): additional regularization for TCN training
  • \(\alpha\): a hyperparameter weighting the prediction task
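The full-training objective can be sketched as a single function. The weight `lam` on the \(L_1\) term is an assumed hyperparameter (the summary only specifies the three-term structure and \(\alpha\)):

```python
import numpy as np

def total_loss(x, x_recon, z_next, z_pred, Psi, alpha=1.0, lam=1e-3):
    """L_Total = L_Recon + alpha * L_Pred + L_L1 (sketch; shapes and the
    lam weight are assumptions for illustration)."""
    l_recon = np.mean((x - x_recon) ** 2)            # reconstruction MSE
    l_pred = np.mean((z_next - z_pred) ** 2)         # next-step prediction MSE in latent space
    l_l1 = lam * sum(np.abs(P).sum() for P in Psi)   # sparsity on dependency matrices
    return l_recon + alpha * l_pred + l_l1
```

During pre-training only `l_recon` would be optimized; full training then switches to the combined objective.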

Key Experimental Results

Main Results

Static Causal Discovery (Synthetic Dataset Lorenz96):

| Dataset | Metric | UnCLe(P) | CUTS+ | JRNGC | NeuralGC |
|---|---|---|---|---|---|
| Lorenz#1 (\(p\)=20, \(F\)=10) | AUROC | .999 | .986 | .963 | .891 |
| Lorenz#2 (\(p\)=20, \(F\)=40) | AUROC | .969 | .852 | .786 | .657 |
| Lorenz#3 (\(p\)=100, \(T\)=500) | AUROC | .964 | .946 | .884 | Not scalable |

Dynamic Causal Discovery:

| Dataset | Metric | UnCLe(P) | GVAR | Static Best |
|---|---|---|---|---|
| TVSEM (bivariate, alternating causality) | AUROC | 1.000 | 0.733 | 0.467 |
| TVSEM | AUPRC | 1.000 | 0.400 | 0.300 |
| ND8 (8-variable, nonlinear) | AUROC | .921 | .723 | .905 |
| ND8 | AUPRC | .633 | .220 | .799 |

Human Motion Capture (MoCap) — Skeletal Connection Missing Rate:

| Method | Missing Rate ↓ |
|---|---|
| UnCLe | .200 |
| GVAR | .622 |
| JRNGC | .600 |

Ablation Study

| Configuration | Lorenz#3 AUROC | Note |
|---|---|---|
| Full UnCLe(P) | ~0.96 | Baseline |
| w/o parameter sharing | Significant drop | Parameter sharing is critical in high-dimensional settings |
| w/o dependency matrices | <0.5 (random) | Model completely fails to learn causal structure |
| w/o prediction task | Very low | Reconstruction alone is insufficient to model inter-variable dependencies |

Perturbation Strategy Comparison (Lorenz#1):

| Perturbation Strategy | AUROC | AUPRC | ACC |
|---|---|---|---|
| Temporal permutation (Ours) | .999 | .996 | .994 |
| Noise injection | .981 | .946 | .978 |
| Zero-value masking | .974 | .932 | .969 |
| No perturbation | .500 | .575 | .850 |

Key Findings

  • Perfect dynamic causal discovery on TVSEM (AUROC=1.0), while GVAR achieves only 0.733 and fails to capture causal direction reversals
  • In human motion analysis, UnCLe's dynamic causal graphs are highly consistent with biomechanical knowledge: upper limbs dominate during takeoff → lower limbs coordinate during flight → whole-body involvement at landing
  • Skeletal connection missing rate of only 0.200 (vs. ~0.6 for GVAR/JRNGC), indicating that UnCLe preserves fundamental anatomical structure
  • Dependency matrices are the core component—removing them reduces AUROC to random chance; parameter sharing is essential for high-dimensional settings

Highlights & Insights

  • A rare contribution to dynamic causal discovery: The vast majority of existing methods only infer static causal graphs; UnCLe is among the few capable of generating time-resolved causal graphs
  • Elegant "disentanglement + linearization" design: The TCN Uncoupler transforms nonlinear dynamics into a latent space amenable to linear modeling, while the dependency matrices capture inter-variable relationships via simple linear transformations—analogous to a discrete approximation of Koopman operator theory
  • Principled choice of perturbation analysis: Temporal permutation is the optimal perturbation strategy as it simultaneously satisfies two conditions: eliminating temporal predictive information and preserving marginal distributions
  • Compelling MoCap case study: Anchoring the dynamic causal graphs to the human skeletal structure reveals causal patterns across motion phases that align fully with the biomechanics literature
  • Excellent scalability: Parameter sharing enables the model to handle 100+ variables, where most baseline methods fail or suffer severe performance degradation

Limitations & Future Work

  • Lack of identifiability guarantees: The authors explicitly acknowledge this as the primary limitation—no theoretical proof that latent linearization ensures causal faithfulness
  • Inherent assumptions of Granger causality: The framework assumes no latent confounders; spurious causal connections may arise in the presence of unobserved variables
  • Temporal perturbation must be applied separately for each variable, incurring \(O(N)\) times the computational cost of a single forward pass
  • Only uniformly sampled time series are supported; irregular time series are not handled
  • Evaluation metrics for dynamic causal graphs remain limited—quantitatively measuring "the quality of causal evolution capture" is an open challenge
Related Work

  • NeuralGC (Tank et al.): A classical neural Granger causality method; its component-wise design is not scalable
  • GVAR: The only prior method capable of generating dynamic causal graphs, but with far lower accuracy than UnCLe
  • CUTS+: A scalable method with parameter sharing, but limited to static causal discovery
  • Koopman theory: The idea of linearizing nonlinear dynamics inspired the design of the Uncoupler
  • Takeaway: Causal discovery can be fruitfully approached from the perspective of "disentangled representations + linear modeling"; parameter sharing and perturbation analysis constitute simple yet effective engineering choices

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Dynamic causal discovery is an underexplored yet important problem; the combination of Uncoupler and perturbation analysis is novel
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive evaluation covering synthetic and real data, static and dynamic settings, scalability, ablations, and perturbation strategy comparisons
  • Writing Quality: ⭐⭐⭐⭐ The MoCap case study is vivid and illustrative; mathematical formulations are clear
  • Value: ⭐⭐⭐⭐⭐ Fills the gap in dynamic causal discovery; the method is concise and practical, with open-source code and data