UnCLe: Towards Scalable Dynamic Causal Discovery in Non-Linear Temporal Systems

Conference: NeurIPS 2025 · arXiv: 2511.03168 · Code: GitHub · Area: Causal Discovery / Time Series Analysis · Keywords: dynamic causal discovery, time series, Granger causality, temporal perturbation, nonlinear systems

TL;DR

This paper proposes UnCLe, a scalable dynamic causal discovery method based on TCN autoencoder disentanglement and autoregressive dependency matrices. It infers time-varying causal relationships by measuring per-timestep prediction error increments following temporal perturbation, achieving state-of-the-art performance on both static and dynamic causal discovery benchmarks.

Background & Motivation

Discovering causal relationships from observational time series is a fundamental problem in understanding complex systems. Real-world systems typically exhibit dynamic causality—causal relationships that evolve over time:

Predator–prey relationships shift with seasons

Gene regulatory networks change across developmental stages

Human biomechanics exhibit different inter-joint causal relationships across motion phases (takeoff → flight → landing)

Nevertheless, mainstream temporal causal discovery methods almost exclusively infer static causal graphs (time-averaged dependencies), disregarding the evolving nature of causal relationships.

Specific limitations of existing methods:

- NeuralGC / GVAR / TCDF: component-wise design with \(O(N^2)\) parameters, not scalable to large systems
- GVAR: capable of generating dynamic causal graphs but with limited accuracy (e.g., it cannot capture the direction reversals in TVSEM)
- JRNGC / CUTS+: address scalability via parameter sharing but do not support dynamic causal discovery
- No existing method has been rigorously evaluated on dynamic causal datasets

Method

Overall Architecture

UnCLe operates in two stages:

  1. Training stage: Learns a semantically disentangled representation of the time series along with autoregressive inter-variable dependency relationships
  2. Post-hoc analysis stage: Infers dynamic causal graphs via temporal perturbation, or infers static causal graphs via dependency matrix aggregation

Key Designs

  1. Uncoupler-Recoupler Network (Semantic Disentanglement Autoencoder)

Core Idea: Disentangles complex multivariate time series into multi-channel semantic representations, enabling inter-variable dependencies to be modeled linearly in the latent space.

- **Uncoupler**: A parameter-shared TCN encoder mapping each univariate time series \(x_i \in \mathbb{R}^T\) to a \(C\)-channel latent representation \(z_i \in \mathbb{R}^{T \times C}\)
- **Recoupler**: A parameter-shared TCN decoder reconstructing the original series \(\tilde{x}_i\) from the latent representations
- **Parameter sharing** is critical: all \(N\) variables share a single set of TCN parameters, substantially improving learning efficiency and generalization (ablations show significant performance degradation without it)
- Causal convolutions in the TCN ensure no information leaks from the future to the past
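To make the causality constraint and parameter sharing concrete, here is a minimal NumPy sketch of a parameter-shared causal convolution. This is a single-layer stand-in for the multi-layer TCN; all shapes and kernel sizes are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution: the output at time t depends only on x[:t+1]."""
    K = len(w)
    x_pad = np.concatenate([np.zeros(K - 1), x])  # left-pad so no future leakage
    return np.array([x_pad[t:t + K] @ w[::-1] for t in range(len(x))])

rng = np.random.default_rng(0)
N, T, C = 4, 50, 3
X = rng.standard_normal((N, T))        # N univariate series of length T
kernels = rng.standard_normal((C, 5))  # ONE shared kernel per latent channel

# Parameter sharing: every variable is encoded by the same kernels,
# so the parameter count is independent of N.
Z = np.stack([[causal_conv1d(X[i], kernels[c]) for c in range(C)]
              for i in range(N)])      # (N, C, T) latent representation
```

Because the kernels are shared across all \(N\) variables, the encoder's size does not grow with the system dimension, which is what allows scaling to 100+ variables.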

Theoretical Motivation: An appropriate coordinate transformation (learned by the Uncoupler) can approximate complex nonlinear dynamics as a linear system—the central insight of Koopman operator theory.

  2. Auto-regressive Dependency Matrices

Core Idea: In the disentangled latent space, simple linear matrices model inter-variable dependencies.

- Maintains \(C\) dependency matrices \(\Psi = \{\Psi^1, \ldots, \Psi^C\}\), where each \(\Psi^c \in \mathbb{R}^{N \times N}\)
- Autoregressive prediction: \(\hat{z}^c_{:,t+1} = \sigma(\Psi^c \cdot z^c_{:,t})\)
- The predicted latent representation is fed into the Recoupler to generate predictions in the original space
- \(L_1\) regularization encourages sparsity in the dependency matrices, suppressing spurious causal connections
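The prediction rule \(\hat{z}^c_{:,t+1} = \sigma(\Psi^c \cdot z^c_{:,t})\) can be sketched in a few lines of NumPy. The choice \(\sigma = \tanh\) and all shapes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
C, N, T = 3, 5, 100
Z = rng.standard_normal((C, N, T))           # disentangled latents from the Uncoupler
Psi = 0.1 * rng.standard_normal((C, N, N))   # one N x N dependency matrix per channel

# z_hat[c, :, t+1] = tanh(Psi[c] @ z[c, :, t]), vectorized over all t
Z_hat = np.tanh(np.einsum('cij,cjt->cit', Psi, Z[:, :, :-1]))  # predicts Z[:, :, 1:]

pred_mse = np.mean((Z_hat - Z[:, :, 1:]) ** 2)  # feeds the L_Pred loss term
l1_penalty = np.abs(Psi).sum()                  # feeds the L_L1 sparsity term
```

Entry \(\Psi^c_{i,j}\) directly encodes how variable \(j\) influences variable \(i\) in channel \(c\), which is why the matrices can later be aggregated into a static causal graph.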

  3. Dynamic Granger Causal Inference via Temporal Perturbation

Core Idea: If \(x_j\) is a true cause of \(x_i\), randomly permuting the temporal structure of \(x_j\) will substantially increase the model's prediction error for \(x_i\).

Procedure:

- Apply a random temporal permutation to variable \(j\) (preserving the marginal distribution while destroying temporal information)
- Compute the per-timestep prediction error increment before and after perturbation: \(\Delta\varepsilon^{\setminus j}_{i,t} = \max(0,\ \varepsilon^{\setminus j}_{i,t} - \varepsilon_{i,t})\)
- The error increment serves as the dynamic causal strength of \(x_j \to x_i\) at time \(t\)
- Repeating this for all variable pairs and timesteps yields the time-resolved causal graph \(\hat{G}^{\text{Pert}}\)

Why temporal permutation over other perturbations?

- Zero-value masking distorts the data distribution, destabilizing the model
- Noise injection cannot fully eliminate the original signal
- Temporal permutation both eliminates predictive temporal information and perfectly preserves the marginal distribution of each variable
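The perturbation procedure can be sketched end-to-end. Here a toy linear predictor stands in for the trained UnCLe model, and the planted \(x_0 \to x_1\) link is hypothetical; only the permute-and-compare logic mirrors the method:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 200
X = rng.standard_normal((N, T))
X[1, 1:] += 0.9 * X[0, :-1]          # planted ground truth: x0 -> x1 at lag 1

def predict(X):
    """Stand-in for the trained model's next-step predictor (hypothetical):
    a fixed linear rule that matches the planted dependency."""
    W = np.array([[0.0, 0.0, 0.0],
                  [0.9, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
    return W @ X[:, :-1]              # predicts X[:, 1:]

eps = (predict(X) - X[:, 1:]) ** 2    # baseline per-timestep squared error

# Perturb variable j: a random temporal permutation keeps the marginal
# distribution intact while destroying all temporal structure.
j = 0
X_pert = X.copy()
X_pert[j] = X[j, rng.permutation(T)]
eps_pert = (predict(X_pert) - X[:, 1:]) ** 2

# Per-timestep error increment = dynamic causal strength of x_j -> x_i at t
delta = np.maximum(0.0, eps_pert - eps)
```

`delta[1]` picks up mass because \(x_0\) genuinely drives \(x_1\), while `delta[2]` stays at zero; looping `j` over all variables fills in the time-resolved graph \(\hat{G}^{\text{Pert}}\).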

  4. Dependency Aggregation for Static Causal Graphs

    • Aggregates dependency matrices \(\Psi\) along the channel dimension via \(L_2\)-norm: \(A^{\text{Agg}}_{l,k} = \sqrt{\frac{1}{C} \sum_c (\Psi^c_{l,k})^2}\)
    • Efficiently obtains a static global causal graph without post-hoc perturbation analysis
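The aggregation formula is a one-liner in NumPy; the sizes below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
C, N = 4, 6                            # assumed number of channels and variables
Psi = rng.standard_normal((C, N, N))   # learned dependency matrices

# Channel-wise L2 aggregation: A[l, k] = sqrt((1/C) * sum_c Psi[c, l, k]^2)
A_agg = np.sqrt(np.mean(Psi ** 2, axis=0))
```

Since this uses only the already-learned matrices, the static graph comes essentially for free, with no extra forward passes.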

Loss & Training

  • Two-stage training:
    • Pre-training: optimize only the reconstruction loss \(L_{\text{Recon}}\) (MSE), training the Uncoupler + Recoupler
    • Full training: jointly optimize \(L_{\text{Total}} = L_{\text{Recon}} + \alpha \cdot L_{\text{Pred}} + L_{L_1}\)
  • \(L_{\text{Recon}}\): reconstruction MSE loss
  • \(L_{\text{Pred}}\): prediction MSE loss (autoregressive next-step prediction)
  • \(L_{L_1}\): \(L_1\) regularization on the dependency matrices to promote sparsity
  • Dropout (0.2): additional regularization for TCN training
  • \(\alpha\): a hyperparameter weighting the prediction task
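The full-training objective can be sketched as a single function. The weight `lam` on the \(L_1\) term is an assumed hyperparameter (the summary only specifies the three-term structure and \(\alpha\)):

```python
import numpy as np

def total_loss(x, x_recon, z_next, z_pred, Psi, alpha=1.0, lam=1e-3):
    """L_Total = L_Recon + alpha * L_Pred + L_L1 (sketch; shapes and the
    lam weight are assumptions for illustration)."""
    l_recon = np.mean((x - x_recon) ** 2)            # reconstruction MSE
    l_pred = np.mean((z_next - z_pred) ** 2)         # next-step prediction MSE in latent space
    l_l1 = lam * sum(np.abs(P).sum() for P in Psi)   # sparsity on dependency matrices
    return l_recon + alpha * l_pred + l_l1
```

During pre-training only `l_recon` would be optimized; full training then switches to the combined objective.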

Key Experimental Results

Main Results

Static Causal Discovery (Synthetic Dataset Lorenz96):

| Dataset | Metric | UnCLe(P) | CUTS+ | JRNGC | NeuralGC |
|---|---|---|---|---|---|
| Lorenz#1 (\(p\)=20, \(F\)=10) | AUROC | .999 | .986 | .963 | .891 |
| Lorenz#2 (\(p\)=20, \(F\)=40) | AUROC | .969 | .852 | .786 | .657 |
| Lorenz#3 (\(p\)=100, \(T\)=500) | AUROC | .964 | .946 | .884 | Not scalable |

Dynamic Causal Discovery:

| Dataset | Metric | UnCLe(P) | GVAR | Static Best |
|---|---|---|---|---|
| TVSEM (bivariate, alternating causality) | AUROC | 1.000 | 0.733 | 0.467 |
| TVSEM | AUPRC | 1.000 | 0.400 | 0.300 |
| ND8 (8-variable, nonlinear) | AUROC | .921 | .723 | .905 |
| ND8 | AUPRC | .633 | .220 | .799 |

Human Motion Capture (MoCap) — Skeletal Connection Missing Rate:

| Method | Missing Rate ↓ |
|---|---|
| UnCLe | .200 |
| GVAR | .622 |
| JRNGC | .600 |

Ablation Study

| Configuration | Lorenz#3 AUROC | Note |
|---|---|---|
| Full UnCLe(P) | ~0.96 | Baseline |
| w/o parameter sharing | Significant drop | Parameter sharing is critical in high-dimensional settings |
| w/o dependency matrices | <0.5 (random) | Model completely fails to learn causal structure |
| w/o prediction task | Very low | Reconstruction alone is insufficient to model inter-variable dependencies |

Perturbation Strategy Comparison (Lorenz#1):

| Perturbation Strategy | AUROC | AUPRC | ACC |
|---|---|---|---|
| Temporal permutation (Ours) | .999 | .996 | .994 |
| Noise injection | .981 | .946 | .978 |
| Zero-value masking | .974 | .932 | .969 |
| No perturbation | .500 | .575 | .850 |

Key Findings

  • Perfect dynamic causal discovery on TVSEM (AUROC=1.0), while GVAR achieves only 0.733 and fails to capture causal direction reversals
  • In human motion analysis, UnCLe's dynamic causal graphs are highly consistent with biomechanical knowledge: upper limbs dominate during takeoff → lower limbs coordinate during flight → whole-body involvement at landing
  • Skeletal connection missing rate of only 0.200 (vs. ~0.6 for GVAR/JRNGC), indicating that UnCLe preserves fundamental anatomical structure
  • Dependency matrices are the core component—removing them reduces AUROC to random chance; parameter sharing is essential for high-dimensional settings

Highlights & Insights

  • A rare contribution to dynamic causal discovery: The vast majority of existing methods only infer static causal graphs; UnCLe is among the few capable of generating time-resolved causal graphs
  • Elegant "disentanglement + linearization" design: The TCN Uncoupler transforms nonlinear dynamics into a latent space amenable to linear modeling, while the dependency matrices capture inter-variable relationships via simple linear transformations—analogous to a discrete approximation of Koopman operator theory
  • Principled choice of perturbation analysis: Temporal permutation is the optimal perturbation strategy as it simultaneously satisfies two conditions: eliminating temporal predictive information and preserving marginal distributions
  • Compelling MoCap case study: Anchoring the dynamic causal graphs to the human skeletal structure reveals causal patterns across motion phases that align fully with the biomechanics literature
  • Excellent scalability: Parameter sharing enables the model to handle 100+ variables, where most baseline methods fail or suffer severe performance degradation

Limitations & Future Work

  • Lack of identifiability guarantees: The authors explicitly acknowledge this as the primary limitation—no theoretical proof that latent linearization ensures causal faithfulness
  • Inherent assumptions of Granger causality: The framework assumes no latent confounders; spurious causal connections may arise in the presence of unobserved variables
  • Temporal perturbation must be applied separately for each variable, incurring \(O(N)\) times the computational cost of a single forward pass
  • Only uniformly sampled time series are supported; irregular time series are not handled
  • Evaluation metrics for dynamic causal graphs remain limited—quantitatively measuring "the quality of causal evolution capture" is an open challenge
Related Work

  • NeuralGC (Tank et al.): A classical neural Granger causality method; its component-wise design is not scalable
  • GVAR: The only prior method capable of generating dynamic causal graphs, but with far lower accuracy than UnCLe
  • CUTS+: A scalable method with parameter sharing, but limited to static causal discovery
  • Koopman theory: The idea of linearizing nonlinear dynamics inspired the design of the Uncoupler
  • Takeaway: Causal discovery can be fruitfully approached from the perspective of "disentangled representations + linear modeling"; parameter sharing and perturbation analysis constitute simple yet effective engineering choices

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Dynamic causal discovery is an underexplored yet important problem; the combination of Uncoupler and perturbation analysis is novel
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive evaluation covering synthetic and real data, static and dynamic settings, scalability, ablations, and perturbation strategy comparisons
  • Writing Quality: ⭐⭐⭐⭐ The MoCap case study is vivid and illustrative; mathematical formulations are clear
  • Value: ⭐⭐⭐⭐⭐ Fills the gap in dynamic causal discovery; the method is concise and practical, with open-source code and data