Dynamic Causal Discovery in Alzheimer's Disease through Latent Pseudotime Modelling

Conference: NeurIPS 2025
arXiv: 2511.04619
Code: To be confirmed
Area: Medical Imaging
Keywords: Causal discovery, Alzheimer's disease, Bayesian networks, pseudotime, time-varying causal graphs, biomarkers
TL;DR
This paper applies BN-LTE (Bayesian Network with Latent Time Embedding) to real-world ADNI data covering AD, MCI, and CN participants to infer dynamic causal graphs that evolve along a disease pseudotime axis. The learned pseudotime achieves a diagnostic AUC of 0.82, substantially outperforming chronological age (AUC 0.59), and reveals dynamic causal relationships between the emerging biomarkers NfL and GFAP and established AD markers.
Background & Motivation
Background: Approximately $380 billion is invested annually in Alzheimer's disease (AD) research, yet clinical trials continue to fail. A fundamental reason is that the causal relationships among the thousands of pathways involved in AD remain poorly understood. Causal inference provides a powerful framework for elucidating these relationships.
Limitations of Prior Work:

- Most causal discovery methods assume a static causal graph, whereas the pathophysiological processes of AD evolve dynamically: causal relationships differ across disease stages.
- Core assumptions (acyclicity, no unobserved confounders) are frequently violated or untestable in medical data.
- Individual rates of disease progression vary due to latent factors such as cognitive reserve, meaning chronological age does not equal disease stage.
- The causal relationships between the emerging plasma biomarkers NfL and GFAP and established AD markers (Aβ, pTau) remain unclear.
Key Challenge: Inter-individual variability in disease progression rates complicates time-series analysis—patients of the same age may be at entirely different disease stages. Cross-sectional data cannot directly capture disease dynamics.
Goal:

- Infer a data-driven "pseudotime" to order patients along their disease progression trajectory.
- Learn how causal relationships evolve as a function of pseudotime.
- Integrate the dynamic causal interactions between novel and established biomarkers.
Key Insight: The paper leverages the BN-LTE model (Zhou et al. 2023), treating pseudotime as a latent variable that modulates causal mechanisms, and infers dynamic causal graphs from cross-sectional ADNI data.
Core Idea: Order patients using a latent pseudotime and learn a causal graph that evolves with disease progression, thereby revealing dynamic causal relationships among AD biomarkers.
Method
Overall Architecture
- Input: Cross-sectional ADNI data from 380 patients (48 AD, 117 MCI, 215 CN), comprising 16 variables (demographics, regional brain volumes, plasma biomarkers, cognitive scores).
- Model: BN-LTE (Bayesian Network with Latent Time Embedding), with posterior inference via MCMC sampling.
- Output: (1) A disease pseudotime \(Z\) for each patient; (2) A pseudotime-varying causal graph \(G(Z)\).
Key Designs
- Pseudotime Model:
    - Function: Replace chronological age with a data-driven latent variable \(Z\) to order patients.
    - Mechanism: The conditional distribution of each variable is modelled as \(X_j = a_j(Z) + \sum_l b_{jl}(Z) X_l + \epsilon_j\), where \(a_j(Z)\) is a baseline trajectory function (the natural progression of a marker along pseudotime) and \(b_{jl}(Z)\) is a pseudotime-dependent causal effect coefficient; both are parameterised using cubic B-splines.
    - Design Motivation: Age \(\neq\) disease stage: factors such as cognitive reserve cause considerable variation in progression among age-matched individuals. The identifiability of pseudotime \(Z\) is theoretically guaranteed under the condition that causal relationships vary along this axis.
- Background Knowledge Constraints:
    - Function: Incorporate minimal, disease-agnostic prior knowledge.
    - Mechanism: (1) Root nodes: immutable variables (sex, APOE genotype) cannot have incoming edges. (2) Sink nodes: cognitive scores cannot have outgoing edges (in the elderly ADNI cohort, reverse effects of cognition on other variables are negligible).
    - Design Motivation: In real-world data where model assumptions may be violated, disease-agnostic background knowledge substantially improves graph recovery (Table 2: directional precision increases from 62% to 96%), while avoiding the introduction of subjective biases regarding disease mechanisms.
- MCMC Posterior Inference:
    - Function: Estimate the posterior distributions of pseudotime and the causal graph.
    - Mechanism: Four chains × 5,000 iterations (1,000 burn-in). Posterior inclusion probability (PIP) is used as a confidence measure for causal edges; the final causal graph is constructed by thresholding at PIP ≥ 0.5.
    - Design Motivation: The Bayesian approach naturally provides uncertainty quantification, and PIP gives each edge a graded confidence score, so a binary decision is only made at the final thresholding step.
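
The root/sink constraints and the PIP thresholding described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the variable names, the `edge_allowed`, `posterior_inclusion`, and `threshold_graph` helpers, and the four toy "posterior samples" are all hypothetical, and a real run would use the graphs retained from the MCMC chains after burn-in.

```python
# Hypothetical variable subset; the paper uses 16 variables.
ROOTS = {"sex", "APOE"}    # immutable variables: no incoming edges
SINKS = {"cognition"}      # cognitive scores: no outgoing edges


def edge_allowed(src, dst):
    """Return True if the directed edge src -> dst respects the constraints."""
    if src == dst:
        return False
    if dst in ROOTS:       # a root node cannot be a child
        return False
    if src in SINKS:       # a sink node cannot be a parent
        return False
    return True


def posterior_inclusion(sampled_graphs):
    """PIP of an edge = fraction of sampled graphs that contain it."""
    counts = {}
    for graph in sampled_graphs:
        for edge in graph:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / len(sampled_graphs) for edge, c in counts.items()}


def threshold_graph(pip, cutoff=0.5):
    """Keep edges whose PIP reaches the cutoff (PIP >= 0.5 in the summary)."""
    return {edge for edge, p in pip.items() if p >= cutoff}


# Toy posterior samples (made up), each a set of directed edges.
SAMPLES = [
    {("pTau217", "NfL"), ("APOE", "pTau217")},
    {("pTau217", "NfL")},
    {("pTau217", "NfL"), ("NfL", "cognition")},
    {("APOE", "pTau217")},
]
PIP = posterior_inclusion(SAMPLES)
GRAPH = threshold_graph(PIP)
```

In a sampler, `edge_allowed` would typically be applied when proposing edges, so that forbidden edges never enter a sampled graph in the first place.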
Loss & Training
- Gaussian likelihood model: \(\epsilon_j \sim \mathcal{N}(0, \sigma_j^2)\)
- Cubic B-spline parameterisation with 5 knots
- The Coulomb prior used in the original BN-LTE is removed, as AD patients are not uniformly distributed across disease stages
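
To make the structural equation \(X_j = a_j(Z) + \sum_l b_{jl}(Z) X_l + \epsilon_j\) concrete, the sketch below evaluates cubic B-splines with the Cox-de Boor recursion and draws one child variable from a single parent with Gaussian noise. Several choices here are assumptions rather than details from the paper: pseudotime is scaled to \([0, 1]\), "5 knots" is read as five equally spaced interior knots with clamped boundaries, and the spline coefficients are invented for illustration.

```python
import random

DEGREE = 3  # cubic B-splines, as stated in the summary


def clamped_knots(n_interior, degree=DEGREE):
    """Knot vector on [0, 1] with repeated boundary knots (assumed layout)."""
    interior = [(k + 1) / (n_interior + 1) for k in range(n_interior)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def bspline_basis(z, knots, degree=DEGREE):
    """Evaluate all B-spline basis functions at z (Cox-de Boor recursion)."""
    n_spans = len(knots) - 1
    basis = [1.0 if knots[i] <= z < knots[i + 1] else 0.0 for i in range(n_spans)]
    if z == knots[-1]:  # include the right endpoint in the last non-empty span
        last = max(i for i in range(n_spans) if knots[i] < knots[i + 1])
        basis[last] = 1.0
    for d in range(1, degree + 1):
        nxt = []
        for i in range(n_spans - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (z - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - z) / right_den * basis[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        basis = nxt
    return basis


def spline(z, coefs, knots):
    """Spline value at z: weighted sum of basis functions."""
    return sum(c * b for c, b in zip(coefs, bspline_basis(z, knots)))


def simulate_child(z, x_parent, a_coefs, b_coefs, knots, sigma, rng):
    """Draw X_j = a_j(z) + b_jl(z) * x_parent + eps, with eps ~ N(0, sigma^2)."""
    noise = rng.gauss(0.0, sigma)
    return spline(z, a_coefs, knots) + spline(z, b_coefs, knots) * x_parent + noise


KNOTS = clamped_knots(5)
N_BASIS = len(KNOTS) - DEGREE - 1              # 9 basis functions
A_COEFS = [0.0] * N_BASIS                      # flat baseline (illustrative)
B_COEFS = [0.0, 0.0, 0.0, 0.2, 0.5, 0.8, 1.0, 1.0, 1.0]  # effect grows with z

rng = random.Random(0)
z = rng.random()                               # pseudotime of one patient
x_parent = rng.gauss(0.0, 1.0)
x_child = simulate_child(z, x_parent, A_COEFS, B_COEFS, KNOTS, 0.1, rng)
```

With these made-up coefficients the parent has no effect on the child at \(Z = 0\) and a unit effect at \(Z = 1\), which is the kind of stage-dependent edge strength the model is designed to capture.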
Key Experimental Results
Main Results: Diagnostic Predictive Power, Pseudotime vs. Age
| Predictor | AUC | p-value | Note |
|---|---|---|---|
| Pseudotime \(Z\) | 0.82 (95% CI: 0.81, 0.82) | <0.001 | Strong predictive power |
| Age | 0.59 | <0.01 | Weak predictive power |
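
An AUC comparison of this kind can be computed with a rank-based (Mann-Whitney) estimate: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The sketch below assumes a binary diagnosis label (this summary does not state which groups were contrasted) and uses made-up toy values, not ADNI data.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (illustrative values only): 1 = diagnosed, 0 = control.
LABELS = [0, 0, 0, 1, 1, 1]
PSEUDOTIME = [0.1, 0.2, 0.6, 0.5, 0.8, 0.9]
AGE = [70, 75, 80, 72, 78, 81]

AUC_PSEUDOTIME = auc(PSEUDOTIME, LABELS)
AUC_AGE = auc(AGE, LABELS)
```

An AUC of 0.5 corresponds to chance-level ordering, which is why the reported 0.59 for age indicates only weak discrimination.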
Ablation Study: Effect of Background Knowledge on Graph Recovery
| Configuration | Edge Presence Precision | Edge Presence Recall | Direction Precision | Direction Recall | SHD |
|---|---|---|---|---|---|
| No background knowledge | 0.80 | 0.16 | 0.62 | 0.50 | 67 |
| + Root node constraints | 0.72 | 0.35 | 0.89 | 0.84 | 53 |
| + Root + Sink nodes | 0.88 | 0.45 | 0.96 | 0.88 | 41 |
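
Graph-recovery metrics such as those in the table compare an inferred graph against a reference graph. The sketch below shows one common set of definitions; the paper's exact definitions are not given in this summary, so the formulas (skeleton-based presence precision and recall, orientation accuracy among shared edges, and SHD counting a reversal as one error) and the toy graphs are assumptions.

```python
def skeleton(edges):
    """Drop orientation: each directed edge becomes an unordered pair."""
    return {frozenset(edge) for edge in edges}


def ratio(numerator, denominator):
    return numerator / denominator if denominator else float("nan")


def recovery_metrics(true_edges, pred_edges):
    """Compare two DAG edge sets given as sets of (parent, child) tuples."""
    true_skel, pred_skel = skeleton(true_edges), skeleton(pred_edges)
    shared = true_skel & pred_skel              # edges present in both graphs
    correct = pred_edges & true_edges           # present and correctly oriented
    missing = len(true_skel - pred_skel)
    extra = len(pred_skel - true_skel)
    reversed_count = len(shared) - len(correct)
    return {
        "presence_precision": ratio(len(shared), len(pred_skel)),
        "presence_recall": ratio(len(shared), len(true_skel)),
        "orientation_accuracy": ratio(len(correct), len(shared)),
        "shd": missing + extra + reversed_count,
    }


# Toy graphs with generic node names (not findings from the paper).
TRUE_EDGES = {("A", "B"), ("B", "C"), ("C", "D")}
PRED_EDGES = {("A", "B"), ("C", "B"), ("A", "D")}
METRICS = recovery_metrics(TRUE_EDGES, PRED_EDGES)
```

In the toy case the prediction misses one edge, adds one spurious edge, and reverses one, giving a structural Hamming distance of 3.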
Key Causal Findings
| Causal Edge | PIP (with background knowledge) | Literature Consistency |
|---|---|---|
| pTau217 → GFAP | 0.80 | Possible/Unknown |
| Aβ42 → Aβ40 | 0.75 | Confirmed |
| pTau217 → NfL | 0.57 | Possible |
| NfL → Hippocampus | 0.53 | Possible |
| Aβ42 → NfL | 0.46 | Possible |
Key Findings
- Pseudotime ordering is consistent with disease severity: Figure 1 shows that CN patients cluster at early pseudotime, MCI patients at intermediate values, and AD patients at late pseudotime; biomarker trajectories including declining hippocampal volume and rising NfL and GFAP are consistent with known AD pathology.
- Causal relationships change dynamically: The influence of pTau on NfL emerges at early pseudotime—consistent with the consensus that pTau effects precede neurodegeneration—whereas the influence of age on GFAP remains constant throughout the disease course.
- Background knowledge yields substantial gains: Imposing only two disease-agnostic constraints—that sex/APOE are not influenced by other variables and that cognitive scores do not influence other variables—improves directional precision from 62% to 96%.
- Inconsistencies are also identified: The inferred edges pTau → GFAP and NfL → Aβ40 conflict with the literature, which holds that amyloid pathology precedes tau pathology, highlighting remaining limitations of the model and data.
Highlights & Insights
- Transfer of the pseudotime concept from single-cell biology to clinical disease modelling: Pseudotime is widely used in single-cell RNA-seq for cell trajectory inference; this paper transfers the concept to patient-level disease progression modelling, elegantly addressing the inability of cross-sectional data to directly capture dynamics.
- Outsized impact of disease-agnostic background knowledge: Without any expert knowledge of AD mechanisms, simply encoding "immutable variables are root nodes" and "cognitive scores are sink nodes" raises directional precision from 62% to 96%—a finding with important implications for the practical application of causal discovery.
- Clinical value of dynamic causal graphs: The fact that causal relationships vary across disease stages implies that the timing of combination therapies may need to be tailored to a patient's disease stage—an insight with direct relevance to clinical trial design.
- Causal positioning of novel biomarkers: This paper provides the first causal-framework analysis of the dynamic interactions between NfL and GFAP and traditional AD biomarkers; the early emergence of the pTau → NfL edge offers a causal rationale for the clinical interpretation of these emerging markers.
Limitations & Future Work
- Strong assumptions: The model assumes causal sufficiency (no unobserved confounders) and faithfulness, both of which are likely violated in medical data.
- Limited sample size: With only 380 patients, certain subgroups (e.g., 48 AD patients) have insufficient statistical power.
- Uni-dimensional pseudotime: Compressing disease progression into a one-dimensional scalar may be inadequate for the true heterogeneity of AD, which may require a multi-dimensional representation.
- Consensus graph as ground truth: The literature-derived consensus graph may itself be incomplete or contested, with the directionality of some edges unknown.
- Longitudinal data not utilised: ADNI contains longitudinal follow-up data that were not used; longitudinal analysis could validate the predictive validity of the pseudotime model.
- Future directions:
    - Relax the causal sufficiency assumption and model unobserved confounders (e.g., via FCI-based methods).
    - Extend to multi-dimensional pseudotime (multiple latent progression factors).
    - Cross-cohort validation (multi-dataset causal discovery).
    - Use longitudinal data to validate dynamic causal relationships.
Related Work & Insights
- vs. static causal graph methods: Classical methods such as PC and GES produce a single fixed graph and cannot capture changes in causal relationships during disease progression. The dynamic graph of BN-LTE represents a qualitative advancement.
- vs. Zhou et al. (2023): This paper is the first application of BN-LTE to real AD data. Its contributions are showing how much disease-agnostic background knowledge improves graph recovery and providing a causal analysis of NfL and GFAP.
- vs. time-series causal discovery: Methods such as Granger causality require longitudinal data; this paper infers dynamic relationships from cross-sectional data, making it applicable to a broader range of clinical settings.
Rating
- Novelty: ⭐⭐⭐⭐ — First systematic application of pseudotime combined with dynamic causal discovery in AD; the disease-agnostic background knowledge strategy has methodological value.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Ablations across multiple configurations, quantitative comparison against a consensus graph, and MCMC convergence diagnostics.
- Writing Quality: ⭐⭐⭐⭐ — Clinical motivation and methodological descriptions are clear; both findings and inconsistencies are discussed candidly.
- Value: ⭐⭐⭐⭐⭐ — Substantive contributions to both AD research and causal discovery methodology, with strong translational potential.