Inferring Stochastic Dynamics with Growth from Cross-Sectional Data¶
Conference: NeurIPS 2025
arXiv: 2505.13197
Code: None
Area: Computational Biology / Stochastic Dynamics Inference
Keywords: Probability flow inference, Fokker-Planck equation, cell dynamics, optimal transport, branching diffusion process
TL;DR¶
This paper proposes Unbalanced Probability Flow Inference (UPFI), which jointly infers the drift and growth rate of a stochastic dynamical system from cross-sectional data, for a given diffusion model, using a Lagrangian formulation of the Fokker-Planck equation. It is presented as the first method to accurately handle settings with cell proliferation and death.
Background & Motivation¶
Single-cell RNA sequencing (scRNA-seq) is destructive: each cell can be measured at only one time point, so an experiment yields population-level cross-sectional snapshots at different times rather than single-cell trajectories. Reconstructing the underlying dynamical system from these snapshots is a central inverse problem.
Limitations of existing methods:
- Most methods assume noise-free or isotropic constant diffusion and do not account for cell proliferation/death
- Ignoring proliferation/death leads to incorrectly inferred state transitions (e.g., erroneously linking apoptotic cells to pluripotent cells)
- Methods such as DeepRUOT incorporate growth but involve multi-stage, unstable training procedures
- The original PFI method does not handle changes in cellular mass (unbalanced transport)
Key challenge: drift and growth rate (fitness) are not separately identifiable, because the same sequence of observed distributions can be generated by different combinations of drift and growth.
Method¶
Overall Architecture¶
UPFI adopts a two-stage training scheme:
- Offline score matching: Estimates the time-dependent score function \(\mathbf{s}_t(\mathbf{x}) \approx \nabla \log \rho_t(\mathbf{x})\) from snapshot data via denoising score matching.
- Online ODE fitting: In the Lagrangian reference frame, learns the drift \(\mathbf{v}_t\) and growth rate \(g_t\) such that the pushed-forward distribution matches the observations.
Core insight: The Fokker-Planck equation with growth can be reformulated as a \(d+1\)-dimensional ODE system (position + mass), avoiding the need to solve high-dimensional PDEs.
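The offline stage can be illustrated with a minimal sketch of denoising score matching. It makes several simplifying assumptions that are not from the paper: a single snapshot (no time dependence), Gaussian data, and a linear score model fitted in closed form by least squares instead of a neural network. The function name `fit_linear_score_dsm` and all parameter values are hypothetical.

```python
import numpy as np

def fit_linear_score_dsm(x, sigma, rng):
    """Fit s(x) = x @ A + b by denoising score matching (least squares)."""
    z = rng.standard_normal(x.shape)
    x_noisy = x + sigma * z
    # DSM regression target: score of the Gaussian perturbation kernel,
    # grad log q(x_noisy | x) = -(x_noisy - x) / sigma**2 = -z / sigma
    target = -z / sigma
    design = np.hstack([x_noisy, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[:-1], coef[-1]  # A has shape (d, d), b has shape (d,)

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 0.5]])
x = rng.multivariate_normal(np.zeros(2), cov, size=50_000)
sigma = 0.5
A, b = fit_linear_score_dsm(x, sigma, rng)

# For Gaussian data the noise-smoothed density is N(0, cov + sigma^2 I),
# whose score is -(cov + sigma^2 I)^{-1} x.
A_exact = -np.linalg.inv(cov + sigma**2 * np.eye(2))
print(np.round(A, 2), np.round(A_exact, 2))
```

Note that denoising score matching recovers the score of the noise-smoothed density, so the noise scale trades bias against variance in the estimate.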
Key Designs¶
- Lagrangian Formulation and Mass Equation:
  - Function: Transforms the Fokker-Planck equation (FPE) with a source term into an ODE system along characteristics, where the position evolves with the probability flow velocity (drift minus the divergence of the diffusion tensor minus the diffusion-weighted score), and the mass grows exponentially along each characteristic at the local growth rate.
  - Mechanism: \(\frac{d\mathbf{x}_t}{dt} = \mathbf{v}_t - \nabla \cdot \mathbf{D}_t - \mathbf{D}_t \nabla \log \rho_t\), \(\frac{dm_t}{dt} = g_t(\mathbf{x}_t) m_t\)
  - Design Motivation: The Lagrangian perspective replaces the PDE with ODEs, and because the score depends only on the observed densities, not on the dynamical parameters being learned, it can be precomputed offline.
- Unbalanced Sinkhorn Divergence as Loss:
  - Function: The unbalanced Sinkhorn divergence \(S_{\varepsilon,\gamma}\) is used to measure the discrepancy between the pushed-forward distribution and the observed distribution.
  - Mechanism: The unbalanced Sinkhorn divergence operates directly on discrete (weighted empirical) measures, tolerates differences in total mass, and requires no density estimation.
  - Design Motivation: The classical Wasserstein distance is defined only between measures of equal total mass and is thus unsuitable for systems with growth; the Sinkhorn divergence offers favorable geometric and computational properties.
- Wasserstein-Fisher-Rao Regularization for Identifiability:
  - Function: Adds the regularization term \(\lambda \int (\|\mathbf{v}_t\|^2 + \alpha |g_t|^2) d\rho_t dt\).
  - Mechanism: Analysis of the Ornstein-Uhlenbeck (OU) process shows that drift and growth are not uniquely identifiable from the marginals (Corollary 2.2); the regularizer selects among the equivalent solutions and, for OU processes, yields a unique one (Theorem 2.3).
  - Design Motivation: Proposition 2.1 proves that even when restricting to autonomous drift, both the symmetric and antisymmetric parts of the drift matrix can be confounded with growth.
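A short calculation shows where this confounding comes from. It is a generic illustration written for these notes, not the paper's proof of Proposition 2.1. Write the FPE with growth in the form consistent with the Lagrangian ODE above:

\[
\partial_t \rho_t = -\nabla \cdot (\mathbf{v}_t \rho_t) + \nabla \cdot \big( \nabla \cdot (\mathbf{D}_t \rho_t) \big) + g_t \rho_t .
\]

For an arbitrary vector field \(\mathbf{w}_t\), define the perturbed pair

\[
\mathbf{v}'_t = \mathbf{v}_t + \mathbf{w}_t, \qquad g'_t = g_t + \nabla \cdot \mathbf{w}_t + \mathbf{w}_t \cdot \nabla \log \rho_t .
\]

Since \(\nabla \cdot (\mathbf{w}_t \rho_t) = (\nabla \cdot \mathbf{w}_t)\rho_t + \mathbf{w}_t \cdot \nabla \rho_t\), the extra transport and extra growth cancel:

\[
-\nabla \cdot (\mathbf{v}'_t \rho_t) + g'_t \rho_t = -\nabla \cdot (\mathbf{v}_t \rho_t) + g_t \rho_t ,
\]

so \((\mathbf{v}_t, g_t)\) and \((\mathbf{v}'_t, g'_t)\) generate the same \(\rho_t\). As an example under the assumption of a Gaussian profile with mean \(\boldsymbol{\mu}_t\) and covariance \(\boldsymbol{\Sigma}_t\), a linear perturbation \(\mathbf{w}_t(\mathbf{x}) = \mathbf{W}\mathbf{x}\) is compensated by a growth rate that is quadratic in \(\mathbf{x}\):

\[
g'_t(\mathbf{x}) - g_t(\mathbf{x}) = \operatorname{tr}\mathbf{W} - (\mathbf{W}\mathbf{x})^\top \boldsymbol{\Sigma}_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t).
\]

Without a penalty on \(\|\mathbf{v}_t\|^2\) and \(|g_t|^2\), nothing in the marginals distinguishes these alternatives.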
Loss & Training¶
Total loss: \(L = \sum_{i=1}^K \left[ S_{\varepsilon,\gamma}(\hat{\rho}_{t_i}, \rho_{t_i}) + \lambda (t_i - t_{i-1}) \int_{t_{i-1}}^{t_i} \int (\|\mathbf{v}_t\|^2 + \alpha |g_t|^2) \, d\rho_t \, dt \right]\)
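The data-fitting term builds on entropic unbalanced optimal transport. The sketch below shows only the underlying scaling iterations, assuming a squared Euclidean cost and KL penalties on both marginals. It returns a transport plan, not the debiased divergence \(S_{\varepsilon,\gamma}\) itself. The function name, the default values, and the mapping of `eps` and `gamma` onto the paper's \(\varepsilon\) and \(\gamma\) are assumptions.

```python
import numpy as np

def unbalanced_sinkhorn_plan(a, x, b, y, eps=0.5, gamma=1.0, n_iter=500):
    """Entropic OT plan with KL-relaxed marginals via scaling iterations."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-cost / eps)
    power = gamma / (gamma + eps)  # power -> 1 recovers balanced Sinkhorn
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.uniform(size=(30, 2))
y = rng.uniform(size=(40, 2)) + 0.2
a = np.full(30, 1.0 / 30)  # weighted particles with total mass 1
b = np.full(40, 2.0 / 40)  # total mass 2, e.g. after net proliferation

# Soft marginal constraints: total masses are allowed to differ.
plan_soft = unbalanced_sinkhorn_plan(a, x, b, y, gamma=1.0)

# Very large gamma with equal masses approaches the balanced problem.
a_n, b_n = a / a.sum(), b / b.sum()
plan_hard = unbalanced_sinkhorn_plan(a_n, x, b_n, y, gamma=1e4)
print(plan_soft.sum(), plan_hard.sum())
```

In practice one would typically use a maintained optimal transport library with log-domain stabilization rather than this plain-kernel loop, which can underflow for small `eps`.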
Theorem 2.3 proves that in the continuous-time limit, this loss has a unique minimizer for OU processes. In practice, the ODE integration requires only 2–3 Euler steps.
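The Euler integration of the position and mass equations, together with the regularization integral, can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: the diffusion is a constant scalar `D` (so its divergence vanishes), the drift is an OU drift, the growth rate is spatially uniform, and the score is the known Gaussian score instead of a learned one. The callables `drift`, `growth`, and `score` are hypothetical stand-ins for the learned networks, and many more than 2–3 steps are used so the result can be compared with the analytical solution.

```python
import numpy as np

def rollout(x0, m0, drift, growth, score, D, t0, t1, n_steps, alpha=1.0):
    """Explicit Euler for dx/dt = v - D * score and d(log m)/dt = g.

    Also accumulates the integral over [t0, t1] of
    sum_j m_j * (|v(x_j)|^2 + alpha * g(x_j)^2).
    """
    x, log_m = x0.copy(), np.log(m0)
    dt = (t1 - t0) / n_steps
    energy = 0.0
    for k in range(n_steps):
        t = t0 + k * dt
        v, g = drift(x, t), growth(x, t)
        energy += dt * np.sum(np.exp(log_m) * ((v**2).sum(-1) + alpha * g**2))
        x = x + dt * (v - D * score(x, t))
        log_m = log_m + dt * g
    return x, np.exp(log_m), energy

theta, D, var0, g0 = 1.0, 0.5, 2.0, 0.3

def var(t):
    # Variance of an isotropic OU process started from N(0, var0 * I)
    return D / theta + (var0 - D / theta) * np.exp(-2.0 * theta * t)

drift = lambda x, t: -theta * x
growth = lambda x, t: np.full(len(x), g0)  # uniform growth keeps the score Gaussian
score = lambda x, t: -x / var(t)

rng = np.random.default_rng(0)
x0 = rng.normal(scale=np.sqrt(var0), size=(500, 2))
m0 = np.full(500, 1.0 / 500)
xT, mT, energy = rollout(x0, m0, drift, growth, score, D, 0.0, 1.0, n_steps=200)

# Along characteristics the exact solution is x(t) = x0 * sqrt(var(t) / var0),
# and the total mass grows by exp(g0 * t).
print(mT.sum(), np.exp(g0))
```

Integrating the logarithm of the mass keeps the masses positive for any step size. In the loss above, `energy` would be multiplied by \(\lambda (t_i - t_{i-1})\).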
Key Experimental Results¶
Main Results (Table)¶
Path Energy Distance on the bistable system (lower is better):
| Dimension \(d\) | UPFI | PFI | fitness-ODE | TIGON++ | DeepRUOT | OTFM | UOTFM |
|---|---|---|---|---|---|---|---|
| 2 | 0.14±0.09 | 1.41±0.16 | 0.30±0.18 | 0.46±0.12 | 2.15±0.01 | 1.16±0.13 | 0.42±0.13 |
| 5 | 0.04±0.03 | 1.34±0.06 | 0.30±0.14 | 0.63±0.16 | 0.47±0.04 | 1.07±0.11 | 0.36±0.10 |
| 10 | 0.05±0.04 | 1.03±0.18 | 0.29±0.15 | 0.61±0.06 | 1.32±0.05 | 1.09±0.19 | 0.38±0.08 |
| 50 | 0.15±0.02 | — | — | — | — | — | — |
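These notes do not define the path energy distance used in the table. As an assumption about its building block, the sketch below computes the standard energy distance between two point clouds, \(2\,\mathbb{E}\|X - Y\| - \mathbb{E}\|X - X'\| - \mathbb{E}\|Y - Y'\|\). How the paper aggregates this over paths or time points is not specified here, and the function name is hypothetical.

```python
import numpy as np

def energy_distance(x, y):
    """Energy distance between two samples (V-statistic estimate)."""
    def mean_dist(a, b):
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff**2).sum(-1)).mean()
    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x + 1.0  # same cloud shifted by one unit in every coordinate

print(energy_distance(x, x), energy_distance(x, y))
```

The estimate is zero for identical samples and grows as the two clouds separate, which matches the "lower is better" reading of the table.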
Ablation Study¶
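- PFI, which does not account for growth, infers incorrect cross-branch streamlines on the bistable system; its path energy distance is roughly 10× or more that of UPFI at \(d = 2, 5, 10\) in the table above (e.g., 1.41 vs. 0.14 at \(d = 2\)).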
- OU process validation: the correctness of UPFI is verified in a linear-quadratic setting with known analytical solutions.
- Effect of regularization strength \(\lambda\) on drift-growth separation: too small leads to non-identifiability; too large causes underfitting.
Key Findings¶
- UPFI achieves the lowest path energy distance among all compared methods at every dimension where baselines are reported (\(d = 2, 5, 10\)); at \(d = 50\) only UPFI is reported.
- PFI without growth modeling systematically produces incorrect particle flow directions, erroneously connecting the two branches.
- Computational complexity: \(O(B^2)\) for the Sinkhorn divergence and \(O(Bd)\) for score matching, with \(B\) the batch size and \(d\) the state dimension, which keeps the method tractable in moderately high dimensions.
- The method also demonstrates strong performance on real scRNA-seq data (hematopoietic stem cell differentiation).
Highlights & Insights¶
- Theoretical contribution: Proposition 2.1 and Corollary 2.2 formally characterize the drift-growth identifiability problem for the first time.
- Streamlined architecture: The two-stage training (score matching + ODE fitting) is more stable than multi-stage approaches such as DeepRUOT.
- Physical interpretability: The Lagrangian formulation preserves physical interpretability; the inferred drift field and growth rate carry biological meaning.
- Theorem 2.3 establishes that regularized training yields a unique solution in the OU setting.
Limitations & Future Work¶
- Identifiability of drift and growth in the nonlinear setting remains unresolved; regularization introduces an inductive bias.
- Score matching may be inaccurate in high-dimensional, sparse data regimes.
- The diffusion coefficient \(\mathbf{D}_t\) is assumed known, which is generally not the case in practice.
- Computational scalability is constrained by ODE integration and the \(O(B^2)\) complexity of the Sinkhorn divergence.
- Practical data challenges such as batch effects are not addressed.
Related Work & Insights¶
- PFI (Chardès et al. 2023): The direct predecessor of this work, but without growth handling.
- Waddington-OT (Schiebinger et al. 2019): Handles growth via unbalanced optimal transport but does not infer drift.
- TIGON++ / fitness-ODE: Account for growth but assume deterministic dynamics or a global fitness function.
- This work bridges stochastic analysis, optimal transport, and biological dynamics inference.
Rating¶
⭐⭐⭐⭐ — Theoretically rigorous, methodologically concise and effective; the first work to systematically address stochastic dynamics inference with growth.