Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation
- Conference: ICLR 2026
- arXiv: 2512.09185
- Code: Unavailable
- Area: Medical Imaging / Disease Progression Modeling
- Keywords: disease progression, flow matching, patient-specific, longitudinal MRI, ArcRank loss
TL;DR
This paper proposes the Δ-LFM framework, which employs an ArcRank loss to construct patient-specific, temporally aligned trajectories in latent space (directionally consistent and monotonically increasing in magnitude). The framework extends the flow matching time range from \([0,1]\) to \([0,T]\) (actual time intervals) to enable prediction at arbitrary time points. Δ-LFM comprehensively outperforms eight baseline methods across three Alzheimer's longitudinal MRI benchmarks and introduces a progression-specific evaluation metric, Δ-RMAE.
Background & Motivation
Background: Disease progression modeling is critical for early diagnosis and personalized treatment. The evolution from GANs to diffusion models has enabled higher-fidelity longitudinal medical image generation, yet most approaches capture only population-level trends.
Limitations of Prior Work: (1) Most models neglect individual heterogeneity—progression rates vary substantially across patients with the same disease; (2) the stochastic denoising process of diffusion models disrupts temporal continuity; (3) autoencoder latent spaces are misaligned across patients and uncorrelated with clinical severity indices; (4) conventional image quality metrics (PSNR/SSIM) are inflated in longitudinal settings—scans from the same patient are naturally highly similar, causing subtle disease-related changes to be masked by normal anatomy.
Key Challenge: Longitudinal image generation must simultaneously satisfy high fidelity (image quality) and high accuracy (correct progression direction); existing methods prioritize the former at the expense of the latter.
Goal: To construct a patient-specific generative framework in which the latent space is semantically meaningful, predictions at arbitrary time points are supported, and the direction of progression is correctly captured.
Key Insight: Disease progression in latent space can be modeled as a velocity field—flow matching naturally learns velocity fields from source to target, making it conceptually aligned with disease dynamics.
Core Idea: ArcRank constraints enforce a "straight-line" latent trajectory per patient (constant direction, increasing magnitude), and Δ-LFM advances along this line using real-valued time steps.
Method
Overall Architecture
The framework consists of two stages. Stage 1: a VAE with ArcRank loss constructs the patient-specific latent space. Stage 2: a 3D U-Net learns the velocity field within the latent space via flow matching, with conditional signals injected through AdaLN.
Key Designs
- ArcRank Loss — Patient Trajectory Latent Alignment
- Function: Forces latent representations of the same patient at different time points to align along a specific direction, with magnitude monotonically increasing over time.
- Mechanism: Singular value decomposition \(U\Sigma V^\top = \text{SVD}(\mathbf{z})\) is applied to the latent \(\mathbf{z}\), where \(U\) encodes direction (angle) and \(\Sigma\) encodes magnitude (severity). For scans \(i, j\) of the same patient with \(t_i < t_j\), the ArcRank loss is defined as \[\mathcal{L}_{\text{ArcRank}} = \lambda_{\text{arc}} \sum_{i<j} |U_i - U_j| + \lambda_{\text{rank}} \sum_{i<j} \max(0, m - (\Sigma_j - \Sigma_i)),\] where \(m\) is the ranking margin. A pull term \(\mathcal{L}_{\text{Pull}} = |\Sigma_j - \Sigma_i|\) is added to prevent excessive separation between adjacent time points.
- Design Motivation: SVD jointly handles direction and magnitude in a unified manner, providing greater stability than separately applying cosine similarity and absolute value; stop-gradient is used to stabilize training.
- Δ-LFM — Temporally Semantic Flow Matching
- Function: Learns a patient-specific continuous-time velocity field in latent space, supporting prediction at arbitrary future time points.
- Mechanism: The standard flow matching time range \([0,1]\) is extended to \([0,T]\), where \(T = t_j - t_i\) is the actual interval in years between scans \(i\) and \(j\). The target velocity is \(v^*(i,j) = (\mathbf{z}_j - \mathbf{z}_i)/(t_j - t_i)\). At inference, the latent is advanced by Euler steps of size \(\text{d}t = 0.01\): \(\mathbf{z}_{t+\text{d}t} = \mathbf{z}_t + \text{d}t \cdot v_\theta(\mathbf{z}_t, t)\).
- Design Motivation: Normalizing to \([0,1]\) discards actual temporal semantics; the \([0,T]\) parameterization directly supports queries such as "predict MRI three years from now."
- Δ-RMAE Evaluation Metric
- Function: Assesses the accuracy of the progression direction in generated images rather than absolute image quality.
- Mechanism: A residual metric defined as \(\Delta\text{-RMAE} = \frac{|\Delta_{\text{gt}} - \Delta_{\text{gen}}|}{(\frac{1}{2}(|\Delta_{\text{gt}}| + |\Delta_{\text{gen}}|))} \in [0, 2]\), where \(\Delta = \mathbf{x}_T - \mathbf{x}_0\).
- Design Motivation: Conventional PSNR/SSIM are inflated in longitudinal settings (a model that simply copies the baseline scan scores well); Δ-RMAE focuses exclusively on disease-induced change.
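The Δ-RMAE definition above can be illustrated with a minimal NumPy sketch. It assumes that \(|\cdot|\) denotes the mean absolute value over voxels (the summary does not state the reduction), and the function name `delta_rmae` and the `eps` stabilizer are hypothetical, not taken from the paper.

```python
import numpy as np

def delta_rmae(x0, xT_gt, xT_gen, eps=1e-8):
    """Relative MAE between ground-truth and generated change maps.

    Assumption: |.| is the mean absolute value over all voxels.
    """
    d_gt = xT_gt - x0    # ground-truth change, Delta_gt
    d_gen = xT_gen - x0  # generated change, Delta_gen
    num = np.abs(d_gt - d_gen).mean()
    den = 0.5 * (np.abs(d_gt).mean() + np.abs(d_gen).mean())
    return float(num / (den + eps))  # eps avoids 0/0 when nothing changes

# Toy volumes: the follow-up scan differs from baseline by 1 everywhere.
x0 = np.zeros((4, 4, 4))
xT = np.ones((4, 4, 4))

score_copy = delta_rmae(x0, xT, x0.copy())     # model that copies the baseline
score_perfect = delta_rmae(x0, xT, xT.copy())  # model that matches the follow-up
print(score_copy, score_perfect)
```

Under this reading, a model that simply copies the baseline lands at the upper end of the \([0, 2]\) range, while a perfect prediction scores 0, which is the behavior the design motivation describes.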
Loss & Training
- Stage 1 (AE): Reconstruction loss + ArcRank, with \(\lambda_{\text{arc}}=0.005\), \(\lambda_{\text{rank}}=0.01\), and margin \(m\). AdamW optimizer, lr=\(10^{-3}\), batch size 2, 300 epochs.
- Stage 2 (FM): \(\mathcal{L}_{\text{LFM}} = \sum_{i<j} |v_\theta(i,j) - v^*(i,j)|^2\). 3D U-Net, AdamW, lr=\(3 \times 10^{-5}\), batch size 4, 200 epochs. Conditional signals (age, sex, clinical status) are injected via AdaLN.
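As a rough illustration of the Stage 2 objective and of inference over real-valued time, the sketch below computes the target velocity for one scan pair, the squared-error term of \(\mathcal{L}_{\text{LFM}}\), and a forward-Euler rollout. The callable `velocity_fn` is a hypothetical stand-in for the trained 3D U-Net \(v_\theta\); conditioning via AdaLN is omitted, and all function names are assumptions.

```python
import numpy as np

def target_velocity(z_i, z_j, t_i, t_j):
    """v*(i, j) = (z_j - z_i) / (t_j - t_i), with times in years."""
    return (z_j - z_i) / (t_j - t_i)

def lfm_pair_loss(v_pred, v_star):
    """Squared-error term of the flow matching loss for one (i, j) pair."""
    return float(np.sum((v_pred - v_star) ** 2))

def euler_rollout(z0, velocity_fn, horizon_years, dt=0.01):
    """Advance a latent by forward-Euler steps: z <- z + dt * v(z, t)."""
    n_steps = int(round(horizon_years / dt))
    z, t = z0.copy(), 0.0
    for _ in range(n_steps):
        z = z + dt * velocity_fn(z, t)
        t += dt
    return z

# Toy latents: two scans of one patient taken 2 years apart.
z_i, z_j = np.array([0.0, 0.0]), np.array([2.0, 4.0])
v_star = target_velocity(z_i, z_j, t_i=0.0, t_j=2.0)

# Hypothetical "trained" model that has learned this constant velocity.
velocity_fn = lambda z, t: v_star

loss = lfm_pair_loss(velocity_fn(z_i, 0.0), v_star)
z_3y = euler_rollout(z_i, velocity_fn, horizon_years=3.0)
print(v_star, loss, z_3y)
```

Because time is measured in years rather than normalized to \([0,1]\), the rollout horizon can exceed the 2-year training interval; here the latent is extrapolated to 3 years along the same direction.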
Key Experimental Results
Main Results — Image Quality (3 Longitudinal MRI Benchmarks, mean±std)
| Method | ADNI PSNR (dB)↑ | ADNI SSIM (%)↑ | AIBL PSNR (dB)↑ | OASIS PSNR (dB)↑ |
|---|---|---|---|---|
| CardiacAging | 27.78±1.49 | 92.04 | 28.41 | 26.23 |
| DiffuseMorph | 29.56±1.63 | 93.57 | 29.17 | 28.13 |
| SADM | 26.94±2.28 | 85.15 | 27.97 | 26.74 |
| BrLP | 28.51±1.77 | 91.52 | 28.96 | 27.98 |
| MambaControl | 29.72±1.04 | 93.60 | 29.86 | 28.24 |
| Δ-LFM | 30.59±0.89 | 94.62 | 30.52 | 29.01 |
Main Results — Progression Accuracy (Region MAE + Δ-RMAE)
| Method | ADNI Δ-RMAE↓ | AIBL Δ-RMAE↓ | OASIS Δ-RMAE↓ |
|---|---|---|---|
| DiffuseMorph | 0.516 | 0.482 | 0.503 |
| BrLP | 0.630 | 0.594 | 0.622 |
| MambaControl | 0.554 | 0.525 | 0.561 |
| Δ-LFM | 0.436 | 0.417 | 0.473 |
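Relative to MambaControl, Δ-LFM reduces Δ-RMAE by approximately 21%/21%/16% on ADNI/AIBL/OASIS. Relative to DiffuseMorph, which has the lowest Δ-RMAE among the baselines listed here, the reductions are approximately 16%/13%/6%.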
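The relative reductions can be re-derived directly from the table. The short check below uses only the tabulated Δ-RMAE values; the helper name `relative_reduction` is illustrative.

```python
# Delta-RMAE values copied from the table above (lower is better).
delta_rmae_table = {
    "DiffuseMorph": {"ADNI": 0.516, "AIBL": 0.482, "OASIS": 0.503},
    "MambaControl": {"ADNI": 0.554, "AIBL": 0.525, "OASIS": 0.561},
    "Delta-LFM":    {"ADNI": 0.436, "AIBL": 0.417, "OASIS": 0.473},
}

def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers the metric relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline

reductions = {
    name: {
        ds: relative_reduction(scores[ds], delta_rmae_table["Delta-LFM"][ds])
        for ds in scores
    }
    for name, scores in delta_rmae_table.items()
    if name != "Delta-LFM"
}

for name, per_dataset in reductions.items():
    summary = ", ".join(f"{ds}: {pct:.1f}%" for ds, pct in per_dataset.items())
    print(f"vs. {name}: {summary}")
```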
Ablation Study (Average over 3 Datasets)
| Configuration | PSNR↑ | Δ-RMAE↓ | Notes |
|---|---|---|---|
| LFM Baseline (unconditional, [0,1]) | 27.59 | 0.552 | Worst |
| + Conditional information | 28.46 | 0.486 | Conditioning signals matter |
| + [0,T] time sampling | 28.78 | 0.472 | Temporal semantics are effective |
| + Arc Loss only | 29.52 | 0.457 | Directional constraint is most important |
| + Rank Loss only | 28.36 | 0.474 | Ranking alone is weaker |
| + ArcRank + [0,T] (full) | 30.04 | 0.442 | Components are synergistic |
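The "Arc Loss only" and "Rank Loss only" rows ablate the two terms of the ArcRank objective. The sketch below shows one way those terms might be computed for a single patient. It rests on several assumptions that the summary does not confirm: each latent is reshaped to a 2-D matrix, only the leading singular vector and singular value are used as "direction" and "magnitude", \(|U_i - U_j|\) is read as a Euclidean norm, the sign of the singular vector is fixed by convention, and the margin value 0.1 is a placeholder. The pull term and stop-gradient are omitted.

```python
import numpy as np

def direction_and_magnitude(z):
    """Leading left singular vector and singular value of a 2-D latent."""
    U, S, _ = np.linalg.svd(z, full_matrices=False)
    u = U[:, 0]
    # Singular vectors are only defined up to sign; pick a fixed convention
    # (largest-magnitude entry positive) so vectors are comparable.
    if u[np.argmax(np.abs(u))] < 0:
        u = -u
    return u, float(S[0])

def arcrank_terms(latents, margin=0.1):
    """Arc and rank terms for ONE patient; `latents` is ordered by scan time."""
    feats = [direction_and_magnitude(z) for z in latents]
    arc, rank = 0.0, 0.0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            (u_i, s_i), (u_j, s_j) = feats[i], feats[j]
            arc += float(np.linalg.norm(u_i - u_j))   # directions should agree
            rank += max(0.0, margin - (s_j - s_i))    # magnitude should grow
    return arc, rank

def arcrank_loss(latents, lam_arc=0.005, lam_rank=0.01, margin=0.1):
    arc, rank = arcrank_terms(latents, margin)
    return lam_arc * arc + lam_rank * rank

# Toy patient: same direction at every visit, magnitude growing over time.
base = np.outer(np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0), np.array([0.6, 0.8]))
ordered = [1.0 * base, 2.0 * base, 3.0 * base]
arc_ok, rank_ok = arcrank_terms(ordered)
arc_rev, rank_rev = arcrank_terms(ordered[::-1])  # magnitudes shrink over time
print(arc_ok, rank_ok, arc_rev, rank_rev)
```

In this toy case both terms vanish for the correctly ordered trajectory, while reversing the visit order leaves the arc term near zero but activates the hinge in the rank term.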
Key Findings
- t-SNE visualization of the ArcRank latent space reveals: (1) scans from the same patient cluster together; (2) diagnostic status (CN/MCI/AD) naturally separates into distinct groups—despite no diagnostic labels being used during training.
- Long-term prediction performance degrades gracefully: PSNR of 31–32 dB at 1–5 years, ~28.6 dB at 10 years, and ~27 dB at 13 years.
- The SVD computation in ArcRank introduces ~40% training time overhead; using `full_matrices=False` reduces per-call time from 0.055 s to 0.009 s (about a 6× speedup).
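The effect of `full_matrices=False` can be seen in a small NumPy experiment. The summary does not say which library the authors use, so `np.linalg.svd` stands in here because it exposes a flag with the same name; the matrix shape is a hypothetical example, and measured timings will differ from the figures quoted above.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((1024, 16))  # hypothetical tall latent matrix

def mean_svd_time(full_matrices, repeats=5):
    """Average wall-clock seconds per SVD call."""
    start = time.perf_counter()
    for _ in range(repeats):
        np.linalg.svd(z, full_matrices=full_matrices)
    return (time.perf_counter() - start) / repeats

U_full, S_full, Vt_full = np.linalg.svd(z, full_matrices=True)
U_thin, S_thin, Vt_thin = np.linalg.svd(z, full_matrices=False)

# The full SVD builds a square 1024 x 1024 U; the thin SVD keeps 16 columns.
print("U shapes:", U_full.shape, U_thin.shape)
print("full: %.4f s, thin: %.4f s" % (mean_svd_time(True), mean_svd_time(False)))
```

The singular values are identical in both modes; the thin variant only skips the extra columns of \(U\) that the leading directions never use.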
Highlights & Insights
- "Disease as a velocity field" modeling perspective: Rather than generating future snapshots, the model learns continuous dynamics of the change process—the velocity field in flow matching is conceptually aligned with disease progression.
- Dual design of ArcRank: a single SVD yields both direction (patient identity) and magnitude (disease severity) as two distinct axes, giving an elegant and parsimonious formulation.
- Δ-RMAE fills an evaluation blind spot: Conventional metrics fail in longitudinal settings (a model that "copies the baseline" still scores high); Δ-RMAE forces models to genuinely capture change rather than remain static.
- Unsupervised emergence of diagnostic states: ArcRank constrains only temporal ordering and directional consistency, yet naturally learns the CN→MCI→AD severity gradient—demonstrating the power of well-chosen inductive biases.
Limitations & Future Work
- Validation is limited to Alzheimer's disease; rapidly progressing conditions or diseases involving treatment intervention (e.g., brain tumors) may require different modeling assumptions.
- The linear trajectory assumption (straight-line progression in latent space) may fail to capture nonlinear patterns such as sudden deterioration or stable plateaus.
- Irregular scan intervals are only partially addressed through conditioning signals; changes in progression rate are not explicitly modeled.
- Dataset heterogeneity (multi-scanner/protocol variation) is mitigated only through preprocessing, without dedicated harmonization techniques.
- AE capacity is constrained by GPU memory (48 GB A6000); larger crops or deeper architectures may yield further improvements.
Related Work & Insights
- vs. BrLP (Puglisi et al. 2024): BrLP achieves partial personalization via ControlNet conditioned on volumetric ratios, but the conditioning is coarse; Δ-LFM enables finer individual trajectory modeling through ArcRank in latent space.
- vs. TADM (Litrico et al. 2024): TADM predicts residual images but relies on diffusion-based denoising, which disrupts temporal continuity; Δ-LFM preserves continuity naturally through flow matching.
- vs. ImageFlowNet (Liu et al. 2025): ImageFlowNet also employs flow fields but operates in image space; Δ-LFM is more efficient in latent space and additionally supports ArcRank trajectory alignment.
Rating
- Novelty: ⭐⭐⭐⭐⭐ — Flow matching for disease progression, ArcRank latent alignment, and the Δ-RMAE evaluation metric constitute three distinct contributions.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Three benchmarks (ADNI/AIBL/OASIS), eight comparison methods, detailed ablations, and long-term prediction analysis.
- Writing Quality: ⭐⭐⭐⭐ — Motivation is clearly articulated, derivations are concise, and visualizations are convincing.
- Value: ⭐⭐⭐⭐⭐ — Significant contribution to medical image generation and disease progression modeling; Δ-RMAE has potential to become a standard metric in the field.