
Accelerating Stroke MRI with Diffusion Probabilistic Models through Large-Scale Pre-training and Target-Specific Fine-Tuning

Conference: CVPR 2026
arXiv: 2603.13007
Code: None
Area: Medical Imaging / MRI Reconstruction
Keywords: accelerated MRI, diffusion probabilistic models, foundation model, stroke MRI, fine-tuning

TL;DR

Inspired by the foundation model paradigm, this work proposes a data-efficient training strategy for diffusion probabilistic models (DPMs) in accelerated MRI reconstruction. A DPM is first pre-trained on large-scale multi-contrast brain MRI data (~4,000 subjects), then fine-tuned with as few as 20 target-domain subjects. The resulting model achieves reconstruction quality comparable to large-dataset training in clinical stroke MRI, with a clinical blind reader study confirming non-inferiority to standard-of-care at 2× acceleration.

Background & Motivation

Background: Machine learning has achieved state-of-the-art performance in accelerated MRI reconstruction, but these methods typically require large amounts of fully-sampled, application-specific training data. Performance degrades substantially when target-domain data is scarce.

Limitations of Prior Work: (1) Stroke MRI is a prototypical data-scarce scenario: although MRI is more sensitive than CT for detecting ischemic stroke, long scan times and motion sensitivity delay treatment, and large fully-sampled stroke datasets are rarely available; (2) existing self-supervised and few-data methods fall short of fully supervised baselines in reconstruction quality; (3) end-to-end reconstruction methods require the external training dataset and the target dataset to share the same acquisition model (sampling pattern, coil geometry, etc.), limiting cross-domain transferability.

Key Challenge: High-quality accelerated MRI reconstruction requires large amounts of target-domain fully-sampled data for training, yet clinically specific scenarios such as stroke imaging are precisely where such data are scarce.

Goal: To train a high-quality DPM-based accelerated MRI reconstruction model when target-domain fully-sampled data is extremely limited (20–25 subjects).

Key Insight: The acquisition-model-agnostic nature of DPMs is exploited—since DPMs learn an image prior distribution rather than an end-to-end mapping, pre-training data may originate from different sampling patterns and coil geometries than those used at inference.

Core Idea: Large-scale multi-contrast pre-training + carefully controlled fine-tuning (learning rate reduced by one order of magnitude + very few epochs) = high-quality DPM reconstruction in data-limited clinical stroke MRI.

Method

Overall Architecture

A two-stage training strategy is employed: (1) a DPM is pre-trained on multi-contrast brain MRI (T1, T2, T1-post) from ~4,000 subjects in the fastMRI dataset; (2) the model is fine-tuned on a small target-domain dataset (20–25 stroke patients) using a reduced learning rate (\(1 \times 10^{-5}\)) and a limited number of training epochs (~650, approximately 2% of pre-training). At inference, Diffusion Posterior Sampling (DPS) is used to reconstruct images from undersampled k-space data.
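As a concrete reference, here is a minimal sketch of the two stage configurations, assuming the numbers above; `StageConfig` is an illustrative name rather than the authors' code, and the pre-training epoch count is only back-calculated from "~650 ≈ 2% of pre-training".

```python
from dataclasses import dataclass

@dataclass
class StageConfig:
    learning_rate: float
    epochs: int

# Stage 1: multi-contrast pre-training (T1, T2, T1-post; ~4,000 fastMRI
# subjects). The epoch count is back-calculated from "650 ~= 2%".
pretrain = StageConfig(learning_rate=1e-4, epochs=32_500)

# Stage 2: target-domain fine-tuning (20-25 stroke subjects) with the
# learning rate cut by one order of magnitude and very few epochs.
finetune = StageConfig(learning_rate=1e-5, epochs=650)
```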

Key Designs

  1. Contrast-Conditioned DPM Architecture:

    • Function: Enables a single DPM to learn the image distributions of multiple MRI contrasts simultaneously.
    • Mechanism: Each contrast type is assigned a one-hot vector, which is mapped to an embedding vector via a small fully connected network; every block of the U-Net receives this embedding as an additional input (see the sketch after this list). The model learns the score function \(\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\), where \(p_t\) is the image distribution perturbed by Gaussian noise of level \(\sigma_t\). The DPM and the embedding network are trained jointly.
    • Design Motivation: Heterogeneous multi-contrast data requires the model to distinguish contrast-specific image characteristics; conditioning allows the model to adapt its behavior according to the contrast type.
  2. Controlled Fine-Tuning Strategy:

    • Function: Effectively adapts the model to the target domain with very limited data without overfitting.
    • Mechanism: The learning rate is reduced by one order of magnitude relative to pre-training (\(10^{-4} \rightarrow 10^{-5}\)), and training is limited to ~650 epochs (~2% of pre-training). A new one-hot vector is assigned to the target contrast, and the embedding network weights are updated. Experiments show that insufficient fine-tuning leaves contrast adaptation incomplete, while excessive fine-tuning causes overfitting; an optimal fine-tuning window exists.
    • Design Motivation: The central challenge in foundation model fine-tuning is balancing adaptation and forgetting; a low learning rate combined with short training time prevents catastrophic forgetting while achieving contrast adaptation.
  3. Diffusion Posterior Sampling (DPS) Reconstruction:

    • Function: Reconstructs images from undersampled k-space measurements using the learned image prior.
    • Mechanism: The following sampling ODE is solved: \(d\mathbf{x} = -t\left[\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) - \zeta \nabla_{\mathbf{x}} \|\mathbf{PFS}\tilde{\mathbf{x}}(\mathbf{x}) - \mathbf{y}\|_2^2\right]dt\), where \(\mathbf{P}\) is the undersampling mask, \(\mathbf{F}\) the Fourier transform, \(\mathbf{S}\) the coil sensitivity maps, \(\tilde{\mathbf{x}}(\mathbf{x})\) the denoised estimate produced by \(D_\theta(\mathbf{x}, t)\), and \(\mathbf{y}\) the measured k-space data. The first term is the learned prior score; the second enforces data consistency, driving the reconstruction toward agreement with the measurements. The data consistency weight \(\zeta\) is a critical hyperparameter: higher acceleration factors require larger \(\zeta\) (see the sketch after this list).
    • Design Motivation: The acquisition-model-agnostic property of DPMs is a key advantage—unlike end-to-end methods, DPMs learn the image distribution alone and can be combined with any acquisition model at inference via posterior sampling.
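To make design 1 concrete, here is a minimal sketch of the contrast-conditioning pathway in PyTorch; `ContrastEmbedding`, the layer widths, and the SiLU activation are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class ContrastEmbedding(nn.Module):
    """Maps a one-hot contrast code (T1 / T2 / T1-post / target) to an
    embedding that every U-Net block receives as an extra input."""
    def __init__(self, num_contrasts: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_contrasts, embed_dim),
            nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, onehot: torch.Tensor) -> torch.Tensor:
        return self.net(onehot)

# Three pre-training contrasts plus one slot reserved for the target
# contrast (e.g. FLAIR), which is assigned during fine-tuning.
emb = ContrastEmbedding(num_contrasts=4, embed_dim=128)
flair_code = torch.eye(4)[3].unsqueeze(0)   # fresh one-hot slot
cond = emb(flair_code)                      # shape (1, 128), fed to each U-Net block
```

During fine-tuning, the target contrast receives a fresh one-hot slot and the embedding network continues to receive gradient updates jointly with the denoiser, as described above.

For design 3, here is a sketch of one reverse-time Euler step of the DPS ODE, assuming the EDM denoiser parameterization in which the prior score is \((D_\theta(\mathbf{x}, t) - \mathbf{x})/t^2\); `denoiser` and `forward_op` (the composed \(\mathbf{PFS}\) operator) are hypothetical callables, not the paper's actual API.

```python
import torch

def dps_euler_step(x, t, t_next, denoiser, forward_op, y, zeta):
    """One reverse-time Euler step of the posterior-sampling ODE above
    (a sketch; t_next < t when integrating from noise toward the image)."""
    x = x.detach().requires_grad_(True)
    x0_hat = denoiser(x, t)                 # denoised estimate x~(x)
    # Data-consistency gradient of ||P F S x~(x) - y||_2^2 w.r.t. x.
    residual = forward_op(x0_hat) - y
    dc_grad, = torch.autograd.grad(residual.abs().pow(2).sum(), x)
    # Prior score under the EDM parameterization: (D(x, t) - x) / t^2.
    score = (x0_hat.detach() - x.detach()) / t**2
    # dx/dt = -t * (score - zeta * dc_grad), integrated from t to t_next.
    dxdt = -t * (score - zeta * dc_grad)
    return x.detach() + (t_next - t) * dxdt
```

Because higher acceleration factors leave fewer measured k-space samples, \(\zeta\) must grow with the acceleration factor to keep the data-consistency term competitive with the prior.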

Loss & Training

  • Both pre-training and fine-tuning use the EDM training loss (denoising score matching); a minimal sketch follows this list.
  • Fixed data augmentation strategies and noise level schedules are applied throughout.
  • The posterior sampling weight \(\zeta\) scales with the acceleration factor (fewer k-space measurements → larger data consistency weight).
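A minimal sketch of the shared loss, assuming the conditioned denoiser signature `denoiser(x_noisy, sigma, cond)`; the log-normal \(\sigma\) sampling (\(P_\text{mean} = -1.2\), \(P_\text{std} = 1.2\)) and \(\sigma_\text{data} = 0.5\) are EDM's published defaults, not values confirmed for this paper.

```python
import torch

def edm_loss(denoiser, x_clean, cond, sigma_data=0.5):
    """EDM-style denoising score-matching loss (Karras et al., 2022)."""
    # One noise level per example: ln(sigma) ~ N(P_mean, P_std^2).
    rnd = torch.randn(x_clean.shape[0], device=x_clean.device)
    sigma = (rnd * 1.2 - 1.2).exp().view(-1, 1, 1, 1)
    # EDM weighting balances the loss contribution across noise levels.
    weight = (sigma**2 + sigma_data**2) / (sigma * sigma_data)**2
    noise = torch.randn_like(x_clean) * sigma
    x_denoised = denoiser(x_clean + noise, sigma, cond)  # contrast-conditioned D_theta
    return (weight * (x_denoised - x_clean).pow(2)).mean()
```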

Key Experimental Results

Main Results

NRMSE for fastMRI FLAIR reconstruction (model pre-trained on T1/T2/T1-post; fine-tuning on 20 FLAIR subjects compared against training on 344 FLAIR subjects):

| Method | Data Size | R=4 | R=5 | R=6 | Notes |
|---|---|---|---|---|---|
| Method 1 (Full dataset) | 4,125 | Best | Best | Best | Upper bound |
| Method 3 (344 FLAIR) | 344 | Near best | Near best | Near best | Upper bound |
| Method 4 (Ours) | 20 FLAIR | Comparable | Comparable | Comparable | Only 5.8% of data |
| Method 5 (20 FLAIR only) | 20 | Significantly worse | Significantly worse | Significantly worse | No pre-training |
| Method 6 (Joint training) | 4,000 + 20 | Worse | Worse | Worse | Inferior to sequential fine-tuning |

Ablation Study

Effect of fine-tuning hyperparameters on NRMSE:

| Learning Rate | 650 epochs | 1,250 epochs | 2,500 epochs | Notes |
|---|---|---|---|---|
| \(5 \times 10^{-4}\) | Moderate | Overfitting | Severe overfitting | LR too high |
| \(1 \times 10^{-4}\) | Good | Slight overfitting | Overfitting | |
| \(5 \times 10^{-5}\) | Good | Good | Slight overfitting | |
| \(1 \times 10^{-5}\) | Best | Good | Slightly worse | Optimal configuration |

Key Findings

  • Clinical reader study (80 patients, 2 neuroradiologists): images reconstructed by the DPM from 2× accelerated data show no significant difference from standard-of-care in structural delineation and overall image quality.
  • Reader 1 (80 patients): DPM reconstruction is significantly superior to standard-of-care on multiple metrics (SNR, sharpness, artifacts).
  • Reader 2 (21 patients): standard-of-care is significantly superior to DPM only in SNR; no significant difference on remaining metrics.
  • Sequential fine-tuning is substantially superior to joint training—when target-domain data are scarce, sequential fine-tuning is the more effective strategy.

Highlights & Insights

  • The strategy is remarkably simple yet highly effective—"large-scale pre-training + reduced-learning-rate fine-tuning" is systematically validated in MRI reconstruction for the first time.
  • The clinical reader study provides strong evidence—not only is NRMSE improved, but clinical acceptability is genuinely verified.
  • The acquisition-model-agnostic nature of DPMs is a decisive advantage—pre-training and target data may employ different sampling patterns and coil geometries.
  • Prospective undersampling experiments further confirm the reliability of retrospective results.

Limitations & Future Work

  • Clinical evaluation relies on retrospective undersampling; prospective validation is limited to healthy volunteers.
  • DPS reconstruction is substantially slower than conventional parallel imaging and end-to-end methods, constraining clinical deployment.
  • Fine-tuning hyperparameters (learning rate, number of epochs) require separate tuning for each contrast type.
  • Inter-reader agreement between the two radiologists is only "slight" to "fair," reflecting the inherent variability of subjective assessments.

Related Work & Context

  • fastMRI: Provides the large-scale pre-training dataset.
  • DPS (Chung et al.): Establishes the posterior sampling framework.
  • Foundation model paradigm (Bommasani et al.): Motivates the pre-training and fine-tuning strategy.
  • Insight: This strategy is directly transferable to other data-scarce medical image reconstruction tasks (e.g., cardiac MRI, musculoskeletal MRI).

Rating

  • Novelty: ⭐⭐⭐ The method itself (pre-training + fine-tuning) is not novel, but its systematic validation in MRI reconstruction is valuable.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Encompasses fastMRI controlled experiments + clinical stroke data + an 80-patient blind reader study + prospective experiments—exceptionally comprehensive.
  • Writing Quality: ⭐⭐⭐⭐ Clinically oriented writing style with rigorous experimental design.
  • Value: ⭐⭐⭐⭐ Directly advances the clinical deployment of DPMs for accelerated MRI; the reader study confirms clinical feasibility.