Accelerating Stroke MRI with Diffusion Probabilistic Models through Large-Scale Pre-training and Target-Specific Fine-Tuning¶
Conference: CVPR 2026 · arXiv: 2603.13007 · Code: None · Area: Medical Imaging / MRI Reconstruction · Keywords: accelerated MRI, diffusion probabilistic models, foundation model, stroke MRI, fine-tuning
TL;DR¶
Inspired by the foundation model paradigm, this work proposes a data-efficient training strategy for diffusion probabilistic models (DPMs) in accelerated MRI reconstruction. A DPM is first pre-trained on large-scale multi-contrast brain MRI data (~4,000 subjects), then fine-tuned with as few as 20 target-domain subjects. The resulting model achieves reconstruction quality comparable to large-dataset training in clinical stroke MRI, with a clinical blind reader study confirming non-inferiority to standard-of-care at 2× acceleration.
Background & Motivation¶
Background: Machine learning has achieved state-of-the-art performance in accelerated MRI reconstruction, but these methods typically require large amounts of fully-sampled, application-specific training data. Performance degrades substantially when target-domain data is scarce.
Limitations of Prior Work: (1) Stroke MRI is a prototypical data-scarce scenario: although MRI is more sensitive than CT for detecting ischemic stroke, long scan times and motion sensitivity delay treatment, and fully-sampled stroke training data are correspondingly hard to collect; (2) existing self-supervised and few-data methods fall short of fully supervised baselines in reconstruction quality; (3) end-to-end reconstruction methods require the external training dataset and the target dataset to share the same acquisition model (sampling pattern, coil geometry, etc.), limiting cross-domain transferability.
Key Challenge: High-quality accelerated MRI reconstruction requires large amounts of target-domain fully-sampled data for training, yet clinically specific scenarios such as stroke imaging are precisely where such data are scarce.
Goal: To train a high-quality DPM-based accelerated MRI reconstruction model when target-domain fully-sampled data is extremely limited (20–25 subjects).
Key Insight: The acquisition-model-agnostic nature of DPMs is exploited—since DPMs learn an image prior distribution rather than an end-to-end mapping, pre-training data may originate from different sampling patterns and coil geometries than those used at inference.
Core Idea: Large-scale multi-contrast pre-training + carefully controlled fine-tuning (learning rate reduced by one order of magnitude + very few epochs) = high-quality DPM reconstruction in data-limited clinical stroke MRI.
Method¶
Overall Architecture¶
A two-stage training strategy is employed: (1) a DPM is pre-trained on multi-contrast brain MRI (T1, T2, T1-post) from ~4,000 subjects in the fastMRI dataset; (2) the model is fine-tuned on a small target-domain dataset (20–25 stroke patients) using a reduced learning rate (\(1 \times 10^{-5}\)) and a limited number of training epochs (~650, approximately 2% of pre-training). At inference, Diffusion Posterior Sampling (DPS) is used to reconstruct images from undersampled k-space data.
Key Designs¶
- Contrast-Conditioned DPM Architecture:
- Function: Enables a single DPM to learn the image distributions of multiple MRI contrasts simultaneously.
- Mechanism: Each contrast type is assigned a one-hot vector, which is mapped to an embedding vector via a small fully connected network; every block of the U-Net receives this embedding as an additional input. The model learns the score function \(\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\), where \(p_t\) is the image distribution perturbed by Gaussian noise of standard deviation \(\sigma_t\). The DPM and the embedding network are trained jointly.
- Design Motivation: Heterogeneous multi-contrast data requires the model to distinguish contrast-specific image characteristics; conditioning allows the model to adapt its behavior according to the contrast type.
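The conditioning mechanism above can be sketched in a few lines of numpy. Layer sizes, the ReLU MLP, and the per-channel-bias injection in `conditioned_block` are illustrative assumptions: the paper only states that a one-hot code is mapped through a small fully connected network and fed to every U-Net block.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CONTRASTS, EMB_DIM, CH = 4, 32, 16  # hypothetical sizes

# Small fully connected network mapping a one-hot contrast code to an embedding.
W1 = rng.standard_normal((N_CONTRASTS, 64)) * 0.1
W2 = rng.standard_normal((64, EMB_DIM)) * 0.1

def contrast_embedding(contrast_id: int) -> np.ndarray:
    onehot = np.zeros(N_CONTRASTS)
    onehot[contrast_id] = 1.0
    return np.maximum(onehot @ W1, 0.0) @ W2  # two-layer ReLU MLP

# Each U-Net block receives the embedding as an extra input; here it is
# injected as a per-channel bias, a common conditioning pattern (the
# paper's exact injection mechanism may differ).
W_proj = rng.standard_normal((EMB_DIM, CH)) * 0.1

def conditioned_block(x: np.ndarray, emb: np.ndarray) -> np.ndarray:
    bias = emb @ W_proj              # (CH,)
    return x + bias[:, None, None]   # broadcast over H, W

x = rng.standard_normal((CH, 8, 8))
out = conditioned_block(x, contrast_embedding(2))
print(out.shape)  # (16, 8, 8)
```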
- Controlled Fine-Tuning Strategy:
- Function: Effectively adapts the model to the target domain with very limited data without overfitting.
- Mechanism: The learning rate is reduced by one order of magnitude relative to pre-training (\(10^{-4}\) → \(10^{-5}\)), and training is limited to ~650 epochs (~2% of pre-training). A new one-hot vector is assigned to the target contrast, and the embedding network weights are updated. Experiments show that insufficient fine-tuning leads to inadequate contrast adaptation, while excessive fine-tuning leads to overfitting—an optimal fine-tuning window exists.
- Design Motivation: The central challenge in foundation model fine-tuning is balancing adaptation and forgetting; a low learning rate combined with short training time prevents catastrophic forgetting while achieving contrast adaptation.
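A minimal sketch of this fine-tuning setup: the learning-rate reduction (\(10^{-4}\) → \(10^{-5}\)) follows the text, while `add_target_contrast`, the initialisation scale, and the shapes are hypothetical (the paper only states that a new one-hot vector is assigned and the embedding network is updated).

```python
import numpy as np

def add_target_contrast(W1: np.ndarray) -> np.ndarray:
    """Append a freshly initialised row for the new target-contrast one-hot.

    W1 maps one-hot contrast codes to the first hidden layer of the
    embedding network; adding a row extends the contrast vocabulary by one.
    (Illustrative only; the paper does not specify the initialisation.)
    """
    rng = np.random.default_rng(0)
    new_row = rng.standard_normal((1, W1.shape[1])) * 0.01
    return np.vstack([W1, new_row])

W1 = np.zeros((3, 64))           # pre-trained: 3 contrasts (T1, T2, T1-post)
W1_ft = add_target_contrast(W1)  # fine-tuning: 4th slot for the target contrast
print(W1_ft.shape)               # (4, 64)

PRETRAIN_LR = 1e-4
FINETUNE_LR = PRETRAIN_LR / 10   # one order of magnitude lower
FINETUNE_EPOCHS = 650            # ~2% of pre-training, per the text
```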
- Diffusion Posterior Sampling (DPS) Reconstruction:
- Function: Reconstructs images from undersampled k-space measurements using the learned image prior.
- Mechanism: The sampling ODE \(d\mathbf{x} = -t\left[\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) - \zeta\,\nabla_{\mathbf{x}} \|\mathbf{PFS}\tilde{\mathbf{x}}(\mathbf{x}) - \mathbf{y}\|_2^2\right]dt\) is solved, where the first term is the learned prior score (evaluated via the denoiser \(D_\theta(\mathbf{x}, t)\)) and the second enforces data consistency with the measurements \(\mathbf{y}\) under sampling mask \(\mathbf{P}\), Fourier transform \(\mathbf{F}\), and coil sensitivities \(\mathbf{S}\). The data-consistency weight \(\zeta\) is a critical hyperparameter: higher acceleration factors require larger \(\zeta\).
- Design Motivation: The acquisition-model-agnostic property of DPMs is a key advantage—unlike end-to-end methods, DPMs learn the image distribution alone and can be combined with any acquisition model at inference via posterior sampling.
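The posterior-sampling loop can be sketched as a plain Euler integration of the ODE above. Everything here is a toy: the mask, the step schedule, the \(\zeta\) value, and especially the stand-in denoiser (the exact denoiser for a unit Gaussian prior), which merely lets the sampler run end to end; a real DPM would replace it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
x_true = rng.standard_normal((n, n))

# Toy acquisition model y = P F x: undersampling mask P applied to the
# 2D Fourier transform F (coil sensitivities S omitted for brevity).
mask = rng.random((n, n)) < 0.5
def forward(x):
    return mask * np.fft.fft2(x)
y = forward(x_true)

def denoise(x, t):
    """Stand-in for the learned denoiser D_theta(x, t) (Gaussian-prior toy)."""
    return x / (1.0 + t**2)

# Euler discretisation of
#   dx = -t [ score(x, t) - zeta * grad ||P F x - y||^2 ] dt,
# with score(x, t) = (D(x, t) - x) / t^2 (EDM convention).
zeta = 5.0                        # data-consistency weight (illustrative)
ts = np.linspace(3.0, 1e-3, 200)  # decreasing noise levels
x = ts[0] * rng.standard_normal((n, n))
for t_cur, t_next in zip(ts[:-1], ts[1:]):
    score = (denoise(x, t_cur) - x) / t_cur**2
    # Data-consistency gradient via the (scaled) adjoint of the toy operator.
    dc_grad = np.real(np.fft.ifft2(mask * (forward(x) - y)))
    x = x + (t_next - t_cur) * (-t_cur) * (score - zeta * dc_grad)

rel_resid = float(np.linalg.norm(forward(x) - y) / np.linalg.norm(y))
print(round(rel_resid, 3))
```

The loop makes the acquisition-agnostic point concrete: only `forward` and its adjoint encode the acquisition model, so swapping the mask or coil model requires no retraining.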
Loss & Training¶
- Both pre-training and fine-tuning use the EDM training loss (score matching).
- The same data-augmentation strategy and noise-level schedule are kept fixed across pre-training and fine-tuning.
- The posterior sampling weight \(\zeta\) scales with the acceleration factor (fewer k-space measurements → larger data consistency weight).
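The EDM training loss can be illustrated with a one-sample numpy sketch. The \(\sigma_\text{data}\) value and the log-normal noise-level parameters (\(P_\text{mean}=-1.2\), \(P_\text{std}=1.2\)) are EDM's published defaults, not values confirmed by this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA_DATA = 0.5  # EDM's assumed data standard deviation

def edm_loss(denoise, x0: np.ndarray) -> float:
    """One-sample EDM training loss (weighted denoising score matching).

    A noise level sigma is drawn from EDM's log-normal schedule, the clean
    image is perturbed, and the denoiser's error is weighted by lambda(sigma).
    """
    sigma = np.exp(rng.normal(-1.2, 1.2))  # P_mean, P_std (EDM defaults)
    weight = (sigma**2 + SIGMA_DATA**2) / (sigma * SIGMA_DATA) ** 2
    noisy = x0 + sigma * rng.standard_normal(x0.shape)
    return float(weight * np.mean((denoise(noisy, sigma) - x0) ** 2))

# Sanity check: a perfect denoiser returning the clean image gives zero loss.
x0 = rng.standard_normal((8, 8))
perfect = lambda x, s: x0
print(edm_loss(perfect, x0))  # 0.0
```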
Key Experimental Results¶
Main Results¶
fastMRI FLAIR reconstruction NRMSE (pre-trained on T1/T2/T1-post; fine-tuned on 20 FLAIR subjects vs. trained on 344 FLAIR subjects):
| Method | Data Size | R=4 | R=5 | R=6 | Notes |
|---|---|---|---|---|---|
| Method 1 (Full dataset) | 4,125 | Best | Best | Best | Upper bound |
| Method 3 (344 FLAIR) | 344 | Near best | Near best | Near best | Upper bound |
| Method 4 (Ours) | 20 FLAIR | Comparable | Comparable | Comparable | Only 5.8% of data |
| Method 5 (20 FLAIR only) | 20 | Significantly worse | Significantly worse | Significantly worse | No pre-training |
| Method 6 (Joint training) | 4,000+20 | Worse | Worse | Worse | Inferior to sequential fine-tuning |
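For reference, the NRMSE metric reported in the table above is commonly computed as follows; this assumes normalisation by the reference norm, though the paper may normalise differently (e.g. by intensity range).

```python
import numpy as np

def nrmse(recon: np.ndarray, ref: np.ndarray) -> float:
    """Normalised root-mean-square error between a reconstruction and a
    fully-sampled reference (lower is better)."""
    return float(np.linalg.norm(recon - ref) / np.linalg.norm(ref))

ref = np.ones((4, 4))
print(nrmse(ref * 1.1, ref))  # ~0.1: a uniform 10% intensity error
```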
Ablation Study¶
Effect of fine-tuning hyperparameters on NRMSE:
| Learning Rate | 650 epochs | 1,250 epochs | 2,500 epochs | Notes |
|---|---|---|---|---|
| \(5 \times 10^{-4}\) | Moderate | Overfitting | Severe overfitting | LR too high |
| \(1 \times 10^{-4}\) | Good | Slight overfitting | Overfitting | — |
| \(5 \times 10^{-5}\) | Good | Good | Slight overfitting | — |
| \(1 \times 10^{-5}\) | Best | Good | Slightly worse | Optimal configuration |
Key Findings¶
- Clinical reader study (80 patients, 2 neuroradiologists): images reconstructed by the DPM from 2× accelerated data show no significant difference from standard-of-care in structural delineation and overall image quality.
- Reader 1 (80 patients): DPM reconstruction is significantly superior to standard-of-care on multiple metrics (SNR, sharpness, artifacts).
- Reader 2 (21 patients): standard-of-care is significantly superior to DPM only in SNR; no significant difference on remaining metrics.
- Sequential fine-tuning is substantially superior to joint training—when target-domain data are scarce, sequential fine-tuning is the more effective strategy.
Highlights & Insights¶
- The strategy is remarkably simple yet highly effective—"large-scale pre-training + reduced-learning-rate fine-tuning" is systematically validated in MRI reconstruction for the first time.
- The clinical reader study provides strong evidence—not only is NRMSE improved, but clinical acceptability is genuinely verified.
- The acquisition-model-agnostic nature of DPMs is a decisive advantage—pre-training and target data may employ different sampling patterns and coil geometries.
- Prospective undersampling experiments further confirm the reliability of retrospective results.
Limitations & Future Work¶
- Clinical evaluation relies on retrospective undersampling; prospective validation is limited to healthy volunteers.
- DPS reconstruction is substantially slower than conventional parallel imaging and end-to-end methods, constraining clinical deployment.
- Fine-tuning hyperparameters (learning rate, number of epochs) require separate tuning for each contrast type.
- Inter-reader agreement between the two radiologists is only "slight" to "fair," reflecting the inherent variability of subjective assessments.
Related Work & Insights¶
- fastMRI: Provides the large-scale pre-training dataset.
- DPS (Chung et al.): Establishes the posterior sampling framework.
- Foundation model paradigm (Bommasani et al.): Motivates the pre-training and fine-tuning strategy.
- Insight: This strategy is directly transferable to other data-scarce medical image reconstruction tasks (e.g., cardiac MRI, musculoskeletal MRI).
Rating¶
- Novelty: ⭐⭐⭐ The method itself (pre-training + fine-tuning) is not novel, but its systematic validation in MRI reconstruction is valuable.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Encompasses fastMRI controlled experiments + clinical stroke data + an 80-patient blind reader study + prospective experiments—exceptionally comprehensive.
- Writing Quality: ⭐⭐⭐⭐ Clinically oriented writing style with rigorous experimental design.
- Value: ⭐⭐⭐⭐ Directly advances the clinical deployment of DPMs for accelerated MRI; the reader study confirms clinical feasibility.