# Demo: Generative AI helps Radiotherapy Planning with User Preference
Conference: NeurIPS 2025 (GenAI for Health Workshop)
arXiv: 2512.08996
Code: Demo Video
Area: Medical Imaging
Keywords: radiotherapy planning, dose prediction, user preference interaction, VQ-VAE, generative AI
## TL;DR
This paper proposes the Flexible Dose Proposer (FDP), a two-stage training framework (VQ-VAE pretraining + multi-condition encoding) that enables slider-based interactive 3D dose distribution prediction incorporating user preferences. The system is integrated into the Eclipse clinical treatment planning system and outperforms Varian RapidPlan in head-and-neck cancer radiotherapy scenarios.
## Background & Motivation
Background: Radiotherapy planning is a complex clinical workflow with significant variability across institutions and planners. Deep learning methods have made progress in dose prediction, fluence map generation, and MLC leaf sequencing.
Limitations of Prior Work:
- Knowledge-driven systems such as RapidPlan rely on DVH prediction and cannot capture spatial dose details; PCA regression-based pipelines are trained on only tens of plans, limiting generalizability.
- Existing deep learning dose prediction models neglect user preference interaction: different planners have varying requirements for the trade-off between OAR sparing and PTV coverage.
- Dose prediction by itself does not constitute a deliverable plan, and integration with clinical TPS systems remains insufficiently studied.
Key Challenge: A single model cannot accommodate diverse planning styles, and training is prone to bias toward the specific style of the reference plans.
Key Insight: Drawing on the conditional control paradigm of generative AI, the system exposes "preference flavor" sliders that let users customize the OAR–PTV trade-off in real time.
Core Idea: A VQ-VAE pretrained dose decoder provides a stable foundation, and user preference encodings serve as conditional inputs, enabling interactive and personalized dose prediction.
## Method
### Overall Architecture
Inputs consist of CT images, RT structures (PTV/OAR contours), beam/angle plates, and user preference slider values. The pipeline proceeds in two stages: Stage I pretrains a VQ-VAE dose decoder, and Stage II trains a flexible dose prediction model with multi-condition inputs. The output is a 3D dose distribution, which is subsequently converted into a deliverable treatment plan in Eclipse via objective function extraction.
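The preference-conditioning mechanism at the core of this dataflow can be sketched in NumPy. This is a minimal illustration assuming AdaIN-style modulation (as described under Key Designs); the projection weights, channel count, and condition layout are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

def adain(z, cond, W_gamma, W_beta):
    """AdaIN-style conditioning: normalize the feature map z per channel,
    then re-scale/shift it with parameters predicted from the condition
    vector (preference sliders + beam encoding)."""
    gamma = cond @ W_gamma  # per-channel scale, shape (C,)
    beta = cond @ W_beta    # per-channel shift, shape (C,)
    mu = z.mean(axis=(1, 2, 3), keepdims=True)
    sigma = z.std(axis=(1, 2, 3), keepdims=True) + 1e-5
    z_norm = (z - mu) / sigma
    return gamma[:, None, None, None] * z_norm + beta[:, None, None, None]

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4, 8, 8))    # latent features: C=8 channels, D=4, H=8, W=8
cond = np.array([0.7, 0.3, 1.0])     # e.g. [HI slider, OAR weight, beam feature]
W_gamma = rng.normal(size=(3, 8))    # hypothetical learned projections
W_beta = rng.normal(size=(3, 8))
out = adain(z, cond, W_gamma, W_beta)
print(out.shape)
```

Because the sliders only modulate per-channel statistics, changing a slider value re-styles the predicted dose without re-encoding the CT and structure inputs, which is what makes real-time interaction cheap.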
### Key Designs
- Stage I: VQ-VAE Base Dose Decoder Pretraining
  - Function: Pretrains a VQ-VAE on 31K dose samples to learn latent representations of realistic dose distributions.
  - Mechanism: Unlike latent diffusion models that use a VAE for compression and acceleration, this work uses pretraining to stabilize Stage II training. The loss function is: \(\mathcal{L}_{\text{stage1}} = \underbrace{\mathbb{E}_i[\|x_i - \hat{x}_i\|]}_{\text{Reconstruction}} + \beta L_{vq} + L_{adv}(x, \hat{x}) + \underbrace{\lambda \cdot \log(\mathbb{E}_{i<j}[\exp(-t\|\hat{z}_i - \hat{z}_j\|^2)])}_{\text{Uniformity}}\)
  - Design Motivation: A uniformity loss regularizes the latent space to prevent mode collapse; the adversarial loss ensures the realism of generated doses. Without pretraining, Stage II training becomes unstable under complex conditioning and produces artifacts at PTV/OAR boundaries.
- Stage II: Multi-Condition Flexible Dose Prediction
  - Function: Takes CT and RT structures as image inputs and user preferences along with beam plates as conditional inputs to predict personalized 3D doses.
  - Mechanism: A MedNeXt-based image encoder processes multi-channel inputs; user preferences and beam/angle information are injected via AdaIN (Adaptive Instance Normalization). The loss function is: \(\mathcal{L}_{\text{stage2}}^{(i)} = \|x_i - \hat{x}_i\| + \|z_i - \hat{z}_i\| + L_{adv}(x_i, \hat{x}_i) + \mathcal{L}_{\text{obj}}^{(i)}\)
  - The objective consistency loss \(\mathcal{L}_{\text{obj}}\) ensures alignment between predictions and user preferences: \(\mathcal{L}_{\text{obj}}^{(i)} = \|\tilde{h} - \hat{h}\| + \|p - \hat{p}\| + \|\tilde{w} \cdot u_{\text{oar}} - \hat{u}_{\text{oar}}\|\) where \(\tilde{h}\) denotes the user-specified HI value and \(\tilde{w}\) is the OAR sparing preference weight.
- Random Preference Sampling during Training
  - Function: Slider values \(\{\tilde{h}, \tilde{w}\}\) are randomly sampled within predefined ranges during training.
  - Design Motivation: This enables the model to learn responses to diverse preference combinations rather than overfitting to a single planning style.
- Clinical Integration
  - The predicted 3D dose is converted into optimization objectives for the Eclipse treatment planning system via objective function extraction, yielding a deliverable plan.
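As a rough sketch of the objective-extraction step, the snippet below reads DVH-style statistics off a predicted dose and packages them as upper/lower optimization objectives. The statistic choices, percentile levels, and objective types are illustrative assumptions; the paper's actual Eclipse extraction procedure is not detailed in this summary:

```python
import numpy as np

def extract_objectives(dose, ptv_mask, oar_masks):
    """Illustrative objective extraction: derive per-structure dose
    statistics from the predicted 3D dose and package them as
    (structure, objective type, dose value) optimization objectives."""
    objectives = []
    ptv = dose[ptv_mask]
    # PTV coverage: a lower objective near the cold end of the PTV DVH,
    # and an upper objective to cap hotspots.
    objectives.append(("PTV", "lower", float(np.percentile(ptv, 5))))
    objectives.append(("PTV", "upper", float(np.percentile(ptv, 99))))
    for name, mask in oar_masks.items():
        oar = dose[mask]
        # OAR sparing: mean-dose and near-max upper objectives.
        objectives.append((name, "mean_upper", float(oar.mean())))
        objectives.append((name, "max_upper", float(np.percentile(oar, 98))))
    return objectives

rng = np.random.default_rng(1)
dose = rng.uniform(0.0, 72.0, size=(16, 32, 32))      # synthetic dose volume
ptv = np.zeros(dose.shape, dtype=bool); ptv[4:8, 8:16, 8:16] = True
oars = {"SpinalCord": np.zeros(dose.shape, dtype=bool)}
oars["SpinalCord"][10:12, 4:8, 4:8] = True
objs = extract_objectives(dose, ptv, oars)
for obj in objs:
    print(obj)
```

The key point is that the generative model's voxel-level output, not a hand-tuned template, supplies the objective values that Eclipse then optimizes into a deliverable plan.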
### Loss & Training
- Stage I: reconstruction + VQ quantization + adversarial + uniformity losses, trained on 31K dose samples.
- Stage II: image-space reconstruction + latent-space reconstruction + adversarial + objective consistency losses, trained on 6 cohorts (~820 training samples in total).
- Single-step generation via GAN; inference takes approximately 30 ms, avoiding the iterative sampling of diffusion models.
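The two distinctive loss terms can be sketched as follows. This is a minimal NumPy illustration assuming the L1 forms given above; the shapes of \(p\) and \(u_{\text{oar}}\) are illustrative:

```python
import numpy as np

def uniformity_loss(z_hat, t=2.0):
    """Stage I uniformity regularizer: log of the mean pairwise Gaussian
    kernel over latent codes. Equals 0 when all codes collapse to a single
    point and goes negative as codes spread over the latent space."""
    i, j = np.triu_indices(len(z_hat), k=1)
    d2 = ((z_hat[i] - z_hat[j]) ** 2).sum(axis=-1)
    return float(np.log(np.mean(np.exp(-t * d2))))

def objective_consistency_loss(h_tilde, h_hat, p, p_hat,
                               w_tilde, u_oar, u_oar_hat):
    """Stage II objective consistency (L1 form): match the user-set HI,
    the PTV statistic p, and the preference-weighted OAR term."""
    return (abs(h_tilde - h_hat)
            + np.abs(p - p_hat).sum()
            + np.abs(w_tilde * u_oar - u_oar_hat).sum())

rng = np.random.default_rng(0)
u_collapsed = uniformity_loss(np.zeros((6, 4)))      # identical codes -> 0
u_spread = uniformity_loss(rng.normal(size=(6, 4)))  # spread codes -> negative
obj = objective_consistency_loss(0.06, 0.05,
                                 np.array([70.0]), np.array([69.0]),
                                 0.8, np.array([30.0]), np.array([24.0]))
print(u_collapsed, u_spread, obj)
```

Minimizing the uniformity term therefore directly penalizes latent mode collapse, while the consistency term is what ties the slider values to measurable properties of the predicted dose.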
## Key Experimental Results
### Main Results: Intra-Patient DVH Variability (VMAT Plans)
| OAR | RapidPlan std | FDP std | RapidPlan mean | FDP mean |
|---|---|---|---|---|
| SpinalCord05 | 3.02 | 1.18 | 3.25 | 1.69 |
| Larynx-PTV | 3.49 | 1.78 | 6.29 | 2.94 |
| ParotidCon-PTV | 3.38 | 1.22 | 4.51 | 1.92 |
| PosteriorNeck | 4.30 | 0.92 | 8.15 | 1.19 |
| Trachea | 3.40 | 1.68 | 3.42 | 2.01 |
| Better count | 0 | 15 | 1 | 14 |
Across all 15 evaluated OARs (five representative ones shown above), FDP outperforms RapidPlan on the std metric for 15/15 OARs and on the mean metric for 14/15.
### Ablation Study
| Configuration | MAE (↓) | Notes |
|---|---|---|
| w/o Stage I pretrain | 2.63 | boundary artifacts observed |
| w/ Stage I pretrain | 2.56 | more realistic dose distributions |
### Key Findings
- Stage I pretraining not only reduces MAE but, more importantly, eliminates artifacts at PTV/OAR boundaries, indicating that pretraining provides prior constraints on the dose distribution.
- The user preference sliders effectively control the trade-off between OAR sparing and PTV homogeneity: P1 (prioritizing OAR protection) and P2 (prioritizing PTV) produce distinct dose distributions in actual Eclipse plans.
- Model inference takes only 30 ms; with visualization the total latency is approximately 1.5 s, approaching real-time interaction.
## Highlights & Insights
- First dose prediction model with interactive sliders: The paper introduces the conditional control paradigm of generative AI into radiotherapy planning, enabling real-time adjustment of user preferences — a concept transferable to other medical imaging tasks requiring personalized outputs.
- Elegant two-stage training strategy: Large-scale (31K) but lower-quality data are used to pretrain the decoder, followed by conditional fine-tuning on small-scale high-quality data, addressing training instability under medical data scarcity.
- End-to-end clinical validation: The work goes beyond dose prediction by validating deliverable plan quality through Eclipse integration, which is relatively rare in academic research.
## Limitations & Future Work
- Evaluation is limited to head-and-neck cancer; generalizability to other treatment sites (lung, abdomen, etc.) has not been verified.
- RapidPlan is trained on a small dataset with conventional methods, so comparing it against a deep learning model mixes different training paradigms and data scales; the fairness of this comparison warrants further discussion.
- User preferences currently span only two dimensions — HI and OAR weight — with no control over CI or finer-grained DVH objectives.
- GAN-based single-step generation may limit output diversity; diffusion models could potentially offer finer-grained conditional control.
## Related Work & Insights
- vs. RapidPlan: RapidPlan relies on PCA and small-data conventional ML, predicting only DVH; FDP directly predicts 3D dose with voxel-level spatial awareness.
- vs. DoseDiff: Diffusion-model-based dose prediction achieves high accuracy but slow inference due to iterative sampling; FDP employs single-step GAN generation for real-time interaction.
- vs. OpenKBP series: Prior work focuses solely on prediction accuracy without considering user preference interaction or clinical TPS integration.
## Rating
- Novelty: ⭐⭐⭐⭐ First interactive preference-driven dose prediction system, though individual technical components (VQ-VAE, AdaIN) are not novel contributions.
- Experimental Thoroughness: ⭐⭐⭐ Limited dataset scale, single treatment site, and insufficient comparison with other deep learning methods.
- Writing Quality: ⭐⭐⭐⭐ Well-structured, clinical motivation clearly articulated, and demo presentation is intuitive.
- Value: ⭐⭐⭐⭐ Strong clinical applicability; addresses genuine pain points in planner workflows.