Multimodal Classification of Radiation-Induced Contrast Enhancements and Tumor Recurrence Using Deep Learning¶
Conference: CVPR2025
arXiv: 2603.11827
Code: To be confirmed
Area: Medical Imaging
Keywords: glioblastoma, radiation-induced contrast enhancement, tumor recurrence, multimodal classification, 3D ResNet
TL;DR¶
This work proposes RICE-NET, a multimodal 3D deep learning model that integrates longitudinal MRI data with radiotherapy radiation dose (RD) maps to distinguish between post-operative radiation-induced contrast enhancement (RICE) and tumor recurrence in glioblastoma, achieving an F1-score of \(0.92\) on an independent test set.
Background & Motivation¶
- Glioblastoma (GBM) patients require radiotherapy after surgical resection to eliminate residual tumor cells, but radiotherapy can also damage normal brain tissue.
- New contrast-enhancing lesions appearing in post-operative follow-up images present a major diagnostic challenge: distinguishing between tumor recurrence and radiation-induced contrast enhancement (RICE), both of which appear highly similar on MRI.
- Current clinical workflows rely on complex and time-consuming evaluations by multidisciplinary tumor boards, requiring reviews of pre- and post-operative scans, multiple follow-up images, and radiotherapy plans.
- Existing methods either rely heavily on diffusion MRI, which is scarce in clinical practice, or ignore radiotherapy dose maps, even though dose maps are receiving increasing attention in tumor boards.
- Most prior studies also overlook how lesions evolve across longitudinal scans.
Method¶
Data and Preprocessing¶
- Data source: 92 GBM patients from Heidelberg University Hospital, with a training/validation set of 80 patients (48 tumor recurrence + 32 RICE) and an independent test set of 12 patients (7 recurrence + 5 RICE).
- Three 3D input volumes per patient:
    - Post-operative MRI (MRI post-OP): T1-weighted contrast-enhanced MRI at post-op baseline, used for radiotherapy planning.
    - Event MRI (MRI event): T1-weighted contrast-enhanced MRI when the new contrast-enhancing lesion is detected.
    - Radiotherapy Dose Map (RD map): 3D spatial distribution of the cumulative radiation dose.
- Preprocessing workflow: Isotropic resampling \(\rightarrow\) ANTs registration \(\rightarrow\) HD-BET skull stripping \(\rightarrow\) Z-score normalization \(\rightarrow\) cropping to \(224 \times 224 \times 224\) voxels.
- The ground truth was confirmed by biopsy results.
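The last two preprocessing steps can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: it assumes normalization statistics are taken only from voxels inside the brain mask produced by skull stripping, that the crop is centered, and that the volume is at least as large as the crop. The function names `zscore_normalize` and `center_crop_bounds` are hypothetical.

```python
from statistics import fmean, pstdev


def zscore_normalize(voxels, mask):
    """Z-score a flat list of intensities using brain-mask voxels only (assumption)."""
    inside = [v for v, m in zip(voxels, mask) if m]
    mean, std = fmean(inside), pstdev(inside)
    if std == 0:
        raise ValueError("constant intensities inside the mask")
    # Voxels outside the mask stay at 0 so the skull-stripped background remains empty.
    return [(v - mean) / std if m else 0.0 for v, m in zip(voxels, mask)]


def center_crop_bounds(shape, target=(224, 224, 224)):
    """Return one (start, stop) index pair per axis for a centered crop (assumption)."""
    bounds = []
    for size, want in zip(shape, target):
        if size < want:
            raise ValueError(f"axis of size {size} is smaller than crop size {want}")
        start = (size - want) // 2
        bounds.append((start, start + want))
    return bounds
```

For a resampled volume of shape `(240, 256, 224)`, `center_crop_bounds` returns the index ranges that would be sliced out of each axis to obtain the \(224 \times 224 \times 224\) input.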
Network Architecture¶
- A 3D ResNet-18 implemented with the MONAI framework, extending the original 2D ResNet to three dimensions to process volumetric data.
- Architecture: Initial 3D convolutional layer \(\rightarrow\) 4 residual stages of two basic blocks each (3D BatchNorm + ReLU) \(\rightarrow\) global average pooling \(\rightarrow\) fully connected classification layer.
- Residual connections ensure more stable gradient flow and convergence, which is particularly crucial for small medical datasets.
- Multimodal fusion strategy: Channel-wise concatenation, where the selected \(224 \times 224 \times 224\) volumes are stacked along the channel (first) dimension, giving an input of \(C \times 224 \times 224 \times 224\) with \(C\) between 1 and 3.
- A separate, independent model was trained for each modality combination (instead of sharing weights and masking unused inputs).
- ResNet-18 was selected instead of deeper networks to balance expressiveness and computational efficiency, mitigating overfitting risks in the small-sample scenario of 92 patients.
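To make the fusion setup concrete, the sketch below enumerates the modality combinations and the channel-first input shape each separately trained model would receive. The modality identifiers and helper names are hypothetical and chosen only for illustration; the network itself is not shown.

```python
from itertools import combinations

# Hypothetical identifiers for the three registered input volumes.
MODALITIES = ("mri_post_op", "mri_event", "rd_map")


def modality_subsets(modalities=MODALITIES):
    """All non-empty subsets of the modalities, one per ablation experiment."""
    subsets = []
    for k in range(1, len(modalities) + 1):
        subsets.extend(combinations(modalities, k))
    return subsets


def input_shape(subset, spatial=(224, 224, 224)):
    """Channel-first shape after channel-wise concatenation: (C, D, H, W)."""
    return (len(subset), *spatial)


for subset in modality_subsets():
    # Each subset corresponds to its own model with len(subset) input channels.
    print(subset, input_shape(subset))
```

Three modalities give \(2^3 - 1 = 7\) non-empty subsets, which matches the seven rows of the ablation table.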
Loss & Training¶
- Training lasted for 800 epochs with 5-fold cross-validation (folds were fixed at the patient level and remained consistent across all experiments).
- Adam optimizer + cross-entropy loss function.
- A weighted random sampler was employed to ensure balanced training over the two classes (addressing the imbalance of 48 recurrence vs. 32 RICE cases).
- The evaluation metric is the macro F1-score, the unweighted mean of the per-class F1-scores, so that both classes count equally despite the class imbalance.
- Data augmentation: elastic deformation, rotation, scaling, Gaussian noise, brightness, and gamma adjustments.
- Evaluation of the test set utilized a majority voting ensemble strategy of the 5-fold cross-validated models.
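Two of the ingredients above, class-balanced sampling weights and the macro F1-score, are simple enough to sketch in plain Python. This is an illustrative sketch under stated assumptions rather than the authors' code: it assumes inverse-class-frequency weights, and the function names are hypothetical. In a PyTorch pipeline, weights like these would typically be handed to a weighted random sampler.

```python
from collections import Counter


def sample_weights(labels):
    """Per-sample weights inversely proportional to class frequency (assumption)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def f1_for_class(y_true, y_pred, cls):
    """F1-score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# 48 recurrence (label 1) and 32 RICE (label 0) cases, as in the training set.
labels = [1] * 48 + [0] * 32
weights = sample_weights(labels)
```

With these weights, the total sampling mass of each class is equal, so a sampler drawing with replacement sees recurrence and RICE cases equally often in expectation.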
Interpretability Analysis¶
- Occlusion sensitivity maps were employed: systematically occluding small 3D cubic regions and observing changes in the output probability.
- Occlusion was performed synchronously across all registered volumes to identify the regions that are most influential for the classification.
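The occlusion procedure can be sketched as follows. This is a toy illustration under several assumptions: volumes are nested lists indexed `[channel][z][y][x]`, the cube moves with a stride equal to its size, occluded voxels are set to a constant `fill` value, and `predict` is a hypothetical callable returning the probability of the class of interest. The same cube is blanked in every channel, mirroring the synchronous occlusion across registered volumes.

```python
import copy


def occlusion_sensitivity(volume, predict, cube=2, fill=0.0):
    """Map each cube origin (z, y, x) to the drop in predicted probability."""
    depth, height, width = len(volume[0]), len(volume[0][0]), len(volume[0][0][0])
    baseline = predict(volume)
    heatmap = {}
    for z0 in range(0, depth, cube):
        for y0 in range(0, height, cube):
            for x0 in range(0, width, cube):
                occluded = copy.deepcopy(volume)
                for channel in occluded:  # same cube in every registered volume
                    for z in range(z0, min(z0 + cube, depth)):
                        for y in range(y0, min(y0 + cube, height)):
                            for x in range(x0, min(x0 + cube, width)):
                                channel[z][y][x] = fill
                # A large positive value means the region mattered for the prediction.
                heatmap[(z0, y0, x0)] = baseline - predict(occluded)
    return heatmap


# Toy example: two 4x4x4 channels and a stand-in "model" that only looks at one voxel.
toy_volume = [[[[1.0] * 4 for _ in range(4)] for _ in range(4)] for _ in range(2)]


def toy_predict(v):
    return 0.5 * v[0][0][0][0] + 0.4 * v[1][0][0][0]


heatmap = occlusion_sensitivity(toy_volume, toy_predict)
```

In the toy example only the cube containing voxel `(0, 0, 0)` changes the output, so it is the only region with a non-zero sensitivity. A real analysis would run the trained network in place of `toy_predict`.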
Key Experimental Results¶
Ablation Study (F1-score)¶
| Input Modality | Validation F1 | Test F1 |
|---|---|---|
| MRI post-OP Only | \(0.70\) | — |
| MRI event Only | \(0.58\) | — |
| RD map Only | \(0.78\) | — |
| MRI post-OP + MRI event | — | — |
| MRI post-OP + RD | \(0.828\) | — |
| MRI event + RD | \(0.83\) | — |
| All Three (RICE-NET) | \(0.804\) | \(0.916\) |
Key Findings¶
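- The radiotherapy dose map is the most informative single-modality input (validation F1 = \(0.78\), compared with \(0.70\) for MRI post-OP and \(0.58\) for MRI event).
- Combining either MRI volume with the RD map improves validation F1 over the RD map alone (\(0.828\) for MRI post-OP + RD and \(0.83\) for MRI event + RD, versus \(0.78\)), which supports the complementarity of the modalities.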
- The ensembled cross-validation model achieved an F1-score of \(0.916\) on the independent test set (via majority voting).
- In experiments using only MRI, the gap between validation and test F1-scores was approximately \(0.35\), reflecting the statistical uncertainty of small datasets.
- Occlusion analysis demonstrates that the model's focus areas are highly correlated with high-dose regions, while also attending to contrast-enhancing lesions.
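The majority-voting ensemble used for the test set can be sketched in a few lines. This is a minimal sketch assuming each fold model outputs one hard label per patient; the function name and the example votes are hypothetical and are not results from the paper.

```python
from collections import Counter


def majority_vote(fold_predictions):
    """Combine per-fold label lists into one ensemble label per patient."""
    n_patients = len(fold_predictions[0])
    if any(len(preds) != n_patients for preds in fold_predictions):
        raise ValueError("every fold must predict the same number of patients")
    ensemble = []
    for votes in zip(*fold_predictions):
        # With five folds and two classes there is always a strict majority.
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


# Made-up votes from five fold models for three patients (1 = recurrence, 0 = RICE).
example_votes = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
ensemble_labels = majority_vote(example_votes)
```

Because the number of folds is odd and the task is binary, no tie-breaking rule is needed. With an even number of models, a tie-breaking policy would have to be chosen explicitly.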
Highlights & Insights¶
- First End-to-End Classification Fusing Radiotherapy Dose Maps: Radiotherapy planning is utilized as an explicit input, validating its importance as the strongest single-modality signal.
- Longitudinal MRI Modeling: Post-operative baseline and event-timepoint images are used together to capture how the lesion evolves over time.
- Systematic Ablation: Full ablation across 7 modal combinations quantifies the diagnostic contribution of each modality.
- Clinical Interpretability: Occlusion sensitivity maps align with clinical areas of interest, facilitating assisted decision-making.
- Use of Routine T1 MRI: No reliance on scarce diffusion MRI, enhancing clinical applicability.
Limitations & Future Work¶
- Extremely Small Sample Size: Only 92 patients (80 for training, 12 for testing), lacking sufficient statistical reliability and showing prominent gaps between validation and test performances.
- Lack of Unaffected Control Group: The dataset only contains recurrence and RICE categories, missing non-lesion controls.
- Simple Channel Fusion: Channel-wise concatenation might not capture complex interaction patterns between the MRI and dose maps.
- Single-center Data: All data originate from Heidelberg University Hospital; thus cross-center generalizability remains unknown.
- Exclusion of Clinical Variables: Patient age, treatment regimens, and other clinical metadata are not integrated.
- Full Combination Underperforms Partial Combinations: The all-three model reaches a lower validation F1 (\(0.804\)) than MRI post-OP + RD (\(0.828\)) and MRI event + RD (\(0.83\)), which may indicate overfitting or conflicting signals between modalities.
Related Work & Insights¶
| Method | Characteristics | Comparison with Ours |
|---|---|---|
| Bernhardt et al. | DEGRO guidelines, clinical workflow based on diffusion MRI | RICE-NET uses routine T1 MRI, which is more easily accessible |
| Wang et al. | DTI + DSC-MRI to differentiate pseudoprogression | Relies on diffusion and perfusion imaging, which are less widely available clinically |
| Eichkorn et al. | Analyzes the association between RICE and ischemic stroke risk factors | Not a deep learning method, whereas RICE-NET provides automated classification |
| Standard clinical workflow | Multidisciplinary evaluation by tumor boards | RICE-NET can assist in accelerating decision-making, though clinical validation is still required |
Rating¶
- Novelty: ⭐⭐⭐⭐ — First to utilize radiotherapy dose maps as a deep learning input for RICE vs. recurrence classification.
- Experimental Thoroughness: ⭐⭐⭐ — Comprehensive ablation design but the sample size is too small, lacking statistical significance.
- Writing Quality: ⭐⭐⭐⭐ — Clear problem motivation and adequate description of clinical background.
- Value: ⭐⭐⭐ — Crucial clinical problem but requires large-scale multi-center validation to confirm practical value.