Domain-Adaptive Transformer for Data-Efficient Glioma Segmentation in Sub-Saharan MRI

Conference: NeurIPS 2025
arXiv: 2511.02928
Code: None
Area: Medical Imaging
Keywords: Glioma segmentation, domain adaptation, Transformer, resource-constrained, BraTS-Africa

TL;DR

This paper proposes SegFormer3D+, a domain-adaptive Transformer architecture tailored for heterogeneous MRI data from Sub-Saharan Africa. By integrating histogram matching, radiomics-guided stratified sampling, a frequency-aware dual-path encoder, and a dual attention mechanism, the model achieves a mean Dice of 0.81 for glioma segmentation with only 60 annotated cases for fine-tuning, outperforming nnU-Net (0.79) by 0.02 mean Dice, a relative gain of about 2.5%.

Background & Motivation

Background: Glioma is the most common malignant primary brain tumor in adults, and MRI is the gold standard for diagnosis and treatment planning. Deep learning segmentation methods such as nnU-Net and Swin-UNETR have demonstrated strong performance on high-quality datasets.

Limitations of Prior Work: Most models are trained on data from well-resourced institutions and suffer severe performance degradation when applied to MRI data from Sub-Saharan Africa (SSA). SSA scans typically exhibit lower resolution, increased motion artifacts, and inconsistent contrast due to aging scanners and heterogeneous acquisition protocols, resulting in substantial domain shift.

Key Challenge: The BraTS-Africa challenge introduced the first annotated glioma MRI dataset from SSA medical centers, yet it contains only 60 training cases. Existing methods individually explore histogram normalization, radiomics features, dual-path encoders, or attention mechanisms, but no prior work has systematically unified these techniques into a single domain-adaptive framework.

Goal: To design a robust segmentation architecture under conditions of severely limited annotated data and pronounced domain shift.

Key Insight: Approaching the problem from a systems engineering perspective, the paper combines multiple well-validated domain adaptation techniques into a unified framework—intensity normalization to address scanner variability, radiomics-based stratification to ensure balanced training, a frequency-aware encoder to capture artifact patterns, and dual attention to enhance fine-grained representations.

Core Idea: To integrate histogram matching, radiomics-guided stratification, a frequency-aware dual-path encoder, and spatial-channel dual attention into a unified domain-adaptive segmentation framework for robust glioma segmentation on low-resource MRI.

Method

Overall Architecture

The SegFormer3D+ pipeline takes multi-parametric MRI (T1, T1CE, T2, FLAIR) as input. It proceeds through histogram matching for intensity normalization → radiomics feature extraction for stratified sampling → a frequency-aware dual-path stem for low- and high-frequency feature extraction → a four-stage hierarchical Transformer encoder → spatial and channel dual attention fusion → a decoder producing segmentation maps for three tumor subregions (WT/TC/ET). Pre-training is performed on BraTS 2023 (\(n=1251\)), followed by fine-tuning on BraTS-Africa (\(n=60\)).

Key Designs

  1. Histogram Matching Intensity Normalization:

    • Function: Eliminates voxel intensity distribution discrepancies across different scanners.
    • Mechanism: A high-quality BraTS 2023 T1CE scan is selected as reference. The cumulative distribution functions \(F_s\) and \(F_r\) are computed for source image \(I_s\) and reference image \(I_r\), respectively. A monotonic mapping \(M(x) = F_r^{-1}(F_s(x))\) is applied to perform voxel-wise transformation: \(\hat{I}_s = M(I_s)\).
    • Design Motivation: Scanners from different SSA centers produce markedly different intensity distributions, constituting one of the primary sources of domain shift.
  2. Radiomics-Guided Stratified Sampling:

    • Function: Ensures the training data spans the domain distribution across varying acquisition quality levels.
    • Mechanism: Eighteen first-order radiomics features (mean, variance, skewness, kurtosis, energy, entropy, etc.) are extracted from normalized T2-FLAIR volumes, reduced to 10 dimensions via PCA, and clustered into \(k=3\) groups using k-means. Stratified 5-fold cross-validation is then applied to BraTS-Africa.
    • Design Motivation: Prevents the model from overfitting to dominant acquisition patterns and ensures that each fold contains scans of diverse quality.
  3. Frequency-Aware Dual-Path Stem:

    • Function: Simultaneously captures low-frequency structural information and high-frequency detail/artifact features at the encoder input.
    • Mechanism: Two parallel 3D depthwise separable convolutions, with separate weights, approximate low-pass and high-pass filtering: \(x_{\text{low}} = \text{DepthwiseConv3D}_{\text{low}}(x)\), \(x_{\text{high}} = \text{DepthwiseConv3D}_{\text{high}}(x) - x_{\text{low}}\), and \(x_{\text{stem}} = \text{Concat}([x_{\text{low}}, x_{\text{high}}])\). The low-pass path uses uniform initialization (\(1/27\) per kernel weight, so a \(3 \times 3 \times 3\) kernel starts as a mean filter), while the high-pass path uses Kaiming initialization.
    • Design Motivation: MRI from low-resource environments frequently contains frequency-domain artifacts and noise patterns that a single convolutional stem cannot simultaneously capture; this design also avoids the computational overhead of explicit wavelet transforms.
  4. Spatial-Channel Dual Attention Fusion:

    • Function: Enhances representations of tumor-relevant spatial regions and discriminative feature channels.
    • Mechanism: Spatial attention \(A_s = \sigma(\text{Conv3D}([\text{MaxPool}(F), \text{AvgPool}(F)]))\); channel attention \(A_c = \sigma(W_2 \cdot \text{ReLU}(W_1 \cdot \text{GAP}(F)))\); final features \(F' = F \odot A_s \odot A_c\).
    • Design Motivation: The cascaded spatial and channel attention modules respectively highlight tumor spatial locations and discriminative feature channels, which is particularly important for refining ET subregion boundaries in low-contrast scans.
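
The histogram-matching mapping \(M(x) = F_r^{-1}(F_s(x))\) from the first design can be illustrated with empirical CDFs. The sketch below is a minimal, standard-library illustration on flattened voxel lists, not the authors' implementation; the function name `histogram_match` and the nearest-rank inversion of \(F_r\) are assumptions made for the example.

```python
from bisect import bisect_right


def histogram_match(source, reference):
    """Map each source intensity x to F_r^{-1}(F_s(x)) using empirical CDFs.

    `source` and `reference` are flat lists of voxel intensities
    (hypothetical inputs; a real pipeline would flatten 3D volumes first).
    """
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)

    matched = []
    for x in source:
        # F_s(x) = rank / n_src, where rank counts source voxels <= x.
        rank = bisect_right(src_sorted, x)
        # Nearest-rank quantile: idx = ceil(F_s(x) * n_ref) - 1,
        # computed in integer arithmetic to avoid float rounding.
        idx = (rank * n_ref + n_src - 1) // n_src - 1
        matched.append(ref_sorted[idx])
    return matched


# Example: a dim scan is mapped onto the reference intensity range.
dim_scan = [10, 20, 30, 40]
reference_scan = [100, 200, 300, 400]
print(histogram_match(dim_scan, reference_scan))
```

Because both CDFs are non-decreasing, the mapping is monotonic: the intensity ordering of the source voxels is preserved, and only their distribution changes.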

Loss & Training

  • Composite Dice–cross-entropy loss: \(\mathcal{L} = (1 - \frac{2|P \cap G|}{|P| + |G|}) + CE(P, G)\)
  • Optimizer: AdamW (lr=\(1\text{e}{-4}\), weight decay=\(1\text{e}{-5}\), cosine schedule)
  • Data augmentation: random flipping and affine transforms (±10° rotation, 0.9–1.1 scaling); intensities are z-score normalized
  • Pre-training on BraTS 2023 for 75 epochs → fine-tuning on BraTS-Africa for 25 epochs (early stopping, patience=20)
  • Post-processing: connected component analysis retaining the largest connected component per class
  • Random 3D crop of \(96^3\), batch size 2
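
The composite loss above can be sketched for a single tumor subregion as follows. This is an illustrative version under stated assumptions: it uses a soft (probabilistic) Dice term, a binary cross-entropy term, and a small smoothing constant `eps`, none of which are confirmed details of the paper's implementation; `dice_ce_loss` is a hypothetical name.

```python
import math


def dice_ce_loss(probs, targets, eps=1e-7):
    """Soft Dice loss plus binary cross-entropy for one subregion.

    probs:   predicted foreground probabilities, flat list in [0, 1]
    targets: ground-truth labels, flat list of 0/1
    eps:     smoothing constant (an assumption, avoids log(0) and 0/0)
    """
    # Soft version of 2|P ∩ G| / (|P| + |G|).
    intersection = sum(p * g for p, g in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

    # Mean binary cross-entropy over voxels.
    ce = -sum(
        g * math.log(max(p, eps)) + (1 - g) * math.log(max(1 - p, eps))
        for p, g in zip(probs, targets)
    ) / len(probs)

    return (1.0 - dice) + ce


labels = [1, 0, 1, 0]
print(dice_ce_loss([1.0, 0.0, 1.0, 0.0], labels))  # perfect prediction
print(dice_ce_loss([0.5, 0.5, 0.5, 0.5], labels))  # uninformative prediction
```

The Dice term rewards region overlap and is less sensitive to class imbalance, while the cross-entropy term supplies a per-voxel gradient; summing the two is a common pairing for small tumor subregions.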

Key Experimental Results

Main Results (BraTS-Africa Validation Set, \(n=35\))

| Method | WT Dice | TC Dice | ET Dice | Mean Dice | HD95 |
|---|---|---|---|---|---|
| 3D U-Net | 0.86±0.03 | 0.71±0.05 | 0.68±0.06 | 0.75 | |
| SegFormer3D | 0.88±0.03 | 0.73±0.04 | 0.70±0.05 | 0.77 | |
| nnU-Net | 0.90±0.02 | 0.76±0.04 | 0.72±0.05 | 0.79 | 13.7+ |
| Swin-UNETR | 0.89±0.02 | 0.77±0.04 | 0.73±0.05 | 0.80 | |
| SegFormer3D+ | 0.91±0.02 | 0.79±0.03 | 0.74±0.04 | 0.81 | 12.5 |
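
The Mean Dice values in the main results appear to be the unweighted average of the WT, TC, and ET scores. The short check below recomputes them from the reported per-region means and shows both the absolute and the relative gain of SegFormer3D+ over each baseline. The averaging rule is an assumption inferred from the rounded numbers, not something the source states.

```python
# Per-region Dice means (WT, TC, ET) copied from the main results.
scores = {
    "3D U-Net": (0.86, 0.71, 0.68),
    "SegFormer3D": (0.88, 0.73, 0.70),
    "nnU-Net": (0.90, 0.76, 0.72),
    "Swin-UNETR": (0.89, 0.77, 0.73),
    "SegFormer3D+": (0.91, 0.79, 0.74),
}

# Assumed rule: Mean Dice = (WT + TC + ET) / 3.
means = {name: sum(vals) / 3 for name, vals in scores.items()}
ours = means["SegFormer3D+"]

for name, mean in means.items():
    abs_gain = ours - mean            # absolute Dice difference
    rel_gain = 100 * abs_gain / mean  # relative gain in percent
    print(f"{name:13s} mean={mean:.2f} abs={abs_gain:+.3f} rel={rel_gain:+.1f}%")
```

Against nnU-Net this gives an absolute gain of 0.02 Dice, which corresponds to roughly 2.5% in relative terms.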

Ablation Study

| Configuration | WT | TC | ET | Mean Dice (Δ vs. full) | p-value |
|---|---|---|---|---|---|
| Full (Ours) | 0.91 | 0.79 | 0.74 | 0.81 | |
| w/o Histogram Matching | 0.89 | 0.77 | 0.72 | 0.79 (−0.02) | 0.031 |
| w/o Frequency Stem | 0.90 | 0.78 | 0.73 | 0.80 (−0.01) | 0.089 |
| w/o Dual Attention | 0.89 | 0.76 | 0.71 | 0.79 (−0.02) | 0.019 |
| w/o Radiomics Stratification | 0.90 | 0.78 | 0.73 | 0.80 (−0.01) | 0.067 |
| All Removed | 0.88 | 0.73 | 0.70 | 0.77 (−0.04) | <0.001 |

Key Findings

  • The dual attention module contributes most (Dice drops by 0.02 upon removal, \(p=0.019\)), particularly improving ET boundary refinement.
  • Histogram matching ranks second in contribution and reduces scanner-specific intensity bias; its gain is reported as +1.5%, though the ablation table shows a 0.02 drop in mean Dice upon removal (\(p=0.031\)).
  • The cumulative gain from all components is 4 percentage points of mean Dice (0.77 → 0.81); the drop when all components are removed is statistically significant (\(p < 0.001\)).
  • HD95 decreases from the baseline range of 13.7–16.1 to 12.5, indicating more precise boundary localization.
  • The transfer learning strategy is effective: large-scale BraTS 2023 pre-training followed by few-shot fine-tuning on BraTS-Africa.

Highlights & Insights

  • Systems engineering perspective: Rather than pursuing a single novel component, the paper systematically integrates multiple validated techniques into a unified framework, which is more practical for resource-constrained scenarios.
  • Radiomics-guided stratification is a distinctive contribution—leveraging established tools from the tumor imaging field to address sampling bias in deep learning training.
  • The frequency-aware stem is elegantly simple: low/high frequency decomposition is achieved solely through different initialization strategies (uniform vs. Kaiming) and by subtracting the low-pass output from the second path, without the need for complex wavelet transforms.
  • The work has direct equity implications for low-resource healthcare settings in Africa.
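
The low/high frequency split noted in the highlights can be sketched in a few lines. The version below is a single-channel, pure-Python illustration and not the paper's code: it assumes a 3×3×3 kernel with zero padding, reads "Kaiming initialization" as Kaiming-normal with fan-in 27, and omits the pointwise half of the depthwise separable convolution. All function names are hypothetical.

```python
import math
import random


def conv3d_same(vol, kernel):
    """Single-channel 3x3x3 cross-correlation with zero padding."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(D)]
    for z in range(D):
        for y in range(H):
            for x in range(W):
                acc = 0.0
                for dz in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            zz, yy, xx = z + dz, y + dy, x + dx
                            if 0 <= zz < D and 0 <= yy < H and 0 <= xx < W:
                                acc += kernel[dz + 1][dy + 1][dx + 1] * vol[zz][yy][xx]
                out[z][y][x] = acc
    return out


def uniform_kernel():
    """Low-pass path: every weight is 1/27, i.e. a mean filter."""
    return [[[1.0 / 27.0] * 3 for _ in range(3)] for _ in range(3)]


def kaiming_kernel(rng):
    """High-pass path: Kaiming-normal weights, std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / 27.0)
    return [[[rng.gauss(0.0, std) for _ in range(3)] for _ in range(3)]
            for _ in range(3)]


def frequency_stem(vol, high_kernel):
    """Return [x_low, x_high] with x_high = conv_high(x) - x_low."""
    x_low = conv3d_same(vol, uniform_kernel())
    x_raw = conv3d_same(vol, high_kernel)
    x_high = [[[r - l for r, l in zip(r_row, l_row)]
               for r_row, l_row in zip(r_plane, l_plane)]
              for r_plane, l_plane in zip(x_raw, x_low)]
    return [x_low, x_high]  # concatenation along a new channel axis


rng = random.Random(0)
volume = [[[float(z + y + x) for x in range(4)] for y in range(4)] for z in range(4)]
x_low, x_high = frequency_stem(volume, kaiming_kernel(rng))
print(len(x_low), len(x_low[0]), len(x_low[0][0]))  # spatial shape is preserved
```

Only the initialization distinguishes the two paths; both kernels remain trainable in the described design, so the low-pass and high-pass behavior is a starting bias rather than a fixed filter.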

Limitations & Future Work

  • Only 60 training cases limit generalizability; future work requires larger SSA cohorts.
  • Self-supervised pre-training is unexplored and may be more effective than supervised pre-training under severe annotation scarcity.
  • Some ablated components yield p-values above 0.05 (frequency stem \(p=0.089\), radiomics stratification \(p=0.067\)), so their individual contributions are not statistically significant at the conventional 0.05 level.
  • No comparison with recent foundation models (e.g., SAM-Med, UniSeg).
  • The choice of reference image for histogram matching may introduce bias.

Related Work & Insights

  • vs. nnU-Net: The self-configuring approach performs well on standard data but falls short under severe domain shift compared to domain-specific designs; this paper achieves a 0.02 higher mean Dice (about 2.5% relative).
  • vs. Swin-UNETR: Both adopt Transformer architectures, but Swin-UNETR is not designed for domain shift; this paper's key advantages lie in dual attention and frequency-aware encoding.
  • vs. isolated domain adaptation techniques: Prior studies typically validate individual techniques in isolation (e.g., histogram matching alone or attention alone); this paper presents the first systematic evaluation of their combined effect.
  • The methodology offers transferable insights for other resource-constrained medical imaging scenarios (e.g., rural ultrasound, mobile CT).

Rating

  • Novelty: ⭐⭐⭐ — All components are based on existing techniques; however, their systematic integration for the specific scenario of SSA glioma segmentation carries engineering value.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Includes main results, ablation studies, qualitative analysis, and statistical significance testing, though the dataset scale is small.
  • Writing Quality: ⭐⭐⭐⭐ — Well-structured with detailed method descriptions; some equations could be made more concise.
  • Value: ⭐⭐⭐⭐ — Provides practical value for low-resource medical AI deployment and represents an important direction toward fairness and accessibility.