LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation
Conference: ICML 2025
arXiv: 2505.19659
Code: https://github.com/backpropagator/LangDAug
Area: Medical Imaging
Keywords: Data Augmentation, Domain Generalization, Langevin Dynamics, Energy-Based Models, Medical Image Segmentation
TL;DR
LangDAug trains an energy-based model (EBM) with contrastive divergence and uses Langevin dynamics to traverse between source domains, generating intermediate samples that augment training for multi-source domain generalization in medical image segmentation. The authors also show theoretically that the augmentation induces a regularization effect and bounds the Rademacher complexity.
Background & Motivation
- Background: Medical image segmentation models generalize poorly across domains (e.g., different hospitals, devices, or imaging parameters). Domain generalization (DG) methods address this through representation learning or data augmentation.
- Limitations of Prior Work: Representation learning methods seek domain-invariant features but often rely on ad-hoc techniques and lack formal guarantees. Data augmentation methods achieve comparable or superior performance, but existing strategies (e.g., random style transfer) lack a principled design, and it remains unclear "to what extent the augmentation should be performed."
- Key Challenge: How can one design a data augmentation strategy with theoretical guarantees that systematically generates effective intermediate samples to bridge the gap between source domains?
- Goal: Propose a principled data augmentation method based on energy-based models and Langevin dynamics.
- Key Insight: Model the source domains as distinct valleys in an energy landscape, and use Langevin dynamics to "walk" between domains, generating intermediate-domain samples along the way.
- Core Idea: Train an EBM to capture the joint energy landscape of the source domains, traverse between domains with a Langevin sampler, and use the generated intermediate samples to train the segmentation model.
Method
Overall Architecture
- Input: Medical images and their corresponding segmentation labels from multiple source domains
- First Step: Train an EBM to model the joint distribution of multiple source domains
- Second Step: Starting from any source domain, run a Langevin dynamics MCMC chain to generate intermediate domain samples
- Third Step: Jointly train the segmentation model using both the original and augmented data
- Output: A segmentation model with enhanced generalization capability
Key Designs
- Domain Modeling with Energy-Based Models (EBM):
  - Define the model density as \(p_\theta(\mathbf{x}) \propto \exp(-E_\theta(\mathbf{x}))\) and train the EBM with contrastive divergence.
  - The energy landscape of the EBM encodes the distribution of the different source domains.
  - Different domains correspond to distinct low-energy regions, while the regions between domains form high-energy "hills".
  - Design Motivation: The EBM provides a continuous energy landscape, which allows smooth transitions between domains.
- Traversing Between Domains via Langevin Dynamics:
  - Starting from a sample in domain A, run Langevin MCMC: \(\mathbf{x}_{k+1} = \mathbf{x}_k - \frac{\eta}{2} \nabla_\mathbf{x} E_\theta(\mathbf{x}_k) + \sqrt{\eta}\, \epsilon_k\), where \(\eta\) is the step size and \(\epsilon_k \sim \mathcal{N}(0, I)\).
  - As the number of steps increases, the sample moves from domain A into the intermediate region between domain A and domain B.
  - The number of steps controls how far the augmented samples lie from the source domain.
  - Design Motivation: In the limit of small step size and many steps, the stationary distribution of Langevin dynamics is the distribution defined by the EBM, which justifies the sampling procedure.
- Theoretical Guarantees:
  - The authors prove that LangDAug induces an implicit regularization effect.
  - For Generalized Linear Models (GLMs), LangDAug bounds the Rademacher complexity by the intrinsic dimension of the data manifold.
  - This implies that the benefit of the augmentation depends on the intrinsic complexity of the data rather than on the number of model parameters.
  - Design Motivation: Provide formal generalization guarantees, rather than merely empirical improvements.
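The Langevin update above can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a hand-written 1D double-well energy \(E(x) = (x^2 - 1)^2\) in place of a learned \(E_\theta\), with the two minima standing in for two source domains. The names `double_well_grad` and `langevin_chain` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def double_well_grad(x):
    """Gradient of the toy energy E(x) = (x**2 - 1)**2 (an assumption).

    The two minima at x = -1 and x = +1 stand in for two source domains.
    """
    return 4.0 * x * (x * x - 1.0)


def langevin_chain(x0, grad_energy, step_size, n_steps, rng):
    """Run x_{k+1} = x_k - (eta / 2) * grad E(x_k) + sqrt(eta) * eps_k.

    Returns the whole trajectory so that states at different step counts
    can be picked out as augmented samples.
    """
    x = float(x0)
    chain = [x]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)  # eps_k ~ N(0, 1)
        x = x - 0.5 * step_size * grad_energy(x) + math.sqrt(step_size) * eps
        chain.append(x)
    return chain


rng = random.Random(0)  # fixed seed, for reproducibility only
chain = langevin_chain(x0=-1.0, grad_energy=double_well_grad,
                       step_size=0.01, n_steps=2000, rng=rng)

# Keep every 500th state: later entries have had more steps in which to
# drift away from the starting valley at x = -1.
augmented = chain[500::500]
print(augmented)
```

Returning the full trajectory rather than only the final state mirrors the point that the number of steps acts as a knob for how far an augmented sample sits from its source domain. In a real setting `grad_energy` would presumably be the gradient of a trained network with respect to the input image.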
Loss & Training
- EBM Training: Contrastive divergence loss \(\mathcal{L}_{CD} = \mathbb{E}_{\mathbf{x} \sim p_\text{data}}[E_\theta(\mathbf{x})] - \mathbb{E}_{\mathbf{x}' \sim p_\theta}[E_\theta(\mathbf{x}')]\), where the model samples \(\mathbf{x}'\) are typically approximated by a short MCMC chain.
- Segmentation Model Training: Standard segmentation loss (Cross-Entropy + Dice), trained on both original and augmented data.
- LangDAug can be combined with other domain randomization methods.
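To make the contrastive divergence loss concrete, here is a minimal sketch on a one-parameter toy model. It assumes the energy \(E_\theta(x) = \tfrac{1}{2}(x - \theta)^2\), so that \(p_\theta\) is a unit-variance Gaussian centered at \(\theta\); the paper's EBM is a neural network over images, which this does not attempt to reproduce. Negative samples come from a short Langevin chain started at the data, and all function names, the step size, the chain length, and the learning rate are illustrative assumptions.

```python
import math
import random


def energy(x, theta):
    """Toy energy E_theta(x) = 0.5 * (x - theta)**2, i.e. p_theta = N(theta, 1)."""
    return 0.5 * (x - theta) ** 2


def short_run_langevin(x, theta, step_size, k, rng):
    """k Langevin steps on E_theta, started from a data point."""
    for _ in range(k):
        grad_x = x - theta  # dE/dx for the toy energy
        x = x - 0.5 * step_size * grad_x + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x


def cd_loss_and_grad(data, negatives, theta):
    """L_CD = mean E(data) - mean E(negatives), plus its gradient in theta.

    Negatives are treated as constants when differentiating.
    """
    loss = (sum(energy(x, theta) for x in data) / len(data)
            - sum(energy(x, theta) for x in negatives) / len(negatives))
    # dE/dtheta = theta - x, so the theta terms cancel between the two means.
    grad = sum(negatives) / len(negatives) - sum(data) / len(data)
    return loss, grad


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]  # synthetic stand-in for a source domain
theta = -1.0
for _ in range(200):
    negatives = [short_run_langevin(x, theta, 0.1, 10, rng) for x in data]
    loss, grad = cd_loss_and_grad(data, negatives, theta)
    theta -= 0.5 * grad  # plain gradient descent on L_CD

data_mean = sum(data) / len(data)
print(theta, data_mean)
```

Minimizing the loss lowers the energy of data points and raises the energy of model samples; in this toy case that pulls \(\theta\) toward the data mean, up to noise from the sampled negatives.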
Key Experimental Results
Main Results
| Dataset | Metric | LangDAug | Prev. SOTA DG | Gain |
|---|---|---|---|---|
| Fundus Segmentation (Fundus) | Dice↑ | SOTA | Suboptimal | Significant |
| Prostate MRI | Dice↑ | SOTA | Suboptimal | Significant |
| Fundus + Domain Rand. | Dice↑ | Best | Domain Rand. alone | Complementary Gain |
Ablation Study
| Configuration | Dice | Description |
|---|---|---|
| No Augmentation | Baseline | Trained only on source domain data |
| Random Augmentation | Small Gain | Traditional augmentation |
| Domain Randomization (DR) | Moderate Gain | Existing SOTA augmentation |
| LangDAug alone | Better | Outperforms DR |
| LangDAug + DR | Best | Complementary to each other |
| Varying Langevin Steps | Step-sensitive | Excess steps may deviate too far from the source domains |
Key Findings
- LangDAug outperforms SOTA domain generalization methods on both the fundus segmentation and prostate MRI segmentation benchmarks.
- LangDAug is complementary to existing domain randomization methods: the combination yields the best results.
- The number of Langevin steps is a key hyperparameter: too few steps are ineffective, while too many may generate out-of-distribution samples.
- The experiments empirically support the theoretical regularization effect.
Highlights & Insights
- Solid Theory: The Rademacher complexity bound provides a stronger guarantee than empirical-only methods.
- Physical Intuition: The analogy of energy landscape + Langevin dynamics is intuitive and precise.
- Complementarity: Orthogonal to existing augmentation methods, allowing cumulative usage.
- Cross-Modal Validation: Effective across different modalities including fundus and MRI.
Limitations & Future Work
- EBM training is unstable and requires careful hyperparameter tuning.
- Langevin sampling is relatively slow, which increases the overall training time.
- Currently validated only on 2D segmentation; the extension to 3D volumetric segmentation remains to be explored.
- Labels for the augmented samples require additional handling; how the paper obtains segmentation labels for augmented samples warrants attention.
Related Work & Insights
- Domain generalization methods such as DSU and CIRL are the primary baselines for comparison.
- Methods such as AdvBias and FedDG serve as additional baselines.
- Insight: The EBM + Langevin augmentation strategy can be extended to other medical imaging tasks (e.g., classification, detection).
Rating
- Novelty: ⭐⭐⭐⭐ The combination of EBM + Langevin for domain generalization augmentation is novel.
- Experimental Thoroughness: ⭐⭐⭐⭐ Evaluated on two benchmarks along with complementarity experiments.
- Writing Quality: ⭐⭐⭐⭐ Clear theoretical derivations.
- Value: ⭐⭐⭐⭐ High practical value for medical image domain generalization.