LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation

Conference: ICML 2025
arXiv: 2505.19659
Code: https://github.com/backpropagator/LangDAug
Area: Medical Imaging
Keywords: Data Augmentation, Domain Generalization, Langevin Dynamics, Energy-Based Models, Medical Image Segmentation

TL;DR

LangDAug trains an energy-based model (EBM) with contrastive divergence and uses Langevin dynamics to traverse between source domains. The intermediate samples generated along the way augment the training data for multi-source domain generalization in medical image segmentation. The paper proves that this augmentation induces a regularization effect and, for generalized linear models, bounds the Rademacher complexity by the intrinsic dimension of the data manifold.

Background & Motivation

Background: Medical image segmentation models generalize poorly across domains (e.g., different hospitals, devices, and imaging parameters). Domain generalization (DG) methods address this issue through representation learning or data augmentation.
Limitations of Prior Work: Representation learning methods seek domain-invariant features but often rely on ad-hoc techniques without formal guarantees. Data augmentation methods achieve comparable or better performance, but existing strategies (e.g., random style transfer) lack a principled design, and it remains unclear to what extent augmentation should be applied.
Key Challenge: How can one design a theoretically guaranteed data augmentation strategy that systematically generates effective intermediate samples to bridge the gap between source domains?
Goal: Propose a principled data augmentation method based on energy-based models and Langevin dynamics.
Key Insight: Model different source domains as distinct valleys in an energy landscape, and use Langevin dynamics to "walk" between domains to generate intermediate domain samples.
Core Idea: Train an EBM to capture the joint energy landscape of the source domains, then traverse between domains using a Langevin sampler; the generated intermediate samples are then used to train the segmentation model.

Method

Overall Architecture

Input: Medical images and their corresponding segmentation labels from multiple source domains
First Step: Train an EBM to model the joint distribution of multiple source domains
Second Step: Starting from any source domain, run a Langevin dynamics MCMC chain to generate intermediate domain samples
Third Step: Jointly train the segmentation model using both the original and augmented data
Output: A segmentation model with enhanced generalization capability

Key Designs

Domain Modeling with Energy-Based Models (EBM):
- Model the source data with an EBM, \(p_\theta(\mathbf{x}) \propto \exp(-E_\theta(\mathbf{x}))\), trained via contrastive divergence.
- The energy landscape of the EBM encodes the distributions of the different source domains.
- Different domains correspond to distinct low-energy areas, while the regions between domains represent high-energy "hills".
- Design Motivation: EBM provides a continuous energy landscape, ensuring smooth transitions between domains.
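To make the valley/hill picture concrete, here is a minimal sketch using a hypothetical one-dimensional double-well energy rather than the paper's learned EBM. Two "source domains" sit at the minima x = -1 and x = +1. The unnormalized density exp(-E(x)) is smallest on the barrier between them. The function names and the barrier height `a` are illustrative assumptions.

```python
import math

# Hypothetical 1-D toy (not the paper's EBM): domain A at x = -1, domain B at x = +1.
# E(x) = a * (x^2 - 1)^2 has minima at both domains and a barrier ("hill") at x = 0.
def double_well_energy(x, a=4.0):
    return a * (x * x - 1.0) ** 2

def unnormalized_density(x, a=4.0):
    # EBM density up to the normalizing constant: p(x) is proportional to exp(-E(x))
    return math.exp(-double_well_energy(x, a))

if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"x={x:+.1f}  E={double_well_energy(x):.3f}  p~{unnormalized_density(x):.4f}")
```

Because only differences in energy matter, the normalizing constant never has to be computed to compare how plausible two points are.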
Traversing Between Domains via Langevin Dynamics:
- Starting from a sample in domain A, run Langevin MCMC: \(\mathbf{x}_{k+1} = \mathbf{x}_k - \frac{\eta}{2} \nabla_\mathbf{x} E_\theta(\mathbf{x}_k) + \sqrt{\eta}\, \epsilon_k\), where \(\eta\) is the step size and \(\epsilon_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\).
- As the number of steps increases, the sample transitions from domain A to the intermediate region between domain A and domain B.
- The number of steps controls the distance of the augmented samples from the source domains.
- Design Motivation: In the limit of a small step size, the stationary distribution of Langevin dynamics is exactly the EBM distribution \(p_\theta\). This justifies using the chain to sample from the learned energy landscape.
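The role of the step count can be illustrated with a toy sketch. For illustration only, it assumes a quadratic energy centred on domain B, with chains initialized at domain-A samples, and it applies the Langevin update rule above with step size `eta`. The constants `MU_A`, `MU_B`, `SIGMA` and the helper names are hypothetical. As the number of steps grows, the average sample moves from domain A through intermediate positions toward domain B.

```python
import math
import random

# Hypothetical toy, NOT the paper's trained EBM: a quadratic energy centred on domain B,
# E(x) = (x - MU_B)^2 / (2 * SIGMA^2), with chains initialized at domain-A samples.
MU_A, MU_B, SIGMA = -2.0, 2.0, 0.5

def grad_energy(x):
    # dE/dx for the quadratic toy energy
    return (x - MU_B) / SIGMA ** 2

def langevin(x0s, n_steps, eta=0.01, seed=0):
    """Run n_steps of x <- x - (eta/2) * dE/dx + sqrt(eta) * eps on every chain."""
    rng = random.Random(seed)
    xs = list(x0s)
    for _ in range(n_steps):
        xs = [x - 0.5 * eta * grad_energy(x) + math.sqrt(eta) * rng.gauss(0.0, 1.0) for x in xs]
    return xs

def mean(values):
    return sum(values) / len(values)

if __name__ == "__main__":
    rng = random.Random(42)
    domain_a = [rng.gauss(MU_A, SIGMA) for _ in range(500)]
    for k in (0, 10, 50, 200):
        print(f"steps={k:3d}  mean of chains={mean(langevin(domain_a, k)):+.2f}")
```

Stopping early yields samples between the domains. Running much longer lets the chains settle into the target valley, which mirrors the trade-off in the step-count ablation.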
Theoretical Guarantees:
- Prove the implicit regularization effect induced by LangDAug.
- For Generalized Linear Models (GLMs), LangDAug bounds the Rademacher complexity by the intrinsic dimension of the data manifold.
- This implies that the effectiveness of the augmentation is related to the true complexity of the data rather than the number of model parameters.
- Design Motivation: Provide formal generalization guarantees, rather than merely empirical improvements.
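The quantity being bounded is the empirical Rademacher complexity \(\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\boldsymbol{\sigma}}\big[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(\mathbf{x}_i)\big]\), where the \(\sigma_i \in \{-1, +1\}\) are i.i.d. uniform. The sketch below is a reference point only and does not reproduce the paper's intrinsic-dimension bound. It computes the complexity exactly for a norm-bounded linear class (the linear part of a GLM) by enumerating every sign vector of a tiny sample. It then checks the result against the standard bound \(\frac{B}{n}\sqrt{\sum_i \lVert\mathbf{x}_i\rVert_2^2}\). The function names are illustrative.

```python
import itertools
import math

# Empirical Rademacher complexity of F = {x -> <w, x> : ||w||_2 <= B}.
# By Cauchy-Schwarz, sup_w (1/n) sum_i s_i <w, x_i> = (B/n) * ||sum_i s_i x_i||_2,
# so the expectation over random signs s can be computed exactly for small n.
def empirical_rademacher_linear(xs, bound=1.0):
    n, d = len(xs), len(xs[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        v = [sum(s * x[j] for s, x in zip(signs, xs)) for j in range(d)]
        total += math.sqrt(sum(c * c for c in v))
    return bound * total / (n * 2 ** n)

def jensen_upper_bound(xs, bound=1.0):
    # Standard bound via Jensen's inequality: (B/n) * sqrt(sum_i ||x_i||^2)
    return bound * math.sqrt(sum(sum(c * c for c in x) for x in xs)) / len(xs)
```

Enumeration costs 2^n terms, so this only works for toy samples. In practice the expectation is estimated by Monte Carlo or bounded analytically.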

Loss & Training

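EBM Training: contrastive divergence loss \(\mathcal{L}_{CD} = \mathbb{E}_{\mathbf{x} \sim p_\text{data}}[E_\theta(\mathbf{x})] - \mathbb{E}_{\mathbf{x}' \sim p_\theta}[E_\theta(\mathbf{x}')]\), where the negative samples \(\mathbf{x}'\) are drawn with Langevin dynamics. Minimizing it lowers the energy of real data and raises the energy of model samples.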
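A minimal sketch of contrastive divergence uses a hypothetical one-parameter EBM, \(E_\mu(x) = (x-\mu)^2/2\) (so \(p_\mu = \mathcal{N}(\mu, 1)\)), instead of the image-space network used in the paper. Negatives are produced CD-k style: the chains start at the data and run `k` Langevin steps under the current parameter. The negatives are then treated as constants when differentiating the loss above, which gives \(\partial\mathcal{L}_{CD}/\partial\mu = \bar{x}' - \bar{x}\). The learning rate, step size, chain length and function names are illustrative choices.

```python
import math
import random

# Hypothetical toy EBM with a single parameter mu: E_mu(x) = (x - mu)^2 / 2,
# whose normalized density is N(mu, 1). Not the paper's image-space EBM.
def energy(x, mu):
    return 0.5 * (x - mu) ** 2

def grad_x_energy(x, mu):
    # dE/dx, used by the Langevin sampler
    return x - mu

def mean(values):
    return sum(values) / len(values)

def langevin_steps(xs, mu, eta, k, rng):
    # k steps of x <- x - (eta/2) * dE/dx + sqrt(eta) * eps, eps ~ N(0, 1)
    for _ in range(k):
        xs = [x - 0.5 * eta * grad_x_energy(x, mu) + math.sqrt(eta) * rng.gauss(0.0, 1.0) for x in xs]
    return xs

def cd_loss(data, negatives, mu):
    # L_CD = E_data[E(x)] - E_model[E(x')], with x' approximated by Langevin negatives
    return mean([energy(x, mu) for x in data]) - mean([energy(x, mu) for x in negatives])

def train_cd(data, mu=0.0, lr=0.5, eta=0.1, k=10, iters=200, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        # CD-k: initialize negatives at the data, then run k Langevin steps under the current mu
        negatives = langevin_steps(list(data), mu, eta, k, rng)
        # With negatives held fixed and dE/dmu = -(x - mu),
        # dL_CD/dmu = mean(negatives) - mean(data)
        grad_mu = mean(negatives) - mean(data)
        mu -= lr * grad_mu
    return mu

if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.gauss(3.0, 1.0) for _ in range(500)]
    print(f"data mean={mean(data):.3f}  learned mu={train_cd(data):.3f}")
```

If the model's valley sits away from the data, the negatives drift toward it and the gradient pushes \(\mu\) back toward the data. Training stops moving once the negatives no longer differ from the data on average.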
Segmentation Model Training: Standard segmentation loss (Cross-Entropy + Dice), trained on both original and augmented data.
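The segmentation objective can be sketched in pure Python as a combined cross-entropy and soft Dice loss, averaged over original and augmented samples. The sketch makes several assumptions. Binary masks are flattened to lists of foreground probabilities, so cross-entropy reduces to its binary form. The two terms are weighted equally (`dice_weight=1.0`). Targets for the augmented samples are supplied by the caller; how the paper actually labels augmented images is not covered here. All names are hypothetical.

```python
import math

def soft_dice_loss(probs, targets, eps=1e-6):
    # 1 - (2 * sum(p * g) + eps) / (sum(p) + sum(g) + eps) for a flattened binary mask
    inter = sum(p * g for p, g in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)

def bce_loss(probs, targets, eps=1e-7):
    # Binary cross-entropy averaged over pixels
    total = 0.0
    for p, g in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(probs)

def seg_loss(probs, targets, dice_weight=1.0):
    # Hypothetical equal weighting of the CE and Dice terms
    return bce_loss(probs, targets) + dice_weight * soft_dice_loss(probs, targets)

def mixed_batch_loss(original, augmented):
    # original / augmented: lists of (probs, targets) pairs; average over both sets
    pairs = list(original) + list(augmented)
    return sum(seg_loss(p, g) for p, g in pairs) / len(pairs)
```

Pooling both sets into one average means every augmented sample contributes on the same footing as a real one. Reweighting the two groups would be a separate design decision.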
LangDAug can be combined with other domain randomization methods.

Key Experimental Results

Main Results

Dataset	Metric	LangDAug	Prior SOTA DG methods	Gain
Fundus segmentation	Dice↑	Best	Lower	Significant
Prostate MRI segmentation	Dice↑	Best	Lower	Significant
Fundus (LangDAug + domain randomization)	Dice↑	Best	Domain randomization alone	Complementary gain

Ablation Study

Configuration	Dice	Description
No Augmentation	Baseline	Trained only on source domain data
Random Augmentation	Small Gain	Traditional augmentation
Domain Randomization (DR)	Moderate Gain	Existing SOTA augmentation
LangDAug alone	Better	Outperforms DR
LangDAug + DR	Best	Complementary to each other
Varying Langevin Steps	Step-sensitive	Too many steps may drift too far from the source domains

Key Findings

LangDAug outperforms SOTA domain generalization methods on both the fundus segmentation and prostate MRI segmentation benchmarks.
LangDAug is complementary to existing domain randomization methods; combining the two yields the best results.
The number of Langevin steps is a key hyperparameter: too few steps are ineffective, while too many may generate out-of-distribution samples.
The theoretical regularization effect is empirically validated in the experiments.

Highlights & Insights

Solid Theory: For GLMs, the Rademacher complexity bound gives a formal guarantee that purely empirical augmentation methods lack.
Physical Intuition: Treating domains as valleys in an energy landscape and moving between them with Langevin dynamics is both intuitive and mathematically precise.
Complementarity: Orthogonal to existing augmentation methods, so the two can be stacked.
Cross-Modal Validation: Effective on both fundus images and prostate MRI.

Limitations & Future Work

EBM training itself is unstable and requires careful hyperparameter tuning.
Langevin sampling is relatively slow, increasing the overall training time.
Currently validated only on 2D segmentation; the extension to 3D volumetric segmentation remains to be explored.
Labels for the augmented samples require additional handling; it is worth checking carefully how the paper obtains segmentation labels for the augmented images.

Related Work & Insights

Domain generalization methods such as DSU and CIRL are the primary baselines for comparison.
Augmentation-based methods such as AdvBias and FedDG serve as additional baselines.
Insight: The EBM + Langevin augmentation strategy can be extended to other medical imaging tasks (e.g., classification, detection).

Rating

Novelty: ⭐⭐⭐⭐ The combination of EBM + Langevin for domain generalization augmentation is novel.
Experimental Thoroughness: ⭐⭐⭐⭐ Evaluated on two benchmarks along with complementarity experiments.
Writing Quality: ⭐⭐⭐⭐ Clear theoretical derivations.
Value: ⭐⭐⭐⭐ High practical value for domain generalization in medical imaging.