Unsupervised Domain Adaptation with Target-Only Margin Disparity Discrepancy¶
Conference: CVPR 2026 · arXiv: 2603.09932 · Area: Medical Imaging · Keywords: Unsupervised Domain Adaptation, Margin Disparity Discrepancy, CBCT, Liver Segmentation, Interventional Imaging · Code: None
TL;DR¶
This paper addresses unsupervised domain adaptation (UDA) for CT→CBCT liver segmentation. It identifies a contradictory term in the classical MDD objective—where the feature extractor is optimized to maximize the discrepancy between \(f\) and \(f'\) on the source domain—and proposes Target-Only MDD, which removes this contradiction and minimizes prediction discrepancy on both domains. The method achieves state-of-the-art UDA performance in both 2D and 3D experiments.
Background & Motivation¶
- Clinical Context: CBCT provides real-time intraoperative 3D guidance in interventional radiology; automatic liver segmentation is critical for surgical planning.
- Data Scarcity: CT benefits from abundant publicly annotated datasets, whereas interventional CBCT data are scarce and unannotated.
- Sources of Domain Shift:
- Scatter artifacts, limited dynamic range, and reconstruction geometry differences in CBCT
- Intra-arterial contrast enhancement causing bright regions inside the liver
- Limited CBCT field of view differing from CT
- Limitations of Existing Methods:
- Foundation models (SAM-MED 2D/3D, MA-SAM): built on backbones pretrained primarily on natural images; their generalization to CBCT is limited.
- Image alignment methods (SIFA): rely on a shared field-of-view assumption, inapplicable to CT/CBCT pairs.
- Self-training methods (BDCL): pseudo-label quality degrades under large domain shifts.
- Problem with MDD: In the classical MDD formulation, the feature extractor \(\psi\) is optimized to maximize the discrepancy between \(f\) and \(f'\) on the source domain (the red-boxed term in Eq. 3), which directly contradicts the goal of domain alignment.
Method¶
Background: Classical MDD¶
A U-Net is decomposed into a feature extractor \(\psi\), a segmentation head \(f\), and an adversarial segmentation head \(f'\). With source and target features \(z^S = \psi(x^S)\) and \(z^T = \psi(x^T)\), the classical MDD objective is:

\(\min_{f, \psi} \max_{f'} \left[ L^{\text{task}}(f(z^S), y^S) + \gamma \mathcal{L}_{CE}(f'(z^T), f(z^T)) - \alpha \mathcal{L}_{CE}(f'(z^S), f(z^S)) \right]\)
Issue: In practice, \(f\) is excluded from the discrepancy terms, and \(\psi\) is optimized to maximize the discrepancy between \(f\) and \(f'\) on the source domain (the negative sign on the last term), contradicting the objective of aligning features across both domains.
Target-Only MDD (Proposed Method)¶
The contradictory term is removed and the optimization is restructured into three alternating steps:
Step 1 — Optimize segmentation head \(f\): \(\min_f L^{\text{task}}(f(z^S), y^S)\)
Step 2 — Optimize adversarial head \(f'\): \(\min_{f'} \left[ \mathcal{L}_{CE}(f'(z^S), f(z^S)) - \gamma \mathcal{L}_{CE}(f'(z^T), f(z^T)) \right]\)
\(f'\) is trained to imitate \(f\) on the source domain while diverging from \(f\) on the target domain.
Step 3 — Optimize feature extractor \(\psi\): \(\min_\psi \left[ L^{\text{task}}(f(z^S), y^S) + \alpha \mathcal{L}_{CE}(f'(z^S), f(z^S)) + \gamma \mathcal{L}_{CE}(f'(z^T), f(z^T)) \right]\)
The key change: \(\psi\) now minimizes the discrepancy between \(f\) and \(f'\) on both domains (the source domain term changes from negative to positive), eliminating the contradiction.
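The three alternating steps can be sketched in PyTorch as below. This is a minimal illustration, not the authors' implementation: the module names (`psi`, `f`, `f_adv`), the toy convolutional stand-ins for the U-Net parts, the optimizers, and the batch shapes are all assumptions; only \(\alpha\) and \(\gamma\) come from the paper.

```python
# Hypothetical sketch of one Target-Only MDD training iteration.
# Toy stand-ins for the U-Net parts; shapes and optimizers are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
C = 2                                    # classes (liver / background)
psi = nn.Conv2d(1, 8, 3, padding=1)      # stand-in feature extractor
f = nn.Conv2d(8, C, 1)                   # segmentation head
f_adv = nn.Conv2d(8, C, 1)               # adversarial head f'
alpha, gamma = 7.5e-2, 3e-1              # coefficients reported in the paper

opt_f = torch.optim.SGD(f.parameters(), lr=1e-2)
opt_adv = torch.optim.SGD(f_adv.parameters(), lr=1e-2)
opt_psi = torch.optim.SGD(psi.parameters(), lr=1e-2)

x_s = torch.randn(2, 1, 16, 16)          # source batch (CT)
y_s = torch.randint(0, C, (2, 16, 16))   # source labels
x_t = torch.randn(2, 1, 16, 16)          # target batch (CBCT), unlabeled

def disc(head, feat, ref_head):
    """Cross-entropy between head's prediction and ref_head's argmax."""
    with torch.no_grad():
        pseudo = ref_head(feat).argmax(1)
    return F.cross_entropy(head(feat), pseudo)

# Step 1: task loss for f on the source domain (features detached)
z_s = psi(x_s)
loss_f = F.cross_entropy(f(z_s.detach()), y_s)
opt_f.zero_grad(); loss_f.backward(); opt_f.step()

# Step 2: f' imitates f on source, diverges from f on target
z_s, z_t = psi(x_s).detach(), psi(x_t).detach()
loss_adv = disc(f_adv, z_s, f) - gamma * disc(f_adv, z_t, f)
opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()

# Step 3: psi minimizes the f/f' discrepancy on BOTH domains
# (the source term enters with a positive sign, unlike classical MDD)
z_s, z_t = psi(x_s), psi(x_t)
loss_psi = (F.cross_entropy(f(z_s), y_s)
            + alpha * disc(f_adv, z_s, f)
            + gamma * disc(f_adv, z_t, f))
opt_psi.zero_grad(); loss_psi.backward(); opt_psi.step()
```

Only the corresponding optimizer steps in each phase; stray gradients accumulated on the other modules are cleared by the `zero_grad` calls of the next iteration.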
Few-Shot Extension¶
After UDA training, \(f \circ \psi\) is retained and \(f'\) is discarded; the model is then fine-tuned with a small number of annotated target-domain samples.
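A minimal sketch of this fine-tuning stage, under the same toy assumptions as above (`psi` and `f` standing in for the UDA-trained modules; the small annotated batch and optimizer settings are illustrative):

```python
# Hypothetical few-shot fine-tuning after UDA training.
# psi and f stand in for the UDA-trained modules; f' is simply discarded.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
psi = nn.Conv2d(1, 8, 3, padding=1)   # pretend these carry UDA weights
f = nn.Conv2d(8, 2, 1)

opt = torch.optim.SGD(list(psi.parameters()) + list(f.parameters()), lr=1e-3)
x = torch.randn(4, 1, 16, 16)          # small annotated target batch (CBCT)
y = torch.randint(0, 2, (4, 16, 16))

for _ in range(3):                     # a few supervised updates on the
    loss = F.cross_entropy(f(psi(x)), y)   # annotated target samples
    opt.zero_grad(); loss.backward(); opt.step()
```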
Implementation Details¶
- Backbone: U-Net, 5 stages, 64 channels in the first stage
- Hyperparameters: \(\alpha = 7.5 \times 10^{-2}\), \(\gamma = 3 \times 10^{-1}\)
- Data: 573 CBCT cases + 678 CT cases, patient-level segmentation
Key Experimental Results¶
2D CT→CBCT Liver Segmentation¶
| Type | Method | F1 (%) |
|---|---|---|
| Source Only | U-Net | 54.1 |
| Foundation | SAM-MED 2D (5pt) | 67.7 |
| Self-Training | BDCL | 60.0 |
| Feature Align | DANN | 68.3 |
| Feature Align | MDD | 70.0 |
| Feature Align | Ours | 74.4 |
| Few-shot | Ours + 50 vol | 84.6 |
| Upper Bound | Target Only (100%) | 85.5 |
3D CT→CBCT Liver Segmentation¶
| Type | Method | F1 (%) |
|---|---|---|
| Source Only | U-Net | 80.1 |
| Foundation | SAM-MED 3D (5pt) | 65.3 |
| Foundation | MA-SAM | 61.8 |
| Image Align | SIFA | 64.7 |
| Feature Align | DANN | 84.6 |
| Feature Align | Ours | 86.6 |
| Few-shot | Ours + 5 vol | 90.9 |
| Upper Bound | Target Only (100%) | 93.7 |
Key Findings¶
- UDA without any target annotations outperforms supervised training on 5 annotated target volumes: Ours (86.6%) > Target Only, 5 vol (84.7%).
- Few-shot adaptation closes most of the remaining gap: Ours + 5 vol (90.9%) ≈ Target Only, 20 vol (89.6%).
- Hyperparameter robustness: Performance remains stable across different \(\alpha\) and \(\gamma\) combinations.
- Lowest variance: The proposed method achieves a standard deviation of 9.4%, substantially lower than MA-SAM (18.3%) and SAM-MED 3D (28.8%).
Highlights & Insights¶
- Theory-driven improvement: Rather than stacking complex modules, the paper identifies and corrects a contradictory term in the MDD objective, yielding a principled and concise fix.
- Sober assessment of foundation models: SAM-MED performs substantially worse than UDA on CBCT, demonstrating that domain shift in medical imaging still requires dedicated treatment.
- In-depth analysis of CBCT-specific challenges: Contrast-enhanced bright regions cause under-segmentation of the liver; the proposed method mitigates this effectively via 3D contextual information.
- Natural few-shot extension: A small number of annotations after UDA suffices to approach fully supervised performance.
Limitations & Future Work¶
- Validated on liver segmentation only; generalization to other organs or segmentation tasks remains unexplored.
- The dataset is proprietary, precluding reproducibility.
- Common image translation baselines (e.g., CycleGAN) are not compared in the 2D experiments.
- The MDD theoretical framework is primarily designed for classification; theoretical guarantees for segmentation tasks remain incomplete.
Rating¶
| Dimension | Score |
|---|---|
| Novelty | ⭐⭐⭐ |
| Experimental Thoroughness | ⭐⭐⭐⭐ |
| Writing Quality | ⭐⭐⭐⭐ |
| Value | ⭐⭐⭐⭐ |