SCDL: Semantic Class Distribution Learning for Debiasing Semi-Supervised Medical Image Segmentation¶
- Conference: CVPR 2026
- arXiv: 2603.05202
- Code: github.com/Zyh55555/SCDL
- Area: Medical Image Segmentation / Semi-Supervised Learning
- Keywords: semi-supervised segmentation, class imbalance, distribution alignment, semantic anchor, plug-and-play
TL;DR¶
This paper proposes SCDL, a plug-and-play semantic class distribution learning framework that addresses supervision bias and representation imbalance in semi-supervised medical image segmentation (SSMIS). It comprises two components: Class Distribution Bidirectional Alignment (CDBA), which learns structured class-conditional feature distributions through trainable proxy distributions, and Semantic Anchor Constraint (SAC), which guides those proxies toward true class semantics derived from labeled data. SCDL achieves state-of-the-art performance on minority-class segmentation.
Background & Motivation¶
Background: Semi-supervised medical image segmentation (SSMIS) leverages unlabeled data to reduce annotation burden, commonly through consistency regularization and contrastive representation learning.
Limitations of Prior Work: Medical segmentation data inherently suffers from severe class imbalance, with dominant organs occupying far more pixels than minor ones. Combined with semi-supervised mechanisms, this produces two compounding problems: (i) supervision bias—self-generated pseudo-labels and consistency constraints further reinforce learning of head classes, leaving tail classes underrepresented; (ii) representation imbalance—head-class features become compact while tail-class features drift into head-class-dominated regions, blurring class boundaries.
Key Challenge: Existing debiasing methods (re-weighting, output calibration) operate only at the loss or output level, without directly constraining class-conditional feature distributions. Unlabeled data is predominantly used for local consistency regularization rather than explicitly correcting skewed class-conditional feature distributions.
Key Insight: Learn a trainable proxy distribution for each semantic class in the embedding space, and use bidirectional alignment to provide consistent learning signals across all classes, including minorities.
Core Idea: Elevate debiasing from the loss/output level to the feature distribution level, reshaping class-conditional feature structure through bidirectional proxy distribution alignment and semantic anchor constraints.
Method¶
Overall Architecture¶
Existing segmentation network → encoder output embeddings \(\mathbf{Z} \in \mathbb{R}^{B \times L \times D}\) → CDBA module maintains a learnable proxy distribution \(\mathcal{N}(\mu_c, \text{diag}(\sigma_c^2))\) per class and performs bidirectional alignment → proxy samples generate three types of priors (distribution-weighted / center-similarity / token-sampled) injected into the decoder at multiple stages → SAC extracts semantic anchors from annotated regions to align proxies toward true semantics → plug-and-play, requiring no modification to the baseline training pipeline.
Key Designs¶
- **Class Distribution Bidirectional Alignment (CDBA)**
  - Function: Learns a proxy distribution for each class in the embedding space and pulls embeddings and proxies toward each other through bidirectional alignment.
  - Mechanism: Each class is modeled as a Gaussian proxy \(p(u|c) = \mathcal{N}(\mu_c, \text{diag}(\sigma_c^2))\). A soft assignment \(P(c|z) = \text{softmax}_c(\cos(z, \mu_c))\) allows each embedding to associate with multiple classes. The E2P loss \(\mathcal{L}_{E2P} = \sum_c P(c|z)\,[1 - \cos(z, \mu_c)]\) pulls embeddings toward proxies, while the P2E loss \(\mathcal{L}_{P2E} = \frac{1}{C}\sum_c \exp(-(\mathcal{E}_c^+ - \mathcal{E}_c^-))\) drives each proxy to actively identify its own embeddings.
  - Design Motivation: Unidirectional alignment (E2P only) allows head-class proxies to dominate gradients, leaving minority-class embeddings still attracted to head-class regions. Bidirectional constraints ensure each proxy actively learns to discriminate its own embeddings, enabling minority-class proxies to form compact distributions as well.
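The two alignment losses can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the energies \(\mathcal{E}_c^+\) and \(\mathcal{E}_c^-\) are not fully specified above, so here they are taken as the mean cosine similarity between a proxy and its own vs. other classes' embeddings, which is one plausible reading.

```python
import numpy as np

def cosine(a, b):
    # Row-wise cosine similarity: a (N, D), b (C, D) -> (N, C)
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def e2p_loss(z, mu):
    """Embedding-to-proxy: soft-assign each embedding over classes,
    then pull it toward the proxies it is assigned to."""
    sim = cosine(z, mu)                                    # (N, C)
    p = np.exp(sim) / np.exp(sim).sum(-1, keepdims=True)   # softmax_c
    return (p * (1.0 - sim)).sum(-1).mean()

def p2e_loss(z, mu, labels):
    """Proxy-to-embedding: each proxy should score its own embeddings
    (E+) above other classes' embeddings (E-); E+/E- as mean cosine
    energies is an assumption, not the paper's exact definition."""
    sim = cosine(z, mu)                                    # (N, C)
    C = mu.shape[0]
    loss = 0.0
    for c in range(C):
        pos = sim[labels == c, c].mean() if (labels == c).any() else 0.0
        neg = sim[labels != c, c].mean() if (labels != c).any() else 0.0
        loss += np.exp(-(pos - neg))
    return loss / C
```

With well-separated clusters, each proxy's positive energy exceeds its negative energy, so each P2E term \(\exp(-(\mathcal{E}_c^+ - \mathcal{E}_c^-))\) falls below 1; a head-class proxy cannot reduce another class's term, which is the debiasing point of the bidirectional design.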
- **Semantic Anchor Constraint (SAC)**
  - Function: Provides reliable semantic supervision for each proxy from labeled data, preventing proxies from drifting away from true class semantics.
  - Mechanism: For each class \(c\), the mean of encoder embeddings extracted from class-specific regions of annotated images serves as the semantic anchor: \(\text{anchor}_c = \frac{1}{|\mathcal{Z}_c|}\sum_{z \in \mathcal{Z}_c} z\) (detached during backpropagation to prevent gradient flow into the encoder). The alignment loss is \(\mathcal{L}_{SAC} = \frac{1}{C}\sum_c [1 - \cos(\mu_c, \text{anchor}_c)]\).
  - Design Motivation: Proxies are randomly initialized and trained primarily on unlabeled data, leaving them prone to semantic drift. Semantic anchors from labeled data provide the only reliable class-level supervisory signal.
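The anchor computation and SAC loss follow directly from the formulas above; a minimal numpy sketch (the real framework computes anchors from encoder features and detaches them so no gradient reaches the encoder, which numpy cannot express):

```python
import numpy as np

def semantic_anchors(z, labels, num_classes):
    """Per-class mean of embeddings from annotated regions.
    In the actual framework this tensor is detached before use."""
    return np.stack([z[labels == c].mean(axis=0) for c in range(num_classes)])

def sac_loss(mu, anchors):
    """L_SAC = (1/C) * sum_c [1 - cos(mu_c, anchor_c)]."""
    mu_n = mu / np.linalg.norm(mu, axis=-1, keepdims=True)
    an_n = anchors / np.linalg.norm(anchors, axis=-1, keepdims=True)
    return (1.0 - (mu_n * an_n).sum(-1)).mean()
```

The loss is zero exactly when every proxy mean points in the same direction as its class anchor, so gradient descent on \(\mathcal{L}_{SAC}\) rotates proxies toward the labeled-data semantics without constraining their norms.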
- **Proxy-Sampled Feature Augmentation**
  - Function: Samples from the learned proxy distributions to construct three complementary priors injected into different decoder stages.
  - Mechanism: (i) Distribution-weighted prior: samples \(S\) times from each proxy to compute a weighted mean \(\mathbf{r}^{dist}\), capturing distribution-level structure including variance; (ii) Center-similarity prior: uses proxy means directly for weighted aggregation \(\mathbf{r}^{center}\), ignoring variance to provide a complementary signal; (iii) Token-sampled prior: locally perturbed sampling for robustness. The three priors are concatenated and injected into the decoder via a lightweight projection.
  - Design Motivation: Using only proxy means ignores distributional shape, while using only samples introduces noise. The three complementary priors ensure that both head and tail classes contribute effectively to segmentation.
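The three priors can be sketched as below. This is an illustrative reading, not the paper's code: `weights` stands for whatever per-class aggregation weights the framework uses (e.g., soft assignments \(P(c|z)\)), and the token-sampled perturbation scale of 0.1 is an arbitrary placeholder.

```python
import numpy as np

def proxy_priors(mu, sigma, weights, S=8, rng=None):
    """Build the three complementary priors from per-class Gaussian
    proxies N(mu_c, diag(sigma_c^2)); output goes to a projection layer."""
    rng = rng or np.random.default_rng(0)
    C, D = mu.shape
    # (i) distribution-weighted: average S samples per proxy, then
    #     aggregate across classes -> captures variance structure
    samples = mu[:, None, :] + sigma[:, None, :] * rng.standard_normal((C, S, D))
    r_dist = (weights[:, None] * samples.mean(axis=1)).sum(axis=0)
    # (ii) center-similarity: weighted proxy means, variance ignored
    r_center = (weights[:, None] * mu).sum(axis=0)
    # (iii) token-sampled: one locally perturbed sample per class
    #       (0.1 perturbation scale is an assumption)
    r_token = (weights[:, None] *
               (mu + 0.1 * sigma * rng.standard_normal((C, D)))).sum(axis=0)
    return np.concatenate([r_dist, r_center, r_token])  # (3*D,)
```

Concatenating rather than averaging keeps the variance-aware, variance-free, and locally perturbed views distinct, letting the decoder's projection weight them per stage.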
Loss & Training¶
Total loss = baseline segmentation loss + \(\mathcal{L}_{E2P}\) + \(\mathcal{L}_{P2E}\) + \(\mathcal{L}_{SAC}\). SCDL is integrated as a plug-and-play module into four baselines: GenericSSL, DHC, GA-CPS, and GA-MagicNet. Weight decay is set to 1e-4. Experiments are conducted on an NVIDIA A40 GPU with batch size 4.
Key Experimental Results¶
Main Results¶
| Method | Synapse 20% DSC↑ | Synapse ASD↓ | AMOS 5% DSC↑ | AMOS ASD↓ |
|---|---|---|---|---|
| GenericSSL | 55.94 | 6.14 | 35.73 | 45.82 |
| SCDL-GenericSSL | 58.90 (+2.96) | 5.79 | 47.35 (+11.62) | 22.84 |
| GA-CPS | 66.29 | 5.44 | 50.90 | 13.77 |
| SCDL-GA-CPS | 67.50 (+1.21) | 3.32 | 61.57 (+10.67) | 10.08 |
| GA-MagicNet | 66.00 | 3.42 | 59.15 | 8.66 |
| SCDL-GA-MagicNet | 66.75 (+0.75) | 3.65 | 62.16 (+3.01) | 5.65 |
| DHC | 46.16 | 10.04 | 40.11 | 40.65 |
| SCDL-DHC | 49.17 (+3.01) | 10.59 | 49.28 (+9.17) | 17.47 |
Ablation Study (Per-Class Dice on Synapse)¶
| Class | GA-CPS | SCDL-GA-CPS | Change | Type |
|---|---|---|---|---|
| Gallbladder (Ga) | 26.7 | 25.4 | -1.3 | Tail |
| Esophagus (Es) | 40.2 | 38.7 | -1.5 | Tail |
| Pancreas (PA) | 45.5 | 49.4 | +3.9 | Tail |
| Right Adrenal Gland (RAG) | 44.7 | 49.2 | +4.5 | Tail |
| Spleen (Sp) | 85.5 | 88.2 | +2.7 | Head |
| Liver (Li) | 92.7 | 93.7 | +1.0 | Head |
Key Findings¶
- The most significant gains occur under AMOS 5% annotation (DSC up to +11.62%), confirming that SCDL is most advantageous when labeled data is extremely scarce.
- On the DHC baseline, ASD decreases from 40.65 to 17.47 (\(\downarrow 23.18\)), indicating substantial improvement in boundary-level accuracy.
- Tail classes such as pancreas and adrenal glands show notable improvements (+3.9/+4.5 Dice), though a few tail classes (gallbladder, esophagus) exhibit marginal declines, suggesting that proxy learning for extremely small structures still requires more supervisory signal.
- Consistent gains across four diverse baselines validate the generalizability of the framework.
Highlights & Insights¶
- Elevating debiasing from the loss/output level to the feature distribution level is conceptually more fundamental: rather than instructing the model to "ignore" imbalance, SCDL explicitly restructures the feature space to impose organized class-conditional distributions.
- The bidirectional alignment design is elegant—E2P directs embeddings toward the correct proxy, while P2E drives each proxy to actively discriminate its own embeddings; the two optimization objectives are complementary.
- Detaching encoder gradients in the semantic anchor computation is a critical implementation detail—it updates only the proxies without interfering with the encoder's representation learning.
Limitations & Future Work¶
- A few extremely small tail classes (e.g., gallbladder Dice declining from 26.7 to 25.4) remain unresolved, potentially requiring a minimum sample size guarantee.
- The Gaussian assumption for proxy distributions may be overly simplistic; mixture models or normalizing flows could capture more complex class-conditional structure.
- Validation is limited to CT multi-organ segmentation; other modalities such as pathology and fundus imaging remain untested.
- The number of proxies is tied to the number of classes, which may lack flexibility for fine-grained sub-class scenarios.
Related Work & Insights¶
- vs. CLD/SimiS: Contrastive learning methods operate at the representation level but do not explicitly model class-conditional distributions; SCDL's proxy distributions provide more structured constraints.
- vs. GA-MagicNet/GA-CPS: Gradient aggregation methods reweight gradients to mitigate imbalance; SCDL complementarily reshapes feature structure at the distributional level.
- vs. DHC: Dynamic hybrid consistency debiases at the pseudo-label level, whereas SCDL constrains directly at the embedding level; the two approaches are orthogonal and can be combined.
Rating¶
- Novelty: ⭐⭐⭐⭐ The distribution-level debiasing perspective is novel; the combination of bidirectional alignment and semantic anchors is elegant.
- Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive validation across four baselines and two datasets, including per-class analysis.
- Writing Quality: ⭐⭐⭐⭐ Problem analysis is clear; the three-paradigm comparison figure is intuitive.
- Value: ⭐⭐⭐⭐ The plug-and-play design is practical and deployment-friendly.