Semantic Class Distribution Learning for Debiasing Semi-Supervised Medical Image Segmentation¶

Conference: CVPR2025
arXiv: 2603.05202
Code: GitHub
Area: Medical Image
Keywords: semi-supervised segmentation, class imbalance, class-conditional distribution, proxy learning, medical image segmentation

TL;DR¶

Proposes the plug-and-play Semantic Class Distribution Learning (SCDL) module, which learns class-conditional proxy distributions and performs Class-conditional Distribution Bi-directional Alignment (CDBA) along with Semantic Anchor Constraint (SAC). This explicitly reshapes the class-conditional feature structure in the embedding space to alleviate supervision bias and representation imbalance in semi-supervised medical image segmentation.

Background & Motivation¶

Limitations of Prior Work¶

Class imbalance in Semi-Supervised Medical Image Segmentation (SSMIS): Medical segmentation data naturally exhibits a pixel-level long-tailed distribution. Large organs occupy most pixels and dominate gradient updates, leaving small organs undertrained.

Background¶

Class imbalance and semi-supervised learning together pose a dual challenge:
1. Supervision signal bias: Large organs occupy more pixels, so gradients are biased toward head classes; self-generated pseudo-labels and consistency constraints further reinforce the learning of head classes.
2. Representation-level imbalance: Existing methods (reweighting, output calibration) operate only on the loss or output layer and place no direct constraint on the class-conditional feature distribution, so tail-class features drift into head-class regions.

Key Challenge¶

Existing methods use unlabeled data mainly to enforce local consistency and rarely use it to explicitly correct the skewed class-conditional feature distribution.

Method¶

SCDL Overall Architecture¶

A plug-and-play module that can be integrated into existing segmentation networks.
Fully exploits labeled data to provide semantic supervision, while guiding unlabeled data to participate in learning at the distribution level.

1. Class-conditional Distribution Bi-directional Alignment (CDBA)¶

Class distribution modeling: Each semantic class \(c\) is represented by a learnable proxy distribution \(\mathcal{N}(\mu_c, \operatorname{diag}(\sigma_c^2))\), where both the mean and variance are trainable parameters.
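The proxy parameterization above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sizes, the log-variance parameterization, and the helper name `sample_proxies` are assumptions, and plain NumPy arrays stand in for what would be trainable parameters in a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, dim = 4, 8  # hypothetical sizes, not taken from the paper

# One diagonal Gaussian per class: mean mu_c and log-variance log(sigma_c^2).
# Storing the log-variance (an assumption) keeps sigma_c^2 positive.
mu = rng.normal(size=(num_classes, dim))
log_var = np.zeros((num_classes, dim))  # sigma_c^2 = 1 at initialization


def sample_proxies(mu, log_var, num_samples, rng):
    """Draw mu_c + sigma_c * eps with eps ~ N(0, I) for every class."""
    sigma = np.exp(0.5 * log_var)                           # (C, D)
    eps = rng.standard_normal((num_samples,) + mu.shape)    # (S, C, D)
    return mu[None] + sigma[None] * eps                     # (S, C, D)


samples = sample_proxies(mu, log_var, num_samples=5, rng=rng)
```

Writing the draw as mean plus scaled noise is the usual reparameterization form, which is what would let gradients reach both the mean and the variance in an autograd framework.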

Soft assignment: Computes the soft assignment probability \(P(c|z)\) of each token embedding to all class proxies using cosine similarity and softmax, allowing each embedding to be associated with multiple classes.
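A small sketch of this soft assignment, assuming a softmax temperature (the `temperature` parameter and the function names are illustrative choices, not taken from the paper):

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def soft_assign(z, mu, temperature=0.1):
    """P(c|z): softmax over classes of cosine similarity / temperature."""
    sim = l2_normalize(z) @ l2_normalize(mu).T            # (N, C) cosine similarities
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Toy data: three class means in 2-D and two token embeddings.
mu = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
z = np.array([[0.9, 0.1], [0.1, 2.0]])
P = soft_assign(z, mu)  # each row is a distribution over the three classes
```

Because every row of `P` keeps non-zero mass on all classes, an embedding near a class boundary contributes to more than one proxy rather than being hard-assigned.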

Bi-directional alignment:
- E2P (Embedding to Proxy): A weighted cosine distance loss that encourages embeddings to lie close to their soft-assigned proxy distributions.
- P2E (Proxy to Embedding): Encourages each proxy to have high similarity to its assigned embeddings and low similarity to other embeddings, enhancing proxy discriminability.
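The two directions can be illustrated with the sketch below. The exact loss formulas are not given in this summary, so both functions are assumed forms: E2P is written as an assignment-weighted cosine distance, and P2E as a contrastive ratio over embeddings with a hypothetical `temperature`.

```python
import numpy as np


def cosine_sim(z, mu, eps=1e-8):
    """Cosine similarity between N embeddings and C proxy means, shape (N, C)."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    mu = mu / (np.linalg.norm(mu, axis=1, keepdims=True) + eps)
    return z @ mu.T


def e2p_loss(sim, P):
    """Assumed E2P form: cosine distance weighted by the soft assignment P."""
    return float(np.mean(np.sum(P * (1.0 - sim), axis=1)))


def p2e_loss(sim, P, temperature=0.1, eps=1e-8):
    """Assumed P2E form: per proxy, the similarity mass on assigned
    embeddings should dominate the mass on all embeddings."""
    w = np.exp((sim - sim.max(axis=0, keepdims=True)) / temperature)  # (N, C)
    pos = np.sum(P * w, axis=0)   # mass on embeddings assigned to each proxy
    total = np.sum(w, axis=0)     # mass on all embeddings
    return float(np.mean(-np.log(pos / total + eps)))


# Toy check: embeddings sit exactly on the proxies.
mu = np.eye(3)
z = np.eye(3)
sim = cosine_sim(z, mu)
P_good = np.eye(3)                      # correct assignments
P_bad = np.roll(np.eye(3), 1, axis=1)   # every embedding assigned to a wrong proxy
losses_good = (e2p_loss(sim, P_good), p2e_loss(sim, P_good))
losses_bad = (e2p_loss(sim, P_bad), p2e_loss(sim, P_bad))
```

In this toy setup both losses are near zero for the correct assignment and clearly larger for the wrong one, which is the behaviour the two alignment terms are meant to encourage.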

Proxy sampling and feature enrichment:
- Distribution-weighted prior: Draws \(S\) samples from each proxy distribution and weights the class means using the similarity computed on those samples.
- Center similarity prior: Weights the class means directly by cosine similarity (ignoring variance).
- Token sampling prior: Performs local perturbation sampling for each token.
- The three priors are concatenated, projected, and injected into each decoder layer.
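One way the three priors might be computed and fused is sketched below. The summary does not specify the weighting, the perturbation, or the projection, so the softmax weighting, the Gaussian `noise_scale`, the linear projection `W`, and the function name `build_prior` are all assumptions made for illustration; injection into the decoder layers is not modeled.

```python
import numpy as np


def normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def build_prior(z, mu, log_var, W, num_samples=4, noise_scale=0.1, rng=None):
    """Fuse three priors for token embeddings z (N, D); returns (N, D_out)."""
    rng = np.random.default_rng(0) if rng is None else rng
    sigma = np.exp(0.5 * log_var)                                    # (C, D)

    # 1) Distribution-weighted prior: similarity to S proxy samples,
    #    averaged over samples, then used to weight the class means.
    noise = rng.standard_normal((num_samples,) + mu.shape)
    samples = mu[None] + sigma[None] * noise                         # (S, C, D)
    sim = np.einsum("nd,scd->nsc", normalize(z), normalize(samples))
    dist_prior = softmax(sim.mean(axis=1)) @ mu                      # (N, D)

    # 2) Center similarity prior: weight class means by cosine similarity
    #    to the means themselves (variance ignored).
    center_prior = softmax(normalize(z) @ normalize(mu).T) @ mu      # (N, D)

    # 3) Token sampling prior: average of locally perturbed copies of each token
    #    (assumed Gaussian perturbation).
    jitter = noise_scale * rng.standard_normal((num_samples,) + z.shape)
    token_prior = (z[None] + jitter).mean(axis=0)                    # (N, D)

    fused = np.concatenate([dist_prior, center_prior, token_prior], axis=1)
    return fused @ W                                                 # (N, D_out)


rng = np.random.default_rng(0)
N, C, D = 6, 3, 8  # hypothetical sizes
z = rng.normal(size=(N, D))
mu = rng.normal(size=(C, D))
log_var = np.zeros((C, D))
W = rng.normal(size=(3 * D, D)) / np.sqrt(3 * D)  # stand-in for a learned projection
prior = build_prior(z, mu, log_var, W, rng=rng)
```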

2. Semantic Anchor Constraint (SAC)¶

Anchor construction: Extracts class-specific regions using ground-truth masks, obtains class-aware embeddings through the shared encoder, and averages them to serve as semantic anchors.
Anchor alignment loss: Utilizes cosine similarity to constrain each class proxy \(\mu_c\) to align with its corresponding semantic anchor.
Anchors are detached during backpropagation, updating only the proxies without affecting the encoder.
Ensures that the proxy distributions capture the true class semantics, preventing them from being biased by class frequency imbalances.
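The anchor constraint can be sketched as below, under a simplification: anchors are computed as per-class averages of already-encoded, labeled token embeddings rather than by re-encoding masked regions, and the loss is assumed to be one minus cosine similarity. The function names are hypothetical. NumPy has no autograd, so the detachment is only noted in a comment; in an autograd framework the anchors would be wrapped in a stop-gradient.

```python
import numpy as np


def semantic_anchors(embeddings, labels, num_classes):
    """Average labeled embeddings per class; also report which classes occur."""
    anchors = np.zeros((num_classes, embeddings.shape[1]))
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            anchors[c] = embeddings[mask].mean(axis=0)
            present[c] = True
    return anchors, present


def anchor_alignment_loss(mu, anchors, present, eps=1e-8):
    """Assumed form: mean of 1 - cos(mu_c, anchor_c) over classes in the batch.

    The anchors act as fixed targets (detached), so only mu would be updated.
    """
    if not present.any():
        return 0.0
    num = np.sum(mu * anchors, axis=1)
    den = np.linalg.norm(mu, axis=1) * np.linalg.norm(anchors, axis=1) + eps
    return float(np.mean(1.0 - (num / den)[present]))


# Toy data: three labeled tokens, class 2 is absent from the batch.
embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
anchors, present = semantic_anchors(embeddings, labels, num_classes=3)

aligned = anchor_alignment_loss(3.0 * anchors, anchors, present)  # same direction
opposed = anchor_alignment_loss(-anchors, anchors, present)       # opposite direction
```

Skipping absent classes matters in the long-tailed setting: a batch often contains no pixels of a tail class, and its proxy should not be pulled toward an empty anchor.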

Key Experimental Results¶

Datasets¶

Synapse: 30 CT scans, 13 organ classes, 20% labeled
AMOS: 360 CT scans, 15 organ classes, 5% labeled

Synapse (20% labeled)¶

Main Results¶

Method	DSC↑	ASD↓
VNet (fully supervised)	68.49	6.08
GA-CPS	66.29	5.44
SCDL-GA-CPS	67.50	3.32
GA-MagicNet	66.00	3.42
SCDL-GA-MagicNet	66.75	3.65

SCDL-GA-CPS lowers ASD by 2.12 (\(5.44 \rightarrow 3.32\)), indicating better boundary quality, and raises DSC by 1.21. SCDL-GA-MagicNet gains 0.75 DSC, but its ASD rises slightly (\(3.42 \rightarrow 3.65\)).

AMOS (5% labeled)¶

Main Results¶

Method	DSC↑	ASD↓
GenericSSL	35.73	45.82
SCDL-GenericSSL	47.35 (+11.62)	22.84
DHC	40.11	40.65
SCDL-DHC	49.28 (+9.17)	17.47 (-23.18)
GA-MagicNet	59.15	8.66
SCDL-GA-MagicNet	62.16 (+3.01)	5.65

The improvement is larger on the highly label-scarce AMOS dataset: SCDL-GenericSSL improves DSC by 11.62 points, and SCDL-DHC reduces ASD from 40.65 to 17.47.
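The gains quoted for AMOS, including the ASD changes that the table does not annotate, can be recomputed from the reported numbers. The snippet below only does that arithmetic; the dictionary layout is an illustrative choice.

```python
# (DSC, ASD) pairs copied from the AMOS table above.
results = {
    "GenericSSL": {"base": (35.73, 45.82), "scdl": (47.35, 22.84)},
    "DHC": {"base": (40.11, 40.65), "scdl": (49.28, 17.47)},
    "GA-MagicNet": {"base": (59.15, 8.66), "scdl": (62.16, 5.65)},
}

# Change after adding SCDL: positive DSC and negative ASD are improvements.
deltas = {
    name: (
        round(r["scdl"][0] - r["base"][0], 2),
        round(r["scdl"][1] - r["base"][1], 2),
    )
    for name, r in results.items()
}

for name, (d_dsc, d_asd) in deltas.items():
    print(f"{name}: DSC {d_dsc:+.2f}, ASD {d_asd:+.2f}")
```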

Class-wise Dice Analysis¶

The contribution is particularly significant for tail classes: e.g., on Synapse, SCDL-GA-CPS improves the pancreas (PA) from \(45.5 \rightarrow 49.4\), and the right adrenal gland (RAG) from \(44.7 \rightarrow 49.2\).
Classes with a Dice score of 0 on AMOS (such as the right and left adrenal glands, RAG/LAG, under GenericSSL and DHC) obtain non-zero Dice scores once SCDL is incorporated.

Highlights & Insights¶

Plug-and-play design: SCDL can be seamlessly integrated into various semi-supervised segmentation baselines (GenericSSL, DHC, GA-MagicNet, GA-CPS), consistently yielding improvements.
Representation-level debiasing: Unlike regularization at the loss or output layer, it directly learns the class-conditional distribution structure in the embedding space, targeting representation bias at the feature level.
Bi-directional alignment mechanism: E2P + P2E complement each other, pushing embeddings close to the correct proxies while making the proxies discriminative.
Semantic anchor guidance: Employs labeled data to provide reliable semantic supervision for proxy learning, preventing proxy divergence.
Significant gain in tail classes: With extremely scarce labeled data (5%), it enables classes that previously failed completely (Dice of 0) to obtain non-zero segmentations.

Limitations & Future Work¶

The proxy distribution is assumed to be a diagonal Gaussian, which may not capture complex multi-modal class-conditional distributions.
The semantic anchors in SAC rely on labeled data, so anchor quality may be unstable when annotations are extremely scarce.
The concatenation and injection of the three priors (distribution-weighted, center similarity, and token sampling) is relatively naive and lacks adaptive fusion.
Evaluated only on CT datasets; the applicability to other modalities such as MRI has not been explored.
Sensitivity analysis on hyperparameters, such as the proxy sampling number \(S\) and projection dimension, is not fully presented.

Rating¶

Novelty: 4/5 — Class-conditional proxy distribution modeling + bi-directional alignment is a novel and insightful design in semi-supervised segmentation.
Experimental Thoroughness: 4/5 — Evaluated on two datasets, integrated with multiple baselines, and class-wise analysis provided, though a detailed breakdown of the ablation studies is partially lacking.
Writing Quality: 4/5 — Clear motivation, rigorous method description, and intuitive illustrations.
Value: 4/5 — The plug-and-play module is of direct practical value for real-world medical segmentation scenarios.