SemiTooth: a Generalizable Semi-supervised Framework for Multi-Source Tooth Segmentation
Conference: CVPR 2026 | arXiv: 2603.11616 | Code: N/A | Area: Medical Imaging | Keywords: tooth segmentation, CBCT, semi-supervised learning, multi-source data, multi-teacher multi-student
TL;DR
This paper proposes SemiTooth, a framework that addresses annotation scarcity and cross-source domain discrepancy in multi-source CBCT tooth segmentation via a multi-teacher multi-student architecture and Strict Weighted Confidence (SWC) constraints. It also introduces MS3Toothset, the first multi-source semi-supervised tooth segmentation dataset.
Background & Motivation
CBCT tooth structure segmentation is central to intelligent oral diagnosis, yet faces two major challenges:
Annotation scarcity: Voxel-level annotation is costly and time-consuming, leaving large quantities of de-identified CBCT data unutilized.
Multi-source domain discrepancy: CBCT data from different institutions and devices differ markedly in density distribution, intensity distribution, and feature space (shown via kernel density estimation and t-SNE visualization), which makes cross-source generalization difficult.
Existing semi-supervised medical segmentation methods (Mean Teacher, UA-MT, MCF, etc.) are designed primarily for single-source data and lack a mechanism for cross-source knowledge transfer. Multi-source methods such as ASDA and Dual-Teacher either rely on complex networks or strong supervision, or have not been validated on multi-source CBCT tooth data.
Method
Overall Architecture
SemiTooth adopts a multi-branch architecture with three students and two teachers. Data are organized into three subsets:
- Main: labeled data from the primary source
- Other: unlabeled data from other sources
- Mixed: unlabeled data with a distribution similar to the primary source, selected via Wasserstein distance
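A minimal sketch of how the Mixed subset could be selected, assuming each unlabeled volume is summarized by a flat list of voxel intensities and compared with the primary source's pooled intensities using the 1D first-order Wasserstein distance (W1). The summary does not say which features, Wasserstein order, or threshold the paper uses, so `wasserstein_1d`, `select_mixed`, and `threshold` are hypothetical names and choices.

```python
def wasserstein_1d(a, b):
    """Exact W1 distance between two empirical 1D distributions.

    Computed as the integral over x of |F_a(x) - F_b(x)|, where F is the
    empirical CDF. Both CDFs are step functions, so the integral is a sum
    over the intervals between consecutive sample values.
    """
    a, b = sorted(a), sorted(b)
    breakpoints = sorted(set(a) | set(b))
    ia = ib = 0
    total = 0.0
    for x0, x1 in zip(breakpoints, breakpoints[1:]):
        # Advance counters so that ia/len(a) = F_a(x0) and ib/len(b) = F_b(x0).
        while ia < len(a) and a[ia] <= x0:
            ia += 1
        while ib < len(b) and b[ib] <= x0:
            ib += 1
        # Both CDFs are constant on [x0, x1).
        total += abs(ia / len(a) - ib / len(b)) * (x1 - x0)
    return total


def select_mixed(candidates, reference, threshold):
    """Hypothetical selection rule: keep unlabeled volumes whose intensity
    distribution lies within `threshold` (W1) of the primary-source
    reference intensities, returned closest first."""
    scored = sorted(
        (wasserstein_1d(values, reference), name) for name, values in candidates.items()
    )
    return [name for dist, name in scored if dist <= threshold]


mixed = select_mixed({"near": [0, 1, 2], "far": [10, 11, 12]}, [0, 1, 2], threshold=0.5)
```

For real volumes, `scipy.stats.wasserstein_distance(u_values, v_values)` computes the same 1D quantity and also accepts per-value weights, so histogram bin centers and counts could be compared instead of raw voxels.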
Each subset is processed by a corresponding student network. Two teachers, each updated as an exponential moving average (EMA) of student weights, supervise the Mixed and Other students respectively.
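The EMA teacher update can be sketched as follows, with plain floats standing in for network tensors. The rule itself is the standard Mean Teacher form, $\theta^T \leftarrow \alpha\,\theta^T + (1-\alpha)\,\theta^S$; the decay `alpha = 0.9` and the hypothetical mapping `SOURCE_STUDENT` from each teacher to the student whose weights it averages are assumptions, since the summary specifies neither.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher params after one EMA step: alpha * teacher + (1 - alpha) * student.

    Parameters are dicts of floats standing in for network tensors.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}


# Hypothetical wiring: which student each teacher averages is an assumption
# (it could equally be the Main student); alpha = 0.9 is illustrative only.
SOURCE_STUDENT = {"mixed_teacher": "mixed", "other_teacher": "other"}

students = {"main": {"w": 0.5}, "mixed": {"w": 0.0}, "other": {"w": 2.0}}
teachers = {"mixed_teacher": {"w": 1.0}, "other_teacher": {"w": 1.0}}

for t_name, s_name in SOURCE_STUDENT.items():
    teachers[t_name] = ema_update(teachers[t_name], students[s_name], alpha=0.9)
```

In a deep-learning framework the same rule would be applied in place to every parameter tensor without tracking gradients; the teachers receive no gradient updates of their own.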
Key Designs¶
**Multi-Teacher Multi-Student Architecture (SemiTooth):**
- Compared to Mean Teacher (single teacher–single student): SemiTooth assigns dedicated students to data from different sources, with teachers providing cross-source knowledge guidance.
- Compared to Co-training (multi-student with shared weights, no teacher): SemiTooth provides stable pseudo-labels via EMA teachers.
- The Mixed subset serves as an inter-source bridge, containing unlabeled samples with distributions similar to the primary source, alleviating training instability from direct cross-source learning.
- Student networks share similar architectures to facilitate knowledge transfer while maintaining sufficient diversity.
**Strict Weighted Confidence Constraint (SWC):**
SWC targets the noisy pseudo-labels arising from CBCT heterogeneity, which otherwise degrade consistency regularization. It combines region-level gating with voxel-level weighting:
- **Region-level gating**: Each volume is partitioned uniformly into non-overlapping cubic regions $\{r\}$. The confidence of region $r$ is the mean of the teacher's maximum class probability over its voxels, $c(r) = \mathbb{E}_{i \in r}[\max_k P^T_{i,k}]$. Regions with $c(r) < \tau$ ($\tau = 0.9$) are deemed unreliable and discarded; the retained set is $\mathcal{R}_\tau = \{r : c(r) \ge \tau\}$.
- **Voxel-level weighting**: Within retained regions, the voxel confidence $c_i = \max_k P^T_{i,k}$ weights the teacher–student alignment term $\mathcal{A}(P^S_i, P^T_i)$:
$\mathcal{SWC}(P^S, P^T) = \mathbb{E}_{r \in \mathcal{R}_\tau}\left[\mathbb{E}_{i \in r}\left[c_i \cdot \mathcal{A}(P^S_i, P^T_i)\right]\right]$
- This design is particularly suited to 3D CBCT data, as region-level filtering exploits the spatial continuity inherent in volumetric data.
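The SWC constraint can be sketched in pure Python on a small flattened volume. Assumptions: `p_student` and `p_teacher` hold per-voxel class-probability lists in z, y, x order; the alignment term $\mathcal{A}$ is taken to be the squared error summed over classes (a common Mean Teacher choice, not confirmed for this paper); and the default region edge length `region=4` is hypothetical. Only $\tau = 0.9$ comes from the summary.

```python
from collections import defaultdict
from itertools import product


def swc_loss(p_student, p_teacher, shape, region=4, tau=0.9):
    """Sketch of SWC: region-level gating followed by voxel-level weighting.

    p_student, p_teacher: per-voxel lists of class probabilities, flattened in
    z, y, x order. shape: (D, H, W). Edge regions may be partial cubes.
    """
    D, H, W = shape
    # Group flattened voxel indices into non-overlapping cubic regions.
    regions = defaultdict(list)
    for z, y, x in product(range(D), range(H), range(W)):
        idx = (z * H + y) * W + x
        regions[(z // region, y // region, x // region)].append(idx)

    per_region = []
    for voxels in regions.values():
        conf = [max(p_teacher[i]) for i in voxels]  # voxel confidence c_i
        if sum(conf) / len(conf) < tau:  # region confidence c(r) below tau: discard
            continue
        # Confidence-weighted alignment; A = squared error summed over classes (assumed).
        weighted = [
            c * sum((s - t) ** 2 for s, t in zip(p_student[i], p_teacher[i]))
            for c, i in zip(conf, voxels)
        ]
        per_region.append(sum(weighted) / len(weighted))  # inner expectation over i in r
    # Outer expectation over retained regions; no retained region gives zero loss.
    return sum(per_region) / len(per_region) if per_region else 0.0
```

Because gating uses the region mean, a region is dropped as a whole even if some of its voxels are individually confident; voxel weighting then only rescales the voxels that survive.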
**Multi-Source Dataset MS3Toothset:**
- Collected from three sources: ShanghaiTech (public, labeled), PKU-SS, and AFMC (private, unlabeled).
- After preprocessing and filtering: 98 labeled samples (20 for testing) and 438 unlabeled samples.
- The first comprehensive dataset for multi-source semi-supervised tooth segmentation.
Loss & Training
The total loss combines a supervised loss and two SWC consistency losses:
- $\mathcal{L}_{sup} = \text{CE}(P^S(x^l), y)$: supervised cross-entropy loss on labeled data from the primary source.
- $\mathcal{L}_{cons}^u$, $\mathcal{L}_{cons}^h$: SWC consistency losses for the Other and Mixed subsets, respectively.
- Backbone: V-Net; optimizer: Adam; learning rate: 0.0001; training: 300 epochs.
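The training setup can be summarized as a config plus a loss assembly. The backbone, optimizer, learning rate, and epoch count come from the summary; it does not say how the three loss terms are weighted, so the equal weights `lambda_other` and `lambda_mixed`, like all function names here, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values stated in the summary.
    backbone: str = "V-Net"
    optimizer: str = "Adam"
    lr: float = 1e-4
    epochs: int = 300
    # Hypothetical consistency weights; not reported in the summary.
    lambda_other: float = 1.0  # weight on L_cons^u (Other subset)
    lambda_mixed: float = 1.0  # weight on L_cons^h (Mixed subset)


def cross_entropy(probs, labels, eps=1e-8):
    """Mean voxel-wise cross-entropy.

    probs: per-voxel lists of class probabilities; labels: per-voxel class indices.
    """
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)


def total_loss(l_sup, l_cons_u, l_cons_h, cfg):
    """Weighted sum L_sup + lambda_u * L_cons^u + lambda_h * L_cons^h (form assumed)."""
    return l_sup + cfg.lambda_other * l_cons_u + cfg.lambda_mixed * l_cons_h


cfg = TrainConfig()
l_sup = cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1])
loss = total_loss(l_sup, 0.05, 0.02, cfg)
```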
Key Experimental Results
Main Results
| Method | mIoU | Dice | Recall | Acc |
|---|---|---|---|---|
| V-Net (fully supervised baseline) | 61.36 | 73.65 | 70.77 | 66.75 |
| Mean Teacher | 67.69 | 78.72 | 78.06 | 73.68 |
| UA-MT | 68.37 | 79.18 | 80.42 | 76.17 |
| ASDA | 73.75 | 83.63 | 80.93 | 78.79 |
| CMT | 76.14 | 85.07 | 87.14 | 84.32 |
| Uni-HSSL | 75.76 | 85.42 | 84.26 | 81.88 |
| SemiTooth | 76.67 | 85.69 | 88.66 | 86.44 |
Ablation Study
| Exp | Configuration | mIoU | Dice | Recall | Acc |
|---|---|---|---|---|---|
| 1 | V-Net | 61.36 | 73.65 | 70.77 | 66.75 |
| 2 | + Mean Teacher | 67.69 | 78.72 | 78.06 | 73.68 |
| 3 | + SWC (w/o SemiTooth) | 69.94 | 80.29 | 79.67 | 75.34 |
| 4 | + SemiTooth (w/o SWC) | 75.37 | 84.56 | 83.07 | 80.48 |
| 5 | + SemiTooth + SWC | 76.67 | 85.69 | 88.66 | 86.44 |
Key Findings
- The multi-branch SemiTooth architecture contributes the most (Exp 2→4: mIoU +7.68), demonstrating the importance of source-aware learning for multi-source data.
- SWC yields consistent improvements under both single-teacher (Exp 2→3: +2.25 mIoU) and multi-teacher (Exp 4→5: +1.30 mIoU) settings.
- SemiTooth's advantage is most pronounced on Recall (88.66 vs. 87.14 for the second-best method, CMT), which is clinically meaningful because higher recall means fewer missed tooth voxels.
- t-SNE visualizations confirm that SemiTooth effectively reduces feature distribution gaps across sources.
Highlights & Insights
- Dataset contribution: MS3Toothset fills a gap in multi-source semi-supervised tooth segmentation benchmarks.
- SWC's two-level design (region-level gating followed by voxel-level weighting) is intuitive and well suited to the spatial continuity of 3D medical data.
- Using Wasserstein distance to select distributionally similar samples for the Mixed subset as an inter-source bridge is a simple yet effective design choice.
- The ablation study is well-structured, clearly illustrating both the individual and joint contributions of each component.
Limitations & Future Work
- MS3Toothset is relatively small in scale (98 labeled + 438 unlabeled samples).
- Validation is limited to the authors' own dataset; generalizability to public standard benchmarks remains unknown.
- The teacher count and subset partitioning strategy depend on the choice of Wasserstein distance threshold, with no sensitivity analysis provided.
- Only V-Net is used as the backbone; effectiveness with stronger backbones (e.g., nnU-Net, Swin UNETR) is not evaluated.
- The region size and threshold $\tau$ in SWC are not sufficiently ablated.
Related Work & Insights
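- On MS3Toothset, SemiTooth outperforms recent multi-source methods such as CMT (ACM MM 2024) and Uni-HSSL (CVPR 2025) on all four metrics, although the mIoU and Dice margins are small (+0.53 mIoU over CMT, +0.27 Dice over Uni-HSSL).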
- The multi-teacher multi-student paradigm is generalizable to other multi-domain semi-supervised scenarios (e.g., multi-center CT/MRI analysis).
- The SWC constraint should be straightforward to integrate into other Mean Teacher-based frameworks.
Rating
- Novelty: ⭐⭐⭐ The multi-teacher multi-student and confidence-weighting ideas are not entirely new, but their systematic integration for tooth segmentation is valuable.
- Experimental Thoroughness: ⭐⭐⭐ Ablations are complete, but validation is limited to a single in-house dataset without external benchmarking.
- Writing Quality: ⭐⭐⭐ The framework is described clearly, though the paper is short (ICASSP length) and lacks detail in places.
- Value: ⭐⭐⭐ The dataset contribution has clinical value; generalizability of the method remains to be verified.