Developing Foundation Models for Universal Segmentation from 3D Whole-Body Positron Emission Tomography¶
Conference: CVPR2025
arXiv: 2603.11627
Code: To be confirmed
Area: Medical Imaging
Keywords: PET segmentation, foundation models, universal segmentation, prompt-driven, 3D medical imaging
TL;DR¶
The authors construct PETWB-Seg11K, the largest PET segmentation dataset (11,041 whole-body PET cases with 59,831 segmentation masks), and propose SegAnyPET, a foundation model for universal PET segmentation that combines a fully 3D architecture with prompt-driven interaction. It shows strong zero-shot generalization across multi-center, multi-tracer, and multi-disease scenarios.
Background & Motivation¶
- PET is a crucial imaging modality in nuclear medicine, revealing metabolic processes in the body through radioactive tracer distribution, and is irreplaceable in oncology and neurology.
- Accurate organ/lesion segmentation in PET is essential for quantitative analysis, but PET inherently lacks anatomical boundary contrast, making manual annotation time-consuming and inconsistent.
- Breakthroughs in deep learning-based segmentation have mainly focused on CT/MRI, while PET lags far behind due to the high cost of data acquisition and annotation.
- Existing public PET segmentation datasets are limited to specific tumor tasks and lack whole-body multi-organ coverage; task-specific models fail to generalize to new targets.
- Adaptations of foundation models such as SAM to the medical field primarily target CT, MRI, and microscopy images. They ignore the diffuse metabolic signal and low resolution of PET, so they transfer poorly when applied to PET directly.
- Design Motivation: The PET field urgently requires a dedicated large-scale dataset and a foundation model to achieve universal, promptable organ/lesion segmentation.
Method¶
Dataset: PETWB-Seg11K¶
- A total of 11,041 whole-body 3D PET scans + 59,831 segmentation masks, curated from 2 public datasets (AutoPET, UDPET) + 3 private cohorts.
- Covers multi-center, multi-scanner, and multi-disease types, exhibiting real-world variability in scanner manufacturers, acquisition protocols, slice thicknesses, etc.
- Validation set design: internal validation (unseen cases from the same centers) + external validation (independent centers + unseen cancer types + different tracers like PSMA-PET).
Model Architecture: SegAnyPET¶
Adopts a SAM-style universal promptable segmentation design, extended to a fully 3D architecture:
- Image Encoder: Extract discrete 3D feature embeddings from the input PET volume.
- Prompt Encoder: Transform user inputs (sparse prompts like clicks, dense prompts like coarse masks) into prompt embeddings via fixed positional encoding + adaptive embedding layers.
- Mask Decoder: Fuse image and prompt features, upsample, and generate the final segmentation output via an MLP.
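As a rough illustration of the "fixed positional encoding + adaptive embedding" idea in the prompt encoder, the sketch below encodes a single 3D click with random Fourier features (the scheme used by the original SAM) and adds a per-label vector. This is a minimal sketch under assumptions: the source does not specify SegAnyPET's encoding, and `make_frequency_matrix`, `encode_point`, `POSITIVE`, and `NEGATIVE` are hypothetical names with placeholder values.

```python
import math
import random


def make_frequency_matrix(num_feats, seed=0, scale=1.0):
    """Fixed (non-learned) Gaussian projection for 3D coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(num_feats)] for _ in range(3)]


def encode_point(point_zyx, volume_shape, freq, label_embedding):
    """Encode one click as [sin, cos] Fourier features plus a label vector."""
    # Normalize voxel indices to [-1, 1] so the encoding does not depend
    # on the volume resolution.
    coords = [2.0 * (p + 0.5) / s - 1.0 for p, s in zip(point_zyx, volume_shape)]
    num_feats = len(freq[0])
    proj = [
        2.0 * math.pi * sum(coords[d] * freq[d][k] for d in range(3))
        for k in range(num_feats)
    ]
    pos = [math.sin(v) for v in proj] + [math.cos(v) for v in proj]
    # The label vector stands in for the learned ("adaptive") embedding.
    return [p + e for p, e in zip(pos, label_embedding)]


FREQ = make_frequency_matrix(num_feats=4)
POSITIVE = [0.1] * 8   # hypothetical learned vector for a foreground click
NEGATIVE = [-0.1] * 8  # hypothetical learned vector for a background click

embedding = encode_point((10, 64, 64), (128, 128, 128), FREQ, POSITIVE)
```

In a real model the label vectors would be trained parameters and the resulting embedding would be passed to the mask decoder together with the image features.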
Key Designs¶
- Point prompts give fast, efficient 3D interaction; mask prompts support iterative refinement, enabling a human-in-the-loop clinical workflow.
- Two variants: SegAnyPET (general-purpose, trained on the full dataset) and SegAnyPET-Lesion (lesion-specialized, fine-tuned on lesion data).
- Essential difference from task-specific models: No need for re-labeling and re-training for new targets; region of interest is dynamically specified via prompts.
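The iterative refinement loop can be pictured as repeatedly clicking inside the region where the current prediction disagrees with the intended target. The sketch below simulates that choice, a common practice when training or evaluating interactive segmentation models; whether SegAnyPET uses exactly this strategy is an assumption, and `next_click` is a hypothetical helper. Masks are represented as sets of `(z, y, x)` voxel coordinates to keep the example dependency-free.

```python
def next_click(pred, truth):
    """Pick the next simulated click from the pred/truth disagreement.

    pred and truth are sets of (z, y, x) voxels labelled as foreground.
    Returns (voxel, is_positive), or None when the masks already agree.
    """
    false_neg = truth - pred  # missed target voxels -> positive click
    false_pos = pred - truth  # leaked voxels -> negative click
    if not false_neg and not false_pos:
        return None
    # Correct the larger error region first.
    if len(false_neg) >= len(false_pos):
        region, is_positive = false_neg, True
    else:
        region, is_positive = false_pos, False
    # Click the voxel closest to the centroid of the error region;
    # sorting makes tie-breaking deterministic.
    centroid = [sum(v[d] for v in region) / len(region) for d in range(3)]
    voxel = min(
        sorted(region),
        key=lambda v: sum((v[d] - centroid[d]) ** 2 for d in range(3)),
    )
    return voxel, is_positive


truth = {(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3)}
pred = {(0, 0, 0)}
click = next_click(pred, truth)
```

Here the prediction misses three voxels, so the helper proposes a positive click at the middle of the missed region.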
Loss & Training¶
- The paper does not give a detailed loss formulation; it presumably follows the standard segmentation loss of the SAM series (Dice + BCE).
- Trained on the large-scale heterogeneous PETWB-Seg11K dataset, covering multi-manufacturer, multi-protocol, and multi-disease distributions.
- SegAnyPET-Lesion is obtained by fine-tuning SegAnyPET on lesion-centric data, improving the sensitivity and boundary accuracy of small, heterogeneous lesions.
- All task-specific baselines are implemented within the standardized nnUNet framework, using its automatic preprocessing, resampling, and augmentation pipelines to ensure a fair comparison.
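For readers unfamiliar with the Dice + BCE combination mentioned above, here is a minimal sketch over flattened voxel probabilities. The equal weighting, the smoothing constant `eps`, and the function name `dice_bce_loss` are assumptions, since the source does not state the exact formulation.

```python
import math


def dice_bce_loss(probs, targets, eps=1e-6):
    """Soft Dice loss plus mean binary cross-entropy over flattened voxels."""
    assert probs and len(probs) == len(targets)
    # Soft Dice: overlap between predicted probabilities and binary targets.
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    dice_loss = 1.0 - (2.0 * inter + eps) / (denom + eps)
    # Binary cross-entropy, averaged over voxels.
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return dice_loss + bce / len(probs)


perfect = dice_bce_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
uncertain = dice_bce_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

The Dice term counteracts the foreground/background imbalance typical of small lesions, while the BCE term gives a well-behaved per-voxel gradient.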
Key Experimental Results¶
Comparison with Task-Specific Models (Internal Validation, Organ Segmentation)¶
- Without task-specific training, SegAnyPET achieves comparable or superior performance compared to dedicated models such as nnUNet, STUNet, SwinUNETR, and SegResNet.
- nnUNet remains the strongest baseline, but a single SegAnyPET model can replace multiple task-specific networks.
Comparison with Segmentation Foundation Models¶
- Universal foundation models such as SAM-Med3D, SegVol, SAT, nnInteractive, and VISTA3D perform poorly on PET.
- Text prompt models (e.g., SAT) yield DSC close to zero on PET organ segmentation, which suggests that their cross-modal alignment is severely overfitted to CT anatomical structures.
- Point prompt models (such as SAM-Med3D) are slightly better but still insufficient.
- SegAnyPET consistently outperforms state-of-the-art (SOTA) foundation models across all evaluated tasks.
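For reference, the Dice similarity coefficient (DSC) used in these comparisons is the standard overlap measure between a predicted mask and a ground-truth mask. The symbols \(P\) (predicted voxel set) and \(G\) (ground-truth voxel set) are notation introduced here, not taken from the paper:

\[
\mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}
\]

A DSC of 1 means perfect overlap and a DSC of 0 means no shared voxels, so a DSC close to zero indicates that the prediction barely intersects the target organ.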
Generalization Capability (External Validation)¶
- Unseen cancer types / independent centers: Robust generalization.
- PET/MRI (different attenuation correction physics): Maintains reliable segmentation.
- PSMA-PET (entirely new tracer): Successful cross-tracer generalization.
Clinical Utility¶
- Annotation efficiency: Compared to purely manual delineation, the SegAnyPET-assisted interactive workflow saves 82.37% and 82.95% of annotation time for two experts, respectively.
- Whole-body metabolic covariance network: Segmentation results can be directly used for downstream inter-organ metabolic network analysis, showing high biological fidelity.
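One plausible way to build an inter-organ metabolic covariance network from segmentation output is to compute the mean uptake per organ for each subject and then correlate organs across subjects. The sketch below does this with Pearson correlation; the choice of Pearson correlation is an assumption, the helper names are hypothetical, and the uptake values are made-up illustrative numbers, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def covariance_network(uptake):
    """Map each unordered organ pair to its across-subject correlation."""
    organs = sorted(uptake)
    return {
        (a, b): pearson(uptake[a], uptake[b])
        for i, a in enumerate(organs)
        for b in organs[i + 1:]
    }


# Illustrative mean uptake per organ for four hypothetical subjects.
uptake = {
    "liver": [2.1, 2.4, 2.0, 2.6],
    "spleen": [1.8, 2.0, 1.7, 2.2],
    "brain": [7.5, 6.9, 7.8, 6.5],
}
network = covariance_network(uptake)
```

Each entry of `network` is an edge weight between two organs; in practice the per-organ values would come from averaging PET intensities inside each predicted mask.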
Highlights & Insights¶
- First PET-dedicated foundation model: Fills the gap in segmentation foundation models for functional imaging.
- Largest PET segmentation dataset: Over 11K whole-body PET scans, far exceeding existing datasets in both scale and diversity.
- Strong zero-shot generalization: Effective across different centers, tracers, and disease types.
- Clinical practicality: >82% annotation time savings + human-in-the-loop iterative refinement design.
- Scalability: Incorporates new organs/lesions into the analysis pipeline via prompting without requiring retraining.
- Two-variant design: General-purpose SegAnyPET offers broad coverage, while lesion-specific SegAnyPET-Lesion is optimized for small, heterogeneous lesions, allowing users to choose according to their needs.
Limitations & Future Work¶
- Data for rare diseases, uncommon tracers, and specific anatomical regions remain insufficient.
- There is still room for improvement in quantitative metrics for lesion segmentation, especially for small and disseminated lesions.
- The point prompt paradigm is inefficient for disseminated disease (e.g., multiple lymphoma lesions), because each lesion must be clicked individually.
- Text prompts are not supported; multimodal vision-language PET foundation models are a future direction.
- Private data accounts for a large proportion, limiting external reproducibility.
- Current evaluations rely mainly on volumetric metrics like DSC, lacking boundary accuracy assessments such as surface distance.
Related Work & Insights¶
- CT/MRI segmentation foundation models: SAM \(\rightarrow\) MedSAM \(\rightarrow\) SAM-Med3D / SegVol / VISTA3D; however, these are trained on structural imaging and generalize poorly to PET.
- PET segmentation datasets: Public datasets like AutoPET and UDPET have limited scale and focus on single tasks.
- Task-specific PET segmentation: Frameworks like nnUNet remain strong baselines but are limited by a fixed label space.
- Universal segmentation: The success of SAM in natural images inspired adaptations in the medical domain, but the functional signal characteristics of PET cause direct transfer to fail.
- PET quantitative analysis: Traditional workflows rely on manual ROI delineation, which suffers from poor inter-observer consistency, highlighting an urgent need for automation.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ (First-of-its-kind PET-dedicated foundation model, representing a milestone in dataset scale)
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Internal and external validation + comparisons with multiple baselines + ablation studies + downstream applications + clinical annotation experiments)
- Writing Quality: ⭐⭐⭐⭐ (Clear and well-structured, with rich figures and tables, although some content is repetitive)
- Value: ⭐⭐⭐⭐⭐ (A significant step forward for PET imaging AI)