Developing Foundation Models for Universal Segmentation from 3D Whole-Body Positron Emission Tomography¶

Conference: CVPR2025
arXiv: 2603.11627
Code: To be confirmed
Area: Medical Imaging
Keywords: PET segmentation, foundation models, universal segmentation, prompt-driven, 3D medical imaging

TL;DR¶

The authors construct PETWB-Seg11K, the largest PET segmentation dataset (11,041 whole-body PET cases with 59,831 segmentation masks), and propose SegAnyPET, a prompt-driven foundation model for universal PET segmentation built on a fully 3D architecture. The model demonstrates strong zero-shot generalization across multi-center, multi-tracer, and multi-disease scenarios.

Background & Motivation¶

PET is a crucial imaging modality in nuclear medicine, revealing metabolic processes in the body through radioactive tracer distribution, and is irreplaceable in oncology and neurology.
Accurate organ/lesion segmentation in PET is essential for quantitative analysis, but PET inherently lacks anatomical boundary contrast, making manual annotation time-consuming and inconsistent.
Breakthroughs in deep learning-based segmentation have mainly focused on CT/MRI, while PET lags far behind due to the high cost of data acquisition and annotation.
Existing public PET segmentation datasets are limited to specific tumor tasks and lack whole-body multi-organ coverage; task-specific models fail to generalize to new targets.
Adaptations of foundation models such as SAM to the medical domain primarily target CT, MRI, and microscopy images. They ignore the diffuse metabolic signals and low resolution characteristic of PET, so they perform poorly when transferred directly.
Design Motivation: The PET field urgently requires a dedicated large-scale dataset and a foundation model to achieve universal, promptable organ/lesion segmentation.

Method¶

Dataset: PETWB-Seg11K¶

A total of 11,041 whole-body 3D PET scans + 59,831 segmentation masks, curated from 2 public datasets (AutoPET, UDPET) + 3 private cohorts.
Covers multi-center, multi-scanner, and multi-disease types, exhibiting real-world variability in scanner manufacturers, acquisition protocols, slice thicknesses, etc.
Validation set design: internal validation (unseen cases from the same centers) + external validation (independent centers + unseen cancer types + different tracers like PSMA-PET).

Model Architecture: SegAnyPET¶

Adopts a SAM-style universal promptable segmentation design, extended to a fully 3D architecture:

Image Encoder: Extracts discrete 3D feature embeddings from the input PET volume.
Prompt Encoder: Transforms user inputs (sparse prompts such as clicks, dense prompts such as coarse masks) into prompt embeddings via fixed positional encoding plus adaptive embedding layers.
Mask Decoder: Fuses image and prompt features, upsamples them, and generates the final segmentation output via an MLP.
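
To make the prompt-encoder step concrete, the following is a minimal sketch of how a single 3D click could be turned into an embedding. It assumes a sinusoidal scheme for the "fixed positional encoding" and represents the "adaptive embedding layer" as a plain vector added on top. Neither choice is confirmed by this summary, and the names `encode_point_prompt`, `label_embedding`, and `num_freqs` are hypothetical.

```python
import math


def encode_point_prompt(point_zyx, volume_shape, label_embedding, num_freqs=2):
    """Encode one 3D click as fixed sinusoidal features plus a label offset.

    point_zyx: click position in voxel indices (z, y, x).
    volume_shape: (depth, height, width) of the PET volume.
    label_embedding: stand-in for a learned vector that marks the click as
        foreground or background (assumption: one vector per click type).
    """
    features = []
    for coord, size in zip(point_zyx, volume_shape):
        u = coord / max(size - 1, 1)  # normalize the coordinate to [0, 1]
        for k in range(num_freqs):
            angle = (2 ** k) * math.pi * u
            features.extend([math.sin(angle), math.cos(angle)])
    if len(label_embedding) != len(features):
        raise ValueError("label embedding must match the positional feature length")
    # The fixed part encodes "where"; the learned part encodes "what kind of click".
    return [f + e for f, e in zip(features, label_embedding)]


# Usage: a click at the volume origin with an all-zero placeholder label vector.
embedding = encode_point_prompt((0, 0, 0), (4, 4, 4), [0.0] * 12)
```

With three axes and `num_freqs=2`, the embedding has 3 × 2 × 2 = 12 entries. A real model would use a much higher dimension and feed the result to the mask decoder alongside the image features.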

Key Designs¶

Prompt types: Point prompts give fast, efficient 3D interaction, while mask prompts support iterative refinement, enabling a human-in-the-loop clinical workflow.
Two variants: SegAnyPET (general-purpose, trained on the full dataset) and SegAnyPET-Lesion (lesion-specialized, fine-tuned on lesion data).
Essential difference from task-specific models: No need for re-labeling and re-training for new targets; region of interest is dynamically specified via prompts.
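
The human-in-the-loop refinement described in the list above can be sketched as a simple loop. This is an illustration under assumptions, not the paper's protocol: clicks are simulated from the disagreement between the current mask and a reference mask (a common way to emulate a user in interactive-segmentation experiments), masks are sets of `(z, y, x)` voxel tuples, and `segment_fn` is a hypothetical callable standing in for the model.

```python
def next_click(pred, ref):
    """Pick a corrective click from the disagreement between prediction and reference.

    pred, ref: sets of (z, y, x) voxel tuples.
    Returns (voxel, is_positive), or None when the masks already agree.
    """
    missed = ref - pred    # false negatives call for a positive click
    spurious = pred - ref  # false positives call for a negative click
    if not missed and not spurious:
        return None
    if len(missed) >= len(spurious):
        return min(missed), True
    return min(spurious), False


def refine(segment_fn, ref, max_rounds=5):
    """Each round passes all clicks so far plus the previous mask back to the model."""
    clicks, mask = [], set()
    for _ in range(max_rounds):
        click = next_click(mask, ref)
        if click is None:
            break
        clicks.append(click)
        mask = segment_fn(clicks, mask)
    return mask, clicks


def toy_segment(clicks, previous_mask):
    """Toy stand-in for the model: adds positive clicks and removes negative ones."""
    positives = {voxel for voxel, is_positive in clicks if is_positive}
    negatives = {voxel for voxel, is_positive in clicks if not is_positive}
    return (previous_mask | positives) - negatives


reference = {(0, 0, 0), (0, 0, 1)}
final_mask, used_clicks = refine(toy_segment, reference)
```

In clinical use the reference mask does not exist; the clinician's judgment plays that role and decides where the next click goes.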

Loss & Training¶

The paper does not provide detailed loss formulations; training follows the standard segmentation loss of SAM-style models (Dice + BCE).
Trained on the large-scale heterogeneous PETWB-Seg11K dataset, covering multi-manufacturer, multi-protocol, and multi-disease distributions.
SegAnyPET-Lesion is obtained by fine-tuning SegAnyPET on lesion-centric data, improving the sensitivity and boundary accuracy of small, heterogeneous lesions.
All task-specific baselines are implemented within the standardized nnUNet framework, using its automatic preprocessing, resampling, and augmentation pipelines to ensure a fair comparison.
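
Since the exact loss is not spelled out, the Dice + BCE combination mentioned above can be illustrated with a minimal sketch. It assumes equal weighting of the two terms and a soft Dice computed on predicted probabilities; both are assumptions, as are the function name `dice_bce_loss` and the smoothing constant `eps`.

```python
import math


def dice_bce_loss(probs, targets, eps=1e-6):
    """Soft Dice loss plus mean binary cross-entropy over flattened voxels.

    probs: predicted foreground probabilities in [0, 1].
    targets: ground-truth labels (0 or 1), same length as probs.
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    # Clamp probabilities away from 0 so the logarithm stays finite.
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / len(probs)
    return (1 - dice) + bce


targets = [1, 0, 1, 0]
perfect = dice_bce_loss([1.0, 0.0, 1.0, 0.0], targets)
uncertain = dice_bce_loss([0.5, 0.5, 0.5, 0.5], targets)
```

The Dice term rewards overlap regardless of target size, which matters for small lesions, while the BCE term gives a per-voxel gradient signal; this is the usual motivation for combining them.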

Key Experimental Results¶

Comparison with Task-Specific Models (Internal Validation, Organ Segmentation)¶

Without task-specific training, SegAnyPET achieves comparable or superior performance compared to dedicated models such as nnUNet, STUNet, SwinUNETR, and SegResNet.
nnUNet remains the strongest baseline, but a single SegAnyPET model can replace multiple task-specific networks.

Comparison with Segmentation Foundation Models¶

Universal foundation models such as SAM-Med3D, SegVol, SAT, nnInteractive, and VISTA3D perform poorly on PET.
Text-prompt models (e.g., SAT) yield a Dice similarity coefficient (DSC) close to zero on PET organ segmentation, which suggests that their cross-modal alignment is severely overfitted to CT anatomical structures.
Point prompt models (such as SAM-Med3D) are slightly better but still insufficient.
SegAnyPET consistently outperforms state-of-the-art (SOTA) foundation models across all evaluated tasks.

Generalization Capability (External Validation)¶

Unseen cancer types / independent centers: Robust generalization.
PET/MRI (different attenuation correction physics): Maintains reliable segmentation.
PSMA-PET (entirely new tracer): Successful cross-tracer generalization.

Clinical Utility¶

Annotation efficiency: Compared to purely manual delineation, the SegAnyPET-assisted interactive workflow saves 82.37% and 82.95% of annotation time for two experts, respectively.
Whole-body metabolic covariance network: Segmentation results can be directly used for downstream inter-organ metabolic network analysis, showing high biological fidelity.
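
As an illustration of the downstream analysis mentioned above, the sketch below builds a small inter-organ network from segmentation output. It assumes the common construction in which each organ's mean tracer uptake is computed per subject and organs are then linked by the Pearson correlation of those values across subjects; the paper's exact procedure is not given in this summary, and the function names and toy numbers are hypothetical.

```python
import math


def mean_uptake(volume, mask):
    """Average voxel intensity inside a mask; both are flat lists of equal length."""
    values = [v for v, m in zip(volume, mask) if m]
    if not values:
        raise ValueError("mask is empty")
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def covariance_network(uptake_by_organ):
    """uptake_by_organ: {organ: [mean uptake per subject]} -> {(organ_a, organ_b): r}."""
    organs = sorted(uptake_by_organ)
    return {
        (a, b): pearson(uptake_by_organ[a], uptake_by_organ[b])
        for i, a in enumerate(organs)
        for b in organs[i + 1:]
    }


# Toy data: three subjects, made-up uptake values per organ.
network = covariance_network({
    "liver": [1.0, 2.0, 3.0],
    "spleen": [2.0, 4.0, 6.0],
    "brain": [3.0, 2.0, 1.0],
})
```

Each edge weight is a correlation in [-1, 1]; in practice the per-organ values would come from applying `mean_uptake` to each subject's PET volume and predicted organ mask.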

Highlights & Insights¶

First PET-dedicated foundation model: Fills the gap in segmentation foundation models for functional imaging.
Largest PET segmentation dataset: Over 11K whole-body PET scans, far exceeding existing datasets in both scale and diversity.
Strong zero-shot generalization: Effective across different centers, tracers, and disease types.
Clinical practicality: >82% annotation time savings + human-in-the-loop iterative refinement design.
Scalability: Incorporates new organs/lesions into the analysis pipeline via prompting without requiring retraining.
Two-variant design: General-purpose SegAnyPET offers broad coverage, while lesion-specific SegAnyPET-Lesion is optimized for small, heterogeneous lesions, allowing users to choose according to their needs.

Limitations & Future Work¶

Data for rare diseases, uncommon tracers, and specific anatomical regions remain insufficient.
There is still room for improvement in quantitative metrics for lesion segmentation, especially for small and disseminated lesions.
The point prompt paradigm has limited efficiency for diffuse lesions (e.g., multiple lymphoma lesions), requiring clicks on individual lesions.
Text prompts are not supported; multimodal vision-language PET foundation models are a future direction.
Private data accounts for a large proportion, limiting external reproducibility.
Current evaluations rely mainly on volumetric metrics like DSC, lacking boundary accuracy assessments such as surface distance.
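
The distinction drawn in the last point, between volumetric overlap and boundary accuracy, can be made concrete with a small sketch. It computes the DSC and a symmetric Hausdorff distance between mask surfaces on voxel sets. The Hausdorff distance is one common surface-distance metric chosen here for illustration; the summary does not say which boundary metric would be appropriate, and the brute-force search is only suitable for tiny examples.

```python
import math

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice similarity coefficient between two sets of (z, y, x) voxels."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels that have at least one 6-connected neighbour outside the mask."""
    return {
        v for v in mask
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask for dz, dy, dx in NEIGHBOURS)
    }


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between the surfaces of two non-empty masks."""
    sa, sb = surface(a), surface(b)

    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    def directed(src, dst):
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(sa, sb), directed(sb, sa))


pred = {(0, 0, 0), (0, 0, 1)}
ref = {(0, 0, 1), (0, 0, 2)}
overlap = dice(pred, ref)        # 0.5
boundary = hausdorff(pred, ref)  # 1.0 voxel
```

Passing the scan's voxel spacing in millimetres as `spacing` gives the distance in physical units, which matters for PET volumes with anisotropic slice thickness.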

Related Work¶

CT/MRI segmentation foundation models: SAM → MedSAM → SAM-Med3D / SegVol / VISTA3D; however, these are trained on structural imaging and generalize poorly to PET.
PET segmentation datasets: Public datasets like AutoPET and UDPET have limited scale and focus on single tasks.
Task-specific PET segmentation: Frameworks like nnUNet remain strong baselines but are limited by a fixed label space.
Universal segmentation: The success of SAM in natural images inspired adaptations in the medical domain, but the functional signal characteristics of PET cause direct transfer to fail.
PET quantitative analysis: Traditional workflows rely on manual ROI delineation, which suffers from poor inter-observer consistency, highlighting an urgent need for automation.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ (First-of-its-kind PET-dedicated foundation model, representing a milestone in dataset scale)
Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Internal and external validation + comparisons with multiple baselines + ablation studies + downstream applications + clinical annotation experiments)
Writing Quality: ⭐⭐⭐⭐ (Clear and well-structured, with rich figures and tables, although some content is repetitive)
Value: ⭐⭐⭐⭐⭐ (Significant driving force for the field of PET imaging AI)