Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Conference: ECCV 2024
arXiv: 2311.15939
Code: https://github.com/windygoo/PromptNucSeg
Area: Medical Image Segmentation / Nucleus Instance Segmentation
Keywords: Nucleus Instance Segmentation, SAM, Prompt Learning, Negative Prompts, Histopathology

TL;DR

PromptNucSeg trains a prompter to automatically generate a center point prompt for each nucleus and fine-tunes SAM to segment nuclei one at a time. It tackles overlapping nuclei by feeding neighboring nuclei to SAM as negative prompts, and achieves SOTA performance on three benchmarks without complex post-processing.

Background & Motivation

Nucleus instance segmentation is a fundamental task in pathology image analysis, crucial for cancer diagnosis and treatment planning. Current mainstream bottom-up methods first regress nucleus proxy maps (e.g., distance maps, direction maps) and then group pixels into instances via complex post-processing (e.g., watershed). The core weakness of this paradigm is that the post-processing step requires meticulous parameter tuning and is sensitive to noise, which hinders practical use.

SAM (Segment Anything Model) has attracted widespread attention due to its powerful generalization capability and promptable nature, but its potential in nucleus segmentation has not been fully explored. Existing works either only reuse the SAM encoder to build better regression models (CellViT), which still require post-processing, or perform semantic segmentation in a one-prompt-all-nuclei manner (SPPNet), which lacks instance details and relies on manual prompts.

Core Idea: Adopt a one-prompt-one-nucleus paradigm: train a prompter to automatically generate a unique point prompt for each nucleus, and fine-tune SAM to output the corresponding mask, bypassing the complex post-processing of bottom-up methods. Neighboring nuclei are introduced as negative prompts to address the over-segmentation of overlapping nuclei.

Method

Overall Architecture

PromptNucSeg consists of two independently trained models:
- Nucleus Prompter: Automatically predicts the center point coordinates and category of each nucleus from the input image.
- Segmentor (Fine-tuned SAM): Receives the point prompts generated by the prompter and outputs the corresponding segmentation mask for each nucleus.

Inference pipeline: Input image \(\rightarrow\) Prompter predicts center point prompts for all nuclei \(\rightarrow\) Auxiliary branch filters false positive prompts \(\rightarrow\) SAM generates masks nucleus-by-nucleus \(\rightarrow\) NMS de-duplication \(\rightarrow\) Final instance segmentation results.

Key Designs

Adapt SAM to Nucleus Segmentation:
- Function: Fine-tune SAM using nucleus instance segmentation data to adapt it to the medical image domain.
- Mechanism: For each image-label pair, randomly select \(Z\) nucleus instances, randomly sample a positive point prompt \(p_z\) from the foreground area of each instance, and fine-tune SAM to predict the mask of the nucleus.
- Forward pass: \(\widetilde{\mathcal{O}}_z = \mathcal{M}(\mathcal{F}(x), \mathcal{P}(\{p_z\}), [\text{mask}], [\text{IoU}])\)
- Loss: \(\mathcal{L}_{sam} = \omega \cdot \text{FL}(\widetilde{\mathcal{O}}_z, \mathcal{O}_z) + \text{DL}(\widetilde{\mathcal{O}}_z, \mathcal{O}_z) + \text{MSE}(\widetilde{\nu}, \nu)\), consisting of focal loss, dice loss, and IoU regression loss.
- Freeze the prompt encoder, update the image encoder and mask decoder.
- Reduce the input resolution from 1024×1024 to 256×256, significantly reducing GPU memory.
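
The segmentor loss above can be illustrated with a minimal pure-Python sketch on flattened masks. It is not the authors' implementation: the focusing parameter `gamma`, the default `omega=20.0`, the 0.5 binarization threshold, and the choice to regress the IoU between the thresholded prediction and the ground truth are all assumptions made for illustration.

```python
import math

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flattened pixels (no class balancing)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p        # probability given to the true class
        p_t = min(max(p_t, eps), 1.0 - eps)   # clamp to keep log() finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

def dice_loss(probs, targets, eps=1e-7):
    """Soft dice loss: 1 - 2|A∩B| / (|A| + |B|)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def mask_iou(probs, targets, thresh=0.5):
    """IoU between the thresholded prediction and the binary ground truth."""
    pred = [1 if p > thresh else 0 for p in probs]
    inter = sum(a & b for a, b in zip(pred, targets))
    union = sum(a | b for a, b in zip(pred, targets))
    return inter / union if union else 0.0

def sam_loss(probs, targets, predicted_iou, omega=20.0):
    """omega * FL + DL + MSE(predicted IoU, actual IoU); omega is a placeholder."""
    actual_iou = mask_iou(probs, targets)
    mse = (predicted_iou - actual_iou) ** 2
    return omega * focal_loss(probs, targets) + dice_loss(probs, targets) + mse
```

Here `probs` stands for the sigmoid output of the mask decoder for one nucleus and `targets` for its ground-truth mask as 0/1 integers; a real training loop would compute the same three terms on tensors.
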
Nucleus Prompter (Automatic prompt generation):
- Function: Automatically predict the center coordinates and category of each nucleus, replacing manual prompts.
- Mechanism: Inspired by P2PNet, uniform anchor points (spaced by \(\lambda\) pixels) are placed on the input image. Multi-scale features are extracted via a feature pyramid, and an MLP predicts the offset \(\delta_i\) and classification logit \(q_i \in \mathbb{R}^{C+1}\) for each anchor.
- Matching strategy: A one-to-one mapping from anchors to ground-truth nucleus centers is established using bipartite maximum weight matching (Hungarian algorithm), where the weight is defined as \(w_{i,j} = q_i(c_j) - \alpha \|\hat{a}_i - b_j\|_2\), taking both classification confidence and spatial distance into account.
- Loss: \(\mathcal{L}_{prompter} = \mathcal{L}_{reg} + \mathcal{L}_{cls} + \mathcal{L}_{aux}\)
  - Classification loss: \(\mathcal{L}_{cls} = -\frac{1}{M}(\sum_{i=1}^N \log q_{\sigma(i)}(c_i) + \beta \sum_{a_i \in \mathcal{A}'} \log q_i(\varnothing))\)
  - Regression loss: \(\mathcal{L}_{reg} = \frac{\gamma}{N} \sum_{i=1}^N \|\hat{a}_{\sigma(i)} - b_i\|_2\)
- Design Motivation: Point prompts are easier to locate than bounding boxes and can separate touching targets more accurately.
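
The anchor layout, matching weight, and the two losses above can be traced on a toy example. The sketch below is a hedged illustration, not the paper's code. It assumes that \(M\) is the number of anchors, \(N\) the number of ground-truth nuclei, \(\sigma(i)\) the anchor matched to nucleus \(i\), and \(\mathcal{A}'\) the unmatched anchors. It also assumes anchors sit at cell centers, treats \(q_i\) as softmax probabilities, uses placeholder values for \(\alpha\), \(\beta\), \(\gamma\), and replaces the Hungarian algorithm with an exhaustive search that is only viable for tiny inputs.

```python
import itertools
import math

def make_anchors(height, width, stride):
    """Uniform grid of (x, y) anchors, one at the centre of each stride x stride cell."""
    half = stride / 2.0
    return [(x + half, y + half)
            for y in range(0, height, stride)
            for x in range(0, width, stride)]

def match(pred_points, class_probs, gt_points, gt_classes, alpha=0.05):
    """Return sigma, where sigma[j] is the prediction index assigned to ground truth j.

    Maximises the sum of w_ij = q_i(c_j) - alpha * ||a_i - b_j|| by brute force
    (a stand-in for the Hungarian algorithm).
    """
    def weight(i, j):
        dist = math.dist(pred_points[i], gt_points[j])
        return class_probs[i][gt_classes[j]] - alpha * dist

    best, best_score = None, -math.inf
    for perm in itertools.permutations(range(len(pred_points)), len(gt_points)):
        score = sum(weight(i, j) for j, i in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best)

def prompter_losses(pred_points, class_probs, gt_points, gt_classes, sigma,
                    background, beta=0.5, gamma=1.0):
    """Return (L_reg, L_cls) following the two formulas in the text."""
    m, n = len(pred_points), len(gt_points)
    matched = set(sigma)
    log_sum = sum(math.log(class_probs[sigma[j]][gt_classes[j]]) for j in range(n))
    log_sum += beta * sum(math.log(class_probs[i][background])
                          for i in range(m) if i not in matched)
    l_cls = -log_sum / m
    l_reg = gamma / n * sum(math.dist(pred_points[sigma[j]], gt_points[j])
                            for j in range(n))
    return l_reg, l_cls
```

In this sketch `pred_points` are the anchors after adding the predicted offsets \(\delta_i\), and `background` is the index of the \(\varnothing\) class in each probability vector.
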
Auxiliary Task and Mask-aided Prompt Filtering:
- Function: Introduce a nucleus region segmentation auxiliary task to enhance the prompter's awareness of foreground regions.
- Mechanism: Add a simple mask head (Conv-BN-ReLU-Conv) to the prompter to predict the nucleus probability map \(\hat{S}\) from high-resolution feature \(P_2\), supervised by focal loss.
- Filter false-positive prompts during inference using the predicted nucleus probability map: retain only prompts with a probability > 0.5.
- Design Motivation: Prompter training only involves point annotations and categories. The auxiliary segmentation task introduces rich information such as nucleus size and morphology.
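
Mask-aided prompt filtering reduces to a lookup into the predicted probability map. The following is a minimal sketch under the assumption that \(\hat{S}\) has already been resized to image resolution and that each prompt is checked at its nearest pixel; the function name and data layout are hypothetical.

```python
def filter_prompts(points, prob_map, threshold=0.5):
    """Keep prompts whose pixel has nucleus probability strictly above threshold.

    points: list of (x, y) pixel coordinates.
    prob_map: 2D list indexed as prob_map[row][col], values in [0, 1].
    """
    height, width = len(prob_map), len(prob_map[0])
    kept = []
    for x, y in points:
        # Clamp to the image so slightly out-of-bounds predictions stay valid.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        if prob_map[row][col] > threshold:
            kept.append((x, y))
    return kept
```

Prompts that land on background according to the auxiliary head are dropped before they ever reach the segmentor, which is what removes false positives in this scheme.
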
Negative Prompts to Solve Overlapping Nuclei Segmentation:
- Function: Feed neighboring nuclei as negative prompts into SAM to suppress over-segmentation in overlapping regions.
- Problem Analysis: Segmenting two overlapping nuclei using one positive prompt each tends to generate over-segmented masks due to blurry boundaries.
- Key Finding: Applying negative prompts only during inference is ineffective, because fine-tuning with only positive prompts leads to "catastrophic forgetting" of negative prompts by the model.
- Solution: Introduce negative prompts during the fine-tuning stage—for each target nucleus, jointly input its positive prompt \(p_z\) and the \(K\) nearest neighbor points \(\{n_{z,k}\}_{k=1}^K\) as negative prompts: \(\widetilde{\mathcal{O}}_z = \mathcal{M}(\mathcal{F}(x), \mathcal{P}(\{p_z\} \cup \{n_{z,k}\}_{k=1}^K), [\text{mask}], [\text{IoU}])\).
- At inference, the \(K\) nearest points predicted by the prompter are likewise used as negative prompts.
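
Selecting the negative prompts is a nearest-neighbor query over the point set. The sketch below is an illustration rather than the released code: it pairs each target point with its \(K\) closest other points and labels them 1 (positive) and 0 (negative), the label convention used for point prompts in the public SAM predictor. The function name is hypothetical, and the same routine is assumed to apply to ground-truth-derived points during fine-tuning and to prompter outputs at inference.

```python
import math

def build_prompts(points, k=1):
    """For each point return (coords, labels): the point itself with label 1,
    followed by its k nearest other points with label 0."""
    prompts = []
    for i, target in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        others.sort(key=lambda p: math.dist(p, target))
        negatives = others[:k]                 # fewer than k if the image is sparse
        coords = [target] + negatives
        labels = [1] + [0] * len(negatives)
        prompts.append((coords, labels))
    return prompts
```

With `k=1`, the setting the summary reports as optimal, every nucleus is segmented with exactly one positive and at most one negative point.
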

Loss & Training

The Prompter and Segmentor are trained independently without end-to-end optimization.
Prompter loss: \(\mathcal{L}_{prompter} = \mathcal{L}_{reg} + \mathcal{L}_{cls} + \mathcal{L}_{aux}\)
Segmentor loss: \(\mathcal{L}_{sam} = \omega \cdot \text{FL} + \text{DL} + \text{MSE}\)
Sliding window inference (256×256 tile) is used with overlap area \(\epsilon\) to ensure nucleus integrity, followed by NMS de-duplication.
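
Sliding-window inference with overlap, followed by NMS, can be sketched as below. This is a hedged illustration: the overlap of 64 pixels, the IoU threshold of 0.5, the representation of masks as sets of pixel coordinates, and ranking by a per-mask score (for example SAM's predicted IoU) are assumptions, not values taken from the paper.

```python
def tile_origins(length, tile=256, overlap=64):
    """Start offsets of tiles along one axis so that tiles cover [0, length).

    Images smaller than one tile get a single origin and are assumed to be padded.
    """
    if length <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)   # final tile sits flush with the image border
    return origins

def mask_nms(masks, scores, iou_thresh=0.5):
    """Greedy NMS over masks given as sets of (row, col) pixels; returns kept indices."""
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        duplicate = False
        for j in kept:
            inter = len(masks[i] & masks[j])
            union = len(masks[i] | masks[j])
            if union and inter / union > iou_thresh:
                duplicate = True
                break
        if not duplicate:
            kept.append(i)
    return kept
```

Because neighboring tiles share an overlap band, a nucleus cut by one tile border appears whole in the adjacent tile; NMS then removes the duplicate masks produced in the shared band.
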

Key Experimental Results

Main Results

PanNuke Dataset (Most challenging, 19 tissue types)

Method	bPQ (avg)	mPQ (avg)
HoVer-Net	0.6596	0.4629
CPP-Net	0.6798	0.4847
PointNu-Net	0.6808	0.4957
CellViT-H	0.6793	0.4980
PromptNucSeg-H	0.6924	0.5123

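PromptNucSeg-H improves on the best prior results in the table by about 1.2 bPQ points (0.6924 vs. 0.6808 for PointNu-Net) and 1.4 mPQ points (0.5123 vs. 0.4980 for CellViT-H).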

Kumar & CPM-17 Dataset

Method	Kumar AJI	Kumar PQ	CPM-17 AJI	CPM-17 PQ
PointNu-Net	0.606	0.603	0.712	0.706
CellViT-H	-	-	-	-
PromptNucSeg-H	0.622	0.627	0.740	0.733

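On CPM-17, the reported gains over the previous SOTA are +1.9 AJI points and +2.8 PQ points. Relative to PointNu-Net, the only baseline with values listed above, the gaps are +2.8 AJI points (0.740 vs. 0.712) and +2.7 PQ points (0.733 vs. 0.706).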

Ablation Study

Module Effectiveness (CPM-17 Dataset)

Configuration (FT/AUX/MAPF/NP)	AJI	PQ	Description
Vanilla SAM	0.319	0.223	Original SAM used directly
FT only	0.728	0.723	Fine-tuning only
FT + AUX	0.734	0.727	+Auxiliary segmentation task
FT + AUX + MAPF	0.737	0.731	+Mask filtering
FT + AUX + MAPF + NP	0.740	0.733	Full model

Efficiency Comparison (PanNuke)

Method	Params (M)	MACs (G)	FPS	mPQ
HoVer-Net	37.6	150.0	7	0.4629
CPP-Net	122.8	264.4	14	0.4847
PointNu-Net	158.1	335.1	11	0.4957
PromptNucSeg-B	145.6	59.0	27	0.5095

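Relative to the listed baselines, PromptNucSeg-B needs roughly 2.5 to 5.7 times fewer MACs and runs about 1.9 to 3.9 times faster in FPS, while reaching the highest mPQ in the table.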
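
The ratios behind the efficiency comparison can be recomputed directly from the table. The short sketch below does only that arithmetic; the numbers are copied from the table and the helper name is hypothetical.

```python
# (MACs in G, FPS) for each baseline, copied from the efficiency table.
baselines = {
    "HoVer-Net": (150.0, 7),
    "CPP-Net": (264.4, 14),
    "PointNu-Net": (335.1, 11),
}
OURS_MACS, OURS_FPS = 59.0, 27   # PromptNucSeg-B

def efficiency_ratios(rows, ours_macs, ours_fps):
    """Map each baseline to (MACs reduction factor, FPS speed-up factor)."""
    return {name: (round(macs / ours_macs, 2), round(ours_fps / fps, 2))
            for name, (macs, fps) in rows.items()}

for name, (macs_x, fps_x) in efficiency_ratios(baselines, OURS_MACS, OURS_FPS).items():
    print(f"{name}: {macs_x}x fewer MACs, {fps_x}x higher FPS")
```

The spread is wide: the MACs saving is smallest against HoVer-Net and largest against PointNu-Net, while the FPS gain is largest against HoVer-Net.
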

Key Findings

Negative prompts must be used during both training and inference: Adding negative prompts only at inference is ineffective due to catastrophic forgetting. Experiments found that 1 negative prompt is optimal, whereas 2 introduce noise and degrade performance.
Auxiliary nucleus region segmentation task is effective: AJI rises by 0.6 points on CPM-17 (0.728 to 0.734), and the predicted probability map is reused for prompt filtering, so one head serves two purposes.
Prompt quality is the bottleneck: Oracle experiments that feed ground-truth center points to the segmentor score significantly higher than the full pipeline with predicted prompts, indicating that improving prompter accuracy is a key direction for future work.
Assigning classification to the prompter is better: Compared to letting the SAM decoder perform classification, it is more rational for the prompter to classify from a global perspective.

Highlights & Insights

Paradigm Innovation: Replaces the traditional proxy map regression + post-processing pipeline with a clean and elegant one-prompt-one-nucleus paradigm.
Ingenious Negative Prompting: Addresses the core challenge of overlapping nucleus segmentation, finding a natural solution exploiting SAM's promptable nature.
Efficient Auxiliary Task Design: A simple segmentation head serves both feature enhancement and inference filtering roles.
Strong Practicality: Eliminates the need for post-processing parameter tuning, significantly reduces computational overhead, and substantially improves FPS.
Transferable Insights: The training-inference consistency design of negative prompts can be extended to other application scenarios of SAM variants.

Limitations & Future Work

Independent Training of Prompter and Segmentor: End-to-end training might bring further improvements (preliminary attempts mentioned in the paper did not converge).
Dependence on SAM's ViT Backbone: The model parameter size remains large (145M+), limiting deployment on edge devices.
Prompt Quality Bottleneck: The gap between oracle and actual performance shows that prompter accuracy is the main bottleneck; stronger detection/localization methods can be explored.
Limited to H&E Stained Images: Performance on other staining methods like IHC has not been validated.
Sliding Window Strategy: May cause boundary inconsistencies on ultra-large images.

vs CellViT (Hörst et al., 2023): CellViT reuses the SAM encoder but still relies on proxy map regression + post-processing. PromptNucSeg fully utilizes SAM's end-to-end segmentation capability.
vs SPPNet (Xu et al., 2023): SPPNet performs semantic segmentation using a one-prompt-all-nuclei manner, requiring manual prompts and lacking instance details. PromptNucSeg automatically generates prompts and directly outputs instances.
vs Bottom-up Methods (HoVer-Net, StarDist, etc.): Traditional methods depend on computationally heavy post-processing. PromptNucSeg is more efficient and needs no proxy-map post-processing, only NMS de-duplication.
vs Mask R-CNN: Both are top-down, but Mask R-CNN's bounding boxes are prone to overlapping multiple nuclei, and fixed-resolution masks suffer from quantization errors. Point prompts are more precise.

Rating

Novelty: ⭐⭐⭐⭐⭐ Perfect combination of SAM's promptable nature with nucleus instance segmentation, featuring an ingenious negative prompt design.
Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive comparison across three benchmarks, exhaustive ablation, and thorough efficiency analysis.
Writing Quality: ⭐⭐⭐⭐ Clear motivation, well-structured methodology, and intuitive illustrations.
Value: ⭐⭐⭐⭐⭐ A paradigm-shifting nucleus segmentation framework that avoids complex post-processing, with significant value for practical deployment.