Synergistic Bleeding Region and Point Detection in Laparoscopic Surgical Videos

Conference: CVPR 2026
arXiv: 2503.22174
Code: GitHub
Area: Medical Imaging
Keywords: bleeding detection, laparoscopic surgery, SAM2, dual-task synergy, optical flow

TL;DR

This work introduces SurgBlood, the first laparoscopic surgical video dataset annotated with both bleeding regions and bleeding points, and proposes BlooDet, an online SAM2-based detector whose Mask and Point branches guide each other bidirectionally. Through synergistic optimization of the two branches, BlooDet jointly segments bleeding regions and localizes bleeding points.

Background & Motivation

Intraoperative bleeding is a critical emergency that seriously compromises surgical safety in laparoscopic minimally invasive surgery:
- Bleeding region detection enables quantification of blood loss and supports intraoperative decision-making
- Bleeding point localization helps surgeons rapidly identify the bleeding source for hemostasis

Limitations of existing methods:
1. Most algorithms operate on single frames and lack video temporal modeling
2. The focus is primarily on bleeding regions, leaving the clinical need for bleeding source localization unaddressed
3. Multi-task frameworks have not fully exploited the potential of SAM2 for joint cross-task optimization

Moreover, no publicly available real-world bleeding dataset provides multi-task annotations covering both bleeding regions and bleeding points.

Challenges include the narrow field of view in laparoscopy, unstable illumination, rapid blood accumulation that alters tissue appearance, and bleeding points occluded by blood or tissue.

Method

Overall Architecture

BlooDet adopts a dual-branch bidirectional guidance architecture built on SAM2, comprising a Mask branch (bleeding region detection) and a Point branch (bleeding point localization). The two branches are optimized synergistically by supplying each other with prompts and temporal information. The core objective is a coupled optimization over the Mask-branch parameters \(\boldsymbol{\theta}\) and the Point-branch parameters \(\boldsymbol{\vartheta}\), each of which depends on the other:

\[\{\boldsymbol{\theta}^*, \boldsymbol{\vartheta}^*\} = \arg\min_{\boldsymbol{\theta}, \boldsymbol{\vartheta}} \Big[\mathcal{L}_{\mathtt{m}}\big(\boldsymbol{\theta}(\boldsymbol{\vartheta})\big) + \mathcal{L}_{\mathtt{p}}\big(\boldsymbol{\vartheta}(\boldsymbol{\theta})\big)\Big]\]

This is solved via an alternating optimization strategy: the Mask branch parameters are updated with the Point branch fixed, followed by updating the Point branch with the updated Mask branch fixed.
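A toy sketch can make the alternating scheme concrete. It assumes two scalar parameters standing in for the two branches and hypothetical quadratic losses in which each branch's optimum depends on the other's parameter; these losses, the coefficients `a`, `b`, `c`, `d`, and the helper names are illustrative, not BlooDet's actual objectives. Each round takes a few gradient steps on the Mask loss with ϑ frozen, then on the Point loss with the freshly updated θ frozen.

```python
# Toy illustration of alternating optimization for a coupled objective.
# Hypothetical losses (not BlooDet's):
#   L_m(theta; vartheta) = (theta - a - c * vartheta) ** 2   (Mask branch)
#   L_p(vartheta; theta) = (vartheta - b - d * theta) ** 2   (Point branch)
# Each branch's optimum depends on the other's parameter, mimicking theta(vartheta) and vartheta(theta).


def grad_step(x: float, target: float, lr: float) -> float:
    """One gradient-descent step on (x - target) ** 2 with respect to x."""
    return x - lr * 2.0 * (x - target)


def alternate(a: float, b: float, c: float, d: float,
              rounds: int = 200, inner_steps: int = 5, lr: float = 0.1):
    theta, vartheta = 0.0, 0.0
    for _ in range(rounds):
        # 1) Update the Mask-branch parameter while the Point branch is frozen.
        for _ in range(inner_steps):
            theta = grad_step(theta, a + c * vartheta, lr)
        # 2) Update the Point-branch parameter while the updated Mask branch is frozen.
        for _ in range(inner_steps):
            vartheta = grad_step(vartheta, b + d * theta, lr)
    return theta, vartheta


if __name__ == "__main__":
    theta, vartheta = alternate(a=1.0, b=2.0, c=0.5, d=0.3)
    print(f"theta={theta:.4f}, vartheta={vartheta:.4f}")  # theta=2.3529, vartheta=2.7059
```

For these coefficients the rounds settle at the coupled fixed point θ = (a + cb)/(1 − cd) ≈ 2.3529 and ϑ = b + dθ ≈ 2.7059, where neither branch can improve its own loss with the other held fixed.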

Key Designs

Point Branch — Optical Flow-Guided Bleeding Point Memory Modeling: A frozen PWC-Net estimates the inter-frame optical flow \(O_i(x,y)\). Each flow vector is weighted by the inverted Mask map \(1-M_i\) to suppress unstable flow inside bleeding regions, and the average viewpoint shift is computed as \(\bar{O}_i(\Delta x, \Delta y) = \frac{1}{H \times W} \sum_{x=1}^{H} \sum_{y=1}^{W} \big(1-M_i(x,y)\big) \cdot O_i(x,y)\). Mask memory features from prior frames are then fused with Point features via self-attention and cross-attention to produce memory-enhanced Point features. The core idea is to use background optical flow to compensate for camera motion while leveraging Mask memory to narrow the bleeding point search space.
Mask Branch — Edge Generator and Adaptive Prompt Embedding: Multi-scale Gabor wavelet Laplacian filters \(\mathbf{L}_{\mathbf{g}}\) are applied to enhance bleeding boundaries: \(F'_{\text{mask}} = \text{ReLU}(F_{\text{mask}}) \odot \big(\mathbf{L}_{\mathbf{g}}(x,y) * F_{\text{mask}}\big)\), where \(*\) denotes convolution and \(\odot\) element-wise multiplication. The resulting edge map \(E_m\) and the bleeding point map \(P_m\) produced by the Point branch are combined into an adaptive prompt for the Mask decoder, replacing manual interactive prompts.
Bidirectional Cross-Branch Guidance: Predicted bleeding points from the Point branch serve as automatic prompts for the Mask decoder to focus on target regions; predicted masks from the Mask branch provide temporal directional cues and spatial constraints for the Point branch. The two branches mutually constrain and reinforce each other.
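The two numeric operations described for the branches can be sketched in plain Python. `mean_background_flow` follows the summary's averaging formula, weighting each flow vector by the inverted mask and dividing by H×W. `compensate_point`, which shifts a previous-frame point by that estimate, is a hypothetical helper illustrating the camera-motion compensation idea; its sign convention (flow pointing from frame i−1 to frame i) is assumed. `edge_enhance` implements F′ = ReLU(F) ⊙ (L * F) on a single-channel map, but substitutes one fixed 3×3 Laplacian kernel for the paper's multi-scale Gabor wavelet Laplacian filter bank. The attention-based memory fusion is not modeled.

```python
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # H x W grid of (dx, dy) flow vectors
Grid = List[List[float]]                # H x W scalar map (mask or feature)


def mean_background_flow(flow: Flow, mask: Grid) -> Tuple[float, float]:
    """Weight each flow vector by (1 - M) and divide by H * W, following the summary's formula."""
    h, w = len(flow), len(flow[0])
    sx = sy = 0.0
    for x in range(h):
        for y in range(w):
            keep = 1.0 - mask[x][y]  # inverted mask: 1 on background, 0 on bleeding pixels
            dx, dy = flow[x][y]
            sx += keep * dx
            sy += keep * dy
    return sx / (h * w), sy / (h * w)


def compensate_point(point: Tuple[float, float], shift: Tuple[float, float]) -> Tuple[float, float]:
    """Hypothetical helper: carry a previous-frame point forward by the estimated camera shift."""
    return point[0] + shift[0], point[1] + shift[1]


# Single 3x3 Laplacian kernel, standing in for the multi-scale Gabor wavelet Laplacian bank (assumption).
LAPLACIAN = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]


def conv3x3(feat: Grid, kernel: Grid) -> Grid:
    """Zero-padded 3x3 filtering; the kernel is symmetric, so correlation equals convolution."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di + 1][dj + 1] * feat[ii][jj]
            out[i][j] = acc
    return out


def edge_enhance(feat: Grid) -> Grid:
    """F' = ReLU(F) ⊙ (L * F): the Laplacian response gated element-wise by the rectified features."""
    lap = conv3x3(feat, LAPLACIAN)
    return [[max(f, 0.0) * l for f, l in zip(f_row, l_row)] for f_row, l_row in zip(feat, lap)]


if __name__ == "__main__":
    # 2x2 toy frame: three background pixels share the camera shift; one bleeding pixel has erratic flow.
    flow = [[(1.0, -0.5), (1.0, -0.5)], [(1.0, -0.5), (5.0, 5.0)]]
    mask = [[0.0, 0.0], [0.0, 1.0]]
    shift = mean_background_flow(flow, mask)
    print(shift)                                  # (0.75, -0.375)
    print(compensate_point((10.0, 20.0), shift))  # (10.75, 19.625)
```

The erratic flow of the bleeding pixel is excluded entirely. Because the formula divides by H×W rather than by the number of background pixels, the estimate (0.75, −0.375) is scaled down from the true background shift (1.0, −0.5) in proportion to the masked area.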

Loss & Training

Mask branch: \(\mathcal{L}_\mathtt{m} = \lambda_\mathtt{r} \mathcal{L}_\mathtt{r} + \lambda_\mathtt{e} \mathcal{L}_\mathtt{e}\), with both region and edge losses computed as Focal Loss + Dice Loss
Point branch: \(\mathcal{L}_\mathtt{p} = \lambda_\mathcal{P} \mathcal{L}_\mathcal{P} + \lambda_\mathtt{s} \mathcal{L}_\mathtt{s}\), using Smooth L1 Loss for point supervision and BCE for existence prediction
Loss weights: \(\lambda_\mathtt{r}=1, \lambda_\mathtt{e}=1, \lambda_\mathtt{s}=1, \lambda_\mathcal{P}=0.5\)
Alternating optimization: each iteration updates the Mask branch before the Point branch
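The loss terms listed above can be sketched on flattened per-pixel probabilities. The focal-loss hyperparameters (γ = 2, α = 0.25, the common RetinaNet defaults), the Smooth L1 threshold β = 1, and the clamping constant are assumptions not stated in the summary. The λ weights follow the listed values, with λ_P = 0.5 on the Smooth L1 point term and λ_s = 1 on the BCE existence term. The sketch always applies the point-regression term; a real implementation would presumably skip it for frames without a bleeding point.

```python
import math
from typing import Sequence

EPS = 1e-6  # clamp for log stability and Dice smoothing (assumed value)


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Mean binary focal loss over pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        p_t = p if t == 1 else 1.0 - p  # probability assigned to the true class
        a_t = alpha if t == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs: Sequence[float], targets: Sequence[int]) -> float:
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), smoothed by EPS."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + EPS) / (sum(probs) + sum(targets) + EPS)


def smooth_l1(pred: Sequence[float], gt: Sequence[float], beta: float = 1.0) -> float:
    """Mean Smooth L1: quadratic for errors below beta, linear above."""
    total = 0.0
    for a, b in zip(pred, gt):
        d = abs(a - b)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for the bleeding-point existence score."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mask_loss(region_p, region_t, edge_p, edge_t,
              lam_r: float = 1.0, lam_e: float = 1.0) -> float:
    """L_m = lam_r * L_r + lam_e * L_e, where each term is Focal + Dice."""
    l_r = focal_loss(region_p, region_t) + dice_loss(region_p, region_t)
    l_e = focal_loss(edge_p, edge_t) + dice_loss(edge_p, edge_t)
    return lam_r * l_r + lam_e * l_e


def point_loss(pred_xy, gt_xy, exist_p: float, exist_t: int,
               lam_P: float = 0.5, lam_s: float = 1.0) -> float:
    """L_p = lam_P * SmoothL1(point coordinates) + lam_s * BCE(existence)."""
    return lam_P * smooth_l1(pred_xy, gt_xy) + lam_s * bce(exist_p, exist_t)
```

With these helpers, a point prediction 3 px off along one coordinate and an existence score of 0.5 costs 0.5 · 2.5 + ln 2 ≈ 1.94, so the existence term dominates until the score becomes confident.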

SurgBlood Dataset: 95 video clips from 42 cholecystectomy procedures, totaling 5,330 frames at 1280×720 resolution, annotated by hepatobiliary surgeons with pixel-level bleeding region masks and bleeding point coordinates. Four bleeding types: gallbladder (21.64%), Calot's triangle (25.01%), vessels (15.78%), and gallbladder bed (37.75%).

Key Experimental Results

Main Results

Method	SurgBlood IoU ↑	SurgBlood Dice ↑	PCK-5% ↑	PCK-10% ↑
SAM 2†	50.93	67.49	41.68	71.99
MemSAM†	52.84	69.14	31.80	64.91
D-CeLR*	51.30	67.82	24.22	63.92
ConsisTNet	40.43	57.59	32.83	68.15
BlooDet (Ours)	64.88	78.70	55.85	83.69

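BlooDet outperforms 13 competing methods on SurgBlood. Relative to SAM 2 it gains 13.95 points in IoU (64.88 vs. 50.93) and 11.70 points in PCK-10% (83.69 vs. 71.99); its IoU margin over MemSAM, the strongest region baseline in the table, is 12.04 points. It also attains the best region detection performance on the HemoSet dataset (IoU 59.62, Dice 74.70).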
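The reported metrics can be sketched as follows. IoU and Dice are the standard overlap measures on binary masks. For PCK-5%/10%, this summary does not state what the 5% and 10% thresholds are relative to; the sketch assumes a fraction α of the image diagonal, which for SurgBlood's 1280×720 frames is about 73 px at α = 0.05 and about 147 px at α = 0.10. The function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def iou_dice(pred: Sequence[int], gt: Sequence[int]) -> Tuple[float, float]:
    """IoU and Dice between two flattened binary masks (1 = bleeding)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    p_sum, g_sum = sum(pred), sum(gt)
    union = p_sum + g_sum - inter
    if union == 0:  # both masks empty: scored as perfect agreement (one common convention)
        return 1.0, 1.0
    return inter / union, 2 * inter / (p_sum + g_sum)


def pck(preds: Sequence[Point], gts: Sequence[Point],
        width: int, height: int, alpha: float) -> float:
    """Share of predictions within alpha * image diagonal of the ground truth (normalizer assumed)."""
    thr = alpha * math.hypot(width, height)
    hits = sum(1 for (px, py), (gx, gy) in zip(preds, gts)
               if math.hypot(px - gx, py - gy) <= thr)
    return hits / len(gts)


if __name__ == "__main__":
    print(iou_dice([1, 1, 0, 0], [1, 0, 1, 0]))  # IoU 1/3, Dice 0.5
    gts = [(100.0, 100.0)] * 3
    preds = [(150.0, 100.0), (200.0, 100.0), (300.0, 100.0)]  # 50, 100, 200 px off
    print(pck(preds, gts, 1280, 720, 0.05), pck(preds, gts, 1280, 720, 0.10))  # 1/3 and 2/3
```

For a single mask pair, Dice = 2·IoU/(1 + IoU). The table's IoU/Dice pairs are consistent with this relation: IoU 64.88 maps to Dice 78.70 for BlooDet, and IoU 50.93 maps to Dice 67.49 for SAM 2.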

Ablation Study

Configuration	SurgBlood DSC ↑	Note
Mask + Point only (no edge generator, no temporal consistency)	~67.49	Baseline SAM2 dual-task
+ Edge generator + cross-branch guidance	78.70	Full BlooDet

(Note: Ablations on XCAV/CAVSA datasets are also reported; the full model achieves DSC 84.39%, dropping to 76.24% without temporal consistency and to 76.71% without confidence regularization.)

Key Findings

Region detection methods augmented with a simple point prediction head perform poorly, demonstrating the necessity of dedicated synergistic design
Optical flow combined with Mask memory is critical for bleeding point tracking, resolving camera-motion-induced drift
The edge generator effectively mitigates boundary ambiguity under low-contrast surgical scenes
The alternating optimization strategy enables both branches to reach a joint optimum

Highlights & Insights

Novel task definition: The first work to propose joint detection of bleeding regions and bleeding points in laparoscopic surgery
SurgBlood dataset: The first real surgical video dataset providing dual annotations for both bleeding regions and bleeding points
The dual-branch bidirectional guidance design is elegant — the Mask branch provides spatial constraints for the Point branch, while the Point branch supplies precise prompts for the Mask branch
Background optical flow (excluding bleeding regions) is ingeniously exploited to compensate for camera motion drift

Limitations & Future Work

The dataset scale is relatively small (95 clips), and generalizability remains to be validated
Validation is limited to cholecystectomy; extension to a broader range of surgical procedures is needed
The Point branch relies on a frozen PWC-Net for optical flow, which may degrade in severely blood-occluded scenes
Multi-bleeding-point scenarios and bleeding intensity quantification are not addressed

The approach of building multi-task frameworks on top of SAM2 — chaining different tasks via the prompt mechanism — is a noteworthy paradigm
The strategy of using optical flow for camera motion compensation in keypoint tracking is transferable to other surgical vision tasks
The cross-validated annotation protocol (4 annotators + 2 reviewers) employed in dataset construction ensures annotation quality

Rating

Novelty: ⭐⭐⭐⭐⭐ — First task definition + first dataset + novel dual-branch architecture
Experimental Thoroughness: ⭐⭐⭐⭐ — 13 competing methods + multi-dataset validation + comprehensive ablation
Writing Quality: ⭐⭐⭐⭐ — Clear structure with complete method description
Value: ⭐⭐⭐⭐⭐ — Strong clinical utility and significant dataset contribution