Synergistic Bleeding Region and Point Detection in Laparoscopic Surgical Videos
Conference: CVPR 2026 | arXiv: 2503.22174 | Code: GitHub | Area: Medical Imaging | Keywords: bleeding detection, laparoscopic surgery, SAM2, dual-task synergy, optical flow
TL;DR
This work introduces SurgBlood, the first laparoscopic surgical video dataset annotated with both bleeding regions and bleeding points. It also proposes BlooDet, a SAM2-based online detector whose Mask and Point branches guide each other, so that bleeding region segmentation and bleeding point localization are optimized jointly.
Background & Motivation
Intraoperative bleeding is a critical emergency that seriously compromises surgical safety in laparoscopic minimally invasive surgery:
- Bleeding region detection enables quantification of blood loss and supports intraoperative decision-making
- Bleeding point localization helps surgeons rapidly identify the bleeding source for hemostasis

Limitations of existing methods:
1. Most algorithms operate on single frames and lack video temporal modeling
2. The focus is primarily on bleeding regions, leaving the clinical need for bleeding source localization unaddressed
3. Multi-task frameworks have not fully exploited the potential of SAM2 for joint cross-task optimization
No publicly available multi-task real-world bleeding dataset exists.
Challenges include the narrow field of view in laparoscopy, unstable illumination, rapid blood accumulation that alters tissue appearance, and bleeding points occluded by blood or tissue.
Method
Overall Architecture
BlooDet adopts a dual-branch bidirectional guidance architecture built on SAM2, comprising a Mask branch (bleeding region detection) and a Point branch (bleeding point localization). The two branches are optimized synergistically by supplying each other with prompts and temporal information. Training is formulated as a coupled optimization over the parameters of both branches and is solved with an alternating strategy: the Mask branch parameters are updated with the Point branch fixed, and the Point branch is then updated with the newly updated Mask branch fixed.
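The alternating scheme can be sketched on a toy problem. In this minimal example the scalars `a` and `b` stand in for the Mask and Point branch parameters, and the quadratic objective is a hypothetical placeholder, not the BlooDet loss; only the update order follows the description above.

```python
def grad_a(a, b):
    # d/da of the toy objective f(a, b) = (a - 1)^2 + (b + 2)^2 + 0.5 * (a - b)^2
    return 2 * (a - 1) + (a - b)


def grad_b(a, b):
    # d/db of the same toy objective
    return 2 * (b + 2) - (a - b)


def alternating_minimize(a, b, lr=0.1, iters=200):
    """Alternate gradient steps: update `a` with `b` fixed, then `b` with the new `a` fixed."""
    for _ in range(iters):
        a = a - lr * grad_a(a, b)  # "Mask" step, "Point" parameters frozen
        b = b - lr * grad_b(a, b)  # "Point" step, sees the updated "Mask" parameters
    return a, b


a_star, b_star = alternating_minimize(a=0.0, b=0.0)
print(round(a_star, 4), round(b_star, 4))
```

For this toy objective the joint minimizer is `a = 0.25`, `b = -1.25`, and the alternating updates converge to it. In a real network each step would be an optimizer update over one branch's weights while the other branch's weights are frozen.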
Key Designs
- Point Branch: Optical Flow-Guided Bleeding Point Memory Modeling. A frozen PWC-Net estimates the inter-frame optical flow \(O_i(x,y)\). Because flow inside bleeding regions is unstable, it is filtered out with the inverted mask \(1 - M_i\), and the average viewpoint shift is computed as \(\bar{O}_i(\Delta x, \Delta y) = \frac{1}{H \times W} \sum_{x=1}^{H} \sum_{y=1}^{W} (1 - M_i(x,y)) \cdot O_i(x,y)\). Mask memory features from previous frames are then fused with the Point features via self-attention and cross-attention to produce memory-enhanced Point features. The core idea is to use background optical flow to compensate for camera motion while using Mask memory to narrow the search space for the bleeding point.
- Mask Branch: Edge Generator and Adaptive Prompt Embedding. Multi-scale Gabor wavelet Laplacian filters are applied to enhance bleeding boundaries: \(F'_{\text{mask}} = \text{ReLU}(F_{\text{mask}}) \odot (\mathbf{L}_\mathbf{g}(x,y) * F_{\text{mask}})\), where \(\odot\) is element-wise multiplication and \(*\) is convolution with the filter \(\mathbf{L}_\mathbf{g}\). The edge map \(E_m\) and the bleeding point map \(P_m\) generated by the Point branch are combined into an adaptive prompt that is fed to the Mask decoder, replacing manual interactive prompts.
- Bidirectional Cross-Branch Guidance. Bleeding points predicted by the Point branch serve as automatic prompts that focus the Mask decoder on target regions; masks predicted by the Mask branch provide temporal directional cues and spatial constraints for the Point branch. The two branches mutually constrain and reinforce each other.
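The background-flow averaging in the Point branch can be illustrated with a small pure-Python sketch. The function names and the toy 2×3 frame are hypothetical, and `compensate_point` is only one plausible way to use the shift (the note does not spell out how the model consumes it). The normalization by \(H \times W\) follows the formula as written above.

```python
def mean_background_flow(flow, mask):
    """Average optical flow over the frame with bleeding pixels zeroed out.

    flow: H x W grid of (dx, dy) tuples.
    mask: H x W grid with 1 = bleeding, 0 = background.
    The sum is divided by H * W, as in the formula, not by the background pixel count.
    """
    h, w = len(flow), len(flow[0])
    sum_dx = sum_dy = 0.0
    for row_flow, row_mask in zip(flow, mask):
        for (dx, dy), m in zip(row_flow, row_mask):
            sum_dx += (1 - m) * dx
            sum_dy += (1 - m) * dy
    return sum_dx / (h * w), sum_dy / (h * w)


def compensate_point(point, shift):
    """Move a previous-frame point by the estimated camera shift (assumed usage)."""
    return point[0] + shift[0], point[1] + shift[1]


# Toy frame: the camera pans by (2, -1); the two bleeding pixels carry erratic flow.
flow = [[(2.0, -1.0), (2.0, -1.0), (9.0, 7.0)],
        [(2.0, -1.0), (2.0, -1.0), (-8.0, 5.0)]]
mask = [[0, 0, 1],
        [0, 0, 1]]

shift = mean_background_flow(flow, mask)
moved = compensate_point((10.0, 10.0), shift)
print(shift, moved)
```

Here the erratic flow inside the mask no longer contaminates the estimate. Because the formula divides by all \(H \times W\) pixels, the estimated shift shrinks as the masked area grows (about (1.33, -0.67) here rather than (2, -1)); whether the authors renormalize by the background area is not stated in this note.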
Loss & Training
- Mask branch: \(\mathcal{L}_\mathtt{m} = \lambda_\mathtt{r} \mathcal{L}_\mathtt{r} + \lambda_\mathtt{e} \mathcal{L}_\mathtt{e}\), with both region and edge losses computed as Focal Loss + Dice Loss
- Point branch: \(\mathcal{L}_\mathtt{p} = \lambda_\mathcal{P} \mathcal{L}_\mathcal{P} + \lambda_\mathtt{s} \mathcal{L}_\mathtt{s}\), using Smooth L1 Loss for point supervision and BCE for existence prediction
- Loss weights: \(\lambda_\mathtt{r}=1, \lambda_\mathtt{e}=1, \lambda_\mathtt{s}=1, \lambda_\mathcal{P}=0.5\)
- Alternating optimization: each iteration updates the Mask branch before the Point branch
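As a rough illustration of how the two branch losses combine, the sketch below implements standard forms of the four loss terms in pure Python. The focal-loss `alpha` and `gamma`, the Dice smoothing term, and the Smooth L1 `beta` are assumed defaults that this note does not specify. The mapping of \(\lambda_\mathcal{P}\) to the Smooth L1 term and \(\lambda_\mathtt{s}\) to the existence BCE term is inferred from the order in which they are listed.

```python
import math


def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flat probabilities (alpha, gamma are assumed)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        p_t = p if t == 1 else 1 - p
        a_t = alpha if t == 1 else 1 - alpha
        total += -a_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss with an assumed smoothing constant."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1 - (2 * inter + smooth) / (sum(probs) + sum(targets) + smooth)


def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 over coordinates: quadratic below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def bce(p, t, eps=1e-7):
    """Binary cross-entropy for a single existence probability."""
    p = min(max(p, eps), 1 - eps)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))


def mask_loss(region_p, region_t, edge_p, edge_t, lam_r=1.0, lam_e=1.0):
    """L_m = lam_r * L_r + lam_e * L_e, each term being Focal + Dice."""
    l_r = focal_loss(region_p, region_t) + dice_loss(region_p, region_t)
    l_e = focal_loss(edge_p, edge_t) + dice_loss(edge_p, edge_t)
    return lam_r * l_r + lam_e * l_e


def point_loss(pt_pred, pt_true, exist_p, exist_t, lam_p=0.5, lam_s=1.0):
    """L_p = lam_p * SmoothL1(point) + lam_s * BCE(existence)."""
    return lam_p * smooth_l1(pt_pred, pt_true) + lam_s * bce(exist_p, exist_t)


example = point_loss([0.0, 0.0], [2.0, 0.5], exist_p=0.5, exist_t=1)
print(round(example, 4))
```

In the example call, the Smooth L1 term averages 1.5 (linear regime) and 0.125 (quadratic regime), is weighted by 0.5, and is added to a BCE of ln 2.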
SurgBlood Dataset: 95 video clips from 42 cholecystectomy procedures, totaling 5,330 frames at 1280×720 resolution, annotated by hepatobiliary surgeons with pixel-level bleeding region masks and bleeding point coordinates. Four bleeding types: gallbladder (21.64%), Calot's triangle (25.01%), vessels (15.78%), and gallbladder bed (37.75%).
Key Experimental Results
Main Results
| Method | SurgBlood IoU ↑ | SurgBlood Dice ↑ | PCK-5% ↑ | PCK-10% ↑ |
|---|---|---|---|---|
| SAM 2† | 50.93 | 67.49 | 41.68 | 71.99 |
| MemSAM† | 52.84 | 69.14 | 31.80 | 64.91 |
| D-CeLR* | 51.30 | 67.82 | 24.22 | 63.92 |
| ConsisTNet | 40.43 | 57.59 | 32.83 | 68.15 |
| BlooDet (Ours) | 64.88 | 78.70 | 55.85 | 83.69 |
BlooDet outperforms 13 competing methods on SurgBlood (a subset is listed above). Relative to SAM 2, it gains 13.95 points in IoU (64.88 vs. 50.93) and 11.70 points in PCK-10% (83.69 vs. 71.99). It also attains the best region detection performance on the HemoSet dataset (IoU 59.62, Dice 74.70).
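For readers unfamiliar with the metrics in the table, the following sketch computes IoU, Dice, and PCK on toy inputs. The PCK normalization is an assumption: the threshold here is a fraction of the image diagonal, and the paper may normalize differently (for example by the longer image side).

```python
import math


def iou_and_dice(pred, gt):
    """IoU and Dice for two binary masks given as flat 0/1 sequences."""
    inter = sum(p & g for p, g in zip(pred, gt))
    pred_area, gt_area = sum(pred), sum(gt)
    union = pred_area + gt_area - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred_area + gt_area) if (pred_area + gt_area) else 1.0
    return iou, dice


def pck(pred_pts, gt_pts, width, height, ratio):
    """Fraction of predicted points within ratio * image diagonal of the ground truth.

    Using the diagonal as the reference length is an assumption of this sketch.
    """
    thresh = ratio * math.hypot(width, height)
    hits = sum(math.dist(p, g) <= thresh for p, g in zip(pred_pts, gt_pts))
    return hits / len(gt_pts)


iou, dice = iou_and_dice([1, 1, 1, 0], [0, 1, 1, 1])

gt_pts = [(640, 360), (100, 100)]
pred_pts = [(700, 360), (200, 100)]  # errors of 60 px and 100 px
pck5 = pck(pred_pts, gt_pts, 1280, 720, 0.05)   # threshold is about 73 px
pck10 = pck(pred_pts, gt_pts, 1280, 720, 0.10)  # threshold is about 147 px
print(iou, round(dice, 4), pck5, pck10)
```

For a single mask pair the two overlap metrics are tied by Dice = 2·IoU / (1 + IoU), which is why they rank methods in the same order; the rows of the table above are close to this relation (for instance, IoU 64.88 corresponds to Dice 78.70).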
Ablation Study
| Configuration | SurgBlood DSC ↑ | Note |
|---|---|---|
| Mask + Point only (no edge generator, no temporal consistency) | ~67.49 | Baseline SAM2 dual-task |
| + Edge generator + cross-branch guidance | 78.70 | Full BlooDet |
(Note: Ablations on XCAV/CAVSA datasets are also reported; the full model achieves DSC 84.39%, dropping to 76.24% without temporal consistency and to 76.71% without confidence regularization.)
Key Findings
- Region detection methods augmented with a simple point prediction head perform poorly, demonstrating the necessity of dedicated synergistic design
- Optical flow combined with Mask memory is critical for bleeding point tracking, resolving camera-motion-induced drift
- The edge generator effectively mitigates boundary ambiguity under low-contrast surgical scenes
- The alternating optimization strategy enables both branches to reach a joint optimum
Highlights & Insights
- Novel task definition: The first work to propose joint detection of bleeding regions and bleeding points in laparoscopic surgery
- SurgBlood dataset: The first real surgical video dataset providing dual annotations for both bleeding regions and bleeding points
- The dual-branch bidirectional guidance design is elegant — the Mask branch provides spatial constraints for the Point branch, while the Point branch supplies precise prompts for the Mask branch
- Background optical flow (excluding bleeding regions) is ingeniously exploited to compensate for camera motion drift
Limitations & Future Work
- The dataset scale is relatively small (95 clips), and generalizability remains to be validated
- Validation is limited to cholecystectomy; extension to a broader range of surgical procedures is needed
- The Point branch relies on a frozen PWC-Net for optical flow, which may degrade in severely blood-occluded scenes
- Multi-bleeding-point scenarios and bleeding intensity quantification are not addressed
Related Work & Insights
- The approach of building multi-task frameworks on top of SAM2 — chaining different tasks via the prompt mechanism — is a noteworthy paradigm
- The strategy of using optical flow for camera motion compensation in keypoint tracking is transferable to other surgical vision tasks
- The cross-validated annotation protocol (4 annotators + 2 reviewers) employed in dataset construction ensures annotation quality
Rating
- Novelty: ⭐⭐⭐⭐⭐ — First task definition + first dataset + novel dual-branch architecture
- Experimental Thoroughness: ⭐⭐⭐⭐ — 13 competing methods + multi-dataset validation + comprehensive ablation
- Writing Quality: ⭐⭐⭐⭐ — Clear structure with complete method description
- Value: ⭐⭐⭐⭐⭐ — Strong clinical utility and significant dataset contribution