Topo-R1: Detecting Topological Anomalies via Vision-Language Models

Conference: CVPR 2026
arXiv: 2603.13054
Code: To be confirmed
Area: Multimodal VLM
Keywords: Topological anomaly detection, tubular structure segmentation, GRPO reinforcement learning, clDice, VLM fine-grained perception

TL;DR

Topo-R1 is proposed as the first framework to equip VLMs with topology-aware perception. Through an automated data construction pipeline combined with SFT and GRPO reinforcement learning (incorporating a topology-aware composite reward), it enables annotation-free topological anomaly detection and classification in tubular structures.

Background & Motivation

Background: Topological correctness in tubular structures (blood vessels, nerve fibers, road networks) is critical. Existing topology-preserving segmentation methods (persistent homology losses, clDice, etc.) rely on pixel-level annotations to constrain training losses.

Limitations of Prior Work: Topological annotation demands domain expertise and is extremely time-consuming; cross-domain transfer is difficult (retinal annotations do not apply to road networks); detecting topological errors in unannotated new domains at deployment time is infeasible.

Key Challenge: Topological anomalies are extremely sparse and localized — among thousands of correct pixels, a single missing pixel may sever a vascular connection. Detecting such "needle-in-a-haystack" errors requires a combination of global structural reasoning and local fine-grained perception, capabilities entirely absent in existing VLMs.

Goal: Enable VLMs to localize and classify topological errors in tubular structures without domain-specific annotations.

Key Insight: Topological anomaly detection is reformulated as a structured visual reasoning task — given an image and a segmentation mask, the model must output bounding boxes with type labels.

Core Idea: An automated data pipeline synthesizes topologically verified anomalies, and a GRPO reinforcement learning procedure incorporating type-aware Hungarian matching and a clDice reward is used to train the VLM.

Method

Overall Architecture

Two-stage training: Stage 1 runs SFT on synthetic data to cold-start the pretrained VLM, which has essentially no zero-shot ability on this task; Stage 2 applies GRPO reinforcement learning with a topology-aware composite reward for further optimization. The input consists of an image, a segmentation mask, and a detection prompt; the output is a structured list of errors, each given as a bounding box with an error type.
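
The exact output schema is not given in this summary. As a minimal sketch, assuming the model emits a JSON list of objects with hypothetical keys `bbox` and `type`, and hypothetical label strings for the four error types, a parser and a binary format check could look like the following. The 256-pixel bound is taken from the patch size mentioned under Limitations; everything else is illustrative.

```python
import json

# Hypothetical label strings for the four error types named in this summary.
ERROR_TYPES = {"broken_connection", "false_connection",
               "missing_branch", "spurious_branch"}

def parse_errors(text, size=256):
    """Return a list of (bbox, type) tuples, or None if the output is malformed."""
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(items, list):
        return None
    parsed = []
    for item in items:
        if not isinstance(item, dict):
            return None
        bbox, etype = item.get("bbox"), item.get("type")
        if not isinstance(etype, str) or etype not in ERROR_TYPES:
            return None
        if not (isinstance(bbox, list) and len(bbox) == 4
                and all(isinstance(v, (int, float)) for v in bbox)):
            return None
        x1, y1, x2, y2 = bbox
        # Boxes must be non-empty and lie inside the patch.
        if not (0 <= x1 < x2 <= size and 0 <= y1 < y2 <= size):
            return None
        parsed.append(((x1, y1, x2, y2), etype))
    return parsed

def format_reward(text):
    """1.0 for a well-formed answer, 0.0 otherwise (an assumed binary scheme)."""
    return 1.0 if parse_errors(text) is not None else 0.0
```

An empty list is treated as well-formed here, so a "no errors found" answer still passes the format check.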

Key Designs

Automated Data Construction Pipeline:
- Function: Aggregates cross-domain data and injects verifiable topological anomalies.
- Mechanism: Aggregates data from three domains — road networks (60%), crack detection (20%), and retinal vessels (20%); injects four error types (broken connections, false connections, missing branches, spurious branches) onto mask skeletons; automatically verifies correctness via Betti number changes \((\beta_0, \beta_1)\).
- Design Motivation: Manual topological annotation is prohibitively costly. Automated synthesis with Betti number verification ensures the topological correctness of generated data. The four error types exhaustively cover the connectivity and branching axes.
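
The Betti-number check can be illustrated on small binary masks. The sketch below is not the authors' pipeline: it assumes 8-connectivity for the foreground and 4-connectivity for the background (a common convention, not confirmed by the source), and the helper names are hypothetical. It counts connected components (β0) and enclosed holes (β1) and reports how an injected edit changes them.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, value, neighbors):
    """Count connected regions of cells equal to `value` using BFS."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def betti_numbers(mask):
    """Return (beta0, beta1) of a 2D binary mask given as a list of 0/1 rows."""
    w = len(mask[0])
    # Pad with background so everything outside the structure is one region.
    padded = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in mask] + [[0] * (w + 2)]
    beta0 = count_components(padded, 1, N8)      # foreground pieces
    beta1 = count_components(padded, 0, N4) - 1  # background regions minus the outside
    return beta0, beta1

def betti_delta(before, after):
    """Change in (beta0, beta1) caused by an injected edit."""
    b0, b1 = betti_numbers(before)
    a0, a1 = betti_numbers(after)
    return a0 - b0, a1 - b1

# Cutting a straight segment splits one component into two.
line = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
cut = [row[:] for row in line]
cut[1][2] = 0
print(betti_delta(line, cut))  # (1, 0)

# Opening a closed loop removes one hole but keeps a single component.
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
opened = [row[:] for row in ring]
opened[0][1] = 0
print(betti_delta(ring, opened))  # (0, -1)
```

A pipeline could accept a synthetic anomaly only when the observed delta matches the change expected for its error type; the mapping from error type to expected delta is not specified in this summary.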
Topology-Aware Composite Reward:
- Function: A multi-objective reward designed for GRPO: \(R_{\text{total}} = 0.10 R_{\text{fmt}} + 0.85 R_{\text{acc}} + 0.05 R_{\text{topo}}\).
- Accuracy Reward \(R_{\text{acc}}\): Comprises a soft F1 detection reward (continuous IoU-based mapping \(\phi\)), a localization reward, and a type reward based on type-aware Hungarian matching.
- Topology Reward \(R_{\text{topo}}\): Computes \((1-\text{clDice})\) over matched pairs to quantify skeletal deviation, multiplied by an area penalty to discourage excessively large bounding boxes.
- Design Motivation: IoU cannot capture topological significance. clDice measures connectivity differences via skeletal overlap, directly encoding the prior that topological errors are defined by connectivity changes.
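
To make the weighting concrete, here is a small sketch of the weighted sum together with the standard clDice definition (harmonic mean of topology precision and topology sensitivity). Masks and skeletons are represented as sets of pixel coordinates, and skeleton extraction is assumed to happen elsewhere. How the paper combines clDice with the area penalty inside \(R_{\text{topo}}\) is not reproduced here, so `r_topo` is simply an input.

```python
def cl_dice(pred_mask, pred_skel, gt_mask, gt_skel):
    """clDice from masks and precomputed skeletons, each a set of (row, col) pixels."""
    if not pred_skel or not gt_skel:
        return 0.0
    tprec = len(pred_skel & gt_mask) / len(pred_skel)  # predicted skeleton inside GT mask
    tsens = len(gt_skel & pred_mask) / len(gt_skel)    # GT skeleton inside predicted mask
    if tprec + tsens == 0:
        return 0.0
    return 2 * tprec * tsens / (tprec + tsens)

def total_reward(r_fmt, r_acc, r_topo, weights=(0.10, 0.85, 0.05)):
    """Weighted sum with the weights quoted in this summary."""
    w_fmt, w_acc, w_topo = weights
    return w_fmt * r_fmt + w_acc * r_acc + w_topo * r_topo

# A one-pixel-wide vessel of five pixels; the prediction misses the middle pixel.
gt = {(0, c) for c in range(5)}
pred = gt - {(0, 2)}
score = cl_dice(pred, pred, gt, gt)  # tprec = 1.0, tsens = 0.8
print(round(score, 4))               # 0.8889

# Illustrative component values, not taken from the paper.
print(round(total_reward(1.0, 0.6, 0.8), 4))  # 0.65
```

If each component lies in [0, 1], the topology term can shift the total by at most 0.05, which is why its measured effect in the ablation is notable.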
Type-Aware Hungarian Matching:
- Function: Performs optimal prediction–annotation matching independently within each error type.
- Mechanism: For each error type \(t\), an IoU affinity matrix is constructed and the linear assignment problem is solved for optimal one-to-one matching; TP/FP/FN statistics are aggregated across types.
- Design Motivation: Guarantees globally optimal, order-invariant, one-to-one matching while naturally encoding type correctness.
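
The per-type matching can be sketched as follows. For self-containment this example finds the optimal assignment by exhaustive search over permutations, which is a stand-in for a Hungarian solver and only practical for a handful of boxes; a library assignment solver would be the usual choice at scale. The IoU threshold of 0.5 and the function names are assumptions for illustration.

```python
from itertools import permutations

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def best_assignment(preds, gts):
    """Exhaustively find the one-to-one pairing with maximum total IoU."""
    swap = len(preds) > len(gts)
    small, large = (gts, preds) if swap else (preds, gts)
    best_pairs, best_score = [], -1.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(j, i) if swap else (i, j) for i, j in enumerate(perm)]
    return best_pairs  # list of (pred_index, gt_index)

def match_by_type(preds, gts, iou_thr=0.5):
    """preds, gts: lists of (bbox, type). Returns aggregated (tp, fp, fn)."""
    tp = fp = fn = 0
    for t in {ty for _, ty in preds} | {ty for _, ty in gts}:
        p = [b for b, ty in preds if ty == t]
        g = [b for b, ty in gts if ty == t]
        # A matched pair counts as a true positive only above the IoU threshold.
        hits = sum(iou(p[i], g[j]) >= iou_thr for i, j in best_assignment(p, g))
        tp += hits
        fp += len(p) - hits
        fn += len(g) - hits
    return tp, fp, fn

gts = [((0, 0, 10, 10), "broken"), ((20, 20, 30, 30), "spurious")]
preds = [((1, 1, 10, 10), "broken"), ((20, 20, 30, 30), "broken")]
print(match_by_type(preds, gts))  # (1, 1, 1)
```

In the example, the second prediction sits exactly on a ground-truth box but carries the wrong type, so it is scored as one false positive plus one false negative rather than a hit.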

Loss & Training

Stage 1: Full-parameter SFT on approximately 12,900 samples via next-token prediction. Stage 2: GRPO on approximately 50,300 samples — for each query, \(G\) candidate outputs are sampled; advantages are computed using within-group reward normalization, and the policy is optimized with PPO clipping and KL regularization.
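
The within-group normalization and the clipped objective can be written out in a few lines. This is a generic sketch of the GRPO-style computation, assuming population standard deviation and a clip range of 0.2 (neither is stated in this summary); the KL regularization term is omitted.

```python
from statistics import fmean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Normalize the rewards of G sampled outputs for one query."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_term(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for one sample (to be maximized)."""
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    return min(ratio * advantage, clipped * advantage)

# Four candidate outputs for the same query, scored by the composite reward.
advantages = group_advantages([0.10, 0.65, 0.65, 0.90])
print([round(a, 2) for a in advantages])

# If every candidate receives the same reward, no sample is preferred.
print(group_advantages([0.5, 0.5, 0.5]))  # [0.0, 0.0, 0.0]
```

Because advantages are relative to the group mean, a query on which every sample earns the same reward contributes no gradient signal, which is one reason a dense, graded reward helps on sparse anomalies.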

Key Experimental Results

Main Results (Detection F1@IoU)

Model	Method	F1@0.3	F1@0.5	F1@0.75	aF1
GPT-4o	Zero-shot	0.5	0.3	0.0	0.1
GPT-5.2	Zero-shot	0.4	0.2	0.0	0.1
Qwen2.5-VL-3B	Zero-shot	0.0	0.0	0.0	0.0
Qwen2.5-VL-3B	SFT	~15	~10	~3	~5
Qwen2.5-VL-3B	Topo-R1	32.5	22.8	8.1	12.4
Qwen3-VL-8B	Topo-R1	38.7	28.3	11.2	16.0

Ablation Study

Configuration	F1@0.5	aF1	Notes
SFT only	10.2	5.1	Supervised fine-tuning only
SFT + GRPO (w/o topo reward)	18.5	9.3	Without topology reward
SFT + GRPO (w/ topo reward)	22.8	12.4	Full Topo-R1
w/o format reward	20.1	10.8	Increased formatting errors

Key Findings

The strongest closed-source VLMs (GPT-5.2, Gemini-2.5-Flash) score near zero on topological anomaly detection (the GPT models in the main results table reach an aF1 of only 0.1), confirming that existing VLMs lack topology-aware perception.
SFT provides a cold start (F1@0.5 rises from 0.0 zero-shot to 10.2 for Qwen2.5-VL-3B) but yields limited gains; adding GRPO raises it to 22.8, indicating that GRPO's exploratory capacity is critical for discovering sparse anomalies.
Despite a weight of only 0.05, the clDice topology reward lifts F1@0.5 from 18.5 to 22.8 and aF1 from 9.3 to 12.4 in the ablation, suggesting that reward design matters more than reward magnitude.
Cross-domain training (road networks + cracks + vessels) yields better generalization than single-domain training.

Highlights & Insights

Pioneering Contribution: This is the first work to apply GRPO reinforcement learning to topological quality assessment, opening a new research direction in VLM topology-aware perception.
Elegant Reward Design: clDice is repurposed from a loss function into an RL reward signal and conditioned on type-aware Hungarian matching, ensuring that only type-correct detections receive topology rewards — preventing misleadingly positive feedback for correct-location but wrong-type predictions.
Practical Value: Topological quality assessment without target-domain annotations enables the framework to serve as a post-processing quality assurance tool for existing segmentation pipelines.

Limitations & Future Work

The current framework handles only 2D tubular structures; extension to 3D networks (e.g., cerebrovascular trees, neuronal connectomes) remains an open problem.
Synthetic anomalies may not faithfully reflect the distribution of real post-processing errors (e.g., gradual boundaries from over- or under-segmentation).
The fixed four-class error taxonomy may not cover all practical scenarios (e.g., false positives caused by partial occlusion).
The 256×256 patch size limits the model's ability to perceive topological relationships at larger scales.

Related Work & Insights

vs. AnomalyR1: AnomalyR1 targets industrial anomaly detection; Topo-R1 focuses on topological anomalies, with fundamentally different reward design (clDice vs. IoU).
vs. clDice Loss: clDice was originally used as a training loss to optimize segmentation; Topo-R1 repurposes it as an RL reward signal for detection and classification.

Rating

Novelty: ⭐⭐⭐⭐⭐ — First VLM topology-aware framework; both the problem formulation and methodology are pioneering.
Experimental Thoroughness: ⭐⭐⭐⭐ — Multi-backbone and multi-domain evaluation, though real-world application validation is lacking.
Writing Quality: ⭐⭐⭐⭐ — The method section is highly detailed with clear mathematical derivations.
Value: ⭐⭐⭐⭐⭐ — Addresses a practical need for annotation-free topological quality assessment with broad application prospects.