Fine-Grained Representation for Lane Topology Reasoning¶
- Conference: AAAI 2026
- arXiv: 2511.12590
- Authors: Guoqing Xu, Yiheng Li, Yang Yang (Beijing Institute of Technology)
- Code: GitHub
- Area: Autonomous Driving
- Keywords: Lane topology reasoning, fine-grained queries, BEV perception, boundary point topology, denoising training
TL;DR¶
This paper proposes TopoFG, a framework that replaces conventional single-query lane modeling with fine-grained queries (each lane represented by multiple spatially-aware queries), combined with hierarchical prior extraction, region-focused decoding, and boundary-point-based robust topology reasoning, achieving new state-of-the-art results of 48.0% OLS (subset_A) and 45.4% OLS (subset_B) on OpenLane-V2.
Background & Motivation¶
State of the Field¶
Lane topology reasoning is a core task in autonomous driving perception, requiring simultaneous detection of lane centerlines and traffic elements, as well as inference of their topological relationships (lane-lane connectivity and lane-traffic sign association). Accurate topology modeling directly affects navigation and control decisions.
Limitations of Prior Work¶
- Insufficient expressiveness of single-query modeling: Methods such as TopoNet and TopoLogic represent an entire lane with a single query vector, making it difficult to capture complex shapes and local geometric variations.
- Unreliable instance-level topology reasoning: Connectivity is predicted by computing the overall feature similarity between two lanes; however, two connected lanes may only meet locally at their endpoints, making overall similarity uninformative.
- Typical failure scenario: When the endpoint of lane \(a\) is followed by two geometrically similar parallel lanes \(b\) and \(c\), instance-level features may render \(b\) and \(c\) nearly indistinguishable, causing the model to incorrectly predict a connection \(a \to c\).
Core Idea¶
Replace single-query representations with fine-grained query sequences per lane, enabling the model to capture local geometric details. Topology reasoning is then focused on boundary point (start/end) features rather than holistic instance features, improving connection prediction accuracy.
Method¶
Overall Architecture¶
TopoFG consists of three core modules: the Hierarchical Prior Extractor (HPE), the Region-Focused Decoder (RFD), and the Robust Boundary Point Topology Reasoning module (RBTR).
Input multi-view images → CNN backbone (ResNet-50) + FPN for multi-scale features → Deformable attention for BEV feature generation → HPE → RFD → RBTR → Output lane lines and topological relationships.
Module 1: Hierarchical Prior Extractor (HPE)¶
Extracts two complementary types of prior information:
Global Spatial Prior:
1. MaskFormer predicts lane masks \(\boldsymbol{M}\) from BEV features.
2. A weight vector \(\boldsymbol{A}\) is computed using threshold \(\tau\), with high-confidence regions emphasized by scaling factor \(\alpha\).
3. Sine-cosine encoding generates BEV grid positional embeddings \(\boldsymbol{P}\); a weighted summation yields the spatial prior \(\boldsymbol{Q}^{\text{pos}}\).
Local Sequence Prior:
1. Learnable queries \(\boldsymbol{Q}'\) are initialized to represent local points on each lane.
2. Keypoints on each lane are assigned ordered indices \(I=\{1,...,k\}\).
3. Positional encoding followed by linear projection transforms these into ordered embeddings \(\boldsymbol{Q}^{\text{seq}}\) that preserve local geometric structure.
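The two priors can be sketched in a few lines of NumPy. This is a hedged illustration only: the function name `hierarchical_priors`, the sigmoid/threshold weighting, and the random stand-in for the learned linear projection are assumptions, not the authors' implementation.

```python
import numpy as np

def hierarchical_priors(mask_logits, k, dim, tau=0.5, alpha=2.0, rng=None):
    """Illustrative HPE sketch (hypothetical, not the paper's code).
    mask_logits: (num_lanes, H*W) per-lane mask logits from a MaskFormer-style head."""
    rng = np.random.default_rng(0) if rng is None else rng

    # Global spatial prior: weight vector A emphasizes high-confidence mask cells.
    probs = 1.0 / (1.0 + np.exp(-mask_logits))
    A = np.where(probs > tau, alpha * probs, probs)          # scale confident regions
    A = A / np.clip(A.sum(axis=-1, keepdims=True), 1e-6, None)

    def sincos(positions):
        # Standard sine-cosine positional encoding for a 1-D index set.
        freqs = np.exp(np.arange(0, dim, 2) * (-np.log(1e4) / dim))
        out = np.zeros((len(positions), dim))
        out[:, 0::2] = np.sin(positions[:, None] * freqs)
        out[:, 1::2] = np.cos(positions[:, None] * freqs)
        return out

    HW = mask_logits.shape[-1]
    P = sincos(np.arange(HW, dtype=float))                   # BEV grid embeddings
    Q_pos = A @ P                                            # (num_lanes, dim) spatial prior

    # Local sequence prior: ordered indices 1..k -> encoding -> linear projection.
    W = rng.standard_normal((dim, dim)) / np.sqrt(dim)       # stand-in for learned weights
    Q_seq = sincos(np.arange(1, k + 1, dtype=float)) @ W     # (k, dim) sequence prior
    return Q_pos, Q_seq
```

The key point the sketch captures is that the spatial prior is a mask-weighted average of grid position embeddings, while the sequence prior depends only on keypoint order.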
Module 2: Region-Focused Decoder (RFD)¶
Fine-Grained Query Initialization: Spatial and sequence priors are fused to produce fine-grained queries: \(\boldsymbol{Q}_{i,t}^F = \boldsymbol{Q}_i^{\text{pos}} + \mathcal{F}(\boldsymbol{Q}_t^{\text{seq}})\), where \(i\) is the lane instance index and \(t\) is the keypoint index.
Two-Stage Self-Attention:
1. Inter-instance self-attention: captures interactions between different lane instances.
2. Intra-instance self-attention: refines the point-level structure within a single lane.
Region-Guided Cross-Attention:
- Reference point sampling guided by lane masks (rather than random initialization) constrains attention to lane-relevant regions.
- Deformable attention enables efficient interaction between BEV features and queries.
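The query initialization and the two-stage self-attention can be sketched as follows; `self_attn` and `rfd_queries` are hypothetical names, \(\mathcal{F}\) is taken as the identity, and plain single-head attention stands in for the decoder's attention layers:

```python
import numpy as np

def self_attn(x):
    """Minimal single-head self-attention over the second-to-last axis."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ x

def rfd_queries(Q_pos, Q_seq):
    """Illustrative sketch of fine-grained query init + two-stage self-attention.
    Q_pos: (N, d) per-instance spatial priors; Q_seq: (k, d) sequence priors."""
    # Broadcasted fusion: Q_F[i, t] = Q_pos[i] + F(Q_seq[t]), with F = identity here.
    Q = Q_pos[:, None, :] + Q_seq[None, :, :]              # (N, k, d)

    # Stage 1 -- inter-instance: attend across the N lane instances per keypoint.
    Q = np.swapaxes(self_attn(np.swapaxes(Q, 0, 1)), 0, 1)
    # Stage 2 -- intra-instance: attend across the k keypoints within each lane.
    Q = self_attn(Q)
    return Q
```

Factoring attention this way keeps cost at \(O(N^2 k + N k^2)\) per layer instead of the \(O(N^2 k^2)\) of joint attention over all \(N \times k\) queries, which is presumably why the two stages are separated.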
Module 3: Robust Boundary Point Topology Reasoning (RBTR)¶
Boundary Point Topology Reasoning:
- From the fine-grained query sequence of each lane, only the first and last queries are retained as boundary point features: \(f_i^{\text{start}} = Q_{i,1}^F\), \(f_i^{\text{end}} = Q_{i,k}^F\).
- For any lane pair \((i,j)\), the end-point feature of lane \(i\) and the start-point feature of lane \(j\) are concatenated and fed into a shared MLP to predict connection probability.
- Euclidean distances between boundary points are also computed to form a geometric topology matrix.
- Final topology = similarity topology + geometric topology.
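A toy version of the boundary-point reasoning could look like this; the `exp(-beta * dist)` mapping from distances to a geometric score is an assumption in the spirit of TopoLogic, and `mlp` is a stand-in callable rather than the paper's shared MLP:

```python
import numpy as np

def boundary_topology(Q_F, points, mlp, beta=1.0):
    """Illustrative sketch of boundary-point topology reasoning.
    Q_F:    (N, k, d) fine-grained queries per lane
    points: (N, k, 2) predicted keypoint coordinates
    mlp:    callable mapping a (2d,) feature vector to a connection logit
    """
    N = Q_F.shape[0]
    f_start, f_end = Q_F[:, 0, :], Q_F[:, -1, :]   # keep only boundary-point features
    sim = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            # Concatenate end feature of lane i with start feature of lane j.
            logit = mlp(np.concatenate([f_end[i], f_start[j]]))
            sim[i, j] = 1.0 / (1.0 + np.exp(-logit))
    # Geometric topology: end-of-i to start-of-j Euclidean distance, mapped to (0, 1].
    dist = np.linalg.norm(points[:, -1, None, :] - points[None, :, 0, :], axis=-1)
    geo = np.exp(-beta * dist)
    return sim + geo   # final topology = similarity term + geometric term
```

The sketch makes the paper's core argument concrete: only the two boundary queries of each lane enter the connectivity prediction, so nearly identical mid-lane geometry (the parallel-lane failure case above) no longer dilutes the decision.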
Denoising Training Strategy:
- Hungarian matching causes inconsistent supervision matrices across epochs, undermining training stability for topology learning.
- Noisy queries are generated from each GT instance, forming \(N_{gt} \times G\) denoising queries (\(G=5\) groups).
- The original adjacency matrix is expanded into a block-diagonal form as a fixed supervision signal.
- During inference, only vanilla queries are used; denoising queries are discarded.
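The block-diagonal expansion of the GT adjacency matrix is simple to sketch (illustrative only; `block_diagonal_supervision` is a hypothetical name):

```python
import numpy as np

def block_diagonal_supervision(adj, G=5):
    """Expand a GT adjacency matrix (N_gt x N_gt) into block-diagonal form for G
    denoising groups, so each group's noisy queries get a fixed supervision target."""
    n = adj.shape[0]
    big = np.zeros((n * G, n * G), dtype=adj.dtype)
    for g in range(G):
        big[g * n:(g + 1) * n, g * n:(g + 1) * n] = adj
    return big
```

Because each noisy query is tied to a known GT instance, this supervision matrix never changes across epochs, sidestepping the matching instability that plagues the vanilla queries.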
Loss & Training¶
Following the design of TopoLogic, each fine-grained query is responsible for predicting the coordinates of a single keypoint on the lane.
Key Experimental Results¶
Main Results: OpenLane-V2 Comparison¶
| Method | Conference | Dataset | OLS↑ | DET_l↑ | DET_t↑ | TOP_ll↑ | TOP_lt↑ |
|---|---|---|---|---|---|---|---|
| STSU | ICCV2021 | subset_A | 29.3 | 12.7 | 43.0 | 2.9 | 19.8 |
| TopoNet | Arxiv2023 | subset_A | 39.8 | 28.6 | 48.6 | 10.9 | 23.8 |
| TopoMLP | ICLR2024 | subset_A | 44.1 | 28.5 | 49.5 | 21.7 | 26.9 |
| TopoLogic | NeurIPS2024 | subset_A | 44.1 | 29.9 | 47.2 | 23.9 | 25.4 |
| TopoFG | AAAI2026 | subset_A | 48.0(+3.9) | 33.8(+3.9) | 47.2 | 30.8(+6.9) | 30.9(+4.0) |
| TopoNet | Arxiv2023 | subset_B | 36.8 | 24.3 | 55.0 | 6.7 | 16.7 |
| TopoLogic | NeurIPS2024 | subset_B | 42.3 | 25.9 | 54.7 | 21.6 | 17.9 |
| TopoFG | AAAI2026 | subset_B | 45.4(+3.1) | 30.0(+4.1) | 53.0 | 27.2(+5.6) | 21.7(+3.8) |
TopoFG surpasses the second-best method by 3.9 and 3.1 OLS points on subset_A and subset_B, respectively. The topology reasoning metric TOP_ll shows the most notable improvement (+6.9/+5.6).
Ablation Study: Contribution of Each Module¶
| Configuration | OLS↑ | DET_l↑ | DET_t↑ | TOP_ll↑ | TOP_lt↑ |
|---|---|---|---|---|---|
| Baseline (TopoLogic) | 44.1 | 29.9 | 47.2 | 23.9 | 25.4 |
| + HPE | 45.4 | 31.3 | 47.5 | 26.1 | 26.6 |
| + HPE + RFD | 45.8 | 31.8 | 47.2 | 26.8 | 27.7 |
| + HPE + RFD + RBTR (Full) | 48.0 | 33.8 | 47.2 | 30.8 | 30.9 |
All three modules contribute consistent performance gains when stacked incrementally. RBTR contributes the largest improvement (OLS +2.2), confirming that boundary-point topology reasoning and the denoising strategy are critical.
Additional Ablation: Sub-module Analysis¶
- HPE sub-modules: Local sequence prior and global spatial prior are each individually effective; their combination achieves 45.4% OLS.
- RFD sub-modules: The combination of fine-grained query initialization and sampled reference points yields the best result (45.8% OLS).
- RBTR sub-modules: Boundary point topology reasoning (BTR) raises OLS to 46.6%; adding denoising training (DTR) further improves it to 47.3%; combining both reaches 48.0%.
Highlights & Insights¶
- New paradigm for fine-grained query modeling: Each lane is represented by \(k=11\) spatially-aware queries rather than a single holistic query, fundamentally enhancing expressiveness for complex lane structures.
- Intuitive rationale for boundary point topology reasoning: Lane connectivity is inherently determined by endpoints; reasoning over start/end-point features is more principled than overall similarity, validated by a 6.9-point gain in TOP_ll.
- Denoising training addresses supervision instability: Label inconsistency caused by Hungarian matching is a practical challenge in training; block-diagonal denoising supervision provides a stable learning signal.
- Complementary hierarchical prior design: Global spatial priors provide mask-level localization, while local sequence priors preserve the ordered structure of lane keypoints; their combination outperforms either alone.
- Code is open-sourced, and experiments follow the standard OpenLane-V2 v2.1.0 evaluation protocol, ensuring reproducibility.
Limitations & Future Work¶
- No improvement on DET_t: Traffic element detection (DET_t) remains at 47.2% on subset_A; fine-grained lane modeling does not benefit traffic element perception, suggesting a dedicated enhancement module may be needed.
- Computational overhead not discussed: Expanding each lane from 1 query to 11 significantly increases the self-attention and cross-attention cost in the decoder; inference speed and FLOPs comparisons are not reported.
- Evaluated only on OpenLane-V2: Generalization to other topology reasoning benchmarks or real deployment scenarios is not verified.
- Lightweight backbone: Only ResNet-50 is used; the impact of stronger backbones (e.g., Swin Transformer) or higher input resolution is not explored.
- Denoising queries discarded at inference: Denoising queries are used only during training and fully discarded at inference, leaving potential room for further utilization.
- Single-frame inference: Temporal information is not exploited, which may limit performance in continuous scenarios compared to methods with temporal modeling such as BEVFormer v2.
Related Work & Insights¶
- TopoNet (arXiv 2023): Models lane and traffic element graphs with GNNs, but instance-level queries limit geometric expressiveness, achieving only 39.8% OLS.
- TopoMLP (ICLR2024): Uses a lightweight MLP to predict topological relationships, achieving 44.1% OLS, but lacks fine-grained modeling of intra-lane structure.
- TopoLogic (NeurIPS2024): Employs an interpretable topology reasoning strategy based on spatial positional relationships between lanes, achieving 44.1% OLS; used as the baseline in this work, which outperforms it on all key metrics.
- LaneSegNet: Models lanes as semantically rich lane segments with a Lane Attention mechanism, but topology reasoning remains at the segment-level.
- Topo2Seq: Achieves 33.6% mAP on lane segment detection, while TopoFG surpasses it with 34.4% and superior topology metrics.
- Mask2Map: Generates rasterized maps before vectorization; TopoFG draws on its multi-scale BEV feature design but adopts an end-to-end framework.
Rating¶
- Novelty: ⭐⭐⭐⭐ — The combination of fine-grained queries, boundary point topology, and denoising training is novel, though each individual component draws inspiration from prior work.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Main experiments, multiple ablation studies, and qualitative visualizations are all provided, but efficiency analysis and cross-dataset validation are absent.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure, intuitive figures, and well-articulated motivation.
- Value: ⭐⭐⭐⭐ — Substantially advances the state of the art on the important lane topology reasoning task with broadly applicable methodology.