
Fine-Grained Representation for Lane Topology Reasoning

  • Conference: AAAI 2026
  • arXiv: 2511.12590
  • Authors: Guoqing Xu, Yiheng Li, Yang Yang (Beijing Institute of Technology)
  • Code: GitHub
  • Area: Autonomous Driving
  • Keywords: lane topology reasoning, fine-grained queries, BEV perception, boundary point topology, denoising training

TL;DR

This paper proposes TopoFG, a framework that replaces conventional single-query lane modeling with fine-grained queries, representing each lane by multiple spatially aware queries. Combined with hierarchical prior extraction, region-focused decoding, and robust boundary-point topology reasoning, TopoFG sets a new state of the art on OpenLane-V2: 48.0% OLS on subset_A and 45.4% OLS on subset_B.

Background & Motivation

State of the Field

Lane topology reasoning is a core task in autonomous driving perception, requiring simultaneous detection of lane centerlines and traffic elements, as well as inference of their topological relationships (lane-lane connectivity and lane-traffic sign association). Accurate topology modeling directly affects navigation and control decisions.

Limitations of Prior Work

  • Insufficient expressiveness of single-query modeling: Methods such as TopoNet and TopoLogic represent an entire lane with a single query vector, making it difficult to capture complex shapes and local geometric variations.
  • Unreliable instance-level topology reasoning: Connectivity is predicted by computing the overall feature similarity between two lanes; however, two connected lanes may only meet locally at their endpoints, making overall similarity uninformative.
  • Typical failure scenario: When the endpoint of lane \(a\) is followed by two geometrically similar parallel lanes \(b\) and \(c\), instance-level features may render \(b\) and \(c\) nearly indistinguishable, causing the model to incorrectly predict a connection \(a \to c\).

Core Idea

Replace single-query representations with fine-grained query sequences per lane, enabling the model to capture local geometric details. Topology reasoning is then focused on boundary point (start/end) features rather than holistic instance features, improving connection prediction accuracy.

Method

Overall Architecture

TopoFG consists of three core modules: the Hierarchical Prior Extractor (HPE), the Region-Focused Decoder (RFD), and the Robust Boundary Point Topology Reasoning (RBTR) module.

Input multi-view images → CNN backbone (ResNet-50) + FPN for multi-scale features → Deformable attention for BEV feature generation → HPE → RFD → RBTR → Output lane lines and topological relationships.

Module 1: Hierarchical Prior Extractor (HPE)

Extracts two complementary types of prior information:

Global Spatial Prior:

1. MaskFormer predicts lane masks \(\boldsymbol{M}\) from BEV features.
2. A weight vector \(\boldsymbol{A}\) is computed from the masks using threshold \(\tau\); high-confidence regions are emphasized by scaling factor \(\alpha\).
3. Sine-cosine encoding generates BEV grid positional embeddings \(\boldsymbol{P}\); a weighted summation of \(\boldsymbol{P}\) by \(\boldsymbol{A}\) yields the spatial prior \(\boldsymbol{Q}^{\text{pos}}\).
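The three steps above can be sketched in a few lines of NumPy. This is a toy reconstruction from the description, not the paper's code: the mask shapes, embedding dimension, and the exact form of the threshold-and-scale weighting are all assumptions.

```python
import numpy as np

def sincos_embed(h, w, dim):
    """2-D sine-cosine positional embeddings P for an h x w BEV grid."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    freqs = 1.0 / (10000 ** (np.arange(dim // 4) / (dim // 4)))
    parts = []
    for coord in (ys.ravel()[:, None], xs.ravel()[:, None]):
        ang = coord * freqs[None, :]
        parts += [np.sin(ang), np.cos(ang)]
    return np.concatenate(parts, axis=1)               # (h*w, dim)

def global_spatial_prior(mask_probs, tau=0.5, alpha=2.0, dim=32):
    """Weighted sum of grid embeddings per lane mask.
    mask_probs: (num_lanes, h, w) predicted lane-mask probabilities."""
    n, h, w = mask_probs.shape
    m = mask_probs.reshape(n, -1)
    # emphasize confident regions: scale weights above threshold tau by alpha
    a = np.where(m > tau, alpha * m, m)
    a = a / (a.sum(axis=1, keepdims=True) + 1e-6)      # normalize per lane
    p = sincos_embed(h, w, dim)                        # (h*w, dim)
    return a @ p                                       # (num_lanes, dim) = Q^pos

masks = np.random.rand(3, 8, 8)
q_pos = global_spatial_prior(masks)
print(q_pos.shape)  # (3, 32)
```

Each lane thus gets one embedding that summarizes where its mask sits on the BEV grid.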

Local Sequence Prior:

1. Learnable queries \(\boldsymbol{Q}'\) are initialized to represent local points on each lane.
2. Keypoints on each lane are assigned ordered indices \(I=\{1,\dots,k\}\).
3. Positional encoding followed by a linear projection transforms these into ordered embeddings \(\boldsymbol{Q}^{\text{seq}}\) that preserve local geometric structure.
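A minimal sketch of the sequence prior, again reconstructed from the description: the learnable queries \(\boldsymbol{Q}'\) and the linear projection are random stand-ins here, and \(k=11\), dim = 32 are illustrative choices.

```python
import numpy as np

def local_sequence_prior(k=11, dim=32, seed=0):
    """Ordered sequence embeddings Q^seq for k keypoints (toy sketch)."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((k, dim))                  # learnable point queries Q'
    idx = np.arange(1, k + 1)[:, None]                 # ordered indices I = {1, ..., k}
    freqs = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    pe = np.concatenate([np.sin(idx * freqs), np.cos(idx * freqs)], axis=1)
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # linear projection
    return (q + pe) @ w                                # (k, dim) ordered embeddings

q_seq = local_sequence_prior()
print(q_seq.shape)  # (11, 32)
```

The key point is that the index encoding makes the k embeddings order-aware, so keypoint 1 and keypoint k are distinguishable even before decoding.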

Module 2: Region-Focused Decoder (RFD)

Fine-Grained Query Initialization: Spatial and sequence priors are fused to produce fine-grained queries: \(\boldsymbol{Q}_{i,t}^{F} = \boldsymbol{Q}_i^{\text{pos}} + \mathcal{F}(\boldsymbol{Q}_t^{\text{seq}})\), where \(i\) is the lane instance index and \(t\) is the keypoint index.
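The fusion formula is a broadcast addition over lanes and keypoints. A minimal sketch, where the shapes and the linear stand-in for \(\mathcal{F}\) are assumptions:

```python
import numpy as np

# Toy shapes: n lanes, k keypoints per lane, feature dim c (all assumed).
n, k, c = 3, 11, 32
q_pos = np.random.rand(n, c)            # per-lane spatial prior Q^pos
q_seq = np.random.rand(k, c)            # shared ordered sequence prior Q^seq
w = np.random.rand(c, c) / c            # stand-in for the mapping F (a linear layer)

# Broadcast: every lane gets the same ordered offsets on top of its spatial prior.
q_fine = q_pos[:, None, :] + q_seq[None, :, :] @ w   # (n, k, c) = Q^F
print(q_fine.shape)  # (3, 11, 32)
```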

Two-Stage Self-Attention:

1. Inter-instance self-attention captures interactions between different lane instances.
2. Intra-instance self-attention refines the point-level structure within a single lane.
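The two stages differ only in which axis the tokens attend over. A minimal NumPy sketch (single head, no learned projections, shapes assumed):

```python
import numpy as np

def attn(x):
    """Minimal single-head self-attention over the second-to-last axis."""
    s = x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

n, k, c = 3, 11, 32
q = np.random.rand(n, k, c)             # fine-grained queries (lanes, points, dim)

# Stage 1, inter-instance: for each keypoint index, lanes attend to each other.
q = attn(q.transpose(1, 0, 2)).transpose(1, 0, 2)    # attend over the n axis
# Stage 2, intra-instance: within each lane, its k points attend to each other.
q = attn(q)                                          # attend over the k axis
print(q.shape)  # (3, 11, 32)
```

The transpose trick is one plausible way to realize "inter-instance" attention; the paper may instead attend over all lanes jointly.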

Region-Guided Cross-Attention:

  • Reference points are sampled under the guidance of lane masks (rather than initialized randomly), constraining attention to lane-relevant regions.
  • Deformable attention enables efficient interaction between BEV features and queries.
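How mask guidance can constrain reference points is illustrated below with a simple top-k sampling sketch; the paper's actual sampling scheme may differ (e.g., ordered points along the mask skeleton), so treat this purely as an assumption-laden illustration.

```python
import numpy as np

def mask_reference_points(mask, k=11):
    """Pick the k highest-confidence mask cells as reference points (sketch).
    mask: (h, w) lane-mask probabilities; returns normalized (x, y) in [0, 1]."""
    h, w = mask.shape
    flat = np.argsort(mask.ravel())[-k:]        # indices of the top-k cells
    ys, xs = np.unravel_index(flat, (h, w))
    return np.stack([xs / (w - 1), ys / (h - 1)], axis=1)   # (k, 2)

mask = np.random.rand(16, 16)
refs = mask_reference_points(mask)
print(refs.shape)  # (11, 2)
```

These normalized points would then seed deformable-attention sampling locations, so the decoder looks at lane-relevant BEV regions from the first layer onward.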

Module 3: Robust Boundary Point Topology Reasoning (RBTR)

Boundary Point Topology Reasoning:

  • From the fine-grained query sequence of each lane, only the first and last queries are retained as boundary-point features: \(f_i^{\text{start}} = \boldsymbol{Q}_{i,1}^F\), \(f_i^{\text{end}} = \boldsymbol{Q}_{i,k}^F\).
  • For any lane pair \((i,j)\), the end-point feature of lane \(i\) and the start-point feature of lane \(j\) are concatenated and fed into a shared MLP to predict a connection probability.
  • Euclidean distances between boundary points are also computed and mapped into a geometric topology matrix.
  • The final topology combines the two: similarity topology + geometric topology.
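The whole scoring path fits in a short sketch. The MLP weights, the sigmoid scoring, and the \(\exp(-d)\) distance-to-score mapping are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0) @ w2               # tiny shared 2-layer MLP

n, k, c = 4, 11, 32
rng = np.random.default_rng(0)
q_fine = rng.standard_normal((n, k, c))             # fine-grained queries per lane
pts = rng.random((n, k, 2))                         # predicted keypoint coordinates

f_start, f_end = q_fine[:, 0], q_fine[:, -1]        # boundary-point features
w1, w2 = rng.standard_normal((2 * c, c)), rng.standard_normal((c, 1))

# Similarity topology: end of lane i paired with start of lane j -> MLP score.
pair = np.concatenate([np.repeat(f_end, n, 0),      # row i*n+j = (end_i, start_j)
                       np.tile(f_start, (n, 1))], axis=1)   # (n*n, 2c)
sim = 1 / (1 + np.exp(-mlp(pair, w1, w2))).reshape(n, n)

# Geometric topology: closeness of lane i's endpoint to lane j's start point.
d = np.linalg.norm(pts[:, -1][:, None] - pts[:, 0][None], axis=-1)
geo = np.exp(-d)                                    # nearby endpoints -> high score

topo = sim + geo                                    # final topology matrix (n, n)
print(topo.shape)  # (4, 4)
```

Note how the score for pair \((i,j)\) depends only on the two boundary points, which is exactly the locality argument the paper makes against instance-level similarity.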

Denoising Training Strategy:

  • Hungarian matching produces inconsistent supervision matrices across epochs, undermining training stability for topology learning.
  • Noisy queries are generated from each GT instance, forming \(N_{gt} \times G\) denoising queries (\(G=5\) groups).
  • The original adjacency matrix is expanded into a block-diagonal form that serves as a fixed supervision signal.
  • During inference, only vanilla queries are used; denoising queries are discarded.
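The block-diagonal expansion is easy to make concrete. Below is a sketch that copies the GT adjacency once per denoising group and masks cross-group entries; the choice of -1 as an "ignore" marker is an assumption, not the paper's convention:

```python
import numpy as np

def denoising_supervision(adj, g=5):
    """Expand a GT adjacency matrix into block-diagonal supervision for g
    denoising groups: each group supervises its own copy of the adjacency;
    cross-group entries carry no supervision (marked -1 here)."""
    n = adj.shape[0]
    big = -np.ones((g * n, g * n))                  # -1 = no supervision
    for i in range(g):
        big[i * n:(i + 1) * n, i * n:(i + 1) * n] = adj
    return big

adj = np.array([[0, 1], [0, 0]])                    # toy 2-lane connectivity
sup = denoising_supervision(adj, g=3)
print(sup.shape)  # (6, 6)
```

Because the denoising queries are tied to GT instances by construction, this target matrix never changes across epochs, sidestepping the Hungarian-matching instability.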

Loss & Training

Following the loss design of TopoLogic, each fine-grained query is responsible for predicting the coordinates of a single keypoint on the lane.

Key Experimental Results

Main Results: OpenLane-V2 Comparison

| Method | Conference | Dataset | OLS↑ | DET_l↑ | DET_t↑ | TOP_ll↑ | TOP_lt↑ |
|---|---|---|---|---|---|---|---|
| STSU | ICCV 2021 | subset_A | 29.3 | 12.7 | 43.0 | 2.9 | 19.8 |
| TopoNet | arXiv 2023 | subset_A | 39.8 | 28.6 | 48.6 | 10.9 | 23.8 |
| TopoMLP | ICLR 2024 | subset_A | 44.1 | 28.5 | 49.5 | 21.7 | 26.9 |
| TopoLogic | NeurIPS 2024 | subset_A | 44.1 | 29.9 | 47.2 | 23.9 | 25.4 |
| TopoFG | AAAI 2026 | subset_A | 48.0 (+3.9) | 33.8 (+3.9) | 47.2 | 30.8 (+6.9) | 30.9 (+4.0) |
| TopoNet | arXiv 2023 | subset_B | 36.8 | 24.3 | 55.0 | 6.7 | 16.7 |
| TopoLogic | NeurIPS 2024 | subset_B | 42.3 | 25.9 | 54.7 | 21.6 | 17.9 |
| TopoFG | AAAI 2026 | subset_B | 45.4 (+3.1) | 30.0 (+4.1) | 53.0 | 27.2 (+5.6) | 21.7 (+3.8) |

TopoFG surpasses the second-best method by 3.9 and 3.1 OLS points on subset_A and subset_B, respectively. The topology reasoning metric TOP_ll shows the most notable improvement (+6.9/+5.6).

Ablation Study: Contribution of Each Module

| Configuration | OLS↑ | DET_l↑ | DET_t↑ | TOP_ll↑ | TOP_lt↑ |
|---|---|---|---|---|---|
| Baseline (TopoLogic) | 44.1 | 29.9 | 47.2 | 23.9 | 25.4 |
| + HPE | 45.4 | 31.3 | 47.5 | 26.1 | 26.6 |
| + HPE + RFD | 45.8 | 31.8 | 47.2 | 26.8 | 27.7 |
| + HPE + RFD + RBTR (full) | 48.0 | 33.8 | 47.2 | 30.8 | 30.9 |

All three modules contribute consistent performance gains when stacked incrementally. RBTR contributes the largest improvement (OLS +2.2), confirming that boundary-point topology reasoning and the denoising strategy are critical.

Additional Ablation: Sub-module Analysis

  • HPE sub-modules: Local sequence prior and global spatial prior are each individually effective; their combination achieves 45.4% OLS.
  • RFD sub-modules: The combination of fine-grained query initialization and sampled reference points yields the best result (45.8% OLS).
  • RBTR sub-modules: Boundary point topology reasoning (BTR) raises OLS to 46.6%; adding denoising training (DTR) further improves it to 47.3%; combining both reaches 48.0%.

Highlights & Insights

  • New paradigm for fine-grained query modeling: Each lane is represented by \(k=11\) spatially-aware queries rather than a single holistic query, fundamentally enhancing expressiveness for complex lane structures.
  • Intuitive rationale for boundary point topology reasoning: Lane connectivity is inherently determined by endpoints; reasoning over start/end-point features is more principled than overall similarity, as validated by a 6.9-point gain in TOP_ll.
  • Denoising training addresses supervision instability: Label inconsistency caused by Hungarian matching is a practical challenge in training; block-diagonal denoising supervision provides a stable learning signal.
  • Complementary hierarchical prior design: Global spatial priors provide mask-level localization, while local sequence priors preserve the ordered structure of lane keypoints; their combination outperforms either alone.
  • Code is open-sourced, and experiments follow the standard OpenLane-V2 v2.1.0 evaluation protocol, ensuring reproducibility.

Limitations & Future Work

  • No improvement on DET_t: Traffic element detection (DET_t) remains at 47.2% on subset_A; fine-grained lane modeling does not benefit traffic element perception, suggesting a dedicated enhancement module may be needed.
  • Computational overhead not discussed: Expanding each lane from 1 query to 11 significantly increases the self-attention and cross-attention cost in the decoder; inference speed and FLOPs comparisons are not reported.
  • Evaluated only on OpenLane-V2: Generalization to other topology reasoning benchmarks or real deployment scenarios is not verified.
  • Lightweight backbone: Only ResNet-50 is used; the impact of stronger backbones (e.g., Swin Transformer) or higher input resolution is not explored.
  • Denoising queries discarded at inference: Denoising queries are used only during training and fully discarded at inference, leaving potential room for further utilization.
  • Single-frame inference: Temporal information is not exploited, which may limit performance in continuous scenarios compared to methods with temporal modeling such as BEVFormer v2.
Comparison with Related Work

  • TopoNet (arXiv 2023): Models lane and traffic element graphs with GNNs, but instance-level queries limit geometric expressiveness, achieving only 39.8% OLS.
  • TopoMLP (ICLR2024): Uses a lightweight MLP to predict topological relationships, achieving 44.1% OLS, but lacks fine-grained modeling of intra-lane structure.
  • TopoLogic (NeurIPS2024): Employs an interpretable topology reasoning strategy based on spatial positional relationships between lanes, achieving 44.1% OLS; used as the baseline in this work, which outperforms it on all key metrics.
  • LaneSegNet: Models lanes as semantically rich lane segments with a Lane Attention mechanism, but topology reasoning remains at the segment-level.
  • Topo2Seq: Achieves 33.6% mAP on lane segment detection, while TopoFG surpasses it with 34.4% and superior topology metrics.
  • Mask2Map: Generates rasterized maps before vectorization; TopoFG draws on its multi-scale BEV feature design but adopts an end-to-end framework.

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of fine-grained queries, boundary point topology, and denoising training is novel, though each individual component draws inspiration from prior work.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Main experiments, multiple ablation studies, and qualitative visualizations are all provided, but efficiency analysis and cross-dataset validation are absent.
  • Writing Quality: ⭐⭐⭐⭐ — Clear structure, intuitive figures, and well-articulated motivation.
  • Value: ⭐⭐⭐⭐ — Substantially advances the state of the art on the important lane topology reasoning task with broadly applicable methodology.