ArgoTweak: Towards Self-Updating HD Maps through Structured Priors

Conference: ICCV 2025
arXiv: 2509.08764
Code: https://KTH-RPL.github.io/ArgoTweak/
Area: Interpretability
Keywords: HD Map, Map Updating, Change Detection, Dataset, Autonomous Driving

TL;DR

This paper proposes ArgoTweak, the first HD map dataset providing complete triplets of "prior map + current sensor data + up-to-date ground-truth map." It decomposes large-scale map modifications into element-level atomic changes via a bijective change mapping framework, and introduces interpretable evaluation metrics (mAPC/mACC). Models trained on ArgoTweak reduce the sim2real gap by more than 10× compared to synthetic-prior baselines.

Background & Motivation

Background: HD maps are a core component of autonomous driving, providing precise lane-level information. Recent work has shifted from manual annotation toward end-to-end online generation based on BEV features (e.g., MapTR, LaneSegNet). Some methods attempt to leverage prior maps to improve generation quality.

Limitations of Prior Work:

  • Lack of complete data: No public dataset simultaneously provides triplets of "prior map + current sensor data + up-to-date ground-truth map." Existing methods can only train with synthetic priors (scripted modifications, noise injection, random element deletion, etc.).
  • Large sim2real gap: Synthetic priors fail to capture the structured, semantically correlated nature of real-world changes (e.g., adding a bike lane entails modifying lane markings and connectivity), leading to significant performance degradation in real scenarios.
  • Insufficient evaluation metrics: Standard mAP cannot distinguish between "preserving unchanged regions" and "correctly updating changed regions." Models trained on different prior qualities may achieve nearly identical mAP scores (experimentally verified: ArgoTweak vs. synthetic priors differ by only ~1% mAP, despite large qualitative gaps).

Key Challenge: Achieving truly self-updating HD maps requires addressing deficiencies in data, models, and evaluation simultaneously.

Core Idea: Construct a real-prior dataset annotated with a bijective mapping framework, enabling models to learn interpretable element-level change detection and updating.

Method

Overall Architecture

ArgoTweak contributes across three dimensions (dataset, model, and metrics):

  • Dataset: Built upon the Argoverse 2 Map Change Dataset with manually annotated real prior maps, forming complete triplets.
  • Model: LaneSegNet backbone + map prior encoder + interpretable change assessment heads.
  • Metrics: mAPC (change-aware precision) + mACC (change detection accuracy).

Key Designs

  1. Bijective Change Mapping:

    • Function: Systematically decomposes large-scale map modifications into traceable element-level atomic changes.
    • Mechanism: Defines an atomic change set \(\mathbf{A} = \{\text{geometry}, \text{markings}, \text{type}, \text{connectivity}, \text{insertion}, \text{deletion}\}\) and a structural update set \(\hat{\mathbf{Y}} = \{\text{shape}, \text{appearance}, \text{function}, \text{lane graph}, \text{lane number}\}\). The mapping between the two is required to be both surjective (every structural change can be explained by a combination of atomic changes) and injective (a given structural change cannot have two fundamentally different representations), which makes it bijective.
    • Ambiguity resolution: Disambiguation rules based on lane graph topology are introduced — when a change also modifies topology (adding/removing connections), it is represented as insertion/deletion; otherwise, in-place edits (geometry/marking/type) are used.
    • Design Motivation: Ensures annotation consistency and interpretability, enabling models to learn structured change patterns rather than random perturbations.
  2. Dataset Construction:

    • Function: Constructs a training set (synthetic yet realistic priors) and a test set (real-world priors).
    • Mechanism: Training priors are generated by applying structured modifications within the bijective framework to Argoverse 2 ground-truth maps. The test set uses real outdated maps from the Argoverse 2 Map Change Dataset validation split, with re-annotated up-to-date ground truth.
    • Scale: 697 training / 102 validation / 111 test scenarios, average duration 56 s.
    • Additional processing: OpenLane-V2-style lane merging and unified crosswalk edge orientation.
  3. Interpretable Prior-Assisted Map Network:

    • Function: Achieves interpretable map updating while maintaining modeling flexibility.
    • Mechanism: Uses LaneSegNet as the backbone and adds a prior encoder. Each prior lane segment is encoded from 10 sampled 2D points for each of its left boundary, right boundary, and centerline, plus a one-hot marking type. Priors are injected into BEV features via cross-attention. Change assessment operates at two levels: a primary head (multi-class: No Change / Insertion / Deletion / Other) and secondary heads (binary: geometry and marking changes). This split handles mutually exclusive categories (primary head) separately from categories that can co-occur (secondary heads).
    • Loss: \(\mathcal{L} = \lambda_{\text{vec}}\mathcal{L}_{\text{vec}} + \lambda_{\text{seg}}\mathcal{L}_{\text{seg}} + \lambda_{\text{cls}}\mathcal{L}_{\text{cls}} + \lambda_{\text{type}}\mathcal{L}_{\text{type}} + \lambda_{\text{cd,prim}}\mathcal{L}_{\text{cd,prim}} + \sum_i \lambda_{\text{cd,sec}}^i \mathcal{L}_{\text{cd,sec}}^i\)
  4. Change-Aware Dual Metrics (mAPC + mACC):

    • Function: Separately evaluate map stability (ability to preserve unchanged regions) and responsiveness (ability to update changed regions).
    • Mechanism:
      • mAPC: Requires that the predicted change state matches the ground-truth change state \(\hat{c}_V = c_V\) during prediction-to-ground-truth matching, computing AP per change category and averaging.
      • mACC: Computes binary detection accuracy per frame and per change type, averaging over accuracies for positive (changed) and negative (unchanged) cases.
    • Design Motivation: High mACC but low mAPC indicates that changes are detected but not accurately localized; low mACC indicates an overly conservative model. Plain mAP cannot capture these distinctions.
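
The mACC description above can be read as a balanced accuracy per change type, averaged over change types. The following is a minimal sketch under that reading, not the paper's evaluation code: the function names, the per-frame boolean representation, and the example change types and values are assumptions made for illustration.

```python
from typing import Dict, Sequence


def balanced_accuracy(gt: Sequence[bool], pred: Sequence[bool]) -> float:
    """Average of the accuracy on changed frames and on unchanged frames."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must cover the same frames")
    # Predictions on frames where the change is present / absent.
    pos = [bool(p) for g, p in zip(gt, pred) if g]
    neg = [bool(p) for g, p in zip(gt, pred) if not g]
    accs = []
    if pos:
        accs.append(sum(pos) / len(pos))               # changed frames flagged as changed
    if neg:
        accs.append((len(neg) - sum(neg)) / len(neg))  # unchanged frames left alone
    if not accs:
        raise ValueError("no frames given")
    return sum(accs) / len(accs)


def macc(gt_by_type: Dict[str, Sequence[bool]],
         pred_by_type: Dict[str, Sequence[bool]]) -> float:
    """Mean balanced accuracy over change types (assumed aggregation)."""
    scores = [balanced_accuracy(gt_by_type[t], pred_by_type[t]) for t in gt_by_type]
    return sum(scores) / len(scores)


# Hypothetical per-frame labels for two change types over four frames.
gt = {
    "insertion": [True, True, False, False],
    "deletion": [False, False, False, True],
}
pred = {
    "insertion": [True, False, False, False],
    "deletion": [False, True, False, True],
}
# A model that never reports a change, for comparison.
never_changed = {t: [False] * len(v) for t, v in gt.items()}

score = macc(gt, pred)
conservative_score = macc(gt, never_changed)
```

Under this reading, a predictor that always reports "unchanged" is perfect on negative frames and wrong on every positive frame, so it lands at 0.5 regardless of how rare changes are. This is one way to see why a low mACC flags an overly conservative model.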

Loss & Training

  • 10 epochs, batch size 8, AdamW optimizer, 8× NVIDIA A10G.
  • ResNet-50 pretrained feature extractor, camera-only input.
  • Map crop of 50×50 m²; the model runs at ~4 FPS on a single A10G.
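
The combined loss given under Key Designs is a plain weighted sum of per-task terms. The sketch below shows that structure only. The term names mirror the subscripts in the formula, but the numeric loss values and weights are placeholders chosen for illustration, since the weights actually used are not stated here.

```python
from typing import Mapping


def total_loss(losses: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of loss terms: sum over k of lambda_k * L_k."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for loss terms: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Placeholder per-term loss values (hypothetical, not from the paper).
losses = {
    "vec": 2.0,              # vectorized geometry regression
    "seg": 1.0,              # segmentation
    "cls": 0.5,              # element classification
    "type": 0.5,             # marking / lane type
    "cd_prim": 1.0,          # primary change head (multi-class)
    "cd_sec_geometry": 0.4,  # secondary binary head: geometry change
    "cd_sec_marking": 0.6,   # secondary binary head: marking change
}
# Placeholder weights (hypothetical): every term at 1.0 except "vec".
weights = {name: 1.0 for name in losses}
weights["vec"] = 0.5

loss = total_loss(losses, weights)
```

Keeping the two secondary heads as separate entries mirrors the \(\sum_i\) term in the formula, where each secondary head carries its own weight.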

Key Experimental Results

Main Results

mAP comparison under different priors (without change annotation):

| Prior Type | AP_ls | AP_pc | mAP |
|---|---|---|---|
| No prior (baseline) | 32.9 | 45.9 | 39.4 |
| Continuous perturbation (noise) | 71.6 | 75.5 | 73.5 |
| Discrete modification (deletion/shift) | 71.0 | 71.9 | 71.5 |
| Rule-based editing | 74.2 | 79.3 | 76.7 |
| ArgoTweak | 75.8 | 79.6 | 77.7 |

  • The mAP gap between ArgoTweak and the best synthetic prior (rule-based editing) is only about 1 point (77.7 vs. 76.7), yet qualitative differences are substantial: the ArgoTweak model captures complex road updates, whereas synthetic-prior models can only make minor corrections or overfit to marking changes.

Sim2Real gap evaluation:

| Training Prior | Δ mACC (Val→Test) |
|---|---|
| Rule-based editing | -36.0 |
| ArgoTweak | -3.5 |

Training on ArgoTweak shrinks the validation-to-test drop in mACC from 36.0 to 3.5 points, a reduction of more than 10× (36.0 / 3.5 ≈ 10.3).

Ablation Study

Comparison of annotation granularity:

| Change Annotation | mAP | mAPC | mACC |
|---|---|---|---|
| None | 77.7 | - | - |
| Binary annotation (c/¬c) | 77.5 | 66.5 | 70.5 |
| Full atomic annotation | 78.8 | 64.6 | 8.7* |

*Note: mACC is more stringent under fine-grained annotation due to the need to distinguish multiple change types.

  • Change annotations not only support fine-grained evaluation but can also serve as an auxiliary training signal: full atomic annotation raises mAP from 77.7 to 78.8, while binary annotation stays roughly level at 77.5.
  • Removing the geometry change annotation improves mAPC, which suggests that current map generation precision is insufficient to reliably distinguish subtle shape changes.

Personal Thoughts

  • Highlights: The first dataset to provide complete "prior–sensor–ground-truth" triplets; the bijective mapping framework is formally elegant and practically useful. Experiments clearly demonstrate the inadequacy of mAP and the severity of the sim2real gap.
  • Limitations: Training set priors are still manually modified (not real outdated maps); the test set is relatively small (111 scenarios); only camera input is used, without exploiting LiDAR.
  • Insights: Interpretable change annotations serve not only as an evaluation tool but also as effective training signals. The "prior-assisted" map generation paradigm holds broad application potential.

Rating

  • Novelty: TBD
  • Experimental Thoroughness: TBD
  • Writing Quality: TBD
  • Value: TBD