ArgoTweak: Towards Self-Updating HD Maps through Structured Priors

Conference: ICCV 2025 arXiv: 2509.08764 Code: https://KTH-RPL.github.io/ArgoTweak/ Area: Interpretability Keywords: HD Map, Map Updating, Change Detection, Dataset, Autonomous Driving

TL;DR

This paper proposes ArgoTweak, the first HD map dataset providing complete triplets of "prior map + current sensor data + up-to-date ground-truth map." It decomposes large-scale map modifications into element-level atomic changes via a bijective change mapping framework, and introduces interpretable evaluation metrics (mAPC/mACC). Models trained on ArgoTweak reduce the sim2real gap by more than 10× compared to synthetic-prior baselines.

Background & Motivation

Background: HD maps are a core component of autonomous driving, providing precise lane-level information. Recent work has shifted from manual annotation toward end-to-end online generation based on BEV features (e.g., MapTR, LaneSegNet). Some methods attempt to leverage prior maps to improve generation quality.

Limitations of Prior Work:
- Lack of complete data: No public dataset simultaneously provides triplets of "prior map + current sensor data + up-to-date ground-truth map," so existing methods can only train with synthetic priors (scripted modifications, noise injection, random element deletion, etc.).
- Large sim2real gap: Synthetic priors fail to capture the structured, semantically correlated nature of real-world changes (e.g., adding a bike lane entails modifying lane markings and connectivity), which leads to significant performance degradation in real scenarios.
- Insufficient evaluation metrics: Standard mAP cannot distinguish between "preserving unchanged regions" and "correctly updating changed regions." Models trained on priors of very different quality can reach nearly identical mAP: in the experiments, ArgoTweak and the strongest synthetic prior (rule-based editing) differ by only about 1 mAP point despite large qualitative differences.

Key Challenge: Achieving truly self-updating HD maps requires addressing deficiencies in data, models, and evaluation simultaneously.

Core Idea: Construct a real-prior dataset annotated with a bijective mapping framework, enabling models to learn interpretable element-level change detection and updating.

Method

Overall Architecture

ArgoTweak contributes across three dimensions: dataset, model, and metrics.
- Dataset: Built upon the Argoverse 2 Map Change Dataset, with manually annotated real prior maps forming complete triplets.
- Model: LaneSegNet backbone + map prior encoder + interpretable change assessment heads.
- Metrics: mAPC (change-aware precision) + mACC (change detection accuracy).

Key Designs

Bijective Change Mapping:
- Function: Systematically decomposes large-scale map modifications into traceable element-level atomic changes.
- Mechanism: Defines an atomic change set \(\mathbf{A} = \{\)geometry, markings, type, connectivity, insertion, deletion\(\}\) and a structural update set \(\hat{\mathbf{Y}} = \{\)shape, appearance, function, lane graph, lane number\(\}\). A mapping between the two is constructed via surjectivity (every structural change can be explained by a combination of atomic changes) and injectivity (a given structural change cannot have two fundamentally different representations).
- Ambiguity resolution: Disambiguation rules based on lane graph topology are introduced — when a change also modifies topology (adding/removing connections), it is represented as insertion/deletion; otherwise, in-place edits (geometry/marking/type) are used.
- Design Motivation: Ensures annotation consistency and interpretability, enabling models to learn structured change patterns rather than random perturbations.
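The topology-based disambiguation rule can be sketched as a small labeling function. This is a minimal illustration under our own assumptions, not the ArgoTweak annotation tool: each element is described only by the set of lane-graph neighbor IDs it has in the prior and in the up-to-date map (`None` if absent), plus the in-place attributes that differ. The name `represent_change` is hypothetical, and we assume that an in-place edit which also adds or removes connections is expressed as deleting the prior element and inserting a new one, while an element whose only change is its connections is labeled `connectivity`.

```python
# Hypothetical sketch of the topology-based disambiguation rule (not the official tool).
IN_PLACE = {"geometry", "markings", "type"}  # in-place atomic edits


def represent_change(prior_links, current_links, edited_attrs=()):
    """Return the set of atomic changes for one element.

    prior_links / current_links: iterables of lane-graph neighbour IDs in the
    prior and in the up-to-date map, or None if the element is absent there.
    edited_attrs: in-place attributes that differ between the two maps.
    """
    if prior_links is None and current_links is None:
        raise ValueError("element must exist in at least one of the two maps")
    if prior_links is None:
        return {"insertion"}  # element only exists in the up-to-date map
    if current_links is None:
        return {"deletion"}  # element was removed from the up-to-date map
    edited = set(edited_attrs) & IN_PLACE
    topology_changed = set(prior_links) != set(current_links)
    if edited and topology_changed:
        # Assumed reading of the rule: an edit that also adds/removes
        # connections replaces the element instead of editing it in place.
        return {"deletion", "insertion"}
    if topology_changed:
        return {"connectivity"}  # only the lane-graph connections changed
    return edited  # an empty set means "No Change"
```

For example, shifting a lane without touching its connections, `represent_change({"s1"}, {"s1"}, ["geometry"])`, yields `{"geometry"}`, whereas the same shift combined with a new successor yields `{"deletion", "insertion"}`.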
Dataset Construction:
- Function: Constructs a training set (synthetic yet realistic priors) and a test set (real-world priors).
- Mechanism: Training priors are generated by applying structured modifications within the bijective framework to Argoverse 2 ground-truth maps. The test set uses real outdated maps from the Argoverse 2 Map Change Dataset validation split, with re-annotated up-to-date ground truth.
- Scale: 697 training / 102 validation / 111 test scenarios, average duration 56 s.
- Additional processing: OpenLane-V2-style lane merging and unified crosswalk edge orientation.
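To illustrate what "structured" means for the training priors, the sketch below derives a prior from a GT map by undoing the bike-lane addition mentioned in the motivation, so that one real-world change yields several correlated atomic labels instead of an isolated random perturbation. It is a hypothetical illustration, not the authors' generation pipeline: the map format, the function name `revert_bike_lane`, and its parameters are all assumptions.

```python
import copy

import numpy as np


def revert_bike_lane(gt_map, bike_lane_id, neighbor_id, lateral_offset, prior_marking):
    """Hypothetical structured edit: build a prior in which a bike lane present in
    the up-to-date GT map has not been added yet.

    gt_map: dict elem_id -> {"points": (N, 2) array, "marking": str,
                             "successors": set of elem_ids}
    lateral_offset: 2D vector moving the neighbouring lane back to its old position.
    prior_marking: marking the neighbouring lane had before the bike lane existed.
    Returns (prior_map, labels); labels[elem_id] holds the atomic changes that turn
    the prior into the GT map.
    """
    prior = copy.deepcopy(gt_map)  # never mutate the ground truth
    labels = {elem_id: set() for elem_id in gt_map}

    # The bike lane does not exist in the prior -> it is an insertion in the GT map.
    del prior[bike_lane_id]
    labels[bike_lane_id].add("insertion")

    # The adjacent lane had a different position and marking before the change.
    neighbor = prior[neighbor_id]
    neighbor["points"] = neighbor["points"] + np.asarray(lateral_offset, dtype=float)
    labels[neighbor_id].add("geometry")
    if neighbor["marking"] != prior_marking:
        neighbor["marking"] = prior_marking
        labels[neighbor_id].add("markings")

    # Upstream lanes lose their connection to the bike lane -> connectivity change.
    for elem_id, elem in prior.items():
        if bike_lane_id in elem["successors"]:
            elem["successors"] = elem["successors"] - {bike_lane_id}
            labels[elem_id].add("connectivity")
    return prior, labels
```

With a bike lane, one adjacent car lane, and one upstream lane, this produces `insertion` on the bike lane, `geometry` + `markings` on the car lane, and `connectivity` on the upstream lane: three correlated labels from a single road update.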
Interpretable Prior-Assisted Map Network:
- Function: Achieves interpretable map updating while maintaining modeling flexibility.
- Mechanism: Uses LaneSegNet as the backbone and adds a prior encoder whose input for each prior lane segment is the 2D coordinates of 10 points on each of its left boundary, right boundary, and centerline, plus a one-hot marking type. The encoded priors are injected into the BEV features via cross-attention. Change assessment operates at two levels: a primary head that classifies mutually exclusive states (No Change / Insertion / Deletion / Other) and secondary binary heads for geometry and marking changes, which can co-occur.
- Loss: \(\mathcal{L} = \lambda_{\text{vec}}\mathcal{L}_{\text{vec}} + \lambda_{\text{seg}}\mathcal{L}_{\text{seg}} + \lambda_{\text{cls}}\mathcal{L}_{\text{cls}} + \lambda_{\text{type}}\mathcal{L}_{\text{type}} + \lambda_{\text{cd,prim}}\mathcal{L}_{\text{cd,prim}} + \sum_i \lambda_{\text{cd,sec}}^i \mathcal{L}_{\text{cd,sec}}^i\)
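A minimal NumPy sketch of the prior encoder's input layout, of how the two levels of change heads can be decoded, and of the weighted loss sum. Assumptions (ours, not from the paper): polylines are resampled to 10 points by arc length, the primary head uses a softmax while the secondary heads use independent sigmoids with a 0.5 threshold, and loss weights are plain dictionary entries. The cross-attention that injects priors into BEV features is not shown, and all function names are hypothetical.

```python
import numpy as np

NUM_POINTS = 10  # points per polyline in the prior encoder input
PRIMARY = ("no_change", "insertion", "deletion", "other")  # mutually exclusive
SECONDARY = ("geometry", "marking")  # may co-occur


def resample_polyline(points, n=NUM_POINTS):
    """Resample an (M, 2) polyline to n points evenly spaced by arc length."""
    points = np.asarray(points, dtype=float)
    seg_len = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])
    t = np.linspace(0.0, arc[-1], n)
    return np.stack([np.interp(t, arc, points[:, 0]),
                     np.interp(t, arc, points[:, 1])], axis=1)


def encode_prior_element(left, right, center, marking_idx, num_marking_types):
    """Flatten one prior lane segment: 3 polylines x 10 points x (x, y) = 60
    coordinates, followed by a one-hot marking type."""
    coords = np.concatenate([resample_polyline(p).ravel() for p in (left, right, center)])
    one_hot = np.zeros(num_marking_types)
    one_hot[marking_idx] = 1.0
    return np.concatenate([coords, one_hot])


def decode_change_heads(primary_logits, secondary_logits, threshold=0.5):
    """Primary head: softmax over exclusive states, pick exactly one.
    Secondary heads: independent sigmoids, any subset may fire."""
    z = np.asarray(primary_logits, dtype=float)
    probs = np.exp(z - z.max())  # numerically stable softmax
    probs /= probs.sum()
    flags = 1.0 / (1.0 + np.exp(-np.asarray(secondary_logits, dtype=float)))
    fired = {name for name, p in zip(SECONDARY, flags) if p > threshold}
    return PRIMARY[int(np.argmax(probs))], fired


def total_loss(losses, weights):
    """Weighted sum of the loss terms, keyed e.g. "vec", "seg", "cls", "type",
    "cd_prim", "cd_sec_geometry", "cd_sec_marking"."""
    return sum(weights[name] * value for name, value in losses.items())
```

With two marking types per side of a one-hot encoding of size 4, for instance, each prior element becomes a 64-dimensional vector (60 coordinates + 4 one-hot entries).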
Change-Aware Dual Metrics (mAPC + mACC):
- Function: Separately evaluate map stability (ability to preserve unchanged regions) and responsiveness (ability to update changed regions).
- Mechanism:
  - mAPC: Requires that the predicted change state matches the ground-truth change state \(\hat{c}_V = c_V\) during prediction-to-ground-truth matching, computing AP per change category and averaging.
  - mACC: Computes binary change-detection accuracy per frame and per change type; for each type, the accuracy on positive (changed) and negative (unchanged) cases is averaged, and the result is then averaged over change types.
- Design Motivation: High mACC but low mAPC indicates that changes are detected but not accurately localized; low mACC indicates an overly conservative model. Plain mAP cannot capture these distinctions.
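The two metrics can be sketched as follows. This is a simplified illustration, not the official evaluation code: geometric matching (e.g., a distance threshold between polylines) is assumed to be precomputed into `pred_to_gt`, AP is the plain non-interpolated area under the precision curve, and all names are hypothetical. The key point is the extra condition in the true-positive test: a geometrically correct prediction with the wrong change state counts as a false positive.

```python
import numpy as np


def change_gated_ap(pred_scores, pred_states, pred_to_gt, gt_states, category):
    """AP for one change category, in the spirit of mAPC.

    pred_to_gt[i]: index of the GT element that prediction i geometrically
    matches (computed elsewhere), or -1 if none. A prediction is a true positive
    only if it is matched AND its change state equals the GT change state.
    Returns 0.0 when AP is undefined (simplification).
    """
    pred_scores = np.asarray(pred_scores, dtype=float)
    pred_states = np.asarray(pred_states)
    pred_to_gt = np.asarray(pred_to_gt)
    gt_states = np.asarray(gt_states)
    num_gt = int((gt_states == category).sum())
    keep = np.flatnonzero(pred_states == category)
    if num_gt == 0 or keep.size == 0:
        return 0.0
    order = keep[np.argsort(-pred_scores[keep])]  # highest score first
    used, tp = set(), np.zeros(order.size)
    for rank, i in enumerate(order):
        g = int(pred_to_gt[i])
        if g >= 0 and gt_states[g] == category and g not in used:
            tp[rank] = 1.0
            used.add(g)
    precision = np.cumsum(tp) / np.arange(1, order.size + 1)
    return float((precision * tp).sum() / num_gt)


def mapc(pred_scores, pred_states, pred_to_gt, gt_states, categories):
    """mAPC: mean of the change-gated AP over change categories."""
    return float(np.mean([change_gated_ap(pred_scores, pred_states, pred_to_gt,
                                          gt_states, c) for c in categories]))


def balanced_accuracy(pred, gt):
    """Mean of the accuracy on changed (positive) and unchanged (negative) frames."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    parts = []
    if gt.any():
        parts.append(pred[gt].mean())
    if (~gt).any():
        parts.append((~pred[~gt]).mean())
    return float(np.mean(parts))


def macc(pred_flags, gt_flags):
    """mACC: average the balanced accuracy over change types.
    pred_flags / gt_flags: dict change_type -> per-frame boolean array."""
    return float(np.mean([balanced_accuracy(pred_flags[t], gt_flags[t]) for t in gt_flags]))
```

Averaging positive and negative accuracies means a model that never predicts a change scores only 0.5 on a type, which is how a low mACC exposes an overly conservative model.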

Loss & Training

10 epochs, batch size 8, AdamW optimizer, 8× NVIDIA A10G.
ResNet-50 pretrained feature extractor, camera-only input.
Map crop of 50 m × 50 m; runs at ~4 FPS on a single A10G.

Key Experimental Results

Main Results

mAP comparison under different priors (without change annotation; AP_ls: lane segments, AP_pc: pedestrian crossings):

| Prior Type | AP_ls | AP_pc | mAP |
|---|---|---|---|
| No prior (baseline) | 32.9 | 45.9 | 39.4 |
| Continuous perturbation (noise) | 71.6 | 75.5 | 73.5 |
| Discrete modification (deletion/shift) | 71.0 | 71.9 | 71.5 |
| Rule-based editing | 74.2 | 79.3 | 76.7 |
| ArgoTweak | 75.8 | 79.6 | 77.7 |

ArgoTweak leads the strongest synthetic prior (rule-based editing) by only 1.0 mAP point, yet the qualitative differences are substantial: the ArgoTweak-trained model captures complex road updates, whereas synthetic-prior models only make minor corrections or overfit to marking changes.
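A quick arithmetic check of the table, assuming mAP is the unweighted mean of AP_ls and AP_pc (every row is consistent with this up to rounding). It confirms that the strongest synthetic prior trails ArgoTweak by only 1.0 mAP point, which is why mAP alone cannot separate the two.

```python
# (AP_ls, AP_pc, reported mAP) copied from the table above
results = {
    "No prior (baseline)": (32.9, 45.9, 39.4),
    "Continuous perturbation (noise)": (71.6, 75.5, 73.5),
    "Discrete modification (deletion/shift)": (71.0, 71.9, 71.5),
    "Rule-based editing": (74.2, 79.3, 76.7),
    "ArgoTweak": (75.8, 79.6, 77.7),
}

# Assumption: mAP is the unweighted mean of the two per-class APs.
recomputed = {name: (ls + pc) / 2 for name, (ls, pc, _) in results.items()}
for name, (_, _, reported) in results.items():
    # Every row agrees with the reported mAP up to rounding (<= 0.05).
    assert abs(recomputed[name] - reported) <= 0.05 + 1e-9, name

synthetic = [n for n in results if n not in ("No prior (baseline)", "ArgoTweak")]
best_synthetic = max(synthetic, key=lambda n: results[n][2])
gap = results["ArgoTweak"][2] - results[best_synthetic][2]
print(f"best synthetic prior: {best_synthetic}, mAP gap: {gap:.1f} points")
# prints "best synthetic prior: Rule-based editing, mAP gap: 1.0 points"
```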

Sim2Real gap evaluation:

| Training Prior | Δ mACC (Val→Test) |
|---|---|
| Rule-based editing | -36.0 |
| ArgoTweak | -3.5 |

Training on ArgoTweak shrinks the validation-to-test mACC drop from 36.0 to 3.5 points, i.e., the sim2real gap is reduced by more than 10× (36.0 / 3.5 ≈ 10.3).

Ablation Study

Comparison of annotation granularity:

| Change Annotation | mAP | mAPC | mACC |
|---|---|---|---|
| None | 77.7 | - | - |
| Binary annotation (c/¬c) | 77.5 | 66.5 | 70.5 |
| Full atomic annotation | 78.8 | 64.6 | 8.7* |

*Note: mACC is more stringent under fine-grained annotation due to the need to distinguish multiple change types.

Change annotations not only support fine-grained evaluation but can also serve as auxiliary training signals: full atomic annotation raises mAP from 77.7 to 78.8, whereas binary annotation leaves it essentially unchanged (77.5).
Removing the geometry change annotation improves mAPC, because current map generation is not precise enough to reliably distinguish subtle shape changes.
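The difference between the two annotation granularities can be made concrete: binary annotation collapses every atomic-change set into a single changed/unchanged bit, whereas full atomic annotation keeps enough detail to supervise both the primary and the secondary heads. The sketch below is hypothetical: mapping the remaining atomic changes to the primary class "Other", and the `drop` parameter used to mimic the geometry ablation, are our assumptions.

```python
# Hypothetical head-target construction from atomic labels (illustrative only).
SECONDARY_HEADS = {"geometry": "geometry", "marking": "markings"}  # head -> atomic change


def binary_label(atomic):
    """Binary annotation (c / ¬c): an element is 'changed' if it has any atomic change."""
    return len(atomic) > 0


def head_targets(atomic, drop=()):
    """Map an element's atomic-change set to (primary class, secondary flags).

    drop: atomic types excluded from supervision, e.g. ("geometry",) to mimic
    the ablation that removes the geometry change annotation.
    """
    dropped = set(drop)
    atomic = set(atomic) - dropped
    if not atomic:
        primary = "no_change"
    elif "insertion" in atomic:
        primary = "insertion"
    elif "deletion" in atomic:
        primary = "deletion"
    else:
        primary = "other"  # assumed: any remaining in-place or connectivity change
    secondary = {head: atom in atomic
                 for head, atom in SECONDARY_HEADS.items() if atom not in dropped}
    return primary, secondary
```

Under `drop=("geometry",)`, an element whose only change is a shape edit becomes a "no_change" target, so the model is no longer penalized for subtle shape differences it cannot yet resolve.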

Personal Thoughts

Highlights: The first dataset to provide complete "prior–sensor–ground-truth" triplets; the bijective mapping framework is formally elegant and practically useful. Experiments clearly demonstrate the inadequacy of mAP and the severity of the sim2real gap.
Limitations: Training set priors are still manually modified (not real outdated maps); the test set is relatively small (111 scenarios); only camera input is used, without exploiting LiDAR.
Insights: Interpretable change annotations serve not only as an evaluation tool but also as effective training signals. The "prior-assisted" map generation paradigm holds broad application potential.

Highlights & Insights

Limitations & Future Work

Rating

Novelty: TBD
Experimental Thoroughness: TBD
Writing Quality: TBD
Value: TBD