ArgoTweak: Towards Self-Updating HD Maps through Structured Priors¶
Conference: ICCV 2025
arXiv: 2509.08764
Code: https://KTH-RPL.github.io/ArgoTweak/
Area: Interpretability
Keywords: HD Map, Map Updating, Change Detection, Dataset, Autonomous Driving
TL;DR¶
This paper proposes ArgoTweak, the first HD map dataset providing complete triplets of "prior map + current sensor data + up-to-date ground-truth map." It decomposes large-scale map modifications into element-level atomic changes via a bijective change mapping framework, and introduces interpretable evaluation metrics (mAPC/mACC). Models trained on ArgoTweak reduce the sim2real gap by more than 10× compared to synthetic-prior baselines.
Background & Motivation¶
Background: HD maps are a core component of autonomous driving, providing precise lane-level information. Recent work has shifted from manual annotation toward end-to-end online generation based on BEV features (e.g., MapTR, LaneSegNet). Some methods attempt to leverage prior maps to improve generation quality.
Limitations of Prior Work:
- Lack of complete data: No public dataset simultaneously provides triplets of "prior map + current sensor data + up-to-date ground-truth map." Existing methods can only train with synthetic priors (scripted modifications, noise injection, random element deletion, etc.).
- Large sim2real gap: Synthetic priors fail to capture the structured, semantically correlated nature of real-world changes (e.g., adding a bike lane entails modifying lane markings and connectivity), leading to significant performance degradation in real scenarios.
- Insufficient evaluation metrics: Standard mAP cannot distinguish between "preserving unchanged regions" and "correctly updating changed regions." Models trained on priors of different quality may achieve nearly identical mAP scores (experimentally verified: ArgoTweak and the best synthetic prior differ by only about 1 mAP point, despite large qualitative gaps).
Key Challenge: Achieving truly self-updating HD maps requires addressing deficiencies in data, models, and evaluation simultaneously.
Core Idea: Construct a real-prior dataset annotated with a bijective mapping framework, enabling models to learn interpretable element-level change detection and updating.
Method¶
Overall Architecture¶
ArgoTweak contributes across three dimensions (dataset, model, and metrics):
- Dataset: Built upon the Argoverse 2 Map Change Dataset with manually annotated real prior maps, forming complete triplets.
- Model: LaneSegNet backbone + map prior encoder + interpretable change assessment heads.
- Metrics: mAPC (change-aware precision) + mACC (change detection accuracy).
Key Designs¶
- Bijective Change Mapping:
  - Function: Systematically decomposes large-scale map modifications into traceable element-level atomic changes.
  - Mechanism: Defines an atomic change set \(\mathbf{A} = \{\text{geometry}, \text{markings}, \text{type}, \text{connectivity}, \text{insertion}, \text{deletion}\}\) and a structural update set \(\hat{\mathbf{Y}} = \{\text{shape}, \text{appearance}, \text{function}, \text{lane graph}, \text{lane number}\}\). A mapping between the two is constructed via surjectivity (every structural change can be explained by a combination of atomic changes) and injectivity (a given structural change cannot have two fundamentally different representations).
  - Ambiguity resolution: Disambiguation rules based on lane graph topology are introduced. When a change also modifies topology (adding/removing connections), it is represented as insertion/deletion; otherwise, in-place edits (geometry/marking/type) are used.
  - Design Motivation: Ensures annotation consistency and interpretability, enabling models to learn structured change patterns rather than random perturbations.
- Dataset Construction:
  - Function: Constructs a training set (synthetic yet realistic priors) and a test set (real-world priors).
  - Mechanism: Training priors are generated by applying structured modifications within the bijective framework to Argoverse 2 ground-truth maps. The test set uses real outdated maps from the Argoverse 2 Map Change Dataset validation split, with re-annotated up-to-date ground truth.
  - Scale: 697 training / 102 validation / 111 test scenarios, average duration 56 s.
  - Additional processing: OpenLane-V2-style lane merging and unified crosswalk edge orientation.
- Interpretable Prior-Assisted Map Network:
  - Function: Achieves interpretable map updating while maintaining modeling flexibility.
  - Mechanism: Uses LaneSegNet as the backbone and adds a prior encoder (10-point 2D coordinates × left/right/centerline + one-hot marking type). Priors are injected into BEV features via cross-attention. Change assessment operates at two levels: a primary head (multi-class: No Change / Insertion / Deletion / Other) and secondary heads (binary: geometry and marking changes), with mutually exclusive and co-occurring categories handled separately.
  - Loss: \(\mathcal{L} = \lambda_{\text{vec}}\mathcal{L}_{\text{vec}} + \lambda_{\text{seg}}\mathcal{L}_{\text{seg}} + \lambda_{\text{cls}}\mathcal{L}_{\text{cls}} + \lambda_{\text{type}}\mathcal{L}_{\text{type}} + \lambda_{\text{cd,prim}}\mathcal{L}_{\text{cd,prim}} + \sum_i \lambda_{\text{cd,sec}}^i \mathcal{L}_{\text{cd,sec}}^i\)
- Change-Aware Dual Metrics (mAPC + mACC):
  - Function: Separately evaluate map stability (ability to preserve unchanged regions) and responsiveness (ability to update changed regions).
  - Mechanism:
    - mAPC: Requires that the predicted change state matches the ground-truth change state \(\hat{c}_V = c_V\) during prediction-to-ground-truth matching, computing AP per change category and averaging.
    - mACC: Computes binary detection accuracy per frame and per change type, averaging over accuracies for positive (changed) and negative (unchanged) cases.
  - Design Motivation: High mACC but low mAPC indicates that changes are detected but not accurately localized; low mACC indicates an overly conservative model. Plain mAP cannot capture these distinctions.
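The topology-based disambiguation rule from the Bijective Change Mapping design can be illustrated with a small sketch. This is not the authors' annotation tooling: the `LaneEdit` record, its field names, and the choice to map added connections to "insertion" and removed connections to "deletion" are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneEdit:
    """Hypothetical record of what one edit did to a lane segment."""
    added_connections: int = 0
    removed_connections: int = 0
    geometry_changed: bool = False
    markings_changed: bool = False
    type_changed: bool = False


def atomic_changes(edit):
    """Pick one representation per edit, following the rule in the text.

    If the edit touches lane graph topology, it is expressed as
    insertion/deletion; otherwise it is expressed as in-place edits.
    """
    if edit.added_connections or edit.removed_connections:
        labels = []
        if edit.added_connections:
            labels.append("insertion")  # assumed mapping
        if edit.removed_connections:
            labels.append("deletion")  # assumed mapping
        return labels

    labels = []
    if edit.geometry_changed:
        labels.append("geometry")
    if edit.markings_changed:
        labels.append("markings")
    if edit.type_changed:
        labels.append("type")
    return labels
```

The point of the rule is that a given edit never receives two competing descriptions: a reshaped lane that also gains a connection is labeled only as an insertion in this sketch, not as "insertion plus geometry".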
Loss & Training¶
- 10 epochs, batch size 8, AdamW optimizer, 8× NVIDIA A10G.
- ResNet-50 pretrained feature extractor, camera-only input.
- Map crop of 50 m × 50 m; roughly 4 FPS on a single A10G.
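The prior encoder input described under Key Designs (10 two-dimensional points for each of the left boundary, right boundary, and centerline, plus a one-hot marking type) can be sketched as a flat feature vector. The marking vocabulary `MARKING_TYPES`, the function name, and the use of a single marking label per element are hypothetical; the paper's actual vocabulary and layout may differ.

```python
NUM_POINTS = 10
# Hypothetical marking vocabulary, for illustration only.
MARKING_TYPES = ("none", "solid", "dashed")


def encode_prior_lane(left, right, center, marking):
    """Flatten one prior lane element into a feature vector.

    Each polyline is a list of NUM_POINTS (x, y) tuples. The result holds
    3 * NUM_POINTS * 2 coordinates followed by the one-hot marking type.
    """
    if marking not in MARKING_TYPES:
        raise ValueError(f"unknown marking type: {marking!r}")
    for line in (left, right, center):
        if len(line) != NUM_POINTS:
            raise ValueError(f"expected {NUM_POINTS} points per polyline")

    coords = [
        float(value)
        for line in (left, right, center)
        for point in line
        for value in point
    ]
    one_hot = [1.0 if m == marking else 0.0 for m in MARKING_TYPES]
    return coords + one_hot


# Usage: a straight lane whose three polylines are offset laterally.
center = [(float(i), 0.0) for i in range(NUM_POINTS)]
left = [(x, y + 1.5) for x, y in center]
right = [(x, y - 1.5) for x, y in center]
feature = encode_prior_lane(left, right, center, "dashed")
```

Under these assumptions each prior element becomes a 63-dimensional vector (60 coordinates plus 3 marking classes), which a learned embedding would then project before cross-attention with the BEV features.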
Key Experimental Results¶
Main Results¶
mAP comparison under different priors (without change annotation):
| Prior Type | AP_ls (lane segments) | AP_pc (pedestrian crossings) | mAP |
|---|---|---|---|
| No prior (baseline) | 32.9 | 45.9 | 39.4 |
| Continuous perturbation (noise) | 71.6 | 75.5 | 73.5 |
| Discrete modification (deletion/shift) | 71.0 | 71.9 | 71.5 |
| Rule-based editing | 74.2 | 79.3 | 76.7 |
| ArgoTweak | 75.8 | 79.6 | 77.7 |
- The mAP gap between ArgoTweak and the strongest synthetic prior (rule-based editing) is only 1.0 point (77.7 vs. 76.7), yet qualitative differences are substantial: the ArgoTweak model captures complex road updates, whereas synthetic-prior models can only make minor corrections or overfit to marking changes.
Sim2Real gap evaluation:
| Training Prior | Δ mACC (Val→Test) |
|---|---|
| Rule-based editing | -36.0 |
| ArgoTweak | -3.5 |
The sim2real gap is reduced by more than 10× when training on ArgoTweak.
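As a quick check of the "more than 10×" figure, the ratio can be computed directly from the two Δ mACC values in the table. The variable names are illustrative.

```python
# Delta mACC (Val -> Test) values taken from the table above.
gap_rule_based = -36.0
gap_argotweak = -3.5

# Ratio of the absolute drops: how much smaller the gap is with ArgoTweak.
reduction = abs(gap_rule_based) / abs(gap_argotweak)
print(f"{reduction:.1f}x")  # prints "10.3x"
```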
Ablation Study¶
Comparison of annotation granularity:
| Change Annotation | mAP | mAPC | mACC |
|---|---|---|---|
| None | 77.7 | - | - |
| Binary annotation (c/¬c) | 77.5 | 66.5 | 70.5 |
| Full atomic annotation | 78.8 | 64.6 | 8.7* |
*Note: mACC is more stringent under fine-grained annotation due to the need to distinguish multiple change types.
- Change annotations not only support fine-grained evaluation but also serve as auxiliary training signals (annotated models achieve higher mAP).
- Removing geometry change annotation improves mAPC (current map generation precision is insufficient to reliably distinguish subtle shape changes).
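To make the mACC column easier to read, here is a minimal sketch of the computation as described under Key Designs: binary accuracy on changed and unchanged frames is averaged per change type, then averaged across types. The function names are hypothetical, and skipping an empty positive or negative group is an assumption; the paper's exact aggregation may differ.

```python
def balanced_accuracy(gt, pred):
    """Mean of accuracy on changed (positive) and unchanged (negative) frames.

    gt and pred are equal-length lists of booleans, one entry per frame.
    A group with no frames is skipped (an assumption of this sketch).
    """
    pos = [p for g, p in zip(gt, pred) if g]
    neg = [p for g, p in zip(gt, pred) if not g]
    parts = []
    if pos:
        parts.append(sum(1 for p in pos if p) / len(pos))
    if neg:
        parts.append(sum(1 for p in neg if not p) / len(neg))
    if not parts:
        raise ValueError("no frames to evaluate")
    return sum(parts) / len(parts)


def macc(gt_by_type, pred_by_type):
    """Average the per-type balanced accuracy over all change types."""
    scores = [
        balanced_accuracy(gt_by_type[t], pred_by_type[t]) for t in gt_by_type
    ]
    return sum(scores) / len(scores)


# Usage with two change types over four frames (made-up labels).
gt = {
    "insertion": [True, True, False, False],
    "deletion": [False, False, False, True],
}
pred = {
    "insertion": [True, False, False, False],
    "deletion": [False, True, False, True],
}
score = macc(gt, pred)
```

In this sketch a model that never predicts a change scores 0.5 on any type that has both changed and unchanged frames, which matches the reading that a low mACC signals an overly conservative model.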
Personal Thoughts¶
- Highlights: The first dataset to provide complete "prior–sensor–ground-truth" triplets; the bijective mapping framework is formally elegant and practically useful. Experiments clearly demonstrate the inadequacy of mAP and the severity of the sim2real gap.
- Limitations: Training set priors are still manually modified (not real outdated maps); the test set is relatively small (111 scenarios); only camera input is used, without exploiting LiDAR.
- Insights: Interpretable change annotations serve not only as an evaluation tool but also as effective training signals. The "prior-assisted" map generation paradigm holds broad application potential.
Rating¶
- Novelty: TBD
- Experimental Thoroughness: TBD
- Writing Quality: TBD
- Value: TBD