OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata

Conference: NeurIPS 2025
arXiv: 2509.18350
Code: Project Page
Area: Other / Visual Localization
Keywords: UAV localization, 6-DoF, orthographic geodata, feature matching, domain adaptation

TL;DR

OrthoLoC establishes the first large-scale UAV 6-DoF localization benchmark dataset based on orthographic geodata (DOP+DSM), comprising 16,425 real UAV images across 47 regions in Germany and the United States. It further introduces AdHoP (Adaptive Homography Preprocessing), a matching enhancement technique that improves matching performance by up to 95% and reduces translation error by up to 63% without modifying the underlying feature matcher.

Background & Motivation

Background: UAV visual localization is critical for applications such as digital twins, search and rescue, and infrastructure inspection. Existing methods rely on image database retrieval (imprecise) or 3D model matching (memory- and compute-intensive), making them infeasible in resource-constrained environments.

Limitations of Prior Work: (1) Orthographic geodata (digital orthophoto DOP + digital surface model DSM) is lightweight and increasingly accessible (freely released by EU governments), yet no method fully exploits it; (2) no aligned cross-domain UAV–geodata benchmark exists; (3) severe domain discrepancy exists between perspective UAV images and orthographic reference data.

Key Challenge: Geodata requires only approximately 1/30 the storage of a 3D model for equivalent coverage, yet the fundamental geometric difference between perspective and orthographic projection makes direct feature matching challenging, especially at oblique viewing angles.

Goal: (1) Establish the first large-scale UAV–geodata aligned benchmark supporting 6-DoF localization evaluation; (2) reduce the matching errors caused by the perspective–orthographic domain gap.

Key Insight: A paired dataset decouples retrieval from localization evaluation, and a ground-plane approximation justifies homographic preprocessing that narrows the domain gap.

Core Idea: Replace expensive 3D models and image databases with lightweight, publicly available government geodata (DOP+DSM) for UAV 6-DoF localization.

Method

Overall Architecture

The OrthoLoC localization pipeline has two stages. (1) Initial localization: a feature matcher establishes 2D–2D correspondences between the UAV image and the DOP; these are lifted to 3D–2D correspondences using the DSM, a pose is estimated with RANSAC-EPnP, and Levenberg–Marquardt (LM) optimization jointly refines the intrinsic and extrinsic parameters. (2) AdHoP refinement: a homography is estimated from the initial matches and used to warp the DOP, reducing the perspective gap; a second round of feature matching is run on the warped DOP, and the new matches are mapped back to original DOP coordinates via \(\mathbf{H}^{-1}\), lifted to 3D, and used for pose refinement.
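The lifting step can be illustrated with a minimal sketch. It assumes the DOP and DSM are co-registered, north-up rasters sharing one grid, described by a top-left origin and a ground sampling distance; the function names `sample_dsm` and `lift_to_3d`, the raster layout, and the pixel convention are illustrative assumptions, not the authors' implementation.

```python
from math import floor


def sample_dsm(dsm, u, v):
    """Bilinearly interpolate the DSM height at sub-pixel position (u, v).

    dsm is a list of rows (at least 2x2); u is the column, v is the row.
    """
    rows, cols = len(dsm), len(dsm[0])
    # Clamp the query so the 2x2 neighbourhood stays inside the raster.
    u = min(max(u, 0.0), cols - 1.0)
    v = min(max(v, 0.0), rows - 1.0)
    c0 = min(int(floor(u)), cols - 2)
    r0 = min(int(floor(v)), rows - 2)
    du, dv = u - c0, v - r0
    top = dsm[r0][c0] * (1 - du) + dsm[r0][c0 + 1] * du
    bottom = dsm[r0 + 1][c0] * (1 - du) + dsm[r0 + 1][c0 + 1] * du
    return top * (1 - dv) + bottom * dv


def lift_to_3d(dop_pixels, dsm, origin_xy, gsd):
    """Turn matched DOP pixels into 3D world points.

    Assumption: origin_xy is the world (X, Y) of pixel (0, 0), X grows with
    the column index, Y shrinks with the row index, and gsd is in metres/pixel.
    """
    x0, y0 = origin_xy
    points = []
    for u, v in dop_pixels:
        z = sample_dsm(dsm, u, v)
        points.append((x0 + u * gsd, y0 - v * gsd, z))
    return points


if __name__ == "__main__":
    toy_dsm = [[10.0, 12.0], [14.0, 16.0]]  # heights in metres
    print(lift_to_3d([(0.5, 0.5)], toy_dsm, origin_xy=(1000.0, 2000.0), gsd=0.05))
```

The resulting 3D points, paired with the matched UAV pixels, form the 3D–2D correspondences that a PnP solver consumes.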

Key Designs

OrthoLoC Dataset Construction:
Function: Establishes the first large-scale geodata-aligned benchmark supporting 6-DoF UAV localization evaluation.
Mechanism: 16,425 real UAV images spanning 47 regions (19 cities, 2 countries) are collected. Accurate 3D reconstructions are built via SfM+MVS with GCP/RTK georeferencing. Paired DOP/DSM tiles (5 cm/px) are generated from the reconstructions, with matching regions precisely cropped via ray casting.
Design Motivation: The paired structure decouples pose estimation from image retrieval, enabling independent evaluation of localization algorithm performance.
Domain Augmentation Strategy:
Function: Introduces realistic cross-domain discrepancies for robustness evaluation.
Mechanism: Three sample categories are defined — same-domain (DOP/DSM generated from the reconstruction), cross-domain DOP (orthophotos with external visual differences), and cross-domain DOP+DSM (visual and structural differences). Cross-domain data is sourced from European government geoportals and manually verified for alignment with same-domain data.
Design Motivation: In real deployment, DOP/DSM may have been acquired months or years prior, naturally introducing appearance and structural changes that synthetic augmentation cannot replicate.
AdHoP (Adaptive Homography Preprocessing):
Function: Reduces the domain gap between perspective UAV images and orthographic DOP to improve feature matching.
Mechanism: A homography matrix \(\mathbf{H} \in \mathbb{R}^{3\times3}\) is estimated from the initial 2D–2D matches via normalized DLT+RANSAC. The DOP is warped by \(\mathbf{H}\) to approximate the UAV perspective, a second round of feature matching is performed on the warped DOP, new matches are mapped back to original coordinates via \(\mathbf{H}^{-1}\), lifted to 3D using the DSM, and used for pose refinement. The refinement is accepted only if the reprojection error decreases.
Design Motivation: In aerial scenes, elements such as roads, rooftops, and fields approximate planar surfaces, making homographic transformation an effective model for the geometric transition between perspective and orthographic projections. The method is matcher-agnostic and functions as a general-purpose plug-in.
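The map-back and acceptance steps of AdHoP can be sketched as follows. The sketch assumes a homography `H` has already been estimated (for example with a DLT+RANSAC routine) and that it maps original DOP pixels to warped DOP pixels; all function names are hypothetical, and the matcher and pose solver are left out.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix given as a list of rows, using the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(h, point):
    """Map a 2D point through a homography, dividing by the projective scale."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


def map_matches_back(h, warped_dop_points):
    """Bring keypoints found on the warped DOP back to original DOP pixels."""
    h_inv = invert_3x3(h)
    return [apply_homography(h_inv, p) for p in warped_dop_points]


def accept_refinement(initial_error, refined_error):
    """Keep the refined pose only if its reprojection error is lower."""
    return refined_error < initial_error


if __name__ == "__main__":
    H = [[1.0, 0.1, 5.0], [0.2, 1.0, -3.0], [0.001, 0.002, 1.0]]  # toy values
    warped = [apply_homography(H, (120.0, 80.0))]
    print(map_matches_back(H, warped))
    print(accept_refinement(initial_error=2.0, refined_error=1.5))
```

Because the second matching pass only sees a warped image, any matcher can be reused unchanged; the points mapped back here would then be lifted to 3D with the DSM before pose refinement.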

Key Experimental Results

Main Results — UAV Localization (Test Set, w/o and w/ AdHoP)

Matcher	ME[px]↓	TE[m]↓	RE[°]↓	1m-1°[%]↑
SP+SuperGlue	2.2 / 2.2	0.36 / 0.35	0.15 / 0.15	63.9 / 64.4
GIM+DKM	—	—	—	Highest
XFeat	257.0 / 38.1	—	—	—
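For readers unfamiliar with the pose metrics in this table, the sketch below shows one common way to compute them. It assumes TE is the Euclidean distance between estimated and ground-truth camera positions, RE is the geodesic angle between the two rotation matrices, and the 1m-1° column is the share of images within both thresholds; these definitions are assumptions, since the summary does not spell them out.

```python
from math import acos, degrees, dist


def translation_error(c_est, c_gt):
    """Euclidean distance in metres between two camera positions."""
    return dist(c_est, c_gt)


def rotation_error_deg(r_est, r_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return degrees(acos(cos_angle))


def recall_at(errors, max_te=1.0, max_rot=1.0):
    """Percentage of (TE, RE) pairs within both thresholds."""
    hits = sum(1 for te, rot in errors if te <= max_te and rot <= max_rot)
    return 100.0 * hits / len(errors)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    print(translation_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(rotation_error_deg(identity, quarter_turn_z))
    print(recall_at([(0.3, 0.2), (1.5, 0.1), (0.9, 1.2), (0.5, 0.5)]))
```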

AdHoP Improvement Magnitude

Metric	Maximum Improvement
Feature matching	+95%
Translation error	−63%
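As a worked example of how such percentages can be read, the sketch below computes the relative reduction for two entries of the main results table. It assumes improvement is measured as (before − after) / before; the summary does not state the exact formula, and these per-matcher values are not meant to reproduce the maximum figures listed above.

```python
def relative_reduction(before, after):
    """Percentage by which an error metric shrinks (positive means improvement)."""
    if before == 0:
        raise ValueError("baseline error must be non-zero")
    return 100.0 * (before - after) / before


# Values taken from the main results table (w/o AdHoP, w/ AdHoP).
xfeat_me = relative_reduction(257.0, 38.1)      # matching error in pixels
superglue_te = relative_reduction(0.36, 0.35)   # translation error in metres

print(f"XFeat ME reduction: {xfeat_me:.1f}%")
print(f"SP+SuperGlue TE reduction: {superglue_te:.1f}%")
```

Under this assumed formula, XFeat's matching error drops by roughly 85%, while SP+SuperGlue's translation error changes by under 3%, which suggests that the gain depends strongly on how well the matcher already performs.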
Rotation error	Significant improvement

Dataset Comparison

Dataset	Images	Country	DoF	Geodata	Paired	Cross-domain
AnyVisLoc	18K	CN	3	DOP+DSM	✗	✓
UAVD4L	19K	CN	6	DSM	✗	✗
OrthoLoC	16.4K	US+DE	6	DOP+DSM	✓	✓

Key Findings

Existing SOTA matchers generalize to aerial viewpoints but suffer significant performance degradation under the perspective–orthographic domain gap.
AdHoP improves performance across all matchers, with the largest gains at oblique viewing angles.
High-resolution geodata (5 cm/px vs. 20 cm/px) substantially improves localization accuracy.
Camera calibration presents unique geometric ambiguities in aerial settings.

Highlights & Insights

Leveraging freely available government geodata as a replacement for expensive 3D models represents a practically motivated and impactful research direction.
The paired data structure decouples retrieval from localization evaluation, providing a principled basis for fair comparison.
AdHoP is elegantly simple: it requires no training, makes no dataset-specific assumptions, and imposes no domain constraints, making it a general-purpose plug-in for arbitrary feature matchers.

Limitations & Future Work

The planar assumption underlying AdHoP may fail in highly irregular terrain such as mountainous regions.
The current approach supports only single-frame localization and does not exploit temporal constraints from video sequences.
Data is primarily collected from Germany and the United States; generalization to extreme environments such as tropical or desert regions remains unvalidated.

vs. AnyVisLoc: AnyVisLoc supports only 3-DoF localization and exhibits alignment errors between its reference data and aerial imagery.
vs. UAVD4L/LoDLoc: These methods rely on 3D models (LoD or mesh), incurring substantially higher storage overhead than DOP+DSM.
vs. GIM+DKM/RoMa: These serve as the feature matcher backbones evaluated on OrthoLoC; AdHoP is shown to further enhance their performance.

Rating

Novelty: ⭐⭐⭐⭐ — First UAV 6-DoF localization benchmark and method leveraging orthographic geodata.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensive evaluation across 47 regions, multiple matchers, and analyses of localization, calibration, and domain variation.
Writing Quality: ⭐⭐⭐⭐ — Systematic presentation with detailed dataset description.
Value: ⭐⭐⭐⭐ — Significant practical value for UAV autonomous navigation and localization in resource-constrained environments.