
OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata

Conference: NeurIPS 2025
arXiv: 2509.18350
Code: Project Page
Area: Remote Sensing / Visual Localization
Keywords: UAV Localization, 6-DoF, Orthographic Geodata, Feature Matching, Domain Adaptation

TL;DR

OrthoLoC introduces the first large-scale UAV 6-DoF localization benchmark built upon orthographic geodata (DOP+DSM), comprising 16,425 real UAV images across 47 regions in Germany and the United States. It further proposes AdHoP (Adaptive Homography Preprocessing), a plug-and-play matching enhancement that improves matching performance by up to 95% and reduces translation error by up to 63% without modifying the underlying feature matcher.

Background & Motivation

Background: UAV visual localization is critical for applications such as digital twins, search and rescue, and infrastructure inspection. Existing approaches rely on image database retrieval (imprecise) or 3D model matching (memory- and computation-intensive), rendering them infeasible in resource-constrained environments.

Limitations of Prior Work: (1) Orthographic geodata (Digital Orthophoto DOP + Digital Surface Model DSM) is lightweight and increasingly accessible (freely released by EU governments), yet no existing method fully exploits it; (2) aligned cross-domain UAV–geodata benchmarks are lacking; (3) severe domain discrepancy exists between perspective UAV imagery and orthographic reference data.

Key Challenge: Geodata covering the same area requires roughly 1/30 the storage of a 3D model, yet the fundamental geometric difference between perspective and orthographic projection makes direct feature matching highly challenging, particularly at oblique viewing angles.

Goal: (1) Establish the first large-scale UAV–geodata aligned benchmark for 6-DoF localization evaluation; (2) address the residual matching degradation caused by perspective–orthographic domain discrepancy.

Key Insight: A paired dataset is constructed to decouple retrieval from localization evaluation, and the ground-plane approximation is exploited to perform homography preprocessing that narrows the domain gap.

Core Idea: Replace expensive 3D models and image databases with lightweight, publicly available government geodata (DOP+DSM) for UAV 6-DoF localization.

Method

Overall Architecture

The OrthoLoC localization pipeline has two stages.

(1) Initial localization: a feature matcher establishes 2D–2D correspondences between the UAV image and the DOP. The DSM lifts these to 3D–2D correspondences, from which RANSAC-EPnP estimates the pose; Levenberg–Marquardt (LM) optimization then jointly refines the intrinsic and extrinsic parameters.

(2) AdHoP refinement: a homography \(\mathbf{H}\) is estimated from the initial matches and used to warp the DOP closer to the UAV perspective. A second round of feature matching is performed on the warped DOP; the new matches are mapped back to original DOP coordinates via \(\mathbf{H}^{-1}\), lifted to 3D, and used to refine the pose.
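The lifting step, which turns 2D DOP pixels into 3D points via the DSM, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lift_dop_pixels_to_3d`, the parameters `origin_xy` and `resolution_m`, the north-up grid convention, and the nearest-neighbour height lookup are all hypothetical choices made for the example.

```python
import numpy as np

def lift_dop_pixels_to_3d(pixels_uv, dsm, origin_xy, resolution_m):
    """Lift DOP pixel coordinates (u = column, v = row) to 3D world points.

    Assumptions (illustrative only): the DOP and DSM share one grid,
    `origin_xy` is the world coordinate of pixel (0, 0), x grows with the
    column index, y decreases with the row index (north-up raster), and
    heights come from a nearest-neighbour DSM lookup.
    """
    pixels_uv = np.asarray(pixels_uv, dtype=float)
    dsm = np.asarray(dsm, dtype=float)

    # Nearest DSM cell for each matched pixel, clamped to the raster bounds.
    cols = np.clip(np.rint(pixels_uv[:, 0]).astype(int), 0, dsm.shape[1] - 1)
    rows = np.clip(np.rint(pixels_uv[:, 1]).astype(int), 0, dsm.shape[0] - 1)

    # Planar world coordinates from the pixel grid, height from the DSM.
    x = origin_xy[0] + pixels_uv[:, 0] * resolution_m
    y = origin_xy[1] - pixels_uv[:, 1] * resolution_m
    z = dsm[rows, cols]
    return np.stack([x, y, z], axis=1)
```

Each returned 3D point, paired with the UAV-image pixel it was matched to, forms one 3D–2D correspondence of the kind a PnP solver consumes.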

Key Designs

  1. OrthoLoC Dataset Construction:

    • Function: Establish the first large-scale geodata-aligned benchmark for 6-DoF UAV localization evaluation.
    • Mechanism: 16,425 real UAV images spanning 47 regions (19 cities, 2 countries) are collected. Accurate 3D reconstructions are built via SfM+MVS with GCP/RTK geo-referencing. Paired DOP/DSM tiles at 5 cm/px are generated from the reconstructions, with corresponding image regions precisely cropped via ray casting.
    • Design Motivation: The paired structure decouples pose estimation from image retrieval, enabling independent evaluation of localization algorithms.
  2. Domain Augmentation Strategy:

    • Function: Introduce realistic cross-domain variation for robustness evaluation.
    • Mechanism: Three sample categories are defined — same-domain (DOP/DSM synthesized from the reconstruction), cross-domain DOP (orthophotos with visual appearance differences from external sources), and cross-domain DOP+DSM (combined visual and structural differences). Cross-domain data are sourced from European government geoportals and manually verified for alignment.
    • Design Motivation: In real deployments, DOP/DSM may have been captured months or years prior, naturally introducing appearance and structural changes that synthetic augmentation cannot faithfully reproduce.
  3. AdHoP (Adaptive Homography Preprocessing):

    • Function: Reduce the domain gap between perspective UAV images and orthographic DOP to improve feature matching.
    • Mechanism: A homography matrix \(\mathbf{H} \in \mathbb{R}^{3\times3}\) is estimated from initial 2D–2D matches using normalized DLT with RANSAC. The DOP is then warped by \(\mathbf{H}\) to approximate the UAV perspective, a second round of feature matching is performed on the warped DOP, the resulting matches are mapped back to original coordinates via \(\mathbf{H}^{-1}\), lifted to 3D using the DSM, and used to refine the pose. The refinement is accepted only when the reprojection error decreases after the second pass.
    • Design Motivation: In aerial scenes, elements such as roads, rooftops, and fields are approximately planar, so a homography can approximate the geometric change between the orthographic reference and the perspective UAV view. The method is matcher-agnostic and works as a plug-in step around any feature matcher.
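The AdHoP mechanism described above can be sketched in a few NumPy functions. This is a simplified illustration, not the paper's code: RANSAC outlier rejection is omitted (the DLT here assumes all matches are inliers), and the function names `estimate_homography_dlt`, `apply_homography`, `map_back_to_dop`, and `accept_refinement` are hypothetical. The direction convention assumed is that \(\mathbf{H}\) maps DOP pixels to UAV-image pixels.

```python
import numpy as np

def _normalise(points):
    """Hartley normalisation: zero centroid, mean distance sqrt(2)."""
    centroid = points.mean(axis=0)
    mean_dist = np.linalg.norm(points - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return (points - centroid) * s, T

def estimate_homography_dlt(src, dst):
    """Normalised DLT for H with dst ~ H @ src, from >= 4 inlier matches."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_n, T_src = _normalise(src)
    dst_n, T_dst = _normalise(dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # Solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H_n = vt[-1].reshape(3, 3)
    # Undo the normalisation, then fix the scale (assumes H[2, 2] != 0).
    H = np.linalg.inv(T_dst) @ H_n @ T_src
    return H / H[2, 2]

def apply_homography(H, points):
    """Map 2D points through H using homogeneous coordinates."""
    points = np.asarray(points, dtype=float)
    homog = np.hstack([points, np.ones((len(points), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def map_back_to_dop(H, warped_points):
    """Map matches found on the warped DOP back to original DOP pixels."""
    return apply_homography(np.linalg.inv(H), warped_points)

def accept_refinement(err_initial, err_refined):
    """Keep the second-pass pose only if the reprojection error decreased."""
    return err_refined < err_initial
```

In a full pipeline, the matcher would run once to supply `src` (DOP pixels) and `dst` (UAV pixels), the DOP would be warped with the estimated matrix, and the second-pass matches on the warped DOP would go through `map_back_to_dop` before being lifted to 3D.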

Key Experimental Results

Main Results — UAV Localization (Test Set, w/o and w/ AdHoP)

Matcher ME[px]↓ TE[m]↓ RE[°]↓ 1m-1°[%]↑
SP+SuperGlue 2.2/2.2 0.36/0.35 0.15/0.15 63.9/64.4
GIM+DKM Highest
XFeat 257.0/38.1
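
For readers unfamiliar with the pose metrics in the table, the sketch below shows one common way to compute a translation error, a rotation error, and a 1m-1° recall. These are the conventional definitions, assumed here for illustration; the paper's exact formulas (including whether thresholds are inclusive) are not restated in this summary, and the function names are hypothetical.

```python
import numpy as np

def translation_error_m(c_est, c_gt):
    """Euclidean distance between estimated and true camera positions."""
    diff = np.asarray(c_est, dtype=float) - np.asarray(c_gt, dtype=float)
    return float(np.linalg.norm(diff))

def rotation_error_deg(R_est, R_gt):
    """Angle of the relative rotation R_est^T @ R_gt, in degrees."""
    R_rel = np.asarray(R_est, dtype=float).T @ np.asarray(R_gt, dtype=float)
    # trace(R) = 1 + 2 cos(theta); clip guards against rounding drift.
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def recall_at(te_m, re_deg, max_te_m=1.0, max_re_deg=1.0):
    """Percentage of samples within both thresholds (inclusive, by assumption)."""
    te_m = np.asarray(te_m, dtype=float)
    re_deg = np.asarray(re_deg, dtype=float)
    ok = (te_m <= max_te_m) & (re_deg <= max_re_deg)
    return float(100.0 * ok.mean())
```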

AdHoP Improvement Margins

Metric Maximum Improvement
Feature Matching +95%
Translation Error −63%
Rotation Error Substantial improvement

Dataset Comparison

Dataset # Images Country DoF Geodata Paired Cross-Domain
AnyVisLoc 18K CN 3 DOP+DSM
UAVD4L 19K CN 6 DSM
OrthoLoC 16.4K US+DE 6 DOP+DSM

Key Findings

  • Existing SOTA matchers generalize to aerial viewpoints but suffer notable performance degradation under perspective–orthographic domain discrepancy.
  • AdHoP consistently improves all evaluated matchers, with the largest gains observed at oblique viewing angles.
  • High-resolution geodata (5 cm/px vs. 20 cm/px) substantially improves localization accuracy.
  • Camera calibration in aerial settings faces unique geometric ambiguities.
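
The resolution finding above can be read through simple arithmetic (an illustrative calculation, not a result from the paper). With ground sampling distance \(g\), a matching error of \(e\) pixels in the DOP corresponds to roughly \(e \cdot g\) on the ground, and the number of pixels covering a fixed area scales as \(1/g^{2}\):

\[
\frac{g_{20}}{g_{5}} = \frac{20\ \text{cm/px}}{5\ \text{cm/px}} = 4,
\qquad
\left(\frac{g_{20}}{g_{5}}\right)^{2} = 16 .
\]

A one-pixel match error therefore spans about 5 cm on the ground at 5 cm/px versus 20 cm at 20 cm/px, and the finer tile holds 16 times as many pixels for the same area.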

Highlights & Insights

  • Leveraging freely available government geodata as a substitute for expensive 3D models represents a highly pragmatic research direction.
  • The paired data structure decouples retrieval from localization evaluation, providing a principled basis for fair comparison.
  • AdHoP is deliberately minimalist: it requires no training, is dataset-agnostic, makes no domain-specific assumptions, and can wrap any feature matcher as a plug-in step.

Limitations & Future Work

  • The planar assumption underlying AdHoP may break down in terrain with significant elevation variation (e.g., mountainous regions).
  • The current approach supports only single-frame localization, leaving temporal constraints from video sequences unexploited.
  • The dataset is primarily collected from Germany and the United States; generalization to extreme environments such as tropical or desert regions remains unverified.

Comparison with Related Work

  • vs. AnyVisLoc: AnyVisLoc supports only 3-DoF localization and exhibits alignment errors between its reference and aerial data.
  • vs. UAVD4L/LoDLoc: These methods rely on 3D models (LoD or mesh), incurring storage overhead far exceeding that of DOP+DSM.
  • vs. GIM+DKM/RoMa: These serve as the matcher backbones evaluated on OrthoLoC; AdHoP further enhances their performance.

Rating

  • Novelty: ⭐⭐⭐⭐ First UAV 6-DoF localization benchmark and method built upon orthographic geodata.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive evaluation across 47 regions, multiple matchers, and full analysis covering localization, calibration, and domain shift.
  • Writing Quality: ⭐⭐⭐⭐ Systematic presentation with thorough dataset description.
  • Value: ⭐⭐⭐⭐ Significant practical value for UAV autonomous navigation and localization in resource-constrained environments.