🛰️ Remote Sensing

🎞️ ECCV2024 · 6 paper notes

📌 Same area in other venues: 📷 CVPR2026 (63) · 🔬 ICLR2026 (11) · 🧪 ICML2026 (3) · 🤖 AAAI2026 (7) · 🧠 NeurIPS2025 (12) · 📹 ICCV2025 (11)

🔥 Top topics: Remote Sensing ×3

Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth: Fine-grained cross-view localization models degrade when deployed in new areas. To address this, the authors propose a weakly-supervised learning method based on knowledge self-distillation. It combines three strategies (mode-based pseudo-GT generation, coarse-level supervision, and outlier filtering) and uses only ground-to-aerial image pairs from the target area, without requiring precise GT. This reduces localization errors by 12% to 20% on VIGOR and KITTI.
ConGeo: Robust Cross-View Geo-Localization Across Ground View Variations: This paper proposes ConGeo, a model-agnostic contrastive learning framework that combines single-view and cross-view contrastive objectives. By enforcing feature consistency across different ground view variations of the same location, it enables a single model to achieve robust cross-view geo-localization under arbitrary orientations and fields of view (FoV).
Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach: This work constructs G2A-VReID, the first ground-to-aerial cross-platform video person re-identification dataset, and proposes the VSLA-CLIP method, which adapts CLIP to video ReID tasks through visual-semantic alignment and a parameter-efficient Video Set-Level-Adapter.
Learning Representations of Satellite Images From Metadata Supervision: This paper proposes SatMIP (Satellite Metadata-Image Pretraining), which represents satellite image metadata (e.g., time, geographic location, and sensor information) as text descriptions and aligns images and metadata in a shared embedding space through an image-metadata contrastive learning task. The resulting satellite image representations encode both visual features and semantic information. The authors further introduce SatMIPS, which combines image self-supervision with metadata supervision and outperforms purely visual self-supervised methods such as SimCLR on multiple remote sensing downstream tasks.
Masked Angle-Aware Autoencoder for Remote Sensing Images: The authors propose MA3E, which explicitly introduces angle variations into MAE pre-training by constructing rotated crops with a scaling center crop operation, and automatically assigns reconstruction targets using an optimal transport loss. This allows the model to perceive the diverse angles of remote sensing objects and learn rotation-invariant representations.
Weakly-Supervised Camera Localization by Ground-to-Satellite Image Registration: The authors propose the first weakly-supervised camera localization method based on ground-to-satellite image registration. It trains an orientation estimator on satellite-to-satellite pairs in a self-supervised manner and a translation estimator via contrastive learning. Without requiring accurate ground-truth (GT) pose labels, it achieves the best cross-area generalization performance and outperforms most fully-supervised SOTA methods.
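Several of the entries above (ConGeo, SatMIP's image-metadata task, and the translation estimator in the last entry) rely on contrastive learning. As a rough illustration of that family of objectives, here is a minimal sketch of a generic CLIP-style symmetric InfoNCE loss, assuming one positive pair per batch row and treating all other rows as negatives. It is not the implementation from any of these papers; the function names, the pure-Python list representation, and the default temperature of 0.07 are hypothetical choices made for readability.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def log_softmax_at(row: Vector, index: int) -> float:
    """Numerically stable log-softmax of `row`, evaluated at position `index`."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return row[index] - log_sum


def symmetric_info_nce(image_emb: List[Vector], meta_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Generic symmetric InfoNCE (hypothetical sketch, not a paper's code).

    Assumption: image_emb[i] and meta_emb[i] form the positive pair; every
    other pairing in the batch is treated as a negative.
    """
    assert len(image_emb) == len(meta_emb) > 0
    imgs = [l2_normalize(v) for v in image_emb]
    metas = [l2_normalize(v) for v in meta_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, metadata_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], metas[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> metadata direction: row i should select column i.
    loss_i2m = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Metadata -> image direction: column j should select row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_m2i = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_i2m + loss_m2i)
```

Calling `symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])` gives a loss close to zero because each image is most similar to its own metadata embedding, while swapping the two metadata rows makes the loss large. Averaging both directions keeps the objective symmetric between the two modalities; a real training loop would compute the same quantity on batched tensors with an autodiff framework.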