🛰️ Remote Sensing
📹 ICCV2025 · 11 paper notes
- AstroLoc: Robust Space to Ground Image Localizer
This paper proposes AstroLoc, the first space-to-ground localization model trained on 300K manually annotated astronaut photographs. Using a query-satellite pairwise loss and an unsupervised mining technique, the model learns robust representations of Earth's surface. It improves Recall@1 by 35% on average, consistently exceeds 99% Recall@100, and has already localized over 500K photographs in real-world deployment.
- CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
This paper introduces CityNav, the first large-scale aerial vision-and-language navigation dataset for real-world urban environments, comprising 32,637 human demonstration trajectories covering 4.65 km². A Geo-Semantic Map (GSM) auxiliary representation is proposed and shown to significantly improve baseline navigation performance.
- GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization
This paper proposes GeoDistill, a framework that enhances locally discriminative feature learning via a Field-of-View (FoV) occlusion-based teacher-student self-distillation paradigm. Under weakly supervised conditions (requiring only coarse GPS annotations), it achieves robust cross-view localization with performance improvements exceeding 10%, and can be applied as a plug-and-play component to different localization frameworks.
- GeoExplorer: Active Geo-Localization with Curiosity-Driven Exploration
This paper proposes GeoExplorer, an active geo-localization (AGL) agent that integrates goal-directed extrinsic rewards with curiosity-driven intrinsic rewards. By jointly modeling action-state dynamics and curiosity-based exploration within a reinforcement learning framework, GeoExplorer achieves more robust UAV search strategies and demonstrates superior generalization to unseen targets and environments.
- Information-Bottleneck Driven Binary Neural Network for Change Detection
This paper proposes BiCD, the first binary neural network (BNN) specifically designed for change detection. By introducing an auxiliary objective module guided by the Information Bottleneck (IB) principle, BiCD enhances the feature representation capability and separability of BNNs. It achieves state-of-the-art performance among BNN-based methods on both street-view and remote sensing change detection benchmarks, while delivering 30× memory compression and 2.5× inference acceleration.
- PAN-Crafter: Learning Modality-Consistent Alignment for Pan-Sharpening
This paper proposes PAN-Crafter, a modality-consistent alignment framework that explicitly addresses cross-modal misregistration between panchromatic (PAN) and multispectral (MS) images via Modality-Adaptive Reconstruction (MARs) and Cross-Modal Misalignment-aware Multi-scale Attention (CM3A). It achieves state-of-the-art performance on multiple remote sensing benchmarks while running 1110× faster than diffusion-based methods.
- RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
This work is the first to introduce the physical heat conduction process into a remote sensing foundation model. RS-vHeat replaces the attention mechanism with a Heat Conduction Operator (HCO) to model local region correlations in remote sensing images. It achieves strong performance across 4 tasks and 10 datasets while, compared to the attention-based baseline, reducing GPU memory by 84%, cutting FLOPs by 24%, and improving throughput by 2.7×.
- SkySense V2: A Unified Foundation Model for Multi-Modal Remote Sensing
This paper proposes SkySense V2, which employs a single unified Transformer backbone to process three remote sensing modalities (high-resolution optical, multispectral, and SAR imagery). For pre-training, it introduces Adaptive Patch Merging (APM), modality-specific prompt tokens, and Query-based Semantic Aggregation Contrastive Learning (QSACL). With only 665M parameters (vs. 1.26B in the predecessor SkySense), SkySense V2 achieves an average improvement of 1.8 points across 7 tasks on 16 datasets.
- SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
This paper proposes SMARTIES, a unified sensor-agnostic foundation model for remote sensing that maps heterogeneous sensor data into a shared space via spectrum-aware projection. Combined with cross-sensor token mixing and masked reconstruction for self-supervised pre-training, SMARTIES surpasses sensor-specific models on both unimodal and multimodal tasks and generalizes to sensors unseen during pre-training.
- Towards a Unified Copernicus Foundation Model for Earth Vision
This work presents a unified Earth observation foundation model system covering all major Copernicus Sentinel missions. It comprises three components: the Copernicus-Pretrain dataset with 18.7 million aligned images, the Copernicus-FM model supporting arbitrary spectral and non-spectral sensors, and the Copernicus-Bench evaluation benchmark spanning 15 hierarchical downstream tasks.
- WildSAT: Learning Satellite Image Representations from Wildlife Observations
This paper proposes WildSAT, which leverages millions of geotagged wildlife observations from citizen science platforms to align satellite images, species locations, and textual descriptions via contrastive learning, substantially improving remote sensing representation quality and enabling zero-shot text-based retrieval.
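
The contrastive alignment mentioned in the WildSAT summary can be illustrated with a generic symmetric InfoNCE loss over paired embeddings. This is a minimal sketch and not the paper's implementation: the function names, the temperature of 0.07, and the toy two-dimensional embeddings are all assumptions made for illustration, and WildSAT's actual encoders and loss details may differ.

```python
import math

def l2_normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row i is column i
    # (the i-th image is paired with the i-th text).
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: pairs are matched by index; temperature is an assumed value.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Similarity matrix: rows are images, columns are texts.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy data: two "satellite image" embeddings and two "text" embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(images, [[0.9, 0.1], [0.1, 0.9]])
shuffled_loss = symmetric_info_nce(images, [[0.1, 0.9], [0.9, 0.1]])
```

In this toy setup the correctly paired embeddings give a much lower loss than the shuffled ones, which is the signal a contrastive objective uses to pull matching items together in the shared space.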