🛰️ Remote Sensing
🔬 ICLR2026 · 6 paper notes
- AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild
This paper proposes AutoFly, an end-to-end VLA model for UAV autonomous navigation in the wild. It infers spatial information from RGB inputs via a pseudo-depth encoder and is trained on a newly constructed autonomous navigation dataset (13K+ trajectories, including 1K real flights). AutoFly achieves a 3.9% higher success rate and a 2.6% lower collision rate than OpenVLA across simulated and real environments.
- Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
Earth-Agent is the first Earth observation agent framework built upon an MCP-based tool ecosystem. It unifies RGB and spectral remote sensing data, dynamically invoking 104 expert tools to enable cross-modal, multi-step, and quantitative spatiotemporal reasoning. The accompanying Earth-Bench benchmark comprises 248 expert-curated tasks and 13,729 images. Experiments demonstrate that Earth-Agent substantially outperforms both general-purpose agents and remote sensing MLLMs.
- Measuring the Intrinsic Dimension of Earth Representations
This work presents the first systematic measurement of the intrinsic dimension (ID) of Geographic Implicit Neural Representations (Geographic INRs), finding that 256–512-dimensional embeddings have true IDs of only 2–10. In frozen embedding spaces, higher ID correlates with better downstream performance, whereas in supervised task-head activation spaces, lower ID correlates with better performance, revealing a dual mechanism of "representativeness vs. task alignment."
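To make the notion of intrinsic dimension concrete, here is a minimal sketch of one common estimator, TwoNN, which uses only the ratio of each point's second- to first-nearest-neighbour distance. The summary above does not say which estimator the paper uses, so the choice of TwoNN, the function names, and the toy data (a 2-D sheet embedded in 64 dimensions) are all assumptions made for illustration.

```python
import math
import random


def twonn_intrinsic_dimension(points):
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    For each point, mu = r2 / r1 is the ratio of the distances to its
    second and first nearest neighbours; the estimate is N / sum(log mu).
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if r1 > 0:  # skip exact duplicates, where the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def embed_plane(n, ambient_dim, seed=0):
    """Hypothetical toy data: a uniform 2-D sheet linearly embedded in ambient_dim dimensions."""
    rng = random.Random(seed)
    basis = [[rng.gauss(0, 1) for _ in range(ambient_dim)] for _ in range(2)]
    pts = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        pts.append([u * a + v * b for a, b in zip(basis[0], basis[1])])
    return pts


points = embed_plane(n=400, ambient_dim=64)
estimate = twonn_intrinsic_dimension(points)
print(f"ambient dim = 64, estimated ID = {estimate:.2f}")
```

On this toy data the estimate should land near 2 rather than 64, which is the same kind of gap between embedding width and intrinsic dimension that the summary reports.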
- Spectral Gaps and Spatial Priors: Studying Hyperspectral Downstream Adaptation Using TerraMind
This paper investigates whether TerraMind, a multimodal geospatial foundation model not pretrained on hyperspectral data, can be effectively adapted to hyperspectral downstream tasks via channel adaptation strategies (naive band selection vs. SRF-based grouping). Results demonstrate that naive band selection consistently outperforms the physically-informed SRF approach, with the performance gap widening as the spectral complexity of the task increases.
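The two channel adaptation strategies can be contrasted with a small sketch. It assumes that "naive band selection" means keeping the hyperspectral band closest to each target channel's center wavelength, and that "SRF-based grouping" means averaging bands weighted by a Gaussian spectral response function; the paper may define both differently. All wavelengths, bandwidths, and function names below are hypothetical.

```python
import math


def nearest_band_selection(pixel, band_centers, target_centers):
    """Naive selection: for each target channel, keep the single closest band."""
    selected = []
    for t in target_centers:
        idx = min(range(len(band_centers)), key=lambda i: abs(band_centers[i] - t))
        selected.append(pixel[idx])
    return selected


def gaussian_srf_weights(band_centers, center, fwhm):
    """Normalized Gaussian response (an assumed SRF shape) over the input bands."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weights = [math.exp(-0.5 * ((b - center) / sigma) ** 2) for b in band_centers]
    total = sum(weights)
    return [w / total for w in weights]


def srf_grouping(pixel, band_centers, target_centers, target_fwhms):
    """SRF grouping: each target channel is an SRF-weighted average of all bands."""
    grouped = []
    for center, fwhm in zip(target_centers, target_fwhms):
        weights = gaussian_srf_weights(band_centers, center, fwhm)
        grouped.append(sum(w * v for w, v in zip(weights, pixel)))
    return grouped


# Hypothetical sensor: 61 bands from 400 nm to 1000 nm in 10 nm steps.
band_centers = [400 + 10 * i for i in range(61)]
# Hypothetical pixel spectrum: reflectance rising linearly with wavelength.
pixel = [(b - 400) / 600 for b in band_centers]
# Hypothetical target channels (center wavelength and FWHM in nm).
target_centers = [490, 560, 660]
target_fwhms = [65, 35, 30]

selected = nearest_band_selection(pixel, band_centers, target_centers)
grouped = srf_grouping(pixel, band_centers, target_centers, target_fwhms)
print("selection:", [round(v, 3) for v in selected])
print("SRF group:", [round(v, 3) for v in grouped])
```

On a smooth spectrum like this one the two strategies give nearly identical channel values; they diverge when the spectrum has narrow features that averaging smooths out.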
- TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with Temporal-Aware Multimodal Models
TAMMs is proposed as the first unified framework that jointly performs Temporal Change Description (TCD) and Future Satellite Image Forecasting (FSIF) within a single MLLM-diffusion architecture. A Temporal Adaptation Module (TAM) awakens the temporal reasoning capability of a frozen MLLM, while a Semantic Fusion Control Injection (SFCI) mechanism converts change understanding into generative control signals.
- Task-free Adaptive Meta Black-box Optimization
This paper proposes ABOM, a task-free adaptive meta black-box optimizer that eliminates the need for predefined training task distributions. By parameterizing evolutionary operators (selection, crossover, mutation) as differentiable attention modules and updating their parameters online from self-generated data during optimization, ABOM achieves competitive zero-shot performance on synthetic benchmarks and UAV path planning tasks.