🛰️ Remote Sensing

🔬 ICLR2026 · 6 paper notes

AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild

This paper proposes AutoFly, an end-to-end VLA model for UAV autonomous navigation in the wild. It infers spatial information from RGB inputs via a pseudo-depth encoder and is trained on a newly constructed autonomous navigation dataset (13K+ trajectories, including 1K real flights). AutoFly achieves a 3.9% higher success rate and a 2.6% lower collision rate than OpenVLA across simulated and real environments.

Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents

Earth-Agent is the first Earth observation agent framework built upon an MCP-based tool ecosystem. It unifies RGB and spectral remote sensing data, dynamically invoking 104 expert tools to enable cross-modal, multi-step, and quantitative spatiotemporal reasoning. The accompanying Earth-Bench benchmark comprises 248 expert-curated tasks and 13,729 images. Experiments demonstrate that Earth-Agent substantially outperforms both general-purpose agents and remote sensing MLLMs.

Measuring the Intrinsic Dimension of Earth Representations

This work presents the first systematic measurement of the intrinsic dimension (ID) of Geographic Implicit Neural Representations (Geographic INR), finding that 256–512-dimensional embeddings have true IDs of only 2–10. Higher ID in frozen embedding spaces correlates positively with downstream performance, while lower ID in supervised task-head activation spaces correlates positively with performance, revealing a dual mechanism of "representativeness vs. task alignment."
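
To make the notion of intrinsic dimension concrete, here is a minimal sketch of one common estimator, the maximum-likelihood form of TwoNN, which uses only the ratio of each point's second to first nearest-neighbor distance. Whether the paper uses this particular estimator is an assumption, not something stated in the summary; the function name `twonn_intrinsic_dimension` and the synthetic data are hypothetical.

```python
import numpy as np


def twonn_intrinsic_dimension(x: np.ndarray) -> float:
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    x has shape (n_points, ambient_dim). Assumes no duplicate points,
    since a zero nearest-neighbor distance would make the ratio undefined.
    """
    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(x * x, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)          # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)      # exclude each point's distance to itself

    # Two smallest distances per row: first and second nearest neighbors.
    two_smallest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(two_smallest[:, 0])
    r2 = np.sqrt(two_smallest[:, 1])

    # mu = r2 / r1 follows a Pareto law with shape equal to the intrinsic
    # dimension, so the maximum-likelihood estimate is n / sum(log(mu)).
    mu = r2 / r1
    return float(len(mu) / np.sum(np.log(mu)))


if __name__ == "__main__":
    # Hypothetical data: a 2-D latent sheet linearly embedded in 64 dimensions.
    rng = np.random.default_rng(0)
    latent = rng.uniform(size=(2000, 2))
    embedded = latent @ rng.normal(size=(2, 64))
    print(twonn_intrinsic_dimension(embedded))
```

The estimate tracks the 2 latent degrees of freedom rather than the 64 ambient ones, which is the same kind of gap the paper reports between embedding width and measured ID.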

Spectral Gaps and Spatial Priors: Studying Hyperspectral Downstream Adaptation Using TerraMind

This paper investigates whether TerraMind, a multimodal geospatial foundation model not pretrained on hyperspectral data, can be effectively adapted to hyperspectral downstream tasks via channel adaptation strategies (naive band selection vs. SRF-based grouping). Results demonstrate that naive band selection consistently outperforms the physically-informed SRF approach, with the performance gap widening as the spectral complexity of the task increases.
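
The two channel adaptation strategies can be illustrated with a small sketch. It assumes that naive band selection means picking the hyperspectral band closest in wavelength to each target band, and that SRF-based grouping means a response-weighted average over hyperspectral bands; the Gaussian response shape, the function names, and the wavelengths used are all hypothetical and not taken from the paper.

```python
import numpy as np


def select_nearest_bands(cube, wavelengths, target_centers):
    """Naive selection: keep the single band nearest each target center.

    cube has shape (bands, height, width); wavelengths has shape (bands,).
    """
    idx = [int(np.argmin(np.abs(wavelengths - c))) for c in target_centers]
    return cube[idx]


def srf_group_bands(cube, wavelengths, target_centers, target_fwhm):
    """SRF-style grouping: weighted average of bands per target channel.

    Each target channel's spectral response is ASSUMED Gaussian, defined by
    a center wavelength and a full width at half maximum (FWHM).
    """
    centers = np.asarray(target_centers, dtype=float)
    # Convert FWHM to standard deviation: FWHM = 2 * sqrt(2 * ln 2) * sigma.
    sigma = np.asarray(target_fwhm, dtype=float) / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    z = (wavelengths[None, :] - centers[:, None]) / sigma[:, None]
    weights = np.exp(-0.5 * z**2)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    # Contract the band axis: (targets, bands) x (bands, H, W) -> (targets, H, W).
    return np.tensordot(weights, cube, axes=([1], [0]))


if __name__ == "__main__":
    # Hypothetical sensor: 60 bands from 400 nm to 990 nm in 10 nm steps.
    wavelengths = np.arange(400.0, 1000.0, 10.0)
    cube = np.random.default_rng(0).random((wavelengths.size, 4, 4))
    centers, fwhm = [492.0, 560.0, 664.0], [65.0, 35.0, 30.0]
    print(select_nearest_bands(cube, wavelengths, centers).shape)
    print(srf_group_bands(cube, wavelengths, centers, fwhm).shape)
```

Both functions return one channel per target band, so either output can feed the same input layer. They differ only in whether each channel comes from a single measured band or from a smoothed mixture of neighboring bands.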

TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with Temporal-Aware Multimodal Models

TAMMs is proposed as the first unified framework that jointly performs Temporal Change Description (TCD) and Future Satellite Image Forecasting (FSIF) within a single MLLM-diffusion architecture. A Temporal Adaptation Module (TAM) awakens the temporal reasoning capability of a frozen MLLM, while a Semantic Fusion Control Injection (SFCI) mechanism converts change understanding into generative control signals.

Task-free Adaptive Meta Black-box Optimization

This paper proposes ABOM, a task-free adaptive meta black-box optimizer that eliminates the need for predefined training task distributions. By parameterizing the evolutionary operators (selection, crossover, mutation) as differentiable attention modules and updating their parameters online with data generated during the optimization run itself, ABOM achieves competitive zero-shot performance on synthetic benchmarks and UAV path planning tasks.