🛰️ Remote Sensing
📷 CVPR2026 · 19 paper notes
- ACPV-Net: All-Class Polygonal Vectorization for Seamless Vector Map Generation from Aerial Imagery
-
ACPV-Net is the first framework that generates topologically consistent all-class polygonal vector maps from aerial imagery in a single pass, employing a semantically supervised conditional diffusion model for vertex heatmap generation and proposition-driven PSLG reconstruction to ensure zero gaps and zero overlaps.
- AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network
-
AVION proposes a knowledge distillation framework that generates semantically rich text prototypes via LLMs and employs visual-textual dual-side prompt tuning with tri-aspect alignment distillation, addressing semantic poverty and visual rigidity in remote sensing VLM adaptation and comprehensively surpassing SOTA on few-shot classification, base-to-novel generalization, and cross-modal retrieval.
- AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network
-
AVION proposes a knowledge distillation framework that uses LLM-generated semantically rich remote sensing text prototypes as teacher supervision while injecting learnable prompts into both the visual and text encoders of the student, achieving tri-aspect alignment distillation that significantly outperforms existing PEFT methods on few-shot classification and cross-modal retrieval.
- Conflated Inverse Modeling for Urban Vegetation Patterns
-
A framework conflating a forward prediction model with a diffusion-based inverse generative model to produce diverse yet physically plausible urban vegetation spatial configurations (NDVI patterns) under specified temperature change targets, achieving 3.4× diversity improvement while reducing temperature control error by 37%.
- Cross-modal Fuzzy Alignment Network for Text-Aerial Person Retrieval and A Large-scale Benchmark
-
A cross-modal fuzzy alignment network (CFAN) that leverages fuzzy logic to quantify token-level reliability for fine-grained alignment and introduces ground-view bridging to alleviate the semantic gap between aerial images and text descriptions, along with a large-scale text-aerial person retrieval benchmark AERI-PEDES.
- Exploring Spatiotemporal Feature Propagation for Video-Level Compressive Spectral Reconstruction
-
The first work to advance spectral compressive imaging (SCI) from image-level to video-level reconstruction. It introduces DynaSpec, the first high-quality dynamic hyperspectral dataset (30 sequences / 300 frames), and proposes PG-SVRT, which combines spatial-then-temporal attention with bridge tokens and achieves 41.52 dB PSNR with the best temporal consistency at lower FLOPs (28.18G) than several image-level SOTA methods.
- GeoFlow: Real-Time Fine-Grained Cross-View Geolocalization via Iterative Flow Prediction
-
GeoFlow is a lightweight flow-matching-inspired framework for fine-grained cross-view geolocalization (FG-CVG). It learns probabilistic displacement fields combined with an iterative refinement sampling (IRS) algorithm to achieve precise 2-DoF localization from ground to satellite images in continuous space, reaching SOTA-competitive accuracy at 29 FPS real-time speed.
- GeoFlow: Real-Time Fine-Grained Cross-View Geolocalization via Iterative Flow Prediction
-
GeoFlow reformulates fine-grained cross-view geolocalization (FG-CVG) as probabilistic displacement regression: the model learns displacement fields (probability distributions over distance and direction) from arbitrary hypothesis positions to the true location. An iterative refinement sampling (IRS) algorithm then flows multiple random hypotheses from different starting points toward a consensus position. The method achieves 29 FPS real-time inference with 7.8× fewer parameters and 4× less computation while maintaining competitive localization accuracy.
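The refinement loop described above can be illustrated with a toy example. This is only a sketch under strong assumptions, not the paper's implementation: the learned displacement field is replaced by a hypothetical oracle (`toy_displacement`) that already knows the target, and the consensus is taken as a coordinate-wise median. All function names and parameters are illustrative.

```python
import random
import statistics

def toy_displacement(pos, target, step=0.5):
    """Stand-in for a learned displacement field (assumption: an oracle).

    Returns a move covering a fixed fraction of the remaining offset.
    A real model would predict this from image features instead.
    """
    return (step * (target[0] - pos[0]), step * (target[1] - pos[1]))

def refine(target, n_hypotheses=8, n_iters=10, extent=100.0, seed=0):
    """Move random hypotheses along the field, then take a consensus."""
    rng = random.Random(seed)
    hyps = [(rng.uniform(0, extent), rng.uniform(0, extent))
            for _ in range(n_hypotheses)]
    for _ in range(n_iters):
        moved = []
        for pos in hyps:
            dx, dy = toy_displacement(pos, target)
            moved.append((pos[0] + dx, pos[1] + dy))
        hyps = moved
    # Consensus position: coordinate-wise median of the refined hypotheses.
    return (statistics.median(x for x, _ in hyps),
            statistics.median(y for _, y in hyps))

estimate = refine(target=(40.0, 70.0))
```

With a step of 0.5, each iteration halves the remaining offset, so ten iterations bring every hypothesis to within a small fraction of a unit of the target regardless of where it started.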
- GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing
-
This work proposes GeoMMBench (1,053 expert-level geoscience multiple-choice questions) and GeoMMAgent (a retrieval-perception-reasoning multi-agent framework), systematically evaluating 36 MLLMs in the remote sensing domain and revealing systematic deficiencies in domain knowledge, perceptual grounding, and reasoning capabilities.
- Joint and Streamwise Distributed MIMO Satellite Communications with Multi-Antenna Ground Users
-
This paper studies downlink transmission from multiple LEO satellites jointly serving multi-antenna ground users. Two non-coherent transmission modes are proposed—joint transmission and streamwise transmission—with precoders designed under the WMMSE framework and stream-to-satellite association solved via the Hungarian algorithm, achieving near-optimal spectral efficiency while substantially reducing fronthaul overhead.
- Joint and Streamwise Distributed MIMO Satellite Communications with Multi-Antenna Ground Users
-
Two downlink transmission schemes (joint transmission & streamwise transmission) are proposed for distributed LEO satellite systems serving multi-antenna ground users. Through WMMSE precoding design based on statistical CSI and a stream-satellite association strategy based on the Hungarian algorithm, the proposed framework achieves a flexible trade-off between high spectral efficiency and low fronthaul overhead without requiring inter-satellite phase synchronization.
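Stream-to-satellite association of this kind is an assignment problem. The Hungarian algorithm solves it in polynomial time; the sketch below instead uses exhaustive search over permutations, which finds the same optimum on tiny instances and needs only the standard library. It assumes, purely for illustration, that each satellite carries at most one stream. The utility matrix is made up and the function name is hypothetical.

```python
from itertools import permutations

def best_association(utility):
    """Return (assignment, total) maximizing the summed utility.

    utility[s][k] is a hypothetical score for sending stream s from
    satellite k. Exhaustive search stands in for the Hungarian algorithm.
    """
    n_streams = len(utility)
    n_sats = len(utility[0])
    if n_streams > n_sats:
        raise ValueError("need at least as many satellites as streams")
    best_perm, best_total = None, float("-inf")
    # Each permutation assigns stream s to satellite perm[s], one-to-one.
    for perm in permutations(range(n_sats), n_streams):
        total = sum(utility[s][k] for s, k in enumerate(perm))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Illustrative scores only; rows are streams, columns are satellites.
utility = [
    [4.0, 1.0, 3.0],
    [2.0, 0.5, 5.0],
    [3.0, 2.0, 2.0],
]
assignment, total = best_association(utility)
```

Here `assignment[s]` is the satellite index chosen for stream `s`. Exhaustive search grows factorially with the number of satellites, which is why a polynomial-time solver is the practical choice at realistic sizes.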
- Lumosaic: Hyperspectral Video via Active Illumination and Coded-Exposure Pixels
-
This paper presents Lumosaic, an active hyperspectral video system that synchronizes an array of 12 narrowband LEDs with a coded-exposure pixel (CEP) camera at microsecond precision. Within 158 sub-frames per video frame, the system jointly encodes spatial, temporal, and spectral information, achieving motion-robust hyperspectral video reconstruction at 30 fps, VGA resolution, and 31 spectral channels (400–700 nm), with PSNR exceeding passive snapshot systems by more than 10 dB.
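To put the 10 dB margin in perspective: under the standard definition PSNR = 10·log10(MAX²/MSE), a 10 dB gain corresponds to a tenfold reduction in mean squared error. A minimal sketch, assuming pixel values normalized to [0, 1]; the function name and the sample MSE values are illustrative and not taken from the paper.

```python
import math

def psnr_db(mse: float, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical MSE values chosen only to illustrate the scaling.
baseline = psnr_db(1e-2)     # about 20 dB
improved = psnr_db(1e-3)     # about 30 dB
gain_db = improved - baseline
```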
- MetaSpectra+: A Compact Broadband Metasurface Camera for Snapshot Hyperspectral+ Imaging
-
This paper presents MetaSpectra+, the first multifunctional metasurface imaging system operating across the full visible spectrum (250 nm bandwidth). Through a dual-layer metasurface design enabling beam splitting and precise dispersion control, the system acquires a hyperspectral data cube together with HDR/polarization images in a single snapshot, achieving 33.31 dB PSNR on benchmark datasets with a total track length (TTL) of only 17 mm.
- MetaSpectra+: A Compact Broadband Metasurface Camera for Snapshot Hyperspectral+ Imaging
-
MetaSpectra+ proposes a metasurface–refractive hybrid optical paradigm that employs a dual-layer metasurface to independently control the dispersion, exposure, and polarization of four channels, enabling snapshot hyperspectral+HDR/polarization multi-functional imaging over a 250 nm bandwidth with a minimum total track length (TTL) of 17 mm. On the KAUST benchmark, it achieves a PSNR of 33.31 dB, comprehensively surpassing existing snapshot hyperspectral systems.
- No Labels, No Look-Ahead: Unsupervised Online Video Stabilization with Classical Priors
-
This paper proposes LightStab, an unsupervised online video stabilization framework built upon the classical three-stage pipeline (motion estimation → motion propagation → motion compensation) augmented with multi-threaded asynchronous buffering. LightStab is the first online method to comprehensively match offline SOTA across 5 benchmark datasets, and introduces UAV-Test, the first multimodal UAV aerial stabilization benchmark covering both visible-light and infrared imagery.
- Olbedo: An Albedo and Shading Aerial Dataset for Large-Scale Outdoor Environments
-
Olbedo introduces the first large-scale real-world aerial albedo–shading decomposition dataset (5,664 UAV images, 4 terrain types, multi-year multi-illumination conditions). A physics-based inverse rendering pipeline generates multi-view-consistent pseudo-ground-truth annotations. Results demonstrate that synthetic pre-training combined with Olbedo LoRA fine-tuning substantially improves outdoor albedo prediction and supports downstream applications including relighting, material editing, and scene change analysis.
- Are Pretrained Image Matchers Good Enough for SAR-Optical Satellite Registration?
-
This paper evaluates 24 families of pretrained image matchers on SAR-optical satellite registration under a zero-shot setting, finding that deployment protocol choices (geometric model, tile size, etc.) can affect accuracy by up to 33×, sometimes surpassing the effect of switching the matcher itself.
- RHO: Robust Holistic OSM-Based Metric Cross-View Geo-Localization
-
This paper introduces CV-RHO, the first OSM-based metric cross-view geo-localization benchmark targeting adverse weather and sensor noise (2.72M+ images), and proposes RHO, a dual-branch Pin-Pan architecture integrating panoramic undistortion (SUM) and position-orientation fusion (POF) mechanisms, achieving up to 20% localization improvement under diverse degradation conditions.
- SDF-Net: Structure-Aware Disentangled Feature Learning for Optical-SAR Ship Re-identification
-
This paper proposes SDF-Net, a physics-guided structure-aware disentangled feature learning network that enforces cross-modal geometric consistency via intermediate-layer gradient energy (SCL) and decouples shared/modality-specific features at the terminal layer (DFL) with parameter-free additive fusion, achieving 60.9% mAP (+3.5% vs. SOTA TransOSS) on HOSS-ReID.