📈 Time Series

📹 ICCV2025 · 4 paper notes

I²-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting

This paper proposes I²-World, which decouples 3D scene tokenization into two complementary processes: intra-scene multi-scale residual quantization and inter-scene temporal quantization. The design retains the high compression ratio of 3D tokenizers while adding the temporal modeling capability of 4D tokenizers, enabling efficient, high-quality 4D occupancy forecasting.

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

This paper proposes V2XPnP, a V2X spatio-temporal fusion framework built on a unified Transformer architecture, which performs end-to-end multi-agent perception and prediction under a one-step communication strategy. The work also introduces the first large-scale real-world sequential dataset supporting all V2X collaboration modes, and the framework achieves state-of-the-art performance on both perception and prediction tasks.

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

This paper proposes a novel incremental weather forecasting paradigm and the VA-MoE framework. Using a variables-adaptive MoE architecture and an index embedding mechanism, VA-MoE achieves forecasting accuracy comparable to full training with only 25% of the trainable parameters and 50% of the initial training data.

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

This paper proposes VLRMBench, a comprehensive and challenging benchmark for vision-language reward models (VLRMs) comprising 12,634 questions across 12 tasks, covering three dimensions: process understanding, outcome judgment, and criticism generation. Extensive experiments on 26 models reveal significant deficiencies in current VLRMs.