📈 Time Series
🧠 NeurIPS2025 · 54 paper notes
📌 Same area in other venues: 📷 CVPR2026 (7) · 🔬 ICLR2026 (121) · 💬 ACL2026 (8) · 🧪 ICML2026 (45) · 🤖 AAAI2026 (31) · 📹 ICCV2025 (4)
🔥 Top topics: Time-Series Forecasting ×30 · Diffusion Models ×3 · GNNs ×2 · Adversarial Robustness ×2 · Self-Supervised Learning ×2
- A Graph Neural Network Approach for Localized and High-Resolution Temperature Forecasting
This paper proposes a GCN-GRU hybrid framework for community-scale (2.5 km), high-resolution temperature forecasting at lead times of 1–48 hours, validated across three regions in southwestern Ontario, Canada. In the largest region, it achieves an average MAE of 1.93°C and a 48-hour MAE of 2.93°C. The work also explores ClimateBERT language-model embeddings as a standardized input scheme and offers a lightweight, transferable forecasting framework aimed at data-scarce regions in the Global South.
- AERO: A Redirection-Based Optimization Framework Inspired by Judo for Robust Probabilistic Forecasting
AERO proposes an optimization paradigm inspired by the judo principle of "redirecting force rather than resisting it," attempting to redirect adversarial perturbations into beneficial optimization signals. The framework is theoretically grounded in 15 axioms and 4 theorems, constructing an energy-conservation-based gradient redirection system. However, the actual implementation is substantially simplified to momentum SGD with Gaussian noise injection, and validation is conducted solely on a single private solar energy price prediction dataset without any baseline comparisons.
- AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
AttentionPredictor is the first learning-based method that directly predicts attention patterns for KV cache compression and critical token identification. By leveraging a lightweight CNN to capture spatiotemporal patterns in attention scores, it achieves 13× KV cache compression and 5.6× inference speedup, with a unified prediction model of only 21 KB shared across all Transformer layers.
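According to the summary above, predicted attention scores decide which cached tokens are kept. The NumPy code below is a rough sketch of that final selection step only. It keeps a fixed budget of KV-cache positions ranked by a score vector. The `scores` array stands in for the output of AttentionPredictor's CNN, which is not reproduced here. The `budget` parameter and the rule that always keeps the most recent `n_recent` positions are hypothetical choices, not taken from the paper.

```python
import numpy as np


def compress_kv_cache(keys, values, scores, budget, n_recent=4):
    """Keep `budget` cached positions, chosen by (predicted) attention scores.

    keys, values: arrays of shape (seq_len, head_dim)
    scores: shape (seq_len,), predicted importance of each cached position
    budget: total number of positions to retain
    n_recent: most recent positions that are always retained (hypothetical heuristic)
    """
    seq_len = keys.shape[0]
    budget = min(budget, seq_len)
    n_recent = min(n_recent, budget)
    recent = np.arange(seq_len - n_recent, seq_len)
    older = np.arange(seq_len - n_recent)
    # Fill the remaining budget with the highest-scoring older positions.
    order = np.argsort(-scores[older], kind="stable")
    top_older = older[order[: budget - n_recent]]
    kept = np.sort(np.concatenate([top_older, recent]))
    return keys[kept], values[kept], kept


rng = np.random.default_rng(0)
K = rng.normal(size=(10, 8))
V = rng.normal(size=(10, 8))
scores = np.zeros(10)
scores[2], scores[0] = 5.0, 3.0  # stand-in for a predictor's output
k_small, v_small, kept = compress_kv_cache(K, V, scores, budget=4, n_recent=2)
print(kept)  # [0 2 8 9]
```

The compression ratio is simply `seq_len / budget`. A 13× target would correspond to a budget of roughly one thirteenth of the cached positions.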
- BubbleFormer: Forecasting Boiling with Transformers
This paper proposes BubbleFormer, a Transformer architecture based on decomposed spatiotemporal attention, for forecasting boiling dynamics, including the notoriously difficult spontaneous bubble nucleation events. It is accompanied by the BubbleML 2.0 dataset (160+ high-fidelity simulations) and produces accurate spatiotemporal boiling predictions across diverse fluids, geometries, and wall conditions.
- Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models
This paper demonstrates that applying causal masking directly to spatial data (chess board states in FEN format) for training a unimodal LLM outperforms first linearizing the data into sequences (PGN move records) and then applying causal masking — Llama 1.3B trained with FEN + causal masking achieves ~2630 Elo, whereas PGN + causal masking yields only ~2130 Elo.
- CausalDynamics: A Large-Scale Benchmark for Structural Discovery of Dynamical Causal Models
This paper introduces CausalDynamics, the largest benchmark to date for causal discovery in dynamical systems (14,000+ graphs, 50M+ samples). It is organized into three tiers of increasing complexity: 3-dimensional chaotic ODE/SDE systems, hierarchically coupled systems, and realistic climate models. The benchmark evaluates 10 state-of-the-art causal discovery algorithms and reveals the shortcomings of current deep learning methods on high-dimensional nonlinear dynamical systems.
- Channel Matters: Estimating Channel Influence for Multivariate Time Series
This paper proposes Channel-wise Influence (ChInf)—the first influence function method capable of quantifying the effect of individual channels on model performance in multivariate time series (MTS). By decomposing TracIn from the holistic sample level to the channel level, ChInf enables two downstream applications: channel-level anomaly detection and channel pruning, achieving state-of-the-art performance on 5 anomaly detection benchmarks.
- Decomposition of Small Transformer Models
This paper extends Stochastic Parameter Decomposition (SPD) to Transformers by designing a sequence-aware causal importance function and a novel partial reconstruction loss. On a toy induction head task, the method recovers the expected two-step circuit; on GPT-2-small, it localizes rank-1 parameter subspaces corresponding to interpretable concepts such as "golf" and "basketball."
- DemandCast: Global hourly electricity demand forecasting
DemandCast is an open-source machine learning framework that leverages XGBoost to integrate historical electricity demand, ERA5 temperature data, and socioeconomic features for hourly electricity demand forecasting across 56 countries/regions worldwide. By normalizing the target variable as a fraction of annual demand, the framework achieves cross-country comparability and attains a MAPE of 9.2% on a temporally held-out test set.
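Below is a minimal sketch of the target normalization described above. It assumes hourly demand values tagged with their calendar year. Each value is divided by that year's total, so countries that differ in scale but share a daily or seasonal shape end up with comparable targets. The helper names are hypothetical, and the XGBoost model and the ERA5 and socioeconomic features are not reproduced. Converting predictions back to absolute load would also need an estimate of annual demand, which this sketch does not cover.

```python
import numpy as np


def to_annual_fraction(demand, years):
    """Express each hourly demand value as a fraction of its calendar year's total."""
    demand = np.asarray(demand, dtype=float)
    years = np.asarray(years)
    fractions = np.empty_like(demand)
    for year in np.unique(years):
        in_year = years == year
        fractions[in_year] = demand[in_year] / demand[in_year].sum()
    return fractions


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


# Two hypothetical countries whose demand differs in scale but not in shape
# map to identical targets once normalized.
large = to_annual_fraction([2000.0, 6000.0, 2000.0], [2020, 2020, 2020])
small = to_annual_fraction([20.0, 60.0, 20.0], [2020, 2020, 2020])
print(np.allclose(large, small))  # True
```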
- Diffusion Transformers as Open-World Spatiotemporal Foundation Models
This paper proposes UrbanDiT, the first open-world urban spatiotemporal foundation model based on Diffusion Transformers. It integrates heterogeneous data types (grid/graph) and diverse tasks (prediction, interpolation, extrapolation, imputation) through a unified prompt learning framework, achieving state-of-the-art performance across multiple cities and scenarios while demonstrating strong zero-shot generalization.
- Exploring Neural Granger Causality with xLSTMs: Unveiling Temporal Dependencies in Complex Data
This paper proposes GC-xLSTM, which leverages the xLSTM architecture combined with a novel dynamic sparsity optimization strategy to uncover Granger causal relationships in multivariate time series, achieving state-of-the-art performance on multiple datasets.
- Frequency Matters: When Time Series Foundation Models Fail Under Spectral Shift
This paper identifies spectral shift—a mismatch between the dominant frequencies of downstream data and those covered by pretraining data—as the key reason for generalization failure of Time Series Foundation Models (TSFMs) in industrial settings. The hypothesis is validated through an industrial-scale mobile game player engagement prediction task and controlled synthetic experiments.
- Fern: Chaining Spectral Pearls — Ellipsoidal Forecasting Beyond Trajectories for Time Series
This paper proposes Fern (Forecasting with Ellipsoidal RepresentatioN), which replaces conventional trajectory prediction with patch-wise ellipsoidal transport (rotation, scaling, and translation). Fern substantially outperforms baselines on chaotic systems while remaining competitive on standard long-term time series forecasting (LTSF) benchmarks.
- Graph-based Neural Space Weather Forecasting
This paper proposes a graph neural network-based neural emulator for space weather, trained on Vlasiator hybrid-Vlasov simulation data, enabling both deterministic and probabilistic autoregressive forecasting of near-Earth space conditions. The emulator achieves over 100× speedup relative to the original simulator and quantifies forecast uncertainty through latent-variable ensemble generation.
- How Foundational are Foundation Models for Time Series Forecasting?
Through systematic experiments on synthetic and real-world electricity consumption data, this paper reveals that the zero-shot generalization capability of time series foundation models (TSFMs) is highly dependent on the pretraining data distribution. Under domain shift, SAMFormer—a lightweight specialized model with only 49.5K parameters trained from scratch—outperforms fine-tuned TimesFM with 500M+ parameters.
- How Patterns Dictate Learnability in Sequential Data
This paper proposes an information-theoretic framework based on predictive information \(\mathbf{I}(X_{\text{past}}; X_{\text{future}})\) to quantify the strength of temporal patterns in sequential data. It derives theoretical bounds linking predictive information to the minimum achievable risk, thereby enabling a distinction between "insufficient model capacity" and "intrinsically unpredictable data."
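To make predictive information concrete, the sketch below computes a plug-in estimate of \(\mathbf{I}(X_t; X_{t+1})\) for a discrete sequence. Treating the past and the future as single symbols is a simplification made here for illustration. The paper's estimator and its risk bounds are not reproduced.

```python
import math
import random
from collections import Counter


def one_step_predictive_information(seq):
    """Plug-in estimate, in nats, of I(X_t; X_{t+1}) for a discrete sequence.

    A crude one-step proxy for I(X_past; X_future): the "past" is a single
    symbol and the "future" is the next symbol.
    """
    pairs = list(zip(seq[:-1], seq[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(a for a, _ in pairs)
    future = Counter(b for _, b in pairs)
    info = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        info += p_ab * math.log(p_ab / ((past[a] / n) * (future[b] / n)))
    return info


periodic = [0, 1] * 500
rng = random.Random(0)
coin_flips = [rng.randint(0, 1) for _ in range(1000)]
print(round(one_step_predictive_information(periodic), 3))    # 0.693, about log 2
print(round(one_step_predictive_information(coin_flips), 3))  # close to 0
```

A strictly alternating sequence carries about log 2 nats of one-step predictive information. Independent coin flips carry almost none. This matches the intuition of separating patterned data from intrinsically unpredictable data.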
- Human-Machine Ritual: Synergic Performance through Real-Time Motion Recognition
This paper proposes a lightweight real-time motion recognition system that leverages wearable IMU sensors combined with the MiniRocket time-series classifier to achieve dancer-specific motion recognition with under 50 ms latency and 96.05% accuracy. Through "embodied memory mapping," the system encodes each dancer's personal movement-sound associations, establishing a human-machine collaborative performance paradigm that respects the expressive depth of the human body.
- Improving Time Series Forecasting via Instance-aware Post-hoc Revision (PIR)
PIR is an instance-aware post-hoc revision framework that works as a plug-and-play module. It identifies poorly predicted instances via uncertainty estimation. It then revises them with a residual combination of two corrections:

- a local correction, a Transformer over covariates and exogenous variables;
- a global correction, a retrieval-based weighted average over similar training instances.

This reduces the MSE of SparseTSF by 25.87% and that of PatchTST by 8.99%.
- In-Context Learning of Stochastic Differential Equations with Foundation Inference Models
This paper proposes FIM-SDE, a pretrained recognition model that estimates the drift and diffusion functions of low-dimensional SDEs zero-shot (in context) from noisy time series data. With rapid fine-tuning, it surpasses all baseline methods.
- IonCast: A Deep Learning Framework for Forecasting Ionospheric Dynamics
This paper proposes IonCast, a framework comprising a GraphCast-based GNN model and a ConvLSTM baseline that integrates multi-source heterogeneous space weather data (TEC maps, solar wind, geomagnetic indices, orbital mechanics, etc.) for global spatiotemporal forecasting of ionospheric total electron content (TEC). IonCast outperforms persistence baselines and the IRI empirical model under geomagnetic storm conditions.
- Learning Time-Scale Invariant Population-Level Neural Representations
This paper proposes Time-Scale Augmented Pretraining (TSAP), a strategy that introduces data augmentation over multiple temporal window lengths during pretraining, enabling population-level neural signal foundation models to achieve invariance to input time scales and substantially improving decoding performance at both matched and unseen time scales.
- Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting
This paper proposes ST-TTC, a lightweight test-time computing paradigm that corrects periodic biases in spatio-temporal forecasting during inference via a frequency-domain phase-amplitude calibrator and a flash gradient update mechanism, consistently improving the performance of diverse backbone models without modifying their architectures.
- Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning
This paper reveals that pretrained time series foundation models (TSFMs) exhibit inherent task-relevant sparsity, and proposes a Prune-then-Finetune paradigm—removing task-irrelevant parameters via structured pruning so that a pruned-then-finetuned smaller model significantly outperforms direct fine-tuning of the full model, and even surpasses strong specialized baselines.
- MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series
This paper proposes MAESTRO, a framework that addresses modality heterogeneity and arbitrary missingness in multimodal time series via symbolic tokenization, adaptive attention budgeting, sparse cross-modal attention, and dynamic MoE routing, substantially outperforming baselines under both complete and missing modality scenarios.
- MIRA: Medical Time Series Foundation Model for Real-World Health Data
This paper presents MIRA, a foundation model designed specifically for irregular medical time series. MIRA combines three components:

- continuous-time rotary position encoding (CT-RoPE);
- a frequency-specific Mixture-of-Experts (MoE);
- a Neural ODE-based extrapolation module.

It is pretrained on 454 billion observation points. In zero-shot forecasting, it reduces average error by 8% in out-of-distribution (OOD) settings and by 6% in in-distribution (ID) settings.
- Multi-Scale Finetuning for Encoder-based Time Series Foundation Models
This paper proposes MSFT (Multi-Scale FineTuning), which leverages causal analysis to reveal that naive fine-tuning suffers from scale confounding, and designs a multi-scale modeling framework for efficient fine-tuning of encoder-based time series foundation models, significantly outperforming both naive fine-tuning and from-scratch SOTA methods.
- Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction
This paper proposes Neural MJD, which parameterizes a non-stationary Merton Jump Diffusion model via neural networks, casting prediction as an SDE simulation problem. The framework combines a time-varying Itô diffusion (capturing continuous drift) with a time-varying compound Poisson process (modeling abrupt jumps), and employs likelihood truncation together with an Euler-Maruyama with Restart solver to enable scalable learning and inference.
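The generative model described above can be simulated with a plain Euler-Maruyama loop in which the drift, volatility, jump intensity, and jump-size parameters are functions of time. In the sketch below, these are ordinary Python callables standing in for Neural MJD's network outputs, an assumption made for illustration. The process is written additively on X (for example, a log-price), which is our simplification. The likelihood truncation and the restart variant of the solver are not reproduced. Merton-style jumps are Gaussian, so the sum of k jumps in one step is drawn in a single shot.

```python
import numpy as np


def simulate_jump_diffusion(x0, mu, sigma, lam, jump_mean, jump_std,
                            t_end, n_steps, n_paths, seed=0):
    """Euler-Maruyama simulation of a Merton-style jump diffusion.

    dX_t = mu(t) dt + sigma(t) dW_t + dJ_t, where J is a compound Poisson
    process with intensity lam(t) and N(jump_mean(t), jump_std(t)^2) jump sizes.
    All coefficient arguments are plain callables of t (stand-ins for network outputs).
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.full(n_paths, float(x0))
    path = [x.copy()]
    for i in range(n_steps):
        t = i * dt
        # Continuous part: drift plus a Brownian increment with variance dt.
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + mu(t) * dt + sigma(t) * dw
        # Jump part: k ~ Poisson(lam * dt) jumps in this step; the sum of k
        # Gaussian jumps is Gaussian with mean k * jump_mean, variance k * jump_std^2.
        k = rng.poisson(lam(t) * dt, size=n_paths)
        x = x + k * jump_mean(t) + np.sqrt(k) * jump_std(t) * rng.normal(size=n_paths)
        path.append(x.copy())
    return np.stack(path)  # shape (n_steps + 1, n_paths)


# Deterministic sanity check: with no noise and no jumps, X_T = x0 + mu * T.
paths = simulate_jump_diffusion(
    1.0, mu=lambda t: 2.0, sigma=lambda t: 0.0, lam=lambda t: 0.0,
    jump_mean=lambda t: 0.0, jump_std=lambda t: 0.0,
    t_end=1.0, n_steps=100, n_paths=3)
print(paths.shape)  # (101, 3)

# Non-stationary regime: downward jumps switch on halfway through (hypothetical values).
crash_paths = simulate_jump_diffusion(
    0.0, mu=lambda t: 0.1, sigma=lambda t: 0.2,
    lam=lambda t: 3.0 if t > 0.5 else 0.0,
    jump_mean=lambda t: -0.5, jump_std=lambda t: 0.1,
    t_end=1.0, n_steps=200, n_paths=1000)
```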
- Neural Stochastic Flows: Solver-Free Modelling and Inference for SDE Solutions
This paper proposes Neural Stochastic Flows (NSF), which directly learns the transition distribution \(p(x_t \mid x_s)\) of an SDE via conditional normalising flows. The architecture is constrained to satisfy stochastic flow properties (identity, Markov, Chapman-Kolmogorov), enabling single-step sampling without numerical solvers and achieving up to two orders of magnitude speedup at distant time points.
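The stochastic-flow properties that NSF builds into its architecture can be checked on a process whose transition density is known in closed form. The sketch below uses the Ornstein-Uhlenbeck process \(dX_t = -\theta X_t\,dt + \sigma\,dW_t\), which is our choice of example rather than one taken from the paper. It verifies the identity and Chapman-Kolmogorov properties numerically. It then samples a distant time point in one step, the kind of solver-free sampling that a learned \(p(x_t \mid x_s)\) enables.

```python
import numpy as np


def ou_transition(x, dt, theta=1.0, sigma=0.5):
    """Exact transition of dX = -theta * X dt + sigma dW over a gap dt.

    Returns (mean, variance) of the Gaussian p(x_{s+dt} | x_s = x).
    """
    decay = np.exp(-theta * dt)
    var = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * dt))
    return x * decay, var


def chain(x, dt1, dt2, theta=1.0, sigma=0.5):
    """Compose s -> u -> t analytically; Gaussians stay Gaussian under linear maps."""
    m1, v1 = ou_transition(x, dt1, theta, sigma)
    decay2 = np.exp(-theta * dt2)
    _, v2 = ou_transition(0.0, dt2, theta, sigma)  # variance does not depend on x
    return m1 * decay2, decay2**2 * v1 + v2


x0 = 2.0
direct = ou_transition(x0, 3.0)  # jump straight from s to t = s + 3
chained = chain(x0, 1.0, 2.0)    # go via an intermediate time u = s + 1
print(np.allclose(direct, chained))  # True: Chapman-Kolmogorov holds

# Identity property: a zero-length gap returns the start point with no spread.
print(np.allclose(ou_transition(x0, 0.0), (x0, 0.0)))  # True

# Single-step sampling at a distant time, with no numerical solver.
rng = np.random.default_rng(0)
mean, var = ou_transition(x0, 10.0)
samples = mean + np.sqrt(var) * rng.normal(size=5)
```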
- NSW-EPNews: A News-Augmented Benchmark for Electricity Price Forecasting with LLMs
This paper introduces NSW-EPNews, the first electricity price forecasting benchmark augmented with news text, systematically evaluating both traditional models and LLMs on multimodal electricity price prediction. Key findings show that news features provide marginal gains for traditional models, while LLMs suffer from severe hallucination issues.
- OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales
This paper proposes OmniCast, a weather forecasting method that combines a masked generative framework with a latent diffusion model. It generates future weather sequences jointly rather than autoregressively, which mitigates error accumulation. OmniCast achieves state-of-the-art performance at the subseasonal-to-seasonal (S2S) scale, remains competitive for medium-range forecasting, and offers 10–20× faster inference.
- Parallelization of Non-linear State-Space Models: Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
This paper proposes LrcSSM, which achieves exact and efficient parallelization of nonlinear RNNs by constraining the Jacobian matrix of Liquid-Resistance Liquid-Capacitance (LRC) networks to be diagonal, surpassing Transformer, LRU, S5, and Mamba on long-sequence classification benchmarks.
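Diagonal structure matters for parallelization. A recurrence of the form \(h_t = a_t \odot h_{t-1} + b_t\) with elementwise coefficients is an associative composition of affine maps, so a prefix scan can evaluate it in \(O(\log T)\) parallel steps instead of \(T\) sequential ones. The sketch below shows only this generic primitive. How LrcSSM reduces its nonlinear LRC dynamics to such scans is not reproduced, and the random coefficients merely stand in for diagonal Jacobian entries.

```python
import numpy as np


def sequential_scan(a, b, h0):
    """Reference: h_t = a_t * h_{t-1} + b_t, evaluated one step at a time."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.stack(out)


def parallel_scan(a, b, h0):
    """Same recurrence via a Hillis-Steele prefix scan in O(log T) vectorized rounds.

    Position i holds an affine map h -> A[i] * h + B[i]. Combining it with
    position i - d composes the two maps, which is an associative operation.
    """
    A, B = a.copy(), b.copy()
    T = A.shape[0]
    d = 1
    while d < T:
        # Right-hand sides are evaluated before assignment, so old values are used.
        B[d:] = A[d:] * B[:-d] + B[d:]
        A[d:] = A[d:] * A[:-d]
        d *= 2
    return A * h0 + B  # apply the accumulated maps to the initial state


rng = np.random.default_rng(0)
T, N = 37, 4
a = rng.uniform(0.5, 1.0, size=(T, N))  # stand-ins for diagonal Jacobian entries
b = rng.normal(size=(T, N))
h0 = rng.normal(size=N)
print(np.allclose(sequential_scan(a, b, h0), parallel_scan(a, b, h0)))  # True
```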
- Physics-informed Reduced Order Modeling of Time-dependent PDEs via Differentiable Solvers
This paper proposes Φ-ROM, a framework that embeds differentiable PDE solvers into the training loop of nonlinear reduced order models. By leveraging solver feedback to directly constrain latent space dynamics, Φ-ROM significantly outperforms purely data-driven ROMs and other physics-informed methods in generalization to unseen parameters/initial conditions, long-horizon extrapolation, and solution recovery from sparse observations.
- PlanU: Large Language Model Reasoning through Planning under Uncertainty
This paper proposes PlanU—an LLM decision-making method that models node returns via quantile distributions within MCTS and balances exploration and exploitation through an Upper Confidence Bounds with Curiosity (UCC) score. PlanU is the first approach to systematically and simultaneously address both LLM uncertainty and environmental uncertainty, achieving substantial improvements over existing methods across multiple stochastic environment benchmarks.
- Probability Calibration for Precipitation Nowcasting
This paper proposes the Expected Threshold Calibration Error (ETCE) as a more appropriate metric for probability calibration in precipitation nowcasting, and extends post-hoc calibration techniques from computer vision to the forecasting domain. By incorporating a lead-time-conditioned Selective Scaling method, the proposed approach reduces model calibration error by up to 23.5%.
- RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting
This paper proposes RiverMamba, the first deep learning model capable of 7-day river discharge forecasting on a 0.05° (~5.5 km) global grid. Global grid points are serialized via space-filling curves into 3D spatiotemporal point sequences and fed into bidirectional Mamba blocks, driven by ECMWF HRES meteorological forecasts. RiverMamba achieves F1 = 0.459 on flood detection across 1.5–500-year return periods, surpassing an LSTM baseline (0.358) and the physics-based model GloFAS.
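Space-filling-curve serialization can be illustrated with the Z-order (Morton) curve. It interleaves the bits of a cell's row and column indices, so cells that are close in the resulting 1D order tend to be close on the grid. Z-order is used here only because it is short to write. RiverMamba's actual choice of curve(s) and its handling of the time axis are not reproduced, and the helper names are hypothetical.

```python
def morton_code(row, col, bits=16):
    """Interleave the bits of (row, col) into one Z-order index (col bits in even positions)."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def serialize_grid(n_rows, n_cols):
    """List (row, col) grid cells in Z-order so that nearby cells tend to stay nearby."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return sorted(cells, key=lambda rc: morton_code(*rc))


order = serialize_grid(4, 4)
print(order[:8])
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
```

The curve visits each 2×2 block before moving on. A row-major scan would instead jump from the end of one row to the start of the next.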
- Rotary Masked Autoencoders are Versatile Learners
This paper proposes RoMAE, which extends Rotary Position Embedding (RoPE) to continuous positions and integrates it with Masked Autoencoders (MAE). Without any time-series-specific architectural modifications, RoMAE matches or surpasses specialized models across diverse modalities including irregular time series, images, and audio.
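RoPE extends to continuous positions because the attention score between two rotated vectors depends only on the difference of their positions, and nothing requires those positions to be integers. The sketch below applies the standard RoPE rotation, with the conventional base of 10000, at real-valued timestamps, as one would for an irregular time series. RoMAE's exact frequency schedule and its handling of multi-dimensional positions may differ.

```python
import numpy as np


def rope(x, pos, base=10000.0):
    """Rotary embedding at a real-valued position `pos` (no integer grid needed).

    Consecutive feature pairs (x[2i], x[2i+1]) are rotated by angle pos * freq_i.
    """
    d = x.shape[-1]
    if d % 2:
        raise ValueError("feature dimension must be even")
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# Irregular timestamps: the score depends only on the gap between them.
s1 = rope(q, 1.5) @ rope(k, 0.25)
s2 = rope(q, 3.0) @ rope(k, 1.75)
print(np.isclose(s1, s2))  # True
```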
- Scalable Signature Kernel Computations for Long Time Series via Local Neumann Series Expansions
This paper proposes PowerSig, which efficiently computes signature kernels via locally adaptive truncated Neumann series expansions, reducing memory from \(O(\ell^2)\) to \(O(\ell P)\) and enabling signature kernel computation on time series of length exceeding one million on a single GPU.
- Selective Learning for Deep Time Series Forecasting
This paper proposes a Selective Learning strategy that employs a dual-mask mechanism—comprising an uncertainty mask and an anomaly mask—to identify generalizable time steps for MSE loss computation. The approach achieves an average MSE reduction of 37.4% for Informer, 8.4% for TimesNet, and 6.5% for iTransformer across 8 benchmark datasets.
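The dual-mask idea reduces to computing the MSE only over time steps that pass both masks. The sketch below assumes the per-step uncertainty estimates and anomaly scores are already available. It applies simple hypothetical thresholds: an uncertainty quantile and an absolute anomaly-score cutoff. How Selective Learning actually derives its two masks is not reproduced.

```python
import numpy as np


def selective_mse(pred, target, uncertainty, anomaly_score,
                  unc_quantile=0.9, anomaly_threshold=3.0):
    """MSE over time steps kept by both an uncertainty mask and an anomaly mask.

    uncertainty: per-step uncertainty estimate (hypothetical input)
    anomaly_score: per-step anomaly score, e.g. a z-score (hypothetical input)
    """
    unc_mask = uncertainty <= np.quantile(uncertainty, unc_quantile)
    anomaly_mask = np.abs(anomaly_score) < anomaly_threshold
    mask = unc_mask & anomaly_mask
    if not mask.any():
        return 0.0  # nothing deemed generalizable in this window
    return float(np.mean((pred[mask] - target[mask]) ** 2))


pred = np.zeros(10)
target = np.ones(10)
target[3] = 100.0          # an outlier in the labels
anomaly = np.zeros(10)
anomaly[3] = 10.0          # hypothetical score flagging it
uncertainty = np.ones(10)  # equal uncertainty: that mask removes nothing here
print(selective_mse(pred, target, uncertainty, anomaly))  # 1.0
print(float(np.mean((pred - target) ** 2)))              # 1000.9
```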
- SEMPO: Lightweight Foundation Models for Time Series Forecasting
This paper proposes SEMPO — a lightweight time series foundation model with only 6.5M parameters pretrained on 83M time points — that combines energy-aware spectral decomposition with a mixture-of-prompts Transformer to surpass large foundation models with over 100× more parameters in zero-shot and few-shot forecasting.
- Simple and Efficient Heterogeneous Temporal Graph Neural Network
This paper proposes SE-HTGNN, which integrates temporal modeling into spatial learning via a dynamic attention mechanism and initializes attention coefficients using LLM-generated priors, achieving up to 10× speedup over prior methods while maintaining state-of-the-art predictive accuracy on heterogeneous temporal graph tasks.
- Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
This work introduces coupling techniques from high-dimensional nonlinear time series into online learning, providing the first rigorous moment convergence bounds and high-probability concentration inequalities—under \(\ell^s\) and \(\ell^\infty\) norms—for constant learning rate SGD and its Ruppert–Polyak averaged variant (ASGD) in high dimensions.
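The two estimators analyzed in this work can be sketched for least-squares regression: the last iterate of constant-step SGD, and its Ruppert–Polyak average maintained as a running mean of the iterates. The problem size below is deliberately small so the example runs quickly. The data, step size, and seed are arbitrary. The code illustrates the algorithms, not the paper's high-dimensional bounds.

```python
import numpy as np


def sgd_with_averaging(X, y, lr=0.01, seed=0):
    """One pass of constant-step SGD on least squares, plus Ruppert-Polyak averaging."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for k, i in enumerate(rng.permutation(n), start=1):
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x_i^T w - y_i)^2
        w = w - lr * grad
        w_bar += (w - w_bar) / k         # running mean of the iterates
    return w, w_bar


rng = np.random.default_rng(1)
n, d = 20000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)
w_last, w_avg = sgd_with_averaging(X, y, lr=0.01)
print(np.linalg.norm(w_last - w_true), np.linalg.norm(w_avg - w_true))
```

With a constant step size, the last iterate keeps fluctuating around the solution, while the average typically lands much closer to `w_true`.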
- StRap: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization
This paper proposes StRap, a framework that constructs a multi-dimensional pattern memory bank comprising spatial, temporal, and spatio-temporal key-value pairs. At inference time, StRap retrieves historical patterns most similar to the current input and adaptively fuses them into the model representation, effectively addressing the Spatio-Temporal Out-Of-Distribution (STOOD) problem in streaming spatio-temporal data.
- Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models
This paper proposes PD-SSM, a structured sparse parameterization for the state transition matrix of state-space models (SSMs). The core idea is to factorize the transition matrix as a product of a column-wise one-hot matrix P and a complex diagonal matrix D (i.e., \(A = PD\)), achieving expressiveness equivalent to unstructured (dense) SSMs while retaining computational efficiency comparable to diagonal SSMs at \(\Theta(LN)\). A single layer suffices to simulate any \(N\)-state finite-state automaton (FSA). The paper provides theoretical guarantees on BIBO stability and optimal state dimensionality, with strong empirical results on FSA simulation, multivariate time-series classification, long-sequence benchmarks, and natural-language state-tracking tasks.
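\(A = PD\) stays cheap because a column one-hot \(P\) can be stored as one row index per column, so applying \(A\) costs \(O(N)\) per step. The sketch below implements that matrix-vector product and uses it to run a two-state parity automaton. In PD-SSM the factors are generated from the input at every step. Here they are hand-picked per symbol, so the sketch illustrates the FSA-tracking mechanism, not the learned model.

```python
import numpy as np


def pd_matvec(p, d, x):
    """Compute (P D) x in O(N), where D = diag(d) and P is column one-hot.

    p[j] is the row holding the single 1 in column j of P, so column j sends
    d[j] * x[j] to row p[j]. np.add.at accumulates when several columns share a row.
    """
    y = d * x
    out = np.zeros_like(y)
    np.add.at(out, p, y)
    return out


# Parity automaton: state 0 = even number of 1s seen so far, state 1 = odd.
transitions = {0: np.array([0, 1]), 1: np.array([1, 0])}  # symbol -> p
d_unit = np.ones(2, dtype=complex)                        # unit-modulus diagonal
state = np.array([1, 0], dtype=complex)                   # start in state 0
for symbol in [1, 1, 0, 1]:
    state = pd_matvec(transitions[symbol], d_unit, state)
print(int(np.argmax(np.abs(state))))  # 1: three 1s were seen
```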
- Synthetic Series-Symbol Data Generation for Time Series Foundation Models
This paper proposes the Series-Symbol (S²) data generation mechanism and SymTime, a dual-modality foundation model. Grounded in Takens' theorem and symbolic dynamics theory, the framework generates unlimited synthetic time series–symbol paired data (40M pairs / 50B tokens). Through cross-modal contrastive pre-training, SymTime achieves performance competitive with models pre-trained on real data across five time series tasks.
- SynTSBench: Rethinking Temporal Pattern Learning in Deep Learning Models for Time Series
This paper proposes SynTSBench, a synthetic data-driven evaluation paradigm that systematically assesses the actual modeling capabilities of time series forecasting models across dimensions such as trend, periodicity, dependency, and noise robustness, through programmable feature configurations and theoretically optimal benchmarks.
- Time-O1: Time-Series Forecasting Needs Transformed Label Alignment
This paper proposes Time-O1, which addresses the autocorrelation bias and task overload of the TMSE loss in time series forecasting by transforming label sequences into decorrelated, importance-ranked principal components. The method achieves state-of-the-art performance while remaining compatible with a wide range of forecasting models.
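Below is a minimal sketch of computing a loss in a transformed label space. It assumes the transform is PCA fitted on training label windows, which is consistent with the "principal components" description above. Time-O1's exact procedure, the number of components `k` it keeps, and its loss form are assumptions here, and the squared error below is our choice. Projecting onto the principal directions decorrelates the label steps and orders them by explained variance.

```python
import numpy as np


def fit_label_basis(train_labels):
    """PCA of training label windows, shape (n_samples, horizon).

    Returns the label mean and an orthonormal basis whose rows are ordered by
    explained variance, so projected labels are decorrelated.
    """
    mean = train_labels.mean(axis=0)
    _, _, vt = np.linalg.svd(train_labels - mean, full_matrices=False)
    return mean, vt


def transformed_loss(pred, target, mean, basis, k):
    """Squared error between predictions and labels in the top-k components."""
    z_pred = (pred - mean) @ basis[:k].T
    z_true = (target - mean) @ basis[:k].T
    return float(np.mean((z_pred - z_true) ** 2))


rng = np.random.default_rng(0)
horizon = 8
# Correlated synthetic label windows (white noise mixed by a random matrix).
train = rng.normal(size=(500, horizon)) @ rng.normal(size=(horizon, horizon))
mean, basis = fit_label_basis(train)
z = (train - mean) @ basis.T
cov = z.T @ z / len(z)
print(np.allclose(cov - np.diag(np.diag(cov)), 0.0, atol=1e-8))  # True: decorrelated
```

With `k` equal to the horizon, the basis is a rotation, so the loss equals the ordinary MSE. A smaller `k` restricts supervision to the highest-variance components.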
- TimePerceiver: An Encoder-Decoder Framework for Generalized Time-Series Forecasting
TimePerceiver proposes a unified encoder-decoder framework that generalizes the forecasting task (encompassing extrapolation, interpolation, and imputation) and employs a latent bottleneck encoder with a query-based decoder, achieving comprehensive state-of-the-art performance across 8 standard benchmarks.
- TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning
This paper proposes TiRex, a pretrained time series forecasting model based on xLSTM. By introducing a Contiguous Patch Masking (CPM) strategy and data augmentation techniques, TiRex with only 35M parameters comprehensively outperforms larger models such as Chronos Bolt (200M) and TimesFM (500M) on the GiftEval and Chronos-ZS benchmarks, achieving state-of-the-art performance in both short- and long-horizon zero-shot forecasting.
- Towards Self-Supervised Foundation Models for Critical Care Time Series
This paper builds a self-supervised foundation model for critical care time series by pre-training a Biaxial Transformer (BAT) architecture on multiple ICU datasets. It substantially outperforms supervised baselines in low-data regimes.
- Transformer Embeddings for Fast Microlensing Inference
This paper combines a Transformer encoder with Neural Posterior Estimation (NPE) to perform fast, well-calibrated parameter inference directly from sparse, noisy, and irregularly sampled microlensing light curves, achieving speedups exceeding \(10^4\times\) over traditional MCMC methods.
- Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning
This paper proposes the first universal spectral tokenizer that jointly trains on heterogeneous astronomical spectra (SDSS/DESI/GALAH/APOGEE) on their native wavelength grids via continuous wavelength embeddings and self-supervised reconstruction objectives, producing aligned, uniform, and physically meaningful representations.
- WaLRUS: Wavelets for Long-range Representation Using SSMs
This paper proposes WaLRUS, a state space model (SSM) built upon Daubechies wavelets as a novel instantiation of the SaFARi framework, expanding the diversity of the SSM family and demonstrating unique advantages in long-range dependency modeling.
- Wavelet Canonical Coherence for Nonstationary Signals
This paper proposes WaveCanCoh, a framework that extends classical canonical coherence analysis to the wavelet domain. Built upon the multivariate locally stationary wavelet (MvLSW) model, it enables estimation of time-varying, scale-specific canonical coherence between two groups of nonstationary multivariate time series.
- xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories
This paper proposes xLSTM-Mixer, the first architecture to combine the scalar-memory LSTM (sLSTM) of the Extended Long Short-Term Memory (xLSTM) family with a Mixer framework. It uses a three-stage design of temporal mixing, joint temporal-variate mixing, and multi-view mixing. The model achieves state-of-the-art performance on multivariate long-term time series forecasting while maintaining an extremely low memory footprint.