
📈 Time Series

🧪 ICML2025 · 21 paper notes


🔥 Top topics: Time-Series Forecasting ×14

A Generalizable Physics-Enhanced State Space Model for Long-Term Dynamics Forecasting in Complex Environments: This paper proposes Phy-SSM, which integrates partially known physical knowledge into deep state space models (SSMs). Through dynamics decomposition (known/unknown matrices) and physical state regularization, it achieves accurate long-term dynamics forecasting and extrapolation for noisy, irregularly sampled data.
Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle: This paper proposes Daily Oracle, a continuous evaluation benchmark that automatically generates predictive QA pairs from daily news. It reveals a smooth decay in LLM predictive performance as pre-training data becomes outdated, with average accuracy dropping by 21.55% on True/False (TF) questions and 11.33% on Multiple Choice (MC) questions; even RAG cannot fully mitigate this decay.
Causal Discovery from Conditionally Stationary Time Series: This paper proposes SDCI (State-Dependent Causal Inference), a causal discovery method for conditionally stationary time series. It models non-stationary behavior with discrete latent state variables to recover state-dependent causal structure, and its effectiveness is validated on physical particle systems, gene regulatory networks, and NBA player motion prediction.
Channel Normalization for Time Series Channel Identification: This work proposes Channel Normalization (CN), which enhances the Channel Identifiability (CID) of time series models by assigning independent affine transformation parameters to each channel. It further extends to an adaptive version ACN (dynamically adjusting parameters) and a prototypical version PCN (supporting unknown/variable channel counts), achieving significant performance improvements across various time series models.
Customizing the Inductive Biases of Softmax Attention using Structured Matrices: This paper proposes replacing the low-rank scoring function in softmax attention with efficient structured matrices (BTT and MLR). This both addresses the low-rank bottleneck of standard attention and introduces a distance-dependent computational bias through MLR, yielding improvements in in-context regression, language modeling, and long-range time series forecasting.
Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading: This study uses Large Language Models (LLMs) to perform multi-label event classification and annotation on financial tweets, transforming unstructured social media text into structured, interpretable, event-driven quantitative factors. It finds that specific event categories (e.g., rumor/speculation) carry significant negative alpha signals (with Sharpe ratios as low as -0.38).
Foundation Models for Clinical Records at Health System Scale: This paper proposes GPT-EHR, a generative pre-training framework based on next-visit event prediction. Trained as a decoder-only Transformer on longitudinal EHR data of 1.29 million patients from NYU Langone, GPT-EHR predicts the onset of dementia and knee osteoarthritis in a zero-shot manner, with performance comparable to fully fine-tuned BERT baselines. The work also uncovers and addresses a critical pitfall in which repeated event tokens artificially inflate evaluation metrics.
HyperIMTS: Hypergraph Neural Network for Irregular Multivariate Time Series Forecasting: HyperIMTS represents the observations and their dependencies in irregular multivariate time series (IMTS) as a hypergraph. Three message passing mechanisms (node-to-hyperedge, hyperedge-to-hyperedge, and hyperedge-to-node) provide irregularity-aware learning of temporal and variable dependencies. It achieves SOTA performance on 5 IMTS datasets with better computational efficiency than padding methods.
IMTS is Worth Time × Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction: VIMTS converts irregular multivariate time series (IMTS) into an image-like time × channel patch structure. It leverages the sparse multi-channel modeling capability of a visual MAE pre-trained on large-scale RGB images, combined with GCN-based cross-channel imputation and a coarse-to-fine prediction strategy, to achieve SOTA performance and strong few-shot capabilities on IMTS prediction tasks.
Learning Soft Sparse Shapes for Efficient Time-Series Classification: SoftShape replaces the traditional hard filtering of shapelets with soft sparsification based on contribution scores. Combining MoE-driven intra-shape and shared-expert-driven inter-shape dual-mode temporal pattern learning, it achieves SOTA classification accuracy on 128 UCR datasets.
Lyapunov Learning at the Onset of Chaos: This work proposes the Lyapunov Learning algorithm. By viewing the neural network as a dynamical system and incorporating a Lyapunov exponent regularization term into the loss function, the network is pushed toward the edge of chaos. This enables rapid self-adaptation when regime shifts occur in non-stationary time series, reducing the post-shift MSE by approximately 96% in Lorenz system experiments.
Risk and Cross Validation in Ridge Regression with Correlated Samples: Using random matrix theory and free probability techniques, this work derives exact asymptotic risk formulas for high-dimensional ridge regression with arbitrarily correlated training samples. It also proposes a corrected generalized cross-validation estimator, CorrGCV, which accurately predicts out-of-sample risk when samples are correlated.
TCP-Diffusion: A Multi-modal Diffusion Model for Global Tropical Cyclone Precipitation Forecasting with Change Awareness: This paper proposes TCP-Diffusion, a conditional diffusion model that integrates historical precipitation, multimodal meteorological variables, and NWP forecasts. By predicting precipitation changes rather than absolute values through an Adjacent Residual Prediction (ARP) mechanism, it outperforms authoritative NWP methods such as ECMWF in global tropical cyclone precipitation forecasting.
TQNet: Temporal Query Network for Efficient Multivariate Time Series Forecasting: This paper proposes the Temporal Query (TQ) technique, which utilizes periodically shifted learnable vectors as queries in the attention mechanism to capture global variable-to-variable correlation patterns, while keys/values are derived from the raw data to preserve sample-level local information. Built upon this, TQNet uses only a single-layer multi-head attention and a shallow MLP to achieve overall state-of-the-art (SOTA) performance across 12 real-world datasets, with computational efficiency approaching the linear model DLinear.
TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning: This paper proposes TimePoint, a self-supervised method inspired by 2D keypoint detection but adapted to 1D signals. It learns sparse representations of time series by detecting keypoints and extracting descriptors, then applies DTW to the sparse keypoints instead of the full signal. This significantly accelerates alignment and frequently improves alignment accuracy.
TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state: This paper proposes the Mamba-based TimePro model. By constructing variable- and time-aware hyper-states, it adaptively selects key time steps to modulate the hidden states of variable dimensions, achieving efficient multivariate long-term time series forecasting with linear complexity.
TransPL: VQ-Code Transition Matrices for Pseudo-Labeling of Time Series Unsupervised Domain Adaptation: This paper proposes TransPL, which discretizes time series patches into VQ codes and constructs class-channel-level transition matrices, leveraging Bayes' theorem to generate interpretable pseudo-labels in the target domain, achieving average improvements of 6.1% in accuracy and 4.9% in F1 score for time series unsupervised domain adaptation.
Understanding the Limits of Deep Tabular Methods with Temporal Shift: This paper identifies the root causes of deep tabular models failing under temporal distribution shifts: model selection failure caused by training lag and validation bias, and the loss of periodic/trend information in model representations. It proposes an improved temporal splitting strategy and a plug-and-play temporal embedding method based on Fourier series.
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters: By reconstructing time series as images, VisionTS leverages ImageNet-pretrained MAE (Masked Autoencoders) for time series forecasting in a zero-shot setting, matching or even outperforming specialized time series foundation models without any training on time series data.
WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting: WAVE introduces the classic statistical ARMA (autoregressive moving average) structure into the attention mechanism of autoregressive Transformers. By employing an indirect MA weight generation method, it decouples short- and long-term temporal patterns without increasing time complexity or parameter count, significantly improving time series forecasting performance.
Winner-takes-all for Multivariate Probabilistic Time Series Forecasting: This paper proposes TimeMCL, which brings the Winner-Takes-All (WTA) loss of Multiple Choice Learning to multivariate probabilistic time series forecasting. A single forward pass of a multi-head network generates diverse, representative future trajectories, balancing prediction quality and computational efficiency.
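
To make the Winner-Takes-All idea in the TimeMCL entry concrete, here is a minimal sketch of a generic WTA loss over several candidate trajectories. It is an illustration under stated assumptions, not the paper's implementation: the function name `wta_loss`, the use of mean squared error as the per-head error, and the toy data are all hypothetical choices made here. The sketch only covers the forward computation; in training, the gradient would typically flow only through the winning head.

```python
def wta_loss(hypotheses, target):
    """Generic Winner-Takes-All loss (illustrative, not TimeMCL's exact loss).

    hypotheses: list of K candidate trajectories, each a list of T steps,
                each step a list of D values (one trajectory per head).
    target:     the observed trajectory, a list of T steps of D values.
    Returns (loss of the best head, index of the best head, all head losses).
    """
    per_head = []
    for trajectory in hypotheses:
        # Mean squared error of this head over all steps and variables
        # (MSE is an assumption; any per-head error could be used).
        squared = [
            (p - y) ** 2
            for step_pred, step_true in zip(trajectory, target)
            for p, y in zip(step_pred, step_true)
        ]
        per_head.append(sum(squared) / len(squared))
    # Only the head closest to the target (the "winner") defines the loss.
    winner = min(range(len(per_head)), key=per_head.__getitem__)
    return per_head[winner], winner, per_head


# Toy example: T = 3 steps, D = 1 variable, K = 3 heads.
target = [[1.0], [2.0], [3.0]]
hypotheses = [
    [[2.0], [3.0], [4.0]],  # head 0: offset by +1.0
    [[0.5], [1.5], [2.5]],  # head 1: offset by -0.5
    [[0.0], [0.0], [0.0]],  # head 2: all zeros
]
loss, winner, per_head = wta_loss(hypotheses, target)
print(winner, loss)  # head 1 wins with loss 0.25
```

Because only the best head is penalized for each sample, different heads are free to specialize on different plausible futures, which is the usual motivation for this family of losses.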