
📈 Time Series

🧪 ICML2026 · 10 paper notes

📌 Same area in other venues: 💬 ACL2026 (4) · 📷 CVPR2026 (6) · 🔬 ICLR2026 (36) · 🤖 AAAI2026 (35) · 🧠 NeurIPS2025 (56) · 📹 ICCV2025 (4)

🔥 Top topics: Time-Series Forecasting ×8 · Reasoning ×2

CombinationTS: A Modular Framework for Understanding Time-Series Forecasting Models

CombinationTS decomposes time-series forecasting models into five orthogonal modules: Input Transformation / Embedding / Encoder / Decoder / Output Transformation. It performs paired Monte Carlo sampling over a shared "evaluation condition space" and summarizes each configuration by its marginal performance \(\mu\) and stability \(\sigma\) instead of a fragile single-point MSE. The main conclusion: with well-designed data views (the Embedding module), a parameter-free Identity Encoder can match or even outperform complex Transformers. Much of the "SOTA gain" in time-series forecasting stems from how the data is viewed, not from modeling capacity.
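
The following minimal Python sketch shows how paired evaluation works. It assumes an evaluation condition is a (seed, lookback, horizon) tuple and that each model is a scoring function returning MSE for one condition. The condition grid, the function names (`sample_conditions`, `paired_mc`, `summarize`, `paired_gap`), and the toy scorers are all hypothetical and are not CombinationTS's code. Every model is scored on the same sampled conditions, which is what makes per-condition differences meaningful.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluation condition: (random seed, lookback length, forecast horizon).
Condition = Tuple[int, int, int]
Scorer = Callable[[Condition], float]


def sample_conditions(n: int, rng: random.Random) -> List[Condition]:
    """Draw n conditions from a shared condition space (this grid is an assumption)."""
    lookbacks = [96, 192, 336, 720]
    horizons = [96, 192, 336, 720]
    return [(rng.randrange(10_000), rng.choice(lookbacks), rng.choice(horizons))
            for _ in range(n)]


def paired_mc(models: Dict[str, Scorer], n: int = 200, seed: int = 0) -> Dict[str, List[float]]:
    """Score every model on the SAME sampled conditions, so the results are paired."""
    conditions = sample_conditions(n, random.Random(seed))
    return {name: [score(c) for c in conditions] for name, score in models.items()}


def summarize(scores: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Marginal performance mu (mean MSE) and stability sigma (std of MSE) per model."""
    return {name: (statistics.mean(s), statistics.stdev(s)) for name, s in scores.items()}


def paired_gap(scores: Dict[str, List[float]], a: str, b: str) -> Tuple[float, float]:
    """Mean and std of the per-condition difference a - b; pairing cancels shared noise."""
    diffs = [x - y for x, y in zip(scores[a], scores[b])]
    return statistics.mean(diffs), statistics.stdev(diffs)


# Synthetic stand-in scorers (NOT real models); both share the same condition-level noise.
def toy_a(c: Condition) -> float:
    seed, _lookback, horizon = c
    return 0.30 + horizon / 2000 + random.Random(seed).gauss(0.0, 0.02)


def toy_b(c: Condition) -> float:
    seed, _lookback, horizon = c
    return 0.31 + horizon / 2000 + random.Random(seed).gauss(0.0, 0.02)
```

Because `toy_a` and `toy_b` share the same per-condition noise, `paired_gap(scores, "b", "a")` recovers their 0.01 offset with essentially zero spread. Each model's own \(\sigma\) still reflects variation across horizons and seeds.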

DAG: A Dual Correlation Network for Time Series Forecasting with Exogenous Variables

For time series forecasting with known future exogenous covariates (TSF-X), DAG uses a dual-pathway network. The temporal pathway captures "historical exogenous → future exogenous" attention patterns along the time dimension and injects them into the "historical endogenous → future endogenous" prediction. The channel pathway captures "historical exogenous → historical endogenous" patterns along the channel dimension and injects them into the "future exogenous → future endogenous" prediction. On 12 public and newly constructed TSF-X datasets, DAG achieves the best MSE in 10/10 cases, significantly outperforming TimeXer, TFT, TiDE, CrossLinear, PatchTST, and others.
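
The two pathways can be sketched without learned parameters. This hypothetical Python sketch uses plain lists and no framework. In the temporal pathway, future exogenous steps attend to historical exogenous steps, and the resulting weights are reused on the endogenous history. In the channel pathway, endogenous channels attend to exogenous channels over the history, and the resulting weights are applied to future exogenous values. DAG's actual projections, normalization, and fusion are not described here. Averaging the two pathways is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix, scale: float) -> Matrix:
    """Row-wise softmax of scale * a (max-subtracted for numerical stability)."""
    out = []
    for row in a:
        logits = [scale * v for v in row]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def temporal_path(x_hist: Matrix, x_fut: Matrix, y_hist: Matrix) -> Matrix:
    """Future-step -> past-step attention computed on exogenous series ([H, L]),
    then reused to mix the endogenous history into an [H, C_y] forecast."""
    attn = softmax_rows(matmul(x_fut, transpose(x_hist)), 1.0 / math.sqrt(len(x_hist[0])))
    return matmul(attn, y_hist)


def channel_path(x_hist: Matrix, x_fut: Matrix, y_hist: Matrix) -> Matrix:
    """Endogenous-channel -> exogenous-channel attention computed on history ([C_y, C_x]),
    then applied to future exogenous values to give an [H, C_y] forecast."""
    attn = softmax_rows(matmul(transpose(y_hist), x_hist), 1.0 / math.sqrt(len(x_hist)))
    return matmul(x_fut, transpose(attn))


def dual_path_forecast(x_hist: Matrix, x_fut: Matrix, y_hist: Matrix) -> Matrix:
    """Fuse both pathways by simple averaging (an assumption; no learned weights here)."""
    t = temporal_path(x_hist, x_fut, y_hist)
    c = channel_path(x_hist, x_fut, y_hist)
    return [[(a + b) / 2 for a, b in zip(rt, rc)] for rt, rc in zip(t, c)]
```

Shapes follow the convention `x_hist: [L, C_x]`, `x_fut: [H, C_x]`, `y_hist: [L, C_y]`. Both pathways return `[H, C_y]`.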

Doubly Outlier-Robust Online Infinite Hidden Markov Model

This paper proposes BR-iHMM, which combines a robust observation update (WoLF) with batched state inference (a degenerate sticky HDP prior) to give online infinite HMMs a bounded posterior influence function (PIF) in both the observation and state spaces. On streaming data with outliers from financial order books, electricity load, and synthetic regression, it reduces one-step prediction RMSE by up to 67%.
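
As a rough illustration of the observation-side robustness only, the sketch below performs a scalar Kalman-style update in the spirit of WoLF. It assumes an inverse multi-quadratic (IMQ) weight on the residual, which inflates the observation variance to `obs_var / w**2`. The function names and the IMQ choice with scale `c` are assumptions. The sketch does not model BR-iHMM's batched state inference or its sticky HDP prior.

```python
import math
from typing import Tuple


def imq_weight(residual: float, c: float) -> float:
    """Inverse multi-quadratic weight: ~1 for small residuals, decays like c / |residual|."""
    return 1.0 / math.sqrt(1.0 + (residual / c) ** 2)


def robust_update(mean: float, var: float, y: float, obs_var: float,
                  c: float = 1.0) -> Tuple[float, float]:
    """Scalar Kalman-style update with the observation variance inflated to obs_var / w^2,
    so the posterior shift (new mean - old mean) stays bounded as |y - mean| grows."""
    residual = y - mean
    w = imq_weight(residual, c)
    eff_obs_var = obs_var / w ** 2
    gain = var / (var + eff_obs_var)
    return mean + gain * residual, (1.0 - gain) * var
```

With `var = obs_var = c = 1`, the posterior mean shift for a residual \(e\) works out to \(e/(2+e^2)\). This peaks near 0.354 at \(e=\sqrt{2}\) and then decays toward zero. A plain Kalman update would instead shift by \(e/2\), which is unbounded.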

Ellipsoidal Time Series Forecasting

Fern reformulates long-term time series forecasting as optimal transport from a fixed Gaussian source to a data-dependent ellipsoid. It uses Brenier's theorem to restrict the search space to maps with symmetric positive definite (SPD) Jacobians. A low-rank spectral decomposition built from Householder reflections reduces the computational cost from \(O(n^3)\) to \(O(Rn)\). Under non-stationary shocks, Fern improves stability by up to 790× over baselines such as DLinear and Koopa.
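
An \(O(Rn)\) cost can arise from a parameterization like the following. Write an SPD matrix as \(J = Q\,\mathrm{diag}(e^{\ell})\,Q^\top\), where \(Q\) is a product of \(R\) Householder reflections and the exponentiated log-eigenvalues keep every eigenvalue positive. The Python sketch below applies \(J\) to a vector without ever forming the matrix. The names and this exact parameterization are assumptions for illustration, not Fern's implementation. With \(R \ll n\), \(Q\) covers only a structured subset of orthogonal matrices, which is the trade-off a low-rank scheme makes.

```python
import math
from typing import List

Vector = List[float]


def householder(v: Vector, x: Vector) -> Vector:
    """Apply H = I - 2 v v^T / (v^T v) to x in O(n) without forming H."""
    scale = 2.0 * sum(a * b for a, b in zip(v, x)) / sum(a * a for a in v)
    return [xi - scale * vi for xi, vi in zip(x, v)]


def spd_apply(vs: List[Vector], log_eigs: Vector, x: Vector) -> Vector:
    """Compute J x with J = Q diag(exp(log_eigs)) Q^T and Q = H_1 H_2 ... H_R.

    Cost is O(R n): R reflections on each side plus an O(n) diagonal scaling."""
    y = list(x)
    for v in vs:                  # Q^T x = H_R ... H_1 x, so apply H_1 first
        y = householder(v, y)
    y = [math.exp(l) * yi for l, yi in zip(log_eigs, y)]  # positive eigenvalues -> SPD
    for v in reversed(vs):        # Q y = H_1 ... H_R y, so apply H_R first
        y = householder(v, y)
    return y
```

Since each reflection is symmetric and orthogonal, the resulting \(J\) is symmetric with eigenvalues \(e^{\ell_i}\), so its trace equals \(\sum_i e^{\ell_i}\).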

FRACTAL: State Space Model with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences

This work generalizes the probability measure underlying the HiPPO framework to a fractional power-law measure with a tunable singularity index \(\alpha\). It thereby achieves, for the first time, full-history retention, recent sensitivity, and scale invariance simultaneously. The theory is instantiated as a diagonalized LTI SSM: FRACTAL matches S5 with an 87.11% average accuracy on Long Range Arena and reaches 61.85% on ListOps.
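
The intuition behind scale invariance can be checked numerically. This hypothetical sketch compares a discrete power-law kernel \(k^{-\alpha}\) over lags \(k = 1,\dots,T\), which stands in for a measure with singularity index \(\alpha\), against an exponential kernel. For each kernel, it reports what share of the total weight falls on the most recent 10% of the history. The sketch illustrates the property only. It is not FRACTAL's measure or its SSM discretization.

```python
import math
from typing import Callable

Kernel = Callable[[int], float]


def power_law(alpha: float) -> Kernel:
    """Weight k^(-alpha) for lag k >= 1: largest at the newest lag, heavy-tailed into the past."""
    return lambda k: k ** (-alpha)


def exponential(rate: float) -> Kernel:
    """Weight exp(-rate * k): a fixed memory time scale of roughly 1 / rate lags."""
    return lambda k: math.exp(-rate * k)


def recent_mass(kernel: Kernel, T: int, frac: float = 0.1) -> float:
    """Share of total weight on the most recent frac * T lags (lag 1 = newest step)."""
    weights = [kernel(k) for k in range(1, T + 1)]
    cutoff = int(frac * T)
    return sum(weights[:cutoff]) / sum(weights)
```

With \(\alpha = 0.5\), the share on the most recent 10% stays close to 0.3 for both \(T = 1000\) and \(T = 10000\). The exponential kernel with rate 0.01 behaves differently: its share jumps from about 0.63 to nearly 1, so it effectively forgets the older history as the sequence grows.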

From Observations to States: Latent Time Series Forecasting

The authors observe that even state-of-the-art TSF models with high prediction accuracy often exhibit temporal disorder ("Latent Chaos") in their latent spaces. They propose LatentTSF, which works in three steps. First, an AutoEncoder maps observations into a high-dimensional latent state space. Next, any mainstream backbone predicts future states in this space, trained with a dual Pred + Align loss. Finally, the predicted states are decoded back to the observation space. On six standard benchmarks, this approach consistently reduces MSE/MAE and restores temporal locality and spectral structure in the latent representations.
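
The three-step pipeline can be outlined in a few lines. In this sketch, `encode`, `predict`, and `decode` are placeholders for the two autoencoder halves and an arbitrary backbone, and the encoder is treated as fixed when computing the targets. The summary does not define the two loss terms. Taking Pred as latent-space MSE and Align as observation-space MSE is therefore an assumption.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def mse(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def latent_forecast_step(encode: Callable[[Vec], Vec], predict: Callable[[Vec], Vec],
                         decode: Callable[[Vec], Vec], x_past: Vec, x_future: Vec,
                         lam: float = 1.0) -> Tuple[Vec, float]:
    """Observations -> latent states -> latent forecast -> observations.
    Returns the decoded forecast and an assumed Pred + lam * Align loss."""
    z_past = encode(x_past)          # (fixed) encoder: observations -> latent states
    z_target = encode(x_future)      # latent states of the true future window
    z_hat = predict(z_past)          # any backbone forecasting in latent space
    x_hat = decode(z_hat)            # map predicted states back to observations
    pred_loss = mse(z_hat, z_target)     # assumption: match states in latent space
    align_loss = mse(x_hat, x_future)    # assumption: match decoded forecast to observations
    return x_hat, pred_loss + lam * align_loss
```

Any backbone that maps a latent vector to a latent vector can be passed in as `predict`.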

HELIX: Hybrid Encoding with Learnable Identity and Cross-dimensional Synthesis for Time Series Imputation

HELIX assigns each feature a learnable "identity embedding" that acts as a persistent semantic anchor and combines it with a time-feature double-helix attention mechanism. HELIX ranks first in all 21 missing-data scenarios across 5 public multivariate time series datasets. On datasets such as ETT-h1, it reduces MAE by over 25% relative to the runner-up, ImputeFormer.
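
The following sketch illustrates the identity-embedding idea. Each (time, feature) cell becomes a token built from three parts: its projected value, a per-feature identity vector, and an observed/missing flag. As a result, a missing cell still carries information about which feature it belongs to. The token layout, `init_identity`, `build_tokens`, and the random initialization (which training would update) are assumptions. The double-helix attention itself is not shown.

```python
import random
from typing import List, Optional

Token = List[float]


def init_identity(num_features: int, dim: int, seed: int = 0) -> List[List[float]]:
    """One identity vector per feature; randomly initialized here, learned during training."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_features)]


def build_tokens(window: List[List[Optional[float]]], value_proj: List[float],
                 identity: List[List[float]]) -> List[List[Token]]:
    """window[t][j] is feature j at time t, or None if missing.
    Token(t, j) = value * value_proj + identity[j], followed by an observed flag."""
    tokens = []
    for row in window:
        row_tokens = []
        for j, value in enumerate(row):
            observed = value is not None
            v = value if observed else 0.0   # a missing value contributes nothing...
            emb = [v * w + e for w, e in zip(value_proj, identity[j])]
            row_tokens.append(emb + [1.0 if observed else 0.0])  # ...but identity[j] remains
        tokens.append(row_tokens)
    return tokens
```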

PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering

For time series question answering (TSQA), PATRA explicitly decomposes each series into full, trend, and seasonal patterns at the representation level and deeply cross-aligns them with text through three sets of learnable alignment tokens. Training proceeds in two stages: SFT followed by GRPO reinforcement learning. Rewards for both discriminative and generative tasks are mapped to \([0,2]\) to counter difficulty imbalance. PATRA outperforms text-only LLMs, ChatTS, and other multimodal time series LLMs on all four TSQA tasks.
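
The summary does not say how PATRA extracts the trend and seasonal views. As a stand-in, this Python sketch uses the moving-average decomposition popularized by Autoformer and DLinear. It pads the edges by replicating the first and last values, takes the trend as a centered moving average, and takes the seasonal part as the residual. The function name and default kernel size are assumptions.

```python
from typing import List, Tuple


def decompose(x: List[float], kernel: int = 25) -> Tuple[List[float], List[float], List[float]]:
    """Return (full, trend, season) views of x.

    trend: centered moving average of width `kernel`, with the first/last value
    replicated at the edges so the output keeps len(x); season: x - trend."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    trend = [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]
    season = [v - t for v, t in zip(x, trend)]
    return list(x), trend, season
```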

Time-series Forecasting Through the Lens of Dynamics

The authors propose PRO-DYN, a nomenclature based on Allen's interval algebra that decomposes any time-series forecasting (TSF) model into three stages: pre-processing (PRO) → dynamics (DYN) → post-processing (PRO). They identify two empirical rules. First, DYN must be learnable and complete to outperform LTSF-Linear. Second, DYN must sit at the very end of the pipeline (the PRE-DYN configuration) to fully exploit long lookback windows. The experiments validate both rules: adding a linear DYN layer to Informer, FEDformer, MICN, and FiLM consistently improves performance, while moving DYN to the front of iTransformer, PatchTST, and Crossformer degrades it.
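
The nomenclature maps naturally onto function composition. In this hypothetical sketch, a length-preserving smoother stands in for a PRO stage, and a linear map from a length-\(L\) lookback to a length-\(H\) horizon plays the DYN role. DYN is placed last, as the second rule recommends. The stage names and toy weights are illustrative assumptions, not the paper's code.

```python
from typing import Callable, List, Sequence

Series = List[float]
Stage = Callable[[Series], Series]


def smooth_pro(k: int = 3) -> Stage:
    """Hypothetical PRO stage: length-preserving centered moving average (k odd, edges replicated)."""
    half = k // 2

    def apply(x: Series) -> Series:
        padded = [x[0]] * half + list(x) + [x[-1]] * half
        return [sum(padded[i:i + k]) / k for i in range(len(x))]
    return apply


def linear_dyn(weights: List[List[float]], bias: List[float]) -> Stage:
    """DYN stage: linear map from a length-L lookback to a length-H horizon;
    output step h is a weighted sum of all L lookback values plus a bias."""
    def apply(x: Series) -> Series:
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return apply


def compose(stages: Sequence[Stage]) -> Stage:
    """Run stages left to right, feeding each output into the next stage."""
    def run(x: Series) -> Series:
        for stage in stages:
            x = stage(x)
        return x
    return run


# PRO -> DYN: the lookback-to-horizon map is the final stage (toy weights: forecast = lookback mean).
L, H = 6, 3
model = compose([smooth_pro(3), linear_dyn([[1.0 / L] * L for _ in range(H)], [0.0] * H)])
```

Moving DYN to the front only requires reordering the list passed to `compose`. In that case, every downstream stage would receive a length-\(H\) sequence instead of the full lookback.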

TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models

TSRBench is a time series reasoning benchmark covering 14 domains, 4 major dimensions (perception, reasoning, prediction, and decision-making), 15 tasks, and 4,125 questions. It supports four input modalities: text, visualization, text+image, and embedding. The authors systematically evaluate 30+ mainstream LLMs, VLMs, and TSLLMs and report key findings such as "scaling holds for perception/reasoning but fails for prediction" and "text and visualization modalities are highly complementary, but current models struggle to fuse them."
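
One hypothetical way to quantify "complementary but hard to fuse" is to compare each single-modality accuracy with an oracle. The oracle counts a question as solved if either the text or the visualization input gets it right. The check is then whether the fused text+image run approaches that oracle. The record format and toy booleans below are made up for illustration and are not TSRBench data or metrics.

```python
from typing import Dict, List


def modality_report(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """results[modality][i] says whether question i was answered correctly with that modality.
    Returns per-modality accuracy plus an oracle accuracy for 'text OR visualization'."""
    n = len(results["text"])
    report = {modality: sum(correct) / n for modality, correct in results.items()}
    either = [t or v for t, v in zip(results["text"], results["visualization"])]
    report["oracle_text_or_vis"] = sum(either) / n
    return report
```

The finding corresponds to a large gap between the oracle and the fused run.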