📈 Time Series

🔬 ICLR2026 · 39 paper notes

Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models: This paper proposes TATO, a framework that automatically optimizes data preprocessing pipelines (including context trimming, scale normalization, and outlier correction) to adapt frozen large time-series models (LTMs) to diverse downstream domains without fine-tuning, achieving an average MSE reduction of 13.6% and up to 65.4%.
Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model: This paper proposes Brain-Semantoks, an fMRI foundation model based on a semantic tokenizer and a self-distillation objective. It aggregates functional network signals into robust semantic tokens and learns abstract brain dynamic representations through cross-temporal view consistency, achieving state-of-the-art performance under a linear probing setting.
Contextual and Seasonal LSTMs for Time Series Anomaly Detection: To address "small-magnitude point anomalies" and "slowly rising anomalies" that existing methods struggle to detect in univariate time series, this paper proposes CS-LSTMs, a dual-branch architecture in which S-LSTM models periodic evolution in the frequency domain and C-LSTM captures local trends in the time domain. Combined with a wavelet-based noise decomposition strategy, the method comprehensively outperforms state-of-the-art approaches on four benchmarks while improving inference speed by 40%.
CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting: This paper proposes the CPiRi framework, which achieves channel permutation-invariant (CPI) cross-channel relational modeling via a frozen pretrained temporal encoder, a lightweight spatial Transformer, and a channel-shuffling training strategy. CPiRi attains state-of-the-art performance on 5 benchmarks with negligible degradation under channel permutation (\(\Delta\)WAPE < 0.25%).
CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting: This paper proposes the CPiRi framework, which achieves channel permutation invariance (CPI) without sacrificing cross-channel modeling capability by combining a frozen pretrained temporal encoder, a trainable permutation-equivariant spatial module, and a channel shuffling training strategy. CPiRi achieves state-of-the-art performance on multiple traffic benchmarks.
Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring: This paper proposes Delta-XAI, a unified framework that adapts 14 existing XAI methods to the scenario of explaining prediction changes in online time series monitoring via a wrapper function. It further introduces SWING (Shifted Window Integrated Gradients), which constructs integration paths using past observations to capture temporal dependencies, consistently outperforming existing methods across multiple evaluation metrics.
Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models: This work is the first to apply Sparse Autoencoders (SAEs) to a time series foundation model (Chronos-T5-Large), revealing a depth-dependent feature hierarchy through 392 causal ablation experiments: mid-layer encoders concentrate causally critical change-point detection features, whereas the semantically richest final encoder layer exhibits the lowest causal importance.
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements: This paper constructs EDINET-Bench, a financial benchmark derived from ten years of Japanese EDINET annual reports, comprising three expert-level tasks—accounting fraud detection, earnings forecasting, and industry classification—and finds that even state-of-the-art LLMs only marginally outperform logistic regression.
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval: This paper proposes the Global Temporal Retriever (GTR), a lightweight plug-and-play module that maintains adaptive global period embeddings and leverages absolute time indices to retrieve temporally aligned global periodic information, enabling arbitrary forecasting models to transcend the look-back window constraint and effectively capture global periodic patterns far exceeding the input length.
FeDaL: Federated Dataset Learning for General Time Series Foundation Models: This paper proposes FeDaL, a federated framework that trains a general time series foundation model from scratch via client-side Domain Bias Elimination (DBE) and server-side Global Bias Elimination (GBE), achieving competitive or superior performance across 8 downstream task types with significantly fewer parameters than centralized TSFMs.
Free Energy Mixer: This paper proposes Free Energy Mixer (FEM), which reframes attention value retrieval as a free energy (log-sum-exp) optimization problem, enabling value-aware posterior selection at the per-channel level. FEM addresses the inherent bottleneck of standard attention—lossless storage but lossy reading—and serves as a plug-and-play replacement for softmax/linear attention/RNN/SSM, yielding consistent improvements across NLP, vision, and time series tasks.
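The contrast between an expectation-style read and a free energy (log-sum-exp) read in the Free Energy Mixer entry above can be illustrated with a small sketch. This is a generic log-sum-exp computation, not the paper's exact formulation: the names `expectation_read` and `free_energy_read`, the inverse temperature `beta`, and the toy numbers are all assumptions made for illustration.

```python
import math

def expectation_read(weights, values):
    """Standard attention read: per-channel weighted average of the values."""
    n_channels = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(n_channels)]

def free_energy_read(weights, values, beta):
    """Per-channel read F_c = (1/beta) * log(sum_i w_i * exp(beta * v_ic)).

    `beta` is a hypothetical inverse temperature (must be > 0).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    n_channels = len(values[0])
    out = []
    for c in range(n_channels):
        column = [v[c] for v in values]
        m = max(column)  # subtract the max so every exponent is <= 0
        s = sum(w * math.exp(beta * (x - m)) for w, x in zip(weights, column))
        out.append(m + math.log(s) / beta)
    return out

# Toy attention distribution over three positions, two value channels.
weights = [0.5, 0.3, 0.2]
values = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
print(expectation_read(weights, values))
print(free_energy_read(weights, values, beta=1e-4))  # close to the average
print(free_energy_read(weights, values, beta=50.0))  # close to the per-channel max
```

For small `beta` the read approaches the ordinary weighted average, and for large `beta` it approaches the largest value in each channel among positions with nonzero weight, so each channel can effectively select a different position.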
From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting: This paper proposes the Probabilistic Scenarios paradigm, in which a model directly outputs a finite set of {scenario, probability} pairs in place of sampling. It introduces TimePrism, a model consisting of only three parallel linear layers, which attains state-of-the-art results in 9 of 10 cases across 5 benchmark datasets.
GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series Data: This paper proposes GTM, a general time-series foundation model that captures temporally granularity-aware features via a frequency-domain attention mechanism. Combined with a hybrid masking pre-training strategy, GTM is the first model to support all generative time-series tasks without any task-specific architectural modifications.
GTM: A General Time-series Model for Enhanced Representation Learning: GTM is a general time-series foundation model that captures temporal granularity-aware features via a Fourier attention mechanism and unifies reconstruction and autoregressive pre-training objectives through hybrid masking, achieving state-of-the-art performance across forecasting, imputation, anomaly detection, and classification tasks.
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming: This paper proposes HiVid, the first framework to leverage LLMs as human proxies for generating content importance weights for video chunks. Through a Perception module (sliding-window scoring), a Ranking module (LLM-guided merge sort to eliminate scoring bias), and a Prediction module (multimodal time series forecasting with adaptive latency), HiVid enables content-aware streaming, achieving an 11.5% improvement in VOD PLCC, a 26% gain in live streaming prediction, and a 14.7% improvement in human MOS correlation.
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative: This paper identifies that time-series-paired texts exhibit periodicity analogous to that of time series (Chronological Textual Resonance), and proposes the TaTS framework, which transforms text representations into auxiliary variables to enhance the forecasting and imputation performance of arbitrary existing time series models in a plug-and-play manner.
Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting: This paper proposes ReIMTS, a plug-and-play framework that preserves the original sampling patterns of irregular multivariate time series (IMTS) via time-period-based recursive partitioning (rather than resampling), combined with an irregularity-aware representation fusion mechanism for multi-scale modeling. ReIMTS achieves an average improvement of 27.1% across six IMTS backbones.
PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection: This paper proposes PaAno, a lightweight patch-level representation learning method for time-series anomaly detection. It employs a 1D-CNN encoder trained with triplet loss and pretext loss to learn a patch embedding space, and computes anomaly scores by measuring the distance between query patches and normal patches stored in a memory bank. PaAno achieves comprehensive state-of-the-art performance on the TSB-AD benchmark while requiring only 0.3M parameters and seconds of inference time.
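The memory-bank scoring step in the PaAno entry above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch embeddings are assumed to be given (the 1D-CNN encoder is not modeled), and the function name `anomaly_score`, the Euclidean metric, and the neighbor count `k` are hypothetical choices.

```python
import math

def anomaly_score(query, memory_bank, k=1):
    """Mean Euclidean distance from a query embedding to its k nearest
    embeddings in a memory bank of normal patches (k is an assumed knob)."""
    if not memory_bank:
        raise ValueError("memory bank must not be empty")
    dists = sorted(math.dist(query, normal) for normal in memory_bank)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# Toy 2-D embeddings standing in for encoder outputs.
bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(anomaly_score((0.1, 0.0), bank))  # small: near a stored normal patch
print(anomaly_score((5.0, 5.0), bank))  # large: far from every normal patch
```

A query patch that lies close to some stored normal patch receives a low score, while a patch far from the entire bank receives a high one.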
Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment: This paper proposes TSRating, a framework that leverages LLMs to perform pairwise quality comparisons of time series (TS) data segments across four dimensions—trend, frequency, amplitude, and pattern. Pairwise judgments are converted to scalar quality scores via the Bradley-Terry model. A TSRater model (MOMENT encoder + MLP) is then trained using MAML meta-learning across 9 domains and 22 subsets, enabling efficient and unified cross-domain TS data quality assessment.
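The conversion from pairwise judgments to scalar scores in the TSRating entry above relies on the Bradley-Terry model. A minimal sketch of fitting that model is shown here, using the standard minorization-maximization update; the function name `bradley_terry`, the iteration count, and the toy comparisons are assumptions, and the paper's own fitting procedure may differ.

```python
def bradley_terry(n_items, comparisons, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) index pairs.

    Returns strengths that sum to 1; P(i beats j) = p[i] / (p[i] + p[j]).
    """
    wins = [0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    p = [1.0 / n_items] * n_items
    for _ in range(iters):
        denom = [0.0] * n_items
        for (i, j), n in pair_counts.items():
            d = n / (p[i] + p[j])
            denom[i] += d
            denom[j] += d
        # Items that were never compared keep their previous strength.
        new_p = [wins[i] / denom[i] if denom[i] > 0 else p[i] for i in range(n_items)]
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p

# Toy judgments over three segments: segment 0 usually wins, segment 2 usually loses.
judgments = (
    [(0, 1)] * 3 + [(1, 0)] +
    [(1, 2)] * 3 + [(2, 1)] +
    [(0, 2)] * 3 + [(2, 0)]
)
scores = bradley_terry(3, judgments)
print(scores)
```

The fitted strengths (or their logarithms) can then serve as scalar quality targets for a downstream rater model.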
Reasoning on Time-Series for Financial Technical Analysis: This paper proposes the Verbal Technical Analysis (VTA) framework, which combines the linguistic reasoning capabilities of LLMs with the pattern-capturing capacity of time-series models. Time-GRPO reinforcement learning is employed to optimize reasoning chains, and inferred attributes are used to condition time-series forecasting, achieving financial time-series prediction that is both accurate and interpretable.
Relational Feature Caching for Accelerating Diffusion Transformers: This paper proposes Relational Feature Caching (RFC), a framework that enhances the accuracy of cached feature prediction by exploiting the strong correlation between input and output features of DiT modules. RFC comprises two components: RFE, which estimates output change magnitude from input variations, and RCS, which uses input prediction error as a proxy to determine when full computation is required. RFC significantly outperforms existing temporal extrapolation-based caching methods on both image and video generation tasks.
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data: This paper proposes the Relational Transformer (RT) architecture, which leverages task table prompting, cell tokenization, and a Relational Attention mechanism to enable zero-shot transfer to unseen datasets and tasks after pretraining on multiple relational databases. The 22M-parameter model achieves 93% of fully supervised AUROC in the zero-shot setting, significantly outperforming a 27B LLM at 84%.
ResCP: Reservoir Conformal Prediction for Time Series Forecasting: This work is the first to integrate Reservoir Computing (Echo State Network) into conformal prediction. By using randomly initialized ESNs to encode the temporal dynamics of residual sequences, the method leverages state similarity to adaptively reweight historical residuals for constructing local prediction intervals—requiring no training—and achieves state-of-the-art Winkler scores on 4 real-world datasets while running 20–80× faster than HopCPT.
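The similarity-based reweighting of historical residuals in the ResCP entry above can be sketched as a weighted quantile. This is a simplified illustration under stated assumptions: the reservoir states are taken as given rather than produced by an Echo State Network, the exponential distance kernel and `temperature` are hypothetical choices, and the finite-sample correction used in conformal prediction is omitted.

```python
import math

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches a fraction q of the total."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]

def local_halfwidth(states, abs_residuals, query_state, q=0.9, temperature=0.5):
    """Interval half-width from residuals reweighted by state similarity.

    Past time steps whose state is close to `query_state` get more weight.
    """
    weights = [math.exp(-math.dist(s, query_state) / temperature) for s in states]
    return weighted_quantile(abs_residuals, weights, q)

# Two regimes: states near 0 had small errors, states near 5 had large errors.
states = [(0.0,), (0.1,), (5.0,), (5.1,)]
abs_residuals = [0.1, 0.2, 2.0, 2.2]
print(local_halfwidth(states, abs_residuals, (0.05,)))  # narrow interval
print(local_halfwidth(states, abs_residuals, (5.05,)))  # wide interval
```

The prediction interval would then be the point forecast plus or minus the returned half-width, so the interval widens when the current dynamics resemble past periods with large errors.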
Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition: This paper proposes xCPD, a plug-and-play module that refines the modeling unit of multivariate time series from "channels" to "channel-patches." It constructs spectral embeddings via a shared graph Fourier basis, groups nodes into low/mid/high frequency bands based on spectral energy responses, and applies dynamic MoE routing to adaptively select frequency-specific filter experts. xCPD can be integrated into any existing channel-independent (CI) or channel-dependent (CD) model to consistently improve both long- and short-term forecasting performance, and it supports zero-shot transfer.
SciTS: Scientific Time Series Understanding and Generation with LLMs: This paper proposes SciTS—a scientific time series benchmark spanning 12 scientific domains, 43 tasks, and 54K+ samples—and introduces the TimeOmni framework, which unifies understanding and generation tasks via multi-patch expert routing and an LLM backbone, achieving the best overall performance across the full benchmark.
SciTS: Scientific Time Series Understanding and Generation with LLMs: This work proposes the SciTS benchmark, covering 43 tasks across 12 scientific domains with 54K+ instances (lengths from \(10^0\) to \(10^7\), frequencies up to 10 MHz). It systematically evaluates 17 models and finds that general-purpose LLMs generalize better than specialized time-series models, while text and image encodings each have distinct limitations. Accordingly, it designs the TimeOmni framework, which employs multi-patch experts with a routing mechanism and patch reprogramming to explicitly model temporal dynamics in joint training with an LLM backbone.
SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning: SwiftTS is proposed as the first model selection framework for time series pre-trained models. It employs a dual-encoder architecture to independently embed patch-level temporal features of datasets and model meta-information (architecture / topology / function), computes compatibility scores via patch-level cross-attention, and incorporates horizon-adaptive mixture-of-experts together with cross-domain/cross-horizon meta-learning. On 14 datasets × 8 models, it achieves an average weighted Kendall \(\tau_\omega = 0.442\), substantially outperforming all baselines.
T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation: This paper proposes T1, a CNN-Transformer hybrid architecture whose core innovation is Channel-Head Binding (CHead Attention): a shared depthwise convolution extracts \(C\) types of temporal features (trend, periodicity, abrupt changes, etc.) for each variable, and each CNN channel is then bound one-to-one to a single attention head, so that cross-variable information transfer proceeds independently at the feature level. When missing data prevents a channel from extracting a valid pattern, the corresponding attention head is automatically down-weighted, achieving adaptive missing-data handling without explicit design. On 11 benchmark datasets, the average MSE is reduced by 46%, with even larger gains under extreme (70%) missingness.
Tensor learning with orthogonal, Lorentz, and symplectic symmetries: This paper provides a complete parameterization of equivariant polynomial functions under the diagonal action of the orthogonal group \(O(d)\), the indefinite orthogonal group (including the Lorentz group), and the symplectic group \(Sp(d)\) on tensors. The framework is applied to design learnable sparse vector recovery algorithms that outperform existing sum-of-squares spectral methods across multiple data-generating assumptions.
Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting: This paper proposes Chroma — a portfolio framework of small pretrained time series models: frequency/domain expert models are derived from a general model via post-training (achieving 10× training speedup), and at test time predictions are combined through model selection or greedy ensemble. A 4M-parameter portfolio matches the performance of 205M–500M parameter monolithic models on Chronos Benchmark II, while requiring far less inference computation than test-time fine-tuning.
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models: This paper proposes TimeOmni-1, the first unified time series reasoning model. It leverages TSR-Suite (the first reasoning-oriented time series dataset suite) and a two-stage training paradigm (SFT for injecting temporal priors, followed by RL for refining reasoning), achieving significant improvements over GPT-4.1 across multiple time series reasoning tasks.
TimeSliver: Symbolic-Linear Decomposition for Explainable Time Series Classification: TimeSliver is an explainability-driven deep learning framework that jointly leverages raw time series data and symbolic abstractions (binning) to construct representations that preserve the original temporal structure. Each element linearly encodes the contribution of its corresponding temporal segment to the final prediction, yielding per-timestep positive/negative attribution scores. TimeSliver surpasses competing methods by 11% in temporal attribution accuracy across 7 datasets while achieving performance on par with SOTA on 26 UEA benchmarks.
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning: This paper proposes iMOOE, a framework that explicitly formalizes two-level physical invariance principles — operator invariance and compositional invariance — within PDE systems, and instantiates them via a mixture-of-operator-experts network and a frequency-enhanced risk equalization objective, achieving state-of-the-art zero-shot PDE dynamics forecasting across diverse OOD scenarios without any test-time adaptation.
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework: This paper proposes ChannelTokenFormer (CTF), a unified Transformer framework that simultaneously addresses three core challenges in real-world multivariate time series forecasting: (1) complex inter-channel dependencies — via channel token cross-channel attention; (2) asynchronous sampling across channels — via frequency-domain dynamic patching that preserves original resolution; (3) block-wise missingness at test time — via patch masking during training and direct removal of fully-missing patches at inference. CTF achieves comprehensive state-of-the-art results across six datasets including ETT, SolarWind, Weather, EPA, and CHS.
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time Series: This paper proposes TSPulse, an ultra-lightweight time series pre-trained model with only 1M parameters, which surpasses models 10–100× larger on four tasks — classification (+5–16%), anomaly detection (+20%), imputation (+50%), and similarity retrieval (+25%) — through dual-space masked reconstruction and dual-embedding disentanglement.
TSRating: Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment: TSRating leverages the prior knowledge of LLMs to conduct pairwise quality judgments of time series data chunks across four dimensions—trend, frequency, amplitude, and pattern—converts these comparisons into scalar scores via the Bradley-Terry model, and trains a cross-domain generalizable TSRater via meta-learning, enabling efficient and accurate time series data quality assessment.
Tuning the Burn-in Phase in RNN Training Improves Performance: This paper provides a theoretical analysis of the critical role played by the burn-in length \(m\) in Truncated Backpropagation Through Time (TBPTT) training of RNNs. It establishes upper bounds on training regret and validates through system identification and time series forecasting experiments that appropriately tuning the burn-in phase can reduce prediction error by more than 60%.
VoT: Event-Driven Reasoning and Multi-Level Alignment Unlock the Value of Text for Time Series Forecasting: This paper proposes VoT, a multimodal time series forecasting method that fully exploits the value of textual information through event-driven reasoning (leveraging LLMs to perform structured reasoning over exogenous text for numerical prediction) and multi-level alignment (representation-level endogenous text alignment + prediction-level adaptive frequency fusion). VoT comprehensively outperforms existing methods on real-world datasets spanning 10 domains.
WARP: Weight-Space Linear Recurrent Neural Networks: This paper proposes WARP (Weight-space Adaptive Recurrent Prediction), which explicitly parameterizes the hidden state of a linear RNN as the weights and biases of an auxiliary MLP. Input differences drive a linear recurrence to update these weights, and a nonlinear decoding step enables efficient sequence modeling. WARP achieves state-of-the-art performance on classification, forecasting, and dynamical system reconstruction tasks.