📈 Time Series

🔬 ICLR2026 · 121 paper notes

🔥 Top topics: Time-Series Forecasting ×77 · Diffusion Models ×8 · Anomaly Detection ×5 · LLM ×5 · Medical Imaging ×4

A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting

STBP employs a general spatio-temporal backbone based on "frequency domain + linear graph attention" to extract stable and transferable representations, supplemented by an incrementally scalable "contextual pattern bank" acting as prompts. By freezing the backbone and expanding only the pattern bank, the model achieves anti-forgetting, robust modeling, and scalability on urban streaming data with growing nodes and shifting distributions.

A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems

This paper represents the Koopman / transfer operators of dynamical systems as discrete distributions consisting of "eigenvalues + spectral projection subspaces." It defines the Spectral-Grassmann Optimal Transport (SGOT) distance on spectral spaces and Grassmann geometry, enabling dynamical systems under different sampling frequencies to be compared, classified, and interpolated via Fréchet barycenters.

A Study of Posterior Stability in Time-Series Latent Diffusion

This paper systematically analyzes the posterior collapse issue in latent diffusion for time series—proving that collapse causes the model to degenerate into a weakened version of a VAE—and proposes the "Posterior-Stable Latent Diffusion" framework. It reinterprets the diffusion process as variational inference to eliminate the dangerous KL regularization and utilizes the diffusion process to simulate collapse to penalize decoder insensitivity toward latent variables.

A Unified Federated Framework for Trajectory Data Preparation via LLMs

FedTDP unifies "Trajectory Data Preparation" (ten categories of tasks including denoising, completion, and map matching) into a cross-regional federated learning problem without sharing raw data. It utilizes a lightweight privacy autoencoder for data protection, a trajectory knowledge enhancer to transform general LLMs into "trajectory cleaning brains" with spatio-temporal awareness, and parallel optimization to reduce communication costs. It outperforms 13 SOTA methods across 10 tasks on 6 datasets.

Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models

The TATO framework is proposed to adapt frozen Large Time-series Models (LTMs) to diverse downstream domains without fine-tuning by automatically optimizing data preprocessing pipelines (including context trimming, scale normalization, and outlier correction), achieving an average MSE reduction of 13.6% and up to 65.4%.

Are Global Dependencies Necessary? Scalable Time Series Forecasting via Local Cross-Variate Modeling

Addressing the bottleneck in multivariate time series forecasting where global attention for modeling cross-variate dependencies leads to quadratic complexity growth relative to the number of variables, this paper proposes the "Local Sufficiency Hypothesis"—suggesting that in dense systems, a finite local neighborhood likely contains sufficient predictive signals. Based on this, VPNet is designed: it rearranges patch embeddings into a 2D "Variate \(\times\) Patch" field and uses depthwise separable 2D convolutions for local mixing. This ensures complexity grows linearly with the number of variables, achieving SOTA accuracy and significant efficiency advantages across 8 benchmarks.

ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting

ASTGI directly encodes each discrete observation in irregular multivariate time series as a "point" in a learnable spatio-temporal space, preserving the original sampling structure without interpolation or alignment. It dynamically constructs a causal graph for each point using nearest neighbor search and performs relation-aware message passing based on relative spatio-temporal positions. Finally, it unifies forecasting as "aggregating neighborhood information for a query point to perform regression," reducing MSE by approximately 6% compared to the second-best method across four public datasets.

Aurora: Towards Universal Generative Multimodal Time Series Forecasting

Aurora is the first multimodal time series foundation model: it is pre-trained on a cross-domain corpus of "time series + textual description + endogenous images." It utilizes modality-guided attention to inject domain knowledge from text/images into time series modeling and employs "prototype-guided flow matching" for generative probabilistic forecasting. This allows it to achieve SOTA performance in both deterministic and probabilistic forecasting under zero-shot and few-shot cross-domain scenarios.

AutoDA-Timeseries: Automated Data Augmentation for Time Series

AutoDA-Timeseries is the first general automated data augmentation (AutoDA) framework for time series. It feeds the statistical features of each time series into a learnable policy generator. Stacked augmentation layers differentiably select transformation types and adaptively adjust their probabilities and intensities using Gumbel-Softmax. Optimized jointly with the downstream model in a single stage, it consistently outperforms existing strong baselines across five major tasks: classification, long/short-term forecasting, regression, and anomaly detection.
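The Gumbel-Softmax selection step mentioned above can be illustrated with a minimal standard-library sketch. The function name, the three example logits, and the temperature are assumptions for illustration; the summary does not specify how AutoDA-Timeseries parameterizes its policy generator, and true differentiability requires an autodiff framework rather than plain Python.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample over candidate transformation types."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform; clamp u away from 0 and 1
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three augmentations (e.g. jitter, scaling, warping)
weights = gumbel_softmax([0.0, 1.0, -1.0], temperature=0.5, rng=random.Random(0))
```

Lower temperatures push the weights toward a hard one-hot choice, while higher temperatures spread them more evenly across the candidates.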

Battery Fault: A Comprehensive Dataset and Benchmark for Battery Fault Diagnosis

This paper constructs CH-BatteryGen, the first battery system fault diagnosis dataset for electric vehicles (EVs) under real-world operating conditions. By combining "real vehicle data + mechanism-constrained generation models," it balances authenticity and scale, covering 1000 vehicles, two mainstream chemical systems, four fault labels, and three severity levels, accompanied by two benchmark tasks: fault classification and fault grading.

Benchmarking ECG FMs: A Reality Check Across Clinical Tasks

A comprehensive "reality check" benchmark of eight ECG foundation models across 12 datasets and 26 clinical tasks finds that ECG-CPC, a compact Structured State Space Model (SSM), outperforms large-scale Transformers in five of seven task categories, suggesting that architectural design matters more than model scale.

Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?

The authors evaluate 5 Time Series Foundation Models (TSFMs) and 2 traditional baselines using a metric system specifically designed to measure "calibration rather than sharpness." They find that TSFMs not only provide more accurate point predictions but also consistently outperform baselines in probabilistic calibration, without exhibiting the systematic overconfidence typical of vision or language foundation models.

Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model

Brain-Semantoks is proposed as an fMRI foundation model based on a semantic tokenizer and a self-distillation objective. It aggregates brain functional networks into robust semantic tokens and learns abstract brain dynamic representations through consistency across temporal views, achieving SOTA performance under linear probing settings.

Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting

To address the distribution mismatch in time series forecasting caused by "forcing historical statistical patterns onto future distributions," this paper proposes TimeAlign—a plug-and-play dual-branch framework. It utilizes a "future reconstruction" branch (present only during training) to provide a target distribution for alignment. By employing global and local alignment, the prediction branch's representation is pulled toward the true future distribution, reducing MSE/MAE by 3.27%/5.20% relative to the state-of-the-art across 8 benchmarks.

Can we generate portable representations for clinical time series data using LLMs?

This paper proposes Record2Vec: using a frozen LLM to convert irregular ICU time-series records into concise clinical handoff-style natural language summaries, then encoding these summaries into fixed-length vectors using a frozen text embedding model for a standard predictor. Across three hospital cohorts and five task categories, it is not only competitive in-distribution but, more importantly, experiences less performance decay during cross-hospital transfer, requires less data for few-shot scenarios, and does not increase demographic privacy leakage.

CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

CauKer combines Gaussian Process (GP) kernel compositions with Structural Causal Models (SCM) to generate purely synthetic time series that possess both realistic structures and inherent cluster properties. Using only this data for pretraining classification Time Series Foundation Models (TSFMs), the method nearly matches the performance of original models trained on real-world datasets orders of magnitude larger across 128 UCR datasets, while demonstrating clean data/model scaling laws for the first time.

Characteristic Root Analysis and Regularization for Linear Time Series Forecasting

This paper revisits linear time series forecasting models through the characteristic root theory of classical linear difference equations. It proves that noise leads models to learn "spurious roots" and that suppressing such noise requires disproportionately more data. It therefore proposes two types of "root reconstruction" regularization for the weight matrix: rank reduction (Reduced-Rank Regression, RRR, and Direct Weight Rank Reduction, DWRR) and an adaptive Root Purge training loss. Together these push simple linear models to SOTA across multiple standard benchmarks.
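As a reminder of what characteristic roots are, the sketch below computes them for a second-order linear difference equation. The function name and the example coefficients are assumptions chosen for illustration and do not come from the paper.

```python
import cmath

def characteristic_roots(a1, a2):
    """Roots of z^2 - a1*z - a2 = 0 for x_t = a1*x_{t-1} + a2*x_{t-2}."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0

# Illustrative recurrence x_t = 1.5*x_{t-1} - 0.56*x_{t-2}
r1, r2 = characteristic_roots(1.5, -0.56)
# Both roots lie inside the unit circle, so every solution decays to zero
is_stable = max(abs(r1), abs(r2)) < 1.0
```

When the two roots are distinct, every solution has the form \(x_t = c_1 r_1^t + c_2 r_2^t\). An extra root fitted to noise would therefore add a component to the forecast that the true dynamics do not contain.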

Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning

The authors propose a minimalist baseline called "context parroting"—which simply identifies the most similar segment in historical trajectories and copies the subsequent evolution as the prediction. On zero-shot forecasting of low-dimensional chaos, turbulence, coupled oscillators, and ECG signals, this method outperforms leading foundation models like Chronos, TimesFM, Time-MoE, Moirai, and DynaMix in both accuracy and long-term attractor reconstruction, while being six orders of magnitude cheaper in inference, thereby exposing that current foundation models have not truly "learned physics."
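The idea of matching the most recent segment against history and copying what followed can be sketched in a few lines. This is a minimal one-dimensional illustration under assumed choices (squared Euclidean distance, a fixed `query_len`, the hypothetical name `parrot_forecast`); the paper's exact matching procedure may differ.

```python
def parrot_forecast(context, query_len, horizon):
    """Copy the values that followed the past segment most similar to the latest one."""
    n = len(context)
    query = context[n - query_len:]
    best_start, best_dist = None, float("inf")
    # Each candidate must leave `horizon` observed points after it to copy from
    for start in range(0, n - query_len - horizon + 1):
        segment = context[start:start + query_len]
        dist = sum((a - b) ** 2 for a, b in zip(segment, query))
        if dist < best_dist:
            best_start, best_dist = start, dist
    if best_start is None:
        raise ValueError("context too short for this query_len and horizon")
    follow = best_start + query_len
    return context[follow:follow + horizon]

series = [0, 1, 2, 3] * 5
forecast = parrot_forecast(series, query_len=3, horizon=2)
```

On this repeating series the latest segment is 1, 2, 3, so the sketch returns `[0, 1]`, the values that followed the earliest exact match.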

ConvT3: Structured State Kernels for Convolutional State Space Models

ConvT3 extends the state kernel in Convolutional State Space Models (ConvSSM), previously forced to degenerate into \(1\times1\), to an equivalent \(3\times3\) convolution. This is achieved by constructing the state tensor using a "diagonalizable SSM matrix + proportionally constrained tridiagonal Toeplitz tensor," enabling stronger spatial modeling capabilities while maintaining linear-time parallel scan trainability. It achieves SOTA on long-range video generation (Moving-MNIST) and physical system (PDEBench) modeling, with more stable training than ConvS5.

CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter

CoRA is a lightweight plug-and-play adapter that enables "channel-independent" Time Series Foundation Models (TSFMs)—which typically ignore inter-channel correlations—to simultaneously learn dynamic, heterogeneous (positive/negative), and partial (existing only between specific channels) correlations during downstream fine-tuning. It significantly improves multivariate forecasting accuracy across 10 real-world datasets in few-shot settings (using only 5% of samples) while introducing only linear complexity overhead during inference.

COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting

COSA attaches a lightweight linear adapter that operates exclusively in the output space to a frozen time series forecasting model. It computes a residual using "base model predictions + recent ground truth statistics," constrained by a gating mechanism for calibration. During deployment, these few parameters are updated only on ground truth that arrives with a delay. This approach is much simpler than existing "input + output" dual-adapter schemes, yet it reduces MSE by 13.91–17.03% relative to non-TTA baselines and 10.48–13.05% relative to SOTA TTA methods across 6 datasets, while being 88–90% faster in inference.

CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting

CPiRi achieves Channel Permutation Invariance (CPI) without sacrificing cross-channel modeling capability by combining a frozen pre-trained temporal encoder, a trainable permutation-equivariant spatial module, and a channel-shuffled training strategy. It achieves SOTA performance on several traffic benchmarks.

CTBench: Cryptocurrency Time Series Generation Benchmark

CTBench is the first benchmark specifically designed for time series generation (TSG) in cryptocurrency markets. Utilizing hourly data from 452 coins, 13 financial metrics, and a "Predictive Utility + Statistical Arbitrage" dual-task evaluation framework, it systematically benchmarks 8 SOTA generative models across 5 families. The study reveals a core trade-off: "high statistical fidelity \(\neq\) actual profitability," and provides a practical guide for model selection based on market conditions.

Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series

The TeCh framework is proposed, utilizing a Core Token Aggregation-Redistribution (CoTAR) module to replace standard attention in Transformers for modeling channel dependencies in medical time series. By introducing a global "core token" as a proxy to aggregate and redistribute channel information, complexity is reduced from \(O(n^2)\) to \(O(n)\). It achieves 86.86% accuracy on the APAVA dataset (outperforming Medformer by 12.13%), using only 33% memory and 20% inference time.

DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification

DeepFRC integrates "curve registration (alignment)" and "curve classification" into a single end-to-end deep network for joint training. This model employs a 1D-CNN to learn diffeomorphic time warping, utilizes Fourier bases for smooth spectral embedding, and applies a class-aware contrastive loss to unify alignment and classification. This work provides the first theoretical registration approximation and generalization bounds for such a joint model, outperforming SOTA in both alignment quality and classification accuracy across five real-world datasets.

DeepPrim: a Physics-Driven 3D Short-term Weather Forecaster via Primitive Equation Learning

DeepPrim explicitly incorporates advection, force terms, and source-sink terms from atmospheric primitive equations into a Neural ODE forecasting framework. By utilizing 3D-BiViT to learn the coupled dynamics across longitude, latitude, and pressure levels, it significantly outperforms most data-driven baselines in global and regional weather forecasting for 6-24 hour horizons.

Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring

Delta-XAI is a unified framework that uses a wrapper function to adapt 14 existing XAI methods to explaining prediction changes in online time series. It also introduces SWING (Shifted Window Integrated Gradients), which builds integration paths from past observations to capture temporal dependencies and consistently outperforms existing methods across multiple evaluation metrics.

DeNOTS: Stable Deep Neural ODEs for Time Series

DeNOTS shifts the "depth" of Neural CDE from decreasing solver tolerance to explicitly lengthening integration time, and stabilizes long-duration integration with anti-phase negative feedback, achieving stronger expressivity, more stable trajectories, and lower discretization error accumulation across irregular time series classification, regression, and forecasting tasks.

Detection of Unknown Unknowns in Autonomous Systems

Addressing "unknown unknowns" (U2) scenarios that are only exposed after the deployment of autonomous systems (e.g., UAVs, autonomous driving, automated drug delivery), this paper notes that such risks do not cause marginal distribution shifts. Consequently, existing multivariate time series anomaly detection (MTAD) methods that rely on "distribution shift" fail across the board. The authors propose SPIE-AD, which continuously recovers the underlying sparse dynamics model from signals and uses conformal inference to determine whether the model deviates from the normal range. This achieves true zero-shot U2 detection, outperforming all baselines across 8 U2 benchmarks and 6 real-world datasets.

DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment

Addressing the fundamental issue where the MSE loss generates "autocorrelation bias" when label sequences exhibit autocorrelation, DistDF shifts from estimating conditional likelihood to directly aligning the conditional distributions of predicted and label sequences. It employs the "Joint-distribution Wasserstein distance" (with a provable upper bound) as a proxy objective, leveraging the Bures–Wasserstein closed-form solution under Gaussian assumptions. This serves as a plug-and-play regularization term added to MSE, consistently achieving state-of-the-art results across multiple datasets and backbone models.
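For reference, the Bures–Wasserstein closed form for two Gaussians is the standard result below. The symbols \(\mu_i\) and \(\Sigma_i\) are notation assumed here, and the summary does not state how DistDF assembles the joint distribution to which the formula is applied.

\[
W_2^2\big(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\big)
= \lVert \mu_1-\mu_2 \rVert_2^2
+ \operatorname{tr}\!\Big(\Sigma_1+\Sigma_2-2\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big)
\]

The first term compares the means and the trace term compares the covariances, so under the Gaussian assumption the distance can be evaluated from sample moments without solving a transport problem.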

ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models

ECHO shifts Electroencephalogram (EEG) modeling from the "encoder-centric representation + lightweight classification head" paradigm to a "decoder-centric sequence-to-sequence generation" approach. By utilizing a series of support samples as in-context examples, a unified model can automatically identify task types and predict labels without fine-tuning, outperforming task-specific Large EEG Models in multi-task settings.

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements

The study constructs EDINET-Bench, a financial benchmark based on ten years of Japanese EDINET annual reports. It includes three expert-level tasks: accounting fraud detection, earnings forecasting, and industry classification, finding that even SOTA LLMs only slightly outperform logistic regression.

Efficient Autoregressive Inference for Transformer Probabilistic Models

The paper proposes a Causal AR Buffer that decouples "one-time encoding of static context" from "autoregressive modeling of dependencies between targets." Without significant loss in prediction quality, it transforms the high-cost process of joint sampling and joint density evaluation—which typically requires repeated re-encoding—into an efficient, cacheable, and parallelizable workflow. This achieves up to approximately 20x inference acceleration and 7x memory savings across multiple tasks.

Enabling Arbitrary Inference in Spatio-Temporal Dynamic Systems: A Physics-Inspired Perspective

PhySTA integrates continuous neural operators with discrete graph neural networks. It employs the Graph-Temporal Fourier Neural Operator (GT-FNO) based on the Magnetic Laplacian to learn continuous dynamics, supplemented by the Adaptive Multi-scale Interaction (AMI) module using node-edge coupled convolutions to correct discrete interaction errors. This enables efficient and generalizable inference for unobserved regions and arbitrary spatio-temporal points in graph-structured systems.

End-to-End Probabilistic Framework for Learning with Hard Constraints

ProbHardE2E proposes the Differentiable Probabilistic Projection Layer (DPPL), which directly applies hard constraints to distribution parameters to enable end-to-end training. It simultaneously supports strict constraint satisfaction and uncertainty quantification in both probabilistic time series forecasting and PDE solving.

Enhancing Sparse Event Detection in Healthcare Time-Series via Adaptive Gate of Context–Detail Interaction

A coarse-to-fine framework named GCE-LDI-AGM is proposed. It integrates global context and local details via an adaptive gating mechanism, complemented by Conditional Gating Scaling (CGS) and Positional Gaussian Injection (PGI) as auxiliary supervision. This approach significantly enhances the joint detection of categories and boundaries for extremely sparse events in medical time-series.

EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty

EVEREST utilizes a compact Transformer for multivariate time-series rare-event prediction. It attaches three auxiliary heads to a shared backbone—active only during training (an Evidential NIG head for calibration, an EVT head for tail risk, and a Precursor head for early supervision). During inference, only the classification head remains, resulting in zero additional overhead. On ten years of solar flare data, it achieves TSS scores of 0.973/0.970/0.966 for C-class flares at 24/48/72-hour horizons and transfers seamlessly to the industrial anomaly dataset SKAB (F1=98.16%) without architectural changes.

Extreme Weather Nowcasting via Local Precipitation Pattern Prediction

A deterministic nowcasting framework, exPreCast, is proposed. By utilizing local spatiotemporal attention, Cubic Dual-path Upsampling (CDU), and a Temporal Extractor (TE), it approaches the extreme precipitation prediction accuracy of diffusion ensemble models on SEVIR/MeteoNet and a newly constructed balanced KMA radar dataset with only 1/30 of the computational cost.

FeDaL: Federated Dataset Learning for General Time Series Foundation Models

This paper proposes FeDaL, a federated framework that trains a general time series foundation model from scratch through client-side Domain Bias Elimination (DBE) and server-side Global Bias Elimination (GBE). It achieves competitive or superior performance on 8 types of downstream tasks with significantly fewer parameters than centralized TSFMs.

Flow-based Conformal Prediction for Multi-dimensional Time Series

This paper proposes FCP, which utilizes flows with classifier-free guidance to learn multi-dimensional predictive residual distributions conditioned on historical context. It maps probability balls in a Gaussian source space into flexible prediction sets, maintaining target coverage while significantly reducing set volume on wind power, traffic, and solar radiation data.

Free Energy Mixer

This work proposes the Free Energy Mixer (FEM), which reformulates the reading of attention values as a free energy (log-sum-exp) optimization problem. By achieving channel-wise value-aware posterior selection, it overcomes the inherent bottleneck of standard attention, characterized as "lossless storage but lossy retrieval." FEM can serve as a plug-and-play replacement for softmax/linear attention, RNNs, and SSMs, delivering consistent improvements across NLP, vision, and time-series tasks.

From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting

The authors propose the Probabilistic Scenarios paradigm, which replaces sampling by directly outputting a finite set of {scenario, probability} pairs. Using TimePrism—a model consisting of only three parallel linear layers—they achieve 9/10 SOTA results across 5 benchmark datasets.

GARLIC: Graph Attention-based Relational Learning of Multivariate Time Series in Intensive Care

GARLIC chains "exponential decay imputation + time-lagged signal graph message passing + cross-dimensional sequence attention" into an end-to-end pipeline. It not only achieves a new SOTA for ICU irregular multivariate time series prognosis but also provides endogenous explanations at the observation, signal, and edge levels using learned attention weights and graph edges.

GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables

GCGNet addresses time series forecasting with exogenous variables by converting both generated and ground-truth complete sequences into patch-level graph structures. It constrains the generator with graph consistency and refines predictions using sparse graph convolutions. It achieves top performance across most metrics on 12 real-world datasets and maintains strong robustness when future exogenous variables are missing or masked.

GTM: A General Time-series Model for Enhanced Representation Learning

GTM is proposed as a general time-series foundation model that captures time-granularity-aware features through a frequency-domain attention mechanism and unifies reconstruction and autoregressive pre-training objectives via hybrid masking. It achieves SOTA performance across multiple tasks, including forecasting, imputation, anomaly detection, and classification.

HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming

The HiVid framework is proposed, marking the first use of LLMs as human proxies to generate content importance weights for video chunks. Through a perception module (sliding window scoring), a ranking module (LLM-guided merge sort to remove scoring bias), and a prediction module (multimodal time series forecasting for adaptive latency), it achieves content-aware streaming. HiVid improves VOD PLCC by 11.5%, live streaming prediction by 26%, and human MOS correlation by 14.7%.

ICDiffAD: Implicit Conditioning Diffusion Model for Time Series Anomaly Detection

Addressing the inherent stochasticity issues in diffusion models for time series anomaly detection—such as "random reconstruction from Gaussian noise" and "reconstructing sine waves as cosine waves"—ICDiffAD utilizes a Signal-to-Noise Ratio (SNR) based noise scheduler and a per-sample implicit conditioning mechanism. This allows the reverse diffusion to start from a "partially corrupted input" rather than pure noise, achieving input-consistent reconstruction while maintaining generative flexibility, thereby reducing the false positive rate by 60%.
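Starting the reverse process from a "partially corrupted input" can be pictured with the standard DDPM forward step. The sketch below is a generic illustration, not ICDiffAD's scheduler: the function names are hypothetical, and the way the paper picks a noise level from the SNR is not shown.

```python
import math
import random

def snr(alpha_bar):
    """Signal-to-noise ratio of x_t under the standard DDPM forward process."""
    return alpha_bar / (1.0 - alpha_bar)

def partially_corrupt(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

# alpha_bar = 0.8 gives SNR of about 4: most of the input survives the corruption
window = [0.0, 0.5, 1.0, 0.5, 0.0]
x_t = partially_corrupt(window, alpha_bar=0.8, rng=random.Random(0))
```

A reverse process that begins at such an `x_t` keeps the phase and shape of the input, whereas one that begins from pure noise (`alpha_bar` near 0) has nothing tying the reconstruction to the observed window.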

Improving Extreme Wind Prediction with Frequency-Informed Learning

This paper theoretically proves from a frequency domain perspective that the "MSE training + pattern shift → high-frequency amplitude contraction" mechanism is the root cause of the systematic underestimation of extreme wind speeds by data-driven models. Accordingly, it proposes a triad of gradient penalty loss + NS physical embedding structure + frequency-separated reweighting, which significantly enhances extreme wind prediction accuracy without sacrificing overall performance.

Inferring brain plasticity rule under long-term stimulation with structured recurrent dynamics

This paper proposes STEER, which models brain network reorganization under long-term neural stimulation as a stimulus-conditioned slow-timescale dynamical law. By utilizing structured low-rank RNNs to interpret fast intra-session neural activity, it enables the inference of interpretable plasticity rules from longitudinal recordings. The model predicts network evolution under unseen stimulation protocols in Lorenz systems, BCM rules, stimulus-induced task learning, and Parkinsonian rat DBS data.

JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows

JAPAN uses Normalising Flows (NF) to estimate (conditional) density and employs log-density as the conformal score. By thresholding the density, it constructs prediction areas that are geometry-independent, potentially disconnected, and context-adaptive. While maintaining finite-sample coverage guarantees, it compresses the prediction area volume significantly more than various residual-based baselines.
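Thresholding the density to form a prediction area can be sketched with split conformal calibration. A fixed Gaussian stands in for the trained flow here, which is an assumption made to keep the example self-contained; the function names are hypothetical.

```python
import math

def gaussian_log_density(y, mean=0.0, std=1.0):
    # Stand-in for a trained flow's log-density (assumption for illustration)
    z = (y - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))

def density_threshold(calibration_log_densities, alpha):
    """Split-conformal cutoff on log-density for target coverage 1 - alpha."""
    n = len(calibration_log_densities)
    k = math.ceil((1.0 - alpha) * (n + 1))  # rank among nonconformity scores
    if k > n:
        return -math.inf  # too few calibration points: the area is the whole space
    nonconformity = sorted(-s for s in calibration_log_densities)
    return -nonconformity[k - 1]

def in_area(y, threshold, log_density=gaussian_log_density):
    """A point belongs to the prediction area if its log-density clears the cutoff."""
    return log_density(y) >= threshold

calibration = [gaussian_log_density(y) for y in (-1.5, -0.7, -0.2, 0.1, 0.4, 0.9, 1.8)]
cutoff = density_threshold(calibration, alpha=0.25)
```

Because membership depends only on the density value, the resulting area follows the shape of the learned density and can consist of several disconnected pieces when that density is multimodal.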

Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative

This work discovers that text paired with time series exhibits a periodicity similar to the time series itself (Chronological Textual Resonance). It proposes the TaTS framework, which transforms textual representations into auxiliary variables to enhance the forecasting and imputation performance of any existing time series model in a plug-and-play manner.

Latent-to-Data Cascaded Diffusion Models for Unconditional Time Series Generation

This paper proposes L2D-Diff, a cascaded (latent-to-data) dual-space framework that decomposes unconditional time series generation into two steps: modeling the high-level representation distribution in latent space, then using these representations as conditions to guide the refinement of local details in data space. This balances representation consistency and local fidelity.

Learning Koopman Representations with Controllability Guarantees

This work encodes "controllability" as a structural prior directly into Koopman representation learning. By parameterizing latent linear operators with a new controllable canonical form, the learned Neural ODE model is controllable by construction, enabling accurate fitting and direct MPC application even under data scarcity.

Learning Linear State-Space Models with Sparse System Matrices

This paper introduces sparsity-inducing priors (Student's t-distribution) to the system matrices \(A, B, C, D\) of Linear State-Space Models (LSSM). By employing EM combined with block coordinate descent for MAP estimation, the method bypasses the unidentifiability caused by "similarity transformations," learning sparse system matrices that are both accurate and capable of preserving the true topology between variables.
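The unidentifiability under similarity transformations can be made explicit with a short derivation. It assumes the standard noise-free form \(x_{t+1} = A x_t + B u_t,\; y_t = C x_t + D u_t\), which the summary does not write out. For any invertible \(T\), define

\[
\tilde{A} = T A T^{-1}, \qquad \tilde{B} = T B, \qquad \tilde{C} = C T^{-1}, \qquad \tilde{D} = D .
\]

Then for every \(k \ge 0\),

\[
\tilde{C}\,\tilde{A}^{k}\,\tilde{B} = C T^{-1}\,\big(T A T^{-1}\big)^{k}\,T B = C A^{k} B ,
\]

so both parameterizations produce the same input-output behavior. A dense \(T\) generally destroys any zero pattern in \(A, B, C\), which is why the data alone cannot single out the sparse realization without an additional prior.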

Learning Mixtures of Linear Dynamical Systems via Hybrid Tensor-EM Method

This paper proposes a hybrid Tensor-EM framework for learning "Mixtures of Linear Dynamical Systems (MoLDS)." It utilizes a tensor method of moments based on Simultaneous Matrix Diagonalization (SMD) for globally consistent initialization, followed by a full Kalman filter-smoother EM for local refinement. This approach balances global identifiability with statistical optimality and marks the first successful application of MoLDS to real neural data from non-human primates.

Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting

The authors propose ReIMTS, which preserves the original sampling patterns of irregular multivariate time series (IMTS) through period-based recursive splitting (rather than resampling). Combined with an irregularity-aware representation fusion mechanism for multi-scale modeling, it achieves an average improvement of 27.1% across six IMTS backbones as a plug-in.

Local Geometry Attention for Time Series Forecasting under Realistic Corruptions

By using local Gaussian Processes, the attention scoring is transformed from Euclidean dot-product to a "query-adaptive negative Mahalanobis distance." This prevents Transformers from being biased by outliers under realistic corruptions like spikes or level-shifts. Simultaneously, the first statistically grounded robustness benchmark for time series, TSRBench, is proposed.

Long-range Modeling and Processing of Multimodal Event Sequences

MM-TPP extends Temporal Point Processes (TPP) from "Time + Type + Text" to a full multimodal generation framework including "Time + Type + Text + Image." By employing adaptive sequence compression based on time-interval similarity, it fits long sequences involving thousands of events and tens of thousands of tokens into a fixed context window, outperforming SOTA TPP baselines in both prediction accuracy and long-form analytical report generation.

Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?

By linearly interpolating the weights of a pre-trained large time series model with a randomly initialized "sparring" model, the smooth loss landscape of the latter is used to "level out" the sharp, non-convex landscape of the former. This allows full fine-tuning to truly benefit from pre-training without increasing any memory or computational overhead.
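The weight interpolation itself is a one-line operation per parameter, as the sketch below shows. The dictionary layout, the name `interpolate_weights`, and the mixing coefficient `lam` are assumptions for illustration; the summary does not say how the coefficient is chosen or scheduled.

```python
def interpolate_weights(pretrained, sparring, lam):
    """Return (1 - lam) * pretrained + lam * sparring, parameter by parameter."""
    if pretrained.keys() != sparring.keys():
        raise ValueError("both models must share the same parameter names")
    merged = {}
    for name, weights in pretrained.items():
        other = sparring[name]
        merged[name] = [(1.0 - lam) * w + lam * o for w, o in zip(weights, other)]
    return merged

# Toy flat parameter vectors standing in for real model tensors
pretrained = {"linear.weight": [1.0, 2.0]}
sparring = {"linear.weight": [3.0, 0.0]}
start_point = interpolate_weights(pretrained, sparring, lam=0.25)
```

Since the merged model has the same shape as the pre-trained one, fine-tuning from `start_point` adds no memory or compute beyond the one-off interpolation.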

MambaSL: Exploring Single-Layer Mamba for Time Series Classification

Using only a single-layer Mamba with minimal modifications to the selective SSM and projection layers, guided by four TSC-specific hypotheses (H1–H4), this work fairly re-evaluates 20 strong baselines across all 30 UEA datasets and achieves statistically significant SOTA results.

MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with 0.1K Parameters

MixLinear employs a dual-pathway linear architecture of "temporal segmentation for local trends + frequency adaptive low-rank filtering for global trends," reducing long-term time series forecasting (LTSF) model parameters to only 0.1K (45–176) while achieving accuracy comparable to or better than mainstream lightweight models on 8 benchmarks.

MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss

The training loss is upgraded from MSE, which assumes the future follows a unimodal Gaussian distribution, to an MMPD loss parameterized by a diffusion process. It serves as a plug-and-play module attached to any patch-based time series backbone, enabling the prediction of multiple probabilistic futures with diverse shapes from the same historical input.

Multi-Scale Hypergraph Meets LLMs: Aligning Large Language Models for Time Series Analysis

MSH-LLM supplements time series with semantics using "learnable hyperedges," aligns temporal features to LLM lexical prototypes across multiple scales via cross-modal attention, and activates the temporal reasoning of LLMs through "Mixture of Prompts," achieving SOTA performance on 27 datasets across 5 task categories.

Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research

This paper constructs the Omni-iEEG dataset (302 patients, 178 hours of high-resolution intracranial EEG recordings), defines standardized benchmark tasks and evaluation metrics based on clinical priors, and demonstrates that end-to-end modeling can match or surpass traditional biomarker-based methods in epilepsy surgical planning.

Online Time Series Prediction Using Feature Adjustment

This paper proposes ADAPT-Z (Automatic Delta Adjustment via Persistent Tracking in Z-space), shifting the adaptation target of online time series forecasting from model parameter updates to feature space correction. By using a lightweight adapter to fuse current features with historical gradients, it addresses the delayed feedback issue in multi-step forecasting and consistently outperforms existing online learning methods across 13 datasets.

Panda: A Pretrained Forecast Model for Chaotic Dynamics

This paper uses an evolutionary algorithm to "create" 20,000 new chaotic ordinary differential equations (ODEs) as a synthetic training set. Combined with a patch Transformer (Panda) featuring channel attention and dynamical embedding, it achieves zero-shot prediction of unseen chaotic systems and even high-dimensional PDEs after pretraining only on low-dimensional ODEs, demonstrating neural scaling laws specific to dynamical systems.

Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants

This paper reinterprets soft-DTW from the perspective of perturbed optimization—where random noise is added to alignment costs before taking the expected minimum—proving it to be a special case under Gumbel noise. The noise is generalized to the Generalized Extreme Value (GEV) distribution to derive nested-soft-DTW (ns-DTW) with adjustable skewness, which consistently outperforms soft-DTW in time series barycenter computation, clustering, and classification.
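
For reference, the standard soft-DTW recursion that the paper reinterprets can be sketched as follows. This shows only the baseline soft-min dynamic program with a squared-difference cost on scalar series; it is not the ns-DTW variant, and the function names are illustrative.

```python
import math

def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    m = min(values)
    if math.isinf(m):
        return m
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two scalar sequences x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Soft-min over the three admissible predecessor cells.
            R[i][j] = cost + soft_min(
                (R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma
            )
    return R[n][m]
```

As `gamma` approaches zero the soft-min approaches the hard minimum and the value approaches classical DTW; larger `gamma` averages over more alignment paths.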

PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting

To address the issue of exploding parameter counts and computational costs in long-term forecasting caused by the drift of periodic patterns in patch tokens, this paper adopts a "phase perspective." It aggregates values at the same offset positions across cycles into tokens, proving that they are more stable and lower-dimensional than patches. Based on this, PhaseFormer is designed with only approximately 1k parameters, achieving SOTA accuracy across seven benchmarks while reducing FLOPs by about 99.99%.
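
The difference between patch tokens and phase tokens amounts to a transpose of the cycle-by-offset matrix. The sketch below assumes a known, fixed period and a univariate series; the function names are hypothetical.

```python
import numpy as np

def patch_tokens(x, period):
    """Consecutive segments of one cycle each: shape (n_cycles, period)."""
    x = np.asarray(x)
    n_cycles = len(x) // period
    return x[: n_cycles * period].reshape(n_cycles, period)

def phase_tokens(x, period):
    """Values at the same offset across cycles: shape (period, n_cycles)."""
    return patch_tokens(x, period).T

x = np.arange(12)
tokens = phase_tokens(x, period=4)   # tokens[1] holds offsets 1, 5, 9
```

Each phase token collects one position of the cycle across all cycles, so a slowly drifting periodic pattern changes a phase token gradually rather than reshuffling it.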

PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting

PHAT identifies that in real-world multivariate time series, different variables have distinct and dynamically changing period lengths (period heterogeneity). It first uses FFT to group variables into "period buckets" according to their primary periods and folds them into phase-aligned 2D tensors. Then, an "X-shaped" self-attention mechanism with positive/negative decomposition and periodic modulation terms is used to model periodic dependencies. Finally, multiple periodic components are fused via weighting based on frequency saliency. PHAT achieves SOTA results on approximately 74% of metrics across 14 real-world datasets and 18 baselines, while maintaining parameters and computational costs over an order of magnitude lower than Transformer-based methods.
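
The first step, grouping variables by their dominant period, can be sketched with a plain FFT. This is a simplified illustration that assumes one static primary period per variable and exact rounding to an integer; the dynamic aspect described in the summary and the folding into 2D tensors are not shown, and the function names are hypothetical.

```python
import numpy as np

def primary_period(x):
    """Period (in samples) of the strongest non-DC frequency component."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    spectrum[0] = 0.0                      # ignore the DC bin
    k = int(np.argmax(spectrum))
    if k == 0:                             # constant series: no periodicity found
        return float(len(x))
    return len(x) / k

def bucket_by_period(X):
    """X: (n_vars, length). Map each rounded primary period to variable indices."""
    buckets = {}
    for i, series in enumerate(X):
        buckets.setdefault(int(round(primary_period(series))), []).append(i)
    return buckets

t = np.arange(96)
X = np.stack([
    np.sin(2 * np.pi * t / 24),
    np.sin(2 * np.pi * t / 12),
    np.cos(2 * np.pi * t / 24),
])
buckets = bucket_by_period(X)
```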

PMDformer: Patch-Mean Decoupling Information Transformer for Long-term Forecasting

PMDformer points out that the true "shape similarity" between patches is often drowned out by different numerical scales (means). It explicitly decouples trends and residual shapes by "subtracting the mean of each patch." It then re-stitches local shapes with global trends using Proximal Variable Attention (cross-variable interaction only on the most recent patch) and Trend Restoration Attention (injecting means back into the Value channel), surpassing various SOTA models with more stable and accurate performance across 8 LTSF benchmarks.
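
The decoupling step can be sketched in a few lines. It assumes non-overlapping patches of a univariate series; the attention modules that later recombine shapes and means are not shown, and the function name is hypothetical.

```python
import numpy as np

def decouple_patches(x, patch_len):
    """Split x into patches and separate each patch's mean from its shape."""
    x = np.asarray(x, dtype=float)
    n = len(x) // patch_len
    patches = x[: n * patch_len].reshape(n, patch_len)
    means = patches.mean(axis=1, keepdims=True)   # per-patch level (trend)
    return patches - means, means                 # residual shape, level

# Two patches with the same shape at very different levels.
shapes, means = decouple_patches([0, 1, 2, 10, 11, 12], patch_len=3)
```

After the mean is removed, the two patches have identical residuals, which is the "shape similarity" that a raw dot product between the original patches would understate.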

Point-wise Anomaly Detection via Fold-bifurcation ODE

FOLD reformulates time series anomaly detection as "tracking how far the system is from a critical transition." It extracts "sensitivity + uncertainty" stress signals from a frozen prediction model and injects them into an ODE inspired by fold-bifurcation to evolve a risk state \(z(t)\). An anomaly is detected when \(z(t)\) crosses a threshold calibrated only on normal data. The entire process requires no anomaly labels or detector training. Under strict point-wise evaluation across 40 benchmarks, FOLD achieves the best average ranking against 34 SOTA baselines.

pyrregular: A Unified Framework for Irregular Time Series, with Classification Benchmarks

This paper proposes pyrregular, a unified container based on xarray and sparse COO tensors that systematically organizes three types of irregularity in time series (uneven sampling, partial observation, and raggedness). It provides the first standardized data repository for irregular time series classification (34 datasets) and a cross-community benchmark (12 classifiers), concluding that the simple, generic ROCKET model surprisingly performs the best overall on such data.

Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models

Addressing the flaw in multi-step time-series forecasting where MSE treats each future step as an independent, equal-weighted task, this paper derives a "quadratic learning objective" weighted by the inverse conditional covariance matrix from a maximum likelihood perspective. Using a bilevel optimization framework (QDF), this weighting matrix is treated as a learnable parameter and learned on a hold-out set to maximize generalization. As a plug-and-play loss replacement for MSE, it consistently achieves SOTA results across 8 datasets and various forecasting models.
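
The maximum likelihood argument can be written out as a short derivation. It is a sketch under the assumption that the \(H\)-step target is Gaussian around the forecast with conditional covariance \(\Sigma\); the symbols \(y\), \(\hat{y}_\theta(x)\), \(\Sigma\), and \(H\) are notation introduced here for illustration.

\[
-\log p(y \mid x) = \tfrac{1}{2}\,\bigl(y - \hat{y}_\theta(x)\bigr)^\top \Sigma^{-1} \bigl(y - \hat{y}_\theta(x)\bigr) + \tfrac{1}{2}\log\det\Sigma + \tfrac{H}{2}\log 2\pi
\]

When \(\Sigma = \sigma^2 I\), the quadratic term becomes \(\tfrac{1}{2\sigma^2}\lVert y - \hat{y}_\theta(x)\rVert^2\), so MSE is the special case in which all future steps are uncorrelated and equally noisy. A general \(\Sigma^{-1}\) reweights and couples the step-wise errors.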

Random Controlled Differential Equations

By utilizing a large collection of Controlled Differential Equations (CDEs) / Rough Differential Equations (RDEs) with random parameters as a continuous-time reservoir and training only a final linear readout layer, a fast, scalable time-series classifier is obtained. It strictly converges to the "signature kernel" in the infinite-width limit, preserving the inductive bias of path signature methods while eliminating the overhead of explicit signature calculation and kernel matrix inversion.

Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment

This paper proposes the TSRating framework, which utilizes LLMs to perform pairwise quality comparisons of time series (TS) data blocks across four dimensions: trend, frequency, amplitude, and pattern. These comparisons are converted into scalar quality scores using the Bradley-Terry model. A TSRater model (comprising a MOMENT encoder and an MLP) is then trained via MAML meta-learning across 22 subsets in 9 domains, achieving efficient and unified cross-domain TS data quality assessment.
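
The conversion from pairwise judgments to scalar scores can be sketched with a plain Bradley-Terry fit. This is a generic maximum likelihood fit by gradient ascent, not the training procedure of TSRater; the function name, learning rate, and step count are assumptions.

```python
import math

def fit_bradley_terry(n_items, comparisons, lr=0.1, steps=2000):
    """Fit scores s so that P(i beats j) = sigmoid(s[i] - s[j]).

    comparisons: list of (winner, loser) index pairs.
    Returns zero-mean scores (the model is invariant to a constant shift).
    """
    s = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            p_win = 1.0 / (1.0 + math.exp(s[l] - s[w]))
            grad[w] += 1.0 - p_win      # gradient of the log-likelihood
            grad[l] -= 1.0 - p_win
        s = [si + lr * g for si, g in zip(s, grad)]
        mean = sum(s) / n_items
        s = [si - mean for si in s]
    return s

# Block 0 is preferred over block 1 three times out of four, and 1 over 2 likewise.
comparisons = [(0, 1)] * 3 + [(1, 0)] + [(1, 2)] * 3 + [(2, 1)]
scores = fit_bradley_terry(3, comparisons)
```

If one item wins every comparison, the maximum likelihood score is unbounded, so a practical fit would add regularization or a prior.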

Reasoning on Time-Series for Financial Technical Analysis

This paper proposes the Verbal Technical Analysis (VTA) framework, which combines the linguistic reasoning capabilities of LLMs with the pattern-capturing abilities of time-series models. By optimizing the reasoning chain through Time-GRPO reinforcement learning and conditioning time-series forecasting on reasoning attributes, the framework achieves financial time-series prediction that is both accurate and interpretable.

Relational Feature Caching for Accelerating Diffusion Transformers

The Relational Feature Caching (RFC) framework is proposed to enhance the precision of cached feature prediction by leveraging the strong correlation between input and output features of DiT modules. It includes Relational Feature Estimation (RFE) to estimate output change magnitudes from input changes and Relational Cache Scheduling (RCS) to trigger full computation using input errors as a proxy. RFC significantly outperforms existing temporal extrapolation-based caching methods in image and video generation tasks.

Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data

The paper proposes the Relational Transformer (RT) architecture. Through task table prompting, cell tokenization, and Relational Attention mechanisms, the model can be pre-trained on multiple relational databases and transferred zero-shot to unseen datasets and tasks. In the zero-shot setting, a 22M-parameter model reaches 93% of the AUROC of fully supervised methods, significantly outperforming a 27B LLM (84%).

Reliable Probabilistic Forecasting of Irregular Time Series via Marginal Consistent Flows

This paper proposes MOSES (Mixtures of Separable Flows), which uses a mixture of normalizing flows—combining a "multivariate Gaussian source distribution + variable-wise separable spline transformations"—to perform probabilistic forecasting for irregular time series. This approach ensures "marginal consistency," where predictions for subset queries are perfectly self-consistent with the margins integrated from the joint distribution. It significantly outperforms the previous SOTA ProFITi in marginal prediction while maintaining near-SOTA joint prediction performance.

Repurposing Foundation Model for Generalizable Medical Time Series Classification

FORMED freezes a forecasting foundation model (TimesFM) pre-trained on general time series to serve as a feature extractor, appending a novel classification head composed of "Channel Embedding + Label Query + Shared Decoding Attention." By jointly training on multiple MedTS datasets, medical domain knowledge is consolidated into the shared layers. This enables adaptation to new medical time series datasets with arbitrary channel counts, sequence lengths, and class numbers using only 0.1% of the parameters, achieving a maximum absolute F1 improvement of 35% on ADFTD.

ResCP: Reservoir Conformal Prediction for Time Series Forecasting

This paper introduces Reservoir Computing (Echo State Network) into Conformal Prediction for the first time. By encoding temporal dynamics of residual sequences using a randomly initialized ESN and utilizing state similarity to adaptively reweight historical residuals, it constructs local prediction intervals. Without any training, it achieves SOTA Winkler scores on four real-world datasets and is 20-80× faster than HopCPT.
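
The idea of reweighting historical residuals by reservoir-state similarity can be sketched as below. This is a simplified illustration, not the paper's exact procedure: the reservoir size, spectral radius, cosine similarity, softmax temperature, and the use of absolute residuals are all assumptions made here.

```python
import numpy as np

def esn_states(series, n_units=50, spectral_radius=0.9, seed=0):
    """Run a random, untrained echo state network over a scalar series."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, n_units))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.normal(size=n_units)
    h = np.zeros(n_units)
    states = []
    for r in series:
        h = np.tanh(W @ h + w_in * r)
        states.append(h.copy())
    return np.array(states)

def weighted_quantile(values, weights, q):
    """Smallest value whose normalized cumulative weight reaches q."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = min(int(np.searchsorted(cdf, q)), len(v) - 1)
    return v[idx]

def local_interval_halfwidth(residuals, alpha=0.1, temperature=0.1):
    """Half-width of a prediction interval for the next step."""
    residuals = np.asarray(residuals, dtype=float)
    states = esn_states(residuals)
    # states[t] summarizes residuals up to t; pair it with the residual at t + 1.
    past_states, next_abs = states[:-1], np.abs(residuals[1:])
    query = states[-1]
    norms = np.linalg.norm(past_states, axis=1) * np.linalg.norm(query) + 1e-12
    sims = past_states @ query / norms
    weights = np.exp((sims - sims.max()) / temperature)
    return weighted_quantile(next_abs, weights, 1.0 - alpha)
```

No parameter is trained: the reservoir is random and fixed, and the interval comes from a similarity-weighted empirical quantile of past residual magnitudes.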

Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition

This paper proposes xCPD, a plug-and-play module that refines modeling units from "channels" to "channel-patches" in multivariate time series. By utilizing shared graph Fourier bases for spectral embedding, it groups units by frequency energy response into low, medium, and high bands. Dynamic MoE routing adaptively selects frequency-specific filtering experts, enabling seamless integration into existing channel-independent (CI) or channel-dependent (CD) models to consistently improve long-term and short-term forecasting performance and support zero-shot transfer.

SciTS: Scientific Time Series Understanding and Generation with LLMs

The authors propose the SciTS benchmark, covering 43 tasks and 54K+ instances across 12 scientific fields (with lengths from \(10^0\) to \(10^7\) and frequencies up to 10 MHz). Systematic evaluation of 17 models reveals that general LLMs generalize better than specialized time series models, though text/image encodings have limitations. Accordingly, the TimeOmni framework is designed using Multi-Patch Experts, a routing mechanism, and Patch Reprogramming to explicitly model temporal dynamics and train jointly with LLMs.

Semantic-Enhanced Time-Series Forecasting via Large Language Models

SE-LLM enhances token representations by injecting the periodicity and anomaly characteristics of time series into the semantic space of a pre-trained LLM (via the TSCC module). It further utilizes an LSTM-embedded adapter (Time-Adapter) to complement the LLM's capacity for modeling long- and short-term temporal dependencies, achieving state-of-the-art (SOTA) performance across long-term, short-term, and zero-shot forecasting while keeping the LLM frozen and compressing the sequence length.

SONATA: Synergistic Coreset Informed Adaptive Temporal Tensor Factorization

SONATA unifies "expressive dynamic embedding modeling" and "adaptive coreset sample selection" into a streaming tensor factorization framework. It employs Linear Dynamical Systems (LDS) derived from Matérn kernels to characterize the multi-scale temporal evolution of entity embeddings. By utilizing a four-criterion scoring system comprising "Uncertainty + Influence + Novelty + Information Gain" in conjunction with the Bellman equation to dynamically maintain a compact, highly informative coreset, the model reduces the RMSE by up to 61.5% relative to the runner-up method on datasets like CA Traffic under a single-pass data stream constraint.

SRT: Super-Resolution for Time Series via Disentangled Rectified Flow

SRT transfers the concepts of image super-resolution to time series: it first decomposes the low-resolution sequence into trend and seasonal components, aligns them to the target resolution using an Implicit Temporal Function (ITF), and then employs two rectified flow models with Cross-Resolution Attention to complement high-frequency details. It achieves SOTA on 9 datasets for both sampling-based and aggregation-based super-resolution tasks, requiring only 4-step sampling for inference.

ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction

ST-HHOL utilizes "Heterogeneous Hypergraph Modeling for Crime Patterns + Homogeneous Hypergraph Modeling for Co-occurrence Relations" to characterize high-order contextual factors behind sparse crime data. Combined with an online learning strategy featuring "frequent fine-tuning for short-term fluctuations + periodic retraining for long-term drift" and a partially frozen GPT-2, it consistently outperforms all offline and online baselines in MAE/MAPE across four real-world city crime datasets.

STABLE: Shift-Tolerant Allocation via Black–Litterman Using Conditional Diffusion Estimates

STABLE utilizes conditional diffusion models to generate "regime-aware" individual stock return distributions, which are then injected as investor views into Black–Litterman mean-variance optimization. This approach improves the Sharpe ratio by up to 122.9% across four regional equity markets while simultaneously reducing drawdowns and volatility.
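
How views enter the allocation can be illustrated with the standard Black–Litterman posterior mean. In this sketch the view vector `q` and its uncertainty `Omega` are plain inputs, whereas in STABLE they would be derived from the diffusion-generated return distributions; the function name and the value of `tau` are assumptions.

```python
import numpy as np

def black_litterman_mean(pi, Sigma, P, q, Omega, tau=0.05):
    """Posterior expected returns combining equilibrium returns with views.

    pi: (n,) prior (equilibrium) returns, Sigma: (n, n) return covariance,
    P: (k, n) view-picking matrix, q: (k,) view returns, Omega: (k, k) view covariance.
    """
    prior_precision = np.linalg.inv(tau * Sigma)
    view_precision = np.linalg.inv(Omega)
    precision = prior_precision + P.T @ view_precision @ P
    rhs = prior_precision @ pi + P.T @ view_precision @ q
    return np.linalg.solve(precision, rhs)

pi = np.array([0.05, 0.03])
Sigma = np.diag([0.04, 0.09])
P = np.eye(2)
q = np.array([0.10, 0.00])
vague = black_litterman_mean(pi, Sigma, P, q, Omega=1e8 * np.eye(2))
confident = black_litterman_mean(pi, Sigma, P, q, Omega=1e-10 * np.eye(2))
```

Very uncertain views leave the prior essentially unchanged, while very confident views pull the posterior to the views themselves; the posterior mean is then passed to mean-variance optimization.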

STDDN: A Deep Learning Framework for Crowd Simulation Guided by the Fluid Continuity Equation

STDDN treats crowds as continuous fluid media, utilizing the fluid mechanics continuity equation as a strong physical constraint and Neural ODEs to model macroscopic density field evolution. This macro-constraint is used to inversely regularize a microscopic trajectory prediction network, simultaneously achieving state-of-the-art accuracy in long-term simulations across four real-world datasets while drastically reducing inference latency (by up to 90%).
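
For reference, the continuity equation expresses conservation of mass. Here \(\rho(\mathbf{x}, t)\) denotes crowd density and \(\mathbf{v}(\mathbf{x}, t)\) the velocity field; this notation is introduced for illustration and may differ from the paper's.

\[
\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho\, \mathbf{v}) = 0
\]

Density at a location can only change through net flow in or out of it, which is the constraint that ties the macroscopic density field to the microscopic trajectories.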

STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting

STORM explicitly decomposes global meteorological fields into fine-to-coarse multi-scale representations. Through cross-scale messaging, lightweight temporal evolution encoding, and level-aligned decoding, it simultaneously enhances short-term accuracy and 7-10 day long-term stability for ERA5 global and regional weather forecasting.

Structure Learning from Time-Series Data with Lag-Agnostic Structural Prior

This paper investigates how to incorporate coarse-grained causal priors—where variable \(j\) affects variable \(i\) but the specific lag is unknown—into time-series structure learning. By using process-equivalent prior losses and data-driven initialization, the method more stably recovers fine-grained lagged causal structures.

SuperMAN: Interpretable and Expressive Networks over Temporally Sparse Heterogeneous Data

SuperMAN models "multi-type, irregularly sampled, and asynchronous" sparse temporal data as "a set of implicit graphs." By utilizing an extended Graph Additive Network (ExtGNAN) combined with a subset grouping mechanism, it directly learns from these structures. This approach provides interpretable contribution scores at three granularities (node, graph, and subset) while allowing users to trade fine-grained interpretability for stronger expressivity via "grouping" when domain priors are available. It achieves SOTA results on Crohn’s disease onset prediction, ICU length of stay, and fake news detection.

SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning

SwiftTS is introduced as the first model selection framework for time series pre-trained models. It employs a dual-encoder architecture to independently embed dataset patch-level temporal features and model meta-information (architecture, topology, and functionality). Compatibility scores are computed via patch-level cross-attention, combined with a horizon-adaptive Mixture-of-Experts (MoE) and cross-domain/cross-horizon meta-learning. Across 14 datasets and 8 models, it significantly outperforms all baselines with a mean weighted Kendall \(\tau_\omega = 0.442\).

T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation

T1 is proposed as a CNN-Transformer hybrid architecture. Its core innovation is Channel-Head Binding (CHead Attention): a shared Depthwise Conv extracts \(C\) types of temporal features (trend, periodicity, abrupt changes, etc.) for each variable, followed by a one-to-one binding of each CNN channel with an attention head. This ensures that cross-variable information transfer occurs independently at the feature level. When missing data prevents a channel from extracting valid patterns, the corresponding attention head is automatically down-weighted, achieving adaptive missing value processing without explicit design. On 11 benchmark datasets, the MSE is reduced by an average of 46%, with even greater advantages under 70% extreme missingness.

Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift

This paper categorizes distribution shifts in time-series forecasting into "temporal shift" and "concept drift." It proposes a Soft Attention Mask (SAM) to extract stable invariant patterns from exogenous features in both the look-back and prediction windows to mitigate concept drift. Using a model-agnostic framework, ShifTS, which "treats temporal shift first, then concept drift," it consistently improves forecasting accuracy across multiple datasets and models.

TEDM: Elucidated Diffusion Models for Time Series Forecasting

TEDM ports the EDM (Elucidated Diffusion Models) framework from image generation to multivariate time series forecasting. The key is to align the diffusion time axis with the physical time axis and replace manually preset schedules with empirically estimated noise/scale schedules from data. This reduces sampling complexity from \(O(SH)\) to \(O(H)\), achieving SOTA results across multiple long-sequence forecasting benchmarks using a lightweight network.

Temporal Generalization: A Reality Check

This paper systematically evaluates the practice of interpolating or extrapolating future model parameters using historical checkpoints under a strict "no future data" setting. It finds that model averaging and Taylor extrapolation are generally inferior to simply using the most recent model; while simple parameter scaling is relatively stable for some language tasks, it is not a universal solution.

TEN-DM: Topology-Enhanced Diffusion Model for Spatio-Temporal Event Prediction

TEN-DM transforms spatio-temporal point processes (STPP) simultaneously into multi-semantic event graphs and multi-scale temporal sequence images. It utilizes graph representations, zigzag topological features, and temporal query attention to jointly condition the diffusion denoising process, enabling more accurate prediction of the time intervals and spatial locations of future events.

Tensor learning with orthogonal, Lorentz, and symplectic symmetries

This paper provides a complete parametrization of equivariant polynomial functions under the diagonal action of the orthogonal group \(O(d)\), indefinite orthogonal groups (including the Lorentz group), and the symplectic group \(Sp(d)\) on tensors. These characterizations are applied to design learnable sparse vector recovery algorithms that outperform existing sum-of-squares spectral methods across various data-generating assumptions.

The Forecast After the Forecast: A Post-Processing Shift in Time Series

This paper proposes \(\delta\)-Adapter: a lightweight post-processing module constrained by \(\delta\), added before and after a frozen time series forecasting backbone. By utilizing input fine-tuning, output residual correction, sparse feature selection, and uncertainty calibration, it consistently improves prediction accuracy and interval coverage quality without modifying the model architecture or retraining the backbone.

Time-Gated Multi-Scale Flow Matching for Time-Series Imputation

This paper models multivariate time-series imputation as a "noise \(\to\) data" data-conditional ODE. It utilizes flow matching to learn the velocity field, prevents information leakage via visibility-masked attention, schedules "coarse-to-fine" frequency content through time-gated multi-scale velocity heads, and anchors observed points to a linear bridge using the Heun integrator with data consistency projection. This approach achieves competitive or superior imputation accuracy across ten benchmarks with deterministic, low-compute inference.
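
The sampling loop can be sketched as Heun integration with a projection that pins observed entries to the linear bridge between noise and data. This is a minimal sketch: the `velocity` callable stands in for the learned velocity field, and the function name, step count, and array layout are assumptions.

```python
import numpy as np

def impute(velocity, noise, observed, mask, n_steps=10):
    """Integrate dx/dt = velocity(x, t) from t = 0 to 1 with Heun's method.

    noise: starting sample, observed: array holding known values,
    mask: boolean array, True where a value is observed.
    """
    x = np.array(noise, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        k1 = velocity(x, t)
        k2 = velocity(x + dt * k1, t + dt)
        x = x + 0.5 * dt * (k1 + k2)          # Heun (trapezoidal) update
        # Data-consistency projection: observed points follow the linear bridge.
        t_next = (i + 1) * dt
        x[mask] = (1.0 - t_next) * noise[mask] + t_next * observed[mask]
    return x

target = np.array([1.0, 2.0, 3.0, 4.0])
noise = np.zeros(4)
mask = np.array([True, False, True, False])
# Toy constant velocity field pointing from the noise sample to the target.
result = impute(lambda x, t: target - noise, noise, target, mask)
```

The integration is deterministic, so the cost is two velocity evaluations per step.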

TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models

TimeOmni-1 is presented as the first unified time series reasoning model. Through TSR-Suite (the first reasoning-oriented time series dataset suite) and a two-stage training process (SFT for injecting time series priors + RL for refining reasoning), it significantly outperforms GPT-4.1 across multiple time series reasoning tasks.

TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness

The authors decompose modern time-series forecasting models into a five-component "Canonical Architecture" (pre-processing, embedding, feed-forward modeling, projection, and post-processing). By conducting over 10,000 experiments to systematically evaluate the effectiveness of each design at a modular granularity across different data/tasks, they find that combinations obtained through exhaustive design space search outperform existing SOTA models in over 90% of scenarios. Based on these findings, they provide a training-free toolkit, built on LightGBM, that directly recommends architecture configurations according to data characteristics.

TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

This paper proposes a "scalable benchmark creation" methodology: first, a domain-agnostic TimeSeriesExam multiple-choice benchmark is constructed using manual templates and synthetic time series. Then, the TimeSeriesExamAgent multi-agent framework extends this paradigm to any real-world dataset. By having a generator LLM write "item templates" (Python functions) and passing them through three-stage verification, the framework automatically generates domain-specific reasoning questions with diversity comparable to manual benchmarks. Experiments reveal that even the strongest VLMs achieve an average accuracy of only 51.5% on these tasks.

TimeSliver: Symbolic-Linear Decomposition for Explainable Time Series Classification

The authors propose TimeSliver, an interpretability-driven deep learning framework that jointly utilizes raw time series data and symbolic abstractions (binning) to construct representations aligned with the original temporal structure. Each element linearly encodes the contribution of the corresponding time interval to the final prediction, enabling the derivation of positive/negative attribution scores for each time point. The method exceeds other approaches by 11% in temporal attribution accuracy across 7 datasets while matching SOTA prediction performance on 26 UEA benchmarks.

Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning

The iMOOE framework is proposed, which explicitly defines the two-level physical invariance principle of "operator invariance + compositional invariance" in PDE systems. By designing an aligned Mixture of Operator Experts (MOOE) network and a frequency-enhanced risk equality objective, the method achieves SOTA zero-shot PDE dynamics forecasting under various OOD scenarios without requiring any test-time adaptation.

Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction

MindTS advances time series anomaly detection from unimodal numerical data to "Time Series + Text" multimodality. It aligns endogenous text (statistical descriptions generated from the sequence itself) and exogenous text (external background knowledge) with time series representations via cross-view fusion. A content condenser based on the Information Bottleneck (IB) filters redundant text and utilizes the condensed information to reconstruct masked time series. The method outperforms 17 unimodal and multimodal baselines across 6 real-world datasets.

Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework

ChannelTokenFormer (CTF) is proposed as a unified Transformer framework that simultaneously addresses three major challenges in real-world multivariate time series forecasting: (1) complex cross-channel dependencies, handled via inter-channel cross-attention with channel tokens; (2) asynchronous sampling, handled via frequency-domain dynamic patching that maintains the original resolution; and (3) block-wise missingness during testing, handled by patch masking during training and direct removal of missing patches during inference. The framework achieves SOTA performance across six datasets, including ETT, SolarWind, Weather, EPA, and CHS.

TrajFlow: Nationwide Pseudo GPS Trajectory Generation with Flow Matching Models

TrajFlow introduces Flow Matching to GPS trajectory generation for the first time. Combined with a "per-trajectory normalization + RDP compression + OD condition normalization" strategy, it stably generates pseudo GPS trajectories across city, metropolitan, and national scales using approximately 10 ODE integration steps. It outperforms diffusion and other deep generative baselines on a dataset covering millions of real trajectories in Japan, showing significant advantages at the national scale.

TRIDENT: Cross-Domain Trajectory Spatio-Temporal Representation via Distance-Preserving Triplet Learning

TRIDENT utilizes a unified architecture (GCN spatial embedding + Date2Vec temporal embedding + Bi-directional Cross-Attention Encoder + non-linear tanh projection pooling) to simultaneously model continuous GPS trajectories and discrete badminton hit-point trajectories. It introduces a "Distance-Preserving Multi-kernel Triplet Loss" to align distances in the embedding space with the original trajectory space, consistently outperforming strong baselines in retrieval accuracy, training efficiency, and cross-domain generalization.

TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time Series

This paper proposes TSPulse, an ultra-lightweight time series pre-trained model with only 1M parameters. Through dual-space masked reconstruction and dual-embedding disentanglement strategies, it outperforms models 10-100 times larger in four major tasks: classification (+5-16%), anomaly detection (+20%), imputation (+50%), and similarity retrieval (+25%).

Tuning the burn-in phase in training recurrent neural networks improves their performance

This work provides a theoretical proof of the critical impact of burn-in phase length \(m\) on Truncated Backpropagation Through Time (TBPTT) performance in RNN training. It establishes upper bound estimates for training regret and validates through system identification and time series prediction experiments that proper tuning of the burn-in phase can reduce prediction error by over 60%.

Understanding the Implicit Biases of Design Choices for Time Series Foundation Models

This paper does not propose a new model or chase SOTA. Instead, it systematically maps three common design knobs of Time Series Foundation Models (TSFM)—patch size, embedding method (discrete quantization vs. continuous), and training loss (CE vs. L1/L2)—to three types of "implicit biases" (temporal, geometric, and regression-to-the-mean). Through theory and controlled experiments, it illustrates how each knob shapes the model's preference for frequency/periodicity, geometric structure, and predictive form under uncertainty, showing how these biases intertwine in scenarios like outlier handling.

Understanding Transformers in Time Series Forecasting: A Case Study on MOIRAI

This paper theoretically answers "why Transformers (especially MOIRAI) are so powerful in time series forecasting" by proving that a Transformer can fit an autoregressive (AR) model on input sequences via gradient descent through in-context learning. It further demonstrates how MOIRAI's any-variate encoding and attention mechanism automatically parallelize AR regressions of arbitrary numbers of covariates into a single set of weights, providing a pre-training generalization bound of \(O(1/\sqrt{nT})\) under Dobrushin conditions.
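
The computation that the Transformer is shown to emulate in context, fitting an AR model by gradient descent on a least-squares loss, can be sketched directly. The code shows only that target computation, not the Transformer construction; the function name, learning rate, and step count are assumptions, and the default learning rate presumes a series of roughly unit scale.

```python
import numpy as np

def fit_ar_gd(x, p, lr=0.1, steps=2000):
    """Fit AR(p) coefficients by gradient descent on mean squared error.

    Returns coef with coef[k] multiplying x[t - 1 - k].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column i holds x[t - p + i] for each target x[t], t = p .. n - 1.
    X = np.stack([x[i : n - p + i] for i in range(p)], axis=1)
    y = x[p:]
    w = np.zeros(p)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w[::-1].copy()                     # reorder so lag 1 comes first

rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(2, len(x)):
    x[t] = 0.5 * x[t - 1] + 0.2 * x[t - 2] + rng.normal()
coef = fit_ar_gd(x, p=2)
```

The iterates converge to the ordinary least-squares solution, which in turn approaches the true coefficients as the series grows longer.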

Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility

This paper analyzes Time Series Transformers from the perspective of "numerical rank." It proves that patch embeddings of time series naturally fall into extremely low-rank subspaces, allowing \(Q/K/V\) attention matrices to be approximated by low-rank counterparts. It proposes the "flow-of-ranks" to explain why rank grows with depth and why shallow layers are most compressible. Based on these insights, the time series foundation model Chronos is compressed to achieve a 65% reduction in inference time and 81% in VRAM with no loss in accuracy.
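
A toy case shows why patches of structured signals occupy a low-dimensional subspace. For a pure sinusoid, every patch is a linear combination of one sine and one cosine vector, so the patch matrix has numerical rank 2 regardless of patch length. This illustrates the intuition only, not the paper's general result; the helper names are hypothetical.

```python
import numpy as np

def numerical_rank(A, tol=1e-8):
    """Number of singular values above tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def patchify(x, patch_len):
    """Non-overlapping patches: shape (n_patches, patch_len)."""
    n = len(x) // patch_len
    return np.asarray(x[: n * patch_len]).reshape(n, patch_len)

t = np.arange(64 * 16)
x = np.sin(2 * np.pi * t / 50.0)
patches = patchify(x, 16)                 # 64 patches of length 16
rank_signal = numerical_rank(patches)
rank_noise = numerical_rank(np.random.default_rng(0).normal(size=(64, 16)))
```

The sinusoid patches have rank 2 while white-noise patches are full rank, which is the contrast that makes low-rank approximation of the downstream attention matrices plausible for smooth or periodic inputs.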

Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning

Starting from neuroscience first principles, Uni-NTFM designs Heterogeneous Feature Projection (HFPM) to decouple time-frequency encoding, Hierarchical Topological Embedding (TE) to unify heterogeneous electrode configurations, and MoE Transformer to achieve functional modularity and sparse coding. Pretrained on 28,000 hours of EEG data with 1.9B parameters, it achieves SOTA in linear probing and fine-tuning across 9 downstream tasks.

UniCA: Unified Covariate Adaptation for Time Series Foundation Model

UniCA maps heterogeneous covariates such as categories, images, and text into a unified "implicit time series" representation. It then integrates these via pre-fusion and post-fusion attention modules into a frozen time series foundation model, improving covariate-aware forecasting performance without compromising pre-trained generalization capabilities.

Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting

This paper proposes VoT, a multi-modal time series forecasting method that exploits textual information through event-driven reasoning (using LLMs for structured reasoning over exogenous text to obtain numerical predictions) and multi-level alignment (representation-level endogenous text alignment plus prediction-level adaptive frequency fusion). It outperforms existing methods across 10 real-world domains.

Weight-Space Linear Recurrent Neural Networks

This paper proposes WARP (Weight-space Adaptive Recurrent Prediction), which explicitly parameterizes the hidden state of a linear RNN as the weights and biases of an auxiliary MLP. It uses input differences to drive a linear recurrence for weight updates, combined with non-linear decoding to achieve efficient sequence modeling, reaching SOTA on tasks including classification, prediction, and dynamical system reconstruction.
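
A minimal sketch of the mechanism described above, under stated assumptions: the hidden state `theta` is a flat vector holding the weights and biases of a small MLP, it is updated by a linear recurrence driven by the input difference, and the output is produced by running that MLP. The transition matrices `A` and `B`, the layer sizes, and the choice to feed the current input into the decoding MLP are hypothetical choices made for illustration, not details confirmed by the summary.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 3, 8, 2

# Shapes of the auxiliary MLP's parameters; the hidden state is their flattening.
shapes = [(d_hidden, d_in), (d_hidden,), (d_out, d_hidden), (d_out,)]
n_theta = sum(int(np.prod(s)) for s in shapes)

def unpack(theta):
    """Split the flat hidden state into MLP weights and biases."""
    params, i = [], 0
    for s in shapes:
        size = int(np.prod(s))
        params.append(theta[i:i + size].reshape(s))
        i += size
    return params

def decode(theta, z):
    """Non-linear decoding: evaluate the MLP defined by the hidden state."""
    W1, b1, W2, b2 = unpack(theta)
    return W2 @ np.tanh(W1 @ z + b1) + b2

# Hypothetical linear recurrence parameters (would be learned in practice).
A = np.eye(n_theta)
B = 0.1 * rng.standard_normal((n_theta, d_in))
theta0 = 0.1 * rng.standard_normal(n_theta)

xs = rng.standard_normal((20, d_in))
theta, prev, outputs = theta0.copy(), xs[0], []
for x in xs:
    theta = A @ theta + B @ (x - prev)   # linear update driven by the input difference
    outputs.append(decode(theta, x))     # assumption: the MLP reads the current input
    prev = x
outputs = np.stack(outputs)
```

With `A` set to the identity, the updates telescope, so the final state equals `theta0 + B @ (xs[-1] - xs[0])`. That makes the recurrence easy to sanity-check before swapping in a learned `A`.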

When Foundation Models Are One-Liners: Limitations and Future Directions for Time Series Anomaly Detection

This paper systematically evaluates five Time Series Foundation Models (TSFMs), namely MOMENT, Chronos, TimesFM, Time-MoE, and TSPulse, on Time Series Anomaly Detection (TSAD). It finds that their zero-shot performance does not differ significantly from simple "one-liner" baselines such as "moving window variance" or "squared difference." The root cause is that the core assumption, "anomalies are harder to reconstruct/predict," does not hold. Based on this, the paper proposes three remedial directions to make TSFMs truly effective.
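
To make the "one-liner" baselines concrete, the sketch below gives plausible NumPy versions of the two scores named in the summary. The function names, the window length, and the synthetic test series are assumptions for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def moving_window_variance(x, window=5):
    """Anomaly score: variance over each trailing window (NaN until the window fills)."""
    x = np.asarray(x, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    scores = np.full(x.shape, np.nan)
    scores[window - 1:] = windows.var(axis=1)
    return scores

def squared_difference(x):
    """Anomaly score: squared first difference (0 for the first point)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([[0.0], np.diff(x) ** 2])

# Synthetic sine wave with a single injected spike at index 120.
series = np.sin(np.linspace(0, 6 * np.pi, 200))
series[120] += 5.0

sq_scores = squared_difference(series)
mv_scores = moving_window_variance(series)
spike_by_sq = int(np.argmax(sq_scores))
spike_by_mv = int(np.nanargmax(mv_scores))
```

Both scores peak at or immediately after the injected spike. A point anomaly of this kind is the easy case for such baselines, which is part of why they are hard to beat on it.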

Zero-shot Forecasting by Simulation Alone

This paper proposes SarSim0—a fast time-series simulator based entirely on stable SARIMA processes. It is used to generate approximately 1 billion purely synthetic sequences online to pre-train general forecasting backbones. This enables small models to match or even exceed the forecasting accuracy of large foundation models (Chronos, MOIRAI, TimesFM) trained on real data under a strict zero-shot protocol. Furthermore, a "student surpasses teacher" phenomenon (neural networks exceeding the AutoARIMA that generated their training data) is observed on GiftEval.