Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval

Conference: ICLR 2026 arXiv: 2602.10847 Code: https://github.com/macovaseas/GTR Area: Video Understanding / Time Series Forecasting Keywords: Time Series Forecasting, Global Periodicity, Retrieval Augmentation, Plug-and-Play Module, 2D Convolution

TL;DR

This paper proposes the Global Temporal Retriever (GTR), a lightweight plug-and-play module that maintains adaptive global period embeddings and uses absolute time indices to retrieve temporally aligned global periodic information. This allows arbitrary forecasting models to transcend the look-back window constraint and capture global periodic patterns whose length far exceeds the input window.

Background & Motivation

Importance of global periodicity: Real-world time series frequently exhibit multi-scale periodic patterns (diurnal, weekly, monthly, seasonal). Global periodic signals often carry stronger predictive information than local adjacent patterns. For instance, on the Electricity dataset, the Pearson correlation between distant global periodic segments (0.96) exceeds that of adjacent local periodic segments (0.94, 0.88).

Look-back window limitation: Existing methods (decomposition, frequency-domain, reshaping, etc.) all operate within a fixed look-back window. When the global period length greatly exceeds the input length, models are effectively blind to global patterns.

Infeasibility of brute-force window expansion: Naively enlarging the input length leads to overfitting to noise, surging computational and memory costs, and difficulty extracting useful signals from redundant information.

Limitations of prior work: Seasonal-trend decomposition methods are constrained by in-window decomposition accuracy; frequency-domain methods work well under stationary periodicity assumptions but struggle with long-period non-stationary phenomena; retrieval-augmented methods depend on the quality of similarity search and do not provide compact temporally aligned representations.

Method

Overall Architecture

GTR operates in two stages: (1) the GTR module dynamically retrieves periodic information from global temporal embeddings and fuses it with the input via 2D convolution; (2) the enriched representation is fed into a backbone model (MLP) for final prediction. The core innovation lies in maintaining a learnable global parameter matrix that covers the entire period.

Key Designs

1. Global Temporal Embedding

  • Function: Introduces a learnable parameter matrix \(\mathbf{Q} \in \mathbb{R}^{L \times N}\) (where \(L\) is the global period length and \(N\) is the number of variates), initialized to zero and automatically trained to encode the global periodic patterns of each variate.
  • Mechanism: Absolute time indices are used to precisely locate the input sequence's position within the global period, from which the corresponding global temporal reference is retrieved.
  • Design Motivation: By directly encoding the temporal structure of the entire period, the model can access global periodic information without expanding the input window.
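A minimal NumPy sketch of the idea (not the authors' implementation): in the paper \(\mathbf{Q}\) is a zero-initialized learnable parameter trained end-to-end; here we mimic what such a matrix would converge toward by filling each of its \(L\) rows with the per-slot average of a synthetic exactly periodic series. The period length and variate count are illustrative assumptions.

```python
import numpy as np

# Hypothetical sizes: a daily global period (L = 24) over N = 1 variate.
# In the paper Q is a learnable parameter trained jointly with the backbone;
# this mock "training" just averages history slot-by-slot.
L, N = 24, 1
t = np.arange(10 * L)                            # 10 full periods of history
series = np.sin(2 * np.pi * t / L).reshape(-1, 1)

Q = np.zeros((L, N))                             # zero-initialized, as in the paper
for j in range(L):                               # slot j gathers all times with t mod L == j
    Q[j] = series[j::L].mean(0)
# Each row Q[j] now encodes the global periodic value at position j of the
# period, accessible regardless of how short the look-back window is.
```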

2. Cycle Information Alignment

  • Function: Computes the cycle index vector \(\mathbf{i} = [(t_0 \bmod L) + \tau] \bmod L\), where \(\tau\) ranges over the \(T\) positions of the input window, retrieves the corresponding segment from the global embedding, and enhances it via a linear transformation.
  • Mechanism: The absolute start time \(t_0\) of the input sequence is used to determine its position within the global period; the raw input and the retrieved global reference are then stacked into a \(2 \times T\) 2D representation.
  • Design Motivation: Establishes precise positional alignment between the input sequence and the global periodic structure, enabling the model to perceive "where in the global cycle the current moment lies."
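The index arithmetic above can be written out directly (NumPy sketch; the period length, window length, start time, and zero-filled embedding are illustrative assumptions). Note how the modulo wraps the window across the period boundary:

```python
import numpy as np

L = 168                       # assumed global period length (weekly, hourly data)
T = 24                        # look-back window length
t0 = 500                      # absolute start time of the input window
tau = np.arange(T)            # positions 0..T-1 within the window

# cycle index vector: i = [(t0 mod L) + tau] mod L
idx = ((t0 % L) + tau) % L    # 500 mod 168 = 164, so indices wrap: 164..167, 0, 1, ...

Q = np.zeros((L, 3))          # toy global embedding (3 variates)
ref = Q[idx]                  # temporally aligned global reference, shape (T, 3)
```

The retrieved `ref` is what gets stacked with the raw input into the \(2 \times T\) representation described above.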

3. Temporal Pattern Extraction (2D Convolution Fusion)

  • Function: Applies a 2D convolution \(\mathcal{C}(\mathbf{F}_n; \kappa=(2, 1+2\lfloor P/2 \rfloor))\) to the stacked local–global 2D representation \(\mathbf{F}_n \in \mathbb{R}^{2 \times T}\), where \(P\) is the dominant high-frequency period length.
  • Mechanism: The convolution kernel has height 2 (spanning both local and global scales) and a width determined by the dominant period \(P\), simultaneously capturing local–global interactions and intra-period patterns.
  • Design Motivation: 2D convolution naturally models interactions between two temporal scales; the residual connection \(\mathbf{z}_n = \mathbf{x}_n + \text{Dropout}(\mathbf{h}_n)\) preserves the original information.
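The fusion step can be sketched by hand in NumPy for a single variate (a toy stand-in, not the paper's trained layer): a height-2 kernel of width \(1 + 2\lfloor P/2 \rfloor\) slides over the stacked local-global representation with "same" zero padding along time, and a residual connection adds the result back. The period \(P\), the identity-like kernel, and the optional dropout mask are assumptions for illustration.

```python
import numpy as np

def fuse_local_global(x, g, kernel, p_drop=0.0, rng=None):
    """Fuse local input x (T,) with global reference g (T,) via a
    height-2 'same'-padded 2D convolution plus a residual connection."""
    T = len(x)
    F = np.stack([x, g])                 # 2 x T local-global representation
    kh, kw = kernel.shape                # kh == 2 spans both temporal scales
    pad = kw // 2                        # zero-pad along time for 'same' output
    Fp = np.pad(F, ((0, 0), (pad, pad)))
    h = np.array([np.sum(Fp[:, t:t + kw] * kernel) for t in range(T)])
    if rng is not None and p_drop > 0:   # optional dropout on the conv output
        h *= rng.random(T) >= p_drop
    return x + h                         # residual: z = x + Dropout(h)

P = 4                                    # assumed dominant period length
kernel = np.zeros((2, 1 + 2 * (P // 2)))
kernel[0, P // 2] = 1.0                  # toy kernel: pass the local row through
x = np.arange(8, dtype=float)
z = fuse_local_global(x, np.zeros(8), kernel)  # with this kernel, z = 2 * x
```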

4. Plug-and-Play Integration

  • Function: GTR preserves the input dimensionality, and its output can be directly fed into any backbone model, supporting end-to-end training.
  • Mechanism: Modular design enables seamless integration with diverse architectures such as iTransformer, PatchTST, and DLinear.
  • Design Motivation: Maximizes generality without requiring any modification to the host model architecture.
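Because GTR preserves the input shape, integration reduces to function composition. A hypothetical wrapper (the `GTRWrapper` name, the identity stand-in GTR, and the toy mean backbone are all illustrative, not from the paper):

```python
import numpy as np

class GTRWrapper:
    """Hypothetical sketch: GTR preserves input dimensionality, so any
    backbone (a callable mapping (T, N) -> (S, N)) can consume its output."""
    def __init__(self, gtr, backbone):
        self.gtr, self.backbone = gtr, backbone

    def forecast(self, x, t0):
        z = self.gtr(x, t0)                 # enriched input, same shape as x
        assert z.shape == x.shape           # dimensionality is preserved
        return self.backbone(z)

identity_gtr = lambda x, t0: x                         # stand-in GTR module
mean_backbone = lambda z: np.tile(z.mean(0), (4, 1))   # toy backbone, S = 4
model = GTRWrapper(identity_gtr, mean_backbone)
y = model.forecast(np.ones((24, 3)), t0=0)             # forecast shape (4, 3)
```

Swapping `mean_backbone` for iTransformer, PatchTST, or DLinear is the whole integration story: the host model is untouched.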

Loss & Training

  • Standard MSE loss with end-to-end training.
  • RevIN (Reversible Instance Normalization) is applied to handle distribution shift.
  • Adam optimizer with learning rate selected from \(\{10^{-3}, 3\times10^{-3}, 5\times10^{-4}\}\).
  • MLP backbone hidden dimension \(D = 512\).
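The normalization and objective can be sketched in a few lines (a simplified RevIN-style round trip without the learnable affine parameters the real RevIN has, plus plain MSE; the toy data is an assumption):

```python
import numpy as np

def revin_norm(x, eps=1e-5):
    """Instance-wise normalization (simplified RevIN, no affine transform):
    normalize each variate of one window and keep stats to de-normalize."""
    mu, sigma = x.mean(0), x.std(0) + eps
    return (x - mu) / sigma, (mu, sigma)

def revin_denorm(y, stats):
    """Invert revin_norm on the model's output to restore the original scale."""
    mu, sigma = stats
    return y * sigma + mu

def mse(pred, target):
    """Standard MSE training objective."""
    return float(np.mean((pred - target) ** 2))

x = np.array([[1.0, 10.0], [3.0, 30.0]])   # one window, two variates
xn, stats = revin_norm(x)                  # per-variate zero mean, unit scale
x_rec = revin_denorm(xn, stats)            # round trip recovers x
```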

Key Experimental Results

Main Results

Long-term forecasting (input \(T=96\), average over \(S \in \{96, 192, 336, 720\}\)):

Model         ETTh1 MSE  ETTm1 MSE  Electricity MSE  Solar MSE  Weather MSE  Top-2 Count
GTR (Ours)    0.439      0.367      0.166            0.194      0.239        10/16
RAFT          0.428      0.381      0.175            0.301      0.270        3/16
TQNet         0.441      0.377      0.164            0.198      0.242        7/16
CycleNet      0.457      0.379      0.168            0.210      0.243        3/16

Short-term forecasting (PEMS datasets, input \(T=96\), average over \(S \in \{12, 24, 48, 96\}\)):

Model         PEMS03 MSE  PEMS04 MSE  PEMS07 MSE  PEMS08 MSE  Top-2 Count
GTR (Ours)    0.087       0.087       0.076       0.142       8/8
TQNet         0.097       0.091       0.075       0.142       7/8
iTransformer  0.113       0.111       0.101       0.150       0/8

Ablation Study

Backbone + GTR      Electricity MSE Improvement  Weather MSE Improvement
iTransformer + GTR  Significant                  Significant
PatchTST + GTR      Significant                  Significant
DLinear + GTR       Significant                  Significant

GTR consistently improves performance across different backbones as a plug-and-play module, validating its generality.

Key Findings

  1. Global periodic modeling is critical: on the Solar-Energy dataset, which exhibits strong long-term periodic patterns, GTR reduces MSE by 8.2% relative to CycleNet.
  2. More pronounced advantage in short-term forecasting: GTR achieves Top-2 on all 8 PEMS tasks, reducing MSE by an average of 18.7% compared to iTransformer.
  3. Cross-model generality: GTR consistently improves diverse architectures including iTransformer, PatchTST, and DLinear.
  4. Limitation on Traffic data: Due to strong spatio-temporal dependencies and delay effects, GTR underperforms models that explicitly model inter-variate relationships (e.g., S-Mamba, SOFTS) on the Traffic dataset.

Highlights & Insights

  • Core insight: The predictive signal from global periodic patterns is stronger than that from local adjacent patterns (quantitatively validated via Pearson correlation matrices), yet is obscured by fixed-window constraints.
  • Simple yet effective design: Only a single learnable matrix, absolute time indices, and 2D convolution are required, with minimal parameter and computational overhead.
  • Plug-and-play nature endows GTR with broad practical value, enabling direct performance improvement in existing forecasting systems.
  • Complexity analysis is clear: total complexity is \(O(NT^2 + Nd^2 + NTd + NSd)\), linear in both the number of variates \(N\) and the forecasting horizon \(S\).

Limitations & Future Work

  1. Global period length \(L\) must be specified in advance: For time series with unknown or varying period lengths, an automatic period detection mechanism is needed.
  2. Suboptimal performance on data with strong spatial dependencies: Results on the Traffic dataset indicate that GTR does not adequately model inter-variate relationships.
  3. Static global embeddings: Once training is complete, the global periodic patterns are fixed; adapting to concept drift or period shifts requires retraining.
  4. Backbone model selection: Although cross-model generality is validated, the choice of backbone still influences final performance.
  5. Analysis of non-periodic or weakly periodic time series is lacking.

Comparison with Related Work

  • CycleNet: Explicitly learns cyclic periodic structures but is constrained by the observation window — GTR overcomes this limitation via global embeddings.
  • TimesNet: Transforms 1D sequences into 2D tensors to model intra-period and inter-period variations — GTR's 2D convolution design is conceptually similar but directly models local–global interactions.
  • Retrieval-augmented forecasting (RAFT, etc.): Expands context by retrieving historically similar segments — GTR replaces explicit retrieval with compact global embeddings, achieving greater efficiency and temporal alignment.
  • The approach is generalizable to video understanding: recognition of periodic actions in video faces an analogous challenge of "local windows failing to capture global periodicity."

Rating

  • Novelty: ⭐⭐⭐⭐ The combination of global period embeddings and absolute time indices is concise and novel, though the overall concept is relatively intuitive.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Six datasets, both long- and short-term forecasting, cross-model ablations, and complexity analysis are thoroughly covered.
  • Writing Quality: ⭐⭐⭐⭐ Motivation visualization (Pearson correlation matrix) is intuitive and compelling; method description is clear.
  • Value: ⭐⭐⭐⭐ The plug-and-play design offers high practical value, though the contribution is primarily at the engineering level.