# CometNet: Contextual Motif-guided Long-term Time Series Forecasting
- Conference: AAAI 2026
- arXiv: 2511.08049
- Code: None
- Area: Time Series Forecasting
- Keywords: Long-term time series forecasting, contextual motif, mixture of experts, receptive field bottleneck, frequency-domain analysis
## TL;DR
This paper proposes CometNet, which extracts recurring "contextual motifs" from the full historical sequence to construct a motif library, and employs a motif-guided MoE architecture that dynamically associates the current window with relevant motifs for prediction. This breaks the receptive field bottleneck imposed by limited look-back windows and yields significant improvements over state-of-the-art methods such as TimeMixer++ and iTransformer on 8 datasets.
## Background & Motivation
- Background: Long-term time series forecasting (LTSF) is a core task in data science. Mainstream approaches include Transformer-based models (PatchTST, iTransformer) and MLP-based models (DLinear, TimeMixer++), all of which operate within a fixed look-back window.
- Limitations of Prior Work: A receptive field bottleneck. Models can only learn from a window of length \(L\) and cannot capture long-range dependencies beyond it; gradient backpropagation is confined to a single window, even when sliding windows traverse the entire sequence during training. Simply enlarging the window not only incurs \(O(L^2)\) computational complexity for attention-based models but also buries meaningful temporal dependencies in historical noise.
- Key Challenge: Long-range context is necessary for long-term forecasting, yet directly enlarging the window is costly and yields diminishing returns.
- Goal: Provide the model with long-range contextual information beyond the look-back window without increasing the window size.
- Key Insight: Real-world time series are governed by periodic "contextual motifs", such as factory production cycles or seasonal climate patterns, that recur across thousands of time steps. Extracting these motifs and using them to guide prediction is a natural and principled approach.
- Core Idea: Mine recurring contextual motifs from the full history to build a motif library; at inference time, dynamically match the current window to the most relevant motifs via MoE routing to inject long-range context.
## Method

### Overall Architecture
A two-stage paradigm (a toy end-to-end sketch follows):

1. Contextual Motif Extraction (offline): analyze the entire historical sequence and construct a dominant motif library \(\mathcal{M} = \{m_1, \dots, m_K\}\).
2. Motif-guided Forecasting (online): given a look-back window, route to the expert networks associated with the most relevant motifs via MoE routing to produce predictions.
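As a warm-up before the component details, the sketch below mirrors the offline/online split with deliberately naive stand-ins: evenly spaced history slices instead of mined motifs, correlation-based soft routing instead of the learned gate, and prototype-tiling "experts". All names and mechanics here are illustrative, not the paper's implementation.

```python
import numpy as np

class CometNetSketch:
    """Toy two-stage skeleton; every component is a stand-in, not the paper's method."""

    def __init__(self, num_motifs: int = 4, lookback: int = 96, horizon: int = 96):
        self.K, self.L, self.H = num_motifs, lookback, horizon
        self.library = None                       # (K, L) motif prototypes, built offline

    def extract_motifs(self, history: np.ndarray) -> None:
        """Stage 1 (offline): stand-in for cascaded extraction; evenly spaced
        subsequences of the full history serve as placeholder prototypes."""
        starts = np.linspace(0, len(history) - self.L, self.K).astype(int)
        self.library = np.stack([history[s:s + self.L] for s in starts])

    def forecast(self, window: np.ndarray) -> np.ndarray:
        """Stage 2 (online): soft-route the window over motifs (correlation stands in
        for the learned gate), then blend per-motif placeholder forecasts."""
        sims = np.array([np.corrcoef(window, m)[0, 1] for m in self.library])
        probs = np.exp(sims) / np.exp(sims).sum()               # softmax routing weights
        per_expert = np.stack([np.resize(m, self.H) for m in self.library])  # tile motifs
        return probs @ per_expert                               # (H,) weighted blend

rng = np.random.default_rng(0)
history = np.sin(np.arange(4000) * 2 * np.pi / 24) + 0.1 * rng.standard_normal(4000)
model = CometNetSketch()
model.extract_motifs(history)                                   # offline stage
print(model.forecast(history[-96:]).shape)                      # online stage -> (96,)
```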
### Key Designs
- Cascaded Motif Extraction:
    - Function: Automatically discover multi-scale dominant contextual motifs from historical sequences.
    - Mechanism: A three-step cascade (a compressed sketch follows this block): (a) Multi-scale candidate discovery: FFT extracts dominant frequencies, the top-\(N_s\) periods define the scales, and at each scale downsampled subsequences are clustered around randomly sampled anchor subsequences, with a Pearson correlation matrix providing density scores, to yield candidate motifs. (b) Cross-scale redundancy removal: candidate motifs form a DTW similarity graph, and the most representative motif within each connected component is kept as the prototype. (c) Benefit-driven selection: each surviving candidate is scored by \(B(c \mid \mathcal{S}) = Q(c) \cdot \mathrm{Cov}(c \mid \mathcal{S}) \cdot \mathrm{Div}(c \mid \mathcal{S})\) (quality, coverage, diversity), and the top-\(K\) candidates are iteratively selected to form the final library.
    - Design Motivation: Direct multi-scale search produces an exponentially large candidate space with severe cross-scale redundancy. The cascaded strategy (discover, deduplicate, then refine) balances comprehensiveness with efficiency.
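A compressed numpy sketch of the cascade, under loud simplifications: correlation-threshold deduplication stands in for the DTW similarity graph, and correlation-density ordering stands in for the full benefit score \(B(c \mid \mathcal{S})\). All function names and thresholds are illustrative.

```python
import numpy as np

def dominant_periods(x, n_scales=3, min_period=4):
    """Step (a): pick the top-N_s periods from the FFT amplitude spectrum."""
    amps = np.abs(np.fft.rfft(x - x.mean()))
    periods = []
    for f in np.argsort(amps)[::-1]:                 # frequency bins, strongest first
        if f == 0:
            continue                                 # skip the DC component
        p = len(x) // f
        # a motif should fit several times into the history
        if min_period <= p <= len(x) // 4 and p not in periods:
            periods.append(p)
        if len(periods) == n_scales:
            break
    return periods

def candidate_motifs(x, period, n_anchors=32, n_keep=5, rng=None):
    """Step (a) cont.: score randomly anchored subsequences by Pearson correlation
    density and keep the densest ones as candidate motifs at this scale."""
    rng = rng or np.random.default_rng(0)
    starts = rng.integers(0, len(x) - period, n_anchors)
    subs = np.stack([x[s:s + period] for s in starts])
    corr = np.corrcoef(subs)                         # (n_anchors, n_anchors) matrix
    density = (corr > 0.8).sum(axis=1)               # near-duplicates per anchor
    return [subs[i] for i in np.argsort(density)[::-1][:n_keep]]

def greedy_select(cands, K=3, corr_thr=0.9):
    """Steps (b)+(c), heavily simplified: drop near-duplicates of already-kept motifs
    (standing in for the DTW graph), then keep the first K survivors; candidates
    arrive ordered by density, a crude proxy for B(c|S) = Q * Cov * Div."""
    library = []
    for c in cands:
        same_scale = [m for m in library if len(m) == len(c)]
        if any(np.corrcoef(c, m)[0, 1] > corr_thr for m in same_scale):
            continue
        library.append(c)
        if len(library) == K:
            break
    return library

rng = np.random.default_rng(1)
series = np.sin(np.arange(2048) * 2 * np.pi / 64) + 0.2 * rng.standard_normal(2048)
cands = [c for p in dominant_periods(series) for c in candidate_motifs(series, p, rng=rng)]
print([len(m) for m in greedy_select(cands)])        # lengths of the selected motifs
```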
- Motif-driven Gating Network:
    - Function: Dynamically associate the current window with the most relevant motif in the library.
    - Mechanism: The window embedding \(e_t = \text{LN}(\text{MLP}(X_{t-L+1:t}))\) feeds two heads (a PyTorch sketch follows this block): a routing head that produces a \(K\)-dimensional softmax distribution \(p_t\) over motifs/experts, and a position head that produces \(s_t \in [0,1]\), the relative position of the current window within the motif's life cycle.
    - Design Motivation: It is necessary not only to identify which motif matches but also at which stage within the motif the window sits; this positional information provides fine-grained temporal context.
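A minimal PyTorch sketch of the dual-head gate, assuming channel-independent univariate windows of length \(L\); the layer widths and names are my own placeholders, not the paper's.

```python
import torch
import torch.nn as nn

class MotifGate(nn.Module):
    """Dual-head gating sketch (illustrative): one shared window embedding feeds a
    routing head (which motif) and a position head (where in the motif's cycle)."""

    def __init__(self, lookback: int, d_model: int = 128, num_motifs: int = 8):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(lookback, d_model), nn.GELU(),
            nn.Linear(d_model, d_model), nn.LayerNorm(d_model),  # e_t = LN(MLP(X))
        )
        self.routing_head = nn.Linear(d_model, num_motifs)       # logits over K motifs
        self.position_head = nn.Linear(d_model, 1)               # relative phase s_t

    def forward(self, window: torch.Tensor):
        e = self.embed(window)                                   # (B, d_model)
        p = torch.softmax(self.routing_head(e), dim=-1)          # p_t: (B, K), sums to 1
        s = torch.sigmoid(self.position_head(e))                 # s_t: (B, 1) in [0, 1]
        return e, p, s

gate = MotifGate(lookback=96)
e, p, s = gate(torch.randn(4, 96))                               # batch of 4 windows
print(p.shape, s.shape)                                          # (4, 8) and (4, 1)
```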
- Context-conditioned Experts:
    - Function: \(K\) experts, one per motif, generate predictions based on motif-specific temporal dynamics.
    - Mechanism: A positional encoding \(e_{pos} = \text{MLP}(s_t)\) is concatenated with the window embedding and fused into a conditioned representation \(z_t\). Each of the \(K\) parallel expert heads outputs \(\hat{X}_k = P_k(z_t)\), and the final prediction is the routing-weighted blend \(\hat{X} = \sum_k p_{t,k} \cdot \hat{X}_k\) (sketched below).
    - Design Motivation: Different motifs represent distinct temporal dynamics (e.g., weekday/weekend cycles, seasonal cycles); specialized experts capture the prediction logic of each pattern more precisely than a single unified model.
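In the same hedged spirit, a sketch of the expert mixture; the fusion layer and dimensions are assumptions, and the dummy tensors stand in for the gate outputs \((e_t, p_t, s_t)\) from the previous sketch.

```python
import torch
import torch.nn as nn

class MotifExperts(nn.Module):
    """Context-conditioned expert heads (illustrative): fuse the phase s_t into the
    window embedding, then blend the K expert forecasts with the routing weights p_t."""

    def __init__(self, d_model: int = 128, num_motifs: int = 8, horizon: int = 96):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(1, d_model), nn.GELU())   # e_pos = MLP(s_t)
        self.fuse = nn.Linear(2 * d_model, d_model)                      # z_t from [e_t; e_pos]
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, horizon) for _ in range(num_motifs)]     # P_1 .. P_K
        )

    def forward(self, e, p, s):
        z = self.fuse(torch.cat([e, self.pos_mlp(s)], dim=-1))           # (B, d_model)
        preds = torch.stack([P(z) for P in self.experts], dim=1)         # (B, K, H)
        return (p.unsqueeze(-1) * preds).sum(dim=1)                      # sum_k p_{t,k} * X_hat_k

# dummy gate outputs standing in for (e_t, p_t, s_t)
B, K, D, H = 4, 8, 128, 96
e, p, s = torch.randn(B, D), torch.softmax(torch.randn(B, K), dim=-1), torch.rand(B, 1)
print(MotifExperts(D, K, H)(e, p, s).shape)                              # torch.Size([4, 96])
```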
### Loss & Training
- Standard MSE loss.
- Channel-independent strategy for multivariate time series (see the sketch below).
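A minimal sketch of what a channel-independent MSE training step could look like: the multivariate batch is folded so that each channel is treated as an independent univariate sample. The `nn.Linear` stand-in replaces the full motif-guided model.

```python
import torch
import torch.nn as nn

def channel_independent_step(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Fold (B, L, C) into (B*C, L) so every channel is forecast as its own
    univariate series, then apply the standard MSE objective."""
    B, L, C = x.shape
    _, H, _ = y.shape
    x_flat = x.permute(0, 2, 1).reshape(B * C, L)        # (B*C, L)
    y_flat = y.permute(0, 2, 1).reshape(B * C, H)        # (B*C, H)
    y_hat = model(x_flat)                                # univariate forecaster
    return nn.functional.mse_loss(y_hat, y_flat)         # standard MSE loss

model = nn.Linear(96, 96)                                # stand-in univariate forecaster
loss = channel_independent_step(model, torch.randn(8, 96, 7), torch.randn(8, 96, 7))
loss.backward()
print(f"{loss.item():.4f}")
```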
## Key Experimental Results

### Main Results
Average MSE on the four ETT datasets (a subset of the 8 benchmarks; look-back 96, averaged over prediction horizons 96/192/336/720):
| Model | ETTh1 Avg | ETTh2 Avg | ETTm1 Avg | ETTm2 Avg |
|---|---|---|---|---|
| TimeMixer++ (2025) | 0.419 | 0.356 | 0.351 | - |
| iTransformer (2024) | 0.454 | 0.383 | 0.360 | - |
| PatchTST (2023) | 0.516 | - | - | - |
| CometNet (Ours) | 0.373 | 0.284 | 0.324 | - |
On ETTh1 with look-back 96 and prediction horizon 720: MSE 0.391 (vs. TimeMixer++ 0.467), a 16.3% improvement.
### Ablation Study
| Configuration | ETTh1 MSE | Note |
|---|---|---|
| w/o Motif (MLP only) | ~0.43 | Baseline without contextual guidance |
| w/o positional encoding | ~0.40 | Loss of within-motif position information |
| w/o cross-scale deduplication | ~0.39 | Redundant motifs degrade library quality |
| Full CometNet | 0.373 | Complete model |
### Key Findings
- The advantage of CometNet grows with longer prediction horizons — the gain is most pronounced at 720 steps, confirming that motif context is critical for long-term forecasting.
- Positional encoding contributes substantially: knowing "at which stage within a motif" is more informative than knowing "which motif" alone.
- Even with a short look-back window (96 steps), CometNet leverages context spanning thousands of steps via the motif library.
## Highlights & Insights
- The design of motif mining as a preprocessing step is elegant: rather than enlarging the training window, long-range context is "injected" into a limited-window model through offline construction of a motif library, adding only the lightweight gating and expert heads to the online cost.
- The dual-head gating (routing + position) adds a positional dimension beyond standard MoE, enabling more precise expert predictions — analogous to "telling the model not only the season but also the day within that season."
- The cascaded extraction pipeline (FFT → clustering → graph-based deduplication → benefit-driven selection) is engineering-intensive but demonstrably effective.
## Limitations & Future Work
- Motif extraction is entirely offline and relies on FFT, which may yield low-quality motifs for non-stationary sequences with abrupt trends.
- The motif library size \(K\) is a hyperparameter that may require tuning across different datasets.
- The channel-independent strategy ignores inter-variable correlations, which may limit performance in high-dimensional multivariate settings.
- The softmax routing at inference time is a soft selection in which all experts participate in computation; efficiency could be further improved with top-1 routing.
## Related Work & Insights
- vs. TimeMixer++: Performs multi-scale mixing within the window but remains constrained by window size. CometNet transcends this limitation via motifs, achieving an average ETTh1 MSE of 0.373 vs. 0.419.
- vs. BSA (2024): BSA enhances long-range modeling via cross-sample spectral attention but still struggles with dependencies spanning thousands of steps. CometNet's motifs directly encode patterns at the thousand-step scale.
- vs. PatchTST: PatchTST's patches remain within the window boundary, whereas CometNet's motifs span across window boundaries.
## Rating
- Novelty: ⭐⭐⭐⭐⭐ — A new paradigm of motif-guided forecasting that fundamentally addresses the receptive field bottleneck.
- Experimental Thoroughness: ⭐⭐⭐⭐ — 8 datasets, multiple prediction horizons, and comprehensive ablation studies.
- Writing Quality: ⭐⭐⭐⭐ — Problem formulation is clear; the motif concept is well illustrated with figures.
- Value: ⭐⭐⭐⭐⭐ — Introduces a new direction for long-term time series forecasting; the motif library + MoE framework exhibits strong generalizability.