Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting

Conference: ICML 2025
arXiv: 2505.18442
Code: https://github.com/ZhiningLiu1998/TimeFuse
Area: Time Series
Keywords: Time Series Forecasting, Model Fusion, Meta-Learning, Ensemble Methods, Adaptive Weights

TL;DR

Proposes TimeFuse, a sample-level adaptive model fusion framework. It characterizes each input time series with meta-features and trains a learnable fuser to predict the model combination weights, yielding near-universal improvements across multiple forecasting benchmarks (outperforming the best single model on up to 95.1% of samples).

Background & Motivation

Background: Time series forecasting models continue to advance (Transformers, Mamba, MLPs, etc.), competing closely on benchmark datasets.

Limitations of Prior Work: Fine-grained sample-level analysis reveals a neglected fact: no single model is consistently optimal across all samples. Even the top-ranked model ranks first on only about 23.2% of test samples, and each model has its own areas of strength.

Key Challenge: The single-model paradigm wastes the complementary strengths of different models.

Goal: Adaptively leverage the unique advantages of different models on different samples.

Key Insight: Characterize the properties of each input time series using meta-features, and train a fuser to predict the optimal model combination weights.

Core Idea: Shift from "selecting the best single model" to "finding the optimal model combination for each sample".

Method

Overall Architecture

  1. Build a model zoo: Independently train \(k\) forecasting models.
  2. Meta-feature extraction: Compute statistical, temporal, and spectral features for each input.
  3. Fuser training: Learn the mapping from meta-features to model combination weights.
  4. Inference: Extract meta-features \(\rightarrow\) predict weights \(\rightarrow\) perform weighted combination of predictions from each model.
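
The inference step above can be written compactly. This is a sketch in assumed notation rather than the paper's own: \(x\) is the input window, \(\phi(x)\) its meta-feature vector, \(g_\theta\) the fuser, and \(f_i\) the \(i\)-th base model. Only \(k\) and \(w_i\) come from this note.

\[
\mathbf{w}(x) = \operatorname{softmax}\big(g_\theta(\phi(x))\big), \qquad
\hat{y}(x) = \sum_{i=1}^{k} w_i(x)\, f_i(x), \qquad
w_i(x) > 0,\quad \sum_{i=1}^{k} w_i(x) = 1 .
\]

Because the weights are positive and sum to one, each forecast step of \(\hat{y}(x)\) is a convex combination of the base forecasts and therefore lies between the smallest and largest base prediction at that step.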

Key Designs

  1. Multi-dimensional Meta-feature Extraction:

    • Function: Compute comprehensive feature descriptions for each input time series.
    • Mechanism: Three categories of features—statistical features (skewness, kurtosis), temporal features (stationarity, rate of change), and spectral features (dominant frequency, spectral entropy).
    • Design Motivation: These features capture the types of time series on which different models excel; for example, TimeMixer does well on series with high spectral complexity, while Non-stationary Transformer does well on series with low stationarity.
  2. Learnable Fuser:

    • Function: Predict the combination weights \(w_1, \dots, w_k\) of the \(k\) models from meta-features.
    • Mechanism: An MLP network that takes meta-features as input and outputs softmax-normalized weights.
    • Design Motivation: End-to-end learning enables the fuser to automatically discover associations between meta-features and model strengths.
  3. Cross-dataset Joint Training:

    • Function: Jointly train the fuser on samples from multiple datasets.
    • Mechanism: Meta-features are dataset-agnostic descriptions, enabling the fuser to generalize to unseen datasets.
    • Design Motivation: Increasing training diversity improves zero-shot generalization capabilities.
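
The first two designs above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the six features, the half-window mean shift used as a stationarity proxy, the function names (`meta_features`, `fuser_weights`, `fuse`), and the one-hidden-layer MLP with random, untrained parameters are all assumptions chosen for brevity.

```python
import numpy as np

def meta_features(x):
    """Six illustrative meta-features of a 1-D series (hypothetical feature set)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std() + 1e-8
    z = (x - mu) / sigma
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4) - 3.0                  # excess kurtosis
    # Crude stationarity proxy (assumption): gap between half-window means.
    half = len(x) // 2
    mean_shift = abs(x[:half].mean() - x[half:].mean()) / sigma
    rate_of_change = np.mean(np.abs(np.diff(x))) / sigma
    # Spectral features from the power spectrum with the DC bin dropped.
    power = (np.abs(np.fft.rfft(x - mu)) ** 2)[1:]
    p = power / (power.sum() + 1e-12)
    dominant_freq = (np.argmax(power) + 1) / len(x)   # cycles per time step
    spectral_entropy = -np.sum(p * np.log(p + 1e-12)) / np.log(len(p))
    return np.array([skewness, kurtosis, mean_shift,
                     rate_of_change, dominant_freq, spectral_entropy])

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)             # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuser_weights(feats, params):
    """One-hidden-layer MLP followed by softmax (architecture is an assumption)."""
    W1, b1, W2, b2 = params
    hidden = np.maximum(feats @ W1 + b1, 0.0)         # ReLU
    return softmax(hidden @ W2 + b2)

def fuse(weights, base_preds):
    """Weighted combination; base_preds has shape (k, horizon)."""
    return weights @ base_preds

# Demo with random (untrained) fuser parameters and stand-in base forecasts.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * np.arange(64) / 16) + 0.1 * rng.normal(size=64)
feats = meta_features(x)
k, hidden_dim, horizon = 3, 8, 4
params = (rng.normal(size=(feats.size, hidden_dim)), np.zeros(hidden_dim),
          rng.normal(size=(hidden_dim, k)), np.zeros(k))
w = fuser_weights(feats, params)
base_preds = rng.normal(size=(k, horizon))            # placeholder for k model outputs
y_hat = fuse(w, base_preds)
```

Since every feature is computed from the input window alone, the same extractor can be applied to any dataset, which is what makes the cross-dataset training in the third design possible.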

Loss & Training

  • Minimize the MSE loss of the fused predictions.
  • Decouple fuser training from base model training.
  • Support arbitrary heterogeneous base models.
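
A minimal sketch of this training setup is given below, under several simplifying assumptions: the base-model predictions are precomputed and frozen (which is one way to realize the decoupling), the fuser is a single linear layer plus softmax rather than an MLP, the gradient is derived by hand, and the data is synthetic. The name `train_fuser` and all hyperparameters are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_fuser(M, P, Y, lr=0.5, epochs=300):
    """Fit a linear-softmax fuser by gradient descent on the fused MSE.

    M: (n, d) meta-features, P: (n, k, h) frozen base predictions, Y: (n, h) targets.
    """
    d, k = M.shape[1], P.shape[1]
    W, b = np.zeros((d, k)), np.zeros(k)              # zero init = uniform ensemble
    for _ in range(epochs):
        w = softmax(M @ W + b)                        # (n, k) sample-level weights
        err = np.einsum("nk,nkh->nh", w, P) - Y       # fused prediction error
        g = 2.0 * np.einsum("nh,nkh->nk", err, P) / err.size   # dLoss/dw
        dz = w * (g - (w * g).sum(axis=1, keepdims=True))      # softmax backward
        W -= lr * (M.T @ dz)
        b -= lr * dz.sum(axis=0)
    return W, b

# Synthetic setup (assumption): model 0 is accurate when the single
# meta-feature is positive, model 1 when it is negative.
rng = np.random.default_rng(0)
n, h = 400, 8
m = rng.normal(size=(n, 1))
Y = rng.normal(size=(n, h))
scale = np.where(m > 0, [0.1, 1.0], [1.0, 0.1])       # (n, 2) per-model noise level
P = Y[:, None, :] + scale[:, :, None] * rng.normal(size=(n, 2, h))

W, b = train_fuser(m, P, Y)
w = softmax(m @ W + b)
fused_mse = np.mean((np.einsum("nk,nkh->nh", w, P) - Y) ** 2)
uniform_mse = np.mean((P.mean(axis=1) - Y) ** 2)
best_single_mse = min(np.mean((P[:, j] - Y) ** 2) for j in range(P.shape[1]))
print(fused_mse, uniform_mse, best_single_mse)
```

Only the fuser parameters receive gradients here; the base models are treated as black boxes that supply predictions, which is why heterogeneous architectures can be mixed freely.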

Key Experimental Results

Main Results

| Dataset | Best Single Model MSE | TimeFuse MSE | Improved Sample Ratio |
| --- | --- | --- | --- |
| ETTh1 | 0.376 | 0.358 | 89.2% |
| Weather | 0.151 | 0.142 | 92.4% |
| Traffic | 0.360 | 0.344 | 95.1% |
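
This note does not define "Improved Sample Ratio" precisely. One plausible reading, assumed here, is the fraction of test samples on which the fused forecast has strictly lower per-sample MSE than the model that is best on average. The function name and array layout below are illustrative.

```python
import numpy as np

def improved_sample_ratio(fused, base_preds, y):
    """fused: (n, h); base_preds: (k, n, h); y: (n, h)."""
    fused_err = ((fused - y) ** 2).mean(axis=1)        # per-sample MSE, shape (n,)
    base_err = ((base_preds - y) ** 2).mean(axis=2)    # per-model, per-sample MSE, (k, n)
    best = np.argmin(base_err.mean(axis=1))            # best single model on average
    return float(np.mean(fused_err < base_err[best]))

# Tiny hand-checkable example: three samples, two base models.
y = np.zeros((3, 2))
base_preds = np.stack([np.ones((3, 2)), 2 * np.ones((3, 2))])
fused = np.array([[0.5, 0.5], [1.0, 1.0], [2.0, 2.0]])
ratio = improved_sample_ratio(fused, base_preds, y)
```

In the example, the fused forecast beats the best single model on the first sample, ties on the second, and loses on the third, so only one of three samples counts as improved.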

Ablation Study

| Configuration | MSE | Description |
| --- | --- | --- |
| Uniform Weight Ensemble | 0.368 | Non-adaptive |
| Statistical Features Only | 0.362 | Lacks spectral information |
| All Meta-features | 0.358 | Optimal |
| Single-dataset Training | 0.364 | Poor generalization |
| Cross-dataset Training | 0.358 | Good generalization |

Key Findings

  • Outperforms the best single model on up to 95.1% of samples—achieving near-universal improvement.
  • Interpretable fuser weights: high spectral complexity \(\rightarrow\) more weight to TimeMixer; low stationarity \(\rightarrow\) more weight to Non-stationary Transformer.
  • Remains effective for zero-shot generalization to unseen datasets.

Highlights & Insights

  • The fine-grained discovery of "no one-size-fits-all model" is highly convincing, backed by thorough data analysis.
  • Utilizing meta-features as a bridge to enable cross-dataset transfer of the fuser is a key design choice.
  • The framework is highly versatile—any new model can be directly incorporated into the model zoo.

Limitations & Future Work

  • Maintaining multiple base models and running inference with all of them incurs computational overhead that grows linearly with the number of models.
  • Meta-feature design is currently manual; automated feature learning could be more effective.
  • Correlations between models are not considered—which may lead to redundancy.
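
Related Work & Insights

  • vs. Traditional Ensembles (bagging/boosting): These use static, non-adaptive combinations, whereas TimeFuse adapts the weights per sample.
  • vs. Model Selection: Selection keeps only a single model and discards information from the others, whereas TimeFuse combines all of them.
  • Provides insights for AutoML and model selection research.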

Rating

  • Novelty: ⭐⭐⭐⭐ Novel perspective with sample-level adaptive fusion
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Extremely thorough analysis with 14 models × 7 datasets
  • Writing Quality: ⭐⭐⭐⭐⭐ Excellent visualization and analysis
  • Value: ⭐⭐⭐⭐⭐ Practical and general forecasting improvement framework