Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting

Conference: ICML 2025
arXiv: 2505.18442
Code: https://github.com/ZhiningLiu1998/TimeFuse
Area: Time Series
Keywords: Time Series Forecasting, Model Fusion, Meta-Learning, Ensemble Methods, Adaptive Weights

TL;DR

Proposes TimeFuse, a sample-level adaptive model fusion framework. It characterizes each input time series with meta-features and trains a learnable fuser to predict per-sample model combination weights, achieving near-universal improvements across multiple forecasting benchmarks (outperforming the best single model on up to 95.1% of samples).

Background & Motivation

Background: Time series forecasting models continue to advance (Transformers, Mamba, MLPs, etc.), competing closely on benchmark datasets.

Limitations of Prior Work: Fine-grained, sample-level analysis reveals a neglected fact: no single model is consistently optimal across all samples. Even the top-ranked model ranks first on only about 23.2% of test samples, and each model has its own areas of strength.

Key Challenge: The single-model paradigm wastes the complementary strengths of different models.

Goal: Adaptively exploit the distinct strengths of different models on different samples.

Key Insight: Characterize the properties of each input time series using meta-features, and train a fuser to predict the optimal model combination weights.

Core Idea: Shift from "selecting the best single model" to "finding the optimal model combination for each sample".

Method

Overall Architecture

Build a model zoo: Independently train \(k\) forecasting models.
Meta-feature extraction: Compute statistical, temporal, and spectral features for each input.
Fuser training: Learn the mapping from meta-features to model combination weights.
Inference: Extract meta-features \(\rightarrow\) predict weights \(\rightarrow\) perform weighted combination of predictions from each model.
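
The inference step can be written compactly as a convex combination. The notation here is assumed for illustration and does not come from the summary above: \(x\) is the input window, \(m(x)\) its meta-feature vector, \(f_i\) the \(i\)-th base model, and \(g_\theta\) the fuser network.

\[
\hat{y}(x) = \sum_{i=1}^{k} w_i(x)\, f_i(x), \qquad w(x) = \operatorname{softmax}\bigl(g_\theta(m(x))\bigr), \qquad w_i(x) \ge 0, \quad \sum_{i=1}^{k} w_i(x) = 1 .
\]

Because the weights are non-negative and sum to one, each fused value lies between the smallest and largest base-model prediction for that time step.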

Key Designs

Multi-dimensional Meta-feature Extraction:
- Function: Compute comprehensive feature descriptions for each input time series.
- Mechanism: Three categories of features—statistical features (skewness, kurtosis), temporal features (stationarity, rate of change), and spectral features (dominant frequency, spectral entropy).
- Design Motivation: These features capture the types of time series that different models excel at—e.g., TimeMixer excels at high spectral complexity, while Non-stationary Transformer excels at low stationarity.
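
A minimal sketch of what such meta-features might look like for a single univariate window, using NumPy. The formulas are assumptions for illustration and the paper's exact definitions may differ: the function name `extract_meta_features`, the half-split mean shift used as a stationarity proxy, and the mean absolute first difference used as rate of change are all hypothetical choices.

```python
import numpy as np

def extract_meta_features(x):
    """Compute a small, illustrative set of meta-features for a 1-D series."""
    x = np.asarray(x, dtype=float)
    eps = 1e-12
    mean, std = x.mean(), x.std()
    z = (x - mean) / (std + eps)

    # Statistical features: third and fourth standardized moments.
    skewness = float(np.mean(z ** 3))
    excess_kurtosis = float(np.mean(z ** 4) - 3.0)

    # Temporal features (both are simple proxies, not formal tests).
    rate_of_change = float(np.mean(np.abs(np.diff(x))))
    half = len(x) // 2
    mean_shift = float(abs(x[:half].mean() - x[half:].mean()) / (std + eps))

    # Spectral features from the power spectrum, with the DC bin dropped.
    power = np.abs(np.fft.rfft(x - mean))[1:] ** 2
    freqs = np.fft.rfftfreq(len(x))[1:]
    p = power / (power.sum() + eps)
    dominant_frequency = float(freqs[np.argmax(power)])
    spectral_entropy = float(-np.sum(p * np.log(p + eps)))

    return {
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
        "rate_of_change": rate_of_change,
        "mean_shift": mean_shift,
        "dominant_frequency": dominant_frequency,
        "spectral_entropy": spectral_entropy,
    }

# Example: a pure sine (period 8) versus white noise, 64 samples each.
t = np.arange(64)
sine_features = extract_meta_features(np.sin(2 * np.pi * t / 8))
noise_features = extract_meta_features(np.random.default_rng(0).normal(size=64))
```

In this sketch the sine concentrates its power in one frequency bin, so its spectral entropy is close to zero, while white noise spreads power across bins and scores much higher. That contrast is the kind of signal a fuser could use to tell inputs apart.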
Learnable Fuser:
- Function: Predict the combination weights \(w_1, \dots, w_k\) of the \(k\) models from meta-features.
- Mechanism: An MLP network that takes meta-features as input and outputs softmax-normalized weights.
- Design Motivation: End-to-end learning enables the fuser to automatically discover associations between meta-features and model strengths.
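
The fuser's forward pass can be sketched as follows. This is an untrained toy under stated assumptions: the class name `TinyFuser`, the single hidden layer, the ReLU activation, the hidden width, and the random initialization are all hypothetical, since the summary only says the fuser is an MLP with softmax-normalized outputs.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class TinyFuser:
    """Illustrative one-hidden-layer MLP: meta-features -> k fusion weights."""

    def __init__(self, n_features, n_models, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_models))
        self.b2 = np.zeros(n_models)

    def weights(self, meta):
        """meta: (batch, n_features) -> weights: (batch, n_models)."""
        h = np.maximum(meta @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return softmax(h @ self.W2 + self.b2)

def fuse(weights, preds):
    """weights: (batch, k); preds: (batch, k, horizon) -> (batch, horizon)."""
    return np.einsum("bk,bkh->bh", weights, preds)

# Example with random inputs: 4 samples, 6 meta-features, 3 models, horizon 5.
rng = np.random.default_rng(1)
meta = rng.normal(size=(4, 6))
preds = rng.normal(size=(4, 3, 5))
fuser = TinyFuser(n_features=6, n_models=3)
w = fuser.weights(meta)
fused = fuse(w, preds)
```

Each row of `w` is a probability vector over the models, so `fused` is a per-sample convex combination of the base forecasts.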
Cross-dataset Joint Training:
- Function: Jointly train the fuser on samples from multiple datasets.
- Mechanism: Meta-features are dataset-agnostic descriptions, enabling the fuser to generalize to unseen datasets.
- Design Motivation: Increasing training diversity improves zero-shot generalization capabilities.

Loss & Training

Minimize the MSE loss of the fused predictions.
Decouple fuser training from base model training.
Support arbitrary heterogeneous base models.
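
A minimal sketch of decoupled fuser training on synthetic data. Several things here are assumptions made to keep the example short: a linear softmax fuser stands in for the MLP so the gradient can be written by hand, the base-model forecasts are simulated rather than produced by real models, and all names (`train_linear_fuser`, `fused_mse`) are hypothetical. The base forecasts are treated as fixed arrays, so only the fuser parameters are updated.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fused_mse(W, b, meta, preds, target):
    """MSE of the fused forecast for fuser parameters (W, b)."""
    w = softmax(meta @ W + b)
    fused = np.einsum("nk,nkh->nh", w, preds)
    return float(np.mean((fused - target) ** 2))

def train_linear_fuser(meta, preds, target, lr=0.5, steps=300):
    """meta: (n, d); preds: (n, k, h) frozen base forecasts; target: (n, h)."""
    n, k, h = preds.shape
    W = np.zeros((meta.shape[1], k))  # zero logits = uniform weights
    b = np.zeros(k)
    for _ in range(steps):
        w = softmax(meta @ W + b)                      # (n, k)
        fused = np.einsum("nk,nkh->nh", w, preds)      # (n, h)
        grad_fused = 2.0 * (fused - target) / (n * h)  # d(MSE)/d(fused)
        grad_w = np.einsum("nh,nkh->nk", grad_fused, preds)
        # Backpropagate through the softmax.
        grad_logits = w * (grad_w - (w * grad_w).sum(axis=1, keepdims=True))
        W -= lr * (meta.T @ grad_logits)
        b -= lr * grad_logits.sum(axis=0)
    return W, b

# Synthetic setup: model 0 is accurate when the meta-feature is positive,
# model 1 is accurate when it is negative.
rng = np.random.default_rng(0)
n, h = 512, 8
meta = rng.normal(size=(n, 1))
target = rng.normal(size=(n, h))
positive = meta[:, 0] > 0
std0 = np.where(positive, 0.1, 1.0)[:, None]
std1 = np.where(positive, 1.0, 0.1)[:, None]
preds = np.stack(
    [target + std0 * rng.normal(size=(n, h)),
     target + std1 * rng.normal(size=(n, h))],
    axis=1,
)

single_mse = [float(np.mean((preds[:, i] - target) ** 2)) for i in range(2)]
uniform_mse = fused_mse(np.zeros((1, 2)), np.zeros(2), meta, preds, target)
W, b = train_linear_fuser(meta, preds, target)
adaptive_mse = fused_mse(W, b, meta, preds, target)
```

Because the fuser starts from zero logits, it begins at the uniform-weight ensemble. Any drop in MSE during training therefore comes from learning which model to trust for which inputs.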

Key Experimental Results

Main Results

Dataset	Best Single Model MSE	TimeFuse MSE	Share of Samples Improved
ETTh1	0.376	0.358	89.2%
Weather	0.151	0.142	92.4%
Traffic	0.360	0.344	95.1%
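
The relative MSE reduction implied by the table can be checked with a few lines of arithmetic. The snippet only restates the numbers above; the helper name `relative_reduction` is illustrative.

```python
# (best single-model MSE, TimeFuse MSE) copied from the table above.
results = {
    "ETTh1": (0.376, 0.358),
    "Weather": (0.151, 0.142),
    "Traffic": (0.360, 0.344),
}

def relative_reduction(best_single, fused):
    """Fraction by which the fused MSE is lower than the best single model."""
    return (best_single - fused) / best_single

reductions = {name: relative_reduction(*pair) for name, pair in results.items()}
for name, r in reductions.items():
    print(f"{name}: {r:.1%}")
# ETTh1: 4.8%
# Weather: 6.0%
# Traffic: 4.4%
```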

Ablation Study

Configuration	MSE	Description
Uniform Weight Ensemble	0.368	Non-adaptive
Statistical Features Only	0.362	Lacks spectral information
All Meta-features	0.358	Optimal
Single-dataset Training	0.364	Poor generalization
Cross-dataset Training	0.358	Good generalization

Key Findings

Outperforms the best single model on up to 95.1% of samples—achieving near-universal improvement.
Interpretable fuser weights: high spectral complexity \(\rightarrow\) more weight to TimeMixer; low stationarity \(\rightarrow\) more weight to Non-stationary Transformer.
Remains effective for zero-shot generalization to unseen datasets.

Highlights & Insights

The fine-grained discovery of "no one-size-fits-all model" is highly convincing, backed by thorough data analysis.
Utilizing meta-features as a bridge to enable cross-dataset transfer of the fuser is a key design choice.
The framework is highly versatile—any new model can be directly incorporated into the model zoo.

Limitations & Future Work

Maintaining and running inference with multiple base models incurs computational overhead that grows linearly with the number of models.
Meta-feature design is currently manual; automated feature learning could be more effective.
Correlations between base models are not considered, which may lead to redundancy in the model zoo.

Related Work & Insights

vs. Traditional Ensembles (bagging/boosting): These use static combinations and do not adapt to the individual sample.
vs. Model Selection: Selects only a single model, discarding information from the other models.
Provides insights for AutoML and model selection research.

Rating

Novelty: ⭐⭐⭐⭐ Novel perspective with sample-level adaptive fusion
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Extremely thorough analysis with 14 models × 7 datasets
Writing Quality: ⭐⭐⭐⭐⭐ Excellent visualization and analysis
Value: ⭐⭐⭐⭐⭐ Practical and general forecasting improvement framework