# UniShape: A Unified Shape-Aware Foundation Model for Time Series Classification

**Conference:** AAAI 2026 · **arXiv:** 2601.06429 · **Code:** https://github.com/qianlima-lab/UniShape · **Area:** Others · **Keywords:** Time Series Classification, Foundation Model, Shapelet, Multi-scale, Prototype Learning
## TL;DR
This paper proposes UniShape — the first shape-aware foundation model for time series classification (TSC). It captures class-discriminative temporal patterns via a shape-aware adapter that adaptively aggregates multi-scale subsequences (shapes), and jointly learns transferable shapelet representations at both instance and shape levels through a prototype-based pretraining module. Pretrained on 1.89M samples, UniShape achieves an average accuracy of 0.8708 across 128 UCR datasets, surpassing all baselines.
## Background & Motivation
Existing time series foundation models are primarily designed for forecasting tasks, which differ fundamentally from classification: forecasting focuses on the continuous extrapolation of trends and seasonality, whereas classification requires identifying discriminative local patterns (shapelets) within fixed-length samples. Forecasting-oriented foundation models therefore perform poorly when directly applied to TSC. Meanwhile, most existing TSC methods are trained on small-scale, single-domain datasets with limited cross-domain generalizability. Furthermore, shapelets — the most interpretable features for classification — exhibit multi-scale characteristics (discriminative subsequences may appear at varying lengths and positions), a property that has not been effectively modeled in prior foundation models.
## Method
### Overall Architecture
UniShape follows a pretrain-then-finetune paradigm: (1) a Shape-Aware Adapter encodes variable-length subsequences into shape tokens and aggregates them into class tokens via attention pooling; (2) a prototype pretraining module performs contrastive learning jointly at the instance and shape levels; (3) during finetuning, the class token is passed through a classification head to produce predictions.
### Key Designs
- Shape-Aware Adapter: Multi-scale subsequences are extracted from the input time series using \(Q\) sliding windows of different scales (\(W_q \in \{64, 32, 16, 8, 4\}\)). Each subsequence is normalized and encoded into a shape token via a 1D CNN, then adaptively aggregated into a class token through attention pooling. A coarse-to-fine hierarchical fusion strategy is adopted, where the class token from the previous scale is prepended to the token sequence of the next scale, enabling cross-scale information transfer (see the sketch after this list).
- Prototype Pretraining Module: A set of learnable prototype vectors, one per class, is maintained and updated dynamically via exponential moving average. Instance-level contrastive learning (class token ↔ class prototype) captures global discriminative features, while shape-level contrastive learning (high-confidence shape tokens ↔ class prototype) models local discriminative patterns. Pseudo-labels are assigned to unlabeled samples using the nearest prototype.
- Multi-scale Interpretability: The attention pooling weights \(\alpha\) directly reflect the discriminative importance of each shape, providing shapelet-level interpretability. On the ECGFiveDays dataset, the model correctly highlights the delayed T-wave interval; on GunPoint, it localizes the motion overshoot interval.
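The adapter's coarse-to-fine loop is compact enough to sketch. The PyTorch code below is a minimal rendition under stated assumptions: the stride (half the window), the depth of the shared CNN encoder, and the query-vector form of attention pooling are illustrative choices, not the authors' exact configuration; only the scale set \(\{64, 32, 16, 8, 4\}\), per-subsequence normalization, parameter sharing across scales, and class-token prepending follow the paper.

```python
# Minimal sketch of the shape-aware adapter (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeAwareAdapter(nn.Module):
    def __init__(self, d_model=128, scales=(64, 32, 16, 8, 4)):
        super().__init__()
        self.scales = scales
        # Shared 1D CNN encoder: each subsequence -> one shape token.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=3, padding=1),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Attention pooling: a learnable query scores each shape token.
        self.query = nn.Parameter(torch.randn(d_model))

    def forward(self, x):
        # x: (batch, length) univariate series
        cls_token = None
        for w in self.scales:  # coarse-to-fine over window sizes
            # Slide a window of size w (stride w//2 is an assumption).
            subs = x.unfold(dimension=1, size=w, step=max(w // 2, 1))  # (B, N, w)
            B, N, _ = subs.shape
            # Per-subsequence normalization.
            subs = (subs - subs.mean(-1, keepdim=True)) / (subs.std(-1, keepdim=True) + 1e-5)
            tokens = self.encoder(subs.reshape(B * N, 1, w)).reshape(B, N, -1)
            if cls_token is not None:
                # Prepend the previous scale's class token for cross-scale fusion.
                tokens = torch.cat([cls_token.unsqueeze(1), tokens], dim=1)
            # Attention pooling; alpha carries shapelet-level interpretability.
            alpha = F.softmax(tokens @ self.query, dim=1)          # (B, N[+1])
            cls_token = (alpha.unsqueeze(-1) * tokens).sum(dim=1)  # (B, d_model)
        return cls_token, alpha
```

Returning \(\alpha\) alongside the class token is what enables the shapelet-level interpretability discussed above: the weights can be mapped back onto the input intervals they score.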
### Loss & Training
- Pretraining loss = prototype contrastive loss (instance-level + shape-level) + MoCo v3 self-supervised contrastive loss; a sketch of the prototype term follows this list
- Shape-level loss weight \(\lambda = 0.01\); the temperature \(\tau\) controls the sharpness of the contrastive softmax
- Pretraining with only 10% labeled data achieves performance statistically comparable to full supervision
- Pretraining: 30 epochs, batch size 2048; Finetuning: 300 epochs, cross-entropy + shape contrastive auxiliary loss (\(\mu = 0.01\))
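A minimal sketch of the prototype contrastive term, assuming PyTorch. The InfoNCE-style cross-entropy form, the EMA momentum of 0.99, and the confidence threshold 0.9 are assumptions; \(\lambda = 0.01\), the temperature \(\tau\), EMA prototype updates, high-confidence shape-token selection, and nearest-prototype pseudo-labeling follow the paper. The MoCo v3 self-supervised term is omitted here.

```python
# Minimal sketch of the prototype-based contrastive objective (assumptions above).
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(cls_tok, shape_toks, protos, labels=None,
                               tau=0.1, lambda_shape=0.01, conf_thresh=0.9):
    """cls_tok: (B, d) class tokens; shape_toks: (B, N, d) shape tokens;
    protos: (C, d) EMA class prototypes; labels: (B,) or None (unlabeled)."""
    cls_tok = F.normalize(cls_tok, dim=-1)
    protos = F.normalize(protos, dim=-1)
    logits = cls_tok @ protos.t() / tau                     # (B, C)
    if labels is None:
        # Pseudo-label unlabeled samples by the nearest prototype.
        labels = logits.argmax(dim=-1)
    inst_loss = F.cross_entropy(logits, labels)             # instance level

    # Shape level: only high-confidence shape tokens are pulled toward
    # the prototype of their sample's (pseudo-)class.
    s = F.normalize(shape_toks, dim=-1)
    s_logits = s @ protos.t() / tau                         # (B, N, C)
    conf = s_logits.softmax(-1).gather(
        -1, labels[:, None, None].expand(-1, s.size(1), -1)).squeeze(-1)
    mask = (conf > conf_thresh).float()                     # (B, N)
    per_tok = F.cross_entropy(
        s_logits.reshape(-1, protos.size(0)),
        labels[:, None].expand(-1, s.size(1)).reshape(-1),
        reduction="none").reshape_as(mask)
    shape_loss = (mask * per_tok).sum() / mask.sum().clamp(min=1.0)
    return inst_loss + lambda_shape * shape_loss

@torch.no_grad()
def update_prototypes(protos, cls_tok, labels, momentum=0.99):
    # Exponential moving average update of each class prototype.
    for c in labels.unique():
        mean_c = F.normalize(cls_tok[labels == c].mean(dim=0), dim=-1)
        protos[c] = momentum * protos[c] + (1 - momentum) * mean_c
    return protos
```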
## Key Experimental Results
### Main Results (128 UCR Datasets, Fully Supervised)
| Method | Type | Params | Avg. Accuracy | Avg. Rank |
|---|---|---|---|---|
| UniShape | FM | 3.1M | 0.8708 | 2.71 |
| Mantis | FM | 8.7M | 0.8441 | 5.21 |
| NuTime | FM | 2.4M | 0.8353 | 6.68 |
| MR-H | NDL | - | 0.8621 | 3.97 |
| SoftShape | DS | 472K | 0.8388 | 5.89 |
| MOMENT | FM | 341M | 0.7020 | 12.10 |

*Type abbreviations: FM = foundation model; NDL = non-deep-learning; DS = deep shapelet method.*
### Zero-shot Feature Extraction (30 Additional Datasets)
| Method | Avg. Accuracy | Avg. Rank |
|---|---|---|
| UniShape | 0.7262 | 3.07 |
| Mantis | 0.7052 | 3.67 |
| NuTime | 0.6917 | 3.53 |
| RandomForest | 0.6930 | 3.77 |
### Ablation Study
- Performance consistently improves with larger pretraining data scale (from the 60K-sample UCR subset to the full 1.89M-sample corpus)
- The difference between 10% and 100% labeled pretraining data is statistically insignificant (\(P = 0.20\)), indicating that a small label budget suffices
- Both the shape-aware adapter and the prototype pretraining module contribute independently; removing either component leads to significant performance degradation
## Key Findings
- Forecasting-oriented foundation models (GPT4TS, MOMENT, UniTS) substantially underperform non-deep-learning methods on TSC, demonstrating the critical importance of task-specific design
- UniShape with only 3.1M parameters surpasses MOMENT with 341M parameters, exhibiting exceptional parameter efficiency
- Interpretability analysis shows that attention weights align closely with shapelet intervals identified by domain experts
## Highlights & Insights
- This is the first work to explicitly identify the unsuitability of forecasting-oriented foundation models for classification and to provide a targeted solution
- The multi-scale design of the shape-aware adapter is elegant and computationally efficient, with shared parameters handling all scales
- Prototype learning captures class structure with very few labels, making it particularly valuable for semi-supervised and few-shot scenarios
- Attention weights serve as an interpretability mechanism for shapelets, offering practical utility in domains such as medical time series analysis
## Limitations & Future Work
- Only univariate time series classification is addressed; multivariate settings require additional design considerations
- The fixed five scales (\(4\)–\(64\)) may not be optimal for shapelet lengths in all domains
- Pretraining sequences are uniformly interpolated to length 512, potentially discarding information from very long sequences
- Zero-shot accuracy still has room for improvement (0.73 vs. 0.87 under full supervision)
## Related Work & Insights
- The evolution of shapelet learning — from exhaustive search to gradient-based optimization to foundation model pretraining — represents a trajectory worth following
- Prototype contrastive learning can be generalized to other domains requiring class-aware pretraining, such as few-shot image classification
- The multi-scale attention pooling design is transferable to time series forecasting foundation models
- The momentum contrastive learning framework of MoCo v3 proves effective on time series data as well
- Non-deep-learning methods such as Rocket/MiniRocket remain extremely strong baselines (0.85+); foundation models must demonstrate clear improvements to justify their adoption
## Pretraining Data Construction
| Source | # Samples | Notes |
|---|---|---|
| UCR Archive | ~60K | 128 univariate classification datasets |
| UEA Archive | ~1.39M | Multivariate → channel-independent splitting |
| Additional Data | ~0.44M | 8 commonly used time series datasets |
| Total | 1.89M | Uniformly interpolated to length 512 (see the sketch below) |
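The two preprocessing steps named in the table are straightforward; the sketch below, assuming PyTorch, shows one plausible rendition. The paper states only that multivariate series are split channel-independently and that all sequences are uniformly interpolated to length 512; the function names and the choice of linear interpolation are illustrative.

```python
# Minimal sketch of the pretraining data preparation (assumptions above).
import torch
import torch.nn.functional as F

def split_channels(x: torch.Tensor) -> torch.Tensor:
    # Channel-independent splitting: a (batch, channels, length) multivariate
    # batch becomes batch*channels separate univariate samples.
    return x.reshape(-1, x.size(-1))

def resample(x: torch.Tensor, target_len: int = 512) -> torch.Tensor:
    # Uniform interpolation of each univariate series to a fixed length.
    # x: (num_samples, length) -> (num_samples, target_len)
    return F.interpolate(x.unsqueeze(1), size=target_len,
                         mode="linear", align_corners=False).squeeze(1)
```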
## Rating
| Dimension | Score (1–5) | Notes |
|---|---|---|
| Novelty | 4 | First shape-aware foundation model for TSC |
| Technical Depth | 4 | Multi-scale adapter + prototype learning, elegantly designed |
| Experimental Thoroughness | 5 | 158 datasets, 16 baselines, comprehensive ablation |
| Writing Quality | 4 | Clear motivation, fluent method presentation |
| Value | 4 | Parameter-efficient, interpretable, cross-domain transferable |