Mask the Redundancy: Evolving Masking Representation Learning for Multivariate Time-Series Clustering¶
Conference: AAAI 2026 · arXiv: 2511.17008 · Code: yueliangy/EMTC · Area: Time Series · Keywords: multivariate time-series clustering, masking, representation learning, contrastive learning
TL;DR¶
This paper proposes EMTC, a framework that dynamically masks redundant timestamps via Importance-aware Variate-wise Masking (IVM), combined with Multi-Endogenous Views (MEV) generation and cluster-guided contrastive learning, achieving an average F1 improvement of 4.85% across 15 MTS clustering benchmarks.
Background & Motivation¶
State of the Field¶
Background: Multivariate time-series (MTS) clustering aims to discover intrinsic grouping patterns in data. However, MTS data contains substantial redundancy (e.g., steady-state operation records, zero-output intervals), which weakens the model's focus on critical timestamps.
Issues with existing methods:
- Autoencoder-based methods (DeTSEC, RDDC): reconstruction objectives tend to retain redundant information.
- Contrastive learning methods (TimesURL, FCACC): performance is highly sensitive to the design of data augmentation strategies; misalignment with the clustering distribution amplifies redundancy.
- Attention mechanisms: soft weighting preserves the complete input structure and may be misled by highly activated yet uninformative patterns.
- Static masking (Ti-MAE, TS-MVP): fixed masking strategies cannot dynamically adapt to the clustering task as learning progresses.
Core Insight: The masking strategy should co-evolve with the clustering objective, dynamically excluding redundant timestamps irrelevant to clustering.
Method¶
IVM: Importance-aware Variate-wise Masking¶
- Univariate view generation: An embedding \(Z^{(d)} = f_{\theta}^{(d)}(X)\) is generated independently for each variate \(d\).
- Content-aware importance estimation: An attention mechanism computes an importance score \(S_i^{(d)}\) for each timestamp.
- Redundant timestamp masking: A threshold \(\epsilon\) filters low-importance timestamps, producing a binary mask \(M_i^{(d)}(t)\); element-wise multiplication with the original input yields the masked input \(\widetilde{X}\).
- The mask is dynamically updated at each epoch as learning progresses ("evolving masking").
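The IVM steps above can be sketched in a few lines. This is a minimal stand-in, not the paper's implementation: the importance scorer here is a fixed random projection with a softmax (the paper learns scores with an attention module), and the relative threshold `eps * mean(s)` is an assumed instantiation of the threshold \(\epsilon\).

```python
import numpy as np

def ivm_mask(X, eps=0.5, seed=0):
    """Sketch of Importance-aware Variate-wise Masking (IVM).

    X: (T, D) multivariate series. For each variate d, compute
    importance scores S^{(d)} over timestamps, threshold them to a
    binary mask M^{(d)}, and multiply element-wise to get X-tilde.
    """
    rng = np.random.default_rng(seed)
    T, D = X.shape
    W = rng.standard_normal(D)  # hypothetical per-variate scorer (stand-in for attention)
    masks = np.zeros_like(X, dtype=float)
    for d in range(D):
        logits = X[:, d] * W[d]            # stand-in attention logits
        s = np.exp(logits - logits.max())
        s /= s.sum()                        # importance scores S^{(d)}, sum to 1
        thresh = eps * s.mean()             # assumed relative threshold from eps
        masks[:, d] = (s >= thresh).astype(float)  # binary mask M^{(d)}(t)
    return X * masks, masks
```

In the full method this runs once per epoch, so the mask evolves as the (learned) scorer changes during training.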
MEV: Multi-Endogenous Views¶
Multiple endogenous views \(F^{(v)}\) are generated from the masked input via \(V\) distinct encoders, providing complementary perspectives and preventing overfitting caused by the crisp masking of IVM.
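A minimal sketch of MEV, assuming the \(V\) encoders are simple nonlinear projections (the paper's encoders are learned networks; the random linear maps and `tanh` here are placeholders):

```python
import numpy as np

def multi_endogenous_views(X_masked, V=3, h=8, seed=0):
    """Sketch of MEV: V distinct encoders map the masked input
    X-tilde (T, D) to V complementary views F^{(v)} of shape (T, h)."""
    rng = np.random.default_rng(seed)
    T, D = X_masked.shape
    views = []
    for v in range(V):
        Wv = rng.standard_normal((D, h)) / np.sqrt(D)  # encoder v (placeholder)
        views.append(np.tanh(X_masked @ Wv))           # view F^{(v)}
    return views
```

Because each encoder is parameterized differently, the views disagree in detail while sharing the same masked input, which is what lets the cross-view consistency loss below act as a regularizer.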
Dual-Path Learning¶
- CRL (Consistency and Reconstruction Learning):
- Intra-view reconstruction: each view reconstructs the original MTS to preserve semantic structure.
- Inter-view reconstruction: cross-view consistency constraints enhance robustness.
- CMC (Clustering-guided MEV Contrastive Learning):
- \(k\)-means clustering is applied after fusing all view representations.
- Clustering labels are used to construct positive and negative sample pairs for contrastive learning.
- Clustering labels are dynamically updated each epoch, integrating the clustering objective into representation learning.
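The CMC step can be illustrated as a supervised-contrastive-style loss over cluster labels. This is a sketch under assumptions: the \(k\)-means labels are taken as given, the fusion of views is assumed done upstream, and the temperature `tau` is a hypothetical hyperparameter not taken from the paper.

```python
import numpy as np

def cluster_contrastive_loss(Z, labels, tau=0.5):
    """Sketch of cluster-guided contrastive learning (CMC).

    Z: (n, h) fused representations; labels: (n,) k-means assignments.
    Same-cluster samples are positives, different-cluster samples
    negatives, in an InfoNCE-style objective."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # cosine similarity space
    sim = np.exp(Z @ Z.T / tau)
    n = len(labels)
    loss, counted = 0.0, 0
    for i in range(n):
        pos = (labels == labels[i]) & (np.arange(n) != i)  # positives: same cluster
        if not pos.any():
            continue  # singleton cluster contributes nothing
        denom = sim[i].sum() - sim[i, i]   # all other samples, excluding self
        loss += -np.log(sim[i, pos] / denom).mean()
        counted += 1
    return loss / max(counted, 1)
```

Re-running \(k\)-means each epoch refreshes `labels`, so the positive/negative structure tracks the current clustering rather than a fixed augmentation scheme.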
Total Loss¶
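The paper's exact formulation and weighting are not reproduced here; given the three components above (intra-view reconstruction, inter-view consistency, cluster-guided contrast), the total objective presumably takes a weighted-sum form along the lines of:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{rec}} \;+\; \lambda_{1}\,\mathcal{L}_{\mathrm{cons}} \;+\; \lambda_{2}\,\mathcal{L}_{\mathrm{cmc}},
```

where \(\mathcal{L}_{\mathrm{rec}}\) denotes intra-view reconstruction, \(\mathcal{L}_{\mathrm{cons}}\) the inter-view consistency term, \(\mathcal{L}_{\mathrm{cmc}}\) the cluster-guided contrastive term, and \(\lambda_{1}, \lambda_{2}\) are hypothetical trade-off weights; consult the paper for the actual terms and coefficients.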
Key Experimental Results¶
Setup¶
- 15 UEA benchmark datasets, with sample sizes ranging from 15 to 293, sequence lengths from 30 to 17984, and dimensionality from 3 to 1345.
- 8 SOTA baselines: FEI (AAAI'25), FCACC, TimesURL (AAAI'24), UNITS (NeurIPS'24), USLA (TPAMI'23), Ti-MAE, MHCCL (AAAI'23), T-GMRF (TKDE'23).
- Metrics: ACC, F1, NMI, ARI.
Main Results¶
- EMTC achieves an average F1 improvement of 4.85% across 15 datasets (vs. the strongest baseline).
- On DuckDuckGeese, F1 reaches 0.4917 (vs. 0.3980 for the runner-up), a substantial gain.
- On Cricket, F1 reaches 0.6317, significantly outperforming the runner-up at 0.5136.
- EMTC also achieves state-of-the-art performance on challenging datasets such as FingerMovements.
Ablation Study¶
- Removing IVM leads to a significant performance drop, validating the necessity of dynamic masking.
- Removing CMC degrades clustering performance, validating the effectiveness of cluster-guided contrastive learning.
- Removing MEV results in weaker single-view performance, validating the complementary role of multi-view generation.
Highlights & Insights¶
- Novel evolving masking paradigm: The mask co-evolves with the clustering objective rather than serving as static preprocessing, representing the first exploration of learnable redundancy masking in MTS clustering.
- Complementary IVM–MEV design: MEV mitigates the information loss from crisp masking, while IVM suppresses the redundancy amplified by MEV, forming a virtuous cycle.
- Integrating clustering objectives into representation learning: Dynamic clustering labels guide contrastive learning, effectively bridging the gap between representation learning and the downstream clustering objective.
Limitations & Future Work¶
- The threshold \(\epsilon\) is a fixed hyperparameter; an adaptive threshold mechanism may further improve performance.
- Experiments are conducted exclusively on UEA standard datasets without validation in large-scale industrial settings.
- The number of clusters \(g\) must be specified in advance; strategies for automatically determining the cluster count remain unexplored.
- The MEV fusion strategy relies on simple averaging; weighted or attention-based fusion may yield better results.
Rating¶
- Novelty: ⭐⭐⭐⭐ — The combination of evolving masking and cluster-guided contrastive learning represents a novel direction in MTS clustering.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive coverage with 15 datasets, 8 baselines, and multiple ablation groups.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure, intuitive figures, and well-formulated mathematical expressions.
- Value: ⭐⭐⭐⭐ — Provides an effective dynamic solution for redundancy suppression in MTS.