Mask the Redundancy: Evolving Masking Representation Learning for Multivariate Time-Series Clustering¶
Conference: AAAI 2026 · arXiv: 2511.17008 · Code: yueliangy/EMTC · Area: Time Series · Keywords: multivariate time-series clustering, masking, representation learning, contrastive learning
TL;DR¶
This paper proposes EMTC, a framework that dynamically masks redundant timestamps via Importance-aware Variate-wise Masking (IVM), combined with Multi-Endogenous Views (MEV) generation and cluster-guided contrastive learning, achieving an average F1 improvement of 4.85% across 15 MTS clustering benchmarks.
Background & Motivation¶
State of the Field¶
Background: Multivariate time-series (MTS) clustering aims to discover intrinsic grouping patterns in data. However, MTS data contains substantial redundancy (e.g., steady-state operation records, zero-output intervals), which weakens the model's focus on critical timestamps.
Issues with existing methods:
- Autoencoder-based methods (DeTSEC, RDDC): reconstruction objectives tend to retain redundant information.
- Contrastive learning methods (TimesURL, FCACC): performance is highly sensitive to the design of data augmentation strategies; misalignment with the clustering distribution amplifies redundancy.
- Attention mechanisms: soft weighting preserves the complete input structure and may be misled by highly activated yet uninformative patterns.
- Static masking (Ti-MAE, TS-MVP): fixed masking strategies cannot dynamically adapt to the clustering task as learning progresses.
Core Insight: The masking strategy should co-evolve with the clustering objective, dynamically excluding redundant timestamps irrelevant to clustering.
Method¶
IVM: Importance-aware Variate-wise Masking¶
- Univariate view generation: An embedding \(Z^{(d)} = f_{\theta}^{(d)}(X)\) is generated independently for each variate \(d\).
- Content-aware importance estimation: An attention mechanism computes an importance score \(S_i^{(d)}\) for each timestamp.
- Redundant timestamp masking: A threshold \(\epsilon\) filters low-importance timestamps, producing a binary mask \(M_i^{(d)}(t)\); element-wise multiplication with the original input yields the masked input \(\widetilde{X}\).
- The mask is dynamically updated at each epoch as learning progresses ("evolving masking").
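The IVM steps above can be sketched in a few lines. This is a minimal stand-in, not the paper's implementation: the importance scorer here is a fixed random projection with a softmax (the paper learns scores with an attention module), and the relative threshold `eps * mean(s)` is an assumed instantiation of the threshold \(\epsilon\).

```python
import numpy as np

def ivm_mask(X, eps=0.5, seed=0):
    """Sketch of Importance-aware Variate-wise Masking (IVM).

    X: (T, D) multivariate series. For each variate d, compute
    importance scores S^{(d)} over timestamps, threshold them to a
    binary mask M^{(d)}, and multiply element-wise to get X-tilde.
    """
    rng = np.random.default_rng(seed)
    T, D = X.shape
    W = rng.standard_normal(D)  # hypothetical per-variate scorer (stand-in for attention)
    masks = np.zeros_like(X, dtype=float)
    for d in range(D):
        logits = X[:, d] * W[d]            # stand-in attention logits
        s = np.exp(logits - logits.max())
        s /= s.sum()                        # importance scores S^{(d)}, sum to 1
        thresh = eps * s.mean()             # assumed relative threshold from eps
        masks[:, d] = (s >= thresh).astype(float)  # binary mask M^{(d)}(t)
    return X * masks, masks
```

In the full method this runs once per epoch, so the mask evolves as the (learned) scorer changes during training.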
MEV: Multi-Endogenous Views¶
Multiple endogenous views \(F^{(v)}\) are generated from the masked input via \(V\) distinct encoders, providing complementary perspectives and preventing overfitting caused by the crisp masking of IVM.
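A minimal sketch of MEV, assuming the \(V\) encoders are simple nonlinear projections (the paper's encoders are learned networks; the random linear maps and `tanh` here are placeholders):

```python
import numpy as np

def multi_endogenous_views(X_masked, V=3, h=8, seed=0):
    """Sketch of MEV: V distinct encoders map the masked input
    X-tilde (T, D) to V complementary views F^{(v)} of shape (T, h)."""
    rng = np.random.default_rng(seed)
    T, D = X_masked.shape
    views = []
    for v in range(V):
        Wv = rng.standard_normal((D, h)) / np.sqrt(D)  # encoder v (placeholder)
        views.append(np.tanh(X_masked @ Wv))           # view F^{(v)}
    return views
```

Because each encoder is parameterized differently, the views disagree in detail while sharing the same masked input, which is what lets the cross-view consistency loss below act as a regularizer.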
Dual-Path Learning¶
- CRL (Consistency and Reconstruction Learning):
- Intra-view reconstruction: each view reconstructs the original MTS to preserve semantic structure.
- Inter-view reconstruction: cross-view consistency constraints enhance robustness.
- CMC (Clustering-guided MEV Contrastive Learning):
- \(k\)-means clustering is applied after fusing all view representations.
- Clustering labels are used to construct positive and negative sample pairs for contrastive learning.
- Clustering labels are dynamically updated each epoch, integrating the clustering objective into representation learning.
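The CMC step can be illustrated as a supervised-contrastive-style loss over cluster labels. This is a sketch under assumptions: the \(k\)-means labels are taken as given, the fusion of views is assumed done upstream, and the temperature `tau` is a hypothetical hyperparameter not taken from the paper.

```python
import numpy as np

def cluster_contrastive_loss(Z, labels, tau=0.5):
    """Sketch of cluster-guided contrastive learning (CMC).

    Z: (n, h) fused representations; labels: (n,) k-means assignments.
    Same-cluster samples are positives, different-cluster samples
    negatives, in an InfoNCE-style objective."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # cosine similarity space
    sim = np.exp(Z @ Z.T / tau)
    n = len(labels)
    loss, counted = 0.0, 0
    for i in range(n):
        pos = (labels == labels[i]) & (np.arange(n) != i)  # positives: same cluster
        if not pos.any():
            continue  # singleton cluster contributes nothing
        denom = sim[i].sum() - sim[i, i]   # all other samples, excluding self
        loss += -np.log(sim[i, pos] / denom).mean()
        counted += 1
    return loss / max(counted, 1)
```

Re-running \(k\)-means each epoch refreshes `labels`, so the positive/negative structure tracks the current clustering rather than a fixed augmentation scheme.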
Total Loss¶
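The paper's exact formulation and weighting are not reproduced here; given the three components above (intra-view reconstruction, inter-view consistency, cluster-guided contrast), the total objective presumably takes a weighted-sum form along the lines of:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{rec}} \;+\; \lambda_{1}\,\mathcal{L}_{\mathrm{cons}} \;+\; \lambda_{2}\,\mathcal{L}_{\mathrm{cmc}},
```

where \(\mathcal{L}_{\mathrm{rec}}\) denotes intra-view reconstruction, \(\mathcal{L}_{\mathrm{cons}}\) the inter-view consistency term, \(\mathcal{L}_{\mathrm{cmc}}\) the cluster-guided contrastive term, and \(\lambda_{1}, \lambda_{2}\) are hypothetical trade-off weights; consult the paper for the actual terms and coefficients.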
Key Experimental Results¶
Setup¶
- 15 UEA benchmark datasets, with sample sizes ranging from 15 to 293, sequence lengths from 30 to 17984, and dimensionality from 3 to 1345.
- 8 SOTA baselines: FEI (AAAI'25), FCACC, TimesURL (AAAI'24), UNITS (NeurIPS'24), USLA (TPAMI'23), Ti-MAE, MHCCL (AAAI'23), T-GMRF (TKDE'23).
- Metrics: ACC, F1, NMI, ARI.
Main Results¶
- EMTC achieves an average F1 improvement of 4.85% across 15 datasets (vs. the strongest baseline).
- On DuckDuckGeese, F1 reaches 0.4917 (vs. 0.3980 for the runner-up), a substantial gain.
- On Cricket, F1 reaches 0.6317, significantly outperforming the runner-up at 0.5136.
- EMTC also achieves state-of-the-art performance on challenging datasets such as FingerMovements.
Ablation Study¶
- Removing IVM leads to a significant performance drop, validating the necessity of dynamic masking.
- Removing CMC degrades clustering performance, validating the effectiveness of cluster-guided contrastive learning.
- Removing MEV results in weaker single-view performance, validating the complementary role of multi-view generation.
Highlights & Insights¶
- Novel evolving masking paradigm: The mask co-evolves with the clustering objective rather than serving as static preprocessing, representing the first exploration of learnable redundancy masking in MTS clustering.
- Complementary IVM–MEV design: MEV mitigates the information loss from crisp masking, while IVM suppresses the redundancy amplified by MEV, forming a virtuous cycle.
- Integrating clustering objectives into representation learning: Dynamic clustering labels guide contrastive learning, effectively bridging the gap between representation learning and the downstream clustering objective.
Limitations & Future Work¶
- The threshold \(\epsilon\) is a fixed hyperparameter; an adaptive threshold mechanism may further improve performance.
- Experiments are conducted exclusively on UEA standard datasets without validation in large-scale industrial settings.
- The number of clusters \(g\) must be specified in advance; strategies for automatically determining the cluster count remain unexplored.
- The MEV fusion strategy relies on simple averaging; weighted or attention-based fusion may yield better results.
Rating¶
- Novelty: ⭐⭐⭐⭐ — The combination of evolving masking and cluster-guided contrastive learning represents a novel direction in MTS clustering.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive coverage with 15 datasets, 8 baselines, and multiple ablation groups.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure, intuitive figures, and well-formulated mathematical expressions.
- Value: ⭐⭐⭐⭐ — Provides an effective dynamic solution for redundancy suppression in MTS.