Mask the Redundancy: Evolving Masking Representation Learning for Multivariate Time-Series Clustering

Conference: AAAI2026 arXiv: 2511.17008 Code: yueliangy/EMTC Area: Time Series Keywords: multivariate time-series clustering, masking, representation learning, contrastive learning

TL;DR

This paper proposes EMTC, a framework that dynamically masks redundant timestamps via Importance-aware Variate-wise Masking (IVM), combined with Multi-Endogenous Views (MEV) generation and cluster-guided contrastive learning, achieving an average F1 improvement of 4.85% across 15 MTS clustering benchmarks.

Background & Motivation

State of the Field

Background: Multivariate time-series (MTS) clustering aims to discover intrinsic grouping patterns in data. However, MTS data contains substantial redundancy (e.g., steady-state operation records, zero-output intervals), which weakens the model's focus on critical timestamps.

Issues with existing methods:

  • Autoencoder-based methods (DeTSEC, RDDC): reconstruction objectives tend to retain redundant information.
  • Contrastive learning methods (TimesURL, FCACC): performance is highly sensitive to the design of data augmentation strategies; misalignment with the clustering distribution amplifies redundancy.
  • Attention mechanisms: soft weighting preserves the complete input structure and may be misled by highly activated yet uninformative patterns.
  • Static masking (Ti-MAE, TS-MVP): fixed masking strategies cannot dynamically adapt to the clustering task as learning progresses.

Core Insight: The masking strategy should co-evolve with the clustering objective, dynamically excluding redundant timestamps irrelevant to clustering.

Method

IVM: Importance-aware Variate-wise Masking

  1. Univariate view generation: An embedding \(Z^{(d)} = f_{\theta}^{(d)}(X)\) is generated independently for each variate \(d\).
  2. Content-aware importance estimation: An attention mechanism computes an importance score \(S_i^{(d)}\) for each timestamp.
  3. Redundant timestamp masking: A threshold \(\epsilon\) filters low-importance timestamps, producing a binary mask \(M_i^{(d)}(t)\); element-wise multiplication with the original input yields the masked input \(\widetilde{X}\).
  4. The mask is dynamically updated at each epoch as learning progresses ("evolving masking").
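The thresholding in steps 2–3 can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the attention scorer is assumed to be given externally, and `ivm_mask` with its per-variate min-max normalization is a hypothetical stand-in.

```python
import numpy as np

def ivm_mask(X, scores, eps=0.5):
    """Sketch of IVM thresholding (illustrative, not the paper's code).

    X      : (T, D) multivariate series -- T timestamps, D variates
    scores : (T, D) importance scores S_i^{(d)}, e.g. from an attention module
    eps    : threshold; timestamps scoring below it are masked out
    """
    # Scale scores to [0, 1] independently for each variate.
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    s = (scores - lo) / (hi - lo + 1e-8)
    M = (s >= eps).astype(X.dtype)      # binary mask M_i^{(d)}(t)
    return X * M, M                     # element-wise product gives X_tilde
```

Recomputing the mask from fresh scores each epoch would realize the "evolving masking" behavior described in step 4.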

MEV: Multi-Endogenous Views

Multiple endogenous views \(F^{(v)}\) are generated from the masked input via \(V\) distinct encoders, providing complementary perspectives and preventing overfitting caused by the crisp masking of IVM.
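The view generation can be sketched with randomly initialized linear maps standing in for the paper's learned encoders; `mev_views`, the `hidden` width, and the tanh nonlinearity are illustrative assumptions.

```python
import numpy as np

def mev_views(X_tilde, V=3, hidden=8, rng=None):
    """Sketch of MEV: V distinct encoders map the masked input to
    complementary views F^(v). Random linear encoders are stand-ins
    for the paper's learned ones.

    X_tilde : (T, D) masked series
    Returns a list of V view embeddings, each of shape (T, hidden).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, D = X_tilde.shape
    views = []
    for _ in range(V):
        W = rng.standard_normal((D, hidden)) / np.sqrt(D)  # per-view weights
        views.append(np.tanh(X_tilde @ W))                 # one view F^(v)
    return views
```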

Dual-Path Learning

  • CRL (Consistency and Reconstruction Learning):
    • Intra-view reconstruction: each view reconstructs the original MTS to preserve semantic structure.
    • Inter-view reconstruction: cross-view consistency constraints enhance robustness.
  • CMC (Clustering-guided MEV Contrastive Learning):
    • \(k\)-means clustering is applied after fusing all view representations.
    • Clustering labels are used to construct positive and negative sample pairs for contrastive learning.
    • Clustering labels are dynamically updated each epoch, integrating the clustering objective into representation learning.
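The CMC step can be sketched as an InfoNCE-style loss over pseudo-label pairs: samples sharing a k-means label are positives, all others negatives. This is a simplified NumPy version; `cmc_loss`, `tau`, and the exact positive/negative construction are assumptions, not the paper's formulation.

```python
import numpy as np

def cmc_loss(Z, labels, tau=0.5):
    """Sketch of cluster-guided contrastive loss (InfoNCE-style).

    Z      : (N, H) fused view representations
    labels : (N,) pseudo-labels from k-means on Z
    """
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-8)
    sim = np.exp(Z @ Z.T / tau)                   # pairwise similarities
    np.fill_diagonal(sim, 0.0)                    # exclude self-pairs
    pos = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(pos, 0.0)                    # positives: same pseudo-label
    num = (sim * pos).sum(axis=1)
    den = sim.sum(axis=1)
    valid = num > 0                               # anchors with >= 1 positive
    return float(-np.log(num[valid] / den[valid]).mean())
```

Re-running k-means and rebuilding `labels` every epoch mirrors the dynamic label update described above.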

Total Loss

\[\mathcal{L}_{total} = \mathcal{L}_{contra} + \alpha \mathcal{L}_{intra} + \beta \mathcal{L}_{inter}\]

Key Experimental Results

Setup

  • 15 UEA benchmark datasets, with sample sizes ranging from 15 to 293, sequence lengths from 30 to 17984, and dimensionality from 3 to 1345.
  • 8 SOTA baselines: FEI (AAAI'25), FCACC, TimesURL (AAAI'24), UNITS (NeurIPS'24), USLA (TPAMI'23), Ti-MAE, MHCCL (AAAI'23), T-GMRF (TKDE'23).
  • Metrics: ACC, F1, NMI, ARI.

Main Results

  • EMTC achieves an average F1 improvement of 4.85% across 15 datasets (vs. the strongest baseline).
  • On DuckDuckGeese, F1 reaches 0.4917 (vs. 0.3980 for the runner-up), a substantial gain.
  • On Cricket, F1 reaches 0.6317, significantly outperforming the runner-up at 0.5136.
  • EMTC also achieves state-of-the-art performance on challenging datasets such as FingerMovements.

Ablation Study

  • Removing IVM leads to a significant performance drop, validating the necessity of dynamic masking.
  • Removing CMC degrades clustering performance, validating the effectiveness of cluster-guided contrastive learning.
  • Removing MEV results in weaker single-view performance, validating the complementary role of multi-view generation.

Highlights & Insights

  • Novel evolving masking paradigm: The mask co-evolves with the clustering objective rather than serving as static preprocessing, representing the first exploration of learnable redundancy masking in MTS clustering.
  • Complementary IVM–MEV design: MEV mitigates the information loss from crisp masking, while IVM suppresses the redundancy amplified by MEV, forming a virtuous cycle.
  • Integrating clustering objectives into representation learning: Dynamic clustering labels guide contrastive learning, effectively bridging the gap between representation learning and the downstream clustering objective.

Limitations & Future Work

  • The threshold \(\epsilon\) is a fixed hyperparameter; an adaptive threshold mechanism may further improve performance.
  • Experiments are conducted exclusively on UEA standard datasets without validation in large-scale industrial settings.
  • The number of clusters \(g\) must be specified in advance; strategies for automatically determining the cluster count remain unexplored.
  • The MEV fusion strategy relies on simple averaging; weighted or attention-based fusion may yield better results.

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of evolving masking and cluster-guided contrastive learning represents a novel direction in MTS clustering.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive coverage with 15 datasets, 8 baselines, and multiple ablation groups.
  • Writing Quality: ⭐⭐⭐⭐ — Clear structure, intuitive figures, and well-formulated mathematical expressions.
  • Value: ⭐⭐⭐⭐ — Provides an effective dynamic solution for redundancy suppression in MTS.