# iTimER: Reconstruction Error-Guided Irregularly Sampled Time Series Representation Learning
**Conference:** AAAI 2026 · **arXiv:** 2511.06854 · **Code:** N/A · **Area:** Time Series / Self-Supervised Learning · **Keywords:** Irregularly sampled time series, reconstruction error, self-supervised pretraining, Wasserstein alignment, pseudo-observations
## TL;DR
This paper proposes iTimER, which turns the model's own reconstruction-error distribution into a learning signal. The error distribution is estimated from observed points and sampled to generate pseudo-observations at unobserved timestamps; the error distributions of observed and pseudo-observed regions are then aligned via the Wasserstein distance and combined with contrastive learning. The result is state-of-the-art performance on classification, interpolation, and forecasting tasks for irregularly sampled time series.
## Background & Motivation
Background: Irregularly sampled time series (ISTS) are ubiquitous in domains such as healthcare and meteorology, characterized by asynchronous sampling across variables, non-uniform time intervals, and extensive natural missingness. Existing approaches either impute first and then learn, or perform end-to-end modeling.
Limitations of Prior Work: (a) imputation-based methods are unreliable under high missing rates and can introduce noise and bias; (b) end-to-end methods derive learning signals only from observed values, leaving model behavior in unobserved regions unconstrained; (c) self-supervised masked-reconstruction methods treat reconstruction error merely as a loss term, ignoring the uncertainty information it encodes.
Key Challenge: Unobserved timestamps lack ground-truth values, making direct supervision infeasible. Yet the reconstruction error itself encodes the model's understanding of the data structure and can serve as a proxy signal.
Key Insight: Reconstruction error is not merely a loss term but an information source reflecting model uncertainty and inductive preference. Its distribution can be propagated to unobserved regions to generate pseudo-observations.
Core Idea: Sample from the reconstruction error distribution estimated at observed points, mix with the nearest observed values via mixup to generate pseudo-observations, and enforce consistency between the error distributions of observed and pseudo-observed regions using Wasserstein distance.
## Method
### Overall Architecture
Encoder–decoder self-supervised pretraining proceeds in four steps:

1. Reconstruct observed values → estimate a Gaussian distribution \(\mathcal{N}(\mu_\epsilon, \sigma_\epsilon^2)\) of reconstruction errors.
2. Sample from the error distribution and apply mixup with the nearest observed value → generate pseudo-observations at unobserved timestamps.
3. Encode and reconstruct the pseudo-observation sequence → align the two error distributions via Wasserstein distance.
4. Apply contrastive learning to enhance representational discriminability.
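For concreteness, here is a minimal PyTorch sketch of step 1, the running Gaussian estimate of the reconstruction-error distribution. The function name `update_error_stats`, the tensor shapes, and the momentum default are illustrative assumptions, not the authors' implementation (no official code is released):

```python
import torch

def update_error_stats(x, x_hat, m, mu_eps, sigma_eps, rho=0.99):
    """Momentum update of the Gaussian reconstruction-error statistics.

    x, x_hat, m: (B, T) tensors of values, reconstructions, and the
    observation mask (1 = observed). mu_eps / sigma_eps are scalars
    carried across training steps; rho is the momentum coefficient.
    """
    # Errors are collected at observed points only, detached so the
    # statistics do not receive gradients.
    eps = (x - x_hat).detach()[m.bool()]
    mu_eps = rho * mu_eps + (1.0 - rho) * eps.mean()
    sigma_eps = rho * sigma_eps + (1.0 - rho) * eps.std()
    return mu_eps, sigma_eps
```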
### Key Designs
- **Reconstruction Error Distribution Modeling and Pseudo-Observation Generation** (see the sketch after this list):
    - Function: Transforms reconstruction error from "a loss to be minimized" into "an exploitable learning signal."
    - Mechanism: For observed points \(m_t=1\), compute \(\epsilon_t = x_t - \hat{x}_t\) and estimate \(\mu_\epsilon, \sigma_\epsilon\) under a Gaussian assumption with momentum update \(\rho\). For unobserved points \(m_t=0\), generate \(\tilde{x}_t = \alpha_t \cdot \bar{x} + (1-\alpha_t) \cdot \tilde{\epsilon}_t\), where \(\bar{x}\) is the nearest observed value and \(\tilde{\epsilon}_t \sim \mathcal{N}(\mu_\epsilon^h, (\sigma_\epsilon^h)^2)\).
    - Design Motivation: Pseudo-observations preserve temporal continuity (via mixup with the nearest observation) while incorporating noise-aware uncertainty (via error sampling), making them more principled than naive imputation or random noise injection.
- **Wasserstein Error Distribution Alignment:**
    - Function: Ensures consistent model behavior across observed and pseudo-observed regions.
    - Mechanism: \(L_W = \|\mu_\epsilon - \mu_p\|^2 + \|\sigma_\epsilon - \sigma_p\|^2\), the closed-form 2-Wasserstein distance between Gaussians.
    - Design Motivation: If the reconstruction error distribution over pseudo-observed regions resembles that over observed regions, the pseudo-observations have not introduced structural bias.
- **Contrastive Learning + Dual Reconstruction Loss** (see the sketch after this list):
    - Function: Enhances representational discriminability and robustness.
    - Total loss: \(L = \alpha L_W + \beta L_{contrast} + \frac{1}{2}(L_{orig\_rec} + L_{pseudo\_rec})\), where \(\alpha, \beta\) are loss weights (distinct from the mixup coefficient \(\alpha_t\)).
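Below is a minimal PyTorch sketch of designs 1 and 3 above: nearest-observation mixup for pseudo-observation generation, a standard InfoNCE stand-in for the contrastive term, and the final loss composition. All names, shapes, and defaults are assumptions, and the per-timestamp mixup coefficient \(\alpha_t\) is simplified to a scalar here; treat this as an illustration under those assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def nearest_observed(x, m):
    """Value of the temporally nearest observed point, per timestamp.
    x, m: (B, T) float values and {0,1} observation mask."""
    B, T = x.shape
    idx = torch.arange(T, device=x.device)
    anchor = x.clone()
    for b in range(B):
        obs = idx[m[b].bool()]
        if obs.numel() == 0:
            continue                                # fully unobserved row
        d = (idx[:, None] - obs[None, :]).abs()     # (T, n_obs) time gaps
        anchor[b] = x[b, obs[d.argmin(dim=1)]]
    return anchor

def make_pseudo_obs(x, m, mu_eps, sigma_eps, alpha=0.8):
    """Mixup of the nearest observed value with a sample from the
    estimated error distribution N(mu_eps, sigma_eps^2)."""
    eps_tilde = mu_eps + sigma_eps * torch.randn_like(x)
    pseudo = alpha * nearest_observed(x, m) + (1.0 - alpha) * eps_tilde
    # Real observations are never overwritten; only gaps are filled.
    return torch.where(m.bool(), x, pseudo)

def info_nce(z_orig, z_pseudo, tau=0.1):
    """Standard InfoNCE between representations of the original and the
    pseudo-observed sequence; matching rows are positives."""
    z1 = F.normalize(z_orig, dim=-1)
    z2 = F.normalize(z_pseudo, dim=-1)
    logits = z1 @ z2.t() / tau                      # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def total_loss(l_w, l_contrast, l_orig_rec, l_pseudo_rec, alpha=1.0, beta=1.0):
    """L = alpha * L_W + beta * L_contrast + 0.5 * (L_orig_rec + L_pseudo_rec).
    Here alpha/beta are the loss weights, not the mixup coefficient."""
    return alpha * l_w + beta * l_contrast + 0.5 * (l_orig_rec + l_pseudo_rec)
```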
## Key Experimental Results
### Main Results
Classification tasks (P12, P19 medical datasets; PAM activity recognition):
| Method | P12 AUROC | P19 AUROC | PAM Accuracy |
|---|---|---|---|
| Warpformer | 83.4 | 88.8 | 94.3 |
| mTAND | 84.2 | 84.4 | 92.9 |
| Raindrop | 82.8 | 87.0 | 88.5 |
| iTimER | **85.1** | **89.2** | **95.0** |
iTimER also improves consistently over baselines on interpolation and forecasting tasks.
### Key Findings
- Effectiveness of reconstruction error as a learning signal: removing pseudo-observation generation leads to a notable performance drop.
- Wasserstein alignment is critical: it ensures pseudo-observations do not introduce distributional bias.
- Task-agnostic pretraining: a single pretrained model transfers to classification, interpolation, and forecasting downstream tasks.
## Highlights & Insights
- The "reconstruction error as signal" insight is the paper's sharpest contribution: extracting useful information from the model's own imperfections is a principle that plausibly transfers to other self-supervised learning scenarios.
- Mixup-based pseudo-observation generation is better motivated than direct imputation or pure noise injection: each pseudo-value stays anchored to a nearby real observation while carrying noise drawn from the model's own error distribution.
- The Wasserstein distance is computed efficiently via the closed-form solution under Gaussian assumptions.
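To make the last point concrete: under the Gaussian parameterization the alignment term needs only four scalars, so it costs O(1) per training step rather than requiring a general optimal-transport solve. A one-line sketch (function name and inputs are hypothetical):

```python
def wasserstein2_gauss(mu_o, sigma_o, mu_p, sigma_p):
    """Squared 2-Wasserstein distance between the 1-D Gaussians
    N(mu_o, sigma_o^2) and N(mu_p, sigma_p^2); exact closed form."""
    return (mu_o - mu_p) ** 2 + (sigma_o - sigma_p) ** 2

# e.g. wasserstein2_gauss(0.0, 1.0, 0.1, 1.2) ≈ 0.05
```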
## Limitations & Future Work
- The Gaussian assumption may not hold for all error distributions.
- Using only the nearest observed value as the anchor may be insufficient for long-gap missing regions.
## Rating
- Novelty: ⭐⭐⭐⭐⭐ The core insight of treating reconstruction error as a learning signal is highly original.
- Experimental Thoroughness: ⭐⭐⭐⭐ Validated across three task types and multiple datasets.
- Writing Quality: ⭐⭐⭐⭐ Architecture diagrams are clear and the motivation is well-argued.
- Value: ⭐⭐⭐⭐ Establishes a new paradigm for irregularly sampled time series representation learning.