# iTimER: Reconstruction Error-Guided Irregularly Sampled Time Series Representation Learning
**Conference:** AAAI 2026 · **arXiv:** 2511.06854 · **Code:** N/A · **Area:** Time Series / Self-Supervised Learning · **Keywords:** Irregularly sampled time series, reconstruction error, self-supervised pretraining, Wasserstein alignment, pseudo-observations
## TL;DR
This paper proposes iTimER, which turns the model's own reconstruction-error distribution into a learning signal. The error distribution is estimated from observed points and sampled to generate pseudo-observations at unobserved timestamps; the error distributions of observed and pseudo-observed regions are then aligned via the Wasserstein distance and combined with contrastive learning. The result is state-of-the-art performance on classification, interpolation, and forecasting tasks for irregularly sampled time series.
## Background & Motivation
Background: Irregularly sampled time series (ISTS) are ubiquitous in domains such as healthcare and meteorology, characterized by asynchronous sampling across variables, non-uniform time intervals, and extensive natural missingness. Existing approaches either impute first and then learn, or perform end-to-end modeling.
Limitations of Prior Work: (a) imputation-based methods are unreliable under high missing rates and can introduce noise and bias; (b) end-to-end methods derive learning signals only from observed values, leaving model behavior in unobserved regions unconstrained; (c) self-supervised masked-reconstruction methods treat reconstruction error merely as a loss term, ignoring the uncertainty information it encodes.
Key Challenge: Unobserved timestamps lack ground-truth values, making direct supervision infeasible. Yet the reconstruction error itself encodes the model's understanding of the data structure and can serve as a proxy signal.
Key Insight: Reconstruction error is not merely a loss term but an information source reflecting model uncertainty and inductive preference. Its distribution can be propagated to unobserved regions to generate pseudo-observations.
Core Idea: Sample from the reconstruction error distribution estimated at observed points, mix with the nearest observed values via mixup to generate pseudo-observations, and enforce consistency between the error distributions of observed and pseudo-observed regions using Wasserstein distance.
## Method
### Overall Architecture
Encoder–decoder self-supervised pretraining proceeds in four steps:

1. Reconstruct observed values → estimate a Gaussian distribution \(\mathcal{N}(\mu_\epsilon, \sigma_\epsilon^2)\) of reconstruction errors.
2. Sample from the error distribution and apply mixup with the nearest observed value → generate pseudo-observations at unobserved timestamps.
3. Encode and reconstruct the pseudo-observation sequence → align the two error distributions via Wasserstein distance.
4. Apply contrastive learning to enhance representational discriminability.
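For concreteness, here is a minimal PyTorch sketch of step 1, the running Gaussian estimate of the reconstruction-error distribution. The function name `update_error_stats`, the tensor shapes, and the momentum default are illustrative assumptions, not the authors' implementation (no official code is released):

```python
import torch

def update_error_stats(x, x_hat, m, mu_eps, sigma_eps, rho=0.99):
    """Momentum update of the Gaussian reconstruction-error statistics.

    x, x_hat, m: (B, T) tensors of values, reconstructions, and the
    observation mask (1 = observed). mu_eps / sigma_eps are scalars
    carried across training steps; rho is the momentum coefficient.
    """
    # Errors are collected at observed points only, detached so the
    # statistics do not receive gradients.
    eps = (x - x_hat).detach()[m.bool()]
    mu_eps = rho * mu_eps + (1.0 - rho) * eps.mean()
    sigma_eps = rho * sigma_eps + (1.0 - rho) * eps.std()
    return mu_eps, sigma_eps
```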
### Key Designs
- **Reconstruction Error Distribution Modeling and Pseudo-Observation Generation** (see the sketch after this list):
    - Function: Transforms reconstruction error from "a loss to be minimized" into "an exploitable learning signal."
    - Mechanism: For observed points \(m_t=1\), compute \(\epsilon_t = x_t - \hat{x}_t\) and estimate \(\mu_\epsilon, \sigma_\epsilon\) under a Gaussian assumption with momentum update \(\rho\). For unobserved points \(m_t=0\), generate \(\tilde{x}_t = \alpha_t \cdot \bar{x} + (1-\alpha_t) \cdot \tilde{\epsilon}_t\), where \(\bar{x}\) is the nearest observed value and \(\tilde{\epsilon}_t \sim \mathcal{N}(\mu_\epsilon^h, (\sigma_\epsilon^h)^2)\).
    - Design Motivation: Pseudo-observations preserve temporal continuity (via mixup with the nearest observation) while incorporating noise-aware uncertainty (via error sampling), making them more principled than naive imputation or random noise injection.
- **Wasserstein Error Distribution Alignment:**
    - Function: Ensures consistent model behavior across observed and pseudo-observed regions.
    - Mechanism: \(L_W = \|\mu_\epsilon - \mu_p\|^2 + \|\sigma_\epsilon - \sigma_p\|^2\), the closed-form 2-Wasserstein distance between Gaussians.
    - Design Motivation: If the reconstruction error distribution over pseudo-observed regions resembles that over observed regions, the pseudo-observations have not introduced structural bias.
- **Contrastive Learning + Dual Reconstruction Loss** (see the sketch after this list):
    - Function: Enhances representational discriminability and robustness.
    - Total loss: \(L = \alpha L_W + \beta L_{contrast} + \frac{1}{2}(L_{orig\_rec} + L_{pseudo\_rec})\), where \(\alpha, \beta\) are loss weights (distinct from the mixup coefficient \(\alpha_t\)).
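Below is a minimal PyTorch sketch of designs 1 and 3 above: nearest-observation mixup for pseudo-observation generation, a standard InfoNCE stand-in for the contrastive term, and the final loss composition. All names, shapes, and defaults are assumptions, and the per-timestamp mixup coefficient \(\alpha_t\) is simplified to a scalar here; treat this as an illustration under those assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def nearest_observed(x, m):
    """Value of the temporally nearest observed point, per timestamp.
    x, m: (B, T) float values and {0,1} observation mask."""
    B, T = x.shape
    idx = torch.arange(T, device=x.device)
    anchor = x.clone()
    for b in range(B):
        obs = idx[m[b].bool()]
        if obs.numel() == 0:
            continue                                # fully unobserved row
        d = (idx[:, None] - obs[None, :]).abs()     # (T, n_obs) time gaps
        anchor[b] = x[b, obs[d.argmin(dim=1)]]
    return anchor

def make_pseudo_obs(x, m, mu_eps, sigma_eps, alpha=0.8):
    """Mixup of the nearest observed value with a sample from the
    estimated error distribution N(mu_eps, sigma_eps^2)."""
    eps_tilde = mu_eps + sigma_eps * torch.randn_like(x)
    pseudo = alpha * nearest_observed(x, m) + (1.0 - alpha) * eps_tilde
    # Real observations are never overwritten; only gaps are filled.
    return torch.where(m.bool(), x, pseudo)

def info_nce(z_orig, z_pseudo, tau=0.1):
    """Standard InfoNCE between representations of the original and the
    pseudo-observed sequence; matching rows are positives."""
    z1 = F.normalize(z_orig, dim=-1)
    z2 = F.normalize(z_pseudo, dim=-1)
    logits = z1 @ z2.t() / tau                      # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def total_loss(l_w, l_contrast, l_orig_rec, l_pseudo_rec, alpha=1.0, beta=1.0):
    """L = alpha * L_W + beta * L_contrast + 0.5 * (L_orig_rec + L_pseudo_rec).
    Here alpha/beta are the loss weights, not the mixup coefficient."""
    return alpha * l_w + beta * l_contrast + 0.5 * (l_orig_rec + l_pseudo_rec)
```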
## Key Experimental Results
### Main Results
Classification tasks (P12, P19 medical datasets; PAM activity recognition):
| Method | P12 AUROC | P19 AUROC | PAM Accuracy |
|---|---|---|---|
| Warpformer | 83.4 | 88.8 | 94.3 |
| mTAND | 84.2 | 84.4 | 92.9 |
| Raindrop | 82.8 | 87.0 | 88.5 |
| iTimER | **85.1** | **89.2** | **95.0** |
iTimER also improves consistently over baselines on interpolation and forecasting tasks.
### Key Findings
- Effectiveness of reconstruction error as a learning signal: removing pseudo-observation generation leads to a notable performance drop.
- Wasserstein alignment is critical: it ensures pseudo-observations do not introduce distributional bias.
- Task-agnostic pretraining: a single pretrained model transfers to classification, interpolation, and forecasting downstream tasks.
## Highlights & Insights
- The "reconstruction error as signal" insight is the paper's sharpest contribution: extracting useful information from the model's own imperfections is a principle that plausibly transfers to other self-supervised learning scenarios.
- Mixup-based pseudo-observation generation is better motivated than direct imputation or pure noise injection: each pseudo-value stays anchored to a nearby real observation while carrying noise drawn from the model's own error distribution.
- The Wasserstein distance is computed efficiently via the closed-form solution under Gaussian assumptions.
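To make the last point concrete: under the Gaussian parameterization the alignment term needs only four scalars, so it costs O(1) per training step rather than requiring a general optimal-transport solve. A one-line sketch (function name and inputs are hypothetical):

```python
def wasserstein2_gauss(mu_o, sigma_o, mu_p, sigma_p):
    """Squared 2-Wasserstein distance between the 1-D Gaussians
    N(mu_o, sigma_o^2) and N(mu_p, sigma_p^2); exact closed form."""
    return (mu_o - mu_p) ** 2 + (sigma_o - sigma_p) ** 2

# e.g. wasserstein2_gauss(0.0, 1.0, 0.1, 1.2) ≈ 0.05
```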
## Limitations & Future Work
- The Gaussian assumption may not hold for all error distributions.
- Using only the nearest observed value as the anchor may be insufficient for long-gap missing regions.
## Rating
- Novelty: ⭐⭐⭐⭐⭐ The core insight of treating reconstruction error as a learning signal is highly original.
- Experimental Thoroughness: ⭐⭐⭐⭐ Validated across three task types and multiple datasets.
- Writing Quality: ⭐⭐⭐⭐ Architecture diagrams are clear and the motivation is well-argued.
- Value: ⭐⭐⭐⭐ Establishes a new paradigm for irregularly sampled time series representation learning.