
C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning

  • Conference: AAAI 2026
  • arXiv: 2507.17454
  • Code: https://github.com/SSMa913/NICLab-C3RL
  • Area: Time Series Forecasting
  • Keywords: Multivariate time series forecasting, channel independence, channel mixing, contrastive learning, SimSiam

TL;DR

This paper proposes C3RL, a SimSiam-based contrastive learning framework that treats channel-independence (CI) and channel-mixing (CM) strategies as two transposed views of the same data to construct positive pairs. By jointly optimizing representation learning and forecasting through a Siamese network, C3RL improves the best-performance rate of CI models from 43.6% to 81.4% and that of CM models from 23.8% to 76.3%.

Background & Motivation

State of the Field

Background: In multivariate time series forecasting, the CM strategy (treating all variables at each time step as a single token) excels at capturing inter-variable dependencies but overlooks variable-specific patterns; the CI strategy (processing each variable independently) captures temporal patterns but ignores cross-variable dependencies. Hybrid methods are mostly based on feature fusion, with limited generalization and interpretability.

Limitations of Prior Work: Neither strategy alone is comprehensive, and feature fusion methods learn only the prediction mapping without learning robust representations. No prior work has systematically explored how to unify CI and CM from a representation learning perspective.

Key Challenge: Simultaneously exploiting intra-channel temporal patterns and cross-channel dependencies is necessary, yet the two strategies have different input shapes (\(L \times N\) vs. \(N \times L\)). The challenge lies in modeling both within a unified framework.

Key Insight: The inputs for CM and implicit channel-independence (ICI) are exact transposes of each other, analogous to the positive pairs produced by image augmentations in SimSiam. This motivates unifying the two strategies via contrastive learning.

Core Idea: Treat \(X\) (CM view) and \(X^T\) (ICI view) as positive pairs in a SimSiam framework, and jointly optimize the contrastive loss and the forecasting loss through a Siamese network.
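
A minimal sketch of the view construction, assuming PyTorch; the batch size and shapes are illustrative:

```python
import torch

# A batch of multivariate series: (batch, L time steps, N variables).
x = torch.randn(32, 96, 7)

view_cm = x                   # CM view:  shape (B, L, N)
view_ici = x.transpose(1, 2)  # ICI view: shape (B, N, L), the exact transpose

# Both views carry the same information content, so they form a natural
# positive pair without any handcrafted augmentation.
```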

Method

Overall Architecture

  • Backbone encoder \(f\): identical to the original model architecture, processing CM or CI inputs.
  • Siamese encoder \(g\): same structure but with internal dimensions adapted to the transposed input.
  • Prediction module: aligns the output dimensions of the two branches.
  • Joint loss: \(\mathcal{L} = (1-\lambda)\mathcal{L}_{\text{pred}} + \lambda\mathcal{L}_{\text{SimSiam}}\), with an adaptive weight \(\lambda\) (a minimal sketch follows this list).
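
A minimal sketch of the joint objective, assuming PyTorch. The paper's adaptive scheme for \(\lambda\) is not reproduced here, so `lam` is taken as a given scalar:

```python
import torch.nn.functional as F

def joint_loss(y_pred, y_true, loss_simsiam, lam):
    """L = (1 - lambda) * L_pred + lambda * L_SimSiam."""
    loss_pred = F.mse_loss(y_pred, y_true)  # standard forecasting loss
    return (1 - lam) * loss_pred + lam * loss_simsiam
```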

Key Designs

  1. Transposed Views as Positive Pairs:

    • CM input \(X \in \mathbb{R}^{L \times N}\) and ICI input \(X^T \in \mathbb{R}^{N \times L}\) naturally form a positive pair.
    • Analogous to augmented views generated by rotation/cropping in image-based methods, the transpose preserves the same information content while changing the processing dimension.
  2. SimSiam Architecture Prevents Collapse:

    • Stop-gradient is applied to prevent representational collapse; no negative samples are required, reducing training cost.
    • Symmetric loss: \(\mathcal{L}_{\text{SimSiam}} = \frac{1}{2}\mathcal{D}(X^{Pre}, \text{sg}(X^{SiaPro})) + \frac{1}{2}\mathcal{D}(\text{sg}(X^{Pro}), X^{SiaPre})\), where \(\mathcal{D}\) is the negative cosine similarity used in SimSiam and \(\text{sg}(\cdot)\) denotes stop-gradient (see the sketch after this list).
  3. Plug-and-Play Framework Design:

    • Seamlessly applicable to any existing forecasting model (iTransformer, PatchTST, DLinear, S-Mamba, RLinear, etc.).
    • The Siamese encoder requires only an adjustment to the input dimension, without redesigning the feature extractor.
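
A hedged sketch tying these designs together: the negative cosine similarity \(\mathcal{D}\) with stop-gradient, and a plug-and-play wrapper around a backbone encoder `f` and a Siamese encoder `g`. The shared projector/predictor and the module names are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def D(p, z):
    # Negative cosine similarity; detaching z applies the stop-gradient
    # that prevents representational collapse in SimSiam.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

class C3RLWrapper(nn.Module):
    """Illustrative wrapper: f encodes the CM view, g the transposed view."""
    def __init__(self, f, g, projector, predictor, head):
        super().__init__()
        self.f, self.g = f, g          # backbone / Siamese encoders
        self.projector = projector     # aligns the two branches' dimensions
        self.predictor = predictor     # SimSiam prediction MLP
        self.head = head               # forecasting head on the backbone

    def forward(self, x):              # x: (B, L, N)
        h1 = self.f(x)                          # backbone branch (CM view)
        h2 = self.g(x.transpose(1, 2))          # Siamese branch (ICI view)
        z1, z2 = self.projector(h1), self.projector(h2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Symmetric loss: each predictor output is pulled toward the
        # stop-gradient projection of the opposite branch.
        loss_simsiam = 0.5 * D(p1, z2) + 0.5 * D(p2, z1)
        return self.head(h1), loss_simsiam
```

At inference time only `f` and `head` would be needed; the Siamese branch shapes the representation during training, which matches the training-time overhead noted under Limitations.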

Key Experimental Results

Main Results

Best-performance rates across 9 datasets and 7 backbone models:

| Model Type    | w/o C3RL | + C3RL | Gain     |
|---------------|----------|--------|----------|
| CI models (5) | 43.6%    | 81.4%  | +37.8 pp |
| CM models (2) | 23.8%    | 76.3%  | +52.5 pp |

Representative improvements (ETTh1, horizon = 96):

  • DLinear: 0.384 → 0.374 MSE
  • iTransformer: 0.387 → 0.387 MSE (parity or marginal change)
  • S-Mamba: 0.388 → 0.386 MSE

Key Findings

  • C3RL yields larger gains for CI models than CM models—CI models inherently lack cross-channel information, which C3RL supplies through contrastive learning.
  • The adaptive weight \(\lambda\) automatically balances the contrastive and forecasting losses across different datasets.
  • Even simple linear models (DLinear, RLinear) benefit from the framework.

Highlights & Insights

  • The insight that "transpose equals positive pair" is remarkably concise and elegant—no complex augmentation strategy is needed, as time series data inherently provides two complementary views.
  • The plug-and-play design confers strong generality, making the framework applicable to any CI or CM model.
  • Enhancing representations via contrastive learning, rather than solely optimizing for prediction, improves generalization.

Limitations & Future Work

  • The Siamese encoder increases parameter count and training time (approximately 1.5–2×).
  • Adapting the framework to the ECI (explicit channel independence) strategy requires additional handling.
  • The temperature and weighting hyperparameters of the contrastive loss require tuning.

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of transposed views and SimSiam is novel and intuitive.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 7 models, 9 datasets, and comprehensive ablations.
  • Writing Quality: ⭐⭐⭐⭐ — Motivation is clearly articulated with well-structured figures.
  • Value: ⭐⭐⭐⭐ — A general-purpose enhancement framework with potential applicability to any time series model.