Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models¶
Conference: ICLR 2026 arXiv: 2603.10071 Code: Not released Area: Time Series / Interpretability Keywords: Sparse Autoencoder, Time Series Foundation Model, mechanistic interpretability, Chronos-T5, Causal Ablation, Feature Hierarchy
TL;DR¶
This work is the first to apply Sparse Autoencoders (SAEs) to a time series foundation model (Chronos-T5-Large), revealing a depth-dependent feature hierarchy through 392 causal ablation experiments: mid-layer encoders concentrate causally critical change-point detection features, whereas the semantically richest final encoder layer exhibits the lowest causal importance.
Background & Motivation¶
Background: Time series foundation models such as Chronos-T5, TimesFM, MOMENT, and Moirai demonstrate strong zero-shot forecasting performance, yet their internal representations have never been examined at the mechanistic level.
Limitations of Prior Work: SAEs have been successfully applied in NLP to decompose dense, superposed activations of language models into interpretable features (Bricken et al., 2023; Templeton et al., 2024), and circuit analysis has identified interpretable computational subgraphs. In contrast, time series interpretability research remains confined to post-hoc methods such as saliency maps, perturbation-based explanations, counterfactual approaches, and concept-based frameworks; only Kalnāre et al. (2025) conducted a preliminary mechanistic analysis on small classifiers, and no prior work has examined foundation models.
Key Challenge: The ingredients for such an analysis are already in place: the T5 architecture underlying Chronos is mature, SAE training protocols are well established, and Chronos's discrete tokenization (4,096 bins) provides a natural unit of analysis. Yet mechanistic interpretability tools have not been applied in this domain.
Goal: This paper investigates whether SAE-learned features are causally relevant, whether a hierarchical structure exists across layers, and whether semantic richness is consistent with causal importance.
Method¶
Overall Architecture¶
SAEs with TopK sparsity are trained at six extraction points within Chronos-T5-Large (710M parameters, 24 encoder layers + 24 decoder layers, \(d_{\text{model}}=1024\)). A feature taxonomy is constructed using synthetic data, causal ablation is validated on real ETT data, and a correspondence between semantic labels and causal importance is established for each feature.
Key Design 1: TopK Sparse Autoencoder¶
- Function: An SAE is trained on residual stream activations at each extraction point to decompose dense activations into sparse, interpretable features.
- Mechanism: Given activation \(\mathbf{x} \in \mathbb{R}^{d_{\text{model}}}\), the SAE computes \(\mathbf{z} = \text{TopK}(\mathbf{W}_{\text{enc}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{enc}}, k)\), retaining only the \(k=64\) largest activations, and reconstructs \(\hat{\mathbf{x}} = \mathbf{W}_{\text{dec}}\mathbf{z} + \mathbf{b}_{\text{dec}}\).
- Design Motivation: TopK provides more direct sparsity control than L1 regularization; the expansion factor \(d_{\text{sae}} = 8 \times d_{\text{model}} = 8192\) provides sufficient capacity to decompose superposed features; periodic resampling of dead features ensures utilization.
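A minimal PyTorch sketch of the TopK SAE described above. No official code is released, so the class and variable names are illustrative; shapes follow the paper (\(d_{\text{model}}=1024\), expansion factor 8, \(k=64\)):

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """TopK sparse autoencoder over residual-stream activations.

    Illustrative sketch: d_model=1024, expansion factor 8 -> d_sae=8192, k=64.
    """
    def __init__(self, d_model: int = 1024, expansion: int = 8, k: int = 64):
        super().__init__()
        d_sae = expansion * d_model
        self.k = k
        self.W_enc = nn.Linear(d_model, d_sae)   # bias plays the role of b_enc
        self.W_dec = nn.Linear(d_sae, d_model)   # bias plays the role of b_dec
        self.b_dec = self.W_dec.bias             # shared pre-encoder centering bias

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = self.W_enc(x - self.b_dec)          # W_enc(x - b_dec) + b_enc
        topk = torch.topk(pre, self.k, dim=-1)    # keep the k largest activations
        z = torch.zeros_like(pre)
        z.scatter_(-1, topk.indices, topk.values) # zero all other coordinates
        return z

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.W_dec(self.encode(x))          # x_hat = W_dec z + b_dec
```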
Key Design 2: Six-Point Hierarchical Activation Extraction¶
- Function: Forward hooks are registered at six positions: encoder layers 5, 11, and 23; decoder layer 11 (both the residual stream and the cross-attention output); and decoder layer 23.
- Mechanism: Early, middle, and late representative encoder layers are selected along with corresponding decoder layers, covering the full processing pipeline from input encoding to prediction generation.
- Design Motivation: Research on language models has shown that layers at different depths serve distinct functions; whether a similar hierarchical structure exists in time series models is a central research question of this paper.
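A sketch of how the six extraction points could be hooked. The module paths assume the standard HuggingFace T5 layout that Chronos-T5 builds on (`encoder.block[i]` for blocks, `decoder.block[11].layer[1]` for the cross-attention sublayer); these paths are assumptions, not taken from released code:

```python
from collections import defaultdict

def register_extraction_hooks(model):
    """Cache residual-stream activations at the six extraction points."""
    cache = defaultdict(list)

    def make_hook(name):
        def hook(module, inputs, output):
            # T5 blocks return tuples; the hidden states come first.
            hidden = output[0] if isinstance(output, tuple) else output
            cache[name].append(hidden.detach().cpu())
        return hook

    # Assumed module paths for a HuggingFace-style T5 stack.
    points = {
        "enc_5":        model.encoder.block[5],
        "enc_11":       model.encoder.block[11],
        "enc_23":       model.encoder.block[23],
        "dec_11":       model.decoder.block[11],            # residual stream
        "dec_11_xattn": model.decoder.block[11].layer[1],   # cross-attention output
        "dec_23":       model.decoder.block[23],
    }
    handles = [mod.register_forward_hook(make_hook(name))
               for name, mod in points.items()]
    return cache, handles
```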
Key Design 3: Dual-Source Feature Taxonomy¶
- Function: A synthetic diagnostic dataset (containing known properties such as trends, seasonality, change points, frequency sweeps, and heteroscedastic noise) is used to assign each SAE feature one of 11 temporal concept labels.
- Mechanism: The Pearson correlation between each feature's activation pattern on synthetic data and the ground-truth attribute of each diagnostic category is computed; features whose maximum correlation falls below a threshold are labeled as unknown.
- Design Motivation: Synthetic data provides ground-truth temporal attributes, avoiding the ambiguity of labeling on real data; the 11 categories cover core temporal concepts including trend, seasonality, change points, frequency, volatility, and noise.
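The labeling step reduces to a feature-by-category Pearson correlation matrix. A sketch assuming per-window mean feature activations and scalar ground-truth attribute strengths are available; the correlation threshold here is illustrative, as the summary above does not state the paper's value:

```python
import numpy as np

def label_features(acts, attrs, categories, threshold=0.5):
    """Assign each SAE feature the temporal-concept label it correlates with most.

    acts:  (n_windows, d_sae) mean feature activations on synthetic diagnostics.
    attrs: (n_windows, n_categories) ground-truth attribute strength per window.
    threshold: illustrative cutoff; features below it are labeled "unknown".
    """
    # Column-standardize so correlations become a single matrix product / n.
    a = (acts - acts.mean(0)) / (acts.std(0) + 1e-8)
    b = (attrs - attrs.mean(0)) / (attrs.std(0) + 1e-8)
    corr = a.T @ b / len(a)                     # (d_sae, n_categories) Pearson r
    best = corr.argmax(1)
    labels = [categories[j] if corr[i, j] >= threshold else "unknown"
              for i, j in enumerate(best)]
    return labels, corr
```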
Key Design 4: Single-Feature and Progressive Causal Ablation¶
- Function: Single-feature ablation zeros out the sparse code of a given feature and measures the resulting change in CRPS; progressive ablation cumulatively removes 1–64 features ranked by their decoder norm contribution.
- Mechanism: \(\Delta\text{CRPS}_j = \text{CRPS}_{\text{ablated}} - \text{CRPS}_{\text{original}}\); a positive value indicates that the feature carries information necessary for model prediction. Progressive ablation further reveals differences in robustness to feature removal across layers.
- Design Motivation: Ablation establishes causal relationships directly, as opposed to correlation analysis; progressive ablation additionally distinguishes features that are "useful but redundant" from those that are "irreplaceable."
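A sketch of single-feature ablation implemented as a forward hook that splices the SAE reconstruction, with one feature's sparse code zeroed, back into the residual stream. The hook mechanics mirror the extraction hooks above; the ranking rule in the trailing comment is one plausible reading of "decoder norm contribution", not a confirmed detail:

```python
import torch

def ablate_feature_hook(sae, feature_idx):
    """Replace the residual stream with the SAE reconstruction after
    zeroing one feature's sparse code (single-feature causal ablation)."""
    @torch.no_grad()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        z = sae.encode(hidden)
        z[..., feature_idx] = 0.0            # knock out the target feature
        patched = sae.W_dec(z)               # reconstruct without it
        if isinstance(output, tuple):
            return (patched,) + output[1:]
        return patched
    return hook

# Delta CRPS for feature j: positive -> the feature carried useful information.
#   delta_crps_j = crps(forecast_ablated_j, target) - crps(forecast_original, target)
#
# Progressive ablation: one plausible ranking by "decoder norm contribution" is
# mean |z_j| times the decoder column norm of feature j:
#   scores = z.abs().mean(dim=(0, 1)) * sae.W_dec.weight.norm(dim=0)
#   order = scores.argsort(descending=True)   # ablate top-1, top-2, ..., top-64
```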
Loss & Training¶
SAEs are trained for 50,000 steps using MSE reconstruction loss with the Adam optimizer (learning rate \(3 \times 10^{-4}\), cosine decay). Ablation experiments are conducted on the ETT benchmark using 256 evaluation windows, a prediction length of 64, and 4 forecast samples per window; an extended experiment on the final encoder layer uses 1,024 windows, 8 samples, and 200 features.
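A minimal training loop consistent with the stated hyperparameters (MSE reconstruction loss, Adam at \(3 \times 10^{-4}\) with cosine decay, 50,000 steps). Dead-feature resampling, mentioned under Key Design 1, is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def train_sae(sae, activation_loader, steps=50_000, lr=3e-4, device="cuda"):
    """Train an SAE on cached activations with MSE loss, Adam, cosine decay."""
    sae.to(device)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
    it = iter(activation_loader)
    for step in range(steps):
        try:
            x = next(it)
        except StopIteration:          # cycle the activation buffer
            it = iter(activation_loader)
            x = next(it)
        x = x.to(device)
        loss = F.mse_loss(sae(x), x)   # pure reconstruction objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    return sae
```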
Key Experimental Results¶
Table 1: Single-Feature Ablation Summary¶
| Layer | # Features | Mean \(\Delta\)CRPS | Median \(\Delta\)CRPS | Max \(\Delta\)CRPS | % Positive | Max/Median |
|---|---|---|---|---|---|---|
| Encoder Block 5 | 64 | 3.05 | 0.95 | 26.32 | 100% | 27.7× |
| Encoder Block 11 | 64 | 5.15 | 1.26 | 38.61 | 100% | 30.5× |
| Encoder Block 23 | 64 | 3.73 | 2.98 | 11.65 | 100% | 3.9× |
| Encoder Block 23† | 200 | 2.37 | 2.37 | 2.44 | 100% | 1.03× |

† Extended configuration on the final encoder layer: 1,024 windows, 8 forecast samples, 200 features.
All 392 ablations produce positive \(\Delta\)CRPS, confirming that every tested feature is causally relevant. The mid-layer encoder (Block 11) exhibits the greatest causal influence (max \(\Delta\)CRPS = 38.61) with a strongly right-skewed distribution.
Table 2: Feature Taxonomy Distribution by Layer (Selected)¶
| Concept | Enc 5 | Enc 11 | Enc 23 |
|---|---|---|---|
| Seasonality | 12 | 45 | 1,439 |
| Level shift ↑ | 66 | 1,024 | 1,097 |
| High frequency | 97 | 91 | 668 |
| Noise | 32 | 413 | 315 |
| Label coverage | 4.9% | 25.8% | 59.8% |
The final encoder layer is semantically richest (59.8% label coverage), while the mid-layer encoder concentrates change-point detection features (1,024 level_shift_up features).
Key Findings from Progressive Ablation¶
- Block 11: CRPS rises sharply from 2.61 to 25.32 (catastrophic degradation).
- Block 5: CRPS rises from 7.05 to 21.54.
- Block 23: CRPS decreases from 3.62 to 2.73 (an improvement of 0.89); extended experiments confirm this trend is stable.
Highlights & Insights¶
- Pioneering contribution: This is the first application of SAEs to a time series foundation model, successfully transferring mechanistic interpretability methodology from NLP.
- Counter-intuitive finding: Causal importance is inversely correlated with semantic richness—mid-layer encoders are causally most critical yet semantically sparse, while the final layer is semantically richest yet yields improved performance upon ablation.
- 100% causal validation rate: All 392 ablations degrade CRPS (positive \(\Delta\)CRPS), providing strong evidence for the causal relevance of SAE features.
- Change-point detection as a core mechanism: Chronos-T5 relies primarily on change-point dynamics rather than periodic pattern recognition, offering guidance for model understanding and improvement.
- Plausible explanation for the final-layer ablation paradox: The final encoder layer likely encodes cross-domain generalization features, and ablation on a specific dataset may function as implicit domain adaptation.
Limitations & Future Work¶
- Limited dataset coverage: Causal ablation is conducted exclusively on ETT data; whether findings generalize to other time series domains remains unknown.
- Low taxonomy coverage: 82.8% of features receive no label, and decoder-side coverage is below 6%, indicating that the feature taxonomy remains coarse.
- Single model analysis: Only Chronos-T5-Large is studied; cross-architecture comparisons (e.g., TimesFM, MOMENT) are absent.
- Limited statistical precision in ablation configuration: The fast configuration (256 windows, 4 samples) provides directional conclusions only, with insufficient quantitative precision.
- Absence of circuit-level analysis: Only feature-level ablation is performed; the connectivity and computational graph structure among features are not examined.
Related Work & Insights¶
- Time series foundation models: Chronos-T5 (Ansari et al., 2024), TimesFM (Das et al., 2024), MOMENT (Goswami et al., 2024), Moirai (Woo et al., 2024).
- SAEs and mechanistic interpretability: Bricken et al. (2023) first applied SAEs to decompose language models; Cunningham et al. (2023) showed that SAE features in language models are highly interpretable; Gao et al. (2024) proposed TopK SAEs; Templeton et al. (2024) scaled SAEs to Claude 3 Sonnet.
- Time series interpretability: Saliency maps (Zhao et al., 2023), perturbation-based explanations (Enguehard, 2023; Liu et al., 2024), counterfactuals (Yan & Wang, 2023), concept-based frameworks (van Sprang et al., 2024).
- Mechanistic analysis of time series models: Kalnāre et al. (2025) conducted a preliminary mechanistic analysis on small classifiers; this paper is the first to extend such analysis to foundation models.
Rating¶
- Novelty: ⭐⭐⭐⭐ — First transfer of SAE methodology from NLP to time series foundation models; a pioneering contribution.
- Experimental Thoroughness: ⭐⭐⭐ — 392 ablations are compelling but limited to ETT data, a single model, and incomplete taxonomy coverage.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure, well-articulated counter-intuitive findings, and well-designed figures and tables.
- Value: ⭐⭐⭐⭐ — Opens a new direction for mechanistic interpretability of time series models; findings offer actionable guidance for model design and compression.