Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models¶
Conference: ICLR 2026 arXiv: 2603.10071 Code: Not released Area: Time Series / Interpretability Keywords: Sparse Autoencoder, Time Series Foundation Model, mechanistic interpretability, Chronos-T5, Causal Ablation, Feature Hierarchy
TL;DR¶
This work is the first to apply Sparse Autoencoders (SAEs) to a time series foundation model (Chronos-T5-Large), revealing a depth-dependent feature hierarchy through 392 causal ablation experiments: mid-layer encoders concentrate causally critical change-point detection features, whereas the semantically richest final encoder layer exhibits the lowest causal importance.
Background & Motivation¶
Background: Time series foundation models such as Chronos-T5, TimesFM, MOMENT, and Moirai demonstrate strong zero-shot forecasting performance, yet their internal representations have never been examined at the mechanistic level.
Limitations of Prior Work: SAEs have been successfully applied in NLP to decompose dense, superposed activations of language models into interpretable features (Bricken et al., 2023; Templeton et al., 2024), and circuit analysis has identified interpretable computational subgraphs. In contrast, time series interpretability research remains confined to post-hoc methods such as saliency maps, perturbation-based explanations, counterfactual approaches, and concept-based frameworks; only Kalnāre et al. (2025) conducted a preliminary mechanistic analysis on small classifiers, and no prior work has examined foundation models.
Key Challenge: The ingredients for such an analysis are already in place: the T5 architecture underlying Chronos is mature, SAE training protocols are well established, and Chronos's discrete tokenization (4,096 bins) provides a natural unit of analysis. Yet mechanistic interpretability tools have not been applied in this domain.
Goal: This paper investigates whether SAE-learned features are causally relevant, whether a hierarchical structure exists across layers, and whether semantic richness is consistent with causal importance.
Method¶
Overall Architecture¶
SAEs with TopK sparsity are trained at six extraction points within Chronos-T5-Large (710M parameters, 24 encoder layers + 24 decoder layers, \(d_{\text{model}}=1024\)). A feature taxonomy is constructed using synthetic data, causal ablation is validated on real ETT data, and a correspondence between semantic labels and causal importance is established for each feature.
Key Design 1: TopK Sparse Autoencoder¶
- Function: An SAE is trained on residual stream activations at each extraction point to decompose dense activations into sparse, interpretable features.
- Mechanism: Given activation \(\mathbf{x} \in \mathbb{R}^{d_{\text{model}}}\), the SAE computes \(\mathbf{z} = \text{TopK}(\mathbf{W}_{\text{enc}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{enc}}, k)\), retaining only the \(k=64\) largest activations, and reconstructs \(\hat{\mathbf{x}} = \mathbf{W}_{\text{dec}}\mathbf{z} + \mathbf{b}_{\text{dec}}\).
- Design Motivation: TopK provides more direct sparsity control than L1 regularization; the expansion factor \(d_{\text{sae}} = 8 \times d_{\text{model}} = 8192\) provides sufficient capacity to decompose superposed features; periodic resampling of dead features ensures utilization.
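A minimal PyTorch sketch of the TopK SAE described above. No official code is released, so the class and variable names are illustrative; shapes follow the paper (\(d_{\text{model}}=1024\), expansion factor 8, \(k=64\)):

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """TopK sparse autoencoder over residual-stream activations.

    Illustrative sketch: d_model=1024, expansion factor 8 -> d_sae=8192, k=64.
    """
    def __init__(self, d_model: int = 1024, expansion: int = 8, k: int = 64):
        super().__init__()
        d_sae = expansion * d_model
        self.k = k
        self.W_enc = nn.Linear(d_model, d_sae)   # bias plays the role of b_enc
        self.W_dec = nn.Linear(d_sae, d_model)   # bias plays the role of b_dec
        self.b_dec = self.W_dec.bias             # shared pre-encoder centering bias

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = self.W_enc(x - self.b_dec)          # W_enc(x - b_dec) + b_enc
        topk = torch.topk(pre, self.k, dim=-1)    # keep the k largest activations
        z = torch.zeros_like(pre)
        z.scatter_(-1, topk.indices, topk.values) # zero all other coordinates
        return z

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.W_dec(self.encode(x))          # x_hat = W_dec z + b_dec
```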
Key Design 2: Six-Point Hierarchical Activation Extraction¶
- Function: Forward hooks are registered at six positions: encoder layers 5, 11, and 23; decoder layer 11 (both the residual stream and the cross-attention output); and decoder layer 23.
- Mechanism: Early, middle, and late representative encoder layers are selected along with corresponding decoder layers, covering the full processing pipeline from input encoding to prediction generation.
- Design Motivation: Research on language models has shown that layers at different depths serve distinct functions; whether a similar hierarchical structure exists in time series models is a central research question of this paper.
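A sketch of how the six extraction points could be hooked. The module paths assume the standard HuggingFace T5 layout that Chronos-T5 builds on (`encoder.block[i]` for blocks, `decoder.block[11].layer[1]` for the cross-attention sublayer); these paths are assumptions, not taken from released code:

```python
from collections import defaultdict

def register_extraction_hooks(model):
    """Cache residual-stream activations at the six extraction points."""
    cache = defaultdict(list)

    def make_hook(name):
        def hook(module, inputs, output):
            # T5 blocks return tuples; the hidden states come first.
            hidden = output[0] if isinstance(output, tuple) else output
            cache[name].append(hidden.detach().cpu())
        return hook

    # Assumed module paths for a HuggingFace-style T5 stack.
    points = {
        "enc_5":        model.encoder.block[5],
        "enc_11":       model.encoder.block[11],
        "enc_23":       model.encoder.block[23],
        "dec_11":       model.decoder.block[11],            # residual stream
        "dec_11_xattn": model.decoder.block[11].layer[1],   # cross-attention output
        "dec_23":       model.decoder.block[23],
    }
    handles = [mod.register_forward_hook(make_hook(name))
               for name, mod in points.items()]
    return cache, handles
```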
Key Design 3: Dual-Source Feature Taxonomy¶
- Function: A synthetic diagnostic dataset (containing known properties such as trends, seasonality, change points, frequency sweeps, and heteroscedastic noise) is used to assign each SAE feature one of 11 temporal concept labels.
- Mechanism: The Pearson correlation between each feature's activation pattern on synthetic data and the ground-truth attribute of each diagnostic category is computed; features whose maximum correlation falls below a threshold are labeled as unknown.
- Design Motivation: Synthetic data provides ground-truth temporal attributes, avoiding the ambiguity of labeling on real data; the 11 categories cover core temporal concepts including trend, seasonality, change points, frequency, volatility, and noise.
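The labeling step reduces to a feature-by-category Pearson correlation matrix. A sketch assuming per-window mean feature activations and scalar ground-truth attribute strengths are available; the correlation threshold here is illustrative, as the summary above does not state the paper's value:

```python
import numpy as np

def label_features(acts, attrs, categories, threshold=0.5):
    """Assign each SAE feature the temporal-concept label it correlates with most.

    acts:  (n_windows, d_sae) mean feature activations on synthetic diagnostics.
    attrs: (n_windows, n_categories) ground-truth attribute strength per window.
    threshold: illustrative cutoff; features below it are labeled "unknown".
    """
    # Column-standardize so correlations become a single matrix product / n.
    a = (acts - acts.mean(0)) / (acts.std(0) + 1e-8)
    b = (attrs - attrs.mean(0)) / (attrs.std(0) + 1e-8)
    corr = a.T @ b / len(a)                     # (d_sae, n_categories) Pearson r
    best = corr.argmax(1)
    labels = [categories[j] if corr[i, j] >= threshold else "unknown"
              for i, j in enumerate(best)]
    return labels, corr
```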
Key Design 4: Single-Feature and Progressive Causal Ablation¶
- Function: Single-feature ablation zeros out the sparse code of a given feature and measures the resulting change in CRPS; progressive ablation cumulatively removes 1–64 features ranked by their decoder norm contribution.
- Mechanism: \(\Delta\text{CRPS}_j = \text{CRPS}_{\text{ablated}} - \text{CRPS}_{\text{original}}\); a positive value indicates that the feature carries information necessary for model prediction. Progressive ablation further reveals differences in robustness to feature removal across layers.
- Design Motivation: Ablation establishes causal relationships directly, as opposed to correlation analysis; progressive ablation additionally distinguishes features that are "useful but redundant" from those that are "irreplaceable."
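A sketch of single-feature ablation implemented as a forward hook that splices the SAE reconstruction, with one feature's sparse code zeroed, back into the residual stream. The hook mechanics mirror the extraction hooks above; the ranking rule in the trailing comment is one plausible reading of "decoder norm contribution", not a confirmed detail:

```python
import torch

def ablate_feature_hook(sae, feature_idx):
    """Replace the residual stream with the SAE reconstruction after
    zeroing one feature's sparse code (single-feature causal ablation)."""
    @torch.no_grad()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        z = sae.encode(hidden)
        z[..., feature_idx] = 0.0            # knock out the target feature
        patched = sae.W_dec(z)               # reconstruct without it
        if isinstance(output, tuple):
            return (patched,) + output[1:]
        return patched
    return hook

# Delta CRPS for feature j: positive -> the feature carried useful information.
#   delta_crps_j = crps(forecast_ablated_j, target) - crps(forecast_original, target)
#
# Progressive ablation: one plausible ranking by "decoder norm contribution" is
# mean |z_j| times the decoder column norm of feature j:
#   scores = z.abs().mean(dim=(0, 1)) * sae.W_dec.weight.norm(dim=0)
#   order = scores.argsort(descending=True)   # ablate top-1, top-2, ..., top-64
```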
Loss & Training¶
SAEs are trained for 50,000 steps using MSE reconstruction loss with the Adam optimizer (learning rate \(3 \times 10^{-4}\), cosine decay). Ablation experiments are conducted on the ETT benchmark using 256 evaluation windows, a prediction length of 64, and 4 forecast samples per window; an extended experiment on the final encoder layer uses 1,024 windows, 8 samples, and 200 features.
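A minimal training loop consistent with the stated hyperparameters (MSE reconstruction loss, Adam at \(3 \times 10^{-4}\) with cosine decay, 50,000 steps). Dead-feature resampling, mentioned under Key Design 1, is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def train_sae(sae, activation_loader, steps=50_000, lr=3e-4, device="cuda"):
    """Train an SAE on cached activations with MSE loss, Adam, cosine decay."""
    sae.to(device)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
    it = iter(activation_loader)
    for step in range(steps):
        try:
            x = next(it)
        except StopIteration:          # cycle the activation buffer
            it = iter(activation_loader)
            x = next(it)
        x = x.to(device)
        loss = F.mse_loss(sae(x), x)   # pure reconstruction objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    return sae
```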
Key Experimental Results¶
Table 1: Single-Feature Ablation Summary¶
| Layer | # Features | Mean \(\Delta\)CRPS | Median \(\Delta\)CRPS | Max \(\Delta\)CRPS | % Positive | Max/Median |
|---|---|---|---|---|---|---|
| Encoder Block 5 | 64 | 3.05 | 0.95 | 26.32 | 100% | 27.7× |
| Encoder Block 11 | 64 | 5.15 | 1.26 | 38.61 | 100% | 30.5× |
| Encoder Block 23 | 64 | 3.73 | 2.98 | 11.65 | 100% | 3.9× |
| Encoder Block 23† | 200 | 2.37 | 2.37 | 2.44 | 100% | 1.03× |

† Extended configuration on the final encoder layer: 1,024 windows, 8 forecast samples, 200 features.
All 392 ablations produce positive \(\Delta\)CRPS, confirming that every tested feature is causally relevant. The mid-layer encoder (Block 11) exhibits the greatest causal influence (max \(\Delta\)CRPS = 38.61) with a strongly right-skewed distribution.
Table 2: Feature Taxonomy Distribution by Layer (Selected)¶
| Concept | Enc 5 | Enc 11 | Enc 23 |
|---|---|---|---|
| Seasonality | 12 | 45 | 1,439 |
| Level shift ↑ | 66 | 1,024 | 1,097 |
| High frequency | 97 | 91 | 668 |
| Noise | 32 | 413 | 315 |
| Label coverage | 4.9% | 25.8% | 59.8% |
The final encoder layer is semantically richest (59.8% label coverage), while the mid-layer encoder concentrates change-point detection features (1,024 level_shift_up features).
Key Findings from Progressive Ablation¶
- Block 11: CRPS rises sharply from 2.61 to 25.32 (catastrophic degradation).
- Block 5: CRPS rises from 7.05 to 21.54.
- Block 23: CRPS decreases from 3.62 to 2.73 (an improvement of 0.89); extended experiments confirm this trend is stable.
Highlights & Insights¶
- Pioneering contribution: This is the first application of SAEs to a time series foundation model, successfully transferring mechanistic interpretability methodology from NLP.
- Counter-intuitive finding: Causal importance is inversely correlated with semantic richness—mid-layer encoders are causally most critical yet semantically sparse, while the final layer is semantically richest yet yields improved performance upon ablation.
- 100% causal validation rate: All 392 ablations degrade CRPS (positive \(\Delta\)CRPS), providing strong evidence for the causal relevance of SAE features.
- Change-point detection as a core mechanism: Chronos-T5 relies primarily on change-point dynamics rather than periodic pattern recognition, offering guidance for model understanding and improvement.
- Plausible explanation for the final-layer ablation paradox: The final encoder layer likely encodes cross-domain generalization features, and ablation on a specific dataset may function as implicit domain adaptation.
Limitations & Future Work¶
- Limited dataset coverage: Causal ablation is conducted exclusively on ETT data; whether findings generalize to other time series domains remains unknown.
- Low taxonomy coverage: 82.8% of features receive no label, and decoder-side coverage is below 6%, indicating that the feature taxonomy remains coarse.
- Single model analysis: Only Chronos-T5-Large is studied; cross-architecture comparisons (e.g., TimesFM, MOMENT) are absent.
- Limited statistical precision in ablation configuration: The fast configuration (256 windows, 4 samples) provides directional conclusions only, with insufficient quantitative precision.
- Absence of circuit-level analysis: Only feature-level ablation is performed; the connectivity and computational graph structure among features are not examined.
Related Work & Insights¶
- Time series foundation models: Chronos-T5 (Ansari et al., 2024), TimesFM (Das et al., 2024), MOMENT (Goswami et al., 2024), Moirai (Woo et al., 2024).
- SAEs and mechanistic interpretability: Bricken et al. (2023) first applied SAEs to decompose language models; Cunningham et al. (2023) showed that SAE features in language models are highly interpretable; Gao et al. (2024) proposed TopK SAEs; Templeton et al. (2024) scaled SAEs to Claude 3 Sonnet.
- Time series interpretability: Saliency maps (Zhao et al., 2023), perturbation-based explanations (Enguehard, 2023; Liu et al., 2024), counterfactuals (Yan & Wang, 2023), concept-based frameworks (van Sprang et al., 2024).
- Mechanistic analysis of time series models: Kalnāre et al. (2025) conducted a preliminary mechanistic analysis on small classifiers; this paper is the first to extend such analysis to foundation models.
Rating¶
- Novelty: ⭐⭐⭐⭐ — First transfer of SAE methodology from NLP to time series foundation models; a pioneering contribution.
- Experimental Thoroughness: ⭐⭐⭐ — 392 ablations are compelling but limited to ETT data, a single model, and incomplete taxonomy coverage.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure, well-articulated counter-intuitive findings, and well-designed figures and tables.
- Value: ⭐⭐⭐⭐ — Opens a new direction for mechanistic interpretability of time series models; findings offer actionable guidance for model design and compression.