Temporal Superposition and Feature Geometry of RNNs under Memory Demands¶

Conference: ICLR 2026
OpenReview: https://openreview.net/forum?id=7cMzTpbJHC
Code: https://github.com/kashparty/iclr-rnn-superposition
Area: Interpretability / Representational Geometry / Recurrent Networks
Keywords: Temporal Superposition, Feature Geometry, RNN, Superposition Hypothesis, Working Memory

TL;DR¶

This paper extends "feature superposition" from feedforward networks to the temporal dimension, proposing temporal superposition. The authors train linear and nonlinear RNNs on \(k\)-delay recall tasks, analytically decompose the loss into four terms, identify a ReLU-induced "interference-free space," and characterize a phase transition between dense and sparse mechanisms. Together, these results give a mechanistic account of why and how RNNs choose different representational geometries under memory pressure.

Background & Motivation¶

Background: In the field of interpretability, the "superposition hypothesis" (Elhage et al., 2022) has become the mainstream framework for understanding polysemanticity in neural networks. When the number of features in data exceeds the number of neurons and features are sparse (rarely co-occurring), networks pack features non-orthogonally into the activation space—representing more features with fewer dimensions at the cost of interference. This "spatial superposition" has been repeatedly verified in feedforward/Transformer models, with much mechanistic interpretability work focusing on extracting individual features from superposition.

Limitations of Prior Work: Previous studies on superposition have focused almost exclusively on the spatial dimension (number of features vs. neurons), completely ignoring time (memory) as an additional capacity constraint. While time is not a bottleneck in feedforward or Transformer models, recurrent structures naturally impose memory pressure: the RNN hidden state is a fixed-size bottleneck that must simultaneously accommodate "current input" and "past inputs that need to be retained for \(k\) steps." Once the memory window exceeds the hidden state dimension, the RNN is forced to either forget or compress features from multiple timesteps into the same low-dimensional space. This time-induced superposition has not been previously characterized.

Key Challenge: The hidden state dimension \(N_h\) is fixed, while the number of "task-relevant features" required by the task grows linearly with memory length \(k\). When \(k+1 > N_h\), the RNN must trade off between "retaining more features" and "reducing interference between features"—the root of superposition, though the interference patterns become more complex along the time axis.

Goal: (1) Formalize the concept of "temporal superposition"; (2) Elucidate how data sparsity, task memory demand \(k\), and network dimension jointly determine the representational geometry learned by RNNs; (3) Explain "why" these geometries are optimal.

Key Insight: The authors use a minimal but analytically tractable setup—linear recurrent RNNs, scalar inputs/outputs, 2D hidden states, and \(k\)-delay recall tasks—to derive closed-form loss solutions, then verify that these insights generalize to nonlinear RNNs and high-dimensional cases.

Core Idea: Each input feature is treated as having a dual identity: "what it is" (spatial direction) and "how long ago it appeared" (temporal component). Multiple feature directions \(w_s\) from the input history are superimposed in the hidden state, and the geometry learned by the RNN minimizes interference generated by the projections and combinations of these directions.

Method¶

Overall Architecture¶

Rather than proposing a new model, the paper establishes a theoretical analysis framework to explain the representational geometry of RNNs under memory pressure. The object of study is an RNN of the form:

\[h_t = W_x x_t + W_h \sigma_h(h_{t-1}), \quad \hat{y}_t = \sigma_y(W_y^\top h_t)\]

where \(\sigma_h\) and \(\sigma_y\) can each be the identity (linear) or ReLU. When \(\sigma_h\) is linear and \(h_0 = 0\), the hidden state expands into a weighted sum of historical inputs. For an input that arrived at timestep \(i\), the authors introduce the quantity \(s := t-i\), representing "how many recursions have passed," and define \(w_s := W_h^s W_x\) to rewrite the hidden state as:

\[h_t = \sum_{s=0}^{t-1} W_h^s W_x \, x_{t-s} = \sum_{s=0}^{t-1} w_s \, x_{t-s}\]

This rewrite is the pivot of the analysis: it shows that at time \(t\), each historical input \(x_{t-s}\) is independently and linearly represented in direction \(w_s\) (the "feature direction"). The readout vector \(w_y\) only responds to directions projecting onto it. For the \(k\)-delay recall task (requiring \(\hat{y}_t = x_{t-k}\)), \(w_{s=k}\) is the "output feature direction" (should align with \(w_y\)), \(w_s\ (0\le s<k)\) are "intermediate features" to be retained, and \(s>k\) are "historically irrelevant features" to be forgotten.
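The equivalence between the recurrence and the unrolled sum over feature directions can be checked numerically. The following is a minimal sketch, assuming a 2D hidden state, scalar inputs, an identity \(\sigma_h\), and \(h_0 = 0\); the weight values (a rotation by 40° scaled by 0.9) and the helper names are hypothetical and chosen only for illustration.

```python
import math

def matvec(W, v):
    """Multiply a small dense matrix (list of rows) by a vector (list)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def feature_directions(W_h, w_x, num_steps):
    """Return [w_0, w_1, ...] with w_s = W_h^s w_x."""
    directions = [list(w_x)]
    for _ in range(num_steps - 1):
        directions.append(matvec(W_h, directions[-1]))
    return directions

def run_recurrence(W_h, w_x, xs):
    """Iterate h_t = w_x * x_t + W_h h_{t-1} from h_0 = 0 (linear sigma_h)."""
    h = [0.0] * len(w_x)
    for x in xs:
        recurrent = matvec(W_h, h)
        h = [wx_i * x + r_i for wx_i, r_i in zip(w_x, recurrent)]
    return h

def run_unrolled(W_h, w_x, xs):
    """Evaluate h_t = sum_s w_s * x_{t-s} directly from the feature directions."""
    t = len(xs)
    w = feature_directions(W_h, w_x, t)
    h = [0.0] * len(w_x)
    for s in range(t):
        x_past = xs[t - 1 - s]  # xs[t-1] is x_t (s = 0); xs[0] is x_1 (s = t-1)
        h = [h_i + w_i * x_past for h_i, w_i in zip(h, w[s])]
    return h

# Hypothetical weights: rotation by 40 degrees, scaled by 0.9.
theta, rho = math.radians(40.0), 0.9
W_h = [[rho * math.cos(theta), -rho * math.sin(theta)],
       [rho * math.sin(theta),  rho * math.cos(theta)]]
w_x = [1.0, 0.0]
xs = [0.0, 1.5, 0.0, 0.0, -0.7, 2.0]

h_rec = run_recurrence(W_h, w_x, xs)
h_unr = run_unrolled(W_h, w_x, xs)
```

Both routes should give the same hidden state up to floating-point error, which is what lets each past input be read as living along its own direction \(w_s\).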

The analysis proceeds in four steps: ① Deriving the closed-form expected loss for the linear case and decomposing it into four terms; ② Adding ReLU to the readout to reveal the "interference-free space" under high sparsity; ③ Scanning sparsity to observe a dense-to-sparse phase transition; ④ Swapping scalar inputs for vectors and varying \(k\) to study the trade-off between spatial and temporal superposition.

Key Designs¶

1. Temporal Superposition and Feature Directions: Encoding "When" into Geometry

The core conceptual innovation is extending superposition from spatial dimensions to the temporal dimension. Spatial superposition refers to "non-orthogonal compression when features exceed neurons"; temporal superposition points out that in an RNN, the same input \(A\) is represented as different directions at different times. After an impulse of \(A\), as it "ages" (\(s\) increases), it moves along a sequence of feature directions \(w_{s=0}, w_{s=1}, \dots\) in the hidden state until it is no longer task-relevant. Representation is thus determined by both "when" and "what."

Since the hidden state dimension is fixed, a longer memory window (larger \(k\)) forces more feature directions into the hidden state, tightening the bottleneck. The paper distinguishes the two forms of superposition: temporal superposition represents features over longer time spans in a lower-dimensional activation space, while spatial superposition packs more input features into the space at the same time.

2. Four-Term Loss Decomposition: Explaining Geometric Strategies as Competitive Incentives

To understand what geometry the learning problem "rewards," the authors derive the closed-form expected loss for a linear RNN under temporal independence and sparsity assumptions:

\[\mathbb{E}[L] = \sum_t \Big( \underbrace{p\nu (w_y^\top w_{s=k}-1)^2}_{\text{Task Reward}} - \underbrace{2p^2\mu^2 \sum_{s\ne k} w_y^\top w_s}_{\text{Mean Correction}} + \underbrace{p\nu \sum_{s\ne k}(w_y^\top w_s)^2}_{\text{Projection Interference}} + \underbrace{p^2\mu^2 \sum_{s\ne s'}(w_y^\top w_s)(w_y^\top w_{s'})}_{\text{Compositional Interference}} \Big)\]

where \(p\) controls temporal sparsity (\(p \to 0\) is sparser), and \(\mu, \nu\) are the mean and variance. Each term corresponds to an incentive: Task Reward aligns the output feature \(w_{s=k}\) with the readout \(w_y\); Mean Correction allows the RNN to use projection interference to offset non-zero means; Projection Interference penalizes features at the wrong time (\(s \neq k\)) projecting onto the readout; Compositional Interference penalizes pairs of features whose projections onto \(w_y\) share a sign and rewards pairs whose projections have opposite signs, encouraging features to spread out, ideally forming antipodal pairs.
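The decomposition can be sanity-checked by brute force on a tiny example. The sketch below is illustrative only and rests on several assumptions not stated in this summary: the history is truncated to a short window, each input is either zero or a single fixed nonzero value, \(\nu\) is taken to be the second moment of that nonzero value, and the compositional sum runs over all ordered pairs \(s \ne s'\) (including \(k\)). The paper's exact conventions may differ. The list `a` holds the projections \(w_y^\top w_s\), and all function names are hypothetical.

```python
from itertools import product

def four_term_loss(a, k, p, mu, nu):
    """Per-timestep expected loss split into the four named terms.

    a[s] is the projection w_y^T w_s for s = 0 .. len(a) - 1.
    """
    others = [a_s for s, a_s in enumerate(a) if s != k]
    task_reward = p * nu * (a[k] - 1.0) ** 2
    mean_correction = -2.0 * p**2 * mu**2 * sum(others)
    projection = p * nu * sum(a_s**2 for a_s in others)
    # Sum over ordered pairs s != s' equals (sum a)^2 - sum a^2.
    pair_sum = sum(a) ** 2 - sum(a_s**2 for a_s in a)
    compositional = p**2 * mu**2 * pair_sum
    return task_reward, mean_correction, projection, compositional

def enumerated_loss(a, k, p, value):
    """Exact expectation of (x_{t-k} - y_hat)^2 over all on/off input patterns."""
    expected = 0.0
    for bits in product((0, 1), repeat=len(a)):
        prob = 1.0
        for b in bits:
            prob *= p if b else (1.0 - p)
        xs = [value * b for b in bits]  # xs[s] plays the role of x_{t-s}
        y_hat = sum(a_s * x for a_s, x in zip(a, xs))
        expected += prob * (xs[k] - y_hat) ** 2
    return expected

# Hypothetical projections and input statistics.
a = [0.3, -0.4, 0.9, 0.1, -0.05]
k, p, value = 2, 0.2, 1.5
terms = four_term_loss(a, k, p, mu=value, nu=value**2)
exact = enumerated_loss(a, k, p, value)
```

Under these assumptions `sum(terms)` and `exact` agree to floating-point precision, and inspecting the individual entries of `terms` shows how much each incentive contributes for a given geometry.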

3. Interference-Free Space: ReLU Readouts Carve Out a Half-Space for Intermediate Features

With ReLU readouts (\(\sigma_y = \text{ReLU}\)), the authors approximate the loss in the high temporal sparsity limit:

\[\mathbb{E}[L] \approx p\nu\Big( \sum_t (\text{ReLU}(w_y^\top w_{s=k})-1)^2 + \sum_t \sum_{s\ne k} \text{ReLU}(w_y^\top w_s)^2 \Big) + O(p^2)\]

The key shift is that ReLU only produces output for positive projections. Any feature direction falling in the opposite half-space of the readout \(w_y\) has its projection truncated to zero, contributing zero projection interference. This creates an "interference-free space" where the model is incentivized to pack as many intermediate feature directions \(w_{s\neq k}\) as possible.
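A small numerical illustration of the leading-order expression above: the sketch evaluates the per-timestep sparse-limit loss from a list of projections \(w_y^\top w_s\) and compares two hypothetical geometries that differ only in which side of the readout the intermediate features sit. The function names and the projection values are assumptions made for illustration, and the \(O(p^2)\) correction is ignored.

```python
def relu(z):
    """Rectified linear unit for a scalar."""
    return max(z, 0.0)

def sparse_relu_loss(a, k, p, nu):
    """Leading-order per-timestep loss with a ReLU readout.

    a[s] is the projection w_y^T w_s; only positive projections of
    features with s != k contribute interference.
    """
    task = (relu(a[k]) - 1.0) ** 2
    interference = sum(relu(a_s) ** 2 for s, a_s in enumerate(a) if s != k)
    return p * nu * (task + interference)

k, p, nu = 2, 0.05, 1.0
same_side = [0.5, 0.5, 1.0]        # intermediates project positively onto w_y
opposite_side = [-0.5, -0.5, 1.0]  # intermediates sit in the opposite half-space

loss_same = sparse_relu_loss(same_side, k, p, nu)
loss_opposite = sparse_relu_loss(opposite_side, k, p, nu)
```

With a linear readout both geometries would pay the same squared-projection penalty; with the ReLU readout only the first one does, which is the sense in which the opposite half-space is "free."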

4. Dense-Sparse Phase Transition and Spatial-Temporal Trade-offs

Scanning temporal sparsity reveals a phase transition in SSMs (linear recurrence with a ReLU readout) between two discrete mechanisms. In the dense mechanism (low sparsity), several inputs are often active within the memory window, so compositional interference dominates and the SSM degenerates into a spiral sink similar to that of a linear RNN. In the sparse mechanism, the SSM fully utilizes the interference-free space, spreading feature directions across \(\sim 270^{\circ}\) to avoid the readout.
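To make the two geometries concrete, the toy sketch below assumes a 2D recurrence of the form \(W_h = \rho R(\theta)\) (a rotation by \(\theta\) scaled by \(\rho\)), an input direction at angle 0, and a readout aligned with \(w_{s=k}\) and scaled so that \(w_y^\top w_{s=k} = 1\). This parametrization and the helper names are assumptions for illustration, not the paper's fitted solution. The code counts how many intermediate features (\(s<k\)) still project positively onto the readout when the total rotation \(k\theta\) is 90° versus 270°.

```python
import math

def readout_projections(k, total_angle_deg, rho, num_steps):
    """Projections w_y^T w_s for W_h = rho * R(theta) with k * theta fixed.

    With w_y aligned to w_k and scaled so the k-th projection is 1,
    the s-th projection is rho^(s - k) * cos((k - s) * theta).
    """
    theta = math.radians(total_angle_deg) / k
    return [rho ** (s - k) * math.cos((k - s) * theta) for s in range(num_steps)]

def count_interfering(a, k, tol=1e-9):
    """Number of intermediate features (s < k) with a positive projection."""
    return sum(1 for s, a_s in enumerate(a) if s < k and a_s > tol)

k, rho = 3, 0.8
dense_like = readout_projections(k, 90.0, rho, k + 1)
sparse_like = readout_projections(k, 270.0, rho, k + 1)

dense_count = count_interfering(dense_like, k)
sparse_count = count_interfering(sparse_like, k)
```

In this toy setting the 90° configuration leaves two of the three intermediate directions with positive projections, while the 270° configuration leaves none, so a ReLU readout would see no interference from them. Features with \(s>k\) are not counted here; in this parametrization they keep rotating and are only suppressed by \(\rho<1\).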

When \(k\) is varied with vector inputs, a trade-off between spatial and temporal superposition emerges. Nonlinear RNNs adopt an "all-or-none" strategy: a feature reduces the loss only if it is maintained for all \(k+1\) steps, so a feature that cannot be held that long is not represented at all.

Loss & Training¶

The task loss is the mean squared error for \(k\)-delay recall \(L = \sum_t \|y_t - \hat{y}_t\|^2\), where \(y_t = x_{t-k}\) (\(y_t=0\) for \(t \le k\)). \(k\) serves as a control parameter for memory duration, and \(p\) controls temporal sparsity. Theoretical analysis derives the expected loss under "temporal independence + sparsity" for linear (exact) and ReLU readout (sparse approximation) cases.
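The task itself is simple to generate. The following is a minimal sketch of a \(k\)-delay recall dataset with scalar inputs, assuming each input is nonzero with probability \(p\) and, when nonzero, is drawn uniformly from \([0, 1)\). The uniform distribution and the function names are assumptions for illustration; the summary does not specify the input distribution.

```python
import random

def make_delay_recall(length, k, p, rng):
    """Return (xs, ys) with ys[t] = xs[t - k], and ys[t] = 0 for the first k steps."""
    if not 0 <= k <= length:
        raise ValueError("k must satisfy 0 <= k <= length")
    # Assumed input distribution: zero with probability 1 - p, else Uniform[0, 1).
    xs = [rng.uniform(0.0, 1.0) if rng.random() < p else 0.0 for _ in range(length)]
    ys = [0.0] * k + xs[:length - k]
    return xs, ys

def sequence_loss(ys, y_hats):
    """Squared error summed over timesteps, L = sum_t (y_t - y_hat_t)^2."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))

rng = random.Random(0)
xs, ys = make_delay_recall(length=50, k=3, p=0.1, rng=rng)
baseline = sequence_loss(ys, [0.0] * len(ys))  # loss of always predicting zero
```

Lowering `p` makes the sequence sparser, and raising `k` lengthens the window over which each input has to be held before it is read out.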

Key Experimental Results¶

The paper focuses on theoretical and mechanistic analysis, using small RNNs to verify predicted geometric strategies rather than benchmarking.

Main Results: Theory vs. Learned Geometry¶

Model	Recurrence / Readout	Geometry in Sparse Regime	Forgetting Mechanism
Linear RNN	Linear / Linear	No interference-free space; features spiral to origin (Spiral Sink)	Smooth Forgetting
SSM	Linear / ReLU	Approximates packing largest directions into interference-free space	Intermediate
Nonlinear RNN	ReLU / ReLU	Accurately packs all \(k\) intermediate features into interference-free space	Sharp Forgetting

Measured loss curves match the four-term decomposition \(\mathbb{E}[L] = (i) + (ii) + (iii) + (iv)\) closely, showing a "staircase" learning dynamic (readout alignment first, then temporal separation).

Ablation Study¶

Analysis	Setting	Key Finding
Dense-Sparse Transition	Scan sparsity \(1-p\)	Angle \(k\theta\) jumps from \(\sim 90^{\circ}\) to \(\sim 270^{\circ}\); spectral radius \(\rho\) drops
High-dim Hidden State	\(N_x=10, N_h \in \{2,5,10\}\)	\(W_y^\top W_{s=k}\) diagonal is positive; others are negative/zero (packed in interference-free space)
Large-scale Validation	\(N_h=100, 75\) features, \(k=2\)	Optimal models group largest feature directions into interference-free space, matching predictions

Key Findings¶

Loss decomposition is the pivot: It predicts measured loss and maps geometric motivations (alignment, spreading, mean-offsetting) to specific terms.
Expressivity determines access to interference-free space: Nonlinear RNNs can achieve sharp forgetting by clearing features via ReLU in the hidden state, unlike the gradual shrinkage in linear RNNs.
Capacity-limited RNNs are "all-or-none": They prefer to fully retain an important feature for \(k+1\) steps rather than partially retaining multiple features.

Highlights & Insights¶

Time as a capacity constraint: Unlike prior work focusing only on spatial dimensions, this paper shows that memory length linearly consumes capacity, a perspective applicable to Mamba/SSM/Long-context models.
Interference-free space as a "clean" mechanism: A "free lunch" provided by nonlinearity, explaining why nonlinear RNNs favor sharp over smooth forgetting.
Transferable methodology: The "feature direction + interference decomposition" framework (\(\sum_s w_s x_{t-s}\)) can be applied to other representation geometry problems involving time axes.

Limitations & Future Work¶

Theoretical assumptions: Assumes temporally independent features and uses small RNNs; temporal sparsity may not hold for all tasks.
Task simplicity: Limited to \(k\)-delay recall; does not involve input manipulation/transformation or variable memory demands.
Linear representation hypothesis: Assumes features are directions in activation space; applicability to highly overparameterized LLMs remains an open question.

Related Work¶

vs. Elhage et al. (2022): Extends their spatial toy model to the temporal dimension, adding projection/compositional interference and phase transitions.
vs. Low-rank RNN theories (Mastrogiuseppe & Ostojic): Complements phase-portrait studies by explicitly focusing on "memory-induced capacity constraints."
vs. Working Memory theories: RNNs reflect both "slots" (directions) and "resources" (non-orthogonal interference in finite space).

Rating¶

Novelty: ⭐⭐⭐⭐⭐
Experimental Thoroughness: ⭐⭐⭐⭐
Writing Quality: ⭐⭐⭐⭐⭐
Value: ⭐⭐⭐⭐