Spatial-Aware Decision-Making with Ring Attractors in Reinforcement Learning Systems

Conference: NeurIPS 2025
arXiv: 2410.03119
Code: https://github.com/marcosaura/RA_RL
Area: Reinforcement Learning
Keywords: Ring Attractors, Biologically Inspired RL, Spatial Awareness, Action Space Encoding, Uncertainty Quantification

TL;DR

This paper integrates ring attractor models from neuroscience into action selection in deep reinforcement learning (DRL). By mapping actions to spatial positions on a ring and injecting Gaussian signals encoding Q-values and uncertainty, the proposed approach achieves a 53% improvement over baseline on Atari 100K.

Background & Motivation

Background: Fast and efficient action selection remains a core challenge in DRL, particularly in environments with spatial structure (e.g., coupled joint movements in robotic manipulation, adjacent directional actions in games). Existing methods treat actions as orthogonal, independent one-hot vectors, completely ignoring the topological relationships among actions.

Limitations of Prior Work:
- Standard DQN represents actions as orthogonal vectors, failing to reflect the fact that "left" and "upper-left" are closer to each other than "left" and "right."
- Existing spatially-aware methods (relational RL, cognitive maps, etc.) rely on complex architectures to implicitly learn spatial understanding from data, requiring large amounts of samples.
- Uncertainty quantification methods (e.g., Bootstrapped DQN) treat uncertainty as an independent module, without integrating it with spatial structure.

Key Challenge: Action spaces possess inherent topological structure, yet this structural information is entirely discarded in standard DRL.

Key Insight: The ring attractor circuit found in the central complex of Drosophila is an experimentally validated neural circuit capable of stably encoding directional and spatial information.

Core Idea: The ring attractor serves as the "brain" for action selection — Q-values are transformed into Gaussian input signals on the ring (amplitude = Q-value, angle = action direction, width = uncertainty), and excitatory-inhibitory dynamics select the optimal action.

Method

Overall Architecture

Two implementations are proposed: (1) Extrinsic model — a ring attractor implemented via a continuous-time recurrent neural network (CTRNN), serving as a post-processing module at the DQN output layer to replace the standard argmax for action selection; (2) Integrated model — a reusable RNN-based deep learning module embedded within the DRL agent for end-to-end training.

Key Designs

Ring Attractor Architecture (Touretzky Model):
- Function: \(N\) excitatory neurons arranged in a ring plus one central inhibitory neuron, with distance-decaying synaptic connections implementing local excitation and global inhibition.
- Core equation: Excitatory neuron dynamics \(\frac{dv_n}{dt} = \frac{f(x_n + \epsilon_n + \eta_n)}{\tau} - v_n\), where \(x_n\) is the external input, \(\epsilon_n\) is excitatory feedback, and \(\eta_n\) is inhibitory feedback.
- Synaptic weights decay with distance: \(w^{(E_m \rightarrow E_n)} = e^{-d^2_{(m,n)}}\)
- Design Motivation: The excitatory-inhibitory dynamics produce a winner-take-all effect — the activity peak stabilizes near the action with the highest Q-value, modulated by neighboring high-value actions, enabling smooth spatial reasoning.
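
The dynamics described above can be sketched with a simple Euler integration. This is a minimal illustration, not the authors' implementation: it assumes `f` is a ReLU, treats the central inhibitory neuron as an instantaneous pool of all excitatory activity, measures distance in neuron-index steps around the ring, and uses hypothetical gains (`exc_gain`, `inh_gain`) chosen only so that the toy network stays stable.

```python
import math

def circular_distance(m, n, num_neurons):
    """Shortest index distance between neurons m and n on the ring."""
    diff = abs(m - n) % num_neurons
    return min(diff, num_neurons - diff)

def simulate_ring(inputs, steps=200, dt=0.1, tau=1.0, exc_gain=0.5, inh_gain=0.05):
    """Euler-integrate dv_n/dt = f(x_n + eps_n + eta_n) / tau - v_n with f = ReLU (assumed)."""
    n = len(inputs)
    # Local excitation: weights decay as exp(-d^2) with ring distance d.
    weights = [[exc_gain * math.exp(-circular_distance(m, k, n) ** 2) for m in range(n)]
               for k in range(n)]
    v = [0.0] * n
    for _ in range(steps):
        # Global inhibition: the central unit pools all activity and feeds it back negatively.
        eta = -inh_gain * sum(v)
        updated = []
        for k in range(n):
            eps = sum(weights[k][m] * v[m] for m in range(n))  # excitatory feedback
            dv = max(0.0, inputs[k] + eps + eta) / tau - v[k]
            updated.append(v[k] + dt * dv)
        v = updated
    return v

# Usage: a single input bump centred on neuron 4 of a 16-neuron ring.
ring_size = 16
bump = [math.exp(-circular_distance(k, 4, ring_size) ** 2 / 2.0) for k in range(ring_size)]
activity = simulate_ring(bump)
peak = max(range(ring_size), key=lambda k: activity[k])
```

In this toy setting the activity stays non-negative and its peak sits on the neuron that receives the strongest input, while neurons on the far side of the ring are suppressed by the pooled inhibition.
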
Mapping Q-Values to Ring Inputs:
- Function: Transforms DQN output Q-values into Gaussian input signals for the ring attractor.
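- Core equation: \(x_n(Q(s,a)) = \sum_{a=1}^{A} \frac{Q(s,a)}{\sqrt{2\pi}\,\sigma_a} \exp\left(-\frac{(\alpha_n - \alpha_a(a))^2}{2\sigma_a^2}\right)\), where \(\alpha_n\) is the angular position of neuron \(n\) and \(\alpha_a(a)\) is the angle assigned to action \(a\).
- Three key parameters of each Gaussian, written generically as amplitude \(K_i\), mean \(\mu_i\), and width \(\sigma_i\): \(K_i = Q(s,a)\) (amplitude = action value), \(\mu_i = \alpha_a(a)\) (angle = action position on the ring), \(\sigma_i = \sigma_a\) (width = uncertainty of value estimate).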
- Design Motivation: Actions with higher Q-values generate stronger input signals on the ring; after excitatory-inhibitory dynamics, spatially adjacent high-value actions mutually reinforce each other — this is the core mechanism by which spatial information is exploited.
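
As a rough sketch of this mapping, the function below sums one Gaussian per action over the neuron angles. It assumes actions and neurons are evenly spaced on the ring, uses the standard Gaussian normalisation \(1/(\sqrt{2\pi}\sigma_a)\), and wraps the angular difference around the circle; the wrap-around and the helper names (`ring_input`, `angular_difference`) are illustrative choices, not taken from the paper.

```python
import math

TWO_PI = 2.0 * math.pi

def angular_difference(a, b):
    """Shortest angular distance between two angles on the ring (wrap-around assumed)."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)

def ring_input(q_values, sigmas, num_neurons):
    """Return x_n for every neuron: a sum of per-action Gaussians scaled by Q(s, a)."""
    num_actions = len(q_values)
    # Assumption: actions and neurons are spaced evenly around the ring.
    action_angles = [TWO_PI * a / num_actions for a in range(num_actions)]
    neuron_angles = [TWO_PI * n / num_neurons for n in range(num_neurons)]
    signal = []
    for alpha_n in neuron_angles:
        total = 0.0
        for q, sigma, alpha_a in zip(q_values, sigmas, action_angles):
            diff = angular_difference(alpha_n, alpha_a)
            total += q / (math.sqrt(TWO_PI) * sigma) * math.exp(-diff ** 2 / (2.0 * sigma ** 2))
        signal.append(total)
    return signal

# Usage: four actions, the first with the highest Q-value, on a 16-neuron ring.
x = ring_input([1.0, 0.2, 0.1, 0.1], [0.3, 0.3, 0.3, 0.3], num_neurons=16)
narrow = ring_input([1.0], [0.3], num_neurons=16)
wide = ring_input([1.0], [0.6], num_neurons=16)
```

Comparing `narrow` and `wide` shows the role of the width: doubling \(\sigma_a\) lowers the peak at the action's own angle but leaves more signal two neurons away from it.
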
Bayesian Uncertainty Injection:
- Function: Bayesian linear regression (BLR) is used as the DQN output layer, naturally providing the Q-value variance for each action.
- Mechanism: \(Q(s,a) = \Phi_\theta(s)^T w_a\), where \(w_a\) is drawn from the BLR posterior. The Gaussian signal width \(\sigma_a\) is directly set to the BLR posterior variance.
- Design Motivation: Actions with high uncertainty yield more "diffuse" signals (wide Gaussians), while low-uncertainty actions yield sharper signals (narrow Gaussians) — automatically achieving exploration-exploitation balance.
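
To make the source of the per-action variance concrete, here is a minimal sketch of a Bayesian linear head for a single action, updated one sample at a time with the standard rank-one (Sherman-Morrison) form of the Gaussian posterior. The class name, the prior variance, the noise variance, and the two-dimensional feature vectors are all hypothetical; the paper's actual BLR layer may be parameterised differently.

```python
class BayesianLinearHead:
    """Gaussian posterior over the weights w_a of one action (hypothetical helper)."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.1):
        self.mean = [0.0] * dim
        # Isotropic prior covariance prior_var * I.
        self.cov = [[prior_var if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.noise_var = noise_var

    def _cov_times(self, phi):
        return [sum(row[j] * phi[j] for j in range(len(phi))) for row in self.cov]

    def predict(self, phi):
        """Return the posterior mean and variance of Q(s, a) = phi^T w_a."""
        mean_q = sum(m * p for m, p in zip(self.mean, phi))
        cov_phi = self._cov_times(phi)
        var_q = sum(p * c for p, c in zip(phi, cov_phi))
        return mean_q, var_q

    def update(self, phi, target):
        """Condition the posterior on one (features, target) pair."""
        cov_phi = self._cov_times(phi)
        denom = self.noise_var + sum(p * c for p, c in zip(phi, cov_phi))
        error = target - sum(m * p for m, p in zip(self.mean, phi))
        dim = len(phi)
        self.mean = [m + c * error / denom for m, c in zip(self.mean, cov_phi)]
        # Rank-one shrinkage of the covariance along the observed feature direction.
        self.cov = [[self.cov[i][j] - cov_phi[i] * cov_phi[j] / denom for j in range(dim)]
                    for i in range(dim)]

# Usage: variance shrinks along the observed feature direction only.
head = BayesianLinearHead(dim=2)
_, var_before = head.predict([1.0, 0.0])
head.update([1.0, 0.0], target=2.0)
mean_after, var_after = head.predict([1.0, 0.0])
_, var_unseen = head.predict([0.0, 1.0])
```

The variance returned by `predict` is the quantity that would set the width of the Gaussian for that action: feature directions the head has seen get sharper signals, while unseen directions keep the broad prior.
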
Integrated DL Module (Reusable RNN):
- Function: Implements ring attractor dynamics using GRU/LSTM as a plug-and-play component within the DRL framework.
- Mechanism: Recurrent layers simulate the temporal evolution of the ring attractor; inputs are Q-value sequences and outputs are action selections.
- Design Motivation: CTRNN requires manual configuration of iteration steps (~50 steps) and is non-differentiable; the RNN implementation supports end-to-end training and is more efficient.

Loss & Training

Extrinsic model: The Q-network is trained with standard DQN loss; the ring attractor performs action selection without participating in gradient computation; the BLR output layer updates its posterior online.
Integrated model: End-to-end training, with the ring attractor RNN module participating in backpropagation.
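
For the extrinsic model, the step that replaces argmax can be sketched as a decoding of the settled ring activity. The rule below (take the most active neuron, then pick the action whose angle is nearest) is an assumption made for illustration; the paper may decode the activity bump differently. Evenly spaced actions and neurons are also assumed.

```python
import math

TWO_PI = 2.0 * math.pi

def angular_difference(a, b):
    """Shortest angular distance between two angles on the ring."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)

def decode_action(activity, num_actions):
    """Map the peak of the ring activity to the nearest action (assumed decoding rule)."""
    num_neurons = len(activity)
    peak_neuron = max(range(num_neurons), key=lambda n: activity[n])
    peak_angle = TWO_PI * peak_neuron / num_neurons
    return min(
        range(num_actions),
        key=lambda a: angular_difference(peak_angle, TWO_PI * a / num_actions),
    )

# Usage: 16 neurons, 4 actions located at neurons 0, 4, 8 and 12.
activity = [0.0] * 16
activity[5] = 1.0          # bump peaks one neuron past action 1
chosen = decode_action(activity, num_actions=4)

wrapped = [0.0] * 16
wrapped[15] = 1.0          # bump peaks just before the wrap-around point
chosen_wrapped = decode_action(wrapped, num_actions=4)
```

Because the decoding happens outside the network, no gradient flows through it, which matches the description of the extrinsic model as a pure action-selection stage.
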

Key Experimental Results

Main Results: Atari 100K Benchmark (Extrinsic Model)

| Method | Median Human-Normalized Score (MHNS) | Mean Human-Normalized Score | Superhuman Games |
| --- | --- | --- | --- |
| DQN baseline | ~50% | ~45% | 2/26 |
| DQN + Ring Attractor (w/o UQ) | ~72% | ~68% | 5/26 |
| DQN + Ring Attractor (w/ UQ) | ~80% | ~75% | 8/26 |
| Gain | - | +53% | - |
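
For readers unfamiliar with the metric, a human-normalized score rescales an agent's raw game score so that a random policy maps to 0 and the human reference maps to 1; the benchmark then reports the median and mean across games. The sketch below uses made-up game names and scores purely to show the arithmetic, and it counts a game as "superhuman" when the normalized score is at least 1, which is an assumption about how the table's last column is defined.

```python
from statistics import mean, median

def human_normalized_score(agent, random_score, human):
    """(agent - random) / (human - random): 0 is random play, 1 is the human reference."""
    return (agent - random_score) / (human - random_score)

def summarize(scores):
    """Aggregate per-game (agent, random, human) triples into benchmark-level statistics."""
    hns = [human_normalized_score(*triple) for triple in scores.values()]
    return {
        "median": median(hns),
        "mean": mean(hns),
        "superhuman": sum(1 for h in hns if h >= 1.0),  # assumed threshold
    }

# Hypothetical scores for illustration only: (agent, random, human).
example = {
    "game_a": (50.0, 0.0, 100.0),
    "game_b": (30.0, 10.0, 20.0),
    "game_c": (5.0, 5.0, 25.0),
}
summary = summarize(example)
```

The mean is sensitive to a few games with very large normalized scores, which is why the median is usually reported alongside it.
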

Ablation Study: Contribution of Each Component

| Configuration | MHNS | Gain over Baseline | Notes |
| --- | --- | --- | --- |
| DQN baseline | ~50% | - | Standard argmax selection |
| + Ring Attractor (spatial structure only) | ~68% | +35% | Value of spatial topology |
| + Ring Attractor + Bayesian UQ | ~80% | +53% | Uncertainty integration adds +18% |
| + Ring Attractor + varying ring size | ~65–78% | Varies | N=32 neurons is optimal |

Integrated DL Model vs. Extrinsic Model

| Method | Training Speed | Final Performance | Scalability |
| --- | --- | --- | --- |
| Extrinsic CTRNN | Slow (~50 iterations required) | High | Limited |
| Integrated RNN | Fast | Comparable/slightly higher | Good |

Key Findings

- The ring attractor yields the largest gains in games with spatially structured action spaces (directional movement games benefit more than discrete selection games). For example, improvements are substantial on Pong (continuous directionality) but limited on Montezuma's Revenge (exploration-intensive).
- The additional 18-point gain from uncertainty injection (from +35% to +53% over baseline) is complementary to Thompson Sampling-style exploration: high-uncertainty actions produce wider Gaussian signals that "diffuse" into neighboring actions, which acts as a form of exploration.
- The temporal filtering effect of the ring attractor smooths the action selection sequence, reducing frequent oscillation between two similarly valued actions. This is particularly valuable in physical control settings.
- The number of neurons \(N\) on the ring requires careful selection: too few neurons carry insufficient information, while too many incur excessive computational overhead. \(N=32\) (approximately \(4\times\) the action space size) is optimal across most games.

Highlights & Insights

- Introducing biological neural circuits as computational primitives in RL is a distinctly interdisciplinary perspective: ring attractors have been experimentally validated in the head-direction system of Drosophila, and the method borrows a computational structure refined over millions of years of evolution.
- The three-in-one design is elegant: Gaussian amplitude = Q-value (exploitation), angle = spatial position (structure), width = uncertainty (exploration). A single formula simultaneously encodes three critical types of information.
- The method is plug-and-play: it only requires appending a ring attractor module at the DQN output, with no modification to the Q-network itself.
- The "winner-take-all + spatial diffusion" effect produced by excitatory-inhibitory dynamics is naturally suited to continuous or spatially structured action selection.

Limitations & Future Work

- The mapping from actions to the ring assumes a circular topology among actions, which not all action spaces satisfy (e.g., hierarchical action spaces with tree-like structure).
- The CTRNN model requires approximately 50 iterations to converge to a steady state, introducing inference latency that may be problematic in real-time control scenarios.
- Validation is currently limited to discrete action spaces; continuous action spaces would require discretizing the ring or extending the framework to higher-dimensional attractors.
- The BLR posterior update may not adapt quickly enough in non-stationary environments; forgetting mechanisms or online Bayesian methods should be considered.

- vs. Bootstrapped DQN: Bootstrapped DQN treats uncertainty as an independent module for Thompson Sampling-based exploration; the ring attractor unifies uncertainty and spatial structure within a single signal.
- vs. Grid Cells (DeepMind): Grid cells encode positional information in state space; ring attractors encode structural information in action space. Both are biologically inspired, but their encoding targets differ.
- vs. Relational RL: Relational RL implicitly learns inter-entity relationships via attention mechanisms; the ring attractor provides spatial inductive bias through explicit ring topology, and the latter is more sample-efficient.
- Future directions: Extending the ring attractor to multi-dimensional attractors (e.g., toroidal structures) for encoding high-dimensional continuous action spaces.

Rating

- Novelty: ⭐⭐⭐⭐⭐ A distinctive and deep intersection of neuroscience and RL; using ring attractors for action space encoding appears not to have been proposed before.
- Experimental Thoroughness: ⭐⭐⭐⭐ Standard Atari 100K benchmark, two implementations, and ablation analysis, though continuous control tasks are absent.
- Writing Quality: ⭐⭐⭐⭐ Detailed biological background and complete mathematical derivations, though the paper is lengthy.
- Value: ⭐⭐⭐⭐ Spatial action encoding is a promising direction, and the plug-and-play design is practical.