BaRISTA: Brain-Scale Informed Spatiotemporal Representation of Human Intracranial EEG¶
- Conference: NeurIPS 2025
- arXiv: 2512.12135
- Code: https://github.com/ShanechiLab/BaRISTA
- Area: Neuroscience / Foundation Models
- Keywords: Intracranial EEG, Spatiotemporal Transformer, Spatial Encoding Scale, Masked Reconstruction, Pretraining
TL;DR¶
BaRISTA systematically investigates spatial encoding scales (electrode/parcel/lobe) for iEEG Transformers, finding that atlas parcel-level encoding combined with spatial masked reconstruction achieves 86.2% AUC on language task decoding (vs. PopT 79.5%). The choice of encoding scale has greater impact than masking strategy, and the model generalizes well across subjects.
Background & Motivation¶
Background: iEEG provides high spatiotemporal resolution brain activity recordings. Transformer-based pretrained models (PopT, Brant) have been applied to iEEG, but the choice of spatial encoding has not been systematically studied.
Limitations of Prior Work: Electrode placements differ from patient to patient, so channel-level encodings are hard to generalize across subjects. Whether parcel-level or lobe-level encoding is superior, and how spatial encoding interacts with the masking strategy, remain open questions.
Key Challenge: Fine-grained spatial resolution (channel) carries the most information but lacks cross-patient consistency; coarse-grained resolution (lobe) is consistent across patients but may sacrifice local information.
Goal: Systematically compare three spatial encoding scales to identify the optimal encoding–masking combination.
Key Insight: Channel, atlas parcel, and lobe are treated as experimental variables and systematically ablated within a masked reconstruction pretraining framework.
Core Idea: By systematically comparing three spatial encoding scales in an iEEG Transformer, the paper identifies atlas parcel-level encoding as the optimal spatial granularity—balancing cross-patient consistency with local information retention.
Method¶
Overall Architecture¶
iEEG data (2048 Hz) → Temporal tokenization (Dilated CNN extracting 250 ms patches) → Spatial encoding (learnable embeddings \(E_j\) at channel/atlas parcel/lobe granularity) → token \(S_{ij} = B_{ij} + E_j\) → Spatial masking (randomly selected spatial categories) → Transformer (12 layers / 4 heads / \(d=64\) + RoPE) → EMA target reconstruction (MSE between online encoder and EMA target encoder)
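The token-construction step above can be sketched in a few lines. This is a minimal illustration with made-up shapes and a random linear projection standing in for the dilated-CNN tokenizer, not the authors' code; the names (`W_patch`, `parcel_of_channel`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 2048                 # sampling rate (Hz)
patch_len = fs // 4       # 250 ms patch -> 512 samples
n_channels, n_patches, d = 8, 4, 64

# Stand-in for the dilated-CNN temporal tokenizer: a random linear projection.
W_patch = rng.normal(scale=0.02, size=(patch_len, d))

# Cut each channel's signal into non-overlapping 250 ms patches and embed them.
signal = rng.normal(size=(n_channels, n_patches * patch_len))
patches = signal.reshape(n_channels, n_patches, patch_len)
B = patches @ W_patch     # temporal tokens B_ij, shape (channels, patches, d)

# Spatial embeddings E_j, shared by all channels mapped to the same atlas parcel.
n_parcels = 3
parcel_of_channel = rng.integers(0, n_parcels, size=n_channels)
E = rng.normal(scale=0.02, size=(n_parcels, d))

# Token S_ij = B_ij + E_j: every channel in a parcel gets the same spatial offset.
S = B + E[parcel_of_channel][:, None, :]

print(S.shape)  # (8, 4, 64)
```

The key point is that `E` is indexed by parcel rather than by channel, which is what makes the spatial code shared across patients with different electrode placements.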
Key Designs¶
- Three Spatial Encoding Scales:
  - Channel: MNI coordinates \((x, y, z)\) → learnable embeddings (finest granularity, but inconsistent across patients)
  - Atlas parcel: Destrieux atlas parcellation (intermediate granularity, consistent across patients)
  - Lobe: cerebral lobes plus subcortical regions (coarsest, but most stable)
- Spatial Masked Reconstruction Pretraining:
  - Randomly selected spatial categories are masked (e.g., all electrode patches within a given brain region)
  - Online tokenizer \(\mathcal{F}\) and EMA target tokenizer \(\tilde{\mathcal{F}}\) (momentum warmed up from 0 to 0.996)
  - \(\mathcal{L} = \frac{1}{|B_{\text{target}}|}\sum \|\tilde{B}_{ij} - \hat{B}_{ij}\|_2^2\)
- Interleaved Spatiotemporal Sequences: spatial and temporal tokens are interleaved so that attention jointly captures spatiotemporal dependencies.
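The spatial masking and EMA-target objective can be sketched as follows. This is an assumed minimal setup, not the paper's implementation: the encoders are reduced to single weight matrices, and the "optimizer step" is a random perturbation, just to show how the masked-parcel loss and momentum update fit together.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_parcels, n_channels, n_patches = 64, 3, 8, 4
parcel_of_channel = rng.integers(0, n_parcels, size=n_channels)

# Online and EMA target "encoders", reduced to single weight matrices.
W_online = rng.normal(scale=0.1, size=(d, d))
W_target = W_online.copy()

# Simulate one optimizer step on the online encoder.
W_online = W_online + 0.01 * rng.normal(size=(d, d))

def ema_update(w_target, w_online, momentum):
    """Momentum update of the target encoder toward the online encoder."""
    return momentum * w_target + (1 - momentum) * w_online

# Momentum is warmed up from 0 toward 0.996; use a mid-warmup value here.
W_target = ema_update(W_target, W_online, momentum=0.5)

# Spatial masking: mask every token belonging to one randomly chosen parcel.
masked_parcel = rng.integers(0, n_parcels)
mask = parcel_of_channel == masked_parcel          # (channels,)

tokens = rng.normal(size=(n_channels, n_patches, d))
target_repr = tokens[mask] @ W_target              # \tilde{B}_ij from EMA target
pred_repr = tokens[mask] @ W_online                # \hat{B}_ij from online encoder

# MSE over the masked (target) tokens, matching the loss above.
loss = np.mean(np.sum((target_repr - pred_repr) ** 2, axis=-1))
```

Masking a whole parcel (rather than random individual patches) forces the model to reconstruct a region's activity from the surrounding regions, which is what ties the objective to the spatial encoding scale.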
Data & Training¶
- Brain Treebank dataset: 10 epilepsy patients, 26 sessions, 2048 Hz
- Evaluation via pretraining followed by linear probing
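Linear probing on frozen features can be sketched as below. This is an assumed setup for illustration (synthetic features and labels, a ridge-regression readout); the paper's probe may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, d = 200, 64

# Frozen pretrained-encoder outputs, one pooled feature vector per trial.
features = rng.normal(size=(n_trials, d))
w_true = rng.normal(size=d)
labels = (features @ w_true > 0).astype(float)   # synthetic binary labels

# Ridge-regression linear readout: the encoder itself is never updated.
lam = 1.0
X = np.hstack([features, np.ones((n_trials, 1))])  # append bias column
w = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ labels)

preds = (X @ w > 0.5).astype(float)
acc = (preds == labels).mean()
print(round(acc, 2))
```

Because only the linear readout is trained, probe performance directly reflects how linearly decodable the pretrained representation is, which is why it is the standard evaluation for comparing encoding scales.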
Key Experimental Results¶
Main Results (Downstream Classification AUC %)¶
| Encoding / Masking | Sentence Onset | Speech / Non-Speech |
|---|---|---|
| Channel | 77.8% | 76.4% |
| Parcel | 86.2% | 86.9% |
| Lobe | 84.2% | 84.1% |
| PopT baseline | 79.5% | 77.5% |
| Brant baseline | 76.7% | 69.1% |
Ablation Study (ANOVA)¶
| Factor | p-value | Effect Size | Note |
|---|---|---|---|
| Encoding scale | p < 1e-3 | Large | — |
| Masking strategy | p = 0.01–0.04 | Medium | — |
| Interaction | — | — | Matched scales pair best (e.g., channel encoding with channel-level masking) |
Key Findings¶
- Parcel encoding significantly outperforms channel encoding (+8.4 AUC points on sentence onset, +10.5 on speech/non-speech)—anatomical priors are more informative than precise coordinates.
- The effect of encoding scale exceeds that of masking strategy—selecting the right encoding is more critical than selecting the right masking approach.
- Cross-subject generalization: held-out subjects achieve 84.1% AUC (vs. 86.9% with target subjects), confirming that parcel-level encoding facilitates cross-patient generalization.
- Performance scales positively with data volume: continuous improvement is observed as pretraining data increases from 5% to 75%.
Highlights & Insights¶
- Spatial encoding scale is a critical yet underexplored design choice: prior works default to channel-level encoding, while BaRISTA demonstrates that atlas parcel-level encoding is superior.
- "Intermediate granularity outperforms finest granularity": although channel-level encoding carries the most information, its cross-patient inconsistency leads to poor generalization, whereas atlas parcels balance precision and generalizability.
Limitations & Future Work¶
- Relies solely on anatomical parcellation; functional brain region encoding is not explored.
- Only spatial masking is evaluated; joint spatiotemporal masking is not tested.
- Experiments are conducted at a single sampling rate (2048 Hz).
- The dilated CNN temporal tokenizer may not be optimal.
Related Work & Insights¶
- vs. PopT: PopT employs channel-level encoding; BaRISTA demonstrates that parcel-level encoding is superior (+6.7%).
- vs. Brant: Brant uses region-level encoding but provides no systematic ablation; BaRISTA offers a comprehensive analysis.
Rating¶
- Novelty: ⭐⭐⭐⭐ — Systematic ablation of spatial encoding scales is conducted for the first time.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Three scales × multiple masking strategies × ANOVA + cross-subject + data scaling.
- Writing Quality: ⭐⭐⭐⭐ — Rigorous experimental design.
- Value: ⭐⭐⭐⭐ — Provides critical guidance for iEEG foundation model design.
- The hierarchical spatial structure of neural signals calls for multi-scale modeling—coarse and fine granularities are complementary.
- Coarser spatial scales (parcel- and lobe-level) decode better than channel-level encoding; pretraining substantially benefits low-data regimes.
- The core contribution lies in the simplicity and effectiveness of the design rationale.
- Experimental results thoroughly validate the central hypothesis.