Tokenizing Single-Channel EEG with Time-Frequency Motif Learning¶
Conference: ICLR 2026
arXiv: 2502.16060
Code: https://github.com/Jathurshan0330/TFM-Tokenizer
Area: Interpretability
Keywords: EEG signal analysis, discrete tokenization, time-frequency motif, vector quantization, foundation models
TL;DR¶
This paper proposes TFM-Tokenizer, the first framework to learn a time-frequency motif vocabulary from single-channel EEG and encode the signal into discrete tokens. It consistently improves performance on tasks such as event classification and seizure detection, and can serve as a plug-and-play component to enhance existing EEG foundation models.
Background & Motivation¶
- Background: Inspired by NLP, EEG analysis is shifting toward task-agnostic foundation model paradigms.
- Limitations of Prior Work: Tokenization is central to NLP, yet existing EEG foundation models simply segment continuous signals into short time windows without data-driven vocabulary learning. Although LaBraM proposes a neural tokenizer, it is used only as a training objective and discarded during downstream inference.
- Root Cause — Three Core Challenges:
- Tokenization granularity: Tokenization must operate at the single-channel level to achieve device independence.
- Token resolution: Tokens should represent underlying motifs (short recurring patterns) rather than plain time segments.
- Learning objective: Time-frequency information must be explicitly incorporated, as the time domain alone cannot capture important frequency patterns.
Method¶
Overall Architecture¶
A two-stage design:
1. TFM-Tokenizer pre-training: Unsupervised learning of a time-frequency motif vocabulary from single-channel EEG.
2. Downstream Transformer training: Masked pre-training and fine-tuning on the resulting discrete token sequences.
Key Design 1: Dual-Path Time-Frequency Encoding¶
Localized Spectral Window Encoder:
- Divides the spectrogram into \(P\) non-overlapping patches along the frequency axis.
- Each patch is independently projected: \(e_{(i,p)} = \text{GroupNorm}(\text{GELU}(\mathbf{W}_p \mathbf{S}_{(i,p)}))\).
- A frequency Transformer models cross-band dependencies.
- Gated per-patch aggregation: A sigmoid gate selectively emphasizes informative frequency patches, as sketched below.
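A plausible form of this gated aggregation, assuming a learned gate projection \(\mathbf{w}_g\) (the paper's exact parameterization may differ):

```latex
% Hypothetical sketch: per-patch sigmoid gate, summed over frequency patches
g_{(i,p)} = \sigma\!\left(\mathbf{w}_g^{\top} e_{(i,p)}\right), \qquad
\mathbf{E}_i^{F} = \sum_{p=1}^{P} g_{(i,p)}\, e_{(i,p)}
```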
Temporal Encoder: Linearly projects raw EEG patches, followed by GELU activation and GroupNorm.
Temporal Transformer: Models long-range dependencies over the concatenation of frequency embeddings \(\mathbf{E}_i^F\) and temporal embeddings \(\mathbf{E}_i^T\); a code sketch follows.
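Putting the two paths together, a minimal PyTorch-style sketch of the dual-path encoder; module choices, dimensions, and layer counts are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathEncoder(nn.Module):
    """Sketch of the dual-path time-frequency encoder (assumed dims)."""

    def __init__(self, freq_bins_per_patch: int, time_patch_len: int, d: int = 64):
        super().__init__()
        # Spectral path: per-patch projection W_p -> GELU -> GroupNorm
        self.freq_proj = nn.Linear(freq_bins_per_patch, d)
        self.freq_norm = nn.GroupNorm(1, d)
        # Frequency Transformer models cross-band dependencies
        self.freq_tf = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
        self.gate = nn.Linear(d, 1)  # sigmoid gate over frequency patches
        # Temporal path: linear projection of raw EEG patches
        self.time_proj = nn.Linear(time_patch_len, d)
        self.time_norm = nn.GroupNorm(1, d)
        # Temporal Transformer over the concatenation [E^F ; E^T]
        self.temporal_tf = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(2 * d, nhead=4, batch_first=True), num_layers=4)

    def forward(self, spec_patches, time_patches):
        # spec_patches: (B, T, P, F_bins); time_patches: (B, T, L)
        B, T, P, _ = spec_patches.shape
        e = F.gelu(self.freq_proj(spec_patches))               # (B, T, P, d)
        e = self.freq_norm(e.flatten(0, 2)).view(B, T, P, -1)  # normalize over d
        e = self.freq_tf(e.flatten(0, 1)).view(B, T, P, -1)    # attend across bands
        g = torch.sigmoid(self.gate(e))                        # (B, T, P, 1)
        E_f = (g * e).sum(dim=2)                               # gated aggregation
        E_t = F.gelu(self.time_proj(time_patches))             # (B, T, d)
        E_t = self.time_norm(E_t.flatten(0, 1)).view(B, T, -1)
        fused = torch.cat([E_f, E_t], dim=-1)                  # (B, T, 2d)
        return self.temporal_tf(fused)  # no positional encoding, per the paper
```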
Key Design 2: VQ Vocabulary Learning¶
Vector quantization (as in VQ-VAE) maps each fused embedding to its nearest codebook entry, yielding a discrete token; the standard lookup is shown below.
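The standard VQ-VAE lookup this refers to, written with a codebook \(\{c_k\}_{k=1}^{K}\) and fused embedding \(\hat{e}_i\) (notation assumed here rather than taken from the paper):

```latex
% Nearest-neighbor quantization (standard VQ-VAE step)
z_i = \arg\min_{k \in \{1, \dots, K\}} \left\lVert \hat{e}_i - c_k \right\rVert_2
```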
Key Design 3: Time-Frequency Masked Prediction¶
A joint frequency–time masking strategy (a code sketch follows this list):
- Random grouped masking \(M_F\) along the frequency axis and random masking \(M_T\) along the time axis.
- Symmetric masking is applied for data augmentation.
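A minimal sketch of joint grouped frequency masking and random time masking on a spectrogram tensor; the group size and mask ratios are illustrative assumptions, not the paper's settings:

```python
import torch

def time_freq_mask(spec: torch.Tensor, freq_group: int = 4,
                   freq_ratio: float = 0.3, time_ratio: float = 0.3):
    """Mask a spectrogram of shape (B, T, F) along both axes.

    Grouped masking M_F zeroes contiguous bands of `freq_group` bins;
    M_T zeroes randomly chosen time steps. Symmetric masking would
    apply the complement of these masks to a second view.
    """
    B, T, F = spec.shape
    masked = spec.clone()
    # Grouped masking M_F along the frequency axis
    n_groups = F // freq_group
    group_mask = torch.rand(B, n_groups) < freq_ratio        # (B, G)
    m_f = group_mask.repeat_interleave(freq_group, dim=1)    # (B, G * freq_group)
    masked[:, :, :m_f.shape[1]][m_f.unsqueeze(1).expand(B, T, -1)] = 0.0
    # Random masking M_T along the time axis
    m_t = torch.rand(B, T) < time_ratio                      # (B, T)
    masked[m_t] = 0.0
    return masked, m_f, m_t
```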
Overall loss:
- Reconstruction loss plus codebook updates (commitment loss and exponential moving average); a standard VQ-VAE-style objective is sketched after this list.
- No positional encoding is used, given the non-stationary and potentially chaotic nature of EEG.
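A standard VQ-VAE-style objective consistent with the components above; the weight \(\beta\), the stop-gradient \(\mathrm{sg}[\cdot]\), and the spectrogram target \(\mathbf{S}\) follow the usual VQ-VAE convention and are assumptions, since the paper's exact weighting may differ:

```latex
% Generic VQ-VAE objective (sketch); codebook entries c_{z_i} are
% updated by exponential moving average rather than a gradient term
\mathcal{L} = \underbrace{\lVert \mathbf{S} - \hat{\mathbf{S}} \rVert_2^2}_{\text{reconstruction}}
  + \beta\, \underbrace{\lVert \hat{e}_i - \mathrm{sg}[c_{z_i}] \rVert_2^2}_{\text{commitment}}
```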
Downstream Transformer¶
- Initializes the token embedding lookup table with the VQ codebook (see the sketch after this list).
- Linear-attention Transformer (~0.7M parameters).
- Incorporates channel embeddings and positional embeddings across channels.
- Pre-trained with masked token prediction, then fine-tuned on downstream tasks.
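A minimal sketch of wiring the learned codebook into the downstream token Transformer; a vanilla nn.TransformerEncoder stands in for the paper's linear-attention blocks, and all names, dimensions, and layer counts are illustrative:

```python
import torch
import torch.nn as nn

class DownstreamEEGTransformer(nn.Module):
    """Token Transformer initialized from the TFM-Tokenizer codebook (sketch)."""

    def __init__(self, codebook: torch.Tensor, n_channels: int,
                 max_len: int, n_classes: int):
        super().__init__()
        K, d = codebook.shape  # assumes d is divisible by nhead below
        # Token embedding lookup table initialized with the VQ codebook
        self.tok_emb = nn.Embedding(K, d)
        self.tok_emb.weight.data.copy_(codebook)
        # Channel embeddings plus positional embeddings across channels
        self.chan_emb = nn.Embedding(n_channels, d)
        self.pos_emb = nn.Embedding(max_len, d)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True),
            num_layers=4)
        self.head = nn.Linear(d, n_classes)

    def forward(self, tokens, channel_ids):
        # tokens: (B, L) discrete token ids; channel_ids: (B, L)
        pos = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok_emb(tokens) + self.chan_emb(channel_ids) + self.pos_emb(pos)
        h = self.encoder(x)                # (B, L, d)
        return self.head(h.mean(dim=1))    # pooled logits for fine-tuning
```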
Key Experimental Results¶
Main Results: TUEV Event Classification¶
| Model | Parameters | Cohen's Kappa (Single Dataset) | Cohen's Kappa (Multi-Dataset) |
|---|---|---|---|
| SPaRCNet | 0.79M | 0.4233 | - |
| BIOT | 3.2M | 0.4482 | - |
| BIOT⋆ | 3.2M | 0.4890 | - |
| LaBraM⋆ | ~6M | - | 0.5588 |
| TFM-Tokenizer | ~0.7M | ~0.53 | 0.6189 (+11% vs. LaBraM⋆) |
IIIC Seizure Classification¶
| Model | Cohen's Kappa (Multi-Dataset) |
|---|---|
| LaBraM | 0.3658 |
| CBraMod | 0.4792 |
| TFM-Tokenizer | 0.4979 (+36% vs LaBraM) |
Cross-Device Scalability: Ear-EEG Sleep Staging¶
| Setting | TFM-Tokenizer vs. Baseline |
|---|---|
| Ear-EEG (outside the standard 10-20 montage) | +14% |
Integration with Existing Foundation Models¶
| Foundation Model | Original | + TFM-Tokenizer |
|---|---|---|
| BIOT | baseline | +~4% (TUEV) |
| LaBraM | baseline | +~4% (TUEV) |
Key Findings¶
- TFM-Tokenizer achieves state-of-the-art performance with 3× fewer parameters than LaBraM and 1.5× fewer than BIOT.
- As a plug-and-play component, it consistently improves existing foundation models such as BIOT and LaBraM.
- Cross-device experiments on ear-EEG demonstrate that single-channel tokenization generalizes well across devices.
- Token analysis shows that learned tokens are class-discriminative, frequency-aware, and consistent.
- The gated aggregation mechanism effectively focuses on task-relevant frequency bands.
Highlights & Insights¶
- First genuine EEG tokenization: Learns a discrete motif vocabulary used directly as downstream model input, rather than solely as a training objective.
- Device-agnostic design: Single-channel operation allows the tokenizer to adapt to arbitrary channel configurations and devices.
- Extremely lightweight: A downstream Transformer with ~0.7M parameters achieves state-of-the-art performance.
- Interpretability: Discrete tokens correspond to specific neurophysiological events and support timestamp-level retrieval.
Limitations & Future Work¶
- The VQ codebook size \(K\) must be predefined and may require adjustment for different EEG types.
- Validation is currently limited to classification tasks; generative tasks (e.g., EEG reconstruction, cross-modal translation) remain unexplored.
- The frequency patch size and band-splitting strategy in gated aggregation may need tuning for different sampling rates.
- The scale of multi-dataset pre-training remains far smaller than NLP corpora, leaving the upper-bound potential of the tokenizer underexplored.
- The ear-EEG experiment involves only 10 subjects, limiting statistical power.
Related Work & Insights¶
- EEG Foundation Models: BIOT (segment-level continuous tokenization), LaBraM (VQ tokenizer used only as training objective), BRANT, MMM.
- VQ Tokenizers: Applications of VQ-VAE in images (VQGAN) and EEG (LaBraM).
- EEG Motif Learning: Only a few prior works (e.g., Schäfer & Leser, 2022) address time-domain motifs; joint time-frequency motif learning is introduced here for the first time.
- Signal Tokenization: Design philosophy draws from NLP tokenization methods (BPE/WordPiece) and applies them to continuous signals.
Rating¶
| Dimension | Score |
|---|---|
| Novelty | ★★★★★ |
| Theoretical Depth | ★★★☆☆ |
| Experimental Thoroughness | ★★★★☆ |
| Value | ★★★★☆ |
| Writing Quality | ★★★★☆ |