
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning

  • Conference: ICLR 2026
  • arXiv: 2502.16060
  • Code: https://github.com/Jathurshan0330/TFM-Tokenizer
  • Area: Interpretability
  • Keywords: EEG signal analysis, discrete tokenization, time-frequency motif, vector quantization, foundation models

TL;DR

This paper proposes TFM-Tokenizer, the first framework that learns a time-frequency motif vocabulary from single-channel EEG and uses it to encode signals as discrete tokens. The tokenizer consistently improves performance on tasks such as event classification and seizure detection, and can serve as a plug-and-play component that enhances existing EEG foundation models.

Background & Motivation

  • Background: Inspired by NLP, EEG analysis is shifting toward task-agnostic foundation model paradigms.
  • Limitations of Prior Work: Tokenization is central to NLP, yet existing EEG foundation models simply segment continuous signals into short time windows without data-driven vocabulary learning. Although LaBraM proposes a neural tokenizer, it is used only as a training objective and discarded during downstream inference.
  • Root Cause: three core challenges.
    1. Tokenization granularity: tokenization must operate at the single-channel level to achieve device independence.
    2. Token resolution: tokens should represent underlying motifs (short recurring patterns) rather than plain time segments.
    3. Learning objective: time-frequency information must be explicitly incorporated, since the time domain alone cannot capture important frequency patterns.

Method

Overall Architecture

A two-stage design:

  1. TFM-Tokenizer pre-training: unsupervised learning of a time-frequency motif vocabulary from single-channel EEG.
  2. Downstream Transformer training: masked pre-training and fine-tuning on the discrete token sequences.

Key Design 1: Dual-Path Time-Frequency Encoding

Localized Spectral Window Encoder:

  • Divides the spectrogram into \(P\) non-overlapping patches along the frequency axis.
  • Projects each patch independently: \(\mathbf{e}_{(i,p)} = \text{GroupNorm}(\text{GELU}(\mathbf{W}_p \mathbf{S}_{(i,p)}))\).
  • A frequency Transformer models cross-band dependencies.
  • Gated per-patch aggregation: a sigmoid gate selectively emphasizes informative frequency patches (a code sketch follows this subsection):

\[\mathbf{E}_i^F = \text{Concat}_{p=1}^{P}\left[\sigma(\mathbf{W}_{g1} \mathbf{e}_{(i,p)}) \odot \mathbf{W}_{g2} \mathbf{e}_{(i,p)}\right]\]

Temporal Encoder: Linearly projects raw time-domain EEG patches, then applies GELU and GroupNorm.

Temporal Transformer: Models long-range dependencies over the concatenation of the frequency embeddings \(\mathbf{E}_i^F\) and the temporal embeddings \(\mathbf{E}_i^T\).
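
A minimal PyTorch sketch of the gated per-patch aggregation; the module name, tensor shapes, and elementwise gating are illustrative assumptions rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

class GatedPatchAggregation(nn.Module):
    """Hypothetical sketch of gated per-patch frequency aggregation.

    Each of the P frequency-patch embeddings e_(i,p) is re-weighted by a
    sigmoid gate (W_g1), projected (W_g2), and the gated patches are
    concatenated into the window-level frequency embedding E_i^F.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.w_g1 = nn.Linear(dim, dim)  # gate branch
        self.w_g2 = nn.Linear(dim, dim)  # value branch

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # e: (batch, windows, P, dim) patch embeddings from the
        # frequency Transformer
        gated = torch.sigmoid(self.w_g1(e)) * self.w_g2(e)
        # Concatenate the P gated patches -> (batch, windows, P * dim)
        return gated.flatten(-2)
```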

Key Design 2: VQ Vocabulary Learning

Vector quantization (VQ-VAE) maps fused embeddings to a discrete codebook:

\[q(\mathbf{z}_i) = \arg\min_{\mathbf{v}_k \in \mathcal{V}} \|\mathbf{z}_i - \mathbf{v}_k\|_2^2\]
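
A minimal sketch of this lookup, assuming a flat batch of \(N\) fused embeddings and a \(K \times d\) codebook; the straight-through gradient trick shown is standard VQ-VAE practice, not a detail confirmed by the paper:

```python
import torch

def quantize(z: torch.Tensor, codebook: torch.Tensor):
    """Nearest-codeword lookup q(z_i) over a codebook V (sketch).

    z:        (N, d) fused time-frequency embeddings
    codebook: (K, d) learnable motif vocabulary
    """
    # Squared L2 distance between every embedding and every codeword
    dists = torch.cdist(z, codebook) ** 2      # (N, K)
    ids = dists.argmin(dim=1)                  # discrete token per segment
    v = codebook[ids]                          # quantized embeddings
    # Straight-through estimator: gradients bypass the argmin
    v = z + (v - z).detach()
    return ids, v
```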

Key Design 3: Time-Frequency Masked Prediction

A joint frequency–time masking strategy:

  • Random grouped masking \(M_F\) along the frequency axis and random masking \(M_T\) along the time axis.
  • Symmetric masking is applied for data augmentation.
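
A toy sketch of such a joint mask on one spectrogram; the group count and mask counts below are hypothetical placeholders:

```python
import torch

def time_freq_mask(spec: torch.Tensor, freq_groups: int = 4,
                   n_freq_masks: int = 1, n_time_masks: int = 2):
    """Apply grouped frequency masking (M_F) and random time masking (M_T).

    spec: (F, T) single-channel spectrogram S(f, t); returns a masked copy.
    """
    n_freq, n_time = spec.shape
    masked = spec.clone()
    group = n_freq // freq_groups
    # M_F: zero out randomly chosen contiguous frequency groups
    for g in torch.randperm(freq_groups)[:n_freq_masks].tolist():
        masked[g * group:(g + 1) * group, :] = 0.0
    # M_T: zero out randomly chosen time steps
    cols = torch.randperm(n_time)[:n_time_masks]
    masked[:, cols] = 0.0
    return masked
```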

Overall loss:

\[\mathcal{L}_{\text{token}} = \sum_{(f,t)} \|\mathbf{S}(f,t) - \hat{\mathbf{S}}(f,t)\|_2^2 + \alpha \sum_i \|\text{sg}[E_i] - v_i\|_2^2 + \beta \sum_i \|E_i - \text{sg}[v_i]\|_2^2\]
  • Reconstruction loss plus codebook terms (commitment loss; the codebook is also updated via an exponential moving average). A code sketch of this loss follows the list.
  • No positional encoding is used, given the non-stationary and potentially chaotic nature of EEG.
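
In code, the stop-gradient \(\text{sg}[\cdot]\) is simply detach(). Below is a minimal sketch of \(\mathcal{L}_{\text{token}}\) with placeholder \(\alpha\) and \(\beta\); note that when the codebook is updated by EMA instead, the \(\alpha\) term is typically dropped:

```python
import torch.nn.functional as F

def tokenizer_loss(s_hat, s, e, v, alpha=1.0, beta=0.25):
    """Sketch of L_token = reconstruction + codebook + commitment terms.

    s_hat, s: reconstructed / target spectrograms
    e, v:     encoder embeddings E_i and their quantized codewords v_i
    """
    recon = F.mse_loss(s_hat, s, reduction="sum")           # ||S - S_hat||^2
    codebook = F.mse_loss(v, e.detach(), reduction="sum")   # ||sg[E_i] - v_i||^2
    commit = F.mse_loss(e, v.detach(), reduction="sum")     # ||E_i - sg[v_i]||^2
    return recon + alpha * codebook + beta * commit
```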

Downstream Transformer

  • Initializes the token embedding lookup table with the VQ codebook (see the sketch after this list).
  • Linear-attention Transformer (~0.7M parameters).
  • Incorporates channel embeddings and positional embeddings across channels.
  • Pre-trained with masked token prediction, then fine-tuned on downstream tasks.
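
A short sketch of that codebook warm-start; the vocabulary size, embedding dimension, and freeze=False are illustrative assumptions:

```python
import torch
import torch.nn as nn

K, d = 8192, 64                     # hypothetical codebook size / dimension
codebook = torch.randn(K, d)        # stands in for the trained VQ codebook

# Initialize the downstream token-embedding table from the motif vocabulary
token_emb = nn.Embedding.from_pretrained(codebook, freeze=False)

token_ids = torch.randint(0, K, (4, 128))  # (batch, token sequence)
x = token_emb(token_ids)                   # (4, 128, d) Transformer inputs
```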

Key Experimental Results

Main Results: TUEV Event Classification

| Model | Parameters | Cohen's Kappa (Single Dataset) | Cohen's Kappa (Multi-Dataset) |
| --- | --- | --- | --- |
| SPaRCNet | 0.79M | 0.4233 | - |
| BIOT | 3.2M | 0.4482 | - |
| BIOT⋆ | 3.2M | 0.4890 | - |
| LaBraM⋆ | ~6M | - | 0.5588 |
| TFM-Tokenizer | ~0.7M | ~0.53 | 0.6189 (+11%) |

IIIC Seizure Classification

| Model | Cohen's Kappa (Multi-Dataset) |
| --- | --- |
| LaBraM | 0.3658 |
| CBraMod | 0.4792 |
| TFM-Tokenizer | 0.4979 (+36% vs LaBraM) |

Cross-Device Scalability: Ear-EEG Sleep Staging

| Setting | TFM-Tokenizer vs. Baseline |
| --- | --- |
| Ear-EEG (non-standard 10-20 system) | +14% |

Integration with Existing Foundation Models

| Foundation Model | Original | + TFM-Tokenizer |
| --- | --- | --- |
| BIOT | baseline | +~4% (TUEV) |
| LaBraM | baseline | +~4% (TUEV) |

Key Findings

  • TFM-Tokenizer achieves state-of-the-art performance with 3× fewer parameters than LaBraM and 1.5× fewer than BIOT.
  • As a plug-and-play component, it consistently improves existing foundation models such as BIOT and LaBraM.
  • Cross-device experiments on ear-EEG demonstrate that single-channel tokenization generalizes well across devices.
  • Token analysis shows that learned tokens are class-discriminative, frequency-aware, and consistent.
  • The gated aggregation mechanism effectively focuses on task-relevant frequency bands.

Highlights & Insights

  • First genuine EEG tokenization: Learns a discrete motif vocabulary used directly as downstream model input, rather than solely as a training objective.
  • Device-agnostic design: Single-channel operation allows the tokenizer to adapt to arbitrary channel configurations and devices.
  • Extremely lightweight: A downstream Transformer with ~0.7M parameters achieves state-of-the-art performance.
  • Interpretability: Discrete tokens correspond to specific neurophysiological events and support timestamp-level retrieval.

Limitations & Future Work

  • The VQ codebook size \(K\) must be predefined and may require adjustment for different EEG types.
  • Validation is currently limited to classification tasks; generative tasks (e.g., EEG reconstruction, cross-modal translation) remain unexplored.
  • The frequency patch size and band-splitting strategy in gated aggregation may need tuning for different sampling rates.
  • The scale of multi-dataset pre-training remains far smaller than NLP corpora, leaving the upper-bound potential of the tokenizer underexplored.
  • The ear-EEG experiment involves only 10 subjects, limiting statistical power.

Related Work

  • EEG Foundation Models: BIOT (segment-level continuous tokenization), LaBraM (VQ tokenizer used only as a training objective), BRANT, MMM.
  • VQ Tokenizers: Applications of VQ-VAE in images (VQGAN) and EEG (LaBraM).
  • EEG Motif Learning: Only a few prior works (e.g., Schäfer & Leser, 2022) address time-domain motifs; joint time-frequency motif learning is introduced here for the first time.
  • Signal Tokenization: Design philosophy draws from NLP tokenization methods (BPE/WordPiece) and applies them to continuous signals.

Rating

| Dimension | Score |
| --- | --- |
| Novelty | ★★★★★ |
| Theoretical Depth | ★★★☆☆ |
| Experimental Thoroughness | ★★★★☆ |
| Value | ★★★★☆ |
| Writing Quality | ★★★★☆ |