Forecasting Epileptic Seizures from Contactless Camera via Cross-Species Transfer Learning¶
Conference: CVPR 2026 · arXiv: 2603.12887 · Code: N/A · Area: Medical Imaging
Keywords: epileptic seizure forecasting, video analysis, cross-species transfer learning, VideoMAE, few-shot learning
TL;DR¶
This work introduces the first purely vision-based epileptic seizure forecasting task, leveraging large-scale rodent epilepsy videos for cross-species self-supervised pre-training via the VideoMAE framework, and achieves >70% forecasting accuracy when predicting seizure occurrence within a 5-second horizon from 3–10 second observation windows.
Background & Motivation¶
Epileptic seizure forecasting is clinically valuable yet technically challenging. Existing approaches rely predominantly on neural signals such as EEG, requiring specialized equipment that is ill-suited for long-term deployment. Video data offers non-invasive and accessible alternatives; however, prior video-based studies focus mainly on post-ictal detection, leaving pre-ictal forecasting largely unexplored. The core challenges are: (1) annotated human epilepsy video data is extremely scarce due to privacy constraints; and (2) general-purpose video pre-trained models lack epilepsy-relevant behavioral representations. Rodent epilepsy models are widely used in epilepsy research, and their ictal behaviors exhibit cross-species consistency with humans, providing an opportunity for knowledge transfer.
Method¶
Overall Architecture¶
A two-stage framework is proposed: Stage 1 performs VideoMAE self-supervised pre-training on a cross-species mixed dataset; Stage 2 transfers the pre-trained encoder to few-shot classification on human epilepsy videos.
Key Designs¶
- Cross-Species Self-Supervised Pre-training: A mixed dataset \(D_{pt} = \{v_r^{(1)}, \ldots, v_r^{(m)}, v_h^{(1)}, \ldots, v_h^{(n)}\}\) is constructed, comprising rodent epilepsy videos (2,952 seizure + 3,000 normal clips from the RodEpil dataset) and 1,870 inter-ictal video clips from 6 human patients. A tube masking strategy (optimal masking ratio = 0.3) is applied, with an MSE loss supervising reconstruction of the masked spatiotemporal patches: \(\mathcal{L}_{MSE} = \frac{1}{|\Omega|}\sum_{i \in \Omega}(I_i - \hat{I}_i)^2\), where \(\Omega\) is the set of masked patches. This design compensates for the scarcity of human epilepsy videos through cross-species data.
- Few-Shot Fine-Tuning: The decoder is discarded, and a lightweight classification head is attached on top of the CLS token to predict seizure probability: \(\hat{y} = \sigma(\mathbf{W} \cdot \mathbf{z}_{\text{cls}} + b)\). Model adaptation under data-scarce conditions is evaluated in 2/3/4-shot settings. Gradient checkpointing and 16-bit mixed-precision training are employed for training stability and memory efficiency.
- Pre-training Data Ablation Design: Different data combinations are systematically compared: human-only (+H), seizure rodents only (+R(Y)), normal rodents only (+R(N)), mixed rodents (+R(Y/N)), and the full cross-species combination (+R(Y/N)+H). This validates the effectiveness of cross-species transfer and the contribution of each component.
Loss & Training¶
- Stage 1: MSE reconstruction loss, Adam optimizer with LR = \(1 \times 10^{-4}\), 8 × NVIDIA L40 GPUs, tube masking ratio = 0.3
- Stage 2: Binary cross-entropy classification loss, fine-tuned for 20 epochs
- Input sampling: \(T=16\) frames, temporal stride 2, resolution \(224 \times 224\)
Key Experimental Results¶
Datasets¶
- Pre-training data: RodEpil rodent dataset (2,952 seizure + 3,000 normal, 10-second clips) + 1,870 inter-ictal 5-second clips from 6 human patients
- Evaluation benchmark: 40 video sequences (20 pre-ictal + 20 inter-ictal), drawn from two public sources and one private source
- Evaluation protocol: 2/3/4-shot independent sampling with non-overlapping support and query sets
Main Results¶
| Method | Metric | 2-shot | 3-shot | 4-shot | Avg. |
|---|---|---|---|---|---|
| CSN | bacc | 0.339 | 0.588 | 0.656 | 0.528 |
| SlowFast | bacc | 0.578 | 0.680 | 0.728 | 0.662 |
| Human-only | bacc | 0.744 | 0.694 | 0.706 | 0.715 |
| Ours | bacc | 0.739 | 0.718 | 0.713 | 0.723 |
| Ours | roc_auc | 0.768 | 0.737 | 0.762 | 0.756 |
Ablation Study¶
| Configuration | avg bacc | avg roc_auc | Note |
|---|---|---|---|
| Base (Human-only) | 0.715 | 0.749 | Human-data-only baseline |
| +H (unlabeled human) | 0.716 | 0.742 | Marginal gain from unlabeled human data |
| +R(Y) (seizure rodents) | 0.696 | 0.733 | Performance drops with seizure rodents only |
| +R(Y/N) (all rodents) | 0.697 | 0.750 | Mixed rodent data |
| +R(Y/N)+H (full) | 0.723 | 0.756 | Full cross-species combination is optimal |
Key Findings¶
- Cross-species transfer learning is effective: the full cross-species configuration achieves the best performance across all averaged metrics.
- The optimal masking ratio is 0.3, substantially lower than the standard VideoMAE setting (0.75–0.9), as seizure forecasting requires preserving richer spatiotemporal context.
- Using seizure rodent data alone (+R(Y)) degrades performance, highlighting the regularization role of normal behavioral data.
Highlights & Insights¶
- This work is the first to define a purely vision-based epileptic seizure forecasting task (predicting seizure occurrence within 5 seconds using a 3–10 second window), representing a clinically pioneering contribution.
- The cross-species transfer learning paradigm is novel, exploiting pathological consistency between rodent and human epilepsy for knowledge transfer.
- The finding of a low optimal masking ratio (0.3) reveals a fundamental difference in information density between medical and natural videos.
- The performance degradation when pre-training on seizure samples alone underscores the importance of normal behavior as a contrastive baseline.
Limitations & Future Work¶
- The evaluation set comprises only 40 video sequences, limiting statistical power.
- A fixed 5-second prediction window is used; varying prediction horizons are not explored.
- The method is purely visual, without incorporating audio or wearable device signals.
- The theoretical basis of cross-species transfer — specifically which behavioral patterns are truly transferable — lacks in-depth analysis.
- Clinical deployment requires validation on larger-scale, more diverse longitudinal datasets.
Related Work & Insights¶
- VideoMAE, as a powerful foundation model for video self-supervised learning, warrants further exploration in medical domains.
- The RodEpil dataset provides a new data resource for cross-species learning research.
- The few-shot learning paradigm is well-suited to data-scarce medical settings, though broader sample diversity is needed to validate robustness.
- The cross-species consistency hypothesis may extend beyond epilepsy to other neurological and behavioral disorders, warranting generalizability studies.
- Complementary fusion with EEG-based methods may represent an important future direction.
Rating¶
- Novelty: ⭐⭐⭐⭐ First formulation of video-based seizure forecasting; cross-species transfer paradigm is highly original.
- Experimental Thoroughness: ⭐⭐⭐ Ablation design is well-conceived, but the dataset size (40 videos) is limited and statistical significance is questionable.
- Writing Quality: ⭐⭐⭐⭐ Problem definition is clear and the framework is described thoroughly.
- Value: ⭐⭐⭐⭐ Opens a new direction for non-invasive seizure warning systems with significant clinical potential.
Additional Remarks¶
The core assumption underlying cross-species transfer learning — that pre-ictal behavioral prodromes share commonalities across species — has partial support in the neuroscience literature. Although validated at limited scale, this work opens new avenues for large-scale follow-up studies. Future integration of multimodal signals (video + HRV + audio) is expected to further improve forecasting performance.