Déjà Vu? Decoding Repeated Reading from Eye Movements
Conference: ACL 2025
arXiv: 2502.11061
Code: None
Area: NLP Understanding / Cognitive Science
Keywords: Eye Tracking, Repeated Reading, Reading Behavior Decoding, Cognitive Models, Predictive Modeling
TL;DR
This work is the first to define the task of automatically decoding, from eye movements, whether a reader has read a text before. A feature-based XGBoost model and the neural RoBERTEye model reach ~70% accuracy on single-trial classification and ~91% on pairwise classification. The work also uses synthetic scanpaths generated by the E-Z Reader cognitive model as auxiliary reference signals to improve predictions.
Background & Motivation
Background: People often reread a text, whether for revision, close reading, or enjoyment. Psycholinguistic research shows that eye movements during rereading differ systematically from those during first reading: readers read faster, make fewer and shorter fixations, skip more words, and make fewer regressions, reflecting the facilitatory effect of memory.
Limitations of Prior Work: Existing research consists mostly of descriptive analyses of aggregate effects (averaged across texts and participants) and cannot answer the finer-grained question of whether a specific reader has read a specific text. Predictive modeling studies are lacking, and before OneStop no publicly available dataset supported this task.
Key Challenge: Descriptive statistics identify population-level trends, whereas practical applications (e.g., personalized education, reading assistance) require individual-level classification capability. A significant methodological gap exists between population trends and individual-level prediction.
Goal: (1) Define the "repeated reading decoding" predictive task and its two variants; (2) Develop effective predictive models; (3) Leverage cognitive model synthetic data to enhance predictions; (4) Analyze model behavior to uncover the role of memory in repeated reading.
Key Insight: Leveraging the OneStop Eye Movements dataset (the first public dataset containing both first-time and repeated reading eye-gaze records), this work formalizes the problem as a binary classification task, combining psycholinguistic feature engineering with modern multimodal neural networks.
Core Idea: Treat the memory effect on eye movements as a decodable signal. Machine learning models are trained to infer from eye-movement trajectories whether a reader has read a text before, with synthetic first-reading data from a cognitive model serving as a reference signal that improves predictions.
Method
Overall Architecture
The task has two variants: (1) Single-trial task: given one eye-tracking record of a participant reading a text, determine whether it is a first or a repeated reading; (2) Pairwise-trial task: given two eye-tracking records of the same participant reading the same text, in unknown order, determine which is the first reading and which is the repeated one. The model takes the text and eye-movement features as input and outputs class probabilities for the reading type.
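The two task variants can be sketched as follows, assuming a hypothetical `Trial` record that stores only total reading time and word count, and a single-trial scorer that returns a higher value for readings that are more likely repeated. `reading_speed_score` is a hypothetical stand-in in the spirit of the reading-speed baseline, not the paper's exact definition; the pairwise decision simply labels the higher-scoring trial of a same-participant, same-text pair as the repeated reading.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trial:
    """One eye-tracking record (hypothetical schema): a participant reading a text."""
    participant: str
    text_id: str
    total_reading_ms: float
    n_words: int


def reading_speed_score(trial: Trial) -> float:
    """Hypothetical score: words per second; faster reading suggests a repeated reading."""
    return trial.n_words / (trial.total_reading_ms / 1000.0)


def single_trial_predict(trial: Trial, score: Callable[[Trial], float], threshold: float) -> str:
    """Single-trial task: threshold one trial's score into 'first' or 'repeated'."""
    return "repeated" if score(trial) >= threshold else "first"


def pairwise_predict(a: Trial, b: Trial, score: Callable[[Trial], float]) -> tuple[str, str]:
    """Pairwise task: same participant and text, unknown order.
    The higher-scoring trial is labeled as the repeated reading."""
    if (a.participant, a.text_id) != (b.participant, b.text_id):
        raise ValueError("pairwise trials must share participant and text")
    return ("repeated", "first") if score(a) > score(b) else ("first", "repeated")


# Toy example: the same participant reads the same 200-word text twice.
t1 = Trial("p01", "article_3", total_reading_ms=60_000, n_words=200)  # ~3.3 words/s
t2 = Trial("p01", "article_3", total_reading_ms=45_000, n_words=200)  # ~4.4 words/s
print(pairwise_predict(t1, t2, reading_speed_score))       # ('first', 'repeated')
print(single_trial_predict(t1, reading_speed_score, 4.0))  # first
```

One plausible reason the pairwise variant is so much easier (about 91% vs. 70%) shows up in this sketch: comparing two trials from the same reader on the same text cancels out that reader's baseline speed and the text's difficulty, which a single global threshold has to absorb.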
Key Designs
- Multi-level Feature Representation (Feature-Based Approach):
  - Function: Extracts psycholinguistically motivated features from eye-movement trajectories for XGBoost classification.
  - Mechanism: A 35-dimensional global feature vector with three groups: (a) 8 standard eye-tracking measures (total fixation duration, first fixation duration, gaze duration, fixation count, skip rate, regression rate, etc.); (b) 20 lexical-property coefficients, obtained by regressing a participant's speed-normalized eye-movement measures on word frequency, surprisal, and word length with linear models, which capture the reduced sensitivity to linguistic properties during rereading; (c) 7 saccade-network features, which represent the eye-movement trajectory as a directed graph and extract graph-theoretic measures such as connectivity, centrality, and clustering.
  - Design Motivation: Each feature targets a first-vs.-repeated reading difference documented in the psycholinguistic literature, which keeps the model interpretable and theoretically grounded.
- RoBERTEye Multimodal Neural Model:
  - Function: Combines textual semantics with eye-movement features for end-to-end prediction.
  - Mechanism: An extension of RoBERTa in which word-level or fixation-level eye-movement feature vectors are projected into the language model's embedding space and concatenated with the word-embedding sequence before being fed into the Transformer. Two variants are proposed: RoBERTEye-Words (13-dimensional word-level features) and RoBERTEye-Fixations (6-dimensional fixation-level features concatenated with the word-level features). Special tokens separate the text embeddings from the eye-movement embeddings.
  - Design Motivation: Exploits the text understanding of a pretrained language model so that the system can learn interactions between text content and eye-movement behavior.
- E-Z Reader Synthetic Scanpath Augmentation:
  - Function: Generates synthetic "typical first reading" eye-movement trajectories as an auxiliary reference signal.
  - Mechanism: The E-Z Reader cognitive model generates 1,000 synthetic scanpaths per text, which are averaged to serve as a first-reading reference. The differences between human and synthetic features are used as auxiliary inputs: for the global and word-level representations, the human features are concatenated with these differences, while for the fixation-level representation the synthetic sequence is concatenated sequence-wise, marked by a third special token. A validation check shows that E-Z Reader output is significantly closer to human first reading than to repeated reading in fixation count and skip rate (\(p<0.001\)).
  - Design Motivation: Because existing cognitive models can only simulate first reading, they provide a natural baseline: the more human eye movements deviate from the synthetic first-reading baseline, the more likely the reading is a repeated one.
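For the global representation, the feature pipeline and the E-Z Reader augmentation can be sketched in plain Python. This is a simplified illustration under stated assumptions: a scanpath is a list of `(word_index, duration_ms)` fixations; only four of the eight standard measures are computed, with simplified definitions; a single lexical coefficient (per-word fixation time regressed on word length, without speed normalization) stands in for the 20 coefficients; and the "synthetic" scanpaths are hand-written stand-ins for E-Z Reader output. All function names are hypothetical.

```python
from statistics import mean

# (word index, fixation duration in ms); a scanpath is an ordered list of these.
Fixation = tuple[int, float]


def standard_metrics(fixations: list[Fixation], n_words: int) -> dict[str, float]:
    """Simplified text-level measures (4 of the 8 standard ones)."""
    fixated = {w for w, _ in fixations}
    # A regression here is any saccade landing on an earlier word than the previous fixation.
    regressions = sum(1 for (w0, _), (w1, _) in zip(fixations, fixations[1:]) if w1 < w0)
    return {
        "total_fixation_ms": sum(d for _, d in fixations),
        "fixation_count": float(len(fixations)),
        "skip_rate": 1.0 - len(fixated) / n_words,  # simplified: words never fixated
        "regression_rate": regressions / max(len(fixations) - 1, 1),
    }


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope b of y ~ a + b*x by ordinary least squares (x must not be constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def length_sensitivity(fixations: list[Fixation], word_lengths: list[int]) -> float:
    """Illustrative lexical coefficient: per-word total fixation time regressed on word length."""
    per_word = [0.0] * len(word_lengths)
    for w, d in fixations:
        per_word[w] += d
    return ols_slope([float(n) for n in word_lengths], per_word)


def features(fixations: list[Fixation], word_lengths: list[int]) -> dict[str, float]:
    feats = standard_metrics(fixations, len(word_lengths))
    feats["length_slope"] = length_sensitivity(fixations, word_lengths)
    return feats


def with_ez_reference(human: dict[str, float], synthetic: list[dict[str, float]]) -> list[float]:
    """Concatenate human features with (human - mean synthetic) differences."""
    keys = sorted(human)
    reference = {k: mean(run[k] for run in synthetic) for k in keys}
    return [human[k] for k in keys] + [human[k] - reference[k] for k in keys]


# Toy data: a 5-word text, one human scanpath, and two stand-in "synthetic" scanpaths
# (E-Z Reader would supply these; the paper uses 1,000 runs per text).
word_lengths = [3, 7, 2, 9, 4]
human = [(0, 200.0), (1, 250.0), (3, 300.0), (1, 180.0), (4, 210.0)]
synthetic = [
    [(0, 220.0), (1, 260.0), (2, 150.0), (3, 310.0), (4, 230.0)],
    [(0, 210.0), (1, 240.0), (3, 320.0), (4, 220.0)],
]
vec = with_ez_reference(features(human, word_lengths),
                        [features(s, word_lengths) for s in synthetic])
print(len(vec))  # 10: five human features followed by five differences
```

Averaging features over the synthetic runs, rather than averaging the scanpaths themselves, is a choice made for this sketch: it keeps the reference in the same space as the human features, so each difference is directly interpretable (e.g., a positive regression-rate difference means more regressions than simulated first reading).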
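The RoBERTEye input construction can be illustrated with a toy, dependency-free sketch. It assumes a hypothetical 8-dimensional hidden size, a randomly initialized projection in place of learned weights, and made-up vectors for the special tokens; the optional synthetic segment loosely mirrors the E-Z Reader variant's third special token and, for simplicity, reuses the same 13-dimensional projection. It shows only the shape of the input sequence, not the Transformer, the classification head, or how the actual model aligns words and fixations.

```python
import random

D_EYE = 13    # word-level eye-movement features per word (as in RoBERTEye-Words)
D_MODEL = 8   # toy hidden size (hypothetical; RoBERTa-base uses 768)

rng = random.Random(0)
# Hypothetical projection parameters; in a real model these are learned.
W = [[rng.gauss(0.0, 0.1) for _ in range(D_EYE)] for _ in range(D_MODEL)]
b = [0.0] * D_MODEL
# Hypothetical special-token embeddings that mark where each segment begins.
EYE_TOKEN = [1.0] * D_MODEL
SYNTH_TOKEN = [-1.0] * D_MODEL


def project(feats: list[float]) -> list[float]:
    """Linear map from the eye-feature space into the embedding space: W @ feats + b."""
    if len(feats) != D_EYE:
        raise ValueError(f"expected {D_EYE} features, got {len(feats)}")
    return [sum(w * f for w, f in zip(row, feats)) + bias for row, bias in zip(W, b)]


def build_input(word_embs, eye_feats, synth_feats=None):
    """Sequence fed to the Transformer: word embeddings, [EYE], projected human
    eye vectors, and optionally [SYNTH] followed by projected synthetic vectors."""
    seq = [list(e) for e in word_embs] + [EYE_TOKEN] + [project(f) for f in eye_feats]
    if synth_feats is not None:
        seq += [SYNTH_TOKEN] + [project(f) for f in synth_feats]
    return seq


n_words = 5
word_embs = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(n_words)]
human_eye = [[rng.random() for _ in range(D_EYE)] for _ in range(n_words)]
synth_eye = [[rng.random() for _ in range(D_EYE)] for _ in range(4)]

seq = build_input(word_embs, human_eye)
seq_ez = build_input(word_embs, human_eye, synth_eye)
print(len(seq), len(seq_ez))  # 11 16
```

Putting the eye-movement vectors in the same space and sequence as the word embeddings lets the Transformer's self-attention relate any word to any eye-movement vector, which is the kind of text-behavior interaction the design motivation refers to.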
Loss & Training
10-fold cross-validation is used, with splits balanced across three evaluation regimes (unseen participants, unseen texts, and unseen both) and across consecutive vs. non-consecutive repeated readings. Neural models are trained with PyTorch Lightning on an NVIDIA L40S (48 GB) GPU; XGBoost hyperparameters are tuned with a standard search.
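The three evaluation regimes can be sketched for one fold, assuming each trial is identified by a `(participant, text)` pair and each fold holds out one group of participants and one group of texts. The round-robin grouping and the toy design in which every participant reads every text are simplifications; the paper's actual fold assignment and its balancing over regimes and over consecutive vs. non-consecutive rereading are not reproduced.

```python
from itertools import product


def split_fold(trials, heldout_participants, heldout_texts):
    """Assign (participant, text) trials of one fold to train or to a test regime."""
    out = {"train": [], "unseen_text": [], "unseen_participant": [], "unseen_both": []}
    for p, t in trials:
        new_p, new_t = p in heldout_participants, t in heldout_texts
        if new_p and new_t:
            out["unseen_both"].append((p, t))
        elif new_p:
            out["unseen_participant"].append((p, t))
        elif new_t:
            out["unseen_text"].append((p, t))
        else:
            out["train"].append((p, t))
    return out


def round_robin_groups(items, k):
    """Deterministically spread items over k groups (illustrative, not the paper's balancing)."""
    ordered = sorted(items)
    return [set(ordered[i::k]) for i in range(k)]


participants = [f"p{i}" for i in range(10)]
texts = [f"t{i}" for i in range(10)]
trials = list(product(participants, texts))  # toy design: everyone reads every text
p_groups, t_groups = round_robin_groups(participants, 10), round_robin_groups(texts, 10)

for fold in range(10):
    split = split_fold(trials, p_groups[fold], t_groups[fold])
    # Each fold trains only on trials whose participant AND text are both seen.
    assert all(p not in p_groups[fold] and t not in t_groups[fold] for p, t in split["train"])

sizes = {k: len(v) for k, v in split_fold(trials, p_groups[0], t_groups[0]).items()}
print(sizes)  # {'train': 81, 'unseen_text': 9, 'unseen_participant': 9, 'unseen_both': 1}
```

Excluding every trial of a held-out participant or text from training is what gives the regimes their meaning: the model cannot memorize a particular reader's speed or a particular text's difficulty and reuse it at test time.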
Key Experimental Results
Main Results (accuracy, %)
| Task | Model | Unseen Text | Unseen Participant | Unseen Both | All |
|---|---|---|---|---|---|
| Single-trial | Reading Speed Baseline | 66.9 | 67.1 | 66.8 | 66.6 |
| Single-trial | XGBoost | 69.5 | 70.7 | 68.7 | 69.6 |
| Single-trial | XGBoost+E-Z | 70.1 | 71.2 | 69.3 | 70.2 |
| Single-trial | RoBERTEye-Words+E-Z | 70.3 | 69.7 | 69.7 | 69.9 |
| Pairwise-trial | Reading Speed Baseline | 88.0 | 88.1 | 87.2 | 87.7 |
| Pairwise-trial | XGBoost | 91.5 | 92.2 | 90.6 | 91.4 |
Ablation Study (accuracy, %)
| Configuration | Single-trial All | Pairwise-trial All |
|---|---|---|
| XGBoost (without E-Z) | 69.6 | 91.4 |
| XGBoost + E-Z Reader | 70.2 | - |
| Reading Speed Baseline | 66.6 | 87.7 |
| Random Baseline | 50.0 | 50.0 |
Key Findings
- Pairwise accuracy reaches 91.4%: given both readings, XGBoost reliably identifies which one is the first reading and which is the repeated one, significantly outperforming the reading-speed baseline (87.7%).
- The single-trial task remains challenging: from one trial, accuracy is ~70%, 20 percentage points above chance but only 3.6 points above the reading-speed baseline (70.2 vs. 66.6), leaving substantial room for improvement.
- Feature-based models outperform neural models: XGBoost significantly outperforms RoBERTEye in the pairwise task, likely because the hand-crafted psycholinguistic features capture the relevant variance more directly.
- E-Z Reader augmentation helps modestly: the synthetic reference yields significant improvements in some settings (\(p<0.05\)), but the gains are not consistent.
- Accuracy declines as the experiment progresses: readers' first readings become faster over the session (a practice effect), which makes them harder to distinguish from repeated readings.
Highlights & Insights
- Highly innovative task definition: This study is the first to formalize "reading history decoding" as a predictive task, opening a new direction in cognitive NLP. The task has direct potential applications in educational technology and personalized content recommendation.
- Cognitive models as synthetic data sources: Using E-Z Reader simulations as a first-reading baseline is a clever strategy that bridges cognitive-science theory and machine learning, and it could be extended to other cognitive tasks.
- Model analysis as a scientific instrument: Examining the behavior of predictive models (e.g., how accuracy changes with a text's position in the experiment) to study cognitive processes demonstrates the scientific value of predictive modeling.
Limitations & Future Work
- The data are limited to adult native (L1) English speakers recorded with a laboratory-grade EyeLink 1000 Plus eye tracker (1000 Hz sampling rate), so generalizability remains to be validated.
- The interval between first and repeated reading is at most 10 articles; longer lags remain to be evaluated.
- The study considers only two readings of a text, whereas texts may be reread more than twice in practice.
- The texts are limited to news articles; other domains (e.g., academic or literary texts) remain unexplored.
- Future work needs to explore feasibility on low-resolution consumer devices (e.g., laptop/smartphone front cameras).
Related Work & Insights
- vs. Traditional Psycholinguistic Research: Conventional studies restrict themselves to group-level descriptive statistical analysis. This work presents the first individual-level predictions, turning psycholinguistic findings into quantifiable signals for machine learning.
- vs. Reading Comprehension Prediction: Previous work used eye movements to predict reading comprehension levels, reading objectives, etc. This work focuses on the entirely novel dimension of "prior exposure" (whether a text has been read).
Rating
- Novelty: ⭐⭐⭐⭐⭐ Decodes reading history from eye movements for the first time with a brand-new task definition.
- Experimental Thoroughness: ⭐⭐⭐⭐ Includes multi-model comparisons, ablations, and fine-grained analyses, though constrained by dataset size.
- Writing Quality: ⭐⭐⭐⭐⭐ Rigorous formalization, carefully designed experiments, and in-depth analysis.
- Value: ⭐⭐⭐⭐ Significant academic contribution, though further progress requires support for low-cost, consumer-grade hardware.