Déjà Vu? Decoding Repeated Reading from Eye Movements

Conference: ACL 2025
arXiv: 2502.11061
Code: None
Area: NLP Understanding / Cognitive Science
Keywords: Eye Tracking, Repeated Reading, Reading Behavior Decoding, Cognitive Models, Predictive Modeling

TL;DR

This work introduces a new task: automatically decoding, from eye movement patterns, whether a reader has previously read a text. Using a feature-based XGBoost model and the neural RoBERTEye model, it reaches about 70% accuracy on single trials and about 91% on paired trials. It also uses synthetic scanpaths generated by the E-Z Reader cognitive model as an auxiliary reference signal to improve predictions.

Background & Motivation

Background: In daily life, individuals frequently reread the same text for revision, close reading, or enjoyment. Psycholinguistic research demonstrates that eye movement patterns during repeated reading systematically differ from those during first-time reading: readers show faster reading speeds, fewer fixations, shorter fixation durations, higher skip rates, and reduced regressions, reflecting cognitive facilitation effects of memory.

Limitations of Prior Work: Existing research is mostly limited to descriptive analyses of aggregate effects (averages across texts and participants) and cannot answer the more granular question of whether a specific reader has read a specific text. Predictive modeling studies are lacking, and before the dataset used here no public dataset supported this task.

Key Challenge: Descriptive statistics identify population-level trends, whereas practical applications (e.g., personalized education, reading assistance) require individual-level classification capability. A significant methodological gap exists between population trends and individual-level prediction.

Goal: (1) Define the "repeated reading decoding" predictive task and its two variants; (2) Develop effective predictive models; (3) Leverage cognitive model synthetic data to enhance predictions; (4) Analyze model behavior to uncover the role of memory in repeated reading.

Key Insight: Leveraging the OneStop Eye Movements dataset (the first public dataset containing both first-time and repeated reading eye-gaze records), this work formalizes the problem as a binary classification task, combining psycholinguistic feature engineering with modern multimodal neural networks.

Core Idea: The memory effect on eye movements is treated as a decodable signal. Machine learning models are trained to infer from a reader's eye-gaze trajectory whether the text has been read before, with synthetic first-time reading data from a cognitive model serving as a reference signal.

Method

Overall Architecture

The task is defined via two variants: (1) Single-trial task—given a single eye-tracking record of a participant reading a text, determine if it is a first-time or repeated reading; (2) Pairwise-trial task—given two eye-tracking records of the same participant reading the same text (with unknown order), determine which is the first-time and which is the repeated reading. The model takes text and eye-movement features as input and outputs classification probabilities for the reading type.
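The two task variants can be illustrated with a reading-speed rule in the spirit of the Reading Speed Baseline reported below. This is a minimal sketch, not the authors' implementation: the `Trial` fields, the function names, and the `speed_threshold` value are all hypothetical, and the only assumption carried over from the text is that repeated reading tends to be faster.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    n_words: int             # number of words in the text (hypothetical field)
    total_reading_ms: float  # total reading time of the trial (hypothetical field)


def reading_speed(trial: Trial) -> float:
    """Words per second for one trial."""
    return trial.n_words / (trial.total_reading_ms / 1000.0)


def predict_single(trial: Trial, speed_threshold: float) -> str:
    # Single-trial variant: one record in, one label out.
    # The threshold would have to be tuned on training data.
    return "repeated" if reading_speed(trial) > speed_threshold else "first"


def predict_pairwise(a: Trial, b: Trial) -> tuple[str, str]:
    # Pairwise variant: same reader, same text, unknown order.
    # The faster record is labelled as the repeated reading.
    if reading_speed(a) >= reading_speed(b):
        return ("repeated", "first")
    return ("first", "repeated")


slow = Trial(n_words=120, total_reading_ms=30_000)  # 4.0 words per second
fast = Trial(n_words=120, total_reading_ms=24_000)  # 5.0 words per second
pair_labels = predict_pairwise(slow, fast)
single_label = predict_single(fast, speed_threshold=4.5)
```

The pairwise rule needs no threshold because the reader's own second record acts as the reference, which is one intuition for why the pairwise variant is the easier of the two.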

Key Designs

Multi-level feature representation (Feature-Based Approach):
- Function: Extracts psycholinguistically driven features from eye-movement trajectories for XGBoost classification.
- Mechanism: Designs a 35-dimensional global feature vector containing three categories of features: (a) 8 standard eye-tracking metrics (total fixation duration, first fixation duration, gaze duration, fixation count, skip rate, regression rate, etc.); (b) 20 lexical attribute coefficients—fitting a participant's speed-normalized eye-movement metrics to word frequency, surprisal, and word length via linear models to capture the reduced sensitivity to linguistic attributes during repeated reading; (c) 7 saccadic network features—constructing the eye-gaze trajectory as a directed graph and extracting graph-theoretical features such as connectivity, centrality, and clustering.
- Design Motivation: Features are designed directly based on known first-time/repeated reading differences in the psycholinguistic literature to ensure interpretability and a solid theoretical foundation.
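
The saccadic network features in (c) treat the eye-gaze trajectory (scanpath) as a directed graph over words. The sketch below shows one way such a graph could be built and summarised. The specific statistics (`density`, `regression_share`) are simplified stand-ins chosen for illustration and are not the seven features used in the paper; the function name and input format are assumptions.

```python
from collections import defaultdict


def scanpath_graph_features(fixated_words: list[int]) -> dict[str, float]:
    """Build a directed word-to-word graph from a scanpath and summarise it.

    fixated_words: index of the fixated word for each fixation, in temporal order.
    """
    edges = defaultdict(int)  # (source word, target word) -> number of saccades
    for src, dst in zip(fixated_words, fixated_words[1:]):
        if src != dst:  # ignore refixations on the same word
            edges[(src, dst)] += 1

    n_nodes = len(set(fixated_words))
    n_edges = len(edges)
    max_edges = n_nodes * (n_nodes - 1)  # possible directed edges, no self-loops
    total_saccades = sum(edges.values())
    backward = sum(count for (src, dst), count in edges.items() if dst < src)
    return {
        "n_nodes": n_nodes,
        "n_edges": n_edges,
        "density": n_edges / max_edges if max_edges else 0.0,
        "regression_share": backward / total_saccades if total_saccades else 0.0,
    }


# Toy scanpaths: a first reading with regressions, a rereading with skips.
first_reading = scanpath_graph_features([0, 1, 2, 1, 2, 3, 4, 2, 3, 4, 5])
rereading = scanpath_graph_features([0, 2, 3, 5])
```

In this toy example the rereading graph has fewer nodes and no backward edges, matching the higher skip rates and reduced regressions described in the Background.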
RoBERTEye multimodal neural model:
- Function: Synthesizes textual semantic information and eye-movement features for end-to-end prediction.
- Mechanism: The model extends RoBERTa. Word-level or fixation-level eye-movement feature vectors are projected into the language model's embedding space and concatenated with the word embedding sequence before entering the Transformer. Two variants are proposed: RoBERTEye-Words (13-dimensional word-level features) and RoBERTEye-Fixations (6-dimensional fixation-level features concatenated with the word-level features). Special tokens separate text embeddings from eye-movement embeddings.
- Design Motivation: Leverages the text comprehension capabilities of pretrained language models, enabling the system to discover interaction patterns between textual content and eye-movement behaviors.
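
The input construction described for RoBERTEye-Words can be sketched in plain Python to make the sequence layout concrete. This is an illustration under assumptions, not the released model: the toy width `HIDDEN`, the random projection `W_proj`, the separator vector `eye_sep`, and the function names are all hypothetical, and a real implementation would use learned parameters in a tensor library.

```python
import random

HIDDEN = 8    # toy embedding width (RoBERTa-base, for comparison, uses 768)
EYE_DIM = 13  # word-level eye-movement features, as stated for RoBERTEye-Words

rng = random.Random(0)


def random_vector(dim: int) -> list[float]:
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


# Hypothetical parameters: a linear projection (HIDDEN x EYE_DIM) and the
# embedding of a special token that marks where the eye-movement part starts.
W_proj = [random_vector(EYE_DIM) for _ in range(HIDDEN)]
eye_sep = random_vector(HIDDEN)


def project(eye_features: list[float]) -> list[float]:
    """Map one EYE_DIM feature vector into the HIDDEN-dimensional embedding space."""
    return [sum(w * x for w, x in zip(row, eye_features)) for row in W_proj]


def build_input(word_embeddings, eye_features_per_word):
    """Text embeddings, then the separator, then the projected eye embeddings."""
    eye_embeddings = [project(f) for f in eye_features_per_word]
    return word_embeddings + [eye_sep] + eye_embeddings


words = [random_vector(HIDDEN) for _ in range(5)]  # stand-in for word embeddings
eye = [random_vector(EYE_DIM) for _ in range(5)]   # one feature vector per word
sequence = build_input(words, eye)                 # what the Transformer would consume
```

Because both modalities end up as vectors of the same width in one sequence, self-attention can relate any word to any eye-movement embedding, which is the interaction the design motivation refers to.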
E-Z Reader synthetic scanpath augmentation:
- Function: Generates synthetic "typical first-time reading" eye-gaze trajectories as an auxiliary reference signal.
- Mechanism: The E-Z Reader cognitive model generates 1,000 synthetic scanpaths for each text, which are averaged to form a first-time reading reference. The differences between human and synthetic features serve as auxiliary inputs: the global and word-level representations concatenate the human features with the differences, while the fixation-level representations use sequence-wise concatenation, with a third special token to tell the two sequences apart. Validation shows that E-Z Reader output is significantly closer to human first-time reading than to human repeated reading in fixation count and skip rate (\(p<0.001\)).
- Design Motivation: Since existing cognitive models can only simulate first-time reading, they serve as an ideal baseline—the more human eye movements deviate from the synthetic first-time reading baseline, the more likely the reading is repeated.
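
For the global representation, the reference-difference idea reduces to a few lines. The sketch below assumes each synthetic scanpath has already been summarised into the same feature vector as the human trial; the two toy features and all names are hypothetical, and only three synthetic runs are used instead of 1,000.

```python
def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean over a list of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def with_reference_difference(human: list[float],
                              synthetic_runs: list[list[float]]) -> list[float]:
    """Concatenate the human features with (human - averaged synthetic reference)."""
    reference = mean_vector(synthetic_runs)
    difference = [h - r for h, r in zip(human, reference)]
    return human + difference


# Toy feature layout: [fixation_count, skip_rate]
synthetic = [[100.0, 0.30], [104.0, 0.28], [96.0, 0.32]]
human = [80.0, 0.40]
augmented = with_reference_difference(human, synthetic)
```

Here the human trial has fewer fixations and a higher skip rate than the synthetic first-reading reference, so the difference features point toward a repeated reading.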

Loss & Training

Evaluation uses 10-fold cross-validation, with data splits balanced across three evaluation regimes (unseen participants, unseen texts, and both unseen) as well as consecutive vs. non-consecutive repeated readings. Neural models are trained with PyTorch Lightning on an NVIDIA L40S GPU (48 GB), while a standard hyperparameter search is run for XGBoost.
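
The three evaluation regimes follow from holding out a set of participants and a set of texts in each fold. The sketch below shows that bookkeeping for a single fold; it is an assumption about how such splits are commonly built, not the authors' exact procedure, and it omits the balancing over consecutive and non-consecutive rereadings. All identifiers are hypothetical.

```python
def assign_regime(participant: str, text: str,
                  heldout_participants: set[str], heldout_texts: set[str]) -> str:
    """Place one trial into the training set or one of three test regimes."""
    new_participant = participant in heldout_participants
    new_text = text in heldout_texts
    if new_participant and new_text:
        return "unseen_both"
    if new_participant:
        return "unseen_participant"
    if new_text:
        return "unseen_text"
    return "train"


# Toy fold: participant p3 and text t2 are held out.
trials = [("p1", "t1"), ("p1", "t2"), ("p3", "t1"), ("p3", "t2")]
split = {trial: assign_regime(*trial, {"p3"}, {"t2"}) for trial in trials}
```

Only trials whose participant and text both stay outside the held-out sets are used for training, so the model never sees the test participant or the test text in the corresponding regime.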

Key Experimental Results

Main Results

Accuracy (%) by evaluation regime:
Task	Model	Unseen Text	Unseen Participant	Unseen Both	All
Single-trial	Reading Speed Baseline	66.9	67.1	66.8	66.6
Single-trial	XGBoost	69.5	70.7	68.7	69.6
Single-trial	XGBoost+E-Z	70.1	71.2	69.3	70.2
Single-trial	RoBERTEye-Words+E-Z	70.3	69.7	69.7	69.9
Pairwise-trial	Reading Speed Baseline	88.0	88.1	87.2	87.7
Pairwise-trial	XGBoost	91.5	92.2	90.6	91.4

Ablation Study

Accuracy (%):
Configuration	Single-trial All	Pairwise-trial All
XGBoost (without E-Z)	69.6	91.4
XGBoost + E-Z Reader	70.2	-
Reading Speed Baseline	66.6	87.7
Random Baseline	50.0	50.0
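
The size of each contribution is easier to read as a difference in percentage points. The short calculation below uses only the "All" accuracies from the table above; the dictionary layout and the `gain` helper are illustrative.

```python
# "All" regime accuracies (%) copied from the table above.
accuracy = {
    "single": {"chance": 50.0, "speed_baseline": 66.6, "xgboost": 69.6, "xgboost_ez": 70.2},
    "pairwise": {"chance": 50.0, "speed_baseline": 87.7, "xgboost": 91.4},
}


def gain(task: str, model: str, reference: str) -> float:
    """Accuracy difference in percentage points, rounded to one decimal."""
    return round(accuracy[task][model] - accuracy[task][reference], 1)


gains = {
    "speed over chance (single)": gain("single", "speed_baseline", "chance"),
    "xgboost over speed (single)": gain("single", "xgboost", "speed_baseline"),
    "E-Z over xgboost (single)": gain("single", "xgboost_ez", "xgboost"),
    "xgboost over speed (pairwise)": gain("pairwise", "xgboost", "speed_baseline"),
}
```

Most of the margin over chance is already captured by reading speed alone (16.6 points on single trials). The full feature set adds 3.0 points on single trials and 3.7 on pairs, and the E-Z Reader reference adds a further 0.6 on single trials.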

Key Findings

Pairwise-trial accuracy reaches 91.4% overall: When both reading records are provided, XGBoost reliably distinguishes first-time from repeated reading, outperforming the reading speed baseline (87.7%) by 3.7 points.
Single-trial tasks remain challenging: Classification based on a single trial achieves about 70% accuracy. This is 20 percentage points above chance but only about 3 points above the reading speed baseline (66.6%), leaving room for improvement.
Feature-based models outperform neural models: XGBoost significantly outperforms RoBERTEye in pairwise tasks, likely because hand-crafted psycholinguistic features capture the relevant variation more directly.
E-Z Reader enhancement is limited but positive: Synthetic references provide significant improvements in some scenarios (\(p<0.05\)), though the gains are small and inconsistent across settings.
Accuracy declines as the experiment progresses: Readers speed up over the course of the session even on first readings (a practice effect), which makes first-time reading harder to distinguish from repeated reading.

Highlights & Insights

Novel task definition: This study is the first to formalize "reading history decoding" as a predictive task, opening a new direction in cognitive NLP. The task has direct application prospects in educational technology and personalized content recommendation.
Cognitive models as synthetic data sources: Using first-time reading baselines simulated by E-Z Reader connects theoretical models from cognitive science with machine learning, and the approach could be extended to other cognitive tasks.
Model analysis as a scientific instrument: Studying how the predictive model behaves (e.g., how accuracy changes with a trial's position in the experiment) sheds light on cognitive processes and demonstrates the scientific value of predictive modeling.

Limitations & Future Work

Data is limited to adult native (L1) English readers recorded with a laboratory-grade Eyelink 1000 Plus eye tracker (1000 Hz sampling rate), so generalization to other populations and devices remains to be validated.
Repeated reading intervals span at most 10 articles, with longer time lags yet to be evaluated.
The study only considers two readings of each text, whereas multiple repetitions occur in practice.
The text domain is limited to news articles, leaving other domains (e.g., academic, literary) unaddressed.
Future work needs to explore feasibility on low-resolution consumer devices (e.g., laptop/smartphone front cameras).

vs. Traditional Psycholinguistic Research: Conventional studies restrict themselves to group-level descriptive statistical analysis. This work presents the first individual-level predictions, turning psycholinguistic findings into quantifiable signals for machine learning.
vs. Reading Comprehension Prediction: Previous work used eye movements to predict reading comprehension levels, reading objectives, etc. This work focuses on the entirely novel dimension of "prior exposure" (whether a text has been read).

Rating

Novelty: ⭐⭐⭐⭐⭐ Decodes reading history from eye movements for the first time with a brand-new task definition.
Experimental Thoroughness: ⭐⭐⭐⭐ Includes multi-model comparisons, ablations, and fine-grained analyses, though constrained by dataset size.
Writing Quality: ⭐⭐⭐⭐⭐ Rigorous formalization, careful experimental design, and in-depth analysis.
Value: ⭐⭐⭐⭐ Significant academic contribution, though further progress requires support for low-cost, consumer-grade hardware.