
Déjà Vu? Decoding Repeated Reading from Eye Movements

Conference: ACL 2025
arXiv: 2502.11061
Code: None
Area: NLP Understanding / Cognitive Science
Keywords: Eye Tracking, Repeated Reading, Reading Behavior Decoding, Cognitive Models, Predictive Modeling

TL;DR

This work introduces the task of automatically decoding, from eye movement patterns, whether a reader has previously read a text. Using a feature-based XGBoost model and neural RoBERTEye models, it reaches ~70% accuracy on the single-trial task and ~91% on the pairwise-trial task. It also uses synthetic scanpaths generated by the E-Z Reader cognitive model as an auxiliary first-time-reading reference signal to improve predictions.

Background & Motivation

Background: In daily life, individuals frequently reread the same text for revision, close reading, or enjoyment. Psycholinguistic research demonstrates that eye movement patterns during repeated reading systematically differ from those during first-time reading: readers show faster reading speeds, fewer fixations, shorter fixation durations, higher skip rates, and reduced regressions, reflecting cognitive facilitation effects of memory.

Limitations of Prior Work: Existing research is mostly limited to descriptive analyses of aggregate effects (averages across texts and participants) and cannot answer the more granular question of "whether a specific reader has read a specific text." There is a lack of predictive modeling studies, and no publicly available datasets support this task.

Key Challenge: Descriptive statistics identify population-level trends, whereas practical applications (e.g., personalized education, reading assistance) require individual-level classification capability. A significant methodological gap exists between population trends and individual-level prediction.

Goal: (1) Define the "repeated reading decoding" predictive task and its two variants; (2) Develop effective predictive models; (3) Leverage cognitive model synthetic data to enhance predictions; (4) Analyze model behavior to uncover the role of memory in repeated reading.

Key Insight: Leveraging the OneStop Eye Movements dataset (the first public dataset containing both first-time and repeated reading eye-gaze records), this work formalizes the problem as a binary classification task, combining psycholinguistic feature engineering with modern multimodal neural networks.

Core Idea: The memory effect in eye movements is utilized as a predictable signal. Machine learning models are trained to decode whether a reader has previously read a text from their eye-gaze trajectories, utilizing synthetic first-time reading data generated by a cognitive model as a reference signal to enhance predictions.

Method

Overall Architecture

The task is defined via two variants: (1) Single-trial task—given a single eye-tracking record of a participant reading a text, determine if it is a first-time or repeated reading; (2) Pairwise-trial task—given two eye-tracking records of the same participant reading the same text (with unknown order), determine which is the first-time and which is the repeated reading. The model takes text and eye-movement features as input and outputs classification probabilities for the reading type.
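
To make the difference between the two task variants concrete, here is a minimal sketch that uses reading speed as the only signal. The summary lists a reading speed baseline but does not say how it is computed, so the `Trial` fields, the words-per-minute measure, and the `wpm_threshold` parameter are all illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One eye-tracking record of a participant reading a text (hypothetical fields)."""
    n_words: int
    total_reading_time_ms: float


def reading_speed(trial: Trial) -> float:
    """Reading speed in words per minute."""
    minutes = trial.total_reading_time_ms / 60000.0
    return trial.n_words / minutes


def single_trial_guess(trial: Trial, wpm_threshold: float) -> str:
    """Single-trial variant: only one record is available, so the speed must be
    compared against a global threshold (assumed to be tuned on training data)."""
    return "repeated" if reading_speed(trial) > wpm_threshold else "first"


def pairwise_guess(trial_a: Trial, trial_b: Trial) -> int:
    """Pairwise variant: return the index (0 or 1) of the record guessed to be
    the repeated reading, i.e. the faster of the two."""
    return 0 if reading_speed(trial_a) > reading_speed(trial_b) else 1
```

In the pairwise setting both records come from the same participant and text, so individual differences in baseline reading speed cancel out. This is one plausible reason why pairwise accuracy is much higher than single-trial accuracy.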

Key Designs

  1. Multi-level feature representation (Feature-Based Approach):

    • Function: Extracts psycholinguistically driven features from eye-movement trajectories for XGBoost classification.
    • Mechanism: Designs a 35-dimensional global feature vector containing three categories of features: (a) 8 standard eye-tracking metrics (total fixation duration, first fixation duration, gaze duration, fixation count, skip rate, regression rate, etc.); (b) 20 lexical attribute coefficients—fitting a participant's speed-normalized eye-movement metrics to word frequency, surprisal, and word length via linear models to capture the reduced sensitivity to linguistic attributes during repeated reading; (c) 7 saccadic network features—constructing the eye-gaze trajectory as a directed graph and extracting graph-theoretical features such as connectivity, centrality, and clustering.
    • Design Motivation: Features are designed directly based on known first-time/repeated reading differences in the psycholinguistic literature to ensure interpretability and a solid theoretical foundation.
  2. RoBERTEye multimodal neural model:

    • Function: Synthesizes textual semantic information and eye-movement features for end-to-end prediction.
    • Mechanism: Extends RoBERTa. Word-level or fixation-level eye-movement feature vectors are projected into the language model's embedding space and concatenated with the word embedding sequence before being fed into the Transformer. Two variants are proposed: RoBERTEye-Words (using 13-dimensional word-level features) and RoBERTEye-Fixations (using a concatenation of 6-dimensional fixation-level features and word-level features). Special tokens distinguish text embeddings from eye-movement embeddings.
    • Design Motivation: Leverages the text comprehension capabilities of pretrained language models, enabling the system to discover interaction patterns between textual content and eye-movement behaviors.
  3. E-Z Reader synthetic scanpath augmentation:

    • Function: Generates synthetic "typical first-time reading" eye-gaze trajectories as an auxiliary reference signal.
    • Mechanism: Employs the E-Z Reader cognitive model to generate 1,000 synthetic scanpaths for each text and averages them to serve as a first-time reading reference. The differences between human features and synthetic features are used as auxiliary inputs: the global and word-level representations concatenate the human features with the differences, while the fixation-level representations use sequence-wise concatenation with a third special token to separate the two. Validation shows that, in terms of fixation count and skip rate, the E-Z Reader output is significantly closer to human first-time reading than to repeated reading (\(p<0.001\)).
    • Design Motivation: Since existing cognitive models can only simulate first-time reading, they serve as an ideal baseline—the more human eye movements deviate from the synthetic first-time reading baseline, the more likely the reading is repeated.
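
The reference-difference idea in item 3 can be sketched for the global representation as follows. The three features computed here (fixation count, skip rate, regression rate) are a small illustrative subset, and the function names and input format (a list of fixated word indices) are assumptions. The synthetic scanpaths would come from E-Z Reader, which is not called here.

```python
from statistics import mean


def global_features(fixated_words, n_words):
    """Three illustrative global measures for one scanpath.

    fixated_words: 0-based word indices in the order they were fixated.
    n_words: number of words in the text.
    """
    fixation_count = len(fixated_words)
    # Words that never received a fixation count as skipped.
    skip_rate = (n_words - len(set(fixated_words))) / n_words
    # A regression is a saccade that lands on an earlier word than the previous fixation.
    regressions = sum(1 for prev, cur in zip(fixated_words, fixated_words[1:]) if cur < prev)
    regression_rate = regressions / max(fixation_count - 1, 1)
    return [fixation_count, skip_rate, regression_rate]


def with_reference_difference(human, synthetic_runs):
    """Concatenate human features with their difference from the averaged
    synthetic (first-time reading) reference."""
    reference = [mean(column) for column in zip(*synthetic_runs)]
    difference = [h - r for h, r in zip(human, reference)]
    return human + difference


# Usage with toy scanpaths over a 6-word text (stand-ins for E-Z Reader output).
n_words = 6
human = global_features([0, 2, 5], n_words)
synthetic = [
    global_features([0, 1, 2, 3, 4, 5], n_words),
    global_features([0, 1, 2, 1, 3, 4, 5], n_words),
]
features = with_reference_difference(human, synthetic)
```

A large negative difference in fixation count together with a positive difference in skip rate, as in this toy case, is the kind of deviation from the first-time reference that would point toward repeated reading.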

Loss & Training

10-fold cross-validation is used, with data splits balanced across three evaluation regimes (unseen participants, unseen texts, and unseen both) as well as continuous vs. non-continuous repeated readings. Neural models are trained using PyTorch Lightning on an L40S-48GB GPU, while standard hyperparameter search is conducted for XGBoost.
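
The three evaluation regimes can be illustrated with a small sketch. The summary names the regimes but does not describe how the folds are built, so the scheme below (holding out a set of participants and a set of texts per fold and bucketing trials by which of the two is held out) is an assumption, and all names are hypothetical.

```python
def assign_regime(participant, text, held_out_participants, held_out_texts):
    """Decide which bucket a (participant, text) trial falls into for one fold."""
    new_participant = participant in held_out_participants
    new_text = text in held_out_texts
    if new_participant and new_text:
        return "unseen_both"
    if new_participant:
        return "unseen_participant"
    if new_text:
        return "unseen_text"
    return "train"


def split_trials(trials, held_out_participants, held_out_texts):
    """Group trials into the training set and the three test regimes."""
    buckets = {"train": [], "unseen_participant": [], "unseen_text": [], "unseen_both": []}
    for participant, text in trials:
        regime = assign_regime(participant, text, held_out_participants, held_out_texts)
        buckets[regime].append((participant, text))
    return buckets


# Usage: two participants, two texts, one of each held out.
trials = [("p1", "t1"), ("p1", "t2"), ("p2", "t1"), ("p2", "t2")]
buckets = split_trials(trials, held_out_participants={"p2"}, held_out_texts={"t2"})
```

Under this scheme only trials whose participant and text are both in the training pool are used for fitting, so no test trial shares its held-out participant or text with the training data.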

Key Experimental Results

Main Results

| Task | Model | Unseen Text | Unseen Participant | Unseen Both | All |
|---|---|---|---|---|---|
| Single-trial | Reading Speed Baseline | 66.9 | 67.1 | 66.8 | 66.6 |
| Single-trial | XGBoost | 69.5 | 70.7 | 68.7 | 69.6 |
| Single-trial | XGBoost+E-Z | 70.1 | 71.2 | 69.3 | 70.2 |
| Single-trial | RoBERTEye-Words+E-Z | 70.3 | 69.7 | 69.7 | 69.9 |
| Pairwise-trial | Reading Speed Baseline | 88.0 | 88.1 | 87.2 | 87.7 |
| Pairwise-trial | XGBoost | 91.5 | 92.2 | 90.6 | 91.4 |

Ablation Study

| Configuration | Single-trial (All) | Pairwise-trial (All) |
|---|---|---|
| XGBoost (without E-Z) | 69.6 | 91.4 |
| XGBoost + E-Z Reader | 70.2 | - |
| Reading Speed Baseline | 66.6 | 87.7 |
| Random Baseline | 50.0 | 50.0 |

Key Findings

  • Pairwise-trial accuracy reaches 91.4% in the All regime: When both reading records are provided, XGBoost reliably distinguishes first-time from repeated reading, significantly outperforming the reading speed baseline (87.7%).
  • The single-trial task remains challenging: Classification from a single trial achieves ~70% accuracy. This is 20 percentage points above random chance (50%) but only about 3 points above the reading speed baseline (66.6%), leaving clear room for improvement.
  • Feature-based models outperform neural models: XGBoost significantly outperforms RoBERTEye in pairwise tasks. This is likely because hand-crafted psycholinguistic features capture the relevant variance more directly.
  • E-Z Reader enhancement is limited but positive: Synthetic references provide significant improvements in some scenarios (\(p<0.05\)), though results are inconsistent.
  • Performance attenuates as the experiment progresses: In first-time reading, readers read faster over time (due to practice effects), making it harder for the model to distinguish first-time from repeated reading.

Highlights & Insights

  • Highly innovative task definition: This study formalizes "reading history decoding" as a predictive task for the first time, carving out a novel direction in cognitive NLP. Such tasks hold direct application prospects in educational technology and personalized content recommendation.
  • Cognitive models as synthetic data sources: Employing first-time reading baselines simulated by E-Z Reader is an ingenious strategy—bridging theoretical models from cognitive science with machine learning, which can be extended to other cognitive tasks.
  • Model analysis as a scientific instrument: Investigating predictive model behaviors (e.g., changes in accuracy relative to experimental positions) to study cognitive processes demonstrates the scientific value of predictive modeling.

Limitations & Future Work

  • The data is limited to adult native (L1) English readers recorded with a laboratory-grade EyeLink 1000 Plus eye tracker (1000 Hz sampling rate), so generalizability to other populations and devices remains to be validated.
  • Repeated reading intervals span at most 10 articles, with longer time lags yet to be evaluated.
  • The study only considers two trials of reading, whereas multiple repetitions occur in reality.
  • The text domain is limited to news articles, leaving other domains (e.g., academic, literary) unaddressed.
  • Future work needs to explore feasibility on low-resolution consumer devices (e.g., laptop/smartphone front cameras).

Comparison with Related Work

  • vs. Traditional Psycholinguistic Research: Conventional studies restrict themselves to group-level descriptive statistical analysis. This work presents the first individual-level predictions, turning psycholinguistic findings into quantifiable signals for machine learning.
  • vs. Reading Comprehension Prediction: Previous work used eye movements to predict reading comprehension levels, reading objectives, etc. This work focuses on the entirely novel dimension of "prior exposure" (whether a text has been read).

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Decodes reading history from eye movements for the first time with a brand-new task definition.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Includes multi-model comparisons, ablations, and fine-grained analyses, though constrained by dataset size.
  • Writing Quality: ⭐⭐⭐⭐⭐ Rigorous formalization, careful experimental design, and in-depth analysis.
  • Value: ⭐⭐⭐⭐ Significant academic contribution, though further progress requires support for low-cost, consumer-grade hardware.