Language Models Grow Less Humanlike beyond Phase Transition

Conference: ACL 2025
arXiv: 2502.18802
Code: None
Area: LLM NLP / Cognitive Science
Keywords: Pre-training phase transition, psychometric predictive power, attention heads, human reading behavior, LM cognitive alignment

TL;DR

This paper finds that psychometric predictive power (PPP), the degree to which a language model's surprisal predicts human reading behavior, undergoes an inflection point during pre-training: it first increases and then decreases. Through correlational and causal experiments, the paper argues that this inflection point is caused by a phase transition (the rapid emergence of specialized attention heads) during pre-training. The phase transition does not directly produce harmful attention patterns; rather, it alters the model's subsequent learning dynamics, so that continued training drifts steadily away from human patterns.

Background & Motivation

Background: Psycholinguistic research demonstrates that the surprisal of language models (the negative log probability of each word given its preceding context) can predict human reading behavior metrics such as fixation duration and self-paced reading time. This capability is referred to as psychometric predictive power (PPP). PPP typically improves with the number of training steps in the early stages of pre-training, as models gradually learn to better encode linguistic statistical regularities.
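As a small illustration of the quantity involved, surprisal for a word \(w_t\) is \(-\log_2 p(w_t \mid w_{<t})\) when measured in bits. The sketch below computes it from in-context probabilities; the function name and the probabilities are made up for illustration and do not come from the paper.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal of a word in bits, given its in-context probability."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Hypothetical probabilities p(w_t | w_<t) a model might assign to each word.
word_probs = {"the": 0.5, "cat": 0.125, "sat": 0.25, "quietly": 0.015625}

# Less predictable words receive higher surprisal.
surprisals = {word: surprisal_bits(p) for word, p in word_probs.items()}
```

Under the surprisal account, the less predictable word ("quietly" here) is the one expected to take longer to read.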

Limitations of Prior Work: The improvement of PPP has an inflection point: after reaching a peak, PPP either plateaus or begins to decline. This phenomenon has been independently observed by multiple studies, but no unified explanation has been established. Existing accounts include the word frequency effect (over-focusing on low-frequency words), recency bias in attention, and context window size, but none of them adequately explains why the inflection point exists or how it relates to pre-training dynamics.

Key Challenge: Intuitively, better language models (lower perplexity) should better predict human behavior. In practice, however, more training can actually lead to a decline in PPP. Why do models become "smarter" but "less humanlike"?

Goal: To identify the root cause of the PPP inflection point and explain its relationship with pre-training dynamics.

Key Insight: The authors hypothesize that the PPP inflection point is related to phase transitions in pre-training. A phase transition refers to the sudden emergence of certain capabilities during training, manifested as the rapid appearance of specialized attention heads (such as induction heads). These heads specialize in capabilities like in-context learning, and their emergence may alter the way the model encodes linguistic statistical regularities.
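To make the notion of an induction head concrete, here is a minimal sketch of a prefix-matching style score on a sequence made of one block repeated twice. It assumes an attention matrix is already available as a list of rows; the function names and the toy attention patterns are hypothetical, and the paper's exact scoring procedure may differ.

```python
def uniform_causal_attention(seq_len):
    """Each position attends equally to itself and all earlier positions."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(seq_len)]
            for i in range(seq_len)]


def idealized_induction_attention(block_len, peak=0.9):
    """Toy head: in the second repeat, put `peak` mass on the token that
    followed the previous occurrence of the current token."""
    seq_len = 2 * block_len
    attn = uniform_causal_attention(seq_len)
    for i in range(block_len, seq_len):
        target = i - block_len + 1
        rest = (1.0 - peak) / i  # spread over the other i visible positions
        attn[i] = [peak if j == target else (rest if j <= i else 0.0)
                   for j in range(seq_len)]
    return attn


def prefix_matching_score(attn, block_len):
    """Mean attention from each second-repeat position i to i - block_len + 1."""
    seq_len = len(attn)
    if seq_len != 2 * block_len:
        raise ValueError("expected a sequence made of one block repeated twice")
    weights = [attn[i][i - block_len + 1] for i in range(block_len, seq_len)]
    return sum(weights) / len(weights)


induction_score = prefix_matching_score(idealized_induction_attention(4), 4)
uniform_score = prefix_matching_score(uniform_causal_attention(8), 4)
```

A head whose score is high on such repeated sequences behaves like an induction head, while a head that attends uniformly scores low.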

Core Idea: The pre-training phase transition (the rapid emergence of specialized attention heads) is the root cause of the PPP inflection point. Instead of directly harming PPP by generating "bad" attention patterns, the phase transition alters the model's subsequent learning dynamics, leading to a continuous deviation from human reading behavior patterns during continued training.

Method

Overall Architecture

The study is conducted in three stages: (1) Correlation analysis—measuring PPP and phase transition metrics across multiple pre-training checkpoints to verify their temporal correspondence; (2) Causal experiments—verifying that the phase transition indeed causes the decline in PPP through ablation and replacement of specific attention heads; (3) Mechanism analysis—distinguishing between the direct effects of the phase transition and its indirect effects on subsequent learning dynamics.

Key Designs

  1. PPP Measurement Method:

    • Function: To quantify the predictive power of model surprisal on human reading behavior.
    • Mechanism: Word-by-word surprisal from the model at each checkpoint is added as a predictor to a regression model of human reading times on several datasets (eye-tracking, self-paced reading). The \(\Delta\)LogLik (the log-likelihood improvement after adding surprisal) is used as the PPP metric. PPP curves are computed across multiple intermediate checkpoints of GPT-2-level models.
    • Design Motivation: PPP is a standard bridging metric connecting language models and cognitive science, and its inflection point behavior has been repeatedly observed but remains unexplained.
  2. Phase Transition Detection Method:

    • Function: To locate the temporal point of the emergence of specialized attention heads during pre-training.
    • Mechanism: The quantity and strength of induction heads are tracked across checkpoints. Induction heads are a specialized attention head pattern responsible for in-context copying (predicting "B" when seeing the sequence "A B ... A"). They are detected by testing attention head matching scores on synthetic repetitive sequences. A sharp increase in the number of induction heads within a short span of training steps marks the phase transition point.
    • Design Motivation: Olsson et al. (2022) observed that the sudden emergence of induction heads in Transformer training constitutes a phase transition. The present paper hypothesizes that this phase transition simultaneously triggers the PPP inflection point.
  3. Causal Intervention Experiment:

    • Function: To progress from correlation to causality.
    • Mechanism: Designing three sets of experiments: (a) ablating specialized attention heads at post-phase-transition checkpoints to observe if PPP recovers; (b) replacing post-phase-transition heads with corresponding heads from pre-phase-transition checkpoints to test if the phase transition's impact can be "reversed"; (c) comparing the PPP trajectory of the ablated model during continued training with that of normal training to distinguish direct effects from indirect effects (changes to subsequent learning dynamics).
    • Design Motivation: Correlation analysis can only show that the phase transition and the PPP inflection point occur simultaneously; causal experiments are required to confirm that the former causes the latter.
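The \(\Delta\)LogLik metric from the PPP measurement design can be sketched as follows. This is a simplified illustration under stated assumptions: plain ordinary least squares fitted in-sample, a Gaussian likelihood, per-word normalization, and invented toy data. The paper's actual regression setup (predictors, model family, evaluation split) is not specified in this summary, and all names below are hypothetical.

```python
import math


def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan).
    Assumes X has full column rank."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * t for r, t in zip(X, y))] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * p for x, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def gaussian_loglik(X, y, beta):
    """Log-likelihood of a linear model at the maximum-likelihood variance."""
    n = len(y)
    rss = sum((t - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, t in zip(X, y))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def delta_loglik(baseline, surprisal, rt):
    """Per-word log-likelihood gain from adding surprisal to the baseline."""
    X0 = [[1.0] + list(r) for r in baseline]       # intercept + baseline
    X1 = [r + [s] for r, s in zip(X0, surprisal)]  # ... + surprisal
    ll0 = gaussian_loglik(X0, rt, fit_ols(X0, rt))
    ll1 = gaussian_loglik(X1, rt, fit_ols(X1, rt))
    return (ll1 - ll0) / len(rt)


# Invented toy data: (word length, log frequency), surprisal, reading time.
baseline = [(3, 9.1), (5, 6.3), (2, 10.2), (7, 4.8),
            (4, 7.5), (6, 5.9), (3, 8.8), (8, 4.1)]
surprisal = [2.0, 6.5, 1.2, 9.8, 4.1, 7.7, 3.3, 5.0]
noise = [3, -2, 1, -4, 2, -1, 4, -3]
rt = [200 + 10 * b[0] + 15 * s + e
      for b, s, e in zip(baseline, surprisal, noise)]

ppp = delta_loglik(baseline, surprisal, rt)
```

Repeating this computation with surprisal taken from each checkpoint would give one PPP value per checkpoint, which is the curve whose peak the paper studies.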

Loss & Training

This is an analytical work and does not involve new training strategies. Multiple intermediate checkpoints of pre-trained GPT-2-level models are used for analysis.

Key Experimental Results

Main Results (Correlation Analysis)

The temporal alignment between the PPP inflection point and the phase transition point:

| Dataset | PPP Inflection Point (Steps) | Phase Transition Start (Steps) | Phase Transition End (Steps) | Temporal Correspondence |
|---|---|---|---|---|
| Dundee (eye-tracking) | ~3k-5k | ~3k | ~5k | Inflection point during phase transition |
| Natural Stories | ~3k-5k | ~3k | ~5k | Inflection point during phase transition |
| Brown Corpus | ~3k-5k | ~3k | ~5k | Inflection point during phase transition |
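One simple way to turn "a sharp increase in the number of induction heads" into a start and end step is sketched below. The thresholding rule, the function name, and the head counts are all assumptions made for illustration; the counts are invented and are not measurements from the paper.

```python
def phase_transition_window(steps, head_counts, jump_fraction=0.2):
    """Return (start_step, end_step) spanning every checkpoint interval whose
    increase in head count is at least `jump_fraction` of the total increase.
    Returns None when no interval qualifies."""
    total = head_counts[-1] - head_counts[0]
    if total <= 0:
        return None
    sharp = [i for i in range(1, len(steps))
             if head_counts[i] - head_counts[i - 1] >= jump_fraction * total]
    if not sharp:
        return None
    return steps[sharp[0] - 1], steps[sharp[-1]]


# Invented checkpoint steps and induction-head counts (illustrative only).
steps = [1000, 2000, 3000, 4000, 5000, 6000, 8000]
head_counts = [0, 0, 1, 6, 11, 12, 12]

window = phase_transition_window(steps, head_counts)
```

The resulting window can then be compared with the step at which the PPP curve peaks to check temporal correspondence.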

Ablation Study (Causal Verification)

| Intervention Method | PPP Change | Description |
|---|---|---|
| No intervention (baseline) | Continuous decline after phase transition | PPP steadily declines after inflection point |
| Ablation of induction heads | Slight recovery | Direct effect is relatively small |
| Replacement with pre-phase-transition heads | Slight recovery | Confirms phase transition is a necessary condition |
| Continued training after ablation | Decline in PPP slows down | Indirect effects are the primary cause |
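The ablation and replacement interventions can be pictured with a toy model of multi-head attention in which the layer output is the sum of per-head contributions. The sketch below assumes zero-ablation (dropping a head's contribution entirely) and a direct swap for replacement; whether the paper uses zero-ablation, mean-ablation, or another variant is not stated in this summary, and the names and vectors are hypothetical.

```python
def combine_heads(head_outputs, ablate=(), replace=None):
    """Sum per-head output vectors.

    ablate:  indices of heads whose contribution is dropped (zero-ablation).
    replace: mapping from head index to a substitute vector, e.g. the same
             head's output computed from an earlier checkpoint.
    """
    replace = replace or {}
    total = [0.0] * len(head_outputs[0])
    for head, out in enumerate(head_outputs):
        if head in ablate:
            continue  # ablated heads contribute nothing
        vec = replace.get(head, out)
        total = [t + v for t, v in zip(total, vec)]
    return total


# Toy per-head contributions at one token position (head 2 plays the
# role of an induction head that emerged after the phase transition).
head_outputs = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]

baseline_out = combine_heads(head_outputs)
ablated_out = combine_heads(head_outputs, ablate={2})
replaced_out = combine_heads(head_outputs, replace={2: [0.0, 1.0]})
```

In an actual experiment, surprisal would be recomputed from the intervened model and PPP measured again, so that the change in PPP can be attributed to the targeted heads.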

Key Findings

  • Phase transition and PPP inflection point are highly aligned temporally: This is consistent across multiple datasets and reading behavior metrics. During the phase transition, the number of induction heads increases sharply, while PPP simultaneously peaks and then begins to decline.
  • The indirect effects of the phase transition are far greater than the direct effects: Ablating attention heads that emerge after the phase transition yields only a slight recovery in PPP. However, when training is branched before the phase transition so that the transition is prevented from occurring, the subsequent PPP trajectory differs significantly. This indicates that the phase transition does not directly impair PPP by producing "bad" attention patterns; instead, it shifts the model's overall learning dynamics. After the phase transition, the model enters a new optimization trajectory in which continued training steadily deviates from human reading patterns.
  • Mechanistic explanation of "stronger models are less humanlike": The phase transition equips the model with stronger in-context learning capabilities (lowering perplexity), but the process of learning this capability alters the internal representations of the model, inducing a systematic deviation in encoding linguistic information compared to how the human brain processes language.

Highlights & Insights

  • Elegant Causal Inference Chain: Progressing from observation (PPP inflection point) \(\rightarrow\) hypothesis (caused by phase transition) \(\rightarrow\) correlational verification (temporal alignment) \(\rightarrow\) causal verification (ablation experiments) \(\rightarrow\) distinction of mechanisms (direct vs. indirect effects). The logical chain is complete and convincing.
  • Insight into "Phase Transition Alters Learning Dynamics": Rather than blaming specific structures produced by the phase transition, the study highlights how the phase transition alters the subsequent learning path of the model. This finding has broader implications for understanding LLM training dynamics—the emergence of capabilities may be accompanied by systematic changes in other dimensions.
  • Interdisciplinary Value of Cognitive Science & AI: To align LLMs with human cognition, understanding when and why they deviate from humans during training is a crucial first step. This work suggests that scaling up training may naturally lead to a divergence from human cognition.

Limitations & Future Work

  • Limited Model Scale: Experiments are validated only on GPT-2-level models (~100M parameters). Larger models (e.g., GPT-3/4 level) might exhibit different phase transition dynamics: their phase transitions may occur at different training stages, or there may be multiple phase transition points.
  • Limitations of Causal Experiments: Ablating attention heads is a coarse-grained intervention. Post-phase-transition changes in internal representations may be distributed, which cannot be fully captured by ablating specific components.
  • Limitations of the PPP Metric: PPP only captures certain aspects of human reading behavior (e.g., fixation duration) and does not represent the entirety of human language comprehension. The model's behavior on other cognitive metrics (e.g., N400 ERP) may differ.
  • No Mitigation Strategies Provided: The paper explains why PPP declines but does not propose methods to maintain PPP. Future directions could explore: using human reading data as an auxiliary training objective after phase transition, adjusting learning rate strategies to mitigate the impact of the phase transition, or designing curricula that preserve cognitive alignment.

Related Work & Insights

  • vs Olsson et al. (2022): Olsson et al. discovered the phase transition phenomenon of induction heads but did not connect it to cognitive science metrics. This paper demonstrates a "side effect" of this phase transition: while obtaining in-context learning capabilities, the model deviates from human cognition.
  • vs Oh & Schuler (2023): Oh & Schuler observed the PPP inflection point and proposed the word frequency effect hypothesis. The current paper offers a deeper, unified explanation: the phase transition alters learning dynamics, and the word frequency effect may simply be one of its manifestations.
  • vs Scaling Laws: Traditional scaling laws suggest that larger models = better language modeling = better everything. This paper highlights a counterexample dimension: in terms of "processing language like humans," stronger models can perform worse.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ First to connect pre-training phase transition with cognitive science metrics, and the discovery of "indirect effects" is highly profound.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Rigorous correlation + causal experiment design with validation across multiple datasets, though limited by model scale.
  • Writing Quality: ⭐⭐⭐⭐ Clear chain of reasoning. Some details could not be checked for this summary because the paper's HTML version was unavailable.
  • Value: ⭐⭐⭐⭐⭐ Highly meaningful for understanding LLM training dynamics and cognitive alignment, representing a major interdisciplinary contribution.