Geometry of Decision Making in Language Models

Conference: NeurIPS 2025
arXiv: 2511.20315
Code: None
Area: Model Compression
Keywords: Intrinsic Dimension, Hidden Representation Geometry, Decision Dynamics, Multiple-Choice QA, Transformer

TL;DR

By measuring the intrinsic dimension (ID) of hidden representations across the layers of 28 open-source Transformer models, this paper reveals a consistent "low–high–low" pattern: early layers operate on low-dimensional manifolds, middle layers expand the representational space, and later layers re-compress into low-dimensional representations aligned with decision-making.

Background & Motivation

Core Problem

Large language models (LLMs) exhibit strong generalization across diverse tasks, yet the internal decision-making process—how a model progresses from input to prediction—remains opaque. Prior work has studied internal mechanisms through the lens of attention analysis and probing classifiers, but the geometric structure of hidden representations has received comparatively little attention.

Intrinsic Dimension (ID)

Intrinsic dimension is a statistic that measures the true dimensionality of the manifold on which a set of high-dimensional data points lies. Intuitively, even when a hidden layer has \(d = 4096\) dimensions, the representation vectors may effectively concentrate on a submanifold of far lower dimensionality. ID can reveal the degree to which each layer compresses or expands information.
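
As a concrete (hypothetical) illustration of this point, the sketch below embeds a 2-D latent space into 4,096 ambient dimensions; a PCA-style spectrum check recovers the fact that only two directions carry variance:

```python
import numpy as np

# Hypothetical illustration (not from the paper): 2,000 points that live
# on a 2-D plane, linearly embedded into a 4096-D ambient space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 2))       # true 2-D coordinates
basis = rng.normal(size=(2, 4096))        # random linear embedding
X = latent @ basis                        # shape (2000, 4096)

# Singular values of the centered data expose the effective dimensionality:
# only two directions carry any variance, despite d = 4096.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
explained = s**2 / np.sum(s**2)
print(int(np.sum(explained > 1e-10)))     # -> 2
```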

Why the MCQA Setting

Multiple-choice question answering (MCQA) provides a well-defined decision structure: the model must select the correct answer from a fixed set of options. This enables researchers to:

  • Quantify each layer's contribution to the final decision via layer-wise accuracy
  • Correlate ID variation with decision quality
  • Control experimental variables and avoid the uncertainty inherent in open-ended generation tasks

Method

Overall Architecture

The experimental pipeline proceeds as follows:

  1. Select 28 open-source Transformer models spanning different architectures and parameter scales
  2. Feed test data through each model on MCQA tasks
  3. Extract hidden representations at each layer
  4. Compute the intrinsic dimension at each layer using multiple ID estimators
  5. Simultaneously compute per-layer MCQA accuracy by performing classification directly on each layer's output
  6. Analyze the relationship between ID and layer-wise performance
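
The paper's code is unreleased, so the following is only a plausible sketch of step 3 using the Hugging Face transformers API; the model name and prompt are illustrative, not the paper's exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choices: any causal LM on the Hugging Face hub that exposes
# hidden states works the same way.
name = "EleutherAI/pythia-1b"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)
model.eval()

prompt = "Question: Which planet is the largest?\nAnswer: Jupiter"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.hidden_states holds the embedding output plus one tensor per block,
# each of shape (batch, seq_len, hidden_dim); the last-token vector is a
# common choice for the example's per-layer representation.
reps = [h[0, -1, :].numpy() for h in out.hidden_states]
print(len(reps), reps[0].shape)  # num_layers + 1, (hidden_dim,)
```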

Key Designs

ID Estimation Methods

Multiple ID estimators are employed to ensure the robustness of the conclusions:

| Estimator | Type | Principle |
| --- | --- | --- |
| TwoNN | Local | Based on nearest-neighbor distance ratios |
| MLE (Levina–Bickel) | Local | Maximum likelihood estimation |
| PCA (explained variance) | Global | Proportion of variance explained |
| Other topological methods | Hybrid | Based on persistent homology, etc. |

Using multiple estimators mitigates the bias of any single method and strengthens the credibility of the findings.
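
For concreteness, here is a minimal TwoNN sketch in the standard maximum-likelihood form of Facco et al. (2017); the paper's exact implementation is not specified here, so treat the details as assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def twonn_id(X: np.ndarray, discard: float = 0.1) -> float:
    """TwoNN intrinsic-dimension estimate (Facco et al., 2017).

    Under a local-uniformity assumption, the ratio mu = r2/r1 of each
    point's two nearest-neighbor distances is Pareto(d)-distributed, so
    the MLE of the intrinsic dimension d is N / sum(log mu).
    """
    dists, _ = cKDTree(X).query(X, k=3)      # self, 1st, 2nd neighbor
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2[r1 > 0] / r1[r1 > 0]             # drop exact duplicates
    mu = np.sort(mu)[: int(len(mu) * (1.0 - discard))]  # trim noisy tail
    return len(mu) / np.sum(np.log(mu))
```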

Layer-wise Performance Quantification

For each layer \(l\), the hidden representation \(h^{(l)}\) is used directly for prediction:

  • Representational similarity between answer options is computed, or a linear probe is applied
  • The resulting per-layer MCQA accuracy \(\text{Acc}^{(l)}\) is recorded
  • The correspondence between \(\text{ID}^{(l)}\) and \(\text{Acc}^{(l)}\) is established
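
The description above leaves the probe's details open; the sketch below uses a cross-validated logistic-regression probe as one standard realization of \(\text{Acc}^{(l)}\) (the function name and interface are ours, not the paper's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def layerwise_probe_accuracy(layer_reps, labels, cv=5):
    """Estimate Acc^(l) with a cross-validated linear probe per layer.

    layer_reps: one array of shape (n_examples, hidden_dim) per layer;
    labels: the gold answer-option index per example (e.g. 0..3 for
    4-option MCQA). This realizes the linear-probe variant described
    above; it is not necessarily the paper's exact protocol.
    """
    accs = []
    for H in layer_reps:
        probe = LogisticRegression(max_iter=1000)
        accs.append(cross_val_score(probe, H, labels, cv=cv).mean())
    return np.array(accs)
```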

Loss & Training

This paper does not involve training new models. All analyses are conducted on existing pretrained models, constituting an analytical study.

Key Experimental Results

Main Results: ID Variation Pattern

Across all 28 models, the following three-stage pattern is consistently observed:

| Layer Range | ID Behavior | Interpretation |
| --- | --- | --- |
| Early layers (0–20% depth) | Low ID | Input embeddings lie on a low-dimensional manifold; initial encoding is compact |
| Middle layers (20–70% depth) | ID rises to a peak | Spatial expansion; the model explores rich representations |
| Late layers (70–100% depth) | ID decreases again | Compression into a low-dimensional structure aligned with decision-making |

Typical ID values by model scale:

| Model Category | Representative Models | Early ID | Peak ID | Final ID |
| --- | --- | --- | --- | --- |
| Small (~1B) | Pythia-1B, GPT-Neo-1.3B | ~10–20 | ~40–60 | ~15–25 |
| Medium (~7B) | LLaMA-2-7B, Mistral-7B | ~15–30 | ~80–120 | ~20–40 |
| Large (13B+) | LLaMA-2-13B, Falcon-40B | ~20–40 | ~100–150 | ~30–50 |

Ablation Study

Relationship Between ID and Layer-wise Performance

| Layer Range | Mean MCQA Accuracy | ID Trend | Relationship |
| --- | --- | --- | --- |
| Early layers | Near random (~25%) | Low ID | Information not yet integrated |
| Middle-to-late layers | Rapid increase | ID begins to decline | Decision formation begins |
| Final layers | Highest | Low ID | Decision compressed into a low-dimensional representation |

Key finding: ID decline and accuracy improvement are highly correlated, indicating that the model compresses representations onto a low-dimensional manifold precisely when arriving at a decision.
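
The exact statistic behind "highly correlated" is not quoted here; a Spearman rank correlation over the late layers, shown below with invented values purely for illustration, is one way such a claim could be checked (the same call underlies the estimator-agreement table in the next subsection):

```python
import numpy as np
from scipy.stats import spearmanr

# Invented per-layer values, purely for illustration (not the paper's data):
# ID estimates and probe accuracies for the final ~30% of layers.
ids  = np.array([110.0, 95.0, 70.0, 48.0, 35.0, 28.0, 24.0, 22.0, 20.0])
accs = np.array([0.41, 0.47, 0.55, 0.62, 0.68, 0.71, 0.73, 0.74, 0.75])

rho, p = spearmanr(ids, accs)
print(f"Spearman rho = {rho:.2f}")  # strongly negative: ID falls as accuracy rises
```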

Consistency Across Estimators

| Estimator Pair | Spearman Rank Correlation |
| --- | --- |
| TwoNN vs. MLE | > 0.95 |
| TwoNN vs. PCA | > 0.90 |
| MLE vs. PCA | > 0.88 |

The high agreement across estimators validates the robustness of the conclusions.

Key Findings

  1. Universal "low–high–low" ID pattern: Consistently observed across all 28 models and multiple ID estimators; this constitutes an architecture- and scale-agnostic property.

  2. ID compression co-occurs with decision formation: The sharp ID decline in the final layers coincides with a rapid rise in MCQA accuracy, suggesting that late layers project representations onto a structured low-dimensional manifold aligned with task-relevant decisions.

  3. Effect of model scale: Larger models tend to exhibit higher peak IDs, indicating richer representational spaces in the middle layers, while still ultimately compressing to a relatively low-dimensional decision manifold.

Highlights & Insights

  • Novelty of the geometric perspective: Unlike probing or attention analysis, ID analysis provides a more fundamental, task-agnostic geometric measure
  • Large-scale validation across 28 models: Coverage spans multiple architectures and scales including Pythia, LLaMA, Mistral, Falcon, and GPT-Neo
  • Support for "representation learning as dimensionality selection": The results suggest that LLM training can be understood as identifying the correct low-dimensional manifold within a high-dimensional space
  • Implications for layer pruning and early exit: If late layers are primarily performing dimensionality compression, more efficient alternatives for achieving this step may exist

Limitations & Future Work

  • Validation is limited to the MCQA setting; whether the same ID patterns hold for open-ended generation tasks remains unexamined
  • ID estimation under limited sample sizes introduces statistical noise, particularly for extremely high-dimensional representations
  • The effect of fine-tuning or RLHF on ID patterns has not been explored
  • No quantitative comparison with probing accuracy or information bottleneck theory is provided
  • Causal analysis is absent; it remains unclear whether ID variation causes decision formation or is merely a byproduct

Related Work

  • Ansuini et al. (2019): First systematic study of ID variation patterns in deep networks
  • Cai et al. (2023): Analysis of ID in Vision Transformers
  • Information Bottleneck Theory (Shwartz-Ziv & Tishby, 2017): Posits that deep learning proceeds through "fitting" and "compression" phases, consistent with the ID patterns reported in this work
  • Mechanistic Interpretability: Elhage et al., Olsson et al., and others analyze Transformers through the lens of circuits
  • This paper complements the mechanistic understanding of LLM internals from a geometric perspective

Rating

  • Novelty: ⭐⭐⭐⭐ — Large-scale ID analysis in LLMs represents a new direction
  • Technical Depth: ⭐⭐⭐ — Experimental design is solid, though the methodology itself is relatively straightforward
  • Practicality: ⭐⭐⭐ — Analytical in nature, with implications for model compression and interpretability
  • Clarity: ⭐⭐⭐⭐ — Conclusions are intuitive and clearly presented
  • Overall Score: 7.5/10