On the Effect of Uncertainty on Layer-wise Inference Dynamics¶

Conference: ICML 2025 (Actionable Interpretability Workshop)
arXiv: 2507.06722
Code: None
Area: Interpretability / LLM
Keywords: uncertainty, layer-wise dynamics, Tuned Lens, interpretability, hallucination detection

TL;DR¶

Using Tuned Lens, this work analyzes how token probabilities evolve layer by layer in 5 LLMs on 11 datasets. It finds that the layer-wise inference dynamics of certain and uncertain predictions are highly aligned: sudden jumps in confidence occur at similar layers. This suggests that uncertainty does not change the structure of the model's inference dynamics, which challenges the feasibility of detecting uncertainty from simple intermediate-layer features.

Background & Motivation¶

Background: Understanding how LLMs internally represent and process predictions is one of the core questions in current interpretability research. Existing studies suggest that models can encode uncertainty information within their hidden states. This has spawned a series of methods (such as probing classifiers, early exit strategies, etc.) to detect hallucinations and uncertainty based on intermediate-layer features.

Limitations of Prior Work: Although prior work has examined where models encode uncertainty, little is understood about how uncertainty affects the model's processing flow. Existing uncertainty detection methods implicitly assume that certain and uncertain predictions are processed differently across layers (for example, that uncertain predictions show more "hesitation" in intermediate layers). However, this assumption has not been systematically verified.

Key Challenge: If models employ the same inference dynamics (i.e., the same layer-wise processing paradigm) for both certain and uncertain inputs, any simple method trying to distinguish them using intermediate-layer statistics will face fundamental difficulties.

Goal: To systematically verify whether uncertainty affects the layer-wise inference dynamics of LLMs—specifically, whether the layer-wise probability trajectories of certain and uncertain predictions exhibit distinct patterns.

Key Insight: Tuned Lens (an improved version of Logit Lens) projects the hidden state of each layer into the vocabulary space to obtain a probability distribution, which makes it possible to track the layer-wise probability trajectory of the final predicted token. Whether the prediction is correct or incorrect serves as a proxy for certainty versus uncertainty.

Core Idea: By comparing the layer-wise probability trajectories of certain and uncertain predictions, the authors find that they are highly aligned, indicating that uncertainty does not alter the structure of inference dynamics.

Method¶

Overall Architecture¶

A systematic analysis is conducted on 5 LLMs of different architectures and parameter sizes across 11 datasets that span different task types. The pipeline is as follows:
(1) For each (model, dataset) pair, Tuned Lens is used to compute the probability of the final predicted token at each layer.
(2) All samples are split into two groups by whether the prediction is correct (treated as certain) or incorrect (treated as high epistemic uncertainty).
(3) The average probability at each layer is computed for each group, and the trajectory curves are plotted.
(4) The trajectories are compared on shape, the position of sudden confidence jumps, and convergence pattern.
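
Steps (2) and (3) of the pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `mean_trajectories`, the group labels, and the toy numbers are all assumptions made for the example, and each trajectory is assumed to already hold the per-layer probability of the final predicted token.

```python
from statistics import fmean

def mean_trajectories(trajectories, is_correct):
    """Split per-sample layer-wise trajectories by correctness, then average per layer.

    trajectories: list of lists, one probability per layer for each sample
    is_correct:   list of bools, True if the final prediction was correct
    """
    groups = {"certain": [], "uncertain": []}
    for traj, ok in zip(trajectories, is_correct):
        groups["certain" if ok else "uncertain"].append(traj)
    # zip(*trajs) regroups values by layer so each layer is averaged separately.
    return {name: [fmean(layer_vals) for layer_vals in zip(*trajs)]
            for name, trajs in groups.items() if trajs}

# Toy data (made-up numbers): 4 samples, 3 layers each.
trajectories = [
    [0.0, 0.25, 1.0],
    [0.0, 0.75, 0.5],
    [0.0, 0.5, 0.25],
    [0.0, 0.0, 0.25],
]
is_correct = [True, True, False, False]

means = mean_trajectories(trajectories, is_correct)
for name, curve in means.items():
    print(name, curve)
```

The two resulting curves are what step (4) compares; plotting them on shared axes would give one panel per (model, dataset) pair.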

Key Designs¶

Tuned Lens as a Layer-wise Probability Probe:
- Function: Projects the hidden state of each layer into the vocabulary space to obtain that layer's probability estimate for the final predicted token.
- Mechanism: Logit Lens projects intermediate hidden states directly with the unembedding matrix of the final layer. Because the representation distribution of intermediate layers differs substantially from that of the final layer, this introduces considerable noise. Tuned Lens instead trains an independent affine transformation for each layer \(l\), written here in simplified form as \(p_l = \text{softmax}(W_l h_l + b_l)\), where \(h_l\) is the hidden state at layer \(l\) and \(W_l, b_l\) are fitted on a validation set so that \(p_l\) matches the final-layer output distribution. This yields more accurate probability estimates and smoother trajectories for intermediate layers.
- Design Motivation: Tuned Lens reduces the noise introduced by the probing tool itself, so that any observed trajectory differences (or lack thereof) can more reliably be attributed to the model rather than to a measurement artifact.
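
To make the per-layer projection concrete, the following sketch applies the simplified formula \(p_l = \text{softmax}(W_l h_l + b_l)\) and reads off the probability of one token at every layer. It is an illustration only: the helper names (`lens_probs`, `token_trajectory`), the tiny dimensions, and the weights are hypothetical, and it does not use or reproduce the actual Tuned Lens library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lens_probs(h, W, b):
    """Simplified per-layer lens: softmax(W h + b).

    h: hidden state, length d
    W: vocab_size x d matrix (list of rows); b: length vocab_size
    """
    logits = [sum(w * x for w, x in zip(row, h)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)

def token_trajectory(hidden_states, lenses, token_id):
    """Probability the lens assigns to token_id at each layer."""
    return [lens_probs(h, W, b)[token_id]
            for h, (W, b) in zip(hidden_states, lenses)]

# Toy setup (made-up numbers): d = 2, vocabulary of 3 tokens, 3 layers.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
lenses = [(W, b)] * 3  # assumption for brevity; a real lens has its own (W_l, b_l) per layer
hidden_states = [[0.0, 0.0], [0.5, 0.5], [2.0, 2.0]]

trajectory = token_trajectory(hidden_states, lenses, token_id=2)
print([round(p, 3) for p in trajectory])
```

In this toy case the probability of token 2 rises from layer to layer, which is the kind of curve the analysis tracks for the final predicted token.
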
Incorrect Prediction as a Proxy for Epistemic Uncertainty:
- Function: Categorizes samples into "certain predictions" and "uncertain predictions".
- Mechanism: A correct answer suggests that the model is more certain about the relevant knowledge. Conversely, an incorrect answer is taken to reflect high epistemic uncertainty about the prompt. Although this binary split is coarse, it is simple enough to apply uniformly in large-scale systematic experiments.
- Design Motivation: Directly measuring epistemic uncertainty requires multiple sampling iterations or ensemble methods, which are computationally expensive and not applicable to all models. Using correct/incorrect predictions as a proxy is simple, straightforward, and statistically reasonable.
A Systematic Comparison Framework Across Models and Datasets:
- Function: Validates the robustness and generalizability of the findings.
- Mechanism: Five models of different scales and architectures and eleven datasets covering QA, commonsense reasoning, and knowledge queries are selected, forming 55 independent (model, dataset) pairs. Each pair is analyzed independently, and then the consistency of the patterns is compared.
- Design Motivation: Findings that hold only for specific models or datasets have limited value. Covering 55 combinations supports the generalizability of the findings.

Loss & Training¶

This work is purely analytical and does not involve training the LLMs themselves. Tuned Lens itself is a lightweight, pre-trained module.

Key Experimental Results¶

Main Results¶

Feature Dimension	Certain Prediction (Correct)	Uncertain Prediction (Incorrect)	Degree of Difference
Position of Confidence Jump	Middle-to-late layers	Middle-to-late layers	Minimal, almost aligned
Shape of Probability Trajectory	Stable then sudden rise	Stable then sudden rise	Matching shapes
Final Layer Confidence	High (~0.7-0.9)	Relatively lower (~0.3-0.5)	Expected difference
Jump Magnitude	Broad surge	Broad surge	Similar magnitude
Post-jump Slope	Plateau-like	Plateau-like	Identical convergence pattern
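
One way to make the "position of confidence jump" and "jump magnitude" rows concrete is to take the largest single-layer increase in a trajectory. The sketch below is an assumed operationalization, not necessarily the one used in the paper; `jump_layer` is a hypothetical helper and the two curves are made-up numbers.

```python
def jump_layer(trajectory):
    """Return (layer index, size) of the largest single-layer increase.

    The index refers to the layer at which the higher value is reached,
    counting layers from 0.
    """
    deltas = [nxt - cur for cur, nxt in zip(trajectory, trajectory[1:])]
    size = max(deltas)
    return deltas.index(size) + 1, size

# Toy mean trajectories (made-up numbers), 5 layers each.
certain_curve = [0.125, 0.125, 0.25, 0.75, 0.875]
uncertain_curve = [0.0, 0.0, 0.0625, 0.4375, 0.5]

certain_jump = jump_layer(certain_curve)
uncertain_jump = jump_layer(uncertain_curve)
print("certain:", certain_jump)
print("uncertain:", uncertain_jump)
```

In this toy example both curves jump at the same layer even though their final values differ, which mirrors the pattern summarized in the table.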

Ablation Study¶

Analysis Dimension	Scope	Trajectory Alignment Observation	Exceptions / Notes
5 Models	All	Consistent trends, aligned trajectories	Marginally distinct patterns in stronger models
11 Datasets	All	Robust across tasks	No significant exceptions
Model Capability Level	High- vs. Low-capability	Slightly more differentiated in high-capability models	Insignificant differences
Different Layer Intervals	Early/Middle/Late layers	Almost no information in early layers	Information concentrated in middle and late layers

Key Findings¶

Highly Aligned Layer-wise Probability Trajectories: Both groups of samples undergo an extremely similar "stable — sudden jump — stable" progression, with the sudden confidence increase occurring at similar layer indexes.
Uncertainty Does Not Affect the Structure of the "Inference Pipeline": Models do not employ different processing paths for uncertain inputs; only the final output confidence differs—which represents a significant negative finding.
More Capable Models Might Learn to Use Differentiated Processing: A weak sign of trajectory differentiation is observed in more capable models, suggesting that uncertainty awareness may emerge as model capability scales, though this trend is currently not prominent.
Direct Challenge to Simple Layer-wise Uncertainty Detection Methods: Since the dynamical patterns of both prediction types are highly similar, detectors relying purely on statistical metrics of layer-wise trajectories might perform close to "random guessing."
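
The last point can be illustrated with a small AUROC computation: a statistic that is distributed identically in both groups (for example the jump layer) scores 0.5, while final-layer confidence separates them. The sketch below uses made-up toy values, not results from the paper, and `auroc` is a hypothetical helper written for this example.

```python
def auroc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the usual rank-based AUROC.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy per-sample statistics (made-up numbers).
jump_certain = [3, 4, 3, 5]       # jump layer, correct predictions
jump_uncertain = [3, 4, 5, 3]     # jump layer, incorrect predictions
conf_certain = [0.8, 0.7, 0.9, 0.75]    # final-layer confidence, correct
conf_uncertain = [0.4, 0.3, 0.5, 0.35]  # final-layer confidence, incorrect

jump_auc = auroc(jump_certain, jump_uncertain)
conf_auc = auroc(conf_certain, conf_uncertain)
print("AUROC from jump layer:", jump_auc)
print("AUROC from final confidence:", conf_auc)
```

A detector built on the jump layer alone would be at chance in this toy setting, whereas final confidence carries the signal, which is the concern raised for trajectory-only detectors.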

Highlights & Insights¶

Important "Negative Result": Showing that something "does not work" is sometimes more valuable than showing that it does. This paper provides systematic evidence that simple layer-wise detection paradigms face fundamental barriers, helping the community avoid wasting effort on dead ends.
Novel Interdisciplinary Perspective of Using Interpretability Tools to Study Uncertainty: Using Tuned Lens—a tool originally designed to understand model computation processes—to explore the mechanisms of uncertainty handling highlights an unconventional application of interpretability techniques.
Inspiring Observation that "Stronger Models May Differentiate": The effect is weak so far, but if the trend is validated on larger models, it would suggest that uncertainty awareness is an emergent capability.

Limitations & Future Work¶

Using Incorrect Predictions as a Proxy for Uncertainty is Coarse: Failures can stem from out-of-distribution issues rather than epistemic uncertainty, while some correct predictions might simply be lucky guesses with low confidence.
Exclusion of Other Probing Tools: The analysis relies solely on Tuned Lens. Other probing approaches (such as probing classifiers, Logit Lens, and CKA analysis) might reveal different dimensions of information.
Constrained Scope as a Workshop Paper: Detailed statistical tests, effect size analyses, and validations on much larger-scale models (70B+) are currently missing.
No Distinction Between Aleatoric and Epistemic Uncertainty: Treating all errors uniformly as epistemic uncertainty is an oversimplification.

Related Work & Insights¶

vs. Logit Lens Family: While Logit Lens is primarily utilized to visualize the layer-by-layer "thinking process" of a model, this paper extends it methodologically to compare dynamical differences between two types of inputs.
vs. Probing-based Uncertainty Detection: Probing methods train linear classifiers at specific layers to distinguish certain/uncertain states. This study questions the upper bound of such methods from a dynamical standpoint—if the dynamical patterns are identical, probes might simply be exploiting confidence differences near the final layer.
Implications for Hallucination Detection: Intermediate-layer-based hallucination detection may require more complex or joint multi-layer features rather than simple single-layer statistical metrics.

Rating¶

Novelty: ⭐⭐⭐⭐ Interdisciplinary perspective of interpretability + uncertainty, and the negative findings themselves are valuable.
Experimental Thoroughness: ⭐⭐⭐ Wide coverage with 5 models \(\times\) 11 datasets, but lacks diverse analytical methods and statistical testing.
Writing Quality: ⭐⭐⭐⭐ Concise, clear, and logically coherent.
Value: ⭐⭐⭐⭐ Provides important negative evidence for the direction of uncertainty detection, helping the community avoid unproductive directions.