# Beyond Confidence: The Rhythms of Reasoning in Generative Models
Conference: ICLR 2026 | arXiv: 2602.10816 | Code: None | Area: Image Generation | Keywords: Token Constraint Bound, prediction robustness, hidden state perturbation, output embedding geometry, prompt engineering
## TL;DR
This paper proposes the Token Constraint Bound (\(\delta_{\text{TCB}}\)) metric, which quantifies the largest perturbation to an LLM's hidden state that preserves the next-token prediction, measuring local prediction robustness and revealing instabilities that traditional perplexity fails to capture.
## Background & Motivation
Background: LLMs are highly sensitive to small variations in input context: minor formatting changes can cause accuracy to fluctuate by up to 76%, and reordering in-context examples can shift accuracy from 54% to 93%.
Limitations of Prior Work:
- Accuracy provides only an aggregate view and cannot assess the stability of individual predictions.
- Perplexity summarizes the output probability distribution into a single scalar and ignores the geometric structure of internal states.
- Softmax normalization can yield high-probability yet unstable predictions: high probability may stem from relative normalization rather than a robust internal state.
Key Challenge: A high-probability, high-confidence prediction may correspond to an unstable equilibrium in the internal state space—existing metrics cannot distinguish between genuinely stable high confidence and fragile high confidence.
Goal: Quantify the robustness of the internal state \(\mathbf{h}\) produced by an LLM in a given context to small perturbations.
Key Insight: Analyze the first-order sensitivity of softmax outputs to hidden states via the Jacobian matrix.
Core Idea: Prediction robustness = the maximum perturbation radius around the hidden state that preserves the output distribution, determined by the geometric dispersion of output embeddings.
## Method
### Overall Architecture
The final-layer hidden state \(\mathbf{h} \in \mathbb{R}^d\) is mapped through the output weight matrix \(\mathbf{W} \in \mathbb{R}^{\mathcal{V} \times d}\) and softmax to yield probability distribution \(\mathbf{o}\). \(\delta_{\text{TCB}}\) quantifies the radius of a perturbation ball around \(\mathbf{h}\) within which the change in \(\mathbf{o}\) does not exceed tolerance \(\epsilon\).
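Written out (this compact restatement is mine, with the sup made precise by the first-order bound derived in Key Designs below):

\[
\mathbf{o} = \mathrm{softmax}(\mathbf{W}\mathbf{h}), \qquad
\delta_{\text{TCB}}(\mathbf{h}) \;=\; \sup\bigl\{\, \delta \ge 0 : \|\Delta\mathbf{h}\|_2 \le \delta \implies \|\Delta\mathbf{o}\|_2 \le \epsilon \,\bigr\} \;\approx\; \frac{\epsilon}{\|\mathbf{J}_\mathbf{W}(\mathbf{h})\|_F}.
\]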
### Key Designs
- Token Constraint Bound (\(\delta_{\text{TCB}}\)) Definition:
    - Function: Measures the robustness of LLM predictions to perturbations of the internal state.
    - Mechanism: Under the first-order linear approximation \(\Delta\mathbf{o} \approx \mathbf{J}_\mathbf{W}(\mathbf{h}) \Delta\mathbf{h}\), any perturbation with \(\|\Delta\mathbf{h}\|_2 \leq \epsilon / \|\mathbf{J}_\mathbf{W}(\mathbf{h})\|_F\) is guaranteed to satisfy \(\|\Delta\mathbf{o}\|_2 \leq \epsilon\), which yields the definition \(\delta_{\text{TCB}}(\mathbf{h}) = \epsilon / \|\mathbf{J}_\mathbf{W}(\mathbf{h})\|_F\).
    - Design Motivation: A larger \(\delta_{\text{TCB}}\) indicates that the model's prediction remains stable under a wider range of hidden state perturbations.
- Precise Connection to Output Embedding Geometry:
    - Function: Derives an analytic expression for the Jacobian norm.
    - Mechanism: Proves that \(\|\mathbf{J}_\mathbf{W}(\mathbf{h})\|_F^2 = \sum_{i=1}^{\mathcal{V}} o_i^2 \|\mathbf{w}_i - \boldsymbol{\mu}_\mathbf{w}(\mathbf{h})\|_2^2\), where \(\boldsymbol{\mu}_\mathbf{w}(\mathbf{h}) = \sum_j o_j \mathbf{w}_j\) is the probability-weighted mean embedding. This follows because row \(i\) of the softmax Jacobian is \(o_i (\mathbf{w}_i - \boldsymbol{\mu}_\mathbf{w}(\mathbf{h}))^\top\); a numerical sketch follows this list.
    - Geometric Interpretation: Sensitivity is determined by the dispersion of token embeddings around the weighted centroid, with each token weighted by \(o_i^2\): the embedding positions of high-probability tokens exert the greatest influence.
- Analysis of Two Prediction Regimes:
    - High-confidence regime (low \(\mathcal{V}_{\text{eff}}\)): as probability concentrates on a dominant token \(k\), \(\boldsymbol{\mu}_\mathbf{w} \to \mathbf{w}_k\), the weighted dispersion vanishes, and \(\delta_{\text{TCB}} \to \infty\). In this regime, \(\delta_{\text{TCB}}\) correlates strongly with the top-2 logit margin (\(r = 0.62\)).
    - Uncertain regime (high \(\mathcal{V}_{\text{eff}}\)): probability is spread across multiple tokens, and \(\delta_{\text{TCB}}\) correlates positively with \(\sqrt{\mathcal{V}_{\text{eff}}}\) (\(r = 0.95\)). Crucially, even at high \(\mathcal{V}_{\text{eff}}\), if the embeddings of the high-probability tokens are geometrically clustered, \(\delta_{\text{TCB}}\) can remain large.
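A minimal numerical sketch of the quantities above, using only the analytic formula and the \(\epsilon = 1.0\) convention from the paper. The helper names, and the exponential-of-entropy reading of \(\mathcal{V}_{\text{eff}}\), are my own assumptions, not taken from the paper's (unreleased) code:

```python
import numpy as np

def delta_tcb(h, W, eps=1.0):
    """Token Constraint Bound from the analytic Frobenius-norm formula above.

    h: final-layer hidden state, shape (d,)
    W: output embedding matrix, shape (V, d)
    Returns (delta_tcb, o).
    """
    logits = W @ h
    logits = logits - logits.max()              # numerical stabilization
    o = np.exp(logits) / np.exp(logits).sum()   # next-token distribution
    mu = o @ W                                  # probability-weighted mean embedding, (d,)
    # ||J_W(h)||_F^2 = sum_i o_i^2 * ||w_i - mu||_2^2
    jac_fro_sq = np.sum(o**2 * np.sum((W - mu) ** 2, axis=1))
    return eps / np.sqrt(jac_fro_sq), o

def effective_vocab(o):
    """exp of Shannon entropy: one common reading of V_eff (an assumption
    here, since the paper's exact definition is not reproduced above)."""
    return float(np.exp(-np.sum(o * np.log(o + 1e-12))))

# Toy check on a random 100-token vocabulary in 16 dimensions.
rng = np.random.default_rng(0)
W = rng.normal(size=(100, 16))
h = rng.normal(size=16)
delta, o = delta_tcb(h, W)
logits = np.sort(W @ h)
print(delta, effective_vocab(o), logits[-1] - logits[-2])  # delta, V_eff, top-2 margin
```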
### Loss & Training
- \(\delta_{\text{TCB}}\) is an analytical metric and does not involve training.
- Computation requires only a forward pass to obtain \(\mathbf{h}\), \(\mathbf{o}\), and \(\mathbf{W}\), followed by evaluation of the analytic formula.
- The tolerance is fixed at \(\epsilon = 1.0\) as a normalization convention.
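For a real model, all inputs to the formula come from a single forward pass. A hedged end-to-end sketch with Hugging Face `transformers` (the prompt is arbitrary; the checkpoint follows the paper's experiments and is gated on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM with an accessible output embedding matrix works the same way.
name = "meta-llama/Llama-3.1-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

h = out.hidden_states[-1][0, -1]           # final-layer hidden state at last position, (d,)
W = model.get_output_embeddings().weight   # output embedding matrix, (V, d)
o = torch.softmax(W @ h, dim=-1)           # next-token distribution

# Sanity check: for LLaMA-style models, hidden_states[-1] is taken after the
# final norm, so W @ h should reproduce the model's own logits; this may not
# hold for architectures with a logit bias or scaling.
print(torch.allclose(W @ h, out.logits[0, -1], atol=1e-4))

mu = o @ W                                 # probability-weighted mean embedding
jac_fro = torch.sqrt((o**2 * ((W - mu) ** 2).sum(dim=1)).sum())
delta_tcb = 1.0 / jac_fro                  # eps = 1.0, per the paper
print(float(delta_tcb))
```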
## Key Experimental Results
### Main Results: Prediction Regime Validation (LLaMA-3.1-8B)
| Dataset | Corr(\(\delta_{\text{TCB}}, \mathcal{V}_{\text{eff}}\)) | Corr(\(\delta_{\text{TCB}}, z_{\text{top1}} - z_{\text{top2}}\)) |
|---|---|---|
| Diverse Prompts (N=309) | 0.95 (strong positive) | -0.40 |
| Low-\(\mathcal{V}_{\text{eff}}\) Targeted (N=360) | 0.08 (near zero) | 0.62 (strong positive) |
### Ablation Study: Embedding Geometry Validation
| Subset | Rate at which \(\delta_{\text{cluster}} > \delta_{\text{orig}} > \delta_{\text{disperse}}\) holds |
|---|---|
| Low \(\mathcal{V}_{\text{eff}}\) (< 20) | 95% |
| Overall | 90% |
- Holding \(\mathbf{o}\) fixed, artificially clustering or dispersing competing token embeddings causes \(\delta_{\text{TCB}}\) to increase or decrease accordingly.
- This confirms that geometric structure influences prediction stability independently of the probability distribution.
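A sketch of how this ablation can be reproduced under the setup above: hold \(\mathbf{o}\) fixed and rescale embeddings radially about the weighted centroid. The scale factor `alpha` and helper name are illustrative; the paper may cluster or disperse only the competing tokens rather than the full vocabulary:

```python
import numpy as np

def delta_with_geometry(o, W, alpha, eps=1.0):
    """delta_TCB after moving all embeddings toward (alpha < 1) or away from
    (alpha > 1) the probability-weighted centroid, holding o fixed."""
    mu = o @ W
    W_mod = mu + alpha * (W - mu)   # radial rescaling; the centroid o @ W_mod stays at mu
    jac_fro_sq = np.sum(o**2 * np.sum((W_mod - mu) ** 2, axis=1))
    return eps / np.sqrt(jac_fro_sq)

# Holding o fixed, the Frobenius norm scales by alpha, so delta scales by 1/alpha:
# delta(cluster) > delta(orig) > delta(disperse), matching the table above.
rng = np.random.default_rng(0)
W = rng.normal(size=(100, 16))
o = np.exp(rng.normal(size=100)); o /= o.sum()
for alpha in (0.5, 1.0, 2.0):
    print(alpha, delta_with_geometry(o, W, alpha))
```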
## Key Findings
- \(\delta_{\text{TCB}}\) discriminates prompt quality: well-constructed prompts yield higher \(\delta_{\text{TCB}}\), even when accuracy is identical.
- Identifies instabilities missed by perplexity: in text generation, positions with low perplexity but sharp drops in \(\delta_{\text{TCB}}\) may correspond to semantic turning points or potential errors (a per-position sketch follows this list).
- ICL example effects are reflected in \(\delta_{\text{TCB}}\): effective few-shot examples not only improve accuracy but also increase \(\delta_{\text{TCB}}\).
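Scanning a generation for such positions needs only the per-position hidden states, e.g. `out.hidden_states[-1][0]` from the earlier sketch. A vectorized version of the formula (the helper name and the expansion that avoids materializing a \((T, \mathcal{V}, d)\) intermediate are my own):

```python
import torch

def per_position_delta_tcb(hidden, W, eps=1.0):
    """delta_TCB at every sequence position; sharp dips flag candidates
    for the low-perplexity-but-unstable positions described above.

    hidden: (T, d) final-layer hidden states; W: (V, d) output embeddings.
    """
    o = torch.softmax(hidden @ W.T, dim=-1)      # (T, V) next-token distributions
    p = o**2                                     # squared probabilities
    mu = o @ W                                   # (T, d) weighted mean embeddings
    # sum_i p_i ||w_i - mu||^2 = sum_i p_i ||w_i||^2 - 2 sum_i p_i <w_i, mu>
    #                            + (sum_i p_i) ||mu||^2
    term1 = p @ (W**2).sum(dim=1)                # (T,)
    term2 = ((p @ W) * mu).sum(dim=1)            # (T,)
    term3 = p.sum(dim=1) * (mu**2).sum(dim=1)    # (T,)
    return eps / (term1 - 2 * term2 + term3).sqrt()
```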
## Highlights & Insights
- High probability ≠ stability: This core insight is highly valuable—softmax normalization may create a "false sense of security," while \(\delta_{\text{TCB}}\) directly probes the true stability of internal states.
- Dominant role of embedding geometry: Even with an identical probability distribution, altering the geometric structure of the embedding space changes prediction stability—this offers insights into representation learning in LLMs.
- Elegant analytic formula: \(\|\mathbf{J}\|_F^2 = \sum o_i^2 \|\mathbf{w}_i - \boldsymbol{\mu}\|^2\) reduces the complex Jacobian norm to an intuitively clear weighted dispersion measure.
## Limitations & Future Work
- The first-order linear approximation may be inaccurate for large perturbations.
- Validation is limited to LLaMA-3.1-8B; experiments across more models and scales are needed.
- The choice of \(\epsilon = 1.0\) lacks theoretical justification.
- It remains unexplored how to incorporate \(\delta_{\text{TCB}}\) into training objectives to directly improve robustness.
- The Frobenius norm as a sensitivity measure may be overly conservative compared to the spectral norm.
## Related Work & Insights
- vs. Perplexity: PPL measures sequence likelihood; \(\delta_{\text{TCB}}\) measures local prediction robustness—the two are complementary rather than substitutes.
- vs. Calibration metrics: Calibration concerns the alignment between probability and correctness; \(\delta_{\text{TCB}}\) concerns the stability of predictions under perturbation—orthogonal dimensions.
- vs. Adversarial robustness: Adversarial research seeks worst-case perturbations in input space; \(\delta_{\text{TCB}}\) quantifies safety margins in hidden state space.
## Rating
- Novelty: ⭐⭐⭐⭐ Jacobian analysis is not new, but connecting it to output embedding geometry and defining a practically meaningful metric is novel.
- Experimental Thoroughness: ⭐⭐⭐⭐ Theoretical validation + prompt analysis + ICL analysis + text generation analysis, though model diversity is limited.
- Writing Quality: ⭐⭐⭐⭐ Mathematical derivations are clear, though the exposition is somewhat verbose.
- Value: ⭐⭐⭐⭐ Provides a new analytical perspective on LLMs with practical utility for prompt engineering and reliability evaluation.