Learning Uncertainty from Sequential Internal Dispersion in Large Language Models
Conference: ACL 2026
arXiv: 2604.15741
Code: GitHub
Area: Uncertainty Estimation / Hallucination Detection
Keywords: Uncertainty Estimation, Hallucination Detection, Hidden State Variance, Sequence Aggregation, Internal Representation Dispersion
TL;DR
This paper proposes SIVR, which describes each generated token by three dispersion features: the Generalised Variance and Circular Variance of its hidden states across layers, plus its Token Entropy. A lightweight Transformer encoder then aggregates the full sequence of these features to estimate uncertainty and detect hallucinations. It significantly outperforms baselines and generalizes better out of distribution.
Background & Motivation
Background: Uncertainty estimation is a key tool for detecting hallucinations in LLMs. Existing approaches include sampling-consistency methods (e.g., Semantic Entropy), output-probability methods (e.g., Entropy), and internal-state probing methods.
Limitations of Prior Work: (1) Sampling-based methods incur high computational overhead. (2) Methods such as CoE (Chain-of-Embedding) impose overly strict assumptions about how representations evolve from layer to layer, and these assumptions do not hold across models or tasks. (3) Using only the last token or the token average discards temporal patterns.
Key Challenge: Compressing internal signals into a single score (as CoE does) ignores how dispersion varies across token positions. For example, consider the incorrect answer "Praia is in Portugal" (Praia is the capital of Cape Verde). A variance spike at "Portugal" can signal the error, but mean aggregation would dilute that spike across the other tokens.
Goal: Design internal state features based on more relaxed assumptions while preserving full sequence information.
Key Insight: Uncertainty is reflected in the "degree of dispersion" of hidden states across layers—representations are more concentrated when correct and more dispersed when incorrect.
Core Idea: Describe each token with three dispersion statistics (Generalised Variance, Circular Variance, and Token Entropy), and train a Transformer encoder on the resulting sequence to predict hallucinations.
Method
Overall Architecture
For each generated token \(t\), the hidden states from all \(L+1\) layers are extracted. Three per-token features \(\bm{v}_t = [v_t, c_t, e_t]\) (Generalised Variance, Circular Variance, and Token Entropy) are computed. The resulting sequence \(\bm{v}_1, \dots, \bm{v}_T\) is fed into a lightweight Transformer encoder for binary (hallucinated vs. correct) classification.
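The per-token feature extraction can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code. The following choices are all assumptions:

- the regularizer \(\Sigma' = \Sigma + \varepsilon I\) with \(\varepsilon = 10^{-3}\);
- a population covariance (dividing by \(L+1\));
- Token Entropy taken as the entropy of the next-token softmax distribution;
- the helper names, which are hypothetical.

Because the hidden size \(d\) runs into the thousands while only \(L+1\) layer states exist per token, the log-determinant is computed through the small \((L+1)\times(L+1)\) Gram matrix using Sylvester's determinant identity.

```python
import numpy as np


def generalised_variance(h: np.ndarray, eps: float = 1e-3) -> float:
    """log det(Sigma + eps * I_d) for one token; h has shape (n, d) with n = L + 1 layers.

    Assumes Sigma' = Sigma + eps * I_d with a population covariance (divide by n).
    Sylvester's identity det(eps*I_d + X^T X / n) = eps**(d - n) * det(eps*I_n + X X^T / n)
    lets us factorise an n x n matrix instead of a d x d one.
    """
    n, d = h.shape
    x = h - h.mean(axis=0, keepdims=True)        # centre the layer states
    gram = x @ x.T / n                            # (n, n) Gram matrix
    _, logdet_small = np.linalg.slogdet(eps * np.eye(n) + gram)
    return float((d - n) * np.log(eps) + logdet_small)


def circular_variance(h: np.ndarray) -> float:
    """1 - ||mean of unit-normalised layer states||; 0 when all layers point the same way."""
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    return float(1.0 - np.linalg.norm(unit.mean(axis=0)))


def token_entropy(logits: np.ndarray) -> float:
    """Assumed definition: Shannon entropy (nats) of softmax(logits), via a stable log-softmax."""
    z = logits - logits.max()
    log_p = z - np.log(np.exp(z).sum())
    return float(-(np.exp(log_p) * log_p).sum())


def sivr_features(hidden: np.ndarray, logits: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """hidden: (T, L + 1, d) layer states per generated token; logits: (T, V). Returns (T, 3)."""
    return np.array([
        [generalised_variance(h, eps), circular_variance(h), token_entropy(z)]
        for h, z in zip(hidden, logits)
    ])


rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 33, 64))   # toy sizes: 5 tokens, 33 layer states, d = 64
logits = rng.normal(size=(5, 100))       # toy vocabulary of 100
feats = sivr_features(hidden, logits)
print(feats.shape)                       # (5, 3)
```

The Gram-matrix route gives the same value as factorising the full \(d \times d\) regularized covariance. It costs \(O((L+1)^2 d)\) instead of \(O(d^3)\) per token.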
Key Designs
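- Generalised Variance:
  - Function: Measures "volume" dispersion across layers.
  - Mechanism: Stacks token \(t\)'s hidden states from all \(L+1\) layers and computes their covariance matrix \(\Sigma\). This matrix is singular, since there are far fewer layer states than hidden dimensions, so it is regularized to \(\Sigma'\). The feature is the log-determinant \(v_t = \log\det(\Sigma') = \sum_i \log \lambda_i\), where \(\lambda_i\) are the eigenvalues of \(\Sigma'\). The sum aggregates the entire spectrum rather than a single direction.
  - Design Motivation: Unlike CoE, which only considers differences between adjacent layers, Generalised Variance measures the joint spread of all layers. It is directly tied to differential entropy: for a Gaussian with covariance \(\Sigma'\) in \(d\) dimensions, \(h = \tfrac{1}{2}\log\det(2\pi e\,\Sigma') = \tfrac{d}{2}\log(2\pi e) + \tfrac{1}{2}v_t\).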
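- Circular Variance:
  - Function: Measures "directional" dispersion across layers.
  - Mechanism: Normalizes each layer's hidden state to unit length, \(\hat{\bm{h}}_t^l = \bm{h}_t^l / \|\bm{h}_t^l\|\), then measures how far the mean direction falls short of unit length: \(c_t = 1 - \left\|\frac{1}{L+1}\sum_{l=0}^{L} \hat{\bm{h}}_t^l\right\|\). Aligned layers give \(c_t \approx 0\); scattered directions push \(c_t\) toward 1.
  - Design Motivation: Complementary to Generalised Variance. Generalised Variance is sensitive to the scale of the spread, while Circular Variance depends only on direction. It implicitly accounts for all pairwise relationships between layers, because \(\left\|\frac{1}{L+1}\sum_{l=0}^{L} \hat{\bm{h}}_t^l\right\|^2 = \frac{1}{(L+1)^2}\sum_{l=0}^{L}\sum_{m=0}^{L}\cos(\bm{h}_t^l, \bm{h}_t^m)\).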
- Sequence Aggregation Transformer Classifier:
  - Function: Learns hallucination detection from full-sequence dispersion patterns.
  - Mechanism: An embedding layer projects each per-token feature vector \(\bm{v}_t\) to 128 dimensions. A single-layer Transformer encoder then attends over the whole sequence, and a linear classification head outputs the hallucination score.
  - Design Motivation: Preserving token order lets the model capture temporal patterns such as variance spikes. In the ablation this outperforms mean or last-token aggregation.
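The classifier can be sketched in PyTorch as below. Only three components come from the description above: the 128-dimensional embedding, the single encoder layer, and the linear head. Everything else is an assumption: learned positional embeddings, 4 attention heads, a feed-forward width of 256, dropout 0.1, a maximum length of 512, masked mean pooling, and the label convention (1 = hallucinated). The class name is hypothetical.

```python
import torch
from torch import nn


class SequenceDispersionClassifier(nn.Module):
    """Hypothetical sketch: per-token features (B, T, 3) -> one hallucination logit per answer."""

    def __init__(self, n_features: int = 3, d_model: int = 128, n_heads: int = 4,
                 max_len: int = 512, dropout: float = 0.1):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)       # 128-d embedding of [v_t, c_t, e_t]
        self.pos = nn.Embedding(max_len, d_model)        # assumed: learned positions keep token order
        self.encoder = nn.TransformerEncoderLayer(        # a single encoder layer
            d_model, n_heads, dim_feedforward=2 * d_model, dropout=dropout, batch_first=True)
        self.head = nn.Linear(d_model, 1)                # linear classification head

    def forward(self, feats: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # pad_mask: (B, T) bool, True where a position is padding and must be ignored
        positions = torch.arange(feats.size(1), device=feats.device)
        x = self.embed(feats) + self.pos(positions)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)  # assumed: masked mean pooling
        return self.head(pooled).squeeze(-1)             # (B,) logits; assumed 1 = hallucinated


model = SequenceDispersionClassifier()
feats = torch.randn(2, 7, 3)                                      # two toy answers, 7 positions each
pad_mask = torch.tensor([[False] * 7, [False] * 5 + [True] * 2])  # second answer has 5 real tokens
logits = model(feats, pad_mask)
print(logits.shape)                                               # torch.Size([2])
```

The padding mask excludes padded positions from both attention and pooling, so answers of different lengths can share a batch.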
Loss & Training
Training minimizes binary cross-entropy with \(\ell_2\) regularization. Only a few hundred to a few thousand labeled samples are needed.
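A minimal PyTorch sketch of this objective follows: binary cross-entropy on the logits plus an explicit \(\ell_2\) penalty on all parameters. Several choices here are assumptions:

- the coefficient `l2 = 1e-4`;
- penalizing biases as well as weights;
- the Adam optimizer;
- the toy labelling rule.

`MeanPoolProbe` is a hypothetical stand-in for the sequence classifier, included only so this block runs on its own.

```python
import torch
from torch import nn


def bce_l2_loss(model: nn.Module, logits: torch.Tensor, labels: torch.Tensor,
                l2: float = 1e-4) -> torch.Tensor:
    """Binary cross-entropy on logits plus l2 * (sum of squared parameters)."""
    bce = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
    penalty = sum(p.pow(2).sum() for p in model.parameters())
    return bce + l2 * penalty


class MeanPoolProbe(nn.Module):
    """Hypothetical stand-in for the sequence classifier so this block runs on its own."""

    def __init__(self, n_features: int = 3):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, feats: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (feats * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.linear(pooled).squeeze(-1)


def train_step(model, optimizer, feats, pad_mask, labels, l2=1e-4) -> float:
    """One gradient step on the regularized BCE objective; returns the loss value."""
    model.train()
    optimizer.zero_grad()
    loss = bce_l2_loss(model, model(feats, pad_mask), labels, l2)
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
feats = torch.randn(64, 12, 3)                          # 64 toy answers, 12 tokens each
pad_mask = torch.zeros(64, 12, dtype=torch.bool)        # no padding in this toy batch
labels = (feats[..., 0].mean(dim=1) > 0).long()         # toy rule: 1 = "hallucinated"
model = MeanPoolProbe()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
losses = [train_step(model, optimizer, feats, pad_mask, labels) for _ in range(100)]
print(f"{losses[0]:.3f} -> {losses[-1]:.3f}")
```

Passing `weight_decay` to `torch.optim.Adam` instead adds the penalty through the gradient. It matches this objective when `l2 = weight_decay / 2`. The explicit form keeps the regularized loss visible in the logged value.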
Key Experimental Results
Main Results
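AUC (%) with Llama-3.1-8B on four of the seven evaluation datasets. Avg AUC and Avg Rank (lower is better) summarize the full evaluation reported in the paper, not only the four columns shown. SE denotes Semantic Entropy.
| Method | TriviaQA | SciQ | MedMCQA | MATH | Avg AUC | Avg Rank ↓ |
|---|---|---|---|---|---|---|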
| Entropy | 80.46 | 72.85 | 62.76 | 62.77 | 67.63 | 7.96 |
| SE | 84.44 | 79.44 | 66.88 | 67.27 | 68.87 | 7.13 |
| CoE-C | 66.97 | 75.06 | 62.14 | 58.67 | 61.25 | 11.08 |
| SIVR | 90.75 | 83.64 | 68.37 | 71.22 | 75.35 | 1.88 |
Ablation Study
| Configuration | Avg AUC | Description |
|---|---|---|
| Token Entropy Only | 71.2 | Useful on its own, but well below the full model |
| Generalised Variance Only | 72.8 | Stronger than Token Entropy alone |
| Triple Combination (SIVR) | 75.35 | Best performance |
| Mean Aggregation instead of Sequence | 72.5 | Temporal patterns lost |
Key Findings
- SIVR attains the best average rank (1.88). The ablation shows that the three features are complementary: their combination beats either single feature shown.
- Sequence aggregation improves AUC by 2 to 3 points over mean or last-token aggregation, supporting the value of temporal patterns.
- Out-of-distribution (OOD) generalization is markedly better than CoE's, even with little training data.
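The toy NumPy illustration below shows why the two pooling baselines lose the spike signal. The feature values are invented for illustration, not taken from the paper, and the function names are hypothetical. Suppose a spike of height \(\Delta\) falls on one of \(T\) tokens. It moves the mean by only \(\Delta / T\), and it leaves the last-token feature unchanged unless the spike falls on the final token.

```python
import numpy as np


def mean_pool(feats: np.ndarray) -> np.ndarray:
    """(T, k) per-token features -> (k,) average over tokens."""
    return feats.mean(axis=0)


def last_token(feats: np.ndarray) -> np.ndarray:
    """(T, k) per-token features -> (k,) features of the final token only."""
    return feats[-1]


# Hypothetical one-feature traces for a 5-token answer; the spike sits on " Portugal".
tokens = ["Praia", " is", " in", " Portugal", "."]
spiky = np.array([[0.2], [0.2], [0.2], [0.6], [0.2]])
flat = np.full((len(tokens), 1), 0.2)

print(last_token(spiky), last_token(flat))    # identical: the spike is invisible
print(mean_pool(spiky) - mean_pool(flat))     # spike height / T, which shrinks as T grows
```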
Highlights & Insights
- "Dispersion" assumption is more robust than "step size": CoE assumes a consistent pattern of layer-to-layer change, which does not hold across models. SIVR only assumes that hidden states are more dispersed when the model is wrong, a weaker assumption that transfers more broadly.
- Transferable sequence-structure paradigm: Any task requiring the inference of sequence-level properties from token-level signals can benefit from this approach.
- Lightweight yet effective: With only 3 statistics per token and a single-layer Transformer, SIVR adds almost no inference overhead.
Limitations & Future Work
- Requires labeled data; although the amount is small, new domains require additional annotations.
- Only validated on greedy decoding; performance under sampling-based decoding remains to be evaluated.
- Insufficient validation on large-scale models (70B+).
- Did not explore using SIVR for active hallucination mitigation.
Related Work & Insights
- vs CoE: CoE's assumptions are too strong to hold across tasks; SIVR relies on a more relaxed dispersion assumption.
- vs Semantic Entropy: SE requires multiple samples and is computationally expensive; SIVR needs only a single forward pass.
- vs Lookback Lens: Lookback Lens focuses on specific layers or attention patterns, whereas SIVR summarizes the hidden states of all layers.
Rating
- Novelty: ⭐⭐⭐⭐ The concept of internal variance features is clear, and while components are simple, the combination is effective.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive ablation studies across 7 datasets and multiple models.
- Writing Quality: ⭐⭐⭐⭐ Clear motivation and effective visualizations.
- Value: ⭐⭐⭐⭐⭐ High practicality with direct value for hallucination detection applications.