
Learning Uncertainty from Sequential Internal Dispersion in Large Language Models

Conference: ACL 2026 arXiv: 2604.15741 Code: GitHub Area: Uncertainty Estimation / Hallucination Detection Keywords: uncertainty estimation, hallucination detection, hidden state variance, sequential aggregation, internal representation dispersion

TL;DR

This paper proposes SIVR, a framework that computes three dispersion statistics (generalised variance, circular variance, and token entropy) over the layer-wise hidden states of each generated token as token-level features, then aggregates the full feature sequence with a lightweight Transformer encoder to estimate uncertainty and detect hallucinations. SIVR achieves significant improvements over baselines and generalizes better out of distribution.

Background & Motivation

State of the Field: Uncertainty estimation is a critical approach for detecting hallucinations in LLMs. Existing methods include sampling-based consistency (e.g., Semantic Entropy), output probability methods (e.g., Entropy), and internal state probe methods.

Limitations of Prior Work: (1) Sampling-based methods incur substantial computational overhead; (2) methods such as CoE impose overly strict assumptions about layer-wise evolution that do not hold across models or tasks; (3) using only the last token, or averaging over all tokens, discards temporal patterns in the sequence.

Root Cause: CoE compresses information into a single scalar, ignoring variance patterns across different token positions. For example, in "Praia is in Portugal," a variance spike at "Portugal" can flag an error, but mean aggregation masks such signals.

Paper Goals: To design internal state features based on more relaxed assumptions while preserving complete sequential information.

Starting Point: Uncertainty is reflected in the degree of "dispersion" of hidden states across layers — representations are more concentrated when correct and more dispersed when incorrect.

Core Idea: Three dispersion statistics (generalised variance, circular variance, and token entropy) characterize the cross-layer dispersion of each token, and a Transformer encoder learns full-sequence patterns to predict hallucinations.

Method

Overall Architecture

For each generated token, hidden states from all layers are extracted, and three internal variance features \(\bm{v}_t = [v_t, c_t, e_t]\) (generalised variance, circular variance, and token entropy) are computed, forming a feature sequence that is fed into a lightweight Transformer encoder for binary hallucination classification.
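The per-token feature computation can be sketched in NumPy as follows. The regularisation constant `eps`, the tensor shapes, and the exact normalisation choices are illustrative assumptions rather than the paper's implementation; the token-entropy term simply applies the standard entropy formula to the model's next-token distribution:

```python
import numpy as np

def dispersion_features(layer_states, token_probs, eps=1e-3):
    """Sketch of the three SIVR dispersion statistics for one token.

    layer_states: (L+1, d) hidden states of one token across all layers.
    token_probs:  (V,) next-token probability distribution at this position.
    eps is a hypothetical regularisation constant (not from the paper).
    """
    H = np.asarray(layer_states, dtype=np.float64)
    n_layers, d = H.shape

    # Generalised variance: log-determinant of the regularised covariance
    # of the hidden states across layers, v_t = sum_i log(lambda_i).
    centered = H - H.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / n_layers + eps * np.eye(d)
    _, v_t = np.linalg.slogdet(cov)

    # Circular variance: 1 minus the length of the mean of the
    # unit-normalised layer directions; near 0 when all layers align.
    unit = H / np.linalg.norm(H, axis=1, keepdims=True)
    c_t = 1.0 - np.linalg.norm(unit.mean(axis=0))

    # Token entropy of the output distribution.
    p = np.asarray(token_probs, dtype=np.float64)
    e_t = -np.sum(p * np.log(p + 1e-12))

    return np.array([v_t, c_t, e_t])
```

Stacking these vectors over all generated tokens yields the sequence that the downstream classifier consumes.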

Key Designs

  1. Generalised Variance:

    • Function: Measures the volumetric dispersion across layers.
    • Mechanism: Computes the log-determinant of the regularised covariance matrix \(v_t = \log\det(\Sigma') = \sum_i \log \lambda_i\), aggregating the full feature spectrum.
    • Design Motivation: Unlike CoE, which only examines differences between adjacent layers, generalised variance is directly related to differential entropy and provides a more comprehensive measure of dispersion.
  2. Circular Variance:

    • Function: Measures the directional dispersion across layers.
    • Mechanism: Normalizes hidden states at each layer and computes the magnitude of the mean vector, \(c_t = 1 - \|\frac{1}{L+1}\sum_l \hat{\bm{h}}_t^l\|\).
    • Design Motivation: Complementary to generalised variance: generalised variance captures the magnitude of dispersion while circular variance captures its direction, implicitly encoding all pairwise layer relationships.
  3. Sequential Aggregation Transformer Classifier:

    • Function: Learns hallucination detection from the full sequence of dispersion patterns.
    • Mechanism: An embedding layer (128-dimensional) followed by a single-layer Transformer encoder and a linear classification head, trained with binary cross-entropy and \(l_2\) regularization.
    • Design Motivation: Preserving sequential order captures temporal patterns such as variance spikes, which is more effective than mean or last-token aggregation.
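A minimal PyTorch sketch of such a classifier. The paper specifies only the 128-d embedding, a single encoder layer, and a linear head; the head count, feed-forward width, and mean pooling below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SequenceDispersionClassifier(nn.Module):
    """Single-layer Transformer over per-token dispersion features (sketch)."""

    def __init__(self, n_features=3, d_model=128, n_heads=4):
        super().__init__()
        # Project the 3 dispersion statistics into the 128-d model space.
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=256,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        # x: (batch, seq_len, 3) sequence of [v_t, c_t, e_t] features.
        h = self.encoder(self.embed(x))   # (batch, seq_len, d_model)
        pooled = h.mean(dim=1)            # pool over the token sequence
        return self.head(pooled).squeeze(-1)  # hallucination logit
```

Because the encoder attends across all positions, a localized variance spike can dominate the pooled representation, which mean or last-token aggregation would dilute.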

Loss & Training

Binary cross-entropy with \(l_2\) regularization; requires only hundreds to thousands of labeled samples.
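A self-contained training-step sketch under these choices. The tiny stand-in model, the AdamW optimizer, and all hyperparameters are illustrative; AdamW's decoupled weight decay is used here as a common stand-in for explicit \(l_2\) regularization:

```python
import torch
import torch.nn as nn

# Stand-in classifier over per-example dispersion features (illustrative).
model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
# weight_decay supplies the l2-style penalty on the parameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

features = torch.randn(32, 3)                 # dispersion features
labels = torch.randint(0, 2, (32,)).float()   # 1 = hallucination

logits = model(features).squeeze(-1)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With only a few hundred to a few thousand labeled sequences, such a small head trains in seconds on CPU.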

Key Experimental Results

Main Results

AUC comparison on Llama-3.1-8B; four of the seven evaluation datasets are shown, with the average AUC and average rank computed over all seven:

| Method | TriviaQA | SciQ | MedMCQA | MATH | Avg. AUC | Rank |
| --- | --- | --- | --- | --- | --- | --- |
| Entropy | 80.46 | 72.85 | 62.76 | 62.77 | 67.63 | 7.96 |
| SE | 84.44 | 79.44 | 66.88 | 67.27 | 68.87 | 7.13 |
| CoE-C | 66.97 | 75.06 | 62.14 | 58.67 | 61.25 | 11.08 |
| SIVR | 90.75 | 83.64 | 68.37 | 71.22 | 75.35 | 1.88 |

Ablation Study

| Configuration | Avg. AUC | Notes |
| --- | --- | --- |
| Token Entropy only | 71.2 | Effective but insufficient alone |
| Generalised Variance only | 72.8 | Complementary signal |
| All three combined (SIVR) | 75.35 | Best performance |
| Mean aggregation instead of sequence | 72.5 | Loss of temporal patterns |

Key Findings

  • SIVR achieves an average rank of 1.88, substantially outperforming the second-best method; the three features exhibit strong complementarity.
  • Sequential aggregation improves AUC by 2–3 points over mean/last-token aggregation, demonstrating the value of temporal patterns.
  • Out-of-distribution generalization is significantly better than CoE's, while requiring only a small amount of training data.

Highlights & Insights

  • The "dispersion" assumption is more robust than the "step size" assumption — CoE's assumptions are inconsistent across models, whereas SIVR's assumptions are more fundamental and general.
  • The paradigm of preserving sequential structure is transferable — any task that requires inferring sequence-level properties from token-level signals can benefit from this approach.
  • Lightweight yet effective — three statistics combined with a single-layer Transformer result in negligible inference overhead.

Limitations & Future Work

  • Labeled data are required; although the quantity is small, new domains necessitate additional annotation.
  • Evaluation is limited to greedy decoding; performance under sampling-based decoding remains to be assessed.
  • Validation on large-scale models (70B+) is insufficient.
  • The use of SIVR for proactive hallucination mitigation has not been explored.

Comparison with Related Work

  • vs. CoE: CoE's overly strong assumptions fail across tasks; SIVR adopts more relaxed assumptions.
  • vs. Semantic Entropy: SE requires multiple sampling passes and is computationally expensive; SIVR requires only a single forward pass.
  • vs. Lookback Lens: Lookback Lens focuses on specific layers and attention patterns, whereas SIVR provides a more global perspective.

Rating

  • Novelty: ⭐⭐⭐⭐ The internal variance feature idea is conceptually clear; individual components are simple but effective in combination.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive ablations across 7 datasets and multiple models.
  • Writing Quality: ⭐⭐⭐⭐ Motivation is clearly argued with effective visualizations.
  • Value: ⭐⭐⭐⭐⭐ Highly practical with direct applicability to hallucination detection.