
LogitDynamics: Reliable ViT Error Detection from Layerwise Logit Trajectories

Conference: CVPR 2026 · arXiv: 2604.10643 · Code: N/A · Area: AI Safety / Reliability · Keywords: Error prediction, confidence estimation, Vision Transformer, layerwise dynamics, hallucination detection

TL;DR

LogitDynamics attaches lightweight classification heads to each layer of a ViT to extract layerwise logit trajectories and top-K competition dynamics, then trains a linear probe to predict model errors, outperforming existing methods in cross-dataset generalization.

Background & Motivation

Background: Reliable confidence estimation is critical for high-stakes applications. Existing approaches include Bayesian uncertainty estimation (MC Dropout, deep ensembles) and logit/softmax-based post-hoc methods.

Limitations of Prior Work: Modern models can be overconfident even when wrong, a problem that is exacerbated under distribution shift. Using only the final-layer logit ignores how class evidence evolves across network depth.

Key Challenge: The confidence score from the final layer is a static snapshot that cannot reflect the stability of the model's "belief" throughout the inference process.

Goal: Leverage internal layerwise signals within ViTs to better predict when a model will make an error.

Key Insight: Inspired by the use of internal signals for LLM hallucination detection, the paper investigates whether analogous depth-wise signals exist in ViTs.

Core Idea: Correct predictions tend to exhibit a stable top-K structure, whereas erroneous predictions are accompanied by dramatic fluctuations among top candidate classes — capturing these layerwise dynamics enables error prediction.

Method

Overall Architecture

Freeze pretrained ViT → attach a linear classification head to each of the last \(L\) layers → extract layerwise logit features and top-K dynamics statistics → concatenate into a feature vector → train a linear probe to predict an error indicator.
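To make the feature construction concrete, here is a minimal numpy sketch, assuming the layerwise logits have already been collected into one array (one row per probed layer, with the final classifier's logits as the last row). The function name, and the choice to exclude the predicted class from the competitor set, are assumptions for illustration rather than details taken from the paper:

```python
import numpy as np

def layerwise_logit_features(logits, k=5):
    """Build the (L+1)*(K+1)-dim feature from layerwise logits.

    logits: array of shape (L+1, C) -- one logit vector per probed layer,
            with the final classifier's logits as the last row.
    k: number of competing classes kept per layer.
    """
    # The predicted class is taken from the final classifier's logits.
    target = int(np.argmax(logits[-1]))
    feats = []
    for layer_logits in logits:
        # Top-k competitor logits, excluding the predicted class
        # (assumption: "competing" classes means all classes but the target).
        competitors = np.delete(layer_logits, target)
        top_k = np.sort(competitors)[-k:][::-1]
        feats.append(np.concatenate([[layer_logits[target]], top_k]))
    return np.concatenate(feats)  # shape: ((L+1) * (k+1),)
```

With L = 3 probed layers plus the final classifier and k = 3, this yields a 16-dimensional vector, matching the \((L+1)(K+1)\) count in the text.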

Key Designs

  1. Layer-wise Class Projections:

    • Function: Expose intermediate class evidence at each layer.
    • Mechanism: A lightweight linear head is trained on the CLS token of each of the last \(L\) layers, producing a sequence of layerwise logits. From each layer, the target-class logit and the top-K competing-class logits are extracted; together with the corresponding vectors from the final classifier, they are concatenated into a \((L+1)(K+1)\)-dimensional feature.
    • Design Motivation: Prior work has shown that intermediate predictions can shift across layers and exhibit "overthinking" behavior; these variation patterns are informative for error prediction.
  2. Top-K Dynamics Features:

    • Function: Quantify the stability of the model's top hypotheses along the depth dimension.
    • Mechanism: Seven statistics are computed — Top-1 switching rate, top-K weighted Jaccard similarity, unique top-K count, Top-1 mode frequency, Top-1 entropy, Top-1 unique count, and Top-1 lock-in depth.
    • Design Motivation: Correct predictions typically lock in early and remain stable, whereas erroneous predictions are accompanied by intense competition among top candidates. These statistics capture robustness signals that persist under distribution shift.
  3. Linear Error Predictor:

    • Function: Map the above features to an error probability.
    • Mechanism: A simple linear classifier; the backbone is fully frozen, and inference requires only a single forward pass plus a small amount of linear computation.
    • Design Motivation: Maintains the same efficiency as post-hoc confidence estimation while incorporating richer internal signals.
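The seven top-K dynamics statistics can be sketched as follows. These definitions are plausible reconstructions from the statistic names, not the authors' exact formulas (for instance, the paper uses a rank-weighted Jaccard similarity; plain Jaccard is used here):

```python
import numpy as np
from collections import Counter

def topk_dynamics(logits, k=5):
    """Seven stability statistics over a layerwise top-K trajectory.

    logits: (L, C) layerwise logits, L >= 2. Definitions below are
    illustrative reconstructions of the statistics named in the paper.
    """
    L = len(logits)
    top1 = [int(np.argmax(l)) for l in logits]
    topk = [set(np.argsort(l)[-k:]) for l in logits]

    # 1. Top-1 switching rate: fraction of layer transitions changing top-1.
    switch_rate = np.mean([top1[i] != top1[i + 1] for i in range(L - 1)])
    # 2. Mean Jaccard similarity between consecutive top-K sets.
    jaccard = np.mean([len(topk[i] & topk[i + 1]) / len(topk[i] | topk[i + 1])
                       for i in range(L - 1)])
    # 3. Number of distinct classes ever appearing in any top-K set.
    unique_topk = len(set().union(*topk))
    # 4. Frequency of the most common top-1 class across layers.
    counts = Counter(top1)
    mode_freq = counts.most_common(1)[0][1] / L
    # 5. Entropy of the top-1 class distribution over layers.
    p = np.array(list(counts.values())) / L
    top1_entropy = float(-(p * np.log(p)).sum())
    # 6. Number of distinct top-1 classes across layers.
    unique_top1 = len(counts)
    # 7. Lock-in depth: first layer after which top-1 never changes again.
    lock_in = L - 1
    for i in range(L):
        if all(t == top1[-1] for t in top1[i:]):
            lock_in = i
            break
    return np.array([switch_rate, jaccard, unique_topk,
                     mode_freq, top1_entropy, unique_top1, lock_in],
                    dtype=float)
```

A perfectly stable trajectory (same top-1 class at every layer) gives a switching rate of 0, entropy 0, one unique top-1 class, and lock-in depth 0, matching the paper's intuition that correct predictions lock in early.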

Loss & Training

The layerwise linear heads are trained with standard cross-entropy loss (backbone frozen); the error predictor is trained with binary cross-entropy loss.
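The error predictor itself is just a linear model fit with binary cross-entropy. A minimal numpy sketch, with illustrative function names and synthetic stand-in features (in the paper, X would be the concatenated layerwise logit features plus the seven dynamics statistics):

```python
import numpy as np

def train_error_probe(X, y, lr=0.1, epochs=500):
    """Train a linear error predictor by gradient descent on BCE loss.

    X: (N, D) feature matrix; y: (N,) binary error indicator (1 = error).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        grad = (p - y) / n                      # d(BCE)/d(logits)
        w -= lr * (X.T @ grad)
        b -= lr * grad.sum()
    return w, b

def predict_error_prob(X, w, b):
    """Per-sample probability that the frozen model's prediction is wrong."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

At inference this adds only a dot product on top of the single forward pass through the frozen backbone, which is the efficiency property the paper emphasizes.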

Key Experimental Results

Main Results

| Dataset | LogitDynamics (AUCPR) | Top-K logits (AUCPR) | Gain |
|---|---|---|---|
| ImageNet | 0.6458 | 0.6098 | +0.036 |
| CIFAR-100 | 0.4430 | 0.4164 | +0.027 |
| Places365 | 0.7232 | 0.7283 | −0.005 |

Ablation Study

| Configuration | In-domain mean (Δ) | Cross-domain mean (Δ) | Note |
|---|---|---|---|
| w/ dynamics | −0.0107 | +0.0155 | Dynamics features improve cross-domain transfer |
| w/o dynamics | baseline | baseline | Slightly better in-domain, worse cross-domain |

Key Findings

  • Dynamics features slightly hurt in-domain performance (−0.0107) but yield a significant improvement in cross-dataset transfer (+0.0155), serving as a robustness signal.
  • LLM hallucination detection methods (linear probing, ACT-ViT) transfer poorly when applied directly to vision tasks.
  • Logit-based methods consistently outperform activation-based methods, indicating that internal signal characteristics differ between vision and language models.

Highlights & Insights

  • Cross-modal inspiration: Transferring ideas from LLM hallucination detection to vision models reveals that visual models possess distinctive logit dynamics patterns.
  • Simplicity and efficiency: The method is extremely straightforward (a linear probe plus 7 statistics), yet substantially outperforms more complex methods in cross-domain generalization.

Limitations & Future Work

  • Validated only on ViT-Large; other architectures remain untested.
  • Training the layerwise linear heads incurs additional overhead.
  • Future work may explore applicability to a broader range of architectures and tasks.

Comparison to Baselines

  • vs. ACT-ViT: ACT-ViT processes activation tensors with a ViT-style architecture, which is overly complex and generalizes poorly across domains.
  • vs. Mahalanobis: Feature-space distance methods perform poorly on the error prediction task (AUCPR 0.32).

Rating

  • Novelty: ⭐⭐⭐⭐ Cross-modal inspiration is novel; the method is simple yet effective.
  • Experimental Thoroughness: ⭐⭐⭐⭐ In-domain and out-of-domain evaluations are complete; ablations are clear.
  • Writing Quality: ⭐⭐⭐⭐ Motivation is well-articulated; structure is well-organized.
  • Value: ⭐⭐⭐ The direction is meaningful, but the magnitude of improvement is limited.