# ELogitNorm: Enhancing OOD Detection with Extended Logit Normalization
Conference: CVPR 2026 · arXiv: 2504.11434 · Code: GitHub · Area: Other · Keywords: out-of-distribution detection, logit normalization, feature collapse, decision boundary, calibration
## TL;DR
This paper diagnoses two feature collapse problems in LogitNorm (dimensional collapse and origin collapse) and proposes ELogitNorm, which replaces the norm-based scaling factor with the average distance from the feature to the decision boundaries as an adaptive temperature. The method requires no hyperparameters, is compatible with all post-hoc OOD detection methods, achieves a 10.48-point far-OOD AUROC improvement on CIFAR-10 (with SCALE), reduces far-OOD FPR95 from 51.45% to 27.74% on ImageNet-1K, and simultaneously improves classification accuracy and calibration (ECE).
## Background & Motivation
Among training-time OOD detection methods, LogitNorm improves post-hoc detection performance by modifying the loss function — specifically by dividing logits by their norm to alleviate overconfidence. However, the authors identify two critical issues: (1) Dimensional collapse: the singular value spectrum of features contains many near-zero values, indicating compression into a small number of dominant directions; (2) Origin collapse: since \(\|f\| \propto \|z\|\), LogitNorm implicitly regularizes features by their distance to the origin, pulling both OOD and ID samples toward the origin. These issues limit compatibility with various post-hoc methods and degrade classification accuracy.
## Core Problem
How can we design a hyperparameter-free training-time method that improves OOD detection without sacrificing ID classification accuracy, does not restrict the choice of post-hoc methods, and also improves confidence calibration?
## Method
### Overall Architecture
The scaling factor in LogitNorm, \(s = \tau\|f\|\) (distance to origin), is replaced by \(s = D(z)\) (average distance to decision boundaries), thereby extending distance-awareness from a single origin point to all inter-class decision hyperplanes.
### Key Designs
- Distance from features to decision boundaries: For the predicted class \(f_{\max}\), the method computes the average point-to-plane distance from the feature \(z\) to the decision boundary between the predicted class and each of the other classes: \(D(z) = \frac{1}{c-1} \sum_{i \neq f_{\max}} \frac{|(w_{f_{\max}} - w_i)^T z + (b_{f_{\max}} - b_i)|}{\|w_{f_{\max}} - w_i\|_2}\). This is a geometrically precise measure of distance to the decision boundaries.
- ELogitNorm loss: \(L = -\log\!\left(\frac{\exp(f_y / D(z))}{\sum_{i=1}^{c} \exp(f_i / D(z))}\right)\). This directly replaces the CE loss during training with no additional hyperparameters (LogitNorm requires tuning \(\tau\)); see the sketch after this list.
- Collapse prevention mechanism (Proposition 2): The minimum scaling factor space of LogitNorm is the origin (0-dimensional), whereas that of ELogitNorm is the intersection of all decision boundaries (\((m - c + 1)\)-dimensional, e.g., 503-dimensional for ResNet-18 on CIFAR-10, where \(m = 512\) and \(c = 10\)). Optimization is therefore no longer attracted to a single point but is distributed over a high-dimensional affine subspace.
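A minimal PyTorch sketch of the two ingredients above, assuming the classifier head is a single linear layer with weights `W` (shape \(c \times m\)) and biases `b`; the function and variable names are mine, not taken from the authors' release:

```python
import torch
import torch.nn.functional as F

def boundary_distance(z, W, b):
    """Average point-to-plane distance D(z) from features z to the decision
    boundaries between the predicted class and every other class.

    z: (batch, m) penultimate features; W: (c, m) classifier weights; b: (c,) biases.
    """
    logits = z @ W.T + b                          # (batch, c)
    pred = logits.argmax(dim=1)                   # predicted class index f_max
    w_max, b_max = W[pred], b[pred]               # (batch, m), (batch,)

    dw = w_max.unsqueeze(1) - W.unsqueeze(0)      # (batch, c, m): w_{f_max} - w_i
    db = b_max.unsqueeze(1) - b.unsqueeze(0)      # (batch, c):    b_{f_max} - b_i

    num = (torch.einsum('bcm,bm->bc', dw, z) + db).abs()   # |(w_{f_max}-w_i)^T z + (b_{f_max}-b_i)|
    den = dw.norm(dim=2).clamp_min(1e-12)                   # ||w_{f_max}-w_i||_2 (eps avoids 0/0 at i = f_max)
    dist = num / den                                        # (batch, c); the i = f_max term is exactly 0
    c = W.shape[0]
    return dist.sum(dim=1) / (c - 1)              # average over the c-1 other classes

def elogitnorm_loss(logits, z, W, b, targets):
    """Cross-entropy on logits rescaled by D(z); no extra hyperparameters."""
    d = boundary_distance(z, W, b).clamp_min(1e-12).unsqueeze(1)   # (batch, 1)
    return F.cross_entropy(logits / d, targets)
```

As with LogitNorm's \(\|f\|\), gradients flow through the scaling factor \(D(z)\) in this sketch.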
### Loss & Training
ResNet-18 on CIFAR-10/100: 100 epochs, SGD momentum=0.9, lr=0.1, weight decay 5e-4, batch size 128. ImageNet-1K ResNet-50: fine-tuned for 30 epochs with lr=0.001. No additional hyperparameters.
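A hedged sketch of how the CIFAR-10/100 recipe above might be wired together, reusing `elogitnorm_loss` from the previous sketch; `model.features` and `model.fc` are assumed accessors for the penultimate features and the final linear head, not the paper's actual interface:

```python
import torch

def train_elogitnorm(model, train_loader, epochs=100):
    """Recipe as stated above for ResNet-18 on CIFAR-10/100 (batch size 128 via the loader)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for x, y in train_loader:
            z = model.features(x)                 # penultimate features, (batch, m)
            logits = model.fc(z)                  # final linear head, (batch, c)
            loss = elogitnorm_loss(logits, z, model.fc.weight, model.fc.bias, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

For ImageNet-1K, the same loop would be run as a 30-epoch fine-tune with lr=0.001, per the settings above.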
## Key Experimental Results
### CIFAR-10 Far-OOD (ResNet-18, AUROC improvement with various post-hoc methods)
| Post-hoc Method | CE → +ELogitNorm (AUROC↑) |
|---|---|
| MSP | 90.73 → 96.68 (+5.95) |
| GEN | 91.19 → 97.30 (+6.11) |
| ReAct | 92.56 → 97.63 (+5.07) |
| SCALE | 86.99 → 97.47 (+10.48) |
| KNN | 93.86 → 97.75 (+3.89) |
### ImageNet-1K (ResNet-50, MSP)
| Method | Near-OOD AUROC↑ | Far-OOD AUROC↑ | Far-OOD FPR95↓ |
|---|---|---|---|
| CE | 76.02 | 85.23 | 51.45 |
| LogitNorm | 74.62 | 91.54 | 31.32 |
| ELogitNorm | 76.88 | 92.81 | 27.74 |
### Classification Accuracy (Table 5, 200 epochs)
| Dataset | CE | LogitNorm | ELogitNorm |
|---|---|---|---|
| CIFAR-10 | 95.10 | 94.83 | 95.11 |
| CIFAR-100 | 77.47 | 76.06 | 77.37 |
| ImageNet-200 | 86.58 | 86.41 | 87.12 |
### Calibration (ECE↓, CIFAR-10, ResNet-18)
| Training Method | raw \(f\) | \(f/(\tau\|f\|)\) | \(f/D(z)\) |
|---|---|---|---|
| CE | 3.3 | 4.8 | 2.3 |
| LogitNorm | 58.7 | 4.1 | 52.3 |
| ELogitNorm | 26.7 | 4.7 | 1.8 |
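To make the columns concrete: each column presumably reports ECE when confidence is taken from a softmax over differently rescaled logits at evaluation time (raw \(f\), \(f/(\tau\|f\|)\), or \(f/D(z)\)). A small sketch of the standard binned ECE computation, assuming 15 equal-width confidence bins (my choice of bin count, not necessarily the paper's):

```python
import torch

def expected_calibration_error(probs, labels, n_bins=15):
    """Binned ECE: bin-weighted gap between mean confidence and accuracy.

    probs: (N, c) softmax probabilities; labels: (N,) ground-truth classes.
    """
    conf, pred = probs.max(dim=1)
    correct = pred.eq(labels).float()
    edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.float().mean() * (conf[mask].mean() - correct[mask].mean()).abs()
    return ece

# e.g. the f/D(z) column: rescale logits before the softmax, then score.
# probs = torch.softmax(logits / d.unsqueeze(1), dim=1)   # d from boundary_distance above
```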
## Ablation & Analysis Highlights
- LogitNorm degrades with ReAct: On CIFAR-100, LogitNorm+ReAct underperforms CE+ReAct (Fig. 3), whereas ELogitNorm consistently improves all post-hoc methods.
- Singular value spectrum: LogitNorm exhibits many near-zero singular values (collapse), while ELogitNorm yields a more uniformly distributed spectrum (a diagnostic sketch follows this list).
- \(D(z)\) vs. \(\|z\|\): The two quantities are no longer linearly correlated (Fig. 2d vs. 2c), confirming that ELogitNorm introduces additional decision boundary information.
- Limited near-OOD improvement: All training-time methods show limited gains on near-OOD benchmarks, which is a common challenge in the field.
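A minimal sketch of the singular-value-spectrum diagnostic mentioned above: stack penultimate features over ID data, center them, and inspect how quickly the normalized singular values decay; a long near-zero tail indicates dimensional collapse. `model.features` follows the assumed interface from the training sketch:

```python
import torch

@torch.no_grad()
def singular_value_spectrum(model, loader):
    """Singular values of the centered ID feature matrix, normalized by the largest."""
    feats = [model.features(x) for x, _ in loader]
    Z = torch.cat(feats)                          # (N, m) feature matrix
    Z = Z - Z.mean(dim=0, keepdim=True)           # center before the SVD
    s = torch.linalg.svdvals(Z)                   # singular values, descending
    return s / s[0]                               # normalized spectrum; near-zero tail = collapse
```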
## Highlights & Insights
- "Distance to what?" is the core question: LogitNorm measures distance to the origin; ELogitNorm measures distance to decision boundaries — the latter is physically more meaningful (farther from the boundary = more certain).
- Hyperparameter-free design: LogitNorm requires tuning \(\tau\); ELogitNorm requires no additional hyperparameters, as \(D(z)\) adapts naturally to the data.
- Geometric insight of Proposition 2: The minimum scaling factor space expands from 0-dimensional (the origin) to \(m - c + 1\) dimensional, fundamentally altering the optimization landscape and preventing feature collapse to a single point.
- Orthogonality of training-time and post-hoc methods: As a training-time method, ELogitNorm consistently benefits all post-hoc methods (MSP/GEN/ReAct/SCALE/KNN); this composability is a key practical advantage for deployment (a minimal scoring example follows this list).
- Diagnosing feature collapse: Singular value spectrum analysis combined with 2D feature visualization is an effective tool for assessing representation quality.
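To illustrate the composability point: once a backbone is trained with ELogitNorm, post-hoc scores are computed on its outputs unchanged. A minimal MSP example (energy, ReAct, SCALE, KNN, etc. would slot in the same way with their own scoring rules):

```python
import torch

@torch.no_grad()
def msp_score(logits):
    """Maximum softmax probability; higher scores indicate in-distribution samples."""
    return torch.softmax(logits, dim=1).max(dim=1).values
```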
## Limitations & Future Work
- Near-OOD improvement remains limited (on the IDK dataset), a challenge shared by all training-time methods.
- Validation is limited to ResNet-18/50; modern architectures such as ViT have not been tested.
- Computing \(D(z)\) involves all \(c\) decision boundaries; while efficiently implemented, it scales in principle with \(c\) (e.g., \(c = 1000\) for ImageNet).
- Outlier synthesis methods (VOS/NPOS/Dream) represent a complementary direction; combinations with ELogitNorm remain unexplored.
## Related Work & Insights
- vs. LogitNorm: Both are training-time logit scaling methods, but ELogitNorm replaces the norm with decision boundary distance, resolving feature collapse, eliminating hyperparameters, and improving compatibility with more post-hoc methods.
- vs. CIDER/NPOS: These follow a deep metric learning + outlier synthesis paradigm (two-stage); ELogitNorm is end-to-end (one-stage) and requires no external data generation.
- vs. SCALE: SCALE performs poorly on CIFAR-10 (Fig. 1), whereas ELogitNorm consistently improves all settings.
- vs. fDBD: Both leverage decision boundary distances, but fDBD applies them to the scoring function (inference-time), while ELogitNorm applies them to the training loss (training-time).
## Relevance to My Research
- The adaptive temperature scaling framework (Eq. 9) can be generalized to other settings requiring confidence calibration (e.g., VLMs).
- The concept of feature-to-decision-boundary distance may be useful in multimodal learning (e.g., identifying boundary regions between modalities).
- The paradigm of "improving representation quality at training time → benefiting multiple post-hoc methods at inference time" is worth further attention.
## Rating
- Novelty: ⭐⭐⭐⭐ — The diagnosis of feature collapse is valuable; replacing the norm with decision boundary distance is a natural and effective idea.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — OpenOOD benchmark across 4 datasets, 5+ post-hoc methods, comparisons with training-time methods, calibration analysis, singular value spectrum analysis, and classification accuracy verification.
- Writing Quality: ⭐⭐⭐⭐ — Theoretical derivations (Prop. 1/2) are clear; the motivation figure (Fig. 2) is convincing.
- Value: ⭐⭐⭐ — OOD detection is not a core research direction, but the adaptive temperature scaling and feature collapse diagnosis ideas are useful references.