Improved Balanced Classification with Theoretically Grounded Loss Functions¶
Conference: NeurIPS 2025 arXiv: 2512.23947 Authors: Corinna Cortes, Mehryar Mohri, Yutao Zhong Code: None Area: Machine Learning Theory / Class-Imbalanced Classification Keywords: Balanced classification loss, surrogate loss, H-consistency, logit adjustment, class-aware weighting
TL;DR¶
Two theory-driven surrogate loss families are proposed—Generalized Logit-Adjusted (GLA) loss and Generalized Class-Aware weighted (GCA) loss—providing stronger theoretical guarantees and improved empirical performance for multi-class classification under class imbalance.
Background & Motivation¶
Class imbalance is a pervasive challenge in multi-class classification. Balanced classification losses promote fairness by assigning equal importance to all classes, ensuring that minority classes are not neglected. However, directly minimizing the balanced classification loss is generally intractable, making the design of effective surrogate losses a central problem.
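For concreteness, the balanced (class-averaged) zero-one error that balanced classification targets is typically defined, for \(C\) classes, as

\[
\mathcal{E}_{\mathrm{bal}}(h) \;=\; \frac{1}{C}\sum_{y=1}^{C} \mathbb{P}\big(h(X) \neq y \,\big|\, Y = y\big),
\]

which weights every class equally regardless of how frequent it is in the training distribution.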
Existing surrogate loss methods suffer from the following limitations:
- Standard class-weighted loss: Scales losses by the inverse of class frequencies (see the sketch after this list), with limited theoretical guarantees.
- Logit-Adjusted (LA) loss: Shifts logits according to class prior probabilities and is effective within the standard cross-entropy family, but its H-consistency holds only for the complete (unbounded) hypothesis class.
- Insufficient theoretical guarantees: The H-consistency bounds of existing methods depend on the inverse of the minimum class probability \(p_{\min}\) and therefore degrade severely in highly imbalanced settings.
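A minimal sketch of the standard class-weighted baseline mentioned above, using inverse-frequency weights with PyTorch's weighted cross-entropy; the class counts are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical per-class sample counts for a 4-class long-tailed problem
class_counts = torch.tensor([5000., 500., 50., 5.])
weights = class_counts.sum() / (len(class_counts) * class_counts)  # inverse-frequency class weights

criterion = nn.CrossEntropyLoss(weight=weights)  # standard class-weighted cross-entropy
loss = criterion(torch.randn(16, 4), torch.randint(0, 4, (16,)))
```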
The core question of the paper is whether surrogate losses can be designed with stronger theoretical guarantees, under broader conditions, while maintaining strong empirical performance.
Method¶
Overall Architecture¶
Two surrogate loss families are proposed, both built on the generalized cross-entropy loss family:
- GLA (Generalized Logit-Adjusted): Extends logit adjustment to the generalized cross-entropy family.
- GCA (Generalized Class-Aware weighted): Introduces class-dependent confidence margins, extending the standard class-weighted loss.
Key Designs¶
1. Generalized Logit-Adjusted Loss (GLA)¶
The standard LA loss adjusts the logits by adding an offset \(\log p_c\) to the score of each class \(c\), where \(p_c\) is the class prior, before applying the softmax cross-entropy.
GLA generalizes this construction to the generalized cross-entropy loss family, replacing the \(\log\) of the cross-entropy with a more general convex function \(\Phi\). The offsets \(\tau_c\) may then take values other than \(\log p_c\) and can be analyzed uniformly within the generalized cross-entropy framework.
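A minimal sketch of the standard logit-adjusted cross-entropy that GLA generalizes (GLA itself would replace the outer \(\log\) of the cross-entropy with a general convex \(\Phi\) and allow offsets \(\tau_c \neq \log p_c\)); the function below only illustrates the base LA construction:

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_priors, tau=1.0):
    """Logit-adjusted cross-entropy: shift each class logit by tau * log p_c
    before applying the usual softmax cross-entropy."""
    offsets = tau * torch.log(class_priors)             # shape: [num_classes]
    return F.cross_entropy(logits + offsets, targets)   # offsets broadcast over the batch

# Example with hypothetical priors estimated from the training set
priors = torch.tensor([0.7, 0.2, 0.1])
loss = logit_adjusted_loss(torch.randn(8, 3), torch.randint(0, 3, (8,)), priors)
```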
Theoretical properties:
- Bayes consistency: the GLA loss is Bayes consistent.
- H-consistency: holds only for the complete (unbounded) hypothesis class.
- H-consistency bound: depends on \(1/p_{\min}\), which is insufficiently tight in imbalanced settings.
2. Generalized Class-Aware Weighted Loss (GCA)¶
GCA introduces two key innovations over the standard class-weighted loss:
a. Class-dependent confidence margins: A distinct confidence margin \(m_c\) is assigned to each class rather than a single uniform threshold.
These margins can be calibrated via theoretical analysis, assigning larger margins to minority classes and smaller margins to majority classes.
b. Generalized cross-entropy extension: Analogously to GLA, the loss is extended to a broader family of convex functions.
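A hedged sketch of what a class-aware weighted loss with class-dependent margins can look like; the margin placement and weighting below are illustrative (in the spirit of margin-based losses such as LDAM), not the paper's exact GCA formulation:

```python
import torch
import torch.nn.functional as F

def class_aware_weighted_loss(logits, targets, margins, class_weights):
    """The true-class logit must exceed the others by a class-dependent margin m_c;
    the resulting cross-entropy is additionally weighted per class.
    margins, class_weights: 1-D tensors with one entry per class."""
    margin_matrix = torch.zeros_like(logits)
    margin_matrix[torch.arange(len(targets)), targets] = margins[targets]
    adjusted = logits - margin_matrix  # penalize only the target logit by its margin
    return F.cross_entropy(adjusted, targets, weight=class_weights)

# Example with hypothetical margins and weights for 4 classes
logits = torch.randn(16, 4)
targets = torch.randint(0, 4, (16,))
margins = torch.tensor([0.05, 0.1, 0.2, 0.4])        # larger for rarer classes
class_weights = torch.tensor([0.2, 0.5, 1.0, 2.0])   # inverse-frequency style weights
loss = class_aware_weighted_loss(logits, targets, margins, class_weights)
```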
Theoretical properties:
- H-consistency: holds for any bounded or complete hypothesis class (a broader condition than GLA's).
- H-consistency bound: depends on \(1/\sqrt{p_{\min}}\), which is superior to GLA's \(1/p_{\min}\).
- Consequently, GCA provides significantly stronger theoretical guarantees in highly imbalanced settings.
3. Theoretical Analysis Framework¶
The core theoretical tool is H-consistency bounds, which control the excess balanced classification error of a hypothesis in terms of its excess surrogate loss, relative to the best hypothesis in the class \(\mathcal{H}\):
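As a point of reference (a schematic form common in this line of work, not the paper's exact statement), an H-consistency bound relates the two excess errors via

\[
\mathcal{E}_{\mathrm{bal}}(h) - \mathcal{E}^{*}_{\mathrm{bal}}(\mathcal{H}) \;\le\; \Gamma\!\Big(\mathcal{E}_{\Phi}(h) - \mathcal{E}^{*}_{\Phi}(\mathcal{H})\Big),
\]

where \(\Gamma\) is a non-decreasing function; the table below summarizes how sharply \(\Gamma\) depends on \(p_{\min}\) for each loss.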
| Loss Type | Bayes Consistent | H-consistency Condition | Bound Dependence |
|---|---|---|---|
| Standard class-weighted | Yes | Bounded/Complete | Baseline |
| LA (original) | Yes | Complete only | \(1/p_{\min}\) |
| GLA (Ours) | Yes | Complete only | \(\geq 1/p_{\min}\) |
| GCA (Ours) | Yes | Bounded/Complete | \(1/\sqrt{p_{\min}}\) |
Loss & Training¶
Margin calibration strategy for GCA:
- Margins \(m_c\) are set according to class frequencies \(p_c\).
- Minority classes receive larger margins.
- Margin selection is designed to optimize the H-consistency bounds.
- Fine-tuning via cross-validation on a validation set is supported; a sketch follows below.
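A hedged sketch of a frequency-based margin schedule of this kind; the inverse-power form and the constants are illustrative placeholders to be tuned on a validation set, not the paper's prescribed calibration:

```python
import torch

def calibrated_margins(class_counts, exponent=0.25, max_margin=0.5):
    """Assign larger margins to rarer classes: m_c proportional to n_c^(-exponent),
    rescaled so the rarest class receives `max_margin`."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    raw = counts.pow(-exponent)
    return max_margin * raw / raw.max()

# Example: the rarest class gets the largest margin
print(calibrated_margins([5000, 500, 50, 5]))
```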
Key Experimental Results¶
Main Results¶
Standard Imbalanced Classification Benchmarks¶
| Method | CIFAR-10-LT (IF=100) | CIFAR-100-LT (IF=100) | ImageNet-LT | Theoretical Guarantee |
|---|---|---|---|---|
| Standard class-weighted | Baseline | Baseline | Baseline | Bounded/Complete |
| LA (original) | Above baseline | Above baseline | Above baseline | Complete only |
| GLA (Ours) | Typically best | Typically best | Typically best | Complete only |
| GCA (Ours) | Near-best | Near-best | Near-best | Bounded/Complete |
IF = Imbalance Factor: the ratio of the sample count of the most frequent class to that of the least frequent class.
Performance Under Extreme Imbalance¶
| Method | IF=10 | IF=50 | IF=100 | IF=200 |
|---|---|---|---|---|
| Standard class-weighted | Baseline | Baseline | Baseline | Baseline |
| LA loss | +small | +moderate | +moderate | +moderate |
| GLA | +large | +large | Best | Near-best |
| GCA | +moderate | +large | Near-best | Best |
Key observation: GLA tends to perform slightly better on the common benchmarks, while GCA pulls ahead under the most extreme imbalance (IF=200), consistent with the theoretical analysis: GCA's \(1/\sqrt{p_{\min}}\) bound degrades far more gracefully than \(1/p_{\min}\) as imbalance grows.
Ablation Study¶
Effect of Margin Calibration¶
| GCA Variant | No margin | Uniform margin | Calibrated (theory) | Calibrated (val set) |
|---|---|---|---|---|
| Balanced accuracy | Baseline | +small | +moderate | +largest |
Choice of Generalized Cross-Entropy Function \(\Phi\)¶
| \(\Phi\) Choice | GLA Performance | GCA Performance | Characteristics |
|---|---|---|---|
| Standard log | Baseline | Baseline | Classical cross-entropy |
| Polynomial | Slightly higher | Slightly higher | Smooth gradients |
| Exponential | Similar | Similar | Emphasizes hard samples |
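For intuition, two transforms commonly used in the generalized cross-entropy literature, applied to the softmax probability of the correct class (the paper's exact \(\Phi\) family may differ; the polynomial form below follows Zhang & Sabuncu, 2018):

```python
import torch

def phi_log(p_true):
    """Standard cross-entropy: -log p_y."""
    return -torch.log(p_true)

def phi_polynomial(p_true, q=0.7):
    """Generalized cross-entropy (Zhang & Sabuncu, 2018): (1 - p_y^q) / q,
    which interpolates between cross-entropy (q -> 0) and MAE (q = 1)."""
    return (1.0 - p_true.pow(q)) / q

# p_true: probability assigned to the correct class for three examples
p_true = torch.tensor([0.9, 0.5, 0.1])
print(phi_log(p_true), phi_polynomial(p_true))
```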
Key Findings¶
- Complementarity of GLA and GCA: GLA performs slightly better on common benchmarks; GCA excels under extreme imbalance.
- Theory–empirical alignment: The tightness of H-consistency bounds corresponds to observed performance differences.
- Strength of class-weighted baseline: Simple class weighting is already a strong baseline, but GLA/GCA yield further gains.
- Importance of margin calibration: GCA performance is substantially dependent on correct margin specification.
Highlights & Insights¶
- Theoretical rigor: The team from Google Research (Cortes is a co-inventor of SVMs) provides a complete theoretical analysis.
- Improvement in H-consistency bounds: GCA's \(1/\sqrt{p_{\min}}\) bound represents a fundamental improvement over LA's \(1/p_{\min}\).
- Practical contribution: GLA and GCA serve as direct drop-in replacements for existing loss functions.
- Importance of hypothesis class: The paper reveals the critical distinction between bounded and complete hypothesis classes in loss consistency analysis.
- Adaptive to imbalance degree: GCA adapts to varying imbalance levels through margin calibration.
Limitations & Future Work¶
- Extreme long-tail scenarios: Performance with more than 1,000 classes in extreme long-tail settings remains untested.
- Combination with other long-tail methods: Whether GLA/GCA can complement decoupled training, data augmentation, and similar approaches is unexplored.
- Fine-tuning large models: Performance under pretrain-then-finetune paradigms has not been validated.
- Computational overhead: Margin calibration increases the cost of hyperparameter tuning.
- Theory-practice gap: The theoretically optimal choice of \(\Phi\) does not fully align with the empirically optimal choice.
Related Work & Insights¶
- Logit Adjustment (Menon et al., 2021): The original LA loss, generalized in this work.
- Class-Balanced Loss (Cui et al., 2019): A classical class-weighted loss; GCA is its theoretically enhanced counterpart.
- H-consistency bounds (Awasthi et al., 2022): The core theoretical tool, applied in depth to the imbalanced classification setting.
- Focal Loss (Lin et al., 2017): An alternative approach to imbalance, orthogonal to the proposed methods.
- Prior work (Mao, Mohri, Zhong, 2023–2024): Theoretical contributions from the same team on problems such as multi-class abstention.
Rating¶
- Novelty: ★★★★☆ — Class-dependent margin design in GCA and stronger theoretical guarantees.
- Theoretical Depth: ★★★★★ — Rigorous and complete H-consistency analysis.
- Experimental Thoroughness: ★★★★☆ — Validated across multiple imbalance scales and datasets.
- Value: ★★★★☆ — Direct replacement for existing loss functions.
- Writing Quality: ★★★★★ — From a leading theory group; presentation is clear and well-structured.