Improve Representation for Imbalanced Regression through Geometric Constraints¶

Conference: CVPR 2025
arXiv: 2503.00876
Code: Yes
Area: Representation Learning / Human Understanding
Keywords: Imbalanced Regression, Geometric Constraints, Representation Uniformity, Enveloping Loss, Surrogate-driven Learning

TL;DR¶

This work is the first to study representation space uniformity in deep imbalanced regression (DIR). It proposes two geometric constraints, namely enveloping loss and homogeneity loss, to ensure that regression representations are uniformly distributed on the hypersphere. It also designs a surrogate-driven representation learning (SRL) framework to integrate global geometric constraints into mini-batch training, achieving SOTA on several DIR tasks such as age estimation.

Background & Motivation¶

Background: Imbalanced datasets are pervasive across various domains. In imbalanced classification, uniformity of the representation space has been shown to be crucial for effectively learning under-represented classes. Existing methods include decoupled training, contrastive learning, and class-level uniformization.
Limitations of Prior Work: Classification methods group features into discrete clusters, whereas regression tasks require continuous and ordered representations. Existing DIR methods primarily focus on training unbiased regressors (e.g., LDS, FDS, BalancedMSE) or modeling the relationship between label and feature spaces (e.g., RankSim, RNC, SupReMix), while neglecting how representations are distributed across the entire feature space.
Key Challenge: In imbalanced regression, feature spaces trained with vanilla loss are dominated by majority samples, causing representations in minority sample regions to be "compressed" into tiny spaces, which leads to poor prediction accuracy. However, uniformity metrics in classification (such as class center dispersion) are not applicable to continuous and ordered regression scenarios.
Goal: Define and quantify the concept of "uniformity" in regression representation spaces, and design loss functions that promote uniform distribution of regression representations on the hypersphere.
Key Insight: Compare the trajectory of regression representations to "winding yarn around a ball": the yarn (latent trace) should cover as much of the sphere as possible (envelopment) while being wound tightly and smoothly, with no slack (homogeneity).
Core Idea: Use enveloping loss to make the regression representation trace fully cover the hypersphere, use homogeneity loss to ensure that representations are uniformly and smoothly distributed along the trace, and apply global geometric constraints to batch training via a surrogate mechanism.

Method¶

Overall Architecture¶

Input: Imbalanced regression data pairs \((x_i, y_i)\). Output: Normalized representation \(z_i = f(x_i)\) from the feature extractor \(f(\cdot)\), which a regression head then maps to a prediction. The core idea is to constrain feature representations to a unit hypersphere and compute global geometric losses via the SRL framework. Training pipeline: for each mini-batch, (1) encode samples into the latent space; (2) average the representations of samples in the same bin to obtain that bin's centroid; (3) fill in bins missing from the current batch with centroids from the previous epoch; (4) compute the geometric losses on the complete surrogate. Then, (5) at the end of each epoch, update the surrogate with a momentum update.

Key Designs¶

Enveloping Loss:
- Function: Encourage the trajectory of regression representations (latent trace) to cover the hypersphere surface as much as possible.
- Mechanism: Define the tubular neighborhood \(T(l, \epsilon)\) of trace \(l\) as the set of all points on the hypersphere whose distance to the trace is less than \(\epsilon\). The enveloping loss is defined as \(\mathcal{L}_{\text{env}} = -\text{vol}(T(l,\epsilon))/\text{vol}(\mathcal{U})\), where \(\mathcal{U}\) denotes the hypersphere. For practical computation, a continuous-to-discrete approximation is used: uniformly sample \(N\) points on the hypersphere (Monte Carlo method) and calculate the proportion of points falling within the tubular neighborhood. Because a hard threshold is not differentiable, a softened version is used instead: maximize the cosine similarity between each sampled point and its nearest point on the trace.
- Design Motivation: The trace of regression representations is a one-dimensional curve, so its volume on the hypersphere is zero and cannot be optimized directly. Expanding it into a tubular neighborhood turns the "space coverage" of the trace into an optimizable loss.
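The Monte Carlo approximation described above can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the trace is approximated by a list of bin centroids, the distance to the trace is taken to be angular distance, and the function names (`sample_sphere`, `enveloping_loss`, `hard_coverage`) are hypothetical. A real training loop would compute the softened loss in an autodiff framework so that gradients reach the encoder.

```python
import math
import random

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sample_sphere(n_points, dim, rng):
    """Draw points uniformly on the unit hypersphere by normalizing Gaussian vectors."""
    return [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n_points)]

def cosine(u, v):
    """For unit vectors, cosine similarity is the dot product."""
    return sum(a * b for a, b in zip(u, v))

def enveloping_loss(trace, probes):
    """Softened loss: negative mean of each probe's best cosine similarity to the trace."""
    best = [max(cosine(p, c) for c in trace) for p in probes]
    return -sum(best) / len(best)

def hard_coverage(trace, probes, eps):
    """Monte Carlo estimate of vol(T(l, eps)) / vol(U); angular distance is an assumption."""
    inside = 0
    for p in probes:
        best = max(cosine(p, c) for c in trace)
        if math.acos(max(-1.0, min(1.0, best))) < eps:
            inside += 1
    return inside / len(probes)

rng = random.Random(0)
probes = sample_sphere(2000, 3, rng)

# A collapsed trace (every centroid at one point) versus a trace spread along a great circle.
collapsed = [[1.0, 0.0, 0.0]] * 16
circle = [[math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16), 0.0] for k in range(16)]

print("collapsed:", enveloping_loss(collapsed, probes), hard_coverage(collapsed, probes, 0.3))
print("circle:   ", enveloping_loss(circle, probes), hard_coverage(circle, probes, 0.3))
```

In this toy setting the spread-out trace obtains a lower (better) softened loss and a higher hard coverage than the collapsed one, which is the behavior the enveloping loss is meant to reward.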
Homogeneity Loss:
- Function: Ensure that representations are uniformly distributed along the trace and that the trace is smooth without folding.
- Mechanism: Penalizes the squared rate at which the trace moves per unit change in label, which discourages unnecessary arc length and folding. The discrete form is \(\mathcal{L}_{\text{homo}} = \sum_{k=1}^{K-1}\frac{\|l(y_{k+1}) - l(y_k)\|^2}{y_{k+1} - y_k}\), i.e., the squared distance between adjacent centroids divided by their label difference, summed over adjacent bins. Theorem 1 proves that, given a fixed trace shape, the homogeneity loss is minimized if and only if representations are uniformly distributed along the trace (i.e., \(\|\nabla_y l(y)\| = c\), a constant).
- Design Motivation: Using only the enveloping loss might lead to an uneven distribution of representations along the trace (dense in majority regions, sparse in minority regions) or a jagged trace. Homogeneity loss serves as a regularizer to address both distribution uniformity and smoothness. Utilizing homogeneity loss alone would cause the features to collapse into a circle or a single point.
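The discrete homogeneity loss above is straightforward to compute from ordered bin centroids. The sketch below is illustrative only: `homogeneity_loss` and the toy quarter-circle trace are assumptions made for this example, with centroids given as plain lists of floats sorted by label.

```python
import math

def homogeneity_loss(centroids, labels):
    """Sum over adjacent bins of squared centroid distance divided by the label gap."""
    total = 0.0
    for k in range(len(labels) - 1):
        sq_dist = sum((a - b) ** 2 for a, b in zip(centroids[k + 1], centroids[k]))
        total += sq_dist / (labels[k + 1] - labels[k])
    return total

def on_arc(fractions):
    """Place centroids on a quarter circle at the given fractions of its length."""
    return [[math.cos(f * math.pi / 2), math.sin(f * math.pi / 2)] for f in fractions]

labels = [0.0, 1.0, 2.0, 3.0, 4.0]

# Same trace shape and endpoints, different spacing along the trace.
uniform = on_arc([0.0, 0.25, 0.5, 0.75, 1.0])
uneven = on_arc([0.0, 0.05, 0.10, 0.15, 1.0])

print("uniform spacing:", homogeneity_loss(uniform, labels))
print("uneven spacing: ", homogeneity_loss(uneven, labels))
```

Both configurations trace the same quarter circle, but the evenly spaced one yields the smaller loss, in line with the statement of Theorem 1.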
Surrogate-driven Representation Learning Framework (SRL Framework):
- Function: Integrate global geometric constraints into mini-batch-based SGD training.
- Mechanism: A mini-batch typically does not contain all label bins. SRL maintains a surrogate \(\mathcal{S}\) containing centroids for all \(K\) bins. In each batch, centroids are computed for present bins, and missing bins are filled with centroids from the previous epoch. Geometric losses are computed on the complete surrogate. Between epochs, \(\mathcal{S}\) is updated via momentum: \(\mathcal{S}^{e+1} \leftarrow \alpha \cdot \mathcal{S}^e + (1-\alpha) \cdot \hat{\mathcal{S}}^e\). A contrastive loss \(\mathcal{L}_{\text{con}}\) is also added to pull individual representations closer to their corresponding centroids and push them away from other centroids.
- Design Motivation: Geometric constraints only make sense when the representation distribution is observed across the complete label range, but a stochastically sampled batch rarely covers every bin. The surrogate mechanism reconstructs an approximate global view from centroids stored in earlier training.
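The surrogate bookkeeping described above (per-bin batch centroids, filling missing bins, momentum update) might look roughly like the following. This is a sketch under stated assumptions: the helper names are hypothetical, features are plain lists of floats, bins are integer indices, and any re-normalization of centroids onto the hypersphere is omitted because it is not specified here. The epoch-level estimate \(\hat{\mathcal{S}}^e\) is passed in as `epoch_estimate`; the example simply reuses the filled batch centroids as a stand-in.

```python
def batch_centroids(features, bins, num_bins):
    """Mean feature per bin for bins present in the batch; None marks an absent bin."""
    sums = [None] * num_bins
    counts = [0] * num_bins
    for z, b in zip(features, bins):
        if sums[b] is None:
            sums[b] = list(z)
        else:
            sums[b] = [s + x for s, x in zip(sums[b], z)]
        counts[b] += 1
    return [None if s is None else [x / counts[b] for x in s] for b, s in enumerate(sums)]

def fill_missing(batch_cents, surrogate):
    """Use the stored surrogate centroid wherever the batch has no sample for a bin."""
    return [s if c is None else c for c, s in zip(batch_cents, surrogate)]

def momentum_update(surrogate, epoch_estimate, alpha):
    """S^{e+1} = alpha * S^e + (1 - alpha) * S_hat^e, per bin and per coordinate."""
    return [
        [alpha * s + (1 - alpha) * h for s, h in zip(s_row, h_row)]
        for s_row, h_row in zip(surrogate, epoch_estimate)
    ]

# Toy example with K = 3 bins in a 2-D feature space; bin 1 is missing from the batch.
surrogate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
bins = [0, 0, 2]

cents = batch_centroids(features, bins, num_bins=3)
filled = fill_missing(cents, surrogate)  # complete set of K centroids for the geometric losses
new_surrogate = momentum_update(surrogate, filled, alpha=0.9)
print(filled)
print(new_surrogate)
```

The geometric losses would then be evaluated on `filled`, which always contains one centroid per bin regardless of which labels the batch happened to sample.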

Loss & Training¶

The total loss is \(\mathcal{L}_\theta = \mathcal{L}_{\text{reg}} + \mathcal{L}_G + \mathcal{L}_{\text{con}}\), where \(\mathcal{L}_{\text{reg}}\) is the MSE regression loss, \(\mathcal{L}_G = \lambda_e \mathcal{L}_{\text{env}} + \lambda_h \mathcal{L}_{\text{homo}}\) is the geometric constraint, and \(\mathcal{L}_{\text{con}}\) is the centroid contrastive loss. In the first epoch, training uses only the MSE loss because the surrogate is not yet initialized. The model is optimized with AdamW, and the surrogate is refreshed between epochs via the momentum update.
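How the terms combine, including the MSE-only first epoch, can be written as a small helper. This is a hedged sketch: `total_loss` is a hypothetical name, epochs are assumed to be zero-indexed, and the weights in the example are placeholders rather than values reported for the method.

```python
def total_loss(reg, env, homo, con, lambda_e, lambda_h, epoch):
    """Combine L_reg + (lambda_e * L_env + lambda_h * L_homo) + L_con.

    In the first epoch (index 0, an assumption) only the regression loss is
    used, since the surrogate has not been initialized yet.
    """
    if epoch == 0:
        return reg
    geometric = lambda_e * env + lambda_h * homo
    return reg + geometric + con

# Placeholder loss values and weights, chosen only to exercise the function.
print(total_loss(1.0, -0.5, 0.2, 0.3, lambda_e=0.1, lambda_h=0.5, epoch=0))
print(total_loss(1.0, -0.5, 0.2, 0.3, lambda_e=0.1, lambda_h=0.5, epoch=1))
```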

Key Experimental Results¶

Main Results¶

On the AgeDB-DIR age estimation task (using MAE and GM as metrics):

Method	All MAE↓	Many MAE↓	Med MAE↓	Few MAE↓	All GM↓
Vanilla	7.67	6.66	9.30	12.61	4.85
LDS+FDS	7.55	7.03	8.46	10.52	4.86
RankSim	7.41	6.49	8.73	12.47	4.71
ConR	7.41	6.51	8.81	12.04	4.70
SRL (ours)	7.22	6.64	8.28	9.81	4.50

On the UCI-Airfoil dataset (using MAE):

Method	All↓	Many↓	Med↓	Few↓
Vanilla	5.66	5.11	5.03	6.75
RankSim	5.23	5.05	4.91	5.72
SRL (ours)	5.10	4.83	4.75	5.69

Ablation Study¶

Configuration	Observations	Explanation
Baseline (MSE only)	Features collapse into majority sample regions	Minority sample representations are compressed
SRL w/o \(\mathcal{L}_{\text{env}}\)	Features collapse into trivial shapes	Lacks enveloping constraints, failing to fully utilize the feature space
SRL w/o \(\mathcal{L}_{\text{homo}}\)	Features are unevenly distributed along the trace	Possesses envelopment but the distribution is not smooth
SRL (full)	Uniformly and smoothly fills the feature space	The two losses are complementary

Key Findings¶

Improvements are most significant in the few-shot regions (AgeDB-DIR Few MAE: 12.61 \(\rightarrow\) 9.81 relative to the vanilla baseline, about a 22% reduction), suggesting that uniformity is especially important for minority samples.
Both enveloping loss and homogeneity loss are indispensable: t-SNE visualizations clearly demonstrate the degradation modes when either loss is omitted.
The surrogate mechanism is necessary to introduce global constraints into batch training: computing the geometric losses directly on a single batch is ineffective.
Results on the newly introduced Imbalanced Operator Learning (IOL) task indicate that the method is also effective when learning mappings between function spaces.
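The relative reduction quoted in the first finding can be reproduced from the Few MAE columns of the two result tables. The snippet below only restates that arithmetic; `relative_reduction` is a hypothetical helper name.

```python
def relative_reduction(baseline, improved):
    """Fraction by which the error drops relative to the baseline."""
    return (baseline - improved) / baseline

# (Vanilla, SRL) Few MAE values taken from the tables above.
few_mae = {"AgeDB-DIR": (12.61, 9.81), "UCI-Airfoil": (6.75, 5.69)}

for name, (vanilla, srl) in few_mae.items():
    print(f"{name}: {100 * relative_reduction(vanilla, srl):.1f}% lower Few MAE")
```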

Highlights & Insights¶

Yarn-on-a-sphere analogy: Explaining the regression representation uniformity problem with an intuitive geometric analogy makes the two concepts of envelopment and homogeneity highly comprehensible. This methodology of designing loss functions starting from geometric intuition can be generalized to other continuous space learning problems.
Clever design of the Surrogate mechanism: Fills in missing bins in the batch with historical centroids and maintains stability through momentum updates. This "memory bank-style" design effectively solves the compatibility issue between global constraints and batch training, making it transferable to other batch training scenarios requiring global information.
Theoretical backing from Theorem 1: Proves that, for a fixed trace shape, the homogeneity loss is minimized exactly when representations are uniformly distributed along the trace, providing a solid mathematical foundation for the homogeneity loss.
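One way to see why uniform spacing minimizes the homogeneity loss for a fixed trace shape is a Cauchy-Schwarz argument on the continuous counterpart of the loss, \(\int \|\nabla_y l(y)\|^2 \, dy\). This is a sketch and not necessarily the proof given in the paper; the symbols \(L\) (arc length of the trace, fixed by its shape) and \([y_{\min}, y_{\max}]\) (label range) are notation assumed here.

\[
L^2 = \left( \int_{y_{\min}}^{y_{\max}} \|\nabla_y l(y)\| \cdot 1 \, dy \right)^2
\le (y_{\max} - y_{\min}) \int_{y_{\min}}^{y_{\max}} \|\nabla_y l(y)\|^2 \, dy
\]

Equality holds if and only if \(\|\nabla_y l(y)\|\) is constant, in which case \(\|\nabla_y l(y)\| = c = L / (y_{\max} - y_{\min})\). Since \(L\) is fixed by the trace shape, the integral on the right cannot fall below \(L^2 / (y_{\max} - y_{\min})\), and it reaches that bound only for a constant-speed (uniform) parameterization.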

Limitations & Future Work¶

Enveloping loss requires Monte Carlo sampling of a large number of points to approximate the hypersphere, which may suffer from limited sampling efficiency in high-dimensional spaces.
The momentum update of the surrogate introduces delays, and the representation quality in early training stages (where only MSE is used in the first epoch) might hurt subsequent convergence.
Bin partitioning affects the performance, yet the sensitivity analysis of bin partitioning in the paper is not sufficiently thorough.
Future work could explore extending geometric constraints to multi-dimensional label regression (multi-objective regression) scenarios.
Combining SRL with contrastive learning-based DIR methods (such as ConR and RNC) is theoretically viable and remains to be explored.

vs RankSim: RankSim focuses on order consistency between label and feature spaces but does not concern the coverage of features across the entire space. The proposed enveloping loss directly optimizes space utilization.
vs ConR: ConR models global and local label similarity with contrastive learning but does not explicitly constrain the uniformity of the feature space. These methods address different levels of problems and could be complementary.
vs SupReMix/RNC: These methods learn continuous and ordered representations, a property this paper takes as one of its starting assumptions, but they do not address how representations are distributed across the entire feature space.
The idea of extending the concept of uniformity from classification to regression is highly inspiring, and future research can continue to generalize it to other continuous prediction tasks (e.g., depth estimation, density estimation).

Rating¶

Novelty: ⭐⭐⭐⭐⭐ First to generalize the representation uniformity concept from classification to regression; geometric loss design is backed by theory.
Experimental Thoroughness: ⭐⭐⭐⭐ Validated across multiple datasets with rich visualization analysis, though experiments on large-scale datasets are lacking.
Writing Quality: ⭐⭐⭐⭐⭐ Intuitive analogies, mathematically rigorous, and outstanding visualizations.
Value: ⭐⭐⭐⭐ Opens a new perspective of representation learning in the DIR field, with widely applicable geometric constraint designs.