Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction

Conference: ECCV 2024
arXiv: 2403.07263
Code: https://github.com/alextimans/conformal-od
Area: Object Detection
Keywords: Object Detection, Conformal Prediction, Uncertainty Quantification, Bounding Box Regression, Safety-Critical Systems

TL;DR

This paper proposes a two-step conformal prediction framework for uncertainty quantification in multi-object detection: the first step generates conformal prediction sets of class labels to handle classification errors, and the second step produces adaptive bounding box uncertainty intervals based on ensembles and quantile regression, providing practically useful tight prediction intervals while guaranteeing coverage.

Background & Motivation

Safety-critical applications such as autonomous driving and mobile robotics need reliable estimates of a model's predictive uncertainty. Existing uncertainty methods for object detection (e.g., Bayesian inference, MC Dropout, deep ensembles) typically require substantial changes to the model architecture or training process and provide no coverage guarantees. Conformal Prediction (CP), a distribution-free uncertainty quantification framework, offers post-hoc, model-agnostic probabilistic guarantees. Applying it to object detection poses two core challenges: (1) bounding box predictions depend on class labels, so a classification error leads to selecting the wrong class-conditional conformal quantile, which invalidates the coverage guarantee; (2) the fixed-width intervals produced by standard CP cannot adapt to object size, resulting in over-coverage for small objects and under-coverage for large objects. The key insight of the paper is a two-step conformal pipeline: first use CP to handle classification uncertainty, then propagate it into the construction of the bounding box intervals.

Core Idea: Use class-conditional conformal prediction to guarantee classification and localization coverage together, while making the intervals adaptive to object size.

Method

Overall Architecture

The framework is a two-step sequential conformal prediction pipeline: (1) Step 1, Classification CP: apply conformal prediction to the classification head of the object detector to obtain a label prediction set \(\hat{C}_L(X_{n+1})\) that contains the true class with probability at least \((1-\alpha_L)\); (2) Step 2, Regression CP: construct a conformal prediction interval for each of the four bounding box coordinates, using the label set from Step 1 to select the class-conditional quantiles, so that the four coordinates are jointly covered with probability at least \((1-\alpha_B)\). The overall coverage guarantee is \((1-\alpha_L)(1-\alpha_B)\).
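
As a quick worked example of how the two levels combine, plugging in the values used in the experiments (\(\alpha_L=0.01\), \(\alpha_B=0.1\)) gives the following; the arithmetic is only illustrative:

\[
(1-\alpha_L)(1-\alpha_B) = 0.99 \times 0.90 = 0.891 \approx 0.90 = 1-\alpha_B
\]

A strict label level therefore costs less than one percentage point of the nominal box coverage.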

Key Designs

Adaptive Bounding Box Methods (Box-Ens / Box-CQR):
- Function: Generate prediction intervals that adaptively scale with the object size.
- Mechanism: Box-Ens (conformalized ensembles) uses normalized residuals as the non-conformity score \(s = |c^k - \hat{c}^k| / \hat{\sigma}(X)\), where \(\hat{\sigma}\) represents the standard deviation predicted by the ensemble detector, allowing the generated intervals to scale based on model uncertainty. Box-CQR (conformalized quantile regression) trains an extra quantile regression head to predict the upper and lower quantiles \(\hat{Q}_{\alpha_B/2}\) and \(\hat{Q}_{1-\alpha_B/2}\), with the interval width naturally determined by the quantile predictions.
- Design Motivation: Standard CP (Box-Std) produces fixed-width intervals that are not wide enough for large objects and overly conservative for small objects, which disrupts the balance of coverage across different object sizes.
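
The difference between the fixed-width and the ensemble-scaled scores can be sketched in a few lines of plain Python. This is a minimal illustration for a single coordinate, not the authors' implementation: the helper names (`conformal_quantile`, `box_std_interval`, `box_ens_interval`) and the toy calibration triples are assumptions made for the example.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:                      # too few calibration points for this level
        return math.inf
    return sorted(scores)[k - 1]

def box_std_interval(pred, q):
    """Box-Std style: absolute residual score, same width for every object."""
    return (pred - q, pred + q)

def box_ens_interval(pred, sigma, q):
    """Box-Ens style: residual normalized by sigma, so the width scales with sigma."""
    return (pred - q * sigma, pred + q * sigma)

# Hypothetical calibration data for one coordinate: (true, predicted, ensemble std).
calib = [(10, 11, 1.0), (50, 46, 4.0), (20, 21, 2.0), (80, 74, 8.0), (30, 29, 1.0),
         (15, 16, 1.0), (60, 57, 4.0), (25, 24, 2.0), (90, 82, 8.0), (40, 42, 2.0)]

alpha_b = 0.1
q_std = conformal_quantile([abs(c - p) for c, p, _ in calib], alpha_b)
q_ens = conformal_quantile([abs(c - p) / s for c, p, s in calib], alpha_b)

small = box_ens_interval(20.0, 1.0, q_ens)    # low predicted uncertainty
large = box_ens_interval(200.0, 8.0, q_ens)   # high predicted uncertainty
fixed = box_std_interval(20.0, q_std)         # same half-width regardless of object
```

With these toy numbers the normalized score gives a narrow interval where the ensemble is confident and a wide one where it is not, whereas the Box-Std half-width is identical for every object.
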
Two-Step Conformal Pipeline and ClassThr:
- Function: Propagate classification uncertainty to bounding box intervals, extending coverage guarantees to misclassified objects.
- Mechanism: Use a class-conditional conformal classifier (ClassThr) to generate a label prediction set \(\hat{C}_L(X_{n+1}) = \{y \in \mathcal{Y}: \hat{\pi}_y(X_{n+1}) \geq 1 - \hat{q}_L^y\}\), and then apply a max strategy to select the bounding box quantile from the label set: \(\hat{q}_B^k = \max\{\hat{q}_B^{k,y}\}_{y \in \hat{C}_L(X_{n+1})}\). Setting \(\alpha_L=0.01\) ensures \((1-\alpha_L)(1-\alpha_B) \approx (1-\alpha_B)\).
- Design Motivation: Prior work only provided guarantees on correctly classified objects, which has limited utility in multi-class scenarios (e.g., co-existence of car/person/bicycle in autonomous driving). The two-step method extends the coverage guarantees to all detected objects.
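
The label-set construction and the max strategy above can be sketched as follows. This is a hedged illustration rather than the repository's code: the function names, the toy quantile values, and the fallback used when the label set is empty are all assumptions made for this example.

```python
def class_thr_label_set(probs, q_label):
    """Keep every class y whose predicted probability clears 1 - q_label[y]."""
    return {y for y, p in probs.items() if p >= 1.0 - q_label[y]}

def max_box_quantiles(label_set, q_box):
    """Per coordinate, take the largest class-conditional quantile over the label set."""
    if not label_set:
        # Assumption for this sketch: fall back to all classes (most conservative).
        label_set = set(q_box)
    n_coords = len(next(iter(q_box.values())))
    return [max(q_box[y][k] for y in label_set) for k in range(n_coords)]

# Hypothetical calibrated quantiles (in practice estimated per class on calibration data).
q_label = {"car": 0.5, "truck": 0.8, "person": 0.6}
q_box = {"car": [5, 5, 6, 6], "truck": [9, 4, 10, 7], "person": [3, 3, 8, 8]}

# Classifier probabilities for one detection.
probs = {"car": 0.70, "truck": 0.25, "person": 0.05}

label_set = class_thr_label_set(probs, q_label)   # classes kept in the prediction set
q_used = max_box_quantiles(label_set, q_box)      # one quantile per box coordinate
```

Taking the maximum means the interval is at least as wide as the one for any class in the set, which is what keeps the box guarantee intact when the top-1 label is wrong, at the price of some conservatism.
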
Multiple Testing Correction (Max-Rank):
- Function: Address the multiple testing problem when performing CP on each of the four coordinates individually.
- Mechanism: Performing CP on \(m\) coordinates individually is equivalent to conducting \(m\) hypothesis tests in parallel, where a naive Bonferroni correction is overly conservative. An improved max-rank method based on the Westfall & Young permutation correction is adopted, operating in the rank space and leveraging the positive correlation structure between coordinates to obtain a tighter correction.
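- Design Motivation: Bonferroni remains valid under arbitrary dependence but ignores it, so it becomes needlessly conservative when the tests are positively correlated. Bounding box coordinates are naturally highly correlated (they jointly parameterize a box), and exploiting this correlation structure avoids the over-conservatism.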
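
A rough sketch of the rank-space idea, compared against Bonferroni, is given below. It reflects one reading of the max-rank procedure (rank the scores per coordinate, take each sample's maximum rank, compute a single conformal quantile over those maxima, then map that rank back to a score per coordinate); tie handling and other details in the paper may differ, and all function names are hypothetical.

```python
import math

def column_ranks(column):
    """1-based ranks of the scores in one coordinate (ties broken by position)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0] * len(column)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def conformal_index(n, alpha):
    """Index of the finite-sample conformal quantile, or None if n is too small."""
    k = math.ceil((n + 1) * (1 - alpha))
    return k if k <= n else None

def bonferroni_quantiles(scores, alpha):
    """Calibrate each of the m coordinates separately at level alpha / m."""
    n, m = len(scores), len(scores[0])
    k = conformal_index(n, alpha / m)
    cols = [sorted(row[j] for row in scores) for j in range(m)]
    return [math.inf if k is None else col[k - 1] for col in cols]

def max_rank_quantiles(scores, alpha):
    """Calibrate once on the per-sample maximum rank, then map back to scores."""
    n, m = len(scores), len(scores[0])
    cols = [[row[j] for row in scores] for j in range(m)]
    ranks = [column_ranks(col) for col in cols]
    max_ranks = sorted(max(ranks[j][i] for j in range(m)) for i in range(n))
    k = conformal_index(n, alpha)
    if k is None:
        return [math.inf] * m
    r_star = max_ranks[k - 1]
    return [sorted(col)[r_star - 1] for col in cols]

# Toy calibration scores: 100 samples, 4 perfectly correlated coordinates.
scores = [[float(i)] * 4 for i in range(1, 101)]
q_bonf = bonferroni_quantiles(scores, 0.1)
q_rank = max_rank_quantiles(scores, 0.1)
```

In this deliberately extreme toy case the four coordinates carry the same information, so the rank-based correction recovers the single-test quantile while Bonferroni still pays for four tests.
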

Loss & Training

The conformal calibration step itself is post-hoc and does not modify model training. The adaptive variants need extra components, however: Box-CQR adds quantile regression heads to the detector and trains them with the quantile (pinball) loss, and Box-Ens requires training multiple independent detectors. Key hyperparameters: \(\alpha_L=0.01\) (99% label coverage), \(\alpha_B=0.1\) (90% bounding box coverage), and an IoU threshold of 0.5 for Hungarian matching.
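
For readers unfamiliar with quantile regression, the sketch below shows the standard pinball loss and the usual conformalized-quantile-regression score and interval for a single coordinate. It follows the textbook CQR formulation; whether the paper uses exactly this score per coordinate is an assumption here, and the function names are illustrative.

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for target quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)

def cqr_score(y_true, q_lo, q_hi):
    """CQR non-conformity score: positive if y falls outside [q_lo, q_hi]."""
    return max(q_lo - y_true, y_true - q_hi)

def cqr_interval(q_lo, q_hi, q_hat):
    """Widen (or shrink, if q_hat < 0) the predicted quantile band by q_hat."""
    return (q_lo - q_hat, q_hi + q_hat)

# Under-predicting an upper quantile (tau = 0.95) is penalized far more than over-predicting.
under = pinball_loss(10.0, 8.0, 0.95)
over = pinball_loss(8.0, 10.0, 0.95)

# A true coordinate one pixel above the predicted band gives score 1.
score_outside = cqr_score(12.0, 8.0, 11.0)
score_inside = cqr_score(9.0, 8.0, 11.0)
band = cqr_interval(8.0, 11.0, 1.0)
```

The asymmetry of the loss is what pushes each head toward its target quantile, and the conformal correction `q_hat` then restores the finite-sample guarantee that raw quantile predictions lack.
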

Key Experimental Results

Main Results

Comparison with previous methods on the COCO dataset (target coverage 90%, averaged across classes):

Method	Detector	MPIW (Two-sided)	Coverage (Two-sided)	MPIW (One-sided)	Coverage (One-sided)
Deep Ensembles	5×Faster R-CNN	12.31	0.21 ❌	74.15	0.49 ❌
GaussianYOLO	YOLOv3	7.00	0.08 ❌	87.07	0.35 ❌
Andéol et al. (Best)	Faster R-CNN	N/A	-	87.62	0.91 ✓
Box-Std (Ours)	Faster R-CNN	55.47	0.88 ✓	85.42	0.88 ✓
Box-Std (Ours)	Sparse R-CNN	41.92	0.89 ✓	77.33	0.89 ✓
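
To make the two reported metrics concrete, here is a small sketch of how they could be computed. MPIW is read here as mean prediction interval width, and coverage as joint coverage of all four coordinates of a box; how the paper averages across classes or defines the one-sided variant is not reproduced, and the function names and toy boxes are assumptions.

```python
def box_coverage(intervals, truths):
    """Fraction of boxes whose true coordinates all fall inside their intervals."""
    hits = sum(
        all(lo <= c <= hi for (lo, hi), c in zip(box, truth))
        for box, truth in zip(intervals, truths)
    )
    return hits / len(truths)

def mpiw(intervals):
    """Mean interval width over all coordinates of all boxes."""
    widths = [hi - lo for box in intervals for lo, hi in box]
    return sum(widths) / len(widths)

# Two hypothetical boxes, each with four (lower, upper) coordinate intervals.
intervals = [
    [(8, 12), (18, 22), (48, 52), (58, 62)],
    [(0, 10), (0, 10), (90, 110), (90, 110)],
]
truths = [(10, 20, 50, 63), (5, 5, 100, 100)]

cov = box_coverage(intervals, truths)   # first box misses on its last coordinate
width = mpiw(intervals)
```

Read together, the two numbers express the trade-off in the table: a method is only useful if it reaches the target coverage, and among valid methods a smaller MPIW is better.
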

Ablation Study

Configuration	Coverage	MPIW	Description
Box-Std (Fixed Width)	Valid	Min	Most efficient but uneven coverage for small/large objects
Box-Ens (Ensemble-adaptive)	Valid	Slightly larger	More balanced coverage across small, medium, and large objects
Box-CQR (Quantile-adaptive)	Valid	Medium	Significant improvement in coverage for large objects
Top (Single class label)	❌ Invalid	Min	Relies on classification accuracy, no guarantees
Naive (Density-based levels)	❌ Invalid Label	Small	Sensitive to model calibration
ClassThr (Conformal Threshold)	✓ Valid	Medium	The only method meeting both label and box guarantees

Key Findings

Traditional uncertainty methods such as Deep Ensembles and GaussianYOLO severely under-cover (coverage between 0.08 and 0.49 against a 90% target), which indicates that methods without coverage guarantees are unreliable in safety-critical scenarios.
Box-Ens achieves the most balanced coverage across different object sizes: the coverage for large objects improves significantly at the cost of a slightly larger MPIW.
The average size of label sets generated by ClassThr is \(\leq 4\), indicating that label CP introduces very low overhead and does not cause excessive expansion of bounding box intervals.
The max-rank correction yields significantly tighter intervals compared to Bonferroni, validating the value of leveraging the correlation structure of coordinates.

Highlights & Insights

An end-to-end bounding box uncertainty framework with safety guarantees is proposed: the conformal calibration is post-hoc, efficient, and generalizable, and the basic Box-Std variant requires no modification of the underlying detector.
The design of the two-step method is highly elegant: by explicitly propagating classification uncertainty to localization uncertainty, it provides truly practical safety guarantees.
The choice of class-conditional guarantees is stronger and more practical than marginal guarantees, avoiding the problem of uneven coverage across different classes.
Controllable trade-off of coverage: users can flexibly balance safety requirements of classification and localization by adjusting \(\alpha_L\) and \(\alpha_B\).

Limitations & Future Work

Guarantees are only provided for detected true positives; missed detections (false negatives) are not handled, since CP can only calibrate intervals for objects the detector actually outputs.
The max strategy is somewhat conservative in selecting quantiles, which tends to cause over-coverage in the ClassThr method. A weighted quantile selection strategy based on confusion matrices could be explored.
The guarantees rely on exchangeability between calibration and test data, so they may fail under distribution shift (e.g., changing weather conditions).
Currently validated only on 2D object detection; extensions to 3D object detection, instance segmentation, etc., require further exploration.
Box-Ens requires training multiple detectors, which incurs high computational costs.

Core difference from Andéol et al.: this work extends to multi-class settings and accounts for classification errors, whereas the prior work only provided guarantees for a single class and for correctly classified objects.
Insight: The CP framework provides a highly valuable post-hoc guarantee tool for black-box models, which is particularly suitable for deployment scenarios where model architectures cannot be modified.
The concept of two-step sequential CP can be generalized to other multi-stage prediction tasks (e.g., segmentation followed by classification), where upstream uncertainty needs to be propagated downstream.
The max-rank multiple testing correction can be applied to any scenario involving multi-dimensional conformal prediction.

Rating

Novelty: ⭐⭐⭐⭐ The two-step conformal framework and adaptive bounding box methods are meaningful methodological innovations.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Extremely thorough, utilizing three datasets, multiple detectors, various baselines, and statistics over 1000 trials.
Writing Quality: ⭐⭐⭐⭐⭐ Rigorous theory, complete notation, clear logical derivation, and discussions that are highly valuable to practitioners.
Value: ⭐⭐⭐⭐ Direct application value to safety-critical domains such as autonomous driving, with a framework that possesses excellent scalability.