Conformal Prediction as Bayesian Quadrature¶

Conference: ICML 2025 Oral
arXiv: 2502.13228
Code: None
Area: Interpretability
Keywords: Conformal prediction, Bayesian quadrature, uncertainty quantification, probabilistic numerical methods, distribution-free

TL;DR¶

The paper revisits conformal prediction from a Bayesian perspective. It shows that split conformal prediction and conformal risk control are both special cases of a Bayesian quadrature framework, proposes practical Bayesian alternatives, and provides interpretable guarantees together with a richer representation of the range of future losses.

Background & Motivation¶

Background: Distribution-free uncertainty quantification (such as conformal prediction) provides statistical guarantees for deploying black-box models, without requiring knowledge of how the model was trained. Assuming the calibration and test data are exchangeable, conformal prediction guarantees that the prediction set/interval contains the true value with probability at least \(1-\alpha\).

Limitations of Prior Work:
- Conformal prediction is based on frequentist probability, which makes it difficult to incorporate prior knowledge (such as partial information about the data distribution).
- Frequentist guarantees control the "expected loss averaged over many datasets" rather than "the loss on the actually observed dataset".
- It produces only a single quantile estimate, without quantifying the uncertainty of the quantile itself.
- It is difficult to incorporate additional structural assumptions (such as monotonicity or symmetry) to tighten the guarantees.

Key Challenge: Bayesian methods are thought to require prior distributions (hence not being "distribution-free"), whereas conformal prediction is distribution-free but inflexible. Is this conflict truly irreconcilable?

Goal: Re-unify and extend conformal prediction using Bayesian probability.

Key Insight: The core computation of conformal prediction—estimating quantiles from calibration data—is essentially a numerical integration problem (the inverse of the distribution function), which can be executed via Bayesian quadrature (a probabilistic numerical method).

Core Idea: Treat the empirical distribution of calibration scores as observations on a probability measure \(\rightarrow\) model the unobserved distribution function using a Gaussian Process (GP) prior \(\rightarrow\) the posterior provides a complete representation of uncertainty for the quantile (rather than just a point estimate). Without prior knowledge, this is equivalent to conformal prediction; with prior knowledge, it automatically leverages it.

Method¶

Overall Architecture¶

Start from the losses/scores \(s_1, \ldots, s_n\) of the calibration set.
Cast quantile estimation as a Bayesian quadrature problem.
Place a Gaussian Process prior on the distribution function \(F\).
Condition on observations \(\rightarrow\) posterior quantile distribution.
Extract prediction sets/risk guarantees from the posterior.
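
As a point of reference for the steps above, the following is a minimal sketch of the standard split conformal threshold computed from calibration scores, which is the quantity the Bayesian pipeline is said to recover when no prior is used. The function name and the example scores are hypothetical and are not taken from the paper.

```python
import math


def split_conformal_threshold(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Returns math.inf when that rank exceeds n, i.e. when the calibration
    set is too small to certify the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


# Hypothetical calibration scores, for illustration only.
calibration_scores = [0.12, 0.40, 0.05, 0.33, 0.27, 0.51, 0.19, 0.08, 0.45]
q_hat = split_conformal_threshold(calibration_scores, alpha=0.2)
```

With nine scores and \(\alpha = 0.2\), the rank is \(\lceil 10 \times 0.8 \rceil = 8\), so the threshold is the eighth smallest score. A prediction set then keeps every candidate whose score does not exceed `q_hat`.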

Key Designs¶

Quantile Estimation as Bayesian Quadrature:
- Function: Re-model the quantile calculation of conformal prediction as probabilistic numerical integration.
- Mechanism: \(\hat{q} = F^{-1}(1-\alpha)\), where \(F\) is the CDF of the calibration scores \(\rightarrow\) \(F\) is unknown but has \(n\) observations \(\rightarrow\) model \(F\) with a GP prior \(\rightarrow\) the posterior yields a distribution of \(F^{-1}(1-\alpha)\) (rather than a point estimate).
- Equivalence to Conformal Prediction: When using a "step function" prior, the posterior median is exactly equal to the quantile of conformal prediction—proving that conformal prediction is a special case of Bayesian quadrature.
- Design Motivation: A unified framework allows leveraging prior knowledge when available (e.g., knowing the distribution is unimodal), while automatically degrading to standard conformal prediction in the absence of priors.
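
One way to read the "numerical integration" framing above is through a standard identity, stated here as background rather than quoted from the paper. For a score \(S\) with a continuous, strictly increasing CDF \(F\), the expected score is the integral of the quantile function over \([0, 1]\), and each sorted calibration score \(s_{(i)}\) is an evaluation of that quantile function at an unknown level \(t_i\):

```latex
\mathbb{E}[S] = \int_0^1 F^{-1}(t)\,\mathrm{d}t,
\qquad
F^{-1}(t) = \inf\{s : F(s) \ge t\},
\qquad
s_{(i)} = F^{-1}(t_i), \quad t_i = F(s_{(i)}).
```

For i.i.d. scores the levels \(F(s_i)\) are uniform on \([0, 1]\) regardless of \(F\), so the uncertainty about where the integrand was evaluated can be described without assumptions on the score distribution itself.
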
Posterior Distribution of Quantiles:
- Function: Provide a complete representation of uncertainty for the quantile.
- Mechanism: The GP posterior yields a full distribution over \(F\) \(\rightarrow\) the induced distribution of \(F^{-1}(1-\alpha)\) can be computed \(\rightarrow\) this yields a credible interval for the quantile.
- Value:
  - Standard conformal prediction states: "The prediction set covers the true value with probability at least 95%."
  - The Bayesian version states: "The 90% credible interval of the quantile is [q_low, q_high], corresponding to a coverage rate between 93% and 97%."
- Design Motivation: Richer information helps decision-makers understand the reliability of the guarantees.
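
To make the idea of "a distribution over coverage" concrete, the sketch below uses a classical distribution-free fact in place of the GP construction described above: for i.i.d. scores with a continuous CDF, the realized coverage \(F(s_{(k)})\) of the split conformal threshold with rank \(k = \lceil (n+1)(1-\alpha) \rceil\) follows a \(\mathrm{Beta}(k, n+1-k)\) distribution. The function name, the Monte Carlo approach, and the chosen \(n\) are illustrative assumptions.

```python
import math
import random


def coverage_interval(n, alpha, level=0.90, draws=20000, seed=0):
    """Monte Carlo central interval for the coverage F(s_(k)) of the
    split conformal threshold, using F(s_(k)) ~ Beta(k, n + 1 - k)."""
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(k, n + 1 - k) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]       # lower tail cut-off
    hi = samples[int((1 + level) / 2 * draws) - 1]   # upper tail cut-off
    return lo, hi


# Hypothetical setting: 200 calibration points, nominal 95% coverage.
lo, hi = coverage_interval(n=200, alpha=0.05)
```

The interval `(lo, hi)` describes how far the realized coverage can plausibly sit from the nominal 95%, which is the kind of statement the Bayesian version is described as making. Increasing `n` narrows the interval.
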
Incorporation of Prior Knowledge:
- Function: Tighten guarantees when additional information is available.
- Mechanism: The kernel function of the GP prior encodes assumptions about \(F\)—e.g., a smoothness kernel \(\rightarrow\) assuming \(F\) is smooth; a monotonicity constraint \(\rightarrow\) ensuring \(F\) is monotonically increasing.
- Concrete Example: If the loss distribution is known to be symmetric \(\rightarrow\) imposing a symmetry constraint on the prior \(\rightarrow\) effectively doubles the sample size \(\rightarrow\) tightening the guarantees.
- Design Motivation: Distribution-free guarantees come with a cost—standard conformal prediction yields very loose guarantees when \(n\) is small, whereas prior knowledge can yield significant improvements.
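
The symmetry example can be illustrated at the sample level. The sketch below is not the paper's prior construction: it simply mirrors each calibration score about a centre of symmetry that is assumed to be known, which doubles the number of points feeding the empirical quantile. The helper names, the centre, and the synthetic Gaussian scores are all hypothetical, and the mirrored points are not independent of the originals, so this is a heuristic picture of "doubling the sample size" rather than a guarantee.

```python
import math
import random


def empirical_quantile(values, level):
    """Smallest value whose empirical CDF is at least `level`."""
    ordered = sorted(values)
    rank = max(1, math.ceil(level * len(ordered)))
    return ordered[rank - 1]


def symmetrized(values, center):
    """Augment a sample with its mirror image about `center`."""
    return list(values) + [2 * center - v for v in values]


rng = random.Random(0)
center = 0.0  # assumed known centre of symmetry (hypothetical)
scores = [rng.gauss(center, 1.0) for _ in range(25)]

q_plain = empirical_quantile(scores, 0.9)                     # uses 25 points
q_sym = empirical_quantile(symmetrized(scores, center), 0.9)  # uses 50 points
```

If the symmetry assumption is wrong, the mirrored points bias the quantile, which matches the summary's remark that misspecified priors carry a cost.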

Loss & Training¶

No training—purely an inference/post-processing method.
The GP posterior has an analytical solution (for standard kernels).
Computational complexity is \(O(n^3)\) (the standard cost of GP), which is practical for small/medium calibration sets (\(n < 10000\)).
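
To show where the cubic cost comes from, here is a dependency-free sketch of a textbook GP regression posterior mean with an RBF kernel. It is a generic illustration, not the model from the paper; the function names, kernel choice, and noise level are assumptions. The Cholesky factorization of the \(n \times n\) kernel matrix is the \(O(n^3)\) step.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A; this is the O(n^3) step."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean k(x*, X) (K + noise * I)^{-1} y of a zero-mean GP."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    weights = solve_cholesky(cholesky(K), y_train)
    return [sum(rbf(x, a) * w for a, w in zip(x_train, weights))
            for x in x_test]


# Hypothetical CDF-like observations at four score values.
x_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [0.1, 0.4, 0.7, 0.9]
mean_at_obs = gp_posterior_mean(x_obs, y_obs, x_obs)
```

With a small noise term the posterior mean nearly interpolates the observations. In practice a numerical library would replace the hand-written factorization, and sparse or approximate GPs would be needed once \(n\) grows large.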

Key Experimental Results¶

Main Results¶

Quality of coverage estimation under different calibration set sizes:

Method	n=50 coverage error ↓	n=200 coverage error ↓	Provides posterior?
Split Conformal	4.2%	1.8%	✗ (Point estimate)
Bayesian Quadrature (No Prior)	4.2%	1.8%	✓
Bayesian Quadrature (Smooth Prior)	2.8%	1.2%	✓
Bayesian Quadrature (Monotonic Prior)	2.1%	0.9%	✓

Unified Conformal Risk Control:

Method	Framework	Loss Guarantee at n=100
Standard Conformal Risk Control	Frequentist	Point estimate of \(\hat{\lambda}\)
Bayesian Quadrature Version	Bayesian	Distribution over \(\hat{\lambda}\) with credible interval
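
For context on the frequentist row, the sketch below implements the standard conformal risk control rule on a grid: choose the smallest \(\lambda\) with \(\frac{n}{n+1}\hat{R}_n(\lambda) + \frac{B}{n+1} \le \alpha\), where \(\hat{R}_n\) is the mean calibration loss and \(B\) bounds the loss. The function names, the grid, and the miscoverage loss are hypothetical choices for illustration, and the loss is assumed to be non-increasing in \(\lambda\).

```python
def crc_lambda(loss_fn, calibration, lambdas, alpha, loss_bound=1.0):
    """Smallest grid value lam satisfying
    n/(n+1) * mean_loss(lam) + loss_bound/(n+1) <= alpha.

    The check is written as total_loss + loss_bound <= alpha * (n + 1),
    which is the same inequality multiplied through by (n + 1).
    Returns None if no grid value qualifies.
    """
    n = len(calibration)
    for lam in sorted(lambdas):
        total_loss = sum(loss_fn(example, lam) for example in calibration)
        if total_loss + loss_bound <= alpha * (n + 1):
            return lam
    return None


def miss(score, lam):
    """Hypothetical loss: 1 when the score exceeds the threshold lam."""
    return 1.0 if score > lam else 0.0


scores = [0.12, 0.40, 0.05, 0.33, 0.27, 0.51, 0.19, 0.08, 0.45]
grid = [i / 100 for i in range(101)]
lam_hat = crc_lambda(miss, scores, grid, alpha=0.25)
```

With a miscoverage indicator as the loss, this rule reduces to picking a calibration-score threshold, which is why split conformal prediction and risk control can be treated within one framework. The result is a single number; the Bayesian row in the table replaces it with a distribution.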

Ablation Study¶

Type of Prior Knowledge	Relative Change in Coverage Error (n=50)	Description
No Prior (Default)	0% (Equivalent to conformal prediction)	Baseline
Smoothness Assumption	-33%	Smooth CDF
Monotonicity Constraint	-50%	Non-decreasing CDF (always holds)
Symmetry Assumption	-45%	Applicable to symmetric loss distributions
Misspecified Prior (Incorrect Assumptions)	+15%	Prior misspecification carries a cost
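
The percentages in this table appear to be relative to the \(n=50\) split conformal error of 4.2% from the main results. Under that assumption, the first two rows can be checked against the main table, and the last two imply absolute errors that the main table does not list:

```latex
\frac{2.8 - 4.2}{4.2} \approx -33\%,
\qquad
\frac{2.1 - 4.2}{4.2} = -50\%,
\qquad
4.2\% \times (1 - 0.45) \approx 2.3\%,
\qquad
4.2\% \times (1 + 0.15) \approx 4.8\%.
```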

Key Findings¶

In the absence of priors, Bayesian quadrature is exactly equivalent to standard conformal prediction—proving theoretical unification.
Reasonable priors can halve the coverage error on small calibration sets—the value of prior knowledge is greatest when \(n\) is small.
The posterior distribution provides much richer information—decision-makers can see "how certain" the guarantee is.
Misspecified priors lead to performance degradation—but the benefit of the Bayesian approach is that prior plausibility can be checked via the posterior.
Monotonicity constraints are almost always safe (the CDF is inherently monotonic) \(\rightarrow\) recommended as a default.

Highlights & Insights¶

"Bayesian probability is not in conflict with being distribution-free"—this epistemological insight is the most profound contribution of this paper. A Bayesian prior is an assumption about "the shape of \(F\)", not "the data distribution".
Standard conformal prediction is recovered as a special case of Bayesian quadrature—a unified perspective that eliminates the artificial divide between the two schools of thought.
The posterior distribution of quantiles is a highly practical new tool—upgrading from "whether 95% coverage is achieved" to "the 90% interval of coverage is [93%, 97%]".
The intersection of Probabilistic Numerics and uncertainty quantification is a novel and promising research direction.
For practitioners deploying safety-critical ML systems, posterior quantiles offer more responsible guarantees than point-estimate quantiles.

Limitations & Future Work¶

GP computational complexity \(O(n^3)\) is impractical for large calibration sets—requiring sparse/approximate GPs.
Choosing prior kernel functions requires domain knowledge.
Extensions to non-exchangeable data (such as time series) require conditional Bayesian quadrature.
Integration with active/online conformal prediction (adaptive conformal) remains unexplored.
Bayesian quadrature in multivariate/high-dimensional settings is more challenging.

Related Work & Insights¶

vs Standard Conformal Prediction: This work moves from a frequentist point estimate to a Bayesian posterior distribution, and the two are equivalent when no prior is used.
vs Bayesian approaches to conformal: Previous Bayesian conformal methods typically modify the predictor, whereas this paper keeps the predictor as a black box and only modifies the calibration step.
vs Probabilistic Numerics: A new direction applying probabilistic numerical quadrature to statistical inference.
Insight: Frequentist methods can often be re-understood and naturally extended within a Bayesian framework—incorporating prior knowledge is merely optional rather than mandatory.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ Unifying conformal prediction into the Bayesian quadrature framework is a profound theoretical contribution.
Experimental Thoroughness: ⭐⭐⭐⭐ Synthetic + real-world data, comparing multiple priors.
Writing Quality: ⭐⭐⭐⭐⭐ Elegant theory and highly intuitive.
Value: ⭐⭐⭐⭐⭐ Has a foundational impact on uncertainty quantification methodologies.