One Sample is Enough to Make Conformal Prediction Robust

Conference: NeurIPS 2025 arXiv: 2506.16553 Code: None Area: Machine Learning / Uncertainty Quantification Keywords: conformal prediction, robustness, randomized smoothing, prediction sets, conformal risk control

TL;DR

This paper proposes RCP1 (Robust Conformal Prediction with One sample), which certifies the conformal procedure itself rather than individual conformity scores. Requiring only a single randomly perturbed forward pass at inference, RCP1 yields smaller robust prediction sets than state-of-the-art methods that require 100 forward passes.

Background & Motivation

Background: Conformal Prediction (CP) provides prediction sets with tunable probabilistic coverage guarantees for arbitrary black-box models. Robust CP (RCP) extends these guarantees to worst-case perturbations within a predefined radius.
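
To make the coverage guarantee concrete, below is a minimal sketch of standard (non-robust) split conformal prediction for classification. The 1 − softmax-probability score and the function names are generic illustrative choices, not taken from the paper.

```python
import numpy as np

def split_conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Standard split conformal calibration for classification.

    cal_probs:  (n, K) softmax outputs of the model on held-out calibration data
    cal_labels: (n,) true labels
    Returns the threshold q_hat for the score s(x, y) = 1 - p_y(x).
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]    # conformity scores on calibration data
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample corrected quantile level
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, q_hat):
    """All labels whose score falls below the threshold; covers the true label w.p. >= 1 - alpha."""
    return np.where(1.0 - probs <= q_hat)[0]
```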

Limitations of Prior Work: Randomized smoothing–based RCP methods require multiple forward passes per input (e.g., 100) to estimate smoothed conformity scores, incurring prohibitively high computational cost.

Key Challenge: A fundamental tension exists between robustness and computational efficiency — deterministic methods (e.g., RSCP+) produce overly large prediction sets, while smoothing-based methods (e.g., RSCP/SmoothFull) yield smaller sets at substantial computational expense.

Core Insight: Even with a single randomly perturbed forward pass, the conformal prediction procedure itself already possesses a certain degree of robustness.

Core Idea: Certify the conformal procedure rather than individual conformity scores, shifting the computational burden of smoothing from inference time to the calibration phase.

Method

Overall Architecture

Given a black-box model \(f\) and noise radius \(\epsilon\):

  1. Calibration phase: Compute the threshold \(\hat{q}\) from conformity scores on calibration data.
  2. Inference phase: Apply a single random perturbation \(\delta \sim \mathcal{N}(0, \sigma^2 I)\) to the input \(x\) and compute \(f(x+\delta)\).
  3. Certification: Use a binary certificate to decide whether the perturbed score can be used for a robust prediction set.
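
The following hypothetical sketch illustrates the inference loop of steps 2–3. The calibration-time adjustment of \(\hat{q}\) and the concrete certificate \(\phi\) come from the paper's Neyman–Pearson construction and are abstracted behind placeholder arguments here.

```python
import numpy as np

def rcp1_predict(model, x, q_hat, sigma, certificate, conservative_set):
    """Illustrative single-sample robust inference in the spirit of RCP1.

    model:            black-box classifier returning softmax probabilities
    q_hat:            threshold from the (robustly adjusted) calibration phase
    sigma:            standard deviation of the Gaussian perturbation
    certificate:      binary check phi on the perturbed output; True means the
                      single perturbed score may be used for a robust set
    conservative_set: fallback prediction set returned when the check fails
    """
    delta = np.random.normal(0.0, sigma, size=x.shape)  # step 2: one random perturbation
    probs = model(x + delta)                            # step 2: single forward pass
    if certificate(probs):                              # step 3: certificate holds
        return np.where(1.0 - probs <= q_hat)[0]        # prediction set from the perturbed score
    return conservative_set                             # step 3: certificate fails, be conservative
```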

Key Designs

  1. Procedure-Level Certification

     • Conventional approach: robustly certify the smoothed conformity score \(\bar{s}(x)\) for each test sample.
     • Proposed approach: directly certify the coverage guarantee of the conformal procedure, i.e., \(\Pr[Y \in C_\epsilon(X)] \geq 1-\alpha\).
     • Core inequality: exploits the distributional properties of the random perturbation \(\delta\) to establish a probabilistic relationship between \(s(x+\delta, y)\) and \(s(x', y)\), where \(x'\) denotes the adversarially perturbed sample.

  2. Binary Certificate

     • For any binary certificate \(\phi(x, \delta)\): if \(\Pr_\delta[\phi=1] \geq p\), then \(x\) is certified robust within radius \(\epsilon\) (see the smoothing bound sketched after this list).
     • Concrete realization: the optimal binary certificate is derived via the Neyman–Pearson lemma.
     • RCP1 draws only a single sample to check whether \(\phi=1\); if the check fails, a conservative prediction set is returned.

  3. Extension to Robust Conformal Risk Control

     • Generalizes the framework to the broader conformal risk control setting.
     • Applicable to both classification and regression tasks.
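
For context, the binary-certificate condition in item 2 is closely related to the standard Gaussian randomized-smoothing bound of Cohen et al. (2019), which is itself proved with the Neyman–Pearson lemma: if \(\phi\) fires with probability at least \(p\) under Gaussian noise centered at \(x\), then for any \(x'\) with \(\|x'-x\|_2 \le \epsilon\),

\[
\Pr_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\!\left[\phi(x'+\delta)=1\right] \;\ge\; \Phi\!\left(\Phi^{-1}(p) - \frac{\epsilon}{\sigma}\right),
\]

where \(\Phi\) is the standard normal CDF. The exact certificate used by RCP1 may differ in form, but it rests on the same likelihood-ratio argument.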

Theoretical Guarantees

  • Theorem 1: The coverage of RCP1 satisfies \(\Pr[Y \in C_\epsilon^{RCP1}(X)] \geq 1-\alpha\) for any adversarial perturbation within radius \(\epsilon\).
  • Theorem 2: The expected prediction set size of RCP1 is no larger than that of smoothed RCP with \(N\) samples; the two converge as \(N \to \infty\).

Key Experimental Results

Main Results — CIFAR-10 Classification (\(\epsilon = 0.25\), \(\alpha = 0.1\))

| Method | Forward Passes | Avg. Set Size ↓ | Coverage |
|---|---|---|---|
| RSCP+ (deterministic) | 1 | 4.82 | 0.912 |
| RSCP (N=100) | 100 | 2.31 | 0.903 |
| SmoothFull (N=100) | 100 | 2.15 | 0.907 |
| RCP1 (Ours) | 1 | 1.98 | 0.905 |

CIFAR-100 Classification (\(\epsilon = 0.25\), \(\alpha = 0.1\))

| Method | Forward Passes | Avg. Set Size ↓ | Coverage |
|---|---|---|---|
| RSCP+ | 1 | 28.7 | 0.914 |
| RSCP (N=100) | 100 | 14.2 | 0.906 |
| SmoothFull (N=100) | 100 | 12.8 | 0.909 |
| RCP1 | 1 | 11.3 | 0.903 |

Ablation Study — Effect of Noise Radius \(\epsilon\) (CIFAR-10)

| \(\epsilon\) | RSCP+ Set Size | SmoothFull Set Size | RCP1 Set Size |
|---|---|---|---|
| 0.125 | 2.41 | 1.52 | 1.38 |
| 0.25 | 4.82 | 2.15 | 1.98 |
| 0.5 | 7.93 | 3.87 | 3.52 |
| 1.0 | 9.85 | 6.14 | 5.71 |

Key Findings

  • RCP1 achieves smaller prediction sets than the 100-sample SOTA baseline using only a single forward pass.
  • The deterministic method RSCP+ produces sets 2–3× larger than RCP1.
  • Coverage guarantees are satisfied in all cases, consistent with theoretical requirements.
  • The advantage of RCP1 over baselines becomes more pronounced as \(\epsilon\) increases.
  • The method is also effective on regression tasks.

Highlights & Insights

  • Elegant conceptual shift: Transitioning from "certifying each score" to "certifying the entire procedure" elegantly bypasses the computational bottleneck of smoothing.
  • 100× speedup: Reducing inference-time forward passes from 100 to 1 has significant practical deployment value.
  • Theory–experiment consistency: The theoretical analysis of prediction set sizes (Theorem 2) is consistent with the empirical results.
  • Task-agnostic: Applicable to both classification and regression settings.

Limitations & Future Work

  • Robustness guarantees are restricted to \(\ell_2\)-norm ball perturbations; other threat models (\(\ell_\infty\), semantic perturbations) require separate analysis.
  • Single-sample estimation introduces randomness, which may yield conservative sets in extreme cases.
  • The choice of certificate affects performance; the optimal certificate requires knowledge of the noise distribution.

Related Work

  • RSCP (Gendler et al. 2022): Pioneering work on smoothing-based robust conformal prediction.
  • RSCP+ (Yan et al. 2024): Deterministic robust conformal prediction.
  • Randomized Smoothing (Cohen et al. 2019): Foundational framework for adversarial robustness certification.
  • Inspiration: The procedure-level certification paradigm may generalize to other statistical inference tasks.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ The procedure-level certification idea is pioneering.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Validated across multiple datasets and tasks.
  • Writing Quality: ⭐⭐⭐⭐ Motivation and theoretical exposition are clearly presented.
  • Value: ⭐⭐⭐⭐⭐ The 100× speedup carries substantial significance for practical deployment.