
Multivariate Conformal Selection¶

Conference: ICML2025
arXiv: 2505.00917
Code: None
Area: Optimization/Theory
Keywords: Conformal Selection, Multivariate Response, FDR Control, Nonconformity Score, BH Procedure, Regional Monotonicity, Differentiable Sorting

TL;DR¶

Extends Conformal Selection from univariate responses to multivariate settings, introduces the concept of Regional Monotonicity, designs distance-based (mCS-dist) and learning-based (mCS-learn) nonconformity scores, and guarantees finite-sample FDR control while improving selection power.

Background & Motivation¶

Limitations of Prior Work¶

Selection problems are ubiquitous: drug discovery (screening compounds with high binding affinity), precision medicine (identifying positive treatment effects), and LLM output certification (filtering trustworthy generated content) all require selecting a subset of candidates that meet specific criteria.

Background¶

Existing Conformal Selection (CS) is limited: the method of Jin & Candès (2023) supports only threshold selection on a univariate response, $y > c$, and cannot handle multi-dimensional criteria (e.g., LLM outputs that must satisfy fairness, safety, and correctness simultaneously).

Proposed Approach¶

Multivariate conformal prediction (CP) is not directly applicable: the confidence sets it constructs may be incompatible with the shape of the pre-defined target region $R$, and they control only the per-comparison error rate (PCER), not the FDR.

Key Challenge¶

Goal: construct a selection framework for multivariate responses that simultaneously (1) controls the FDR in finite samples, (2) maximizes selection power, and (3) is model-agnostic.

Method¶

Overall Architecture (Algorithm 1)¶

Training: Construct a multivariate predictive model $\hat{\mu}$.
Calibration: Compute regionally monotonic nonconformity scores $V_i = V(\bm{x}_i, \bm{y}_i)$ and construct conformal p-values.
Thresholding: Apply the BH procedure for multiple hypothesis testing correction to output the selection set $\mathcal{S}$.

Key Design 1: Regional Monotonicity (Definition 3.1)¶

\[V(\bm{x}, \bm{y}') \leq V(\bm{x}, \bm{y}), \quad \forall \bm{y}' \in R^c, \bm{y} \in R\]

This guarantees the conservativeness of conformal p-values (Proposition 3.2), thereby ensuring FDR control (Theorem 3.5).
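
A sketch of why regional monotonicity yields conservative p-values, assuming the p-value mirrors the univariate construction of Jin & Candès (2023). The reference point $\bm{r} \in R$ and the oracle p-value $p_j^*$ are notation introduced here for illustration, not taken from the paper.

\[p_j = \frac{1 + \sum_{i=1}^{n} \mathbb{1}\{V_i < \hat{V}_{n+j}\}}{n+1}, \quad \hat{V}_{n+j} = V(\bm{x}_{n+j}, \bm{r}), \quad \bm{r} \in R\]

If the unobserved response satisfies $\bm{y}_{n+j} \in R^c$, regional monotonicity gives $V_{n+j} = V(\bm{x}_{n+j}, \bm{y}_{n+j}) \leq \hat{V}_{n+j}$, so replacing $V_{n+j}$ by $\hat{V}_{n+j}$ can only increase the count in the numerator:

\[p_j \geq p_j^* = \frac{1 + \sum_{i=1}^{n} \mathbb{1}\{V_i < V_{n+j}\}}{n+1}\]

Under exchangeability of calibration and test points, $p_j^*$ is super-uniform, hence

\[\mathbb{P}(p_j \leq t, \ \bm{y}_{n+j} \in R^c) \leq \mathbb{P}(p_j^* \leq t) \leq t, \quad \forall t \in [0, 1]\]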

Key Design 2: mCS-dist (Distance-based Score)¶

\[V(\bm{x}, \bm{y}) = D_1(\bm{y}, R^c) - D_2(\hat{\mu}(\bm{x}), R^c)\]

Regular score: $D_1 = D_2 = \inf_{\bm{s} \in R^c} \|\cdot - \bm{s}\|_p$
Clipped score (Superior): $D_1 = M \cdot \mathbb{1}\{\bm{y} \notin R^c \cup \partial R\}$. Theorem 4.1 proves that the clipped score outperforms the regular score in terms of asymptotic power.
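
To make the two scores concrete, the sketch below assumes a simple orthant target region $R = \{\bm{y} : y_k > c_k \ \forall k\}$, for which the $\ell_p$ distance to $R^c$ has the closed form $\max(\min_k (y_k - c_k), 0)$ for any $p \geq 1$. The region, the default value of `M`, and all function names are assumptions made for illustration; general regions need a numerical computation of the infimum.

```python
import numpy as np

def dist_to_complement(points, c):
    """Distance from each row to R^c, for R = {y : y_k > c_k for all k}."""
    margins = np.asarray(points, dtype=float) - np.asarray(c, dtype=float)
    # Inside R the nearest point of R^c lies across the closest face;
    # outside R the distance is zero.
    return np.maximum(margins.min(axis=-1), 0.0)

def regular_score(y, mu_hat, c):
    """V(x, y) = D(y, R^c) - D(mu_hat(x), R^c)."""
    return dist_to_complement(y, c) - dist_to_complement(mu_hat, c)

def clipped_score(y, mu_hat, c, M=100.0):
    """V(x, y) = M * 1{y in R} - D(mu_hat(x), R^c); M is an assumed large constant."""
    inside = (np.asarray(y, dtype=float) > np.asarray(c, dtype=float)).all(axis=-1)
    return M * inside - dist_to_complement(mu_hat, c)

# Toy usage: two calibration points in 2D, thresholds c = (1.0, 0.5).
y = np.array([[2.0, 3.0], [0.0, 5.0]])        # first row in R, second in R^c
mu_hat = np.array([[1.5, 1.0], [2.0, 2.0]])   # model predictions
c = np.array([1.0, 0.5])
v_regular = regular_score(y, mu_hat, c)       # [0.5, -1.0]
v_clipped = clipped_score(y, mu_hat, c)       # [99.5, -1.0]
```

Both scores satisfy regional monotonicity for this region: for a fixed $\bm{x}$, any $\bm{y}' \in R^c$ contributes a first term of zero, while any $\bm{y} \in R$ contributes a non-negative one.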

Key Design 3: mCS-learn (Learning-based Score)¶

\[V^\theta(\bm{x}, \bm{y}) = M \cdot \mathbb{1}\{\bm{y} \notin R^c \cup \partial R\} - f_\theta(\bm{x}, \bm{y}; R)\]

Uses differentiable sorting (soft-rank) to approximate conformal p-values, optimizing $f_\theta$ via backpropagation.
Loss function $L_2$: Directly penalizes p-values, minimizing the p-values for samples inside the target region and increasing them for samples outside the region.
Proposition 4.2 proves that this family of scores contains the optimal nonconformity score.
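
The idea of making the p-value smooth in the scores can be illustrated as below. Note the substitution: the paper uses differentiable sorting (soft-rank), whereas this sketch swaps in a simpler sigmoid relaxation of the indicator $\mathbb{1}\{V_i < \hat{V}_{n+j}\}$. The function name `soft_p_values` and the temperature `tau` are assumptions, and NumPy is used only to show the forward computation; training $f_\theta$ would require an autodiff framework.

```python
import numpy as np

def soft_p_values(cal_scores, test_scores, tau=0.1):
    """Smooth surrogate of p_j = (1 + #{i : V_i < V_hat_j}) / (n + 1)."""
    cal = np.asarray(cal_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    # sigmoid((V_hat_j - V_i) / tau) approximates 1{V_i < V_hat_j}.
    diff = (test[:, None] - cal[None, :]) / tau
    soft_below = 0.5 * (1.0 + np.tanh(0.5 * diff))  # stable form of the sigmoid
    return (1.0 + soft_below.sum(axis=1)) / (cal.size + 1.0)

# As tau shrinks, the surrogate approaches the hard p-values.
cal = [0.1, 0.4, 0.5, 0.9]
test = [0.05, 0.45, 1.0]
p_smooth = soft_p_values(cal, test, tau=0.1)
p_sharp = soft_p_values(cal, test, tau=1e-3)
```

A smaller `tau` tracks the true p-value more closely but makes gradients vanish away from ties, so the temperature trades approximation accuracy against trainability.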

Key Experimental Results¶

Simulated Data¶

Tested on 2D/5D/10D Gaussian mixtures and various target regions (convex/non-convex/irregular).
Both mCS-dist and mCS-learn maintain FDR $\leq q$ under all settings and achieve substantially higher power than the baseline methods.
mCS-learn shows the most pronounced advantages in non-convex regions and high-dimensional scenarios.
When dimensions increase from 2 to 10, the power of mCS-dist decreases but still maintains FDR control, whereas the performance drop of mCS-learn is much smaller.
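
For reference, the two metrics reported above can be computed per run as in the sketch below, using the standard definitions: the false discovery proportion (FDP) is the fraction of selected candidates whose response lies outside $R$, and power is the fraction of candidates inside $R$ that are selected. FDR is the expectation of FDP, typically estimated by averaging over repeated runs. The function name `fdp_and_power` and its inputs are hypothetical.

```python
import numpy as np

def fdp_and_power(selected, in_region):
    """selected: indices chosen; in_region: boolean array, True if y_j is in R."""
    in_region = np.asarray(in_region, dtype=bool)
    chosen = np.zeros(in_region.size, dtype=bool)
    chosen[np.asarray(selected, dtype=int)] = True
    false_selections = np.sum(chosen & ~in_region)
    true_selections = np.sum(chosen & in_region)
    # Guard against division by zero when nothing is selected or R is empty.
    fdp = false_selections / max(chosen.sum(), 1)
    power = true_selections / max(in_region.sum(), 1)
    return float(fdp), float(power)

# Toy usage: three selections, one of which is a false discovery.
fdp, power = fdp_and_power([0, 1, 2], [True, True, False, True, False])
```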

Real-world Data¶

Drug discovery datasets: mCS achieves the highest selection power under FDR control.
LLM alignment: In selection tasks with multi-dimensional alignment scores, mCS successfully screens outputs that meet multi-dimensional criteria simultaneously.

Baseline Comparison¶

Marginal CS (per-dimension independent CS + Bonferroni correction): Very low power due to overly conservative multiple testing correction.
CP-based selection: Only controls PCER, potentially leading to FDR inflation.
Oracle selection: Serves as an upper-bound reference where true responses are known.
Both mCS-dist and mCS-learn significantly outperform Marginal CS, approaching Oracle selection.

Highlights & Insights¶

Regional monotonicity is the core innovation, elegantly generalizing univariate monotonicity to arbitrary dimensions and target regions.
Expressive power of mCS-learn: Proposition 4.2 theoretically guarantees that the optimal score is covered within the learnable family.
Practical value: Provides a general uncertainty quantification framework covering drug discovery to LLM certification.
Modular design: The pre-trained model $\hat{\mu}$ can be separated from the selection process, allowing flexible integration.

Limitations & Future Work¶

Splitting a calibration set from the training set is required, which reduces the data available for model training.
mCS-learn requires an additional three-way split (train-validate-calibrate), leading to low data efficiency.
The target region $R$ needs to be pre-defined; exploring adaptive target regions remains future work.
Computing $\inf_{\bm{s} \in R^c} \|\cdot\|$ can be computationally expensive on complex regions.
The power may be suboptimal when $|R|$ is extremely small or extremely large.
Combining conditional density estimators with conformal p-values has not been explored.
Robustness to out-of-distribution (OOD) test data is not discussed.

Related Work & Insights¶

Conformal Selection (Jin & Candès, 2023): This work directly generalizes it, extending the selection criterion from $y > c$ to $\bm{y} \in R$.
Multivariate Extensions of Conformal Prediction: Multivariate CP methods such as Bates et al. (2021) and Feldman et al. (2023) focus on constructing prediction sets rather than selection.
BH Procedure (Benjamini & Hochberg, 1995): The foundation of multiple testing correction for FDR control.
Differentiable Sorting (Blondel et al., 2020): Technical foundation for soft-rank in mCS-learn.
Insights: The concept of learning-based scores could be generalized to conditional density estimation or Bayesian non-parametric frameworks; learning adaptive target regions can also be explored.

Rating¶

Novelty: ⭐⭐⭐⭐ — Regional monotonicity is a simple yet powerful generalization.
Experimental Thoroughness: ⭐⭐⭐⭐ — Thorough validation across both simulation and real-world data.
Writing Quality: ⭐⭐⭐⭐⭐ — Exceptionally clear theoretical and algorithmic descriptions.
Value: ⭐⭐⭐⭐ — Provides a framework with rigorous statistical guarantees for multi-criteria selection problems.