# Locally Optimal Private Sampling: Beyond the Global Minimax
**Conference:** NeurIPS 2025 · **arXiv:** 2510.09485 · **Code:** GitHub · **Area:** AI Security · **Keywords:** Local Differential Privacy, Private Sampling, Minimax Optimality, \(f\)-Divergence, Public Data
## TL;DR
This paper proposes a local minimax framework for private sampling under local differential privacy (LDP): assuming the data distribution lies in a neighborhood of a known public distribution \(P_0\), it derives closed-form optimal samplers that consistently and significantly outperform the global minimax sampler, both theoretically and empirically.
## Background & Motivation
Background: Local differential privacy (LDP) allows users to randomize data on-device before sharing, eliminating the need to trust a central server. It has been widely deployed by Google, Apple, and Microsoft. Private sampling refers to the task of generating a single sample from a distribution \(P\) subject to LDP constraints.
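For concreteness, in this setting a sampler \(\mathbf{Q}\) maps an input distribution \(P\) to an output distribution \(\mathbf{Q}(P)\), and one common formalization of pure \(\varepsilon\)-LDP in the minimax private-sampling line of work (hedged here, as the exact formalism varies) requires the output to be insensitive to the input:

$$
\mathbf{Q}(P)(S) \;\le\; e^{\varepsilon}\, \mathbf{Q}(P')(S) \quad \text{for all input distributions } P, P' \text{ and all measurable sets } S.
$$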
Limitations of Prior Work: Park et al. (NeurIPS'24) established a global minimax framework that finds the worst-case optimal sampler. However, the global minimax criterion is overly pessimistic—it assumes the adversary may select an arbitrary distribution, whereas real-world data distributions are unlikely to be worst-case (degenerate) distributions.
Key Challenge: The global minimax optimal sampler applies a uniform transformation across all input distributions and cannot exploit the practical prior knowledge that the data distribution lies near a known reference distribution, leading to suboptimal privacy–utility tradeoffs.
Goal: Establish a local minimax framework—when the data distribution \(P\) is known to reside near a known public distribution \(P_0\), can better private samplers be designed?
Key Insight: The neighborhood \(N_\gamma(P_0) = \{P: E_\gamma(P\|P_0) = E_\gamma(P_0\|P) = 0\}\) is defined via the \(E_\gamma\)-divergence (hockey-stick divergence), and the minimax optimal sampler is derived under this constraint. A key observation is that the local minimax risk equals the global minimax risk restricted to distributions within the neighborhood.
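Unpacking the definition: the \(E_\gamma\)-divergence (hockey-stick divergence) at level \(\gamma \ge 1\) is

$$
E_\gamma(P \,\|\, Q) \;=\; \sup_{S} \bigl( P(S) - \gamma\, Q(S) \bigr) \;=\; \int \bigl( p(x) - \gamma\, q(x) \bigr)_{+}\, dx,
$$

so \(E_\gamma(P\|P_0) = 0\) is equivalent to \(p(x) \le \gamma\, p_0(x)\) almost everywhere. Imposing both directions pins the density ratio to \(p(x)/p_0(x) \in [1/\gamma, \gamma]\), exactly the constraint exploited in the Method section below.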
Core Idea: Extend the global minimax framework from pure LDP to functional LDP, then show that the locally minimax optimal sampler is the global optimal sampler restricted to the \(P_0\)-neighborhood, yielding a closed-form solution that is independent of the choice of \(f\)-divergence.
## Method

### Overall Architecture
The theoretical construction proceeds in two layers:

1. Global minimax under functional LDP: generalizes the results of Park et al. to general functional LDP (which encompasses approximate LDP and Gaussian LDP).
2. Local minimax: derives the neighborhood-constrained optimal sampler from the global result.
### Key Designs

#### 1. Global Minimax Optimal Sampler (Theorem 3.4)
- Function: Finds the minimax optimal sampler satisfying \(g\)-FLDP for the distribution class \(\tilde{\mathcal{P}}_{c_1,c_2,h}\) and any \(f\)-divergence.
- Core Formula:

$$
q_g^\star(P)(x) = \lambda^\star\, p(x) + (1 - \lambda^\star)\, h(x),
\qquad
\lambda^\star_{c_1,c_2,g} = \inf_{\beta \geq 0} \frac{e^\beta + \frac{c_2 - c_1}{1-c_1}\bigl(1 + g^*(-e^\beta)\bigr) - 1}{(1-c_1)\,e^\beta + c_2 - 1}
$$
- Interpretation: With probability \(\lambda^\star\), draw from the original distribution \(P\); with probability \(1 - \lambda^\star\), draw from the reference distribution \(h\)—a natural privacy–utility mixture.
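As a concrete illustration, here is a minimal NumPy sketch of this mixture sampler. The grid search over \(\beta\), the helper name `g_conj` (standing in for the function \(g^*\) in the formula above), and all numeric ranges are assumptions of this sketch, not the paper's implementation; it also assumes \(c_1 < 1 < c_2\).

```python
import numpy as np

def lambda_star(c1, c2, g_conj, beta_grid=np.linspace(0.0, 20.0, 200_001)):
    """Approximate the mixing weight of Theorem 3.4 by a grid search over
    beta >= 0; g_conj is a vectorized callable playing the role of g*."""
    e = np.exp(beta_grid)
    num = e + (c2 - c1) / (1.0 - c1) * (1.0 + g_conj(-e)) - 1.0
    den = (1.0 - c1) * e + c2 - 1.0   # positive whenever c1 < 1 < c2
    return float(np.min(num / den))

def mixture_sample(sample_p, sample_h, lam, rng):
    """One private release: draw from P with probability lam, else from h."""
    return sample_p() if rng.random() < lam else sample_h()
```

For a \(k\)-ary alphabet, `sample_p = lambda: rng.choice(k, p=p)` and `sample_h = lambda: rng.choice(k, p=h)` complete the picture.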
#### 2. Locally Minimax Optimal Sampler — Functional LDP (Theorem 4.1)
- Function: Solves for the minimax optimal sampler over the neighborhood \(N_\gamma(P_0)\).
- Mechanism: Shows that the local minimax risk equals the global minimax risk obtained by setting \(c_1 = 1/\gamma\), \(c_2 = \gamma\), and \(h = p_0\).
- Formula: The local solution follows directly by substituting these values of \(c_1\), \(c_2\), and \(h\) into the global result (spelled out below this list).
- Design Motivation: The neighborhood \(N_\gamma(P_0)\) constrains the density ratio \(p(x)/p_0(x) \in [1/\gamma, \gamma]\), which naturally corresponds to the distribution class parameters in the global framework.
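Spelled out, substituting \(c_1 = 1/\gamma\), \(c_2 = \gamma\), and \(h = p_0\) into the formula of Theorem 3.4 gives the locally minimax optimal linear sampler

$$
q^\star(P)(x) = \lambda^\star_{1/\gamma,\,\gamma,\,g}\; p(x) + \bigl(1 - \lambda^\star_{1/\gamma,\,\gamma,\,g}\bigr)\, p_0(x).
$$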
#### 3. Locally Minimax Optimal Sampler — Pure LDP (Theorem 5.1)
- Function: Derives a stronger, pointwise optimal sampler under pure \(\varepsilon\)-LDP.
- Core Formula:

$$
q(x) = \operatorname{clip}\!\left(\frac{1}{r_P}\, p(x);\; \frac{\gamma+1}{\gamma+e^\varepsilon}\, p_0(x),\; \frac{(\gamma+1)\, e^\varepsilon}{\gamma+e^\varepsilon}\, p_0(x)\right),
$$

  where \(r_P\) is a normalization constant chosen so that \(q\) integrates to one.
- Key Advantage: This is a nonlinear sampler that adapts pointwise to each input distribution \(P\) via clipping, strictly dominating the linear sampler of Theorem 4.1 (Proposition 5.2).
- Design Motivation: The linear sampler applies a uniform transformation across all distributions, whereas the nonlinear sampler minimizes the \(f\)-divergence for each specific \(P\) in an instance-optimal manner.
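A minimal sketch of this clipping sampler for a finite alphabet, assuming the normalizer can be found by bisection (the clipped mass is monotone nonincreasing in \(r_P\)); the function name and tolerances are illustrative, not taken from the paper's code:

```python
import numpy as np

def clipped_sampler_density(p, p0, eps, gamma):
    """Theorem 5.1 sampler on a finite alphabet: q = clip(p / r_P; lo, hi),
    with r_P chosen by bisection so the clipped density sums to one."""
    scale = (gamma + 1.0) / (gamma + np.exp(eps))
    lo = scale * p0                  # lower clipping envelope
    hi = scale * np.exp(eps) * p0    # upper clipping envelope

    def mass(r):                     # total clipped mass; nonincreasing in r
        return np.clip(p / r, lo, hi).sum()

    r_lo, r_hi = 1e-9, 1e9
    for _ in range(200):             # bisection on the normalizer r_P
        r = 0.5 * (r_lo + r_hi)
        if mass(r) > 1.0:
            r_lo = r                 # too much mass: increase r
        else:
            r_hi = r
    q = np.clip(p / r, lo, hi)
    return q / q.sum()               # absorb bisection round-off
```

Releasing one sample is then `rng.choice(len(q), p=q)`. Note that \(\sum_x \text{lo}(x) \le 1 \le \sum_x \text{hi}(x)\) holds for every \(\varepsilon \ge 0\) and \(\gamma \ge 1\), so a valid \(r_P\) always exists.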
### Theoretical Properties
- \(f\)-Divergence Independence: The structure of the optimal sampler does not depend on the specific choice of \(f\)-divergence (KL, TV, Hellinger, etc.).
- Pointwise Dominance: \(D_f(P \| \mathbf{Q}^\star_\varepsilon(P)) \leq D_f(P \| \mathbf{Q}^\star_{g_\varepsilon}(P))\) for all \(P \in N_\gamma(P_0)\).
- Generality: Pure LDP, approximate LDP, and Gaussian LDP are all special cases of functional LDP.
## Key Experimental Results

### Discrete Setting (\(k=20\), uniform reference)
| \(\varepsilon\) | Metric | Global Optimal | Local Optimal |
|---|---|---|---|
| 0.1 | KL | ~0.18 | ~0.065 |
| 0.5 | KL | ~0.14 | ~0.035 |
| 1.0 | KL | ~0.10 | ~0.018 |
| 2.0 | KL | ~0.04 | ~0.004 |
| 0.1 | TV | ~0.14 | ~0.09 |
| 1.0 | TV | ~0.08 | ~0.04 |
The local sampler consistently outperforms the global sampler across all privacy levels and divergence metrics.
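For intuition only, the discrete protocol can be mimicked in a few lines using the clipping sampler sketched above; the projection loop for drawing a random \(P \in N_\gamma(P_0)\) is this sketch's assumption, not the authors' procedure, and no particular numbers are claimed:

```python
import numpy as np

rng = np.random.default_rng(0)
k, gamma, eps = 20, 3, 1.0
p0 = np.full(k, 1.0 / k)          # uniform public reference

# Draw a random P, then alternate clipping to the density-ratio box
# p/p0 in [1/gamma, gamma] and renormalizing until it settles in N_gamma(p0).
p = rng.dirichlet(np.ones(k))
for _ in range(500):
    p = np.clip(p, p0 / gamma, gamma * p0)
    p = p / p.sum()

q = clipped_sampler_density(p, p0, eps, gamma)   # sampler from the sketch above
print("KL(P || Q(P)) =", float(np.sum(p * np.log(p / q))))
```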
### Continuous Setting (1D Laplace mixture, 100 client distributions)
| \(\varepsilon\) | Metric | Global Worst-Case | Local Worst-Case |
|---|---|---|---|
| 0.1 | KL | ~0.18 | ~0.04 |
| 0.5 | KL | ~0.12 | ~0.02 |
| 1.0 | KL | ~0.07 | ~0.008 |
| 2.0 | KL | ~0.025 | ~0.002 |
| 0.1 | Hellinger² | ~0.06 | ~0.02 |
| 1.0 | Hellinger² | ~0.028 | ~0.003 |
### Key Findings
- Consistent and significant improvement: The local sampler reduces the worst-case \(f\)-divergence by a factor of 2–10 across all settings.
- Larger gains at low privacy budgets (small \(\varepsilon\)): At \(\varepsilon=0.1\), KL divergence drops from 0.18 to 0.04.
- Pure LDP nonlinear sampler outperforms functional LDP linear sampler: Pointwise clipping yields additional gains.
- Effectiveness under Gaussian LDP: Experimental results are provided in the appendix.
## Highlights & Insights
- Elegant theoretical reduction: The local minimax problem reduces to a specially parameterized global minimax problem in a single step.
- \(f\)-Divergence independence: A remarkably strong theoretical guarantee—optimality holds regardless of the divergence used to measure utility.
- Clipping as optimality: The nonlinear sampler under pure LDP is essentially a clipping operation—both intuitive and concise.
- Effective use of public data: Unlike in Zamanlooy et al., the public data here genuinely reduces the minimax risk.
- From global pessimism to local realism: The paper provides theoretical justification that leveraging prior information can substantially improve the privacy–utility tradeoff.
## Limitations & Future Work
- Evaluation limited to synthetic data: Experiments on high-dimensional real-world datasets are absent.
- Computational cost: The clipping sampler incurs high computational overhead in high dimensions due to the normalization constant \(r_P\).
- Selection of neighborhood parameter \(\gamma\): How to determine \(P_0\) and \(\gamma\) in practice remains an open problem.
- Single-sample setting: The framework addresses single-sample generation only; extensions to the multi-sample regime are not considered.
- Integer constraint \(\gamma \in \mathbb{N}\): A technical limitation, though it can be overcome via amplification.
- Generalization from \(E_\gamma\)-divergence neighborhoods to general \(f\)-divergence neighborhoods: Left for future work.
## Related Work & Insights
- Relation to Park et al. (NeurIPS'24): This work builds directly on their global framework; the key innovation is the introduction of neighborhood constraints.
- Distinction from Zamanlooy et al.: Their approach requires samplers to preserve the public prior (a hard constraint), whereas this paper treats public data as the center of a neighborhood (a soft constraint)—the latter is more flexible and provably reduces the minimax risk.
- Feldman et al.: Employ the same \(E_\gamma\)-neighborhood definition, but in the context of density estimation rather than sampling.
- Broader inspiration: Applying the local minimax lens to other DP problems (distribution estimation, learning, generation) is a natural next step.
## Rating

⭐⭐⭐⭐ (4/5)

- The theoretical contributions are elegant and refined; the reduction from global to local minimax is impressive.
- However, the work lacks experiments on real-world data and validation of high-dimensional scalability.