Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy¶
Conference: NeurIPS 2025 arXiv: 2507.06969 Code: None Area: AI Safety / Differential Privacy Keywords: Differential Privacy, Re-identification, Attribute Inference, Data Reconstruction, f-DP
TL;DR¶
Working within the f-DP framework, which is grounded in hypothesis testing, this paper gives a unified characterization of three classes of privacy risk in differential privacy — re-identification, attribute inference, and data reconstruction — yielding tighter, mutually consistent risk upper bounds that permit roughly 20% less noise without weakening the privacy guarantee.
Background & Motivation¶
Limitations of Prior Work¶
Background: Differential privacy (DP) mechanisms are difficult to interpret and calibrate because:
Fragmented risk types: Re-identification, attribute inference, and data reconstruction represent three distinct privacy risks, and existing methods provide bounds in incompatible forms for each.
Excessive pessimism: Bounds derived from ε-DP, Rényi DP, and concentrated DP tend to be overly conservative.
Inconsistency: Risk assessments produced by different DP variants are mutually contradictory, making it difficult for practitioners to select an appropriate framework.
Goal: The central contribution of this paper is to demonstrate, within the f-DP (hypothesis testing DP) framework, that the success-rate upper bounds for all three classes of attacks admit a single unified mathematical form.
Method¶
Overall Architecture¶
- All three privacy risks are uniformly modeled as hypothesis testing problems.
- The trade-off function of f-DP is used to derive unified upper bounds for each risk.
- The bounds are shown to be tunable, supporting evaluation at arbitrary baseline risk levels.
- Noise calibration based on the unified bounds reduces unnecessary noise injection.
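The calibration step in the last bullet can be sketched as follows. This is an illustrative sketch under assumed choices, not the paper's code: the mechanism is a sensitivity-1 Gaussian mechanism with trade-off function \(f(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - 1/\sigma)\), and the unified risk bound is taken to be \(1 - f(\beta_0)\) at baseline risk \(\beta_0\); we then bisect for the smallest noise level \(\sigma\) that keeps the bound below a target.

```python
from statistics import NormalDist

# Illustrative sketch (assumed setup, not the paper's procedure): for a
# sensitivity-1 Gaussian mechanism, f(alpha) = Phi(Phi^{-1}(1 - alpha) - 1/sigma).
# Taking the unified risk bound to be 1 - f(beta0), bisect for the smallest
# sigma whose bound stays below a target risk level.

N = NormalDist()

def risk_bound(sigma: float, beta0: float) -> float:
    """Upper bound on attack success: power of the optimal test at level beta0."""
    return 1.0 - N.cdf(N.inv_cdf(1.0 - beta0) - 1.0 / sigma)

def calibrate_sigma(target: float, beta0: float,
                    lo: float = 0.05, hi: float = 50.0) -> float:
    """Smallest sigma with risk_bound(sigma, beta0) <= target.

    The bound decreases monotonically in sigma (more noise, harder test),
    so plain bisection applies.
    """
    assert risk_bound(hi, beta0) <= target < risk_bound(lo, beta0)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if risk_bound(mid, beta0) <= target:
            hi = mid
        else:
            lo = mid
    return hi

sigma_star = calibrate_sigma(target=0.3, beta0=0.1)
```

Compared with calibrating to a worst-case ε, calibrating directly against the risk bound avoids injecting noise to defend against attack regimes the threat model rules out.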
Key Designs¶
- Unified hypothesis-testing formulation:
- Re-identification risk: H₀: an individual's record is in the dataset vs. H₁: it is not.
- Attribute inference risk: H₀: an individual's attribute is A vs. H₁: the attribute is B.
- Data reconstruction risk: H₀: the data record is \(x\) vs. H₁: the record is \(x'\).
- All three reduce to a hypothesis test between the output distributions induced by two neighboring datasets.
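As a concrete illustration of this reduction (a minimal sketch with an assumed mechanism, not the paper's code): suppose a sensitivity-1 Gaussian mechanism releases \(q(D) + \mathcal{N}(0, \sigma^2)\), with \(q(D) = 0\) under \(H_0\) and \(q(D') = 1\) under \(H_1\). The likelihood ratio is monotone in the output, so the Neyman–Pearson optimal adversary simply thresholds it, and its empirical error rates match the theoretical ones.

```python
import random
from statistics import NormalDist

# Minimal sketch (assumed setup): the mechanism releases q(D) + N(0, sigma^2),
# with q(D) = 0 under H0 and q(D') = 1 under H1 (sensitivity 1). The
# likelihood ratio is monotone in the output, so the optimal test thresholds it.

random.seed(0)
sigma, trials, thresh = 1.0, 100_000, 0.5

# Empirical type-I error (reject H0 although D was used) and type-II error
# (accept H0 although D' was used) of the threshold test.
alpha_hat = sum(random.gauss(0.0, sigma) > thresh for _ in range(trials)) / trials
beta_hat = sum(random.gauss(1.0, sigma) <= thresh for _ in range(trials)) / trials

# For this symmetric threshold both errors equal 1 - Phi(0.5) ~ 0.309.
theory = 1.0 - NormalDist().cdf(thresh / sigma)
```

The same simulation, with the pair \((D, D')\) reinterpreted, covers all three attack types: present-vs-absent record (re-identification), attribute A vs. B (attribute inference), and record \(x\) vs. \(x'\) (reconstruction).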
- Derivation of the unified bound:
- The trade-off function \(f\) of f-DP is employed.
- The attack success probability is bounded above by a common transformation of \(f\).
- Key property: the bound takes the same mathematical form across all three attack types.
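For the Gaussian mechanism the trade-off function has a closed form, \(G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)\) with \(\mu = \Delta/\sigma\) (Dong et al., 2022). A short sketch of evaluating it:

```python
from statistics import NormalDist

# Gaussian trade-off function G_mu (Dong et al., 2022): the smallest type-II
# error achievable by any test at type-I level alpha, when a sensitivity-Delta
# query is released with Gaussian noise of std sigma (mu = Delta / sigma).

N = NormalDist()

def gaussian_tradeoff(alpha: float, sigma: float, delta: float = 1.0) -> float:
    mu = delta / sigma
    return N.cdf(N.inv_cdf(1.0 - alpha) - mu)

# Sanity checks: with overwhelming noise the curve degenerates to
# f(alpha) = 1 - alpha (blind guessing), and more noise always yields a
# higher curve, i.e. a strictly harder hypothesis test for the adversary.
```

Any bound expressed as a transformation of \(f\) can then be evaluated numerically by composing with this function.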
- Tunability:
- A baseline risk parameter \(\beta\) is introduced to represent the adversary's prior success rate under no protection.
- The bound varies with \(\beta\), enabling fine-grained evaluation for different attack scenarios.
- The worst-case setting (extreme values of \(\beta\)) is recovered as a special case.
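One way to read the tunable bound (an illustrative instantiation, not necessarily the paper's exact \(g\)): interpret the baseline risk \(\beta\) as the adversary's type-I budget, so that an f-DP mechanism caps the attack success rate by the power \(1 - f(\beta)\) of the optimal test at that level.

```python
from statistics import NormalDist

# Illustrative instantiation (an assumption, not the paper's exact g): read
# the baseline risk beta0 as the adversary's type-I budget; the attack success
# rate is then capped by the power 1 - f(beta0) of the optimal test.

N = NormalDist()

def tradeoff(alpha: float, sigma: float) -> float:
    # Gaussian trade-off function for a sensitivity-1 query, mu = 1/sigma.
    return N.cdf(N.inv_cdf(1.0 - alpha) - 1.0 / sigma)

def unified_bound(beta0: float, sigma: float) -> float:
    return 1.0 - tradeoff(beta0, sigma)

# Sweeping beta0 interpolates between the low-prior regime and the
# worst-case regime recovered at the extremes.
bounds = {b: unified_bound(b, sigma=1.0) for b in (0.01, 0.1, 0.5, 0.99)}
```

The bound is monotone in \(\beta_0\), which is what makes threat-model-specific evaluation possible: a realistic low prior yields a much smaller certified risk than the worst-case extreme.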
Loss & Training¶
No model training is involved. The central theoretical result is a unified risk bound of the form

\[ \Pr[\text{attack succeeds}] \;\le\; g(f, \beta), \]

where \(g\) is a unified functional form that is independent of the attack type, \(f\) is the mechanism's trade-off function, and \(\beta\) is the baseline risk.
Key Experimental Results¶
Noise Calibration Comparison¶
Main Results¶
| Method | Privacy Budget ε | Noise Std. | Text Classification Accuracy (%) ↑ | Risk Upper Bound |
|---|---|---|---|---|
| ε-DP bound | 1.0 | σ=8.5 | 52.3 | 0.63 |
| Rényi DP bound | 1.0 | σ=7.2 | 58.5 | 0.58 |
| Concentrated DP bound | 1.0 | σ=6.8 | 61.2 | 0.55 |
| Ours (f-DP unified bound) | 1.0 | σ=5.5 | 70.1 | 0.52 |
Unified Evaluation of Three Risk Types¶
Ablation Study¶
| Attack Type | ε-DP Bound | Rényi DP Bound | Unified Bound (Ours) | Empirical Attack Rate |
|---|---|---|---|---|
| Re-identification (β=0.01) | 0.92 | 0.85 | 0.65 | 0.12 |
| Re-identification (β=0.5) | 0.98 | 0.95 | 0.78 | 0.35 |
| Attribute inference | 0.89 | 0.82 | 0.62 | 0.18 |
| Data reconstruction | 0.95 | 0.91 | 0.71 | 0.08 |
Key Findings¶
- The proposed bounds are 20–30% tighter than those of ε-DP and more closely approximate empirical attack rates.
- The 20% noise reduction translates directly into an accuracy improvement from 52% to 70% on text classification tasks.
- The unified bound outperforms existing specialized methods across all three attack types.
- The tunable baseline risk parameter enables more realistic threat-model-specific risk assessment.
Highlights & Insights¶
- Unified framework: This is the first work to analyze all three major privacy risks under a single mathematical framework.
- Practical improvement: A 20% noise reduction directly yields measurable gains in model utility.
- Theoretical elegance: The f-DP framework characterizes the privacy–utility trade-off more naturally than ε-DP.
- Tunability: Practitioners can adjust the baseline risk to match their specific threat model.
Limitations & Future Work¶
- The trade-off function underlying f-DP presents a non-trivial conceptual barrier for non-specialist users.
- Empirical validation is conducted primarily on text classification tasks; evaluation in other domains remains limited.
- Extension to compositional DP (multiple queries) warrants further investigation.
- Accurate estimation of the baseline risk \(\beta\) in real-world deployments may be challenging.
Related Work & Insights¶
- f-DP (Dong et al., 2022): A hypothesis-testing-based DP definition that forms the theoretical foundation of this work.
- Rényi DP (Mironov, 2017): A DP variant based on Rényi divergence.
- Membership Inference Attacks: Practical instantiations of re-identification attacks.
- Attribute Inference (Yeom et al., 2018): Formalization of attribute inference attacks.
Rating¶
| Dimension | Score (1–5) |
|---|---|
| Novelty | 4 |
| Theoretical Depth | 5 |
| Experimental Thoroughness | 4 |
| Writing Quality | 4 |
| Value | 4 |
| Overall Recommendation | 4 |