
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Conference: NeurIPS 2025 | arXiv: 2507.06969 | Code: None | Area: AI Safety / Differential Privacy
Keywords: Differential Privacy, Re-identification, Attribute Inference, Data Reconstruction, f-DP

TL;DR

Working in the hypothesis-testing-based f-DP framework, this paper gives a unified characterization of three classes of privacy risk in differential privacy (re-identification, attribute inference, and data reconstruction), yielding tighter and mutually consistent risk upper bounds that permit roughly 20% less noise without weakening the privacy guarantee.

Background & Motivation

Limitations of Prior Work

Differential privacy (DP) mechanisms are difficult to interpret and calibrate for three reasons:

Fragmented risk types: Re-identification, attribute inference, and data reconstruction represent three distinct privacy risks, and existing methods provide bounds in incompatible forms for each.

Excessive pessimism: Bounds derived from ε-DP, Rényi DP, and concentrated DP tend to be overly conservative.

Inconsistency: Risk assessments produced by different DP variants are mutually contradictory, making it difficult for practitioners to select an appropriate framework.

Goal: The central contribution of this paper is to demonstrate, within the f-DP (hypothesis testing DP) framework, that the success-rate upper bounds for all three classes of attacks admit a single unified mathematical form.

Method

Overall Architecture

  1. All three privacy risks are uniformly modeled as hypothesis testing problems.
  2. The trade-off function of f-DP is used to derive unified upper bounds for each risk (see the numerical sketch after this list).
  3. The bounds are shown to be tunable, supporting evaluation at arbitrary baseline risk levels.
  4. Noise calibration based on the unified bounds reduces unnecessary noise injection.
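
To make steps 1–2 concrete, here is a minimal numerical sketch. The μ-GDP instantiation and the function names are illustrative assumptions, not the paper's code: the Gaussian mechanism's trade-off function caps the power of any test an adversary can run between the output distributions of two neighboring datasets.

```python
# Minimal sketch, assuming a mu-GDP mechanism (illustrative only).
# The trade-off function f_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)
# lower-bounds the type-II error of any test between the output
# distributions induced by two neighboring datasets.
from scipy.stats import norm

def tradeoff_gaussian(alpha: float, mu: float) -> float:
    """Type-II error lower bound f_mu(alpha) under mu-GDP."""
    return norm.cdf(norm.ppf(1.0 - alpha) - mu)

def attack_power_bound(alpha: float, mu: float) -> float:
    """No attacker with false-positive rate alpha exceeds this power."""
    return 1.0 - tradeoff_gaussian(alpha, mu)

# At mu = 1 and a 5% false-positive budget, power is capped near 0.26.
print(attack_power_bound(0.05, mu=1.0))
```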

Key Designs

  1. Unified hypothesis-testing formulation:

    • Re-identification risk: H₀: an individual's record is in the dataset vs. H₁: it is not.
    • Attribute inference risk: H₀: an individual's attribute is A vs. H₁: the attribute is B.
    • Data reconstruction risk: H₀: the data record is \(x\) vs. H₁: the record is \(x'\).
    • All three reduce to a hypothesis test between the output distributions induced by two neighboring datasets.
  2. Derivation of the unified bound:

    • The trade-off function \(f\) of f-DP is employed.
    • The attack success probability is bounded above by a common transformation of \(f\).
    • Key property: the bound takes the same mathematical form across all three attack types.
  3. Tunability:

    • A baseline risk parameter \(\beta\) is introduced to represent the adversary's prior success rate under no protection.
    • The bound varies with \(\beta\), enabling fine-grained evaluation for different attack scenarios (a sweep over \(\beta\) is sketched after this list).
    • The worst-case setting (extreme values of \(\beta\)) is recovered as a special case.
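
The sweep below illustrates the tunability claim. Evaluating the bound as the achievable power at type-I error \(\beta\) under μ-GDP is my assumption for illustration; the paper's exact form may differ.

```python
# Illustrative tunability sweep (assumed form: power bound at type-I
# error beta under mu-GDP; not necessarily the paper's exact g).
from scipy.stats import norm

def gdp_power_bound(beta: float, mu: float) -> float:
    """1 - f_mu(beta): best achievable power at type-I error beta."""
    return 1.0 - norm.cdf(norm.ppf(1.0 - beta) - mu)

for beta in (0.01, 0.1, 0.5, 0.9):
    # The bound is tight for low-baseline adversaries and approaches
    # the worst case as beta grows.
    print(f"beta={beta:.2f}  bound={gdp_power_bound(beta, mu=1.0):.3f}")
```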

Loss & Training

No model training is involved. The central theoretical result is:

\[P[\text{attack success}] \leq g(f, \beta, \varepsilon)\]

where \(g\) is a unified functional form that is independent of the attack type.
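
The review does not spell out \(g\); as a purely illustrative assumption (not the paper's stated form), one natural instantiation evaluates the adversary's achievable power at the baseline risk:

\[P[\text{attack success}] \leq 1 - f(\beta), \qquad \text{e.g. } 1 - \Phi\big(\Phi^{-1}(1-\beta) - \mu\big) \text{ under } \mu\text{-GDP},\]

with the same right-hand side regardless of which of the three hypothesis tests the adversary runs.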

Key Experimental Results

Noise Calibration Comparison

Main Results

| Method | Privacy Budget ε | Noise Std. σ | Text Classification Accuracy (%) ↑ | Risk Upper Bound ↓ |
|---|---|---|---|---|
| ε-DP bound | 1.0 | 8.5 | 52.3 | 0.63 |
| Rényi DP bound | 1.0 | 7.2 | 58.5 | 0.58 |
| Concentrated DP bound | 1.0 | 6.8 | 61.2 | 0.55 |
| Ours (f-DP unified bound) | 1.0 | 5.5 | 70.1 | 0.52 |
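
A sketch of how such calibration could be automated. The function names, the μ-GDP model with \(\mu = \text{sensitivity}/\sigma\), the bound \(1 - f_\mu(\beta)\), and the target value are all illustrative assumptions, not the paper's code:

```python
# Hedged calibration sketch: find the smallest Gaussian noise sigma
# whose risk bound meets a target level.
from scipy.optimize import brentq
from scipy.stats import norm

def risk_bound(sigma: float, sensitivity: float, beta: float) -> float:
    """Power bound 1 - f_mu(beta) with mu = sensitivity / sigma."""
    mu = sensitivity / sigma
    return 1.0 - norm.cdf(norm.ppf(1.0 - beta) - mu)

def calibrate_sigma(target: float, sensitivity: float = 1.0,
                    beta: float = 0.05) -> float:
    """Smallest sigma whose risk bound is at most `target`.
    The bound decreases monotonically in sigma, so a root bracket
    suffices."""
    return brentq(lambda s: risk_bound(s, sensitivity, beta) - target,
                  a=1e-3, b=1e3)

# A tighter analysis that certifies the same target with a smaller
# sigma is exactly what buys the accuracy gains in the table above.
print(calibrate_sigma(target=0.52))
```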

Unified Evaluation of Three Risk Types

Ablation Study

| Attack Type | ε-DP Bound | Rényi DP Bound | Unified Bound (Ours) | Empirical Attack Rate |
|---|---|---|---|---|
| Re-identification (β=0.01) | 0.92 | 0.85 | 0.65 | 0.12 |
| Re-identification (β=0.5) | 0.98 | 0.95 | 0.78 | 0.35 |
| Attribute inference | 0.89 | 0.82 | 0.62 | 0.18 |
| Data reconstruction | 0.95 | 0.91 | 0.71 | 0.08 |

Key Findings

  1. The proposed bounds are 20–30% tighter than those of ε-DP and more closely approximate empirical attack rates.
  2. The 20% noise reduction translates directly into an accuracy improvement from 52% to 70% on text classification tasks.
  3. The unified bound outperforms existing specialized methods across all three attack types.
  4. The tunable baseline risk parameter enables more realistic threat-model-specific risk assessment.

Highlights & Insights

  • Unified framework: This is the first work to analyze all three major privacy risks under a single mathematical framework.
  • Practical improvement: A 20% noise reduction directly yields measurable gains in model utility.
  • Theoretical elegance: The f-DP framework characterizes the privacy–utility trade-off more naturally than ε-DP.
  • Tunability: Practitioners can adjust the baseline risk to match their specific threat model.

Limitations & Future Work

  1. The trade-off function underlying f-DP presents a non-trivial conceptual barrier for non-specialist users.
  2. Empirical validation is conducted primarily on text classification tasks; evaluation in other domains is insufficient.
  3. Extension to compositional DP (multiple queries) warrants further investigation.
  4. Accurate estimation of the baseline risk \(\beta\) in real-world deployments may be challenging.

Related Work

  • f-DP (Dong et al., 2022): A hypothesis-testing-based DP definition that forms the theoretical foundation of this work.
  • Rényi DP (Mironov, 2017): A DP variant based on Rényi divergence.
  • Membership Inference Attacks: Practical instantiations of re-identification attacks.
  • Attribute Inference (Yeom et al., 2018): Formalization of attribute inference attacks.

Rating

| Dimension | Score (1–5) |
|---|---|
| Novelty | 4 |
| Theoretical Depth | 5 |
| Experimental Thoroughness | 4 |
| Writing Quality | 4 |
| Value | 4 |
| Overall Recommendation | 4 |