Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning

Conference: NeurIPS 2025
arXiv: 2402.06674
Code: DPBayes/impact-dataset-properties-MI-vulnerability-deep-TL
Area: AI Security
Keywords: membership inference attack, differential privacy, transfer learning, power-law, privacy risk

TL;DR

This paper theoretically and empirically demonstrates a power-law relationship between membership inference attack (MIA) vulnerability and the number of samples per class in deep transfer learning: as the per-class sample count \(S\) increases, MIA advantage decays as \(S^{-1/2}\). However, the amount of data required to protect the most vulnerable samples is prohibitively large, highlighting the irreplaceable role of formal differential privacy guarantees.

Background & Motivation

Membership inference attacks (MIA) and differential privacy (DP) measure privacy leakage in machine learning from lower-bound and upper-bound perspectives, respectively. Their threat models differ: DP assumes an extremely strong adversary (with knowledge of all training data except the target point), while MIA assumes a more realistic adversary (with knowledge of only the data distribution).

Prior work has observed several isolated phenomena:

  • Models trained on more classes are more susceptible to MIA (Shokri et al., 2017)
  • Models trained on less data are more vulnerable (Chen et al., 2020)
  • Minority-class samples are more likely to be exposed (Chang and Shokri, 2021)
  • Large generalization error is sufficient for successful MIA (Song and Mittal, 2021)

Core Gap: None of these prior works quantitatively study the rate at which MIA vulnerability changes with dataset properties, nor do they analyze worst-case vulnerability under low false positive rate (FPR) conditions.

Motivation: Deep transfer learning (fine-tuning pretrained models) is widely used in privacy-sensitive settings where labeled data is often scarce. There is a need to establish a quantitative relationship between dataset properties (per-class sample count \(S\), number of classes \(C\)) and MIA vulnerability to guide practical privacy risk assessment.

Method

Problem Formulation

MIA vulnerability is defined as the true positive rate (TPR) at a fixed false positive rate (FPR). Two state-of-the-art black-box attacks are employed:

  • LiRA (Carlini et al., 2022): based on likelihood-ratio testing, using shadow models to estimate the IN/OUT score distributions; under Gaussian assumptions, it is the Neyman-Pearson optimal attack.
  • RMIA (Zarifzadeh et al., 2024): a more robust variant when the number of shadow models is limited.
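For concreteness, here is a minimal sketch of the parametric LiRA scoring step, assuming Gaussian fits to shadow-model loss statistics (function and variable names are illustrative, not the reference implementation):

```python
import numpy as np
from scipy.stats import norm

def lira_score(target_stat, in_stats, out_stats):
    """Per-sample parametric LiRA score (a log-likelihood ratio).

    target_stat: the target model's loss/confidence statistic on the sample.
    in_stats / out_stats: the same statistic from shadow models trained
    with / without the sample. Higher scores indicate "member".
    """
    mu_in, sd_in = np.mean(in_stats), np.std(in_stats) + 1e-12
    mu_out, sd_out = np.mean(out_stats), np.std(out_stats) + 1e-12
    return (norm.logpdf(target_stat, mu_in, sd_in)
            - norm.logpdf(target_stat, mu_out, sd_out))
```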

Theoretical Analysis: Simplified Model

A simplified membership inference model amenable to closed-form analysis is constructed:

  1. Data generation: Class centers \(\bm{m}_c\) are sampled orthogonally on the high-dimensional unit sphere; for each class, \(2S\) Gaussian samples \(\bm{x}_c \sim \mathcal{N}(\bm{m}_c, \Sigma)\) are drawn.
  2. Classifier construction: \(CS\) samples are randomly selected to compute class means \(\hat{\bm{m}}_c\), and the inner product \(\langle \bm{x}, \hat{\bm{m}}_c \rangle\) is used as the classification score.
  3. Adversary's goal: Infer which vectors were used to train the classifier.

This simplified model resembles the linear classification head (Head) fine-tuning commonly used in transfer learning.
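A minimal simulation of this simplified model, assuming isotropic covariance \(\Sigma = \sigma^2 I\) and illustrative sizes (the paper's exact parameters are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, S, sigma = 512, 10, 100, 0.3           # illustrative, not the paper's values

# Orthogonal unit-norm class centers: orthonormal columns of a QR factor.
Q, _ = np.linalg.qr(rng.standard_normal((d, C)))
centers = Q.T                                 # (C, d), pairwise orthogonal

# 2S Gaussian samples per class; a random half becomes the training ("IN") set.
X = centers[:, None, :] + sigma * rng.standard_normal((C, 2 * S, d))
member = np.zeros((C, 2 * S), dtype=bool)
for c in range(C):
    member[c, rng.choice(2 * S, size=S, replace=False)] = True

# "Training": class means over the member samples; score = inner product.
m_hat = np.stack([X[c][member[c]].mean(axis=0) for c in range(C)])
scores = np.einsum('csd,kd->csk', X, m_hat)   # (C, 2S, C) class scores
acc = (scores.argmax(-1) == np.arange(C)[:, None]).mean()
print(f"accuracy of the class-mean classifier: {acc:.3f}")
```

The adversary's task is then to decide, from such scores, which rows of X have member=True.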

Core Theorem: Per-Sample Power-Law

Lemma 1 reduces the per-sample LiRA vulnerability to the location and scale parameters of the IN/OUT score distributions.

Theorem 2 (Per-Sample Power-Law): Under the simplified model, for a fixed target sample \((\bm{x}, y)\):

\[\log(\text{tpr} - \text{fpr}) \approx -\frac{1}{2}\log S - \frac{1}{2}\Phi^{-1}(\text{fpr})^2 + \log\frac{|\langle \bm{x}, \bm{x} - \bm{m}_x \rangle|}{\sqrt{\bm{x}^T \Sigma \bm{x}} \sqrt{2\pi}}\]

Key implications:

  • Adversary advantage (tpr - fpr) decays as a power law \(S^{-1/2}\).
  • Samples farther from their class center (large \(\|\bm{x} - \bm{m}_x\|\)) are more vulnerable.
  • By the Cauchy-Schwarz inequality, if \(\|\bm{x} - \bm{m}_x\|\) is bounded, the worst-case vulnerability is also bounded.

Corollary 4 (Average-Case Power-Law): After taking expectation over the data distribution, average MIA vulnerability likewise follows the \(-\frac{1}{2}\log S\) power law.
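A quick numeric check of the predicted decay, treating the per-sample term in Theorem 2 as a free constant (a sketch, not the paper's evaluation code):

```python
import numpy as np
from scipy.stats import norm

def predicted_advantage(S, fpr, sample_term=0.0):
    # Theorem 2: log(tpr - fpr) ≈ -0.5*log(S) - 0.5*Phi^{-1}(fpr)^2 + sample_term
    return np.exp(-0.5 * np.log(S) - 0.5 * norm.ppf(fpr) ** 2 + sample_term)

for S in (100, 400, 1600):
    print(f"S={S:4d}  predicted tpr-fpr = {predicted_advantage(S, fpr=0.01):.5f}")
# Each 4x increase in S halves the advantage: the S^{-1/2} power law.
```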

Regression Prediction Model

Based on the functional form derived theoretically, a linear regression model is fitted:

\[\log_{10}(\text{tpr} - \text{fpr}) = \beta_S \log_{10}(S) + \beta_C \log_{10}(C) + \beta_0\]

This model is used to predict MIA vulnerability from dataset properties.
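A minimal sketch of fitting this regression by least squares in log-log space; the data points below are placeholders for illustration, not the paper's measurements:

```python
import numpy as np

# Placeholder grid of (per-class samples S, classes C, observed tpr - fpr).
S   = np.array([10, 100, 1000, 10, 100, 1000], dtype=float)
C   = np.array([2, 2, 2, 100, 100, 100], dtype=float)
adv = np.array([3e-2, 1e-2, 3e-3, 9e-2, 3e-2, 9e-3])

# Design matrix [log10 S, log10 C, 1] against log10(advantage).
A = np.column_stack([np.log10(S), np.log10(C), np.ones_like(S)])
(beta_S, beta_C, beta_0), *_ = np.linalg.lstsq(A, np.log10(adv), rcond=None)
print(f"beta_S={beta_S:.3f}  beta_C={beta_C:.3f}  beta_0={beta_0:.3f}")

def predict_advantage(S_new, C_new):
    # Extrapolate MIA vulnerability to a new dataset configuration.
    return 10 ** (beta_S * np.log10(S_new) + beta_C * np.log10(C_new) + beta_0)
```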

Key Experimental Results

Experimental Setup

  • Pretrained models: ViT-Base-16 (ViT-B) and ResNet-50 (R-50), both pretrained on ImageNet-21k
  • Fine-tuning strategies: Head (linear classification head), FiLM (parameter-efficient fine-tuning), training from scratch
  • Attack configuration: LiRA + RMIA with \(M=256\) shadow models
  • Datasets: Subset of the VTAB benchmark (test accuracy >80%), including Patch Camelyon, EuroSAT, CIFAR-100, etc.
  • Hyperparameter tuning: Optuna + TPE, 20 iterations
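Given attack scores from the shadow-model setup above, the reported TPR-at-fixed-FPR metric can be computed as below (a minimal sketch; names are illustrative):

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.001):
    """TPR at a fixed FPR: threshold at the (1 - fpr) quantile of the
    non-member scores, then measure the fraction of members above it."""
    thresh = np.quantile(nonmember_scores, 1.0 - fpr)
    return float(np.mean(member_scores > thresh))
```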

Table 1: Minimum Per-Class Sample Count \(S\) Required to Match DP Guarantees (\(C=2\), \(\delta=10^{-5}\))

| \(\epsilon\) | Avg. fpr=0.1 | Avg. fpr=0.01 | Avg. fpr=0.001 | Worst fpr=0.1 |
|---|---|---|---|---|
| 0.25 | 5,400 | 69,000 | 320,000 | \(5.5 \times 10^9\) |
| 0.50 | 1,100 | 16,000 | 88,000 | \(2.6 \times 10^8\) |
| 0.75 | 360 | 5,900 | 38,000 | \(3.5 \times 10^7\) |
| 1.00 | 160 | 2,700 | 19,000 | \(7.0 \times 10^6\) |

Key Finding: Even in the average case, matching \(\epsilon=1\) DP guarantees requires at least 2,700 samples per class (fpr=0.01); in the worst case, \(7 \times 10^6\) samples are needed—rendering data-volume-based protection practically infeasible.
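A sketch of how such thresholds can be derived, assuming the standard \((\epsilon, \delta)\)-DP hypothesis-testing bound on any attack's ROC curve combined with the fitted power law; the regression coefficients below are placeholders, not the paper's fitted values:

```python
import numpy as np

def dp_tpr_bound(eps, delta, fpr):
    # (eps, delta)-DP caps any membership test's TPR at a given FPR.
    return min(np.exp(eps) * fpr + delta,
               1.0 - np.exp(-eps) * (1.0 - fpr - delta))

def matching_S(eps, delta, fpr, beta_S, beta_C, beta_0, C=2):
    # Invert the fitted power law: find the S at which the predicted
    # advantage falls to the level the DP guarantee would enforce.
    target_adv = dp_tpr_bound(eps, delta, fpr) - fpr
    log10_S = (np.log10(target_adv) - beta_C * np.log10(C) - beta_0) / beta_S
    return 10.0 ** log10_S

# Placeholder coefficients, for illustration only:
print(f"S needed: {matching_S(1.0, 1e-5, 0.01, -0.5, 0.3, -0.5):.0f}")
```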

Regression Model Fit and Generalization

| Setting | \(R^2\) Score | Notes |
|---|---|---|
| Training set (ViT-B Head, fpr=0.001) | 0.930 | Excellent fit |
| Test set (R-50 Head) | 0.790 | Good cross-backbone generalization |
| R-50 FiLM | Good | Good cross-fine-tuning-strategy generalization |
| Training from scratch (Carlini et al. data) | Underestimates | Training from scratch is more vulnerable than fine-tuning |

The regression coefficient \(\beta_S\) is approximately \(-0.5\) at higher FPR values, consistent with theoretical predictions.

Per-Sample Vulnerability Analysis

  • Quantile trends: The \(\beta_S\) values for the 99th, 99.9th, and 99.99th percentiles are \(-0.5603\), \(-0.5688\), and \(-0.4796\), respectively, close to the theoretical value of \(-0.5\).
  • Maximum vulnerability: \(\beta_S = -0.2695\), indicating a significantly slower decay rate.
  • For \(S \geq 32768\), the fitted slope steepens to \(-0.3478\) but remains well short of the theoretical \(-0.5\) in magnitude, suggesting that the most vulnerable samples require substantially more data to be protected.
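The quantile slopes above can be estimated by fitting a log-log regression per quantile; here is a self-contained sketch on synthetic per-sample advantages that obey an exact \(S^{-1/2}\) law (real values would come from the shadow-model attacks):

```python
import numpy as np

rng = np.random.default_rng(0)
S_grid = np.array([256.0, 1024.0, 4096.0, 16384.0, 65536.0])

# Synthetic per-sample advantages scaling exactly as S^{-1/2}, illustration only.
base = rng.lognormal(mean=-2.0, sigma=1.0, size=5000)
per_sample_adv = base[None, :] / np.sqrt(S_grid)[:, None]     # (len(S_grid), n)

def quantile_slope(q):
    # Slope of log10(q-quantile of advantage) vs log10(S): that quantile's beta_S.
    y = np.log10(np.quantile(per_sample_adv, q, axis=1))
    return np.polyfit(np.log10(S_grid), y, 1)[0]

for q in (0.99, 0.999, 0.9999):
    print(f"quantile {q}: beta_S = {quantile_slope(q):+.3f}")  # -0.5 by construction
```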

Highlights & Insights

  • Theory–experiment closed loop: The power-law relationship \(\text{tpr} - \text{fpr} \propto S^{-1/2}\) is derived from a simplified model and validated at scale, with theoretical predictions closely matching experimental observations.
  • Quantitative bridging of MIA and DP: This is the first work to quantitatively compare empirical MIA vulnerability with formal DP guarantees via power-law extrapolation, revealing the fundamental reason why increasing data volume cannot substitute for DP.
  • Worst-case analysis: Beyond average vulnerability, the paper systematically investigates per-sample worst-case vulnerability, finding that protecting the most vulnerable samples requires orders of magnitude more data than the average case.
  • Cross-architecture generalization: The regression model trained on ViT-B generalizes reasonably well to R-50 and FiLM fine-tuning, providing practical guidance.

Limitations & Future Work

  • Attack scope: The analysis primarily considers LiRA (optimal under the simplified model); future stronger attacks may necessitate revision of the conclusions.
  • Data distribution assumptions: The simplified model assumes intra-class Gaussian distributions; more complex settings such as heavy-tailed distributions are not analyzed.
  • Transfer learning focus: The theory and most experiments are limited to fine-tuning scenarios; training from scratch yields higher vulnerability and the power law may not hold.
  • Statistical rather than formal: MIA evaluation is inherently statistical and cannot provide the universal formal guarantees offered by DP.
  • Unexplained inter-class differences: Experiments reveal significant variation in vulnerability across classes, but the underlying causes are not analyzed in depth.
  • Adversary knowledge assumptions: The adversary is assumed to know only the target point, with the remaining training set drawn randomly; the power law may fail under stronger adversarial assumptions.

Related Work

  • MIA attack methods: Shokri et al. (2017) introduced the shadow-model framework; Carlini et al. (2022) proposed LiRA (a likelihood-ratio-based optimal attack); Zarifzadeh et al. (2024) proposed RMIA (more robust with fewer shadow models).
  • Dataset properties and privacy: More classes lead to higher vulnerability (Shokri 2017); minority-class samples are more vulnerable (Chang & Shokri 2021); high generalization error enables successful MIA (Song & Mittal 2021); however, none quantify the rate of change.
  • Memorization and fine-tuning: Feldman & Zhang (2020) found that training from scratch requires substantial memorization, while fine-tuning significantly reduces it; Tobaben et al. (2023) preliminarily reported the relationship between MIA and the number of shots in few-shot classification.
  • Worst-case MIA: Recent work by Guepin et al. (2024) and Meeus et al. (2024) focuses on worst-case vulnerability but does not establish quantitative relationships with dataset properties.
  • This paper's contribution: The first work to establish a quantitative power-law relationship between MIA vulnerability and dataset properties, covering both average and worst-case settings, with quantitative comparison against DP guarantees.

Rating

  • Novelty: ⭐⭐⭐⭐ — The theoretical derivation and experimental validation of the power-law relationship are well-motivated; the quantitative MIA–DP bridging analysis is original.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Multiple datasets, architectures, fine-tuning strategies, and 256 shadow models are employed; per-sample vulnerability analysis is comprehensive; in-depth experiments on training from scratch are somewhat lacking.
  • Writing Quality: ⭐⭐⭐⭐ — Theoretical derivations are rigorous; figures and tables are rich and clear; notation is consistent throughout.
  • Value: ⭐⭐⭐⭐ — Provides a quantitative tool for practical privacy risk assessment; the conclusion that "increasing data cannot substitute for DP" carries important practical implications.