A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

Conference: ECCV 2024
arXiv: 2407.13863
Code: https://github.com/final-solution/IF-GMI
Area: Image Generation
Keywords: Model Inversion Attack, GAN Prior, Intermediate Features, StyleGAN2, OOD

TL;DR

Proposes IF-GMI, which decomposes the generator of a pre-trained StyleGAN2 into multiple blocks and optimizes intermediate features layer-by-layer (with an $\ell_1$-ball constraint to prevent image collapse). This expands the search space of model inversion attacks from the latent space to intermediate features, raising Acc@1 in OOD scenarios by up to 38.8 percentage points over PPA.

Background & Motivation

Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training images (such as faces) from a deployed classifier, posing a significant threat to privacy and security in deep learning. In recent years, GAN-based MI attacks have become mainstream: they first pre-train a GAN on public data, and then optimize the latent code of the GAN to generate images matching the target class.

However, existing methods (such as GMI, KEDMI, PPA, PLGMI) treat the GAN as a black box and optimize only in the input latent space. This leads to two major limitations of prior work:
1. Insufficient semantic extraction: The latent code is located at the very front of the GAN, far from the output, limiting its representation capacity. Attack performance degrades severely in OOD (Out-Of-Distribution) scenarios, where the distributions of the public and private datasets differ significantly.
2. Poor transferability: When the training distribution of the GAN differs substantially from the target data (e.g., using MetFaces art paintings vs. real faces), optimizing only the latent code can barely bridge the distribution gap.

Core Problem

How can the representation bottleneck of the GAN latent space be overcome, and the rich hierarchical semantic information inside the GAN be leveraged, to improve model inversion attacks, especially when the private training data distribution and the GAN prior are severely mismatched (OOD)?

This problem is crucial because, in reality, attackers are unlikely to obtain public data with a distribution close to the private data to train the GAN; thus, OOD scenarios represent a more realistic threat model.

Method

Overall Architecture

The core mechanism of IF-GMI is "decomposing GAN $\to$ layer-by-layer search of intermediate features". The specific pipeline is as follows:

Sampling and initial selection: Sample a large number of $\mathbf{z}$ from a Gaussian distribution, map them to the $\mathcal{W}$ space using the StyleGAN2 Mapping Network to obtain $\mathbf{w}$, and then use data augmentation and target classifier scores to select high-quality initial vectors $\mathbf{w}_{init}$.
Latent code optimization: Perform standard optimization on $\mathbf{w}$ in the $\mathcal{W}$ space first (similar to PPA).
Layer-by-layer intermediate feature optimization: Decompose the Synthesis Network into $L+1$ blocks, optimize the intermediate features $\mathbf{f}^{(i)}$ sequentially from layer 1 to layer $L$, with each layer updated under an $\ell_1$-ball constraint.
Output and selection: Generate the final reconstructed image.
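
The staged search in steps 2 to 4 can be illustrated with a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in: the four toy blocks replace the StyleGAN2 synthesis blocks, a squared-error loss replaces the identity loss, finite differences replace autograd, plain gradient descent replaces Adam, and the step counts are arbitrary. The per-block style input $\mathbf{w}$ and the $\ell_1$-ball constraint are left out to keep the control flow visible.

```python
# Toy sketch only: blocks, loss, and all numbers are hypothetical stand-ins.

def run(blocks, x):
    """Apply the given blocks in order: blocks[-1](...blocks[0](x))."""
    for block in blocks:
        x = block(x)
    return x

def numeric_grad(fn, x, h=1e-5):
    """Central finite-difference gradient (stands in for autograd)."""
    grad = []
    for k in range(len(x)):
        up = x[:k] + [x[k] + h] + x[k + 1:]
        down = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((fn(up) - fn(down)) / (2 * h))
    return grad

def staged_search(blocks, w_init, loss_fn, steps_per_stage, lr=0.1):
    """Stage 0 optimizes the latent code; stage i >= 1 optimizes f^(i)."""
    assert len(steps_per_stage) == len(blocks)
    h = list(w_init)
    for i, n_steps in enumerate(steps_per_stage):
        tail = blocks[i:]  # blocks still lying between h and the image
        for _ in range(n_steps):
            grad = numeric_grad(lambda x: loss_fn(run(tail, x)), h)
            h = [a - lr * g for a, g in zip(h, grad)]
        h = blocks[i](h)  # becomes f^(i+1), or the image after the last stage
    return h

# L = 3 searched feature layers, hence L + 1 = 4 toy blocks.
blocks = [
    lambda f: [2.0 * a for a in f],
    lambda f: [a + 1.0 for a in f],
    lambda f: [0.5 * a for a in f],
    lambda f: [a - 1.0 for a in f],
]
target = [2.0, 0.0]
loss = lambda img: sum((a - t) ** 2 for a, t in zip(img, target))

w_init = [1.0, -1.0]
initial_loss = loss(run(blocks, w_init))
image = staged_search(blocks, w_init, loss, steps_per_stage=[5, 2, 2, 2])
final_loss = loss(image)
print(initial_loss, final_loss)
```

The point of the sketch is the loop structure: each stage optimizes one representation while only the remaining blocks are evaluated, and the optimized result is then pushed through the next block to initialize the following stage.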

Key Designs

Intermediate Features Optimization: This is the core contribution of this work. After decomposing $G_{syn} = G_{L+1} \circ G_L \circ \cdots \circ G_1$, gradient-based optimization is performed on the feature $\mathbf{f}^{(i)}$ passed between consecutive blocks. During optimization, both $\mathbf{f}^{(i)}$ and $\mathbf{w}^{(i)}$ are updated. The advantage is that intermediate features are closer to the output than the latent code, so they offer stronger representation capacity and semantic control. The early blocks control the overall structure (pose, face shape), while the later ones control local details (eye openness, hair strands). Optimizing layer-by-layer allows for fine-tuning features at different granularities.
$\ell_1$-ball Constraint: Direct optimization of intermediate features can easily lead to generation collapse (features deviating from the manifold learned by the GAN). Therefore, a constraint $\|\mathbf{f}^{(i)} - \mathbf{f}^{(i)}_0\|_1 \leq r[i]$ is imposed on each layer's features, where the radius sequence is set to be increasing (e.g., $[1000, 2000, 3000, 4000]$) to allow deeper features more degrees of freedom for adjustment. This design is critical—without it, the model can achieve high prediction confidence but the generated images lack realism.
Using Pre-trained StyleGAN2 instead of Training a GAN from Scratch: Unlike GMI/KEDMI, which require training a specialized GAN tailored to the target model, IF-GMI directly utilizes a publicly available pre-trained StyleGAN2. This ensures attack flexibility and cross-model/cross-dataset transferability.
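
One common way to enforce a constraint of the form $\|\mathbf{f}^{(i)} - \mathbf{f}^{(i)}_0\|_1 \leq r[i]$ is to project the offset from $\mathbf{f}^{(i)}_0$ back onto the $\ell_1$ ball after every gradient step. The sketch below uses the standard sort-based Euclidean projection; whether the authors' code uses exactly this projection is an assumption, and the function names `project_onto_l1_ball` and `constrained_step` are hypothetical.

```python
import math

def project_onto_l1_ball(delta, radius):
    """Euclidean projection of `delta` onto {x : ||x||_1 <= radius}."""
    if radius <= 0:
        return [0.0] * len(delta)
    magnitudes = [abs(x) for x in delta]
    if sum(magnitudes) <= radius:
        return list(delta)  # already inside the ball
    # Find the soft-threshold theta from the sorted magnitudes.
    cumulative = 0.0
    theta = 0.0
    for k, u in enumerate(sorted(magnitudes, reverse=True), start=1):
        cumulative += u
        candidate = (cumulative - radius) / k
        if u - candidate > 0:
            theta = candidate  # the last k that passes gives the threshold
    # Shrink every entry toward zero by theta, keeping its sign.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in delta]

def constrained_step(f, f0, grad, lr, radius):
    """One gradient step on f, then projection of (f - f0) onto the l1 ball."""
    stepped = [a - lr * g for a, g in zip(f, grad)]
    delta = project_onto_l1_ball([s - b for s, b in zip(stepped, f0)], radius)
    return [b + d for b, d in zip(f0, delta)]

# Hypothetical numbers: the raw step would move f by an l1 distance of 4.5,
# so the projection pulls it back to distance 2.0 from f0.
f0 = [0.0, 0.0, 0.0]
f_new = constrained_step(f0, f0, grad=[-3.0, 1.0, -0.5], lr=1.0, radius=2.0)
print(f_new, sum(abs(a - b) for a, b in zip(f_new, f0)))
```

Because the projection acts on the offset from the starting feature rather than on the feature itself, a larger radius for deeper layers translates directly into more room to move away from what the generator originally produced.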

Loss & Training

Identity Loss: Poincaré loss is adopted instead of cross-entropy to avoid gradient vanishing when prediction confidence approaches 1: $$\mathcal{L}_{id} = \text{arccosh}\left(1 + \frac{2\|v_1 - v_2\|_2^2}{(1-\|v_1\|_2^2)(1-\|v_2\|_2^2)}\right)$$ where $v_1$ is the normalized prediction confidence, and $v_2$ is the one-hot target vector (where 1 is replaced by 0.9999 to avoid division by zero).
Optimizer: Adam, lr=0.005, $(\beta_1, \beta_2)=(0.1, 0.1)$.
Iteration Steps: $[50, 10, 10, 10]$ for FaceScrub, $[70, 25, 25, 25]$ for CelebA (more steps for latent code optimization, fewer steps for intermediate layer optimization).
Layer Selection: $L=3$ (first 3 intermediate feature layers), determined via small-scale pilot experiments. An $L$ that is too small causes underfitting, while an $L$ that is too large affects local details and leads to overfitting.
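
The identity loss above can be written out directly from the formula. This is a minimal sketch in plain Python: normalizing the classifier outputs by their $\ell_1$ norm is an assumption about what "normalized prediction confidence" means, the denominator floor is a numerical guard added here, and the name `poincare_loss` is hypothetical.

```python
import math

def poincare_loss(outputs, target_index, eps=1e-4):
    """Poincare distance between normalized outputs and a softened one-hot target."""
    # v1: classifier outputs scaled to unit l1 norm (assumed normalization).
    l1 = sum(abs(x) for x in outputs) or 1.0
    v1 = [x / l1 for x in outputs]
    # v2: one-hot target with the 1 replaced by 1 - eps = 0.9999.
    v2 = [0.0] * len(outputs)
    v2[target_index] = 1.0 - eps
    sq_dist = sum((a - b) ** 2 for a, b in zip(v1, v2))
    sq_norm1 = sum(a * a for a in v1)
    sq_norm2 = sum(b * b for b in v2)
    # Floor on the denominator is an added guard, not part of the formula.
    denom = max((1.0 - sq_norm1) * (1.0 - sq_norm2), 1e-12)
    return math.acosh(1.0 + 2.0 * sq_dist / denom)

# Hypothetical 3-class outputs: the first favors class 0, the second class 2.
near = poincare_loss([0.7, 0.2, 0.1], target_index=0)
far = poincare_loss([0.1, 0.2, 0.7], target_index=0)
print(near, far)
```

Outputs that already favor the target class yield the smaller value, which is the behavior an identity loss needs.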

Key Experimental Results

Dataset (Public $\to$ Private)	Target Model	Metric	PPA	IF-GMI	Gain
FFHQ $\to$ FaceScrub	ResNet-18	Acc@1	0.812	0.830	+1.8%
MetFaces $\to$ FaceScrub	ResNet-18	Acc@1	0.775	0.926	+15.1%
FFHQ $\to$ CelebA	ResNet-152	Acc@1	0.841	0.947	+10.6%
MetFaces $\to$ CelebA	ResNet-152	Acc@1	0.396	0.784	+38.8%
MetFaces $\to$ FaceScrub	ResNet-152	Acc@1	0.731	0.904	+17.3%
MetFaces $\to$ FaceScrub	ResNeSt-101	Acc@1	0.750	0.922	+17.2%
MetFaces $\to$ FaceScrub	DenseNet-169	Acc@1	0.798	0.933	+13.5%
AFHQ $\to$ Stanford Dogs	ResNet-152	Acc@1	0.950	0.982	+3.2%

The improvement is most significant in the extreme OOD scenario of MetFaces (art paintings) $\to$ real faces, demonstrating that intermediate feature optimization effectively bridges the distribution gap.
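
The Gain column is the absolute difference between the two Acc@1 values, expressed in percentage points. The short script below recomputes it from the table so the figures can be checked; the tuple layout and the name `gain_points` are choices made for this sketch.

```python
# Acc@1 values copied from the table: (public, private, target model, PPA, IF-GMI).
rows = [
    ("FFHQ", "FaceScrub", "ResNet-18", 0.812, 0.830),
    ("MetFaces", "FaceScrub", "ResNet-18", 0.775, 0.926),
    ("FFHQ", "CelebA", "ResNet-152", 0.841, 0.947),
    ("MetFaces", "CelebA", "ResNet-152", 0.396, 0.784),
    ("MetFaces", "FaceScrub", "ResNet-152", 0.731, 0.904),
    ("MetFaces", "FaceScrub", "ResNeSt-101", 0.750, 0.922),
    ("MetFaces", "FaceScrub", "DenseNet-169", 0.798, 0.933),
    ("AFHQ", "Stanford Dogs", "ResNet-152", 0.950, 0.982),
]

def gain_points(ppa, ifgmi):
    """Absolute Acc@1 gain of IF-GMI over PPA, in percentage points."""
    return round((ifgmi - ppa) * 100, 1)

gains = [(pub, priv, model, gain_points(ppa, ours))
         for pub, priv, model, ppa, ours in rows]
largest = max(gains, key=lambda g: g[3])

for pub, priv, model, g in gains:
    print(f"{pub} -> {priv} ({model}): +{g} points")
print("largest gain:", largest)
```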

Ablation Study

Intermediate feature optimization contributes the most: Removing it drops Acc@1 from 0.947 to 0.803 (-14.4 percentage points) and increases FID from 37.46 to 43.58.
$\ell_1$-ball constraint: Removing it leaves Acc@1 almost unchanged (0.945 vs 0.947), but slightly increases FID (37.53 vs 37.46), showing that the constraint primarily preserves image quality rather than attack accuracy.
Choice of layer $L$: $L=3$ achieves the best balance. When $L=1$, the result also depends on where the split is placed, and splitting at one of the first 3-4 blocks performs best.
Robustness: Under BiDO defense, the Acc@1 of IF-GMI drops by only 14.1 percentage points (0.906 $\to$ 0.765), whereas that of PPA drops by 26.3 points (0.619 $\to$ 0.356).
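
The 14.1 figure in the robustness comparison is an absolute difference in Acc@1. Reporting the relative drop as well makes the comparison fairer, since PPA starts from a lower undefended accuracy. The helper below derives both from the four numbers quoted above; the name `accuracy_drop` is hypothetical.

```python
def accuracy_drop(before, after):
    """Return (absolute drop in points, relative drop in percent of `before`)."""
    absolute = round((before - after) * 100, 1)
    relative = round((before - after) / before * 100, 1)
    return absolute, relative

# Acc@1 without and with the BiDO defense, as quoted in the robustness item.
ifgmi_drop = accuracy_drop(0.906, 0.765)
ppa_drop = accuracy_drop(0.619, 0.356)

print("IF-GMI (points, relative %):", ifgmi_drop)
print("PPA    (points, relative %):", ppa_drop)
```

On both measures the defended IF-GMI attack loses less than the defended PPA attack.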

Highlights & Insights

Deconstructing GAN from a black box to a modularized tool: This is a direct yet previously unexplored path—hierarchical semantic information inside the GAN was historically wasted. This work is the first to systematically exploit it in MI attacks.
$\ell_1$-ball constraint is a simple, low-cost safeguard: Constraining the optimization drift of high-dimensional intermediate features guards against generation collapse at almost zero cost, although its measured effect in the reported ablation is small (Acc@1 0.947 vs 0.945, FID 37.46 vs 37.53).
Massive improvement under OOD scenarios: A gain of 38.8 percentage points on MetFaces $\to$ CelebA is highly impressive, indicating that intermediate features adapt across distributions better than latent codes do.
Transferable to other scenarios: The concept of intermediate layer feature optimization can be generalized to tasks like GAN inversion and image editing.

Limitations & Future Work

Relatively high FID: The authors concede that the reconstructed images suffer from higher FID scores—optimizing high-dimensional intermediate features is highly complex, and they continue to use loss functions designed for latent codes without tailor-made optimization strategies for intermediate features.
Manual tuning required for layer $L$: The optimal number of layers and splitting method depend on specific dataset combinations, requiring empirical search on a small scale.
Limited to StyleGAN2: Although transferability is better than prior methods, it is only validated on StyleGAN2; its efficacy on newer generative models like diffusion models remains unknown.
White-box assumption: The study only considers white-box attack scenarios; its applicability to black-box settings is not explored.
Potential directions: Designing specialized regularization/loss functions for intermediate features; combining with diffusion model priors; extending to federated learning scenarios.

vs PPA (ICML 2022): PPA also uses a pre-trained StyleGAN2, but only optimizes the latent code in the $\mathcal{W}$ space. IF-GMI builds on this by decomposing the Synthesis Network and optimizing intermediate features layer-by-layer. While the performance gap is minor on FFHQ $\to$ FaceScrub (+1.8 points), it is large in the MetFaces OOD scenarios (+13.5 to +38.8 points), suggesting that the representation capacity of the latent code is severely bottlenecked under distribution mismatch.
vs PLGMI (AAAI 2023): PLGMI trains a conditional GAN using pseudo-labels. It is competitive on some metrics but suffers from poor image quality (FID as high as 200+). IF-GMI comprehensively outperforms it across all metrics.
vs LOMMA (CVPR 2023): LOMMA enhances attacks by training surrogate models through model distillation; it is an orthogonal, plug-and-play technology that can be combined with IF-GMI.
Insights & Connections: The concept of intermediate feature optimization can be generalized to privacy protection settings. Once we know attackers can exploit GAN intermediate layers, defenders can target these layers with tailored feature perturbations. Furthermore, the concept of utilizing an $\ell_1$-ball constraint to bound optimization drift is highly transferable to contexts like adversarial perturbation constraints and guided generation in diffusion models.

Rating

Novelty: ⭐⭐⭐⭐ The core mechanism (decomposing GANs to utilize intermediate features) is straightforward yet effective, and was indeed unexplored in MI attacks.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Thorough evaluation across multiple datasets, target models, ablations, defense robustness, and layer choices.
Writing Quality: ⭐⭐⭐⭐ Well-structured with clear motivation and rigorous mathematical formulations.
Value: ⭐⭐⭐⭐ Discloses the significance of GAN intermediate layer information for MI attacks, providing cautionary insights for privacy-security research.
