Realistic Face Reconstruction from Facial Embeddings via Diffusion Models

Conference: AAAI 2026 arXiv: 2602.13168 Code: N/A Area: Image Generation Keywords: Face Reconstruction, Facial Embeddings, Privacy Attack, Kolmogorov-Arnold Network, Diffusion Models

TL;DR

This paper proposes FEM (Face Embedding Mapping), a framework that employs a KAN-based network to map embeddings from arbitrary face recognition (FR) or privacy-preserving face recognition (PPFR) systems into the embedding space of a pretrained identity-preserving (ID-Preserving) diffusion model. This enables high-resolution, realistic face reconstruction for evaluating privacy-leakage risks in FR systems.

Background & Motivation

State of the Field

Face recognition (FR) systems generate facial embedding vectors as identity templates via black-box CNN/DNN models. To enhance privacy, privacy-preserving face recognition (PPFR) systems (e.g., DCTDP, HFCF, PartialFace, MinusFace) and embedding protection algorithms (e.g., PolyProtect, MLP-Hash, SlerpFace) have been proposed.

Limitations of Prior Work

Poor reconstruction quality of CNN-based methods: NbNet and end-to-end CNN approaches produce blurry, noisy facial images, typically limited to low resolution.

Significant limitations of GAN-based methods: FaceTI relies on StyleGAN3 and requires intensive training resources (51 hours/epoch, 25 GB VRAM); MAP2V requires no training but is extremely slow at inference (111 seconds/image).

Insufficient research targeting PPFR: Existing methods focus primarily on standard FR systems, with limited investigation into embedding attacks against privacy-preserving systems.

Lack of generalizability: Existing methods struggle to handle realistic attack scenarios involving partially leaked or protected embeddings.

Root Cause

How can a lightweight mapping network uniformly map the embedding spaces of diverse FR/PPFR systems—each with distinct characteristics—into the embedding space of a high-quality face generation model, achieving efficient and high-quality face reconstruction?

Starting Point

By leveraging the pretrained ID-Preserving diffusion model IPA-FaceID, which already possesses the capability to generate high-quality faces from embeddings, the core problem is reformulated as learning a mapping between embedding spaces. A KAN (Kolmogorov-Arnold Network) is introduced to capture complex nonlinear relationships between embedding spaces.

Method

Overall Architecture

The FEM framework consists of two phases: training and inference.

Training Phase:

  1. A public face dataset is fed into both the target FR/PPFR model \(\Gamma'(\cdot)\) and the default FR model of IPA-FaceID, \(\Gamma(\cdot)\).
  2. Two embedding distributions \(\mathcal{D}'(e'_i)\) and \(\mathcal{D}(e_i)\) are obtained.
  3. The FEM model \(\mathcal{M}(\cdot)\) is trained so that the mapped embedding \(\hat{e}_i = \mathcal{M}(e'_i)\) approximates the corresponding \(e_i\) as closely as possible.

Inference Phase:

  1. The leaked target-system embedding \(e'\) is fed into the trained FEM.
  2. FEM produces the mapped embedding \(\hat{e}\).
  3. IPA-FaceID directly generates a high-resolution, identity-preserving facial image from \(\hat{e}\).
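The two phases can be sketched end to end with a toy linear mapping standing in for the FEM network. Everything below is illustrative, not the paper's code: the embeddings are synthetic, and least squares replaces the trained KAN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two embedding spaces: e' = Γ'(x) lives in the target
# system's space, e = Γ(x) in IPA-FaceID's FR space. Here the "true" relation
# between them is an unknown linear map plus a little noise.
W_true = rng.standard_normal((512, 512)) / np.sqrt(512)
E_prime = rng.standard_normal((1000, 512))                       # D'(e'_i)
E = E_prime @ W_true + 0.01 * rng.standard_normal((1000, 512))   # D(e_i)

# Training phase: fit the mapping M so that M(e') ≈ e over the paired
# distributions (least squares stands in for MSE training of FEM).
W_hat, *_ = np.linalg.lstsq(E_prime, E, rcond=None)

# Inference phase: map a leaked embedding into IPA-FaceID's space; the real
# pipeline would then decode ê into a face with the diffusion model.
e_leak = rng.standard_normal(512)
e_hat = e_leak @ W_hat

err = np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true)
print(f"relative mapping error: {err:.4f}")
```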

Key Designs

1. FEM-KAN: KAN-Based Embedding Mapping

Mechanism: The Kolmogorov-Arnold representation theorem states that any continuous multivariate function can be written as a finite composition of continuous univariate functions and addition. The mapping between facial embedding spaces can thus be decomposed into compositions of univariate function operations:

\[f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)\]

Distinction from FEM-MLP: - FEM-MLP uses fixed activation functions (GELU) with a 3-layer MLP and 1D batch normalization. - FEM-KAN employs learnable activation functions placed on edges in a 3-layer KAN architecture, enabling more accurate capture of nonlinear mappings.
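The contrast between the two layer types can be made concrete with a toy forward pass. The KAN edge functions below use an RBF basis for simplicity; the paper's KAN presumably uses a different basis (e.g. B-splines), so this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def gelu(x):
    # Fixed activation of the FEM-MLP variant (tanh approximation of GELU)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp_layer(x, W, b):
    # MLP style: one shared fixed nonlinearity applied after a linear map
    return gelu(x @ W + b)

def kan_layer(x, coeffs, centers, width=1.0):
    """Simplified KAN layer: each edge (i -> j) carries its own learnable
    1-D function, phi_{j,i}(x_i) = sum_k coeffs[j,i,k] * B_k(x_i),
    with B_k a fixed RBF basis. coeffs: (out, in, K); centers: (K,)."""
    B = np.exp(-((x[..., None] - centers) / width) ** 2)  # (batch, in, K)
    # Node output: y_j = sum_i phi_{j,i}(x_i)
    return np.einsum('bik,jik->bj', B, coeffs)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8))
centers = np.linspace(-2, 2, 5)
y_kan = kan_layer(x, rng.standard_normal((6, 8, 5)) * 0.1, centers)
y_mlp = mlp_layer(x, rng.standard_normal((8, 6)) * 0.1, np.zeros(6))
print(y_kan.shape, y_mlp.shape)
```

The key difference is where the learning happens: the MLP learns only the linear weights around a fixed activation, while the KAN learns the per-edge coefficients of the activation functions themselves.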

Design Motivation: Although facial embeddings are high-dimensional, they possess inherent structure. KAN's univariate function decomposition is better suited to capturing the complex nonlinear relationships between embedding spaces. UMAP visualizations confirm that FEM effectively maps embeddings from the target system into the IPA-FR embedding domain or its boundary regions.

2. Loss Function Design

Mean Squared Error (MSE) is used as the reconstruction loss:

\[\mathcal{L}_{MSE}(e_i, \hat{e}_i) = \frac{1}{N}\sum_{j=0}^{N-1}\left(e_{i,j} - \hat{e}_{i,j}\right)^2\]

where \(e_i\) is the target embedding (the output of IPA-FR), \(\hat{e}_i = \mathcal{M}(e'_i)\) is the FEM-mapped embedding, \(j\) indexes embedding dimensions, and \(N\) is the embedding dimensionality.
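The per-pair loss is a direct transcription of the formula (`mse_loss` is our name for this helper, not the paper's):

```python
import numpy as np

def mse_loss(e, e_hat):
    """Reconstruction loss for one embedding pair: the mean of the
    squared per-dimension differences, L = (1/N) * sum_j (e_j - e_hat_j)**2."""
    e, e_hat = np.asarray(e, dtype=float), np.asarray(e_hat, dtype=float)
    return np.mean((e - e_hat) ** 2)

e = np.array([1.0, 2.0, 3.0, 4.0])
e_hat = np.array([1.0, 2.0, 3.0, 2.0])
print(mse_loss(e, e_hat))  # (0 + 0 + 0 + 4) / 4 = 1.0
```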

3. Exploiting the ID-Preserving Capability of IPA-FaceID

IPA-FaceID injects facial embeddings into a pretrained T2I diffusion model via decoupled cross-attention. The text prompt is fixed as "front portrait of a person" to generate frontal portraits. Once FEM maps the embedding into the target domain, IPA-FaceID can directly generate identity-preserving facial images.

Loss & Training

  • 90% of the FFHQ dataset is used for training; testing is performed on 1,000 unseen identities from CelebA-HQ.
  • AdamW optimizer with an initial learning rate of \(10^{-2}\) and exponential decay rate of 0.8.
  • Batch size of 128; trained for 20 epochs.
  • Training conducted on a Tesla V100 32 GB GPU.
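The reported setup can be sketched as a minimal PyTorch training loop. The 3-layer MLP below is a placeholder standing in for FEM-MLP/FEM-KAN (whose exact architecture is not reproduced here), and the paired embeddings are synthetic; only the optimizer, schedule, batch size, and epoch count follow the paper.

```python
import torch
from torch import nn

dim = 512  # assumed embedding dimensionality

# Placeholder 3-layer mapping network (stands in for FEM-MLP/FEM-KAN)
fem = nn.Sequential(
    nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.GELU(),
    nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.GELU(),
    nn.Linear(dim, dim),
)

# Reported hyperparameters: AdamW, lr 1e-2, exponential decay 0.8 per epoch
opt = torch.optim.AdamW(fem.parameters(), lr=1e-2)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.8)
loss_fn = nn.MSELoss()

# Toy paired embeddings standing in for (e'_i, e_i) pairs from FFHQ
E_prime = torch.randn(1024, dim)
E = torch.randn(1024, dim)

for epoch in range(20):                        # 20 epochs
    for i in range(0, len(E_prime), 128):      # batch size 128
        e_p, e = E_prime[i:i + 128], E[i:i + 128]
        loss = loss_fn(fem(e_p), e)            # L_MSE(e, M(e'))
        opt.zero_grad()
        loss.backward()
        opt.step()
    sched.step()                               # decay lr by 0.8 each epoch
```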

Key Experimental Results

Main Results

Attack Success Rate (ASR) on the CelebA-HQ dataset:

| Target Model | Method | MF | EF | GF | AF | Avg. |
|---|---|---|---|---|---|---|
| IRSE50 (FR) | FaceTI | 93.4 | 80.8 | 49.6 | 66.8 | 72.7 |
| IRSE50 (FR) | MAP2V | 94.0 | 86.2 | 59.3 | 72.0 | 77.9 |
| IRSE50 (FR) | FEM-MLP | 98.0 | 91.8 | 62.6 | 73.4 | 81.5 |
| IRSE50 (FR) | FEM-KAN | 99.2 | 93.8 | 65.7 | 76.1 | 83.7 |
| HFCF (PPFR) | MAP2V | 76.3 | 15.4 | 5.3 | 14.8 | 28.0 |
| HFCF (PPFR) | FEM-KAN | 98.3 | 90.7 | 66.5 | 76.9 | 83.1 |
| MinusFace (PPFR) | MAP2V | 68.0 | 4.8 | 2.3 | 5.6 | 20.2 |
| MinusFace (PPFR) | FEM-KAN | 96.5 | 71.3 | 44.5 | 58.1 | 67.6 |

FEM-KAN achieves the highest average ASR across all FR and PPFR target models, especially surpassing MAP2V by a large margin on PPFR models such as HFCF and MinusFace (83.1 vs. 28.0 and 67.6 vs. 20.2, respectively).

Ablation Study

| Configuration | Key Metric | Notes |
|---|---|---|
| FEM training time | 3 hrs vs. FaceTI 51 hrs | 17× faster |
| FEM GPU memory | 4,325 MiB vs. FaceTI 25,383 MiB | 5.8× more efficient |
| FEM inference time | 2.6 s vs. MAP2V 111 s | 42× faster |
| 50% embedding leakage | FEM-KAN ASR 53.2% vs. FaceTI 50.8% | FEM more robust |
| 30% embedding leakage | FEM-KAN ASR 32.5% | Attack viable even at very low leakage |
| Makeup scenario (LADN-M) | FEM-KAN avg. ASR 85.1% vs. FaceTI 56.4% | FEM more robust to makeup |

Protected Embedding Attacks

| Protection Algorithm | FEM-KAN (MF/EF/GF/AF) | MAP2V (MF/EF/GF/AF) |
|---|---|---|
| MLP-Hash | 82.1 / 54.7 / 56.5 / 71.6 | 48.1 / 0.6 / 0.3 / 1.5 |
| SlerpFace | 79.4 / 9.3 / 7.8 / 15.4 | 11.4 / 0.0 / 0.1 / 0.1 |
| PolyProtect | 50.3 / 7.1 / 5.6 / 15.4 | 28.6 / 4.4 / 3.6 / 4.3 |

FEM achieves ASR under MLP-Hash protection close to the unprotected baseline, indicating a critical security vulnerability in this embedding protection algorithm.

Key Findings

  • KAN outperforms MLP: FEM-KAN surpasses FEM-MLP in nearly all scenarios, validating the advantage of learnable activation functions for embedding mapping.
  • PPFR systems are not secure: Even after privacy-preserving transformations such as frequency-domain processing, embeddings retain sufficient identity information for high-quality face reconstruction.
  • Partial embeddings remain exploitable: FEM-KAN achieves 32.5% ASR even with only 30% of the embedding vector available.
  • Makeup has minimal impact on FEM: Makeup causes an 18.1% ASR drop for FaceTI but only a 6.4% drop for FEM-KAN.
  • Face anti-spoofing systems can be bypassed: The reconstructed high-quality faces can pass FASNet detection.
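The partial-leakage scenario can be simulated by masking embedding dimensions before mapping. The scheme below (randomly zeroing the unleaked dimensions) is an assumption for illustration; the paper's exact leakage protocol may differ.

```python
import numpy as np

def partial_leak(e, frac, rng):
    """Keep a random fraction `frac` of the embedding dimensions and zero
    out the rest, simulating an attacker who obtains only part of e'."""
    e = np.asarray(e, dtype=float)
    mask = np.zeros(e.shape[0])
    keep = rng.choice(e.shape[0], size=int(frac * e.shape[0]), replace=False)
    mask[keep] = 1.0
    return e * mask

rng = np.random.default_rng(0)
e = rng.standard_normal(512)
e_30 = partial_leak(e, 0.30, rng)   # the 30%-leakage setting
print(int((e_30 != 0).sum()))       # 153 nonzero dims (30% of 512)
```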

Highlights & Insights

  1. First application of KAN to embedding mapping: This work demonstrates that learnable activation functions in KAN genuinely outperform conventional MLP for nonlinear mapping of high-dimensional structured data (facial embeddings).
  2. Framework-level thinking: The face reconstruction problem is elegantly reformulated as a two-stage pipeline of "embedding space mapping + pretrained generation," enabling an extremely lightweight mapping network.
  3. Comprehensive security evaluation: The framework covers standard FR, PPFR, partial leakage, protected embeddings, and protected images, offering practical value for real-world security assessment.
  4. High training efficiency: A 3-layer network trained in 3 hours with 2.6-second inference significantly outperforms existing methods.

Limitations & Future Work

  • The framework depends on IPA-FaceID as the generation backend; it would become ineffective if this model is updated or deprecated.
  • The text prompt is fixed as "front portrait of a person," which may limit reconstruction accuracy for non-frontal faces.
  • FEM requires black-box query access to the target FR system to construct training data.
  • ASR drops noticeably for low-resolution faces (LFW 112×112).
  • When the embedding leakage rate falls below 30%, reconstructed faces exhibit visible artifacts.
  • Only MSE loss is employed; alternative distance metrics (e.g., cosine distance, contrastive loss) are not explored.
  • FEM can serve as an evaluation tool for the privacy security of FR/PPFR systems, quantifying the privacy leakage risk of different protection algorithms.
  • The results inspire security research: embedding protection algorithms, particularly MLP-Hash, require redesign to resist mapping-based attacks.
  • KAN may have application potential in other embedding space alignment tasks, such as cross-modal retrieval and domain adaptation.
  • This work underscores the "embedding-as-privacy" security paradigm: even when images are protected, embedding vectors may still leak identity information.

Rating

  • Novelty: ⭐⭐⭐⭐ (Novel application of KAN to embedding mapping; framework design is elegant and concise)
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Covers diverse FR/PPFR systems, multiple attack scenarios, and resource comparisons)
  • Writing Quality: ⭐⭐⭐⭐ (Clear structure with detailed experimental setup)
  • Value: ⭐⭐⭐⭐ (Practically significant for privacy security research; exposes security risks in PPFR systems)