Realistic Face Reconstruction from Facial Embeddings via Diffusion Models¶
Conference: AAAI 2026 arXiv: 2602.13168 Code: N/A Area: Image Generation Keywords: Face Reconstruction, Facial Embeddings, Privacy Attack, Kolmogorov-Arnold Network, Diffusion Models
TL;DR¶
This paper proposes the FEM (Face Embedding Mapping) framework, which employs a KAN-based network to map embeddings from arbitrary face recognition (FR) or privacy-preserving face recognition (PPFR) systems into the embedding space of a pretrained identity-preserving (ID-Preserving) diffusion model, enabling high-resolution realistic face reconstruction for evaluating privacy leakage risks in FR systems.
Background & Motivation¶
State of the Field¶
Face recognition (FR) systems generate facial embedding vectors as identity templates via black-box CNN/DNN models. To enhance privacy, privacy-preserving face recognition (PPFR) systems (e.g., DCTDP, HFCF, PartialFace, MinusFace) and embedding protection algorithms (e.g., PolyProtect, MLP-Hash, SlerpFace) have been proposed.
Limitations of Prior Work¶
Poor reconstruction quality of CNN-based methods: NbNet and end-to-end CNN approaches produce blurry, noisy facial images, typically limited to low resolution.
Significant limitations of GAN-based methods: FaceTI relies on StyleGAN3 and requires intensive training resources (51 hours/epoch, 25 GB VRAM); MAP2V requires no training but is extremely slow at inference (111 seconds/image).
Insufficient research targeting PPFR: Existing methods focus primarily on standard FR systems, with limited investigation into embedding attacks against privacy-preserving systems.
Lack of generalizability: Existing methods struggle to handle realistic attack scenarios involving partially leaked or protected embeddings.
Root Cause¶
How can a lightweight mapping network uniformly map the embedding spaces of diverse FR/PPFR systems—each with distinct characteristics—into the embedding space of a high-quality face generation model, achieving efficient and high-quality face reconstruction?
Starting Point¶
By leveraging the pretrained ID-Preserving diffusion model IPA-FaceID, which already possesses the capability to generate high-quality faces from embeddings, the core problem is reformulated as learning a mapping between embedding spaces. A KAN (Kolmogorov-Arnold Network) is introduced to capture complex nonlinear relationships between embedding spaces.
Method¶
Overall Architecture¶
The FEM framework consists of two phases: training and inference.
Training Phase:
1. A public face dataset is fed into both the target FR/PPFR model \(\Gamma'(\cdot)\) and the default FR model of IPA-FaceID \(\Gamma(\cdot)\).
2. This yields two embedding distributions \(\mathcal{D}'(e'_i)\) and \(\mathcal{D}(e_i)\).
3. The FEM model \(\mathcal{M}(\cdot)\) is trained so that each mapped embedding \(\hat{e}_i = \mathcal{M}(e'_i)\) approximates the corresponding \(e_i\) as closely as possible.
Inference Phase:
1. The leaked target-system embedding \(e'\) is fed into the trained FEM.
2. FEM produces the mapped embedding \(\hat{e}\).
3. IPA-FaceID then directly generates a high-resolution, identity-preserving facial image from \(\hat{e}\).
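The two phases can be sketched end to end with a linear least-squares stand-in for the FEM network (the actual method learns a KAN; the dimensions and the synthetic data here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two embedding extractors (real FR embeddings
# are typically 512-d; these sizes are illustrative).
d_src, d_tgt, n = 64, 32, 1000
E_src = rng.normal(size=(n, d_src))      # embeddings from the target FR/PPFR model
W_true = rng.normal(size=(d_src, d_tgt))
E_tgt = E_src @ W_true                   # corresponding IPA-FaceID FR embeddings

# --- Training phase (linear stand-in for the FEM network) ---
# Fit a mapping M minimizing ||E_tgt - E_src @ M||^2 in closed form.
M, *_ = np.linalg.lstsq(E_src, E_tgt, rcond=None)

# --- Inference phase ---
e_leaked = rng.normal(size=(1, d_src))   # a leaked target-system embedding
e_mapped = e_leaked @ M                  # FEM output, fed to IPA-FaceID
print(e_mapped.shape)  # (1, 32)
```

The closed-form fit replaces the gradient-trained KAN purely to keep the sketch short; the data flow (fit a mapping on paired embeddings, then apply it to leaked ones) matches the framework.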
Key Designs¶
1. FEM-KAN: KAN-Based Embedding Mapping¶
Mechanism: The Kolmogorov-Arnold representation theorem states that any multivariate continuous function can be written as a finite composition of continuous univariate functions and addition. The mapping between facial embedding spaces can therefore be decomposed into compositions of univariate function operations.
Distinction from FEM-MLP:
- FEM-MLP uses fixed activation functions (GELU) in a 3-layer MLP with 1D batch normalization.
- FEM-KAN places learnable activation functions on the edges of a 3-layer KAN, enabling more accurate capture of nonlinear mappings.
Design Motivation: Although facial embeddings are high-dimensional, they possess inherent structure, and the univariate-function decomposition in KAN is better suited to capturing the complex nonlinear relationships between embedding spaces. UMAP visualizations confirm that FEM effectively maps source-system embeddings into the IPA-FR embedding domain or its boundary regions.
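A minimal numpy sketch of one KAN-style layer, in which every edge carries its own learnable univariate function (here an RBF expansion with centers shared across edges, a simplification of the spline parameterization typically used in KANs):

```python
import numpy as np

def kan_layer(x, centers, width, coef):
    """One KAN-style layer: a learnable univariate function on every edge.

    x:       (batch, d_in) inputs
    centers: (n_basis,) RBF centers shared across edges (a simplification)
    width:   scalar RBF width
    coef:    (d_in, d_out, n_basis) learnable coefficients, defining one
             univariate function phi_ij per edge (input i -> output j)
    """
    # Evaluate the RBF basis on every scalar input: (batch, d_in, n_basis)
    basis = np.exp(-((x[..., None] - centers) / width) ** 2)
    # Output j is the sum over incoming edges of phi_ij(x_i): (batch, d_out)
    return np.einsum("bik,iok->bo", basis, coef)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))               # batch of 8-d inputs
centers = np.linspace(-2.0, 2.0, 5)
coef = rng.normal(size=(8, 16, 5)) * 0.1  # maps 8-d -> 16-d
y = kan_layer(x, centers, 1.0, coef)
print(y.shape)  # (4, 16)
```

In contrast to an MLP layer (fixed activation applied after a learned linear map), all the learnable capacity here sits in the per-edge functions, which is the property the paper credits for the better embedding-space fit.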
2. Loss Function Design¶
Mean Squared Error (MSE) is used as the reconstruction loss:

\[\mathcal{L}_{\text{MSE}} = \frac{1}{N}\sum_{i=1}^{N} \left\lVert e_i - \hat{e}_i \right\rVert_2^2\]

where \(e_i\) is the target embedding (output of IPA-FR) and \(\hat{e}_i = \mathcal{M}(e'_i)\) is the FEM-mapped embedding.
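A minimal numpy sketch of this loss over a batch of embeddings:

```python
import numpy as np

def mse_loss(e_tgt, e_mapped):
    # Mean squared error between target IPA-FR embeddings and FEM outputs,
    # averaged over both the batch and embedding dimensions.
    return float(np.mean((e_tgt - e_mapped) ** 2))

e = np.array([[1.0, 0.0], [0.0, 1.0]])      # toy target embeddings
e_hat = np.array([[0.9, 0.1], [0.1, 0.9]])  # toy FEM outputs
print(mse_loss(e, e_hat))  # 0.01
```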
3. Exploiting the ID-Preserving Capability of IPA-FaceID¶
IPA-FaceID injects facial embeddings into a pretrained T2I diffusion model via decoupled cross-attention. The text prompt is fixed as "front portrait of a person" to generate frontal portraits. Once FEM maps the embedding into the target domain, IPA-FaceID can directly generate identity-preserving facial images.
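A shape-level numpy sketch of the decoupled cross-attention described above: text tokens and identity-embedding tokens get separate cross-attention branches whose outputs are summed (dimensions, token counts, and the scale factor are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attn(q, k, v):
    # Scaled dot-product attention (single head, no projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_cross_attn(q, k_txt, v_txt, k_id, v_id, scale=1.0):
    # Separate branches for text conditioning and identity conditioning,
    # summed as in IP-Adapter-style decoupled cross-attention.
    return attn(q, k_txt, v_txt) + scale * attn(q, k_id, v_id)

rng = np.random.default_rng(0)
q = rng.normal(size=(16, 64))                        # image latent queries
k_txt = rng.normal(size=(8, 64)); v_txt = rng.normal(size=(8, 64))
k_id = rng.normal(size=(4, 64));  v_id = rng.normal(size=(4, 64))
out = decoupled_cross_attn(q, k_txt, v_txt, k_id, v_id)
print(out.shape)  # (16, 64)
```

Because the identity branch is additive, FEM only needs to place \(\hat{e}\) in the distribution the identity branch was trained on; the frozen diffusion model does the rest.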
Loss & Training¶
- 90% of the FFHQ dataset is used for training; testing is performed on 1,000 unseen identities from CelebA-HQ.
- AdamW optimizer with an initial learning rate of \(10^{-2}\) and exponential decay rate of 0.8.
- Batch size of 128; trained for 20 epochs.
- Training conducted on a Tesla V100 32 GB GPU.
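The optimizer settings above imply a simple per-epoch exponential learning-rate schedule; a minimal sketch (the per-epoch decay step is an assumption, since the paper's summary only gives the initial rate and decay factor):

```python
# Exponential learning-rate decay: lr_0 = 1e-2, multiplied by 0.8 each epoch.
def lr_at_epoch(epoch, lr0=1e-2, gamma=0.8):
    return lr0 * gamma ** epoch

schedule = [lr_at_epoch(t) for t in range(20)]  # 20 training epochs
print(schedule[0], schedule[19])
```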
Key Experimental Results¶
Main Results¶
Attack Success Rate (ASR) on the CelebA-HQ dataset:
| Target Model | Method | MF | EF | GF | AF | Avg. |
|---|---|---|---|---|---|---|
| IRSE50 (FR) | FaceTI | 93.4 | 80.8 | 49.6 | 66.8 | 72.7 |
| | MAP2V | 94.0 | 86.2 | 59.3 | 72.0 | 77.9 |
| | FEM-MLP | 98.0 | 91.8 | 62.6 | 73.4 | 81.5 |
| | FEM-KAN | 99.2 | 93.8 | 65.7 | 76.1 | 83.7 |
| HFCF (PPFR) | MAP2V | 76.3 | 15.4 | 5.3 | 14.8 | 28.0 |
| | FEM-KAN | 98.3 | 90.7 | 66.5 | 76.9 | 83.1 |
| MinusFace (PPFR) | MAP2V | 68.0 | 4.8 | 2.3 | 5.6 | 20.2 |
| | FEM-KAN | 96.5 | 71.3 | 44.5 | 58.1 | 67.6 |
FEM-KAN achieves the highest average ASR across all FR and PPFR target models, especially surpassing MAP2V by a large margin on PPFR models such as HFCF and MinusFace (83.1 vs. 28.0 and 67.6 vs. 20.2, respectively).
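ASR in this setting is typically computed by re-embedding the reconstructed face with an evaluation FR model and checking it against the victim's embedding. A sketch under the assumption of cosine-similarity verification, with an illustrative threshold and random data (the paper's evaluation models MF/EF/GF/AF and their thresholds differ):

```python
import numpy as np

def attack_success_rate(e_recon, e_true, threshold=0.4):
    """Fraction of reconstructions whose FR embedding matches the victim's
    embedding above a cosine-similarity threshold (threshold illustrative)."""
    a = e_recon / np.linalg.norm(e_recon, axis=1, keepdims=True)
    b = e_true / np.linalg.norm(e_true, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)
    return float(np.mean(cos >= threshold))

rng = np.random.default_rng(0)
e_true = rng.normal(size=(100, 512))                   # victims' embeddings
e_recon = e_true + 0.5 * rng.normal(size=(100, 512))   # noisy reconstructions
print(attack_success_rate(e_recon, e_true))
```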
Ablation Study¶
| Setting | Result | Notes |
|---|---|---|
| FEM Training Time | 3 hrs vs. FaceTI 51 hrs | 17× faster |
| FEM GPU Memory | 4325 MiB vs. FaceTI 25383 MiB | 5.8× more efficient |
| FEM Inference Time | 2.6s vs. MAP2V 111s | 42× faster |
| 50% Embedding Leakage | FEM-KAN ASR 53.2% vs. FaceTI 50.8% | FEM more robust |
| 30% Embedding Leakage | FEM-KAN ASR 32.5% | Attack viable at very low leakage |
| Makeup Scenario (LADN-M) | FEM-KAN Avg. ASR 85.1% vs. FaceTI 56.4% | FEM more robust to makeup |
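The partial-leakage rows above can be simulated by masking dimensions of the leaked embedding; a minimal sketch (zero-filling the missing dimensions is an assumption, as the paper's summary does not specify the fill strategy):

```python
import numpy as np

def leak_fraction(e, frac, rng):
    """Simulate partial embedding leakage: keep a random `frac` of the
    dimensions and zero-fill the rest before feeding the vector to FEM."""
    d = e.shape[-1]
    keep = rng.choice(d, size=int(frac * d), replace=False)
    leaked = np.zeros_like(e)
    leaked[..., keep] = e[..., keep]
    return leaked

rng = np.random.default_rng(0)
e = rng.normal(size=(1, 512))       # a full 512-d embedding
e_30 = leak_fraction(e, 0.3, rng)   # only 30% of dimensions leaked
print(int(np.count_nonzero(e_30)))  # 153
```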
Protected Embedding Attacks¶
| Protection Algorithm | FEM-KAN MF/EF/GF/AF | MAP2V MF/EF/GF/AF |
|---|---|---|
| MLP-Hash | 82.1/54.7/56.5/71.6 | 48.1/0.6/0.3/1.5 |
| SlerpFace | 79.4/9.3/7.8/15.4 | 11.4/0.0/0.1/0.1 |
| PolyProtect | 50.3/7.1/5.6/15.4 | 28.6/4.4/3.6/4.3 |
FEM achieves ASR under MLP-Hash protection close to the unprotected baseline, indicating a critical security vulnerability in this embedding protection algorithm.
Key Findings¶
- KAN outperforms MLP: FEM-KAN surpasses FEM-MLP in nearly all scenarios, validating the advantage of learnable activation functions for embedding mapping.
- PPFR systems are not secure: Even after privacy-preserving transformations such as frequency-domain processing, embeddings retain sufficient identity information for high-quality face reconstruction.
- Partial embeddings remain exploitable: FEM-KAN achieves 32.5% ASR even with only 30% of the embedding vector available.
- Makeup has minimal impact on FEM: Makeup causes an 18.1% ASR drop for FaceTI but only a 6.4% drop for FEM-KAN.
- Face anti-spoofing systems can be bypassed: The reconstructed high-quality faces can pass FASNet detection.
Highlights & Insights¶
- First application of KAN to embedding mapping: This work demonstrates that learnable activation functions in KAN can outperform a conventional MLP for the nonlinear mapping of high-dimensional structured data (facial embeddings).
- Framework-level thinking: The face reconstruction problem is elegantly reformulated as a two-stage pipeline of "embedding space mapping + pretrained generation," enabling an extremely lightweight mapping network.
- Comprehensive security evaluation: The framework covers standard FR, PPFR, partial leakage, protected embeddings, and protected images, offering practical value for real-world security assessment.
- High training efficiency: A 3-layer network trained in 3 hours with 2.6-second inference significantly outperforms existing methods.
Limitations & Future Work¶
- The framework depends on IPA-FaceID as the generation backend; it would become ineffective if this model is updated or deprecated.
- The text prompt is fixed as "front portrait of a person," which may limit reconstruction accuracy for non-frontal faces.
- FEM requires black-box query access to the target FR system to construct training data.
- ASR drops noticeably for low-resolution faces (LFW 112×112).
- When the embedding leakage rate falls below 30%, reconstructed faces exhibit visible artifacts.
- Only MSE loss is employed; alternative distance metrics (e.g., cosine distance, contrastive loss) are not explored.
Related Work & Insights¶
- FEM can serve as an evaluation tool for the privacy security of FR/PPFR systems, quantifying the privacy leakage risk of different protection algorithms.
- The results inspire security research: embedding protection algorithms, particularly MLP-Hash, require redesign to resist mapping-based attacks.
- KAN may have application potential in other embedding space alignment tasks, such as cross-modal retrieval and domain adaptation.
- This work underscores the "embedding-as-privacy" security paradigm: even when images are protected, embedding vectors may still leak identity information.
Rating¶
- Novelty: ⭐⭐⭐⭐ (Novel application of KAN to embedding mapping; framework design is elegant and concise)
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Covers diverse FR/PPFR systems, multiple attack scenarios, and resource comparisons)
- Writing Quality: ⭐⭐⭐⭐ (Clear structure with detailed experimental setup)
- Value: ⭐⭐⭐⭐ (Practically significant for privacy security research; exposes security risks in PPFR systems)