
DictPFL: Efficient and Private Federated Learning on Encrypted Gradients

Conference: NeurIPS 2025 arXiv: 2510.21086 Code: UCF-ML-Research/DictPFL Area: AI Security Keywords: federated learning, Homomorphic Encryption, Privacy-Preserving, Gradient Pruning, Dictionary Decomposition

TL;DR

This paper proposes DictPFL, a framework that decomposes model weights into a static dictionary and a trainable lookup table, and combines this decomposition with encryption-aware pruning. DictPFL achieves full gradient protection via homomorphic encryption in federated learning while reducing communication overhead by 402–748× and training time by 28–65×, keeping total runtime within 2× of plaintext FL.

Background & Motivation

Federated learning (FL) enables multiple parties to collaboratively train models without sharing raw data, yet shared gradients remain vulnerable to privacy leakage — gradient inversion attacks can reconstruct clients' original training data from shared gradients.

Homomorphic encryption (HE) is an ideal solution for protecting gradient privacy: clients encrypt gradients before uploading, and the server aggregates directly on ciphertexts without decryption. However, HE introduces prohibitive overhead:

  • Ciphertext expansion: communication cost increases by 1–3 orders of magnitude
  • Computational cost: encryption, decryption, and homomorphic aggregation are highly time-consuming
  • In ViT training, HE-related operations (encryption, decryption, aggregation, communication) dominate total training time
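To make the aggregation-on-ciphertexts idea concrete, here is a toy stand-in for an additively homomorphic scheme. It is NOT secure and is not the scheme the paper uses (real deployments rely on CKKS or Paillier); it only illustrates the structural point that the server can sum ciphertexts without ever decrypting any individual client's gradients. The class and key names are ours.

```python
class MockAdditiveHE:
    """Toy additive 'encryption' (illustration only, NOT secure):
    Enc(v) = v + key, so a sum of n ciphertexts decrypts by
    subtracting n copies of the key."""

    def __init__(self, key=12345.0):
        self.key = key                      # shared client secret (toy)

    def enc(self, grads):
        return [g + self.key for g in grads]

    def dec_sum(self, agg, n_clients):
        # n ciphertexts were summed, so n copies of the key are removed
        return [c - n_clients * self.key for c in agg]


he = MockAdditiveHE()
client_grads = [[0.1, -0.2], [0.3, 0.4], [-0.1, 0.0]]
ciphers = [he.enc(g) for g in client_grads]
# Server side: element-wise sum over ciphertexts, no decryption needed
agg = [sum(col) for col in zip(*ciphers)]
avg = [v / len(client_grads) for v in he.dec_sum(agg, len(client_grads))]
```

The overheads listed above come from doing this at scale with a real scheme: each ciphertext is far larger than the plaintext values it packs, and encryption/decryption dominate the round time.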

The existing method FedML-HE adopts a selective encryption strategy, encrypting only the most sensitive 10% of gradients and transmitting the rest in plaintext. While this reduces overhead, unencrypted gradients still expose private information — experiments show that when 30% of gradients are unencrypted, an adversary can recover images with up to 23% similarity to the originals.

Core Problem

How can HE-based FL simultaneously achieve both of the following?

  1. Complete privacy protection: all transmitted gradients must be fully encrypted, with no plaintext gradients exposed
  2. High efficiency: reduce the communication and computation overhead of HE to near-plaintext FL levels

These two objectives were previously considered contradictory — full encryption implies high overhead, while low overhead requires sacrificing some privacy.

Method

DictPFL consists of two core modules:

1. Decompose-for-Partial-Encrypt (DePE) — Dictionary Decomposition

Core Idea: Decompose a weight matrix \(W \in \mathbb{R}^{n \times m}\) into a static dictionary \(D \in \mathbb{R}^{n \times r}\) and a trainable lookup table \(T \in \mathbb{R}^{r \times m}\), where \(r \ll \min(n, m)\).

Procedure:

  • Apply truncated SVD to the initial weights \(W_0\): \(W_0 \approx U_r \Sigma_r V_r^\top\)
  • Set dictionary \(D = U_r \Sigma_r\) (frozen, identical across all clients, never transmitted)
  • Initialize the lookup table as a zero matrix; the effective weights are constructed as \(W = W_0 + D \cdot T\)
  • Only the gradients of \(T\) are encrypted and transmitted for aggregation

Key Design: Retaining the original \(W_0\) and initializing \(T\) to zero (rather than directly using \(V_r^\top\)) avoids information loss from SVD truncation. At \(r=4\), the number of trainable parameters is substantially reduced, directly decreasing the number of ciphertexts requiring encryption.
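The DePE construction can be sketched in a few lines of NumPy. The function names and toy dimensions are ours; the essential properties match the description above: zero-initializing \(T\) means training starts exactly at \(W_0\) (no SVD truncation error), and only the \(r \times m\) lookup-table gradients ever need encryption.

```python
import numpy as np

def depe_init(W0, r=4):
    """DePE sketch: truncated SVD W0 ≈ U_r Σ_r V_r^T, dictionary
    D = U_r Σ_r (frozen, never transmitted), lookup table T = 0."""
    U, S, _ = np.linalg.svd(W0, full_matrices=False)
    D = U[:, :r] * S[:r]                # frozen dictionary, stays local
    T = np.zeros((r, W0.shape[1]))      # trainable lookup table, zero-init
    return D, T

def effective_weights(W0, D, T):
    # W = W0 + D @ T: retaining W0 avoids SVD-truncation loss at step 0
    return W0 + D @ T

n, m, r = 64, 48, 4
rng = np.random.default_rng(0)
W0 = rng.standard_normal((n, m))
D, T = depe_init(W0, r)
# Zero-init T => the model starts exactly at W0
assert np.allclose(effective_weights(W0, D, T), W0)
# Only grad(T) is encrypted: r*m values instead of n*m
print("encrypted params:", T.size, "vs full:", W0.size)
```

Even in this toy case the reduction is \(n/r = 16\times\) fewer values to encrypt; for real transformer weight matrices with \(r=4\) the factor is far larger.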

2. Prune-for-Minimum-Encrypt (PrME) — Encryption-Aware Pruning

PrME further reduces the number of parameters that need to be encrypted and transmitted, building upon DePE.

Unique Challenges of Pruning under HE:

  • Independent pruning by each client leads to inconsistent sparsity patterns, whereas HE's SIMD batching mechanism requires aligned ciphertext slots across clients
  • Non-linear comparison operations on encrypted indices cannot be performed on the server

Temporal Inactivity Pruning (TIP):

  • Uses the global gradient history from the preceding \(\tau\) rounds (consistent across all clients) as a shared pruning criterion
  • A parameter is pruned only when its gradient magnitude falls in the bottom \(s\%\) for \(\tau\) consecutive rounds
  • Pruning mask: \(M_{i,t}=0\) (pruned) when \(\sum_{k=1}^{\tau} \mathbf{1}(|\delta w_{i,t-k}| < \theta_{s,t-k}) = \tau\)
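A minimal NumPy sketch of the TIP criterion (function and variable names are ours). Because the gradient history is global and identical on every client, all clients derive the same mask locally, which keeps HE ciphertext slots aligned without exchanging any mask.

```python
import numpy as np

def tip_mask(grad_history, s=0.70):
    """TIP sketch: prune parameter i (M_i = 0) only if |δw_i| fell in the
    bottom s% of global gradient magnitudes in ALL tau recorded rounds.
    grad_history: (tau, num_params) array, identical across clients."""
    tau = grad_history.shape[0]
    hits = np.zeros(grad_history.shape[1], dtype=int)
    for round_grads in grad_history:
        mags = np.abs(round_grads)
        theta = np.quantile(mags, s)     # bottom-s% threshold for this round
        hits += (mags < theta)           # indicator 1(|δw_i| < θ)
    return (hits < tau).astype(int)      # 1 = keep, 0 = prune

# A parameter must be inactive in every one of the tau rounds to be pruned:
history = np.array([[1e-6, 1.0, 1e-6],
                    [1e-6, 1.0, 1.0],    # param 2 "wakes up" this round
                    [1e-6, 1.0, 1e-6]])
print(tip_mask(history, s=0.70))         # only param 0 is pruned
```

The all-\(\tau\)-rounds requirement makes the mask conservative: a single active round (param 2 above) is enough to keep a parameter.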

Holistic Reactivation Correction (HRC):

  • Addresses the permanent inactivation of pruned parameters in TIP
  • Assigns a dynamic reactivation probability \(p_i\) to each pruned parameter
  • After reactivation, if the accumulated global gradient remains small, \(p_i\) is decreased (multiplied by decay factor \(\beta\)); otherwise \(p_i\) is increased
  • Client-side reactivation consistency is ensured via a shared random seed
  • Pruning masks are never sent to the server, preventing privacy inference through plaintext masks
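The HRC update can be sketched as follows. This is our reading of the mechanism, with assumptions flagged: the boost rule for \(p_i\) after a useful reactivation (here \(p_i / \beta\), capped at 1) and the small-gradient threshold `eps` are our choices; the paper only states that \(p_i\) is decayed by \(\beta\) or increased.

```python
import numpy as np

def hrc_step(mask, p, accum_grad, seed, beta=0.2, eps=1e-3):
    """HRC sketch: each pruned parameter reactivates with probability p[i].
    A shared seed makes the random draw identical on every client. After a
    trial reactivation, p decays by beta if the accumulated global gradient
    stayed small, and is boosted otherwise (boost rule is our assumption)."""
    rng = np.random.default_rng(seed)    # shared seed => consistent clients
    draw = rng.random(mask.shape)
    react = (mask == 0) & (draw < p)
    new_mask = np.where(react, 1, mask)  # reactivated params rejoin training
    still_small = np.abs(accum_grad) < eps
    new_p = np.where(react & still_small, p * beta, p)                        # decay
    new_p = np.where(react & ~still_small, np.minimum(1.0, p / beta), new_p)  # boost
    return new_mask, new_p

mask = np.array([0, 0, 1])
p = np.array([1.0, 1.0, 0.0])            # p = 1 => both pruned params reactivate
grad = np.array([0.0, 0.5, 0.5])
new_mask, new_p = hrc_step(mask, p, grad, seed=42)
print(new_mask, new_p)                   # param 0's p decays to 0.2; param 1's stays boosted
```

Since every client seeds the generator identically, the reactivation draw needs no communication, and no mask ever reaches the server.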

Default Hyperparameters

| Parameter | Default | Description |
|---|---|---|
| \(r\) | 4 | Dictionary size |
| \(s\%\) | 70% | Pruning ratio |
| \(\tau\) | 3 | Pruning patience window |
| \(\beta\) | 0.2 | Reactivation probability decay factor |

Key Experimental Results

Efficiency vs. Full Encryption (FedHE-Full)

  • Communication overhead reduced by 402–748×
  • Training speed improved by 28–65×
  • Total runtime within 2× of plaintext FL

Efficiency vs. Selective Encryption (FedML-HE, 10% encrypted)

  • Communication overhead reduced by 51–155×
  • Training speed improved by 4–19×
  • DictPFL simultaneously provides complete privacy protection (FedML-HE retains privacy leakage risk)

Accuracy (ViT, 3-client homogeneous setting)

| Method | CIFAR-10 | GTSRB | Diabetic Retinopathy |
|---|---|---|---|
| FedHE-Full | baseline | baseline | 82.74% |
| FedHE-Top2 | 58.9% | – | – |
| DictPFL (\(r\)=4) | on par | 95.27% | 81.99% |

Privacy Protection

  • FedML-HE (30% unencrypted): adversary recovers images with up to 23% similarity
  • DictPFL: all transmitted gradients are encrypted, leaving gradient inversion attacks no plaintext to exploit

Ablation Study

  • Dictionary size \(r\): \(r=4\) achieves accuracy close to the full model; \(r=2\) shows significant degradation
  • Pruning ratio: 70% pruning with HRC reactivation matches the accuracy of 20% pruning while retaining the communication efficiency of 70% pruning
  • Pruning patience \(\tau\): \(\tau=3\) sufficiently balances accuracy and communication efficiency

Highlights & Insights

  1. First demonstration of practical HE-based FL: runtime within 2× of plaintext FL, previously considered infeasible
  2. Zero-leakage design: all transmitted gradients are encrypted; untransmitted parameters (the dictionary) remain local; pruning masks are never sent to the server
  3. Elegant DePE design: retaining \(W_0\) and zero-initializing \(T\) avoids SVD truncation information loss; the dictionary \(D\) is naturally consistent across clients without any communication
  4. HRC reactivation mechanism: gracefully resolves the irreversibility of pruning under HE, balancing efficiency and convergence through dynamic probabilities
  5. Cross-task generality: effective across image classification, text classification, and text generation tasks, covering ViT, BERT, and TinyLlama

Limitations & Future Work

  1. Fixed dictionary: the dictionary is constructed once before training and frozen, limiting adaptability to highly heterogeneous client data distributions; dynamic dictionaries could be explored
  2. Scenario coverage: only the cross-silo setting (few clients) is evaluated; the cross-device setting (many resource-constrained devices) remains unverified
  3. Model families: experiments focus on Transformer architectures; CNNs and other architectures are not covered
  4. SVD decomposition cost: although a one-time operation, SVD on very large models may itself incur significant computation
  5. Selection of \(r\): accuracy is sensitive to dictionary size (a notable jump from \(r=2\) to \(r=4\)); automatic selection of the optimal \(r\) is not explored
Comparison with Related Approaches

| Method | Privacy Level | Communication | Training Speed | Accuracy |
|---|---|---|---|---|
| FedHE-Full | Fully encrypted | Very high (baseline) | Very slow | Highest |
| FedHE-Top2 | Fully encrypted (last layer only) | Moderate | Moderate | Lower |
| FedML-HE (10%) | Partially encrypted, leakage risk | High | Slow | High |
| DP-based FL | Noise-based protection | Low | Fast | Degraded |
| MPC-based FL | Aggregation-level protection | Varies | Varies | Lossless |
| DictPFL | Fully encrypted | Very low | Near plaintext | High |

The fundamental distinction from FedML-HE: FedML-HE reduces the volume of encrypted data (selective encryption), whereas DictPFL reduces the total volume of transmitted data (all encrypted but transmission volume is minimal). The design philosophy is entirely different; DictPFL eliminates privacy leakage at the source.

The dictionary decomposition idea generalizes broadly: decomposing a large parameter space into a shared static component and a personalized trainable component resembles the low-rank decomposition in LoRA, but freezes the left singular vectors of SVD rather than the right. This work also motivates future FL algorithm design to account for encryption scheme constraints (e.g., SIMD slot alignment) from the outset rather than as an afterthought. The pruning consistency problem is transferable to other distributed scenarios requiring consistency, such as sparse communication in distributed training. DictPFL is complementary to secure aggregation (SA): DictPFL protects client-to-server transmission, while SA protects individual contributions during aggregation.

Rating

  • Novelty: ⭐⭐⭐⭐ — The combined DePE+PrME design is novel; first to achieve practical HE-FL
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Multi-dataset, multi-model, multi-scenario ablations with complete privacy attack validation
  • Writing Quality: ⭐⭐⭐⭐ — Clear figures, coherent motivation and methodology
  • Value: ⭐⭐⭐⭐⭐ — Addresses the core practicality bottleneck of HE-FL with major implications for real-world deployment