pFed1BS: Personalized Federated Learning with Bidirectional Communication Compression via One-Bit Random Sketching
- Conference: AAAI 2026
- arXiv: 2511.13144
- Code: To be confirmed
- Area: Optimization / Federated Learning
- Keywords: Federated Learning, Communication Efficiency, Personalization, One-Bit Compression, Random Sketching
TL;DR
This paper proposes pFed1BS, a framework that achieves extreme bidirectional communication compression (>99% reduction) in federated learning via one-bit random sketching, while introducing a sign-based regularizer for client model personalization. The framework simultaneously addresses communication bottlenecks and data heterogeneity in non-IID settings.
Background & Motivation
Federated learning (FL) faces two fundamental challenges: (1) non-IID client data degrades the performance of a single global model; and (2) repeated transmission of high-dimensional model parameters between the server and clients incurs substantial communication overhead. Both issues are especially pronounced in bandwidth-constrained scenarios such as IoT, V2X, and remote sensing.
Existing work follows two independent lines:
- Personalized Federated Learning (PFL): Methods such as pFedMe, Ditto, and FedRep achieve personalization via regularization or architectural decomposition, but still require transmitting full-precision high-dimensional parameters, resulting in high communication costs.
- Communication-Efficient Federated Learning (CEFL): Methods such as OBDA (bidirectional one-bit quantization), OBCSAA (one-bit compressed sensing uplink), and zSignFed (sign-based compression) achieve extreme compression but train only a single global model and cannot handle data heterogeneity.
Through a systematic comparison in Table 1, the authors identify a clear research gap: no existing framework simultaneously supports extreme bidirectional compression and native personalization. pFed1BS is designed to fill this gap.
Method
Overall Architecture
pFed1BS formulates the federated learning problem as a bilevel optimization:
- Client level: Each client \(k\) learns a personalized model \(\bm{w}_k\) by minimizing an objective comprising a local loss, a sign-based regularization term, and an \(\ell_2\) penalty.
- Server level: The server aggregates one-bit sketches from all clients to produce a global consensus vector \(\bm{v}\).
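Putting the two levels together, the client objective has the following schematic shape. This is our reconstruction from the description above; the paper's exact weighting and notation may differ, but \(\lambda\) and \(\mu\) match the hyperparameters reported in the experiments:

```latex
% Schematic client-level objective: local loss + sign-based regularizer + l2 penalty.
% g is the sign-based regularizer defined in Key Design 2 below.
\min_{\bm{w}_k} \; F_k(\bm{w}_k)
  = f_k(\bm{w}_k)
  + \lambda \, g\big(\bm{v}, \mathbf{\Phi}\bm{w}_k\big)
  + \frac{\mu}{2}\,\|\bm{w}_k\|_2^2
```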
Key Designs
1. One-Bit Random Sketch Communication
Instead of transmitting high-dimensional model parameters \(\bm{w}_k \in \mathbb{R}^n\), each client uploads a low-dimensional one-bit sketch \(\text{sign}(\mathbf{\Phi}\bm{w}_k)\), where \(\mathbf{\Phi} \in \mathbb{R}^{m \times n}\) (\(m \ll n\)) is a random projection matrix. The server likewise broadcasts only a one-bit consensus vector \(\bm{v} \in \{\pm 1\}^m\), achieving extreme compression in both uplink and downlink directions.
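As a minimal illustration of the uplink path, the NumPy sketch below uses a dense Gaussian \(\mathbf{\Phi}\) for clarity (the paper replaces it with SRHT, described in Key Design 3); all variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4096, 410                                 # model dim and sketch dim, m/n ~ 0.1

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # dense random projection (illustration only)
w_k = rng.standard_normal(n)                     # a client's model parameters

z_k = np.sign(Phi @ w_k)                         # one-bit sketch uploaded to the server
# Uplink payload: m bits instead of 32*n bits, i.e. ~99.7% smaller at m/n = 0.1.
```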
2. Sign-Based Regularizer
A regularization term \(g(\bm{v}, \mathbf{\Phi}\bm{w}_k) = \frac{1}{2}(\|\mathbf{\Phi}\bm{w}_k\|_1 - \langle\bm{v}, \mathbf{\Phi}\bm{w}_k\rangle)\) is introduced to encourage the sign of the local model's projection to align with the global consensus while preserving local data characteristics. Since the \(\ell_1\) norm is non-smooth, the differentiable surrogate \(h_\gamma(\bm{z}) = \frac{1}{\gamma}\sum_i \log(\cosh(\gamma z_i))\) is adopted, yielding the gradient \(\nabla\tilde{g} = \frac{1}{2}\mathbf{\Phi}^\top(\tanh(\gamma\mathbf{\Phi}\bm{w}_k) - \bm{v})\). As \(\gamma \to \infty\), \(h_\gamma\) recovers the \(\ell_1\) norm and \(\tanh(\gamma\,\cdot)\) approaches the \(\text{sign}\) function, so the surrogate converges to the original regularizer.
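The gradient formula can be sanity-checked numerically. The snippet below is plain NumPy with names of our choosing; a deliberately small \(\gamma\) keeps the finite-difference check well-conditioned (the experiments use \(\gamma = 10^4\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, gamma = 64, 16, 10.0

Phi = rng.standard_normal((m, n)) / np.sqrt(m)
w = rng.standard_normal(n)
v = np.sign(rng.standard_normal(m))              # global one-bit consensus

def logcosh(x):
    # Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log 2.
    a = np.abs(x)
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

def g_smooth(w):
    # Smoothed regularizer 0.5 * (h_gamma(Phi w) - <v, Phi w>).
    z = Phi @ w
    return 0.5 * (np.sum(logcosh(gamma * z)) / gamma - v @ z)

grad = 0.5 * Phi.T @ (np.tanh(gamma * (Phi @ w)) - v)   # analytic gradient

i, eps = 0, 1e-6                                 # central finite difference, coordinate i
e = np.zeros(n); e[i] = eps
numeric = (g_smooth(w + e) - g_smooth(w - e)) / (2.0 * eps)
print(abs(numeric - grad[i]))                    # tiny: analytic and numeric agree
```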
3. Fast Hadamard Transform for Efficient Projection
Naive dense random projection \(\mathbf{\Phi}\bm{w}\) costs \(\mathcal{O}(mn)\), which is infeasible for large models. The framework instead employs the Subsampled Randomized Hadamard Transform (SRHT): \(\mathbf{\Phi}\bm{w} = \bm{S}'\bm{H}\bm{D}\tilde{\bm{w}}\), where \(\tilde{\bm{w}}\) is the parameter vector zero-padded to a power-of-two length, \(\bm{D}\) is a random diagonal sign-flipping matrix, \(\bm{H}\) is the Walsh–Hadamard matrix, and \(\bm{S}'\) is a row-subsampling matrix. This reduces the projection cost from \(\mathcal{O}(mn)\) to \(\mathcal{O}(n\log n)\) and removes the need to store a dense \(m \times n\) matrix.
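A pure-NumPy sketch of the SRHT pipeline follows; the function names and the power-of-two padding convention are our assumptions, and production code would use an optimized Hadamard kernel rather than a Python loop:

```python
import numpy as np

def fwht(x):
    # Iterative fast Walsh–Hadamard transform, O(n log n); len(x) must be a power of two.
    x = x.copy()
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

def one_bit_srht(w, m, rng):
    # sign(S' H D w~): zero-pad w to a power of two (w~), flip signs (D),
    # Hadamard-transform (H), subsample m coordinates (S'), then take signs.
    n = 1 << (len(w) - 1).bit_length()
    w_pad = np.zeros(n)
    w_pad[:len(w)] = w
    d = rng.choice([-1.0, 1.0], size=n)           # diagonal of D
    z = fwht(d * w_pad) / np.sqrt(n)              # scaling does not change the signs
    idx = rng.choice(n, size=m, replace=False)    # rows kept by S'
    return np.sign(z[idx])

rng = np.random.default_rng(2)
sketch = one_bit_srht(rng.standard_normal(5000), m=500, rng=rng)  # 500 bits uplinked
```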
4. Server Aggregation
Upon receiving the client sketches \(\bm{z}_k = \text{sign}(\mathbf{\Phi}\bm{w}_k)\), the server solves the discrete optimization \(\min_{\bm{v}\in\{\pm 1\}^m} \sum_k p_k \tilde{g}(\bm{v}, \bm{z}_k)\), whose closed-form solution is weighted majority voting: \(\bm{v}^* = \text{sign}(\sum_k p_k \bm{z}_k)\). The aggregation step is therefore provably optimal rather than heuristic.
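The closed form is easy to see: the inner product is the only \(\bm{v}\)-dependent term in \(\tilde{g}\), so the server problem decouples per coordinate and each \(v_i\) simply follows the weighted majority of the clients' bits. A toy check (names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
K, m = 20, 500
Z = np.sign(rng.standard_normal((K, m)))    # one-bit sketches z_k from K clients
p = np.full(K, 1.0 / K)                     # weights p_k (e.g., client data fractions)

v = np.sign(p @ Z)                          # v* = sign(sum_k p_k z_k)
v[v == 0] = 1.0                             # break exact ties (np.sign(0) = 0) toward +1
```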
Algorithm
Each communication round proceeds as follows:
1. Clients receive the global consensus \(\bm{v}^t\).
2. Each client performs \(R\) steps of local SGD with the sign-based regularizer.
3. Clients compute and upload their one-bit sketches.
4. The server applies weighted majority voting to produce the new consensus \(\bm{v}^{t+1}\) and broadcasts it.
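The loop is compact enough to simulate end to end. The toy sketch below is our construction — a dense \(\mathbf{\Phi}\) and per-client least-squares losses stand in for the paper's models and data — but it wires the four steps together:

```python
import numpy as np

rng = np.random.default_rng(4)
K, n, m, R, T = 10, 256, 32, 5, 20               # clients, dims, local steps, rounds
lam, mu, gamma, eta = 5e-4, 1e-5, 10.0, 0.1      # lambda, mu, gamma, learning rate

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # shared projection (SRHT in the paper)
A = [rng.standard_normal((50, n)) for _ in range(K)]     # toy per-client data
b = [A_k @ rng.standard_normal(n) for A_k in A]
W = [rng.standard_normal(n) for _ in range(K)]   # personalized models w_k
p = np.full(K, 1.0 / K)
v = np.ones(m)                                   # initial consensus v^0

for t in range(T):
    Z = np.empty((K, m))
    for k in range(K):
        w = W[k]
        for _ in range(R):                                          # R local SGD steps
            g_loss = A[k].T @ (A[k] @ w - b[k]) / len(b[k])         # local loss gradient
            g_reg = 0.5 * Phi.T @ (np.tanh(gamma * (Phi @ w)) - v)  # sign-alignment grad
            w = w - eta * (g_loss + lam * g_reg + mu * w)           # + l2 penalty
        W[k] = w
        Z[k] = np.sign(Phi @ w)                  # one-bit uplink sketch
    v = np.sign(p @ Z)                           # weighted majority vote -> v^{t+1}
    v[v == 0] = 1.0                              # tie-break
```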
Loss & Training
Under standard assumptions (\(L\)-smoothness, bounded variance, etc.), pFed1BS is proven to converge to a stationary neighborhood of the global potential function at rate \(\mathcal{O}(1/(RT))\). The convergence error is governed by three terms: stochastic noise \(\mathcal{O}(\eta L_F \sigma^2)\), communication error \(\mathcal{O}(\Delta_{\max}/(\eta R))\), and client sampling error \(\mathcal{O}(\lambda E_S/(\eta R))\). The regularization coefficient must satisfy \(\lambda = \mathcal{O}(1/n)\).
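Assembled into one expression, the bound has the following schematic shape. This is our reconstruction from the terms listed above; constants and the precise stationarity measure follow the paper:

```latex
% F is the global potential function; eta the local step size, R the local steps,
% T the number of communication rounds. Constants suppressed.
\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\,\big\|\nabla F(\bm{w}^{t})\big\|^{2}
  \lesssim \frac{1}{\eta R T}
  + \eta L_F \sigma^{2}
  + \frac{\Delta_{\max}}{\eta R}
  + \frac{\lambda E_S}{\eta R}
```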
Key Experimental Results
Experimental Setup
- Datasets: MNIST, FMNIST, CIFAR-10, CIFAR-100, SVHN
- Models: Two-layer MLP for MNIST/FMNIST; VGG architecture for the remaining datasets
- Non-IID Partition: 20 clients with label-based data allocation
- Baselines: FedAvg, OBDA, OBCSAA, zSignFed, EDEN, FedBAT
- Key Hyperparameters: \(\lambda=0.0005\), \(\mu=0.00001\), \(\gamma=10000\), compression ratio \(m/n=0.1\)
- Hardware: NVIDIA RTX 3090 Ti; results averaged over 10 independent runs
Main Results (Table 2)
| Method | MNIST Acc(%) | CIFAR-10 Acc(%) | CIFAR-100 Acc(%) | Comm. Reduction |
|---|---|---|---|---|
| FedAvg | 97.21 | 87.78 | 59.60 | - |
| OBDA | 92.54 | 73.26 | 42.47 | ↓96.88% |
| OBCSAA | 92.20 | 83.57 | 48.99 | ↓49.84% |
| zSignFed | 94.83 | 67.60 | 40.17 | ↓48.45% |
| EDEN | 96.50 | 84.91 | 47.55 | ↓60.88% |
| FedBAT | 96.42 | 81.20 | 46.89 | ↓61.75% |
| pFed1BS | 97.83 | 85.21 | 52.88 | ↓99.69% |
pFed1BS achieves the highest or near-highest accuracy across all datasets while reducing communication costs by over 99%. On CIFAR-10, pFed1BS attains 85.21% accuracy with only 0.13 MB per round, whereas OBDA requires 1.34 MB yet achieves only 73.26%.
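The headline figure follows directly from the compression setup, assuming the uncompressed baseline sends 32-bit floats:

```latex
\text{reduction}
  = 1 - \frac{m \times 1\ \text{bit}}{n \times 32\ \text{bits}}
  = 1 - \frac{0.1}{32}
  = 0.996875 \approx 99.69\%
```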
Ablation Study
Under non-IID conditions on MNIST (Figures 3–4), the convergence curves of pFed1BS show faster convergence, higher final accuracy (97.83%), and a more rapid and stable decrease in training loss compared to all baselines. On more challenging datasets such as CIFAR-100, competing one-bit methods collapse in performance (e.g., zSignFed reaches only 40.17%), while pFed1BS maintains 52.88%, demonstrating that the personalization mechanism is critical for the viability of extreme compression.
Highlights & Insights
- First unified framework: The first work to jointly formulate one-bit bidirectional communication and personalized federated learning as a rigorous joint optimization problem.
- Extreme compression ratio: Communication costs are reduced by over 99% while maintaining or exceeding the accuracy of full-precision methods.
- Computational efficiency: The Fast Hadamard Transform reduces projection complexity from \(\mathcal{O}(mn)\) to \(\mathcal{O}(n\log n)\).
- Theoretical completeness: A full convergence analysis is provided that accounts for the interactions among personalization, local updates, one-bit sketch error, and client sampling.
- Closed-form optimal aggregation: Server-side aggregation is not heuristic but is a provably optimal solution with rigorous guarantees.
Limitations & Future Work
- Limited model scale: Experiments are conducted only on MLP and VGG architectures; large-scale modern architectures such as ResNet and Transformer are not evaluated.
- Fixed compression ratio: The compression ratio \(m/n=0.1\) is fixed, with no adaptive adjustment mechanism.
- Single non-IID type: Only label-based non-IID partitioning is tested; more complex scenarios such as feature distribution shift and quantity imbalance are not explored.
- Insufficient hyperparameter sensitivity analysis: The interaction effects and robustness of the three hyperparameters \(\lambda\), \(\gamma\), and \(\mu\) are not thoroughly investigated.
- Limited PFL baseline comparisons: Direct comparisons with classical PFL methods such as pFedMe, Ditto, and Per-FedAvg are absent.
Related Work & Insights
- Personalized Federated Learning: pFedMe (regularization-based), Ditto (dual-model), FedRep (shared representation + personalized head), DisPFL (personalized sparse masks).
- Communication-Efficient Federated Learning: Top-k sparsification, mixed-precision quantization (Chen & Vikalo 2024), compressed sensing (Li et al. 2021), OBDA/OBCSAA/zSignFed (one-bit methods).
- Random Projection: Subsampled Randomized Hadamard Transform (SRHT) for efficient dimensionality reduction.
Rating
| Dimension | Score |
|---|---|
| Novelty | ⭐⭐⭐⭐ |
| Theoretical Depth | ⭐⭐⭐⭐ |
| Experimental Thoroughness | ⭐⭐⭐ |
| Value | ⭐⭐⭐⭐ |
| Writing Quality | ⭐⭐⭐⭐ |
Overall Rating: ⭐⭐⭐⭐ (4/5)
The paper fills a clear gap between communication compression and personalized federated learning with solid theoretical foundations, though the experimental scale is modest and direct comparisons with classical PFL methods are lacking.