Towards Trustworthy Federated Learning with Untrusted Participants¶

Conference: ICML 2025
arXiv: 2505.01874
Code: None (the paper mentions plans to open-source)
Area: AI Security
Keywords: Federated Learning, Differential Privacy, Robust Aggregation, Byzantine Fault Tolerance, Correlated Noise

TL;DR¶

The CafCor algorithm is proposed, which injects correlated noise achieved through shared randomness among participants and integrates a novel robust aggregation method called CAF. This achieves a privacy-utility trade-off close to Central Differential Privacy (CDP), without trusting the server and in the presence of malicious participants.

Background & Motivation¶

In distributed learning, privacy and robustness are two critical requirements for building trustworthy systems, but their intersection remains under-explored.

Existing solutions face a fundamental trade-off: - Local Differential Privacy (LDP): Does not trust the server, but suffers from extreme utility loss, with a privacy-utility trade-off of \(\Theta(d/(n\epsilon^2))\). - Central Differential Privacy (CDP): Assumes a fully trusted server, achieving much better utility with a trade-off of \(\Theta(d/(n^2\epsilon^2))\). - CDP is \(n\) times better than LDP, but fully trusting the server is unrealistic in practice.

Core Problem: Can a utility close to CDP be achieved without trusting the server and in the presence of malicious participants?

The SecLDP (Secret-based Local Differential Privacy) model proposed in this paper is key: each pair of participants shares a random seed unknown to others. This can be established via a single round of encrypted communication, which is a much weaker trust assumption than CDP.

Method¶

Overall Architecture¶

CafCor = Correlated noise + CAF aggregation, consisting of two core components:

Correlated Noise Injection: Utilizes shared randomness among participants.
Covariance-bound Agnostic Filter (CAF) Aggregation: A novel robust aggregation method.

Workflow (Algorithm 1): - The server broadcasts the model \(\theta_t\). - Each honest worker \(i\) samples a mini-batch and computes the clipped gradient \(g_t^{(i)}\). - Two types of noise are injected: independent noise \(\bar{v}_t^{(i)} \sim N(0, \sigma_{\text{ind}}^2 I_d)\) + correlated noise \(\sum_j v_t^{(ij)}\), where \(v_t^{(ij)} = -v_t^{(ji)}\). - After adding momentum, the messages are sent to the server. - The server aggregates the received messages using CAF and updates the model.

Key Designs¶

1. Correlated Noise Mechanism

Core Idea: Each pair of workers \((i,j)\) shares a random seed to generate symmetric noise \(v_t^{(ij)} = -v_t^{(ji)} \sim N(0, \sigma_{\text{cor}}^2 I_d)\).

Cancellation upon Aggregation: When averaging gradients across all honest workers, the correlated noise cancels out.
Privacy Protection: The server cannot obtain the shared seeds, so individual messages from workers appear highly noisy.
Collusion Robustness: Even if malicious actors collude, as long as there is at least one non-colluding honest worker, the correlated noise shared with them provides privacy.

The required noise scale is only \(\sigma_{\text{cor}}^2 = \Theta(1/(n\epsilon^2))\), whereas LDP requires \(\sigma_{\text{ind}}^2 = \Theta(1/\epsilon^2)\)—a difference of \(n\) times.

2. CAF Aggregation (Algorithm 2)

CAF filters out malicious inputs via iterative downweighting: - Initialize all input weights to 1. - In each round, calculate the weighted mean and empirical covariance matrix. - Find the direction corresponding to the largest eigenvalue (direction of maximum variance). - Reduce the weights of inputs based on their projection along this direction. - Record the weighted mean corresponding to the minimum-maximum eigenvalue. - Terminate when the total weight drops to \(n-2f\).

Key Advantages of CAF: - Does not assume any prior knowledge about the covariance of honest inputs (unlike Diakonikolas et al., 2017). - The error bound depends on the maximum eigenvalue rather than the trace, which is \(d\) times better than trimmed mean/median. - Computational complexity is \(O(f(nd^2 + d^3))\), which can be reduced to \(O(fnd \log d)\) in practice using the power method.

3. Momentum Update

Workers maintain a local momentum \(m_t^{(i)} = \beta_{t-1} m_{t-1}^{(i)} + (1-\beta_{t-1}) \tilde{g}_t^{(i)}\), sending momentum instead of raw gradients. The momentum coefficient is \(\beta_t = 1 - 24L\gamma_t\).

Loss & Training¶

Assume the loss function is \(L\)-smooth, and can be strongly convex or non-convex.
Gradient clipping \(\text{Clip}(g; C) = g \cdot \min\{1, C/\|g\|\}\) with sensitivity bound \(C\).
Learning rate strategy: for strongly convex cases, \(\gamma_t = 10/(\mu(t + 240L/\mu))\); for non-convex cases, \(\gamma_t = \min\{1/(24L), \sqrt{3L_0}/(16\bar{\sigma}\sqrt{LT})\}\).

Key Experimental Results¶

Main Results¶

On MNIST and Fashion-MNIST, with \(n=100\) workers and \(f \in \{5,10\}\) malicious workers:

Threat Model	MNIST (f=5)	MNIST (f=10)	Fashion-MNIST (f=5)
CDP (Baseline Upper Bound)	~83%	~82%	~72%
CafCor-SecLDP	83%	~78%	72%
CafCor-ByzLDP	~75%	~60%	~71%
DSGD-LDP	~30%	~30%	~50%

CafCor-SecLDP approaches CDP performance when \(f=5\), significantly outperforming LDP.

Ablation Study¶

Comparison of Aggregation Methods (Figure 2, \(n-f=10\) honest workers): - CafCor (CAF) vs CWTM, CWMED, GM, Multi-Krum, Meamed. - Under Sign Flipping and Label Flipping attacks: - CAF achieves optimal performance across all attacks, reaching near 90% accuracy under LF attacks. - CWTM/CWMED/GM only achieve around 50% accuracy under LF attacks. - Replacing CAF with a trimmed mean degrades the theoretical convergence rate by a factor of \(n\).

Key Findings¶

Privacy-Utility Trade-off: \(O((f+1)d/(n^2\epsilon^2))\), which matches CDP's \(\Theta(d/(n^2\epsilon^2))\) when \(f=O(1)\).
Strongly Convex Convergence Rate: \(O(\kappa G^2/\mu + L\bar{\sigma}^2/(\mu^2T) + L^2L_0/(\mu^2T^2))\).
Non-Convex Convergence Rate: \(O(\kappa G^2 + \bar{\sigma}\sqrt{LL_0}/\sqrt{T} + LL_0/T)\).
Strictly outperforms LDP when \(f\) is sublinear in \(n\).

Highlights & Insights¶

Introduction of SecLDP Model: Finds a practical middle ground between LDP and CDP, where the correlated noise scheme only requires a single round of pre-communication.
Dimensional Advantage of CAF Aggregation: The error bound does not include an extra dimension factor \(d\), which is a key advantage over trimmed mean in high-dimensional scenarios.
No Covariance Prior Required: CAF only needs to know the upper bound of the number of malicious workers, without requiring assumptions on the statistical properties of honest inputs.
Fine-grained Collusion Analysis: The parameter \(q\) in Theorem 4.1 allows precise control over scenarios with \(0\) to \(f\) colluding malicious actors.
Graceful Degradation: If shared randomness is unavailable, setting \(\sigma_{\text{cor}}=0\) naturally falls back to LDP while maintaining SOTA performance.

Limitations & Future Work¶

Communication Cost: Seed exchange requires \(O(n^2)\) pairwise communications. Although this is a one-time process with a small payload, it may still present a bottleneck for large-scale systems.
Optimality of SecLDP Unknown: The paper poses "What is the optimal utility under SecLDP? Does CafCor achieve it?" as an open question.
Privacy Budget Allocation: The settings of \(\sigma_{\text{ind}}\) and \(\sigma_{\text{cor}}\) rely on assumptions about cell collusion status, which can be difficult to determine in practice.
Simple Datasets: Experiments are conducted on MNIST/Fashion-MNIST, with no validation on more complex datasets (e.g., CIFAR-10 or NLP tasks).
\(2f\) Iterations in CAF: Although running in polynomial time, it still incurs overhead when \(f\) is large.

Three-way Trade-off in LDP + Robustness [Allouah et al., 2023c]: Proved a fundamental trade-off among privacy, robustness, and utility under LDP. This work breaks through this limitation.
Centralized Robust Mean Estimation [Hopkins et al., 2022; Liu et al., 2021]: Achieved near-optimal error bounds under CDP.
Correlated Noise Schemes [Sabater et al., 2022; Allouah et al., 2024]: Prior works did not provide utility guarantees under adversarial settings.
SMEA Aggregation [Allouah et al., 2023c]: Satisfies the same high-dimensional robust criterion but has exponential computational complexity. CAF serves as its polynomial-time alternative.
Provides a new paradigm for research that simultaneously addresses privacy and robustness in federated learning.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ — The combination of correlated noise and robust aggregation is unique, and the SecLDP model is pioneering.
Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive comparison across multiple attack types, threat models, and aggregation methods.
Writing Quality: ⭐⭐⭐⭐ — Rigorous theory and clear experimental details.
Value: ⭐⭐⭐⭐⭐ — Addresses the core pain points of privacy and robustness in federated learning.