Skip to content

Single-Loop Byzantine-Resilient Federated Bilevel Optimization

Conference: ICLR 2026
OpenReview: https://openreview.net/forum?id=AzaE6Dcz1C
Code: None
Area: Federated Bilevel Optimization / Byzantine-Robust Optimization
Keywords: Federated Bilevel Optimization, Byzantine Robustness, Single-Loop Algorithm, Hypergradient Estimation, Robust Aggregation

TL;DR

This paper investigates federated bilevel optimization in the presence of Byzantine clients. It first derives an asymptotic error lower bound determined by the heterogeneity of both upper and lower levels, then proposes the single-loop algorithm BR-FedBi and its Momentum/PAGE variants. These algorithms utilize auxiliary variables for hypergradient estimation combined with robust aggregation, theoretically achieving optimal Byzantine resilience or optimal sample complexity, and experimentally outperforming BILANTINE which requires sub-loops.

Background & Motivation

Background: Federated Bilevel Optimization (FBO) adapts hierarchical learning problems to multi-client scenarios, such as federated meta-learning, hyperparameter optimization, and federated reinforcement learning. The objective is \(\min_x \Phi(x)=\frac{1}{n}\sum_i f_i(x,y^*(x))\), where \(y^*(x)\) solves the lower-level problem \(\min_y \frac{1}{n}\sum_i g_i(x,y)\). The challenge lies in distributed communication and efficient hypergradient calculation.

Limitations of Prior Work: Existing robust federated learning methods mostly serve single-level optimization. Applying them to bilevel optimization introduces complications because global hypergradients involve global Hessian-inverse-vector products, and most relevant methods like BILANTINE rely on AID and sub-loops, requiring frequent communication that increases exposure to attackers.

Key Challenge: FBO requires accurate global hypergradients while Byzantine robustness demands stability against arbitrary malice. Data heterogeneity naturally causes gradient inconsistencies among honest clients. In bilevel structures, this error is amplified through lower-level solutions and Hessian terms.

Goal: The authors aim to establish the asymptotic error lower bound caused by Byzantine attacks in FBO and design a single-loop algorithm that maintains efficiency and reaches optimal robustness without sub-loops.

Key Insight: Leveraging SOBA-style single-loop optimization, the paper introduces an auxiliary variable \(z\) to approximate the Hessian-inverse-vector product. This decomposes hypergradient estimation into three synchronous updates: upper-level \(x\), lower-level \(y\), and auxiliary variable \(z\). All updates are passed through \((\alpha,c)\)-robust aggregators.

Core Idea: Replace "inner sub-loops + frequent hypergradient communication" with an "\(x/y/z\) three-variable single-loop + robust aggregation" approach. Use Momentum to cancel variance-related asymptotic error and PAGE to further reduce sample complexity.

Method

Overall Architecture

The server minimizes \(\Phi_H(x)=\frac{1}{|H|}\sum_{i\in H} f_i(x,y^*(x))\) where \(y^*(x)\) is the optimal solution for the honest lower-level objective \(g_H(x,y)\). Up to a ratio \(\alpha < 1/2\) of clients are Byzantine. BR-FedBi broadcasts \((x_k,y_k,z_k)\), honest clients compute updates for the three variables, and the server applies robust aggregation to each channel.

%%{init: {'flowchart': {'rankSpacing': 24, 'nodeSpacing': 28, 'padding': 6, 'wrappingWidth': 400}}}%%
flowchart TD
    A["Server broadcasts<br/>x_k, y_k, z_k"] --> B["Auxiliary variable hypergradient estimation"]
    B --> C["Three-way local updates<br/>D_x, D_y, D_z"]
    C --> D["Robust aggregation against<br/>Byzantine updates"]
    D --> E["Single-loop synchronous updates<br/>x, y, z"]
    E --> F["Momentum or PAGE<br/>variants"]

Key Designs

1. Auxiliary Variable Hypergradient Estimation The hypergradient is approximated using \(z\) to solve the linear system \(\nabla_{yy}g_H(x,y)z+\nabla_y f_H(x,y)=0\). Honest clients compute local quantities: $\(D_{x,i}=\nabla_{xy}G_i(x_k,y_k;\xi_i)z_k+\nabla_xF_i(x_k,y_k;\zeta_i)\)$ $\(D_{y,i}=\nabla_yG_i(x_k,y_k;\xi_i)\)$ $\(D_{z,i}=\nabla_{yy}G_i(x_k,y_k;\xi_i)z_k+\nabla_yF_i(x_k,y_k;\zeta_i)\)$

2. Three-way Robust Aggregation Byzantine clients can contaminate all update directions. BR-FedBi applies \(AGG(\cdot)\) separately to \(D_x\), \(D_y\), and \(D_z\): $\(x_{k+1}=x_k-\eta_x AGG(D_{x,1},...,D_{x,n})\)$ $\(y_{k+1}=y_k-\eta_y AGG(D_{y,1},...,D_{y,n})\)$ $\(z_{k+1}=z_k-\eta_z AGG(D_{z,1},...,D_{z,n})\)$

3. Lower Bound and Momentum Matching The paper establishes an ineradicable error lower bound \(\Omega(\alpha^2(\delta_f^2+\delta_g^2))\). BR-FedBiM uses Polyak momentum to cancel stochastic variance, matching this lower bound and achieving optimal Byzantine resilience.

4. PAGE Variant BR-FedBiP uses probabilistic gradient estimation to achieve \(O(\epsilon^{-1.5})\) sample complexity, matching the best known complexity for expected stochastic bilevel optimization under mean-squared smoothness.

Key Experimental Results

Main Results

Dataset / Setup Aggregator Method ALIE IPM Worst Conclusion
MNIST Hetero Median BILANTINE 73.74 46.02 46.02 Baseline collapses under IPM
MNIST Hetero Median BR-FedBiP 83.24 82.24 81.36 Single-loop + PAGE is stable
MNIST Hetero RFA BILANTINE 66.96 32.02 19.96 Baseline nearly fails with RFA
MNIST Hetero RFA BR-FedBiM 72.76 81.95 72.76 Momentum variant has best worst-case

Ablation Study

Config / Analysis Key Metric Description
BR-FedBi vs BILANTINE Comm. ~3 comm/epoch vs ~8 Avoids inner-loop repeated communication
MNIST 200 epochs Total Time BILANTINE ~6590-7701s; BR-FedBi ~3102-3139s Wall-clock time roughly halved

Key Findings

  • Structural Robustness: The "single-loop + three-way aggregation" design provides stable gains across various attackers and aggregators.
  • Resilience Matching: BR-FedBiM consistently achieves the highest worst-case accuracy, matching the theoretical optimal Byzantine resilience.
  • Efficiency: Reducing communication via single-loop design significantly lowers the wall-clock time compared to sub-loop methods like BILANTINE.

Highlights & Insights

  • Optimal Resilience: Establishes theoretical tightness by matching the upper bound of Momentum variants with the established FBO-specific lower bound.
  • Asymmetric Defense: Protecting \(x\), \(y\), and \(z\) channels separately ensures that corruption in lower-level or Hessian approximations does not contaminate the upper-level gradient.

Limitations & Future Work

  • Convergence analysis currently requires strong convexity of the lower-level problem.
  • Large-scale models and diverse non-IID system dynamics remain areas for future evaluation.
  • vs BILANTINE: BILANTINE uses AID/sub-loops; this work uses SOBA-style single-loop updates, reducing communication while achieving matching lower-bound resilience.
  • vs Byzantine-robust single-level FL: This work demonstrates that bilevel problems require safeguarding nested variables (\(y\) and \(z\)), not just the final gradient.

Rating

  • Novelty: ⭐⭐⭐⭐⭐
  • Experimental Thoroughness: ⭐⭐⭐⭐
  • Writing Quality: ⭐⭐⭐⭐
  • Value: ⭐⭐⭐⭐⭐