Federated ADMM from Bayesian Duality

Conference: ICLR 2026 arXiv: 2506.13150 Code: Available Area: Others Keywords: ADMM, Variational Bayes, Natural Gradient, Federated Learning, Bayesian Duality

TL;DR

This paper derives a Bayesian dual structure for ADMM from a variational Bayes (VB) perspective, proving that classical ADMM is a special case of VB over isotropic Gaussian families. Two novel extensions are introduced: a Newton-like variant (one-round convergence on quadratic objectives) and an Adam-like variant (IVON-ADMM, achieving +7% accuracy in heterogeneous deep learning settings).

Background & Motivation

State of the Field

Background: ADMM, introduced in the 1970s, has remained largely unchanged in form and has become a core algorithmic backbone for federated learning. Its robust structure naturally raises the question of whether a more general formulation exists.

Limitations of Prior Work: Accelerated variants of ADMM (over-relaxation, momentum, scaled norms, etc.) introduce additional variables without altering the algorithmic form. Swaroop et al. observed line-by-line similarities between VB and ADMM but could not establish an exact correspondence.

Key Challenge: The deterministic optimization framework underlying ADMM does not extend naturally to heterogeneous deep learning scenarios. A more general framework is needed to unify and generalize ADMM.

Key Insight: The key insight is that solutions to the VB objective exhibit a dual structure that not only resembles the fixed-point structure of ADMM but also naturally generalizes it. The critical missing link is the natural gradient.

Core Idea: The duality between natural parameters and expectation parameters in exponential family distributions establishes a "Bayesian duality" structure, of which ADMM is a special case under isotropic Gaussians.

Method

Overall Architecture

Classical ADMM operates with a primal-dual structure over \((\theta_g^*, \theta_k^*, \mathbf{v}_k^*, \mathbf{v}_g^*)\); Bayesian ADMM adopts an expectation-natural parameter dual structure over \((\mu_g^*, \mu_k^*, \eta_k^*, \lambda_g^*)\). The core distinction is: gradient \(\to\) natural gradient, parameters \(\to\) distributions.
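
As a brief aside on why "gradient \(\to\) natural gradient" is the right substitution, the following is a standard exponential-family identity in generic notation (not necessarily the paper's exact formulation). Write \(q_\lambda(\theta) = h(\theta)\exp\big(\langle \lambda, T(\theta)\rangle - A(\lambda)\big)\), with expectation parameter \(\mu = \mathbb{E}_{q_\lambda}[T(\theta)] = \nabla A(\lambda)\) and Fisher matrix \(F(\lambda) = \nabla^2 A(\lambda)\). Then, for any smooth objective \(\mathcal{L}\),

\[ \tilde{\nabla}_\lambda \mathcal{L} \;=\; F(\lambda)^{-1} \nabla_\lambda \mathcal{L} \;=\; \nabla_\mu \mathcal{L}, \]

so a natural-gradient step in the natural parameters is an ordinary-gradient step in the expectation parameters. This pairing is what lets the VB fixed-point conditions line up with ADMM's primal-dual conditions.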

Key Designs

  1. Bayesian Dual Structure:

    • VB fixed-point condition: \(\lambda_g^* = -\sum_{k=0}^{K} \nabla \mathcal{L}_k(\mu_g^*)\)
    • Introducing local distributions \(q_k^*\) and dual variables \(\eta_k^*\) yields a four-condition structure analogous to ADMM
    • When \(q\) is chosen as an isotropic Gaussian, the natural gradient reduces to the ordinary gradient, recovering classical ADMM
  2. Newton-like Extension (Full-Covariance Gaussian):

    • Function: \(q\) is parameterized as a full-covariance Gaussian distribution
    • Mechanism: The natural gradient incorporates the inverse Fisher information matrix, which is equivalent to Newton's method on quadratic objectives, enabling one-round communication convergence (see the numerical sketch after this list)
    • Design Motivation: Classical ADMM requires multiple rounds of iteration even on quadratic objectives
  3. Adam-like Extension (Diagonal Gaussian, IVON-ADMM):

    • Function: \(q\) is parameterized as a diagonal-covariance Gaussian, implemented efficiently via the IVON method
    • Mechanism: Diagonal Fisher approximation produces adaptive learning rates analogous to Adam
    • Design Motivation: Full Fisher information is computationally prohibitive; diagonal approximation is more practical for deep learning
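
To make the one-round claim in item 2 concrete, here is a minimal numerical sketch (an illustration under stated assumptions, not the paper's released code). Each client holds a quadratic \(f_k(\theta) = \tfrac{1}{2}(\theta - b_k)^\top A_k (\theta - b_k)\); fitting a full-covariance Gaussian amounts to the client communicating the natural parameters of its local curvature (precision \(A_k\) and shift \(A_k b_k\)), which the server simply adds and solves once. All variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 4  # parameter dimension, number of clients

# Hypothetical local quadratics: f_k(theta) = 0.5 * (theta - b_k)^T A_k (theta - b_k)
As, bs = [], []
for _ in range(K):
    M = rng.standard_normal((d, d))
    As.append(M @ M.T + np.eye(d))   # symmetric positive-definite curvature
    bs.append(rng.standard_normal(d))

# One "Newton-like" round: each client sends the natural parameters of a
# full-covariance Gaussian fitted to its quadratic (precision A_k, shift A_k b_k);
# the server sums them and solves once.
precision_g = sum(As)
shift_g = sum(A @ b for A, b in zip(As, bs))
mu_g = np.linalg.solve(precision_g, shift_g)

# Check: the gradient of the *global* objective sum_k f_k vanishes at mu_g,
# i.e. a single communication round already reaches the exact minimizer.
grad_at_mu = sum(A @ (mu_g - b) for A, b in zip(As, bs))
print(np.allclose(grad_at_mu, 0.0))  # True
```

Replacing the full precision matrices with their diagonals yields per-coordinate step sizes, which is the intuition behind the Adam-like IVON-ADMM variant in item 3.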

Loss & Training

  • Client: minimizes local loss plus KL regularization (Bayesian formulation); see the sketch below
  • Server: aggregates natural gradient parameters rather than raw gradients
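
For concreteness, here is a minimal sketch of a client objective of this form with a diagonal Gaussian posterior (a generic VB formulation with hypothetical names; the paper's actual IVON-ADMM update also carries dual-variable corrections that are omitted here).

```python
import numpy as np

def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians N(mu_q, diag(var_q)) and N(mu_p, diag(var_p))."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def client_objective(mu_q, var_q, local_loss, mu_g, var_g, n_samples=8, seed=0):
    """Hypothetical client objective: E_q[local_loss(theta)] + KL(q || q_global).

    `local_loss` is any callable theta -> scalar; the expectation under
    q = N(mu_q, diag(var_q)) is approximated with Monte Carlo samples.
    """
    rng = np.random.default_rng(seed)
    thetas = mu_q + np.sqrt(var_q) * rng.standard_normal((n_samples, mu_q.size))
    expected_loss = float(np.mean([local_loss(t) for t in thetas]))
    return expected_loss + kl_diag_gauss(mu_q, var_q, mu_g, var_g)

# Toy usage: a quadratic local loss around a client-specific target.
target = np.array([1.0, -2.0, 0.5])
loss = lambda th: 0.5 * np.sum((th - target) ** 2)
print(client_objective(np.zeros(3), np.ones(3), loss, np.zeros(3), np.ones(3)))
```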

Key Experimental Results

Main Results

Deep heterogeneous federated learning:

| Method | Accuracy | Runtime | Notes |
|---|---|---|---|
| FedADMM | Baseline | Baseline | Classical ADMM |
| FedAvg | Baseline-level | Baseline-level | Standard federated averaging |
| IVON-ADMM | +7% | Comparable | Adam-like extension |

Theoretical Validation (Quadratic Objectives)

| Method | Rounds to Converge | Notes |
|---|---|---|
| Classical ADMM | Multiple rounds | Linear convergence |
| Newton-like ADMM | 1 round | One-step convergence |

Key Findings

  • IVON-ADMM achieves +7% accuracy in deep heterogeneous settings (non-IID data) without additional communication or computational overhead
  • The Newton-like variant achieves one-round convergence on quadratic objectives, confirming the theoretical prediction
  • The natural gradient is the key link connecting VB and ADMM — precisely the missing element identified in Swaroop et al.

Highlights & Insights

  • Mathematical Elegance: Classical ADMM turns out to be a special case of a Bayesian method under the simplest distribution family. This connection is not only aesthetically appealing but also opens a new avenue for generalizing optimization algorithms via families of probability distributions.
  • The Central Role of Natural Gradients: Prior work using ordinary gradients failed to establish an exact correspondence; substituting natural gradients resolves this immediately, underscoring the deep role of information geometry in algorithm design.
  • A Free Lunch: IVON-ADMM leverages IVON's efficient diagonal Fisher implementation, incurring no additional runtime while substantially improving performance in heterogeneous settings.

Limitations & Future Work

  • Deep learning experiments are conducted at a relatively small scale (7-layer CNN); performance on larger models (e.g., LLMs) remains unknown
  • Diagonal Fisher approximation may be insufficiently accurate for certain model architectures
  • The framework requires selecting an exponential family distribution as a prior assumption; guidelines for choosing the distribution are not clearly specified
  • The communication-efficiency analysis is relatively brief and could be developed further

Comparison with Prior Methods

  • vs. FedADMM: Classical ADMM is recovered as a special case; Bayesian ADMM provides a rigorous generalization
  • vs. FedAvg: IVON-ADMM demonstrates significant advantages in heterogeneous settings
  • vs. PVI (Swaroop 2025): This work supplies the exact VB-ADMM correspondence that PVI could not establish

Rating

  • Novelty: ⭐⭐⭐⭐⭐ The Bayesian dual structure is original and elegant, unifying two major paradigms
  • Experimental Thoroughness: ⭐⭐⭐ Theoretical validation is rigorous, but deep learning experiments are limited in scale
  • Writing Quality: ⭐⭐⭐⭐⭐ Theoretical derivations are clear and the duality diagrams are intuitive
  • Value: ⭐⭐⭐⭐ Provides a new theoretical foundation and practical extensions for federated optimization