
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments

Conference: ICLR 2026
arXiv: 2602.23504
Code: https://tinyurl.com/2rbkb3zu
Area: Optimization / Federated Learning
Keywords: Clustered Federated Learning, Data Heterogeneity, Dual-Encoder Architecture, Cross-Cluster Knowledge Sharing, Adaptive Clustering

TL;DR

FedDAG is a clustered federated learning framework that performs more accurate client clustering via weighted class-wise similarity fusion of data and gradient signals, and enables cross-cluster feature transfer through a dual-encoder architecture, consistently outperforming existing baselines across diverse heterogeneity settings.

Background & Motivation

Background: Federated learning (FL) enables collaborative model training without sharing raw data, but client data heterogeneity (non-IID) leads to slow convergence and degraded accuracy. Clustered FL addresses this by grouping similar clients, with each cluster training its own model.

Limitations of Prior Work: Existing clustered FL methods suffer from four key limitations: 1) similarity computation relies on either data or gradient signals alone, lacking comprehensiveness; 2) knowledge sharing is restricted within clusters, failing to leverage diverse representations across clusters; 3) only label skew is typically addressed, while concept drift and quantity imbalance are neglected; 4) the number of clusters must be specified in advance.

Key Challenge: Data-based and gradient-based similarity measures each have blind spots — gradient similarity in high-dimensional spaces can produce false positives, while data similarity is insensitive to concept drift. Neither signal alone can accurately characterize true inter-client similarity.

Goal: Cluster clients dynamically by jointly leveraging data and gradient information, while enabling cross-cluster representation sharing.

Key Insight: Refine similarity computation to the class level, learn adaptive weights for data and gradient signals, and employ a dual-encoder architecture to facilitate cross-cluster feature transfer.

Core Idea: Perform more accurate client clustering via class-wise weighted fusion of data and gradient similarity, and enable cross-cluster knowledge sharing while preserving cluster specialization through a dual-encoder architecture.

Method

Overall Architecture

FedDAG consists of two core components: (1) Similarity Computation and Adaptive Clustering — fuses data and gradient information to compute a weighted inter-client similarity matrix, generates candidate groupings via hierarchical clustering, and automatically determines the optimal number of clusters using a federated-aware metric; (2) Dual-Encoder Training — each cluster model contains a primary encoder (updated with intra-cluster data) and a secondary encoder (updated with gradients from complementary clusters), whose outputs are concatenated before being passed to the classifier.

Key Designs

  1. Weighted Class-wise Data-Gradient Fusion Similarity:

    • Function: Computes a more accurate inter-client similarity matrix by jointly leveraging data and gradient information.
    • Mechanism: Extends the data similarity approach of PACFL to the class level — comparing per-class data subspaces rather than global subspaces. Each client learns a weight \(w_i\) controlling the contribution of data vs. gradient signals, yielding a final similarity of \(S_{ij} = w_i \cdot S_{ij}^{data} + (1 - w_i) \cdot S_{ij}^{grad}\). Weights are optimized by minimizing an entropy-based loss to sharpen the adjacency matrix.
    • Design Motivation: Class-level comparison naturally handles concept drift (same label with different semantics), while weighted fusion allows each client to adaptively emphasize the most informative signal source. A minimal sketch of this fusion follows this list.
  2. Dual-Encoder Cross-Cluster Knowledge Sharing:

    • Function: Enables each cluster model to simultaneously learn intra-cluster specialized features and inter-cluster complementary features.
    • Mechanism: Each cluster model comprises a primary encoder \(\phi^{(1)}\) and a secondary encoder \(\phi^{(2)}\). The primary encoder is updated with aggregated gradients \(\Theta_z^{1f}\) from clients within the cluster; the secondary encoder is updated with gradients \(\Theta_z^{2f}\) from complementary clusters. Their outputs are concatenated along the feature dimension and fed into the classifier: \(F_z(\cdot) = \psi(\phi^{(1)}(\cdot; \Theta_z^{1f}), \phi^{(2)}(\cdot; \Theta_z^{2f}); \Theta_z^c)\).
    • Design Motivation: Existing methods either restrict knowledge sharing within clusters or employ soft clustering that introduces noisy mixing. The dual-encoder design preserves cluster specialization via the primary encoder while introducing a complementary perspective via the secondary encoder, without mutual contamination. The forward pass is sketched after this list.
  3. Federated-Aware Adaptive Clustering:

    • Function: Automatically determines the optimal number of clusters without requiring it to be specified in advance.
    • Mechanism: Hierarchical clustering generates candidate groupings at multiple granularities. A proposed federated-aware metric evaluates each candidate by rewarding compact clusters and penalizing over-partitioning (degenerate clusters with too few clients). The grouping with the highest metric score is selected as the final partition.
    • Design Motivation: Prespecifying the number of clusters is impractical in real-world deployments, and hierarchical clustering in FL settings tends to over-split. The candidate-selection procedure is sketched after this list.
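
The following numpy sketch illustrates how design 1 could be realized. It is a minimal sketch, assuming PACFL-style truncated-SVD subspace signatures computed per class and a binary-entropy sharpening penalty; the function names, the mean-cosine aggregation, and the value of \(k\) are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def class_subspace(features, k=3):
    # PACFL-style signature, but per class: top-k left singular vectors
    # of the feature matrix (columns = samples) for one class on one client.
    U, _, _ = np.linalg.svd(features, full_matrices=False)
    return U[:, :k]

def principal_angle_similarity(U_i, U_j):
    # Singular values of U_i^T U_j are the cosines of the principal angles
    # between the two class subspaces; average them into a single score.
    cosines = np.linalg.svd(U_i.T @ U_j, compute_uv=False)
    return float(np.mean(np.clip(cosines, 0.0, 1.0)))

def fused_similarity(S_data, S_grad, w):
    # S_ij = w_i * S_ij^data + (1 - w_i) * S_ij^grad, one learned weight
    # per client (row); note the result need not be symmetric.
    return w[:, None] * S_data + (1.0 - w[:, None]) * S_grad

def entropy_sharpening_loss(S, eps=1e-8):
    # Binary-entropy penalty: minimal when every fused entry sits near
    # 0 or 1, which sharpens the adjacency matrix before clustering.
    S = np.clip(S, eps, 1.0 - eps)
    return float(-np.mean(S * np.log(S) + (1.0 - S) * np.log(1.0 - S)))
```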
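Design 2's forward pass can be sketched in PyTorch as below. The concatenate-then-classify structure follows the \(F_z\) formula above; `make_encoder`, `feat_dim`, and the freeze-the-secondary-encoder note are assumptions about the training protocol, which the paper specifies precisely.

```python
import torch
import torch.nn as nn

class DualEncoderModel(nn.Module):
    """Cluster model F_z = psi(phi1(x; Theta_z^{1f}), phi2(x; Theta_z^{2f}); Theta_z^c)."""

    def __init__(self, make_encoder, feat_dim, num_classes):
        super().__init__()
        self.phi1 = make_encoder()  # primary: updated with intra-cluster gradients
        self.phi2 = make_encoder()  # secondary: updated with complementary-cluster gradients
        self.psi = nn.Linear(2 * feat_dim, num_classes)  # classifier Theta_z^c

    def forward(self, x):
        # Concatenate the two encoders' outputs along the feature dimension,
        # then classify, matching the F_z formula above.
        h = torch.cat([self.phi1(x), self.phi2(x)], dim=-1)
        return self.psi(h)

# During a client's local step, phi2 would be frozen so that only phi1 and
# psi receive intra-cluster gradients (an assumed update schedule):
# for p in model.phi2.parameters():
#     p.requires_grad_(False)
```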
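Finally, design 3's candidate selection could look like the SciPy sketch below. The `federated_aware_score` here is a hypothetical stand-in that merely mirrors the stated behavior (reward intra-cluster compactness, penalize degenerate clusters); the actual metric is the paper's own contribution.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def federated_aware_score(S, labels, min_size=2, penalty=0.5):
    # Stand-in metric: reward mean intra-cluster similarity and
    # penalize degenerate clusters with fewer than `min_size` clients.
    ids = np.unique(labels)
    score = 0.0
    for c in ids:
        members = np.where(labels == c)[0]
        if len(members) >= 2:
            block = S[np.ix_(members, members)]
            score += block[np.triu_indices(len(members), k=1)].mean()
        if len(members) < min_size:
            score -= penalty  # over-partitioning penalty
    return score / len(ids)

def adaptive_cluster_count(S, max_k=10):
    S = (S + S.T) / 2.0               # per-client weights can leave S asymmetric
    D = 1.0 - S                       # similarity -> distance
    condensed = D[np.triu_indices_from(D, k=1)]
    Z = linkage(condensed, method="average")
    # Candidate partitions at multiple granularities; keep the best-scoring one.
    candidates = [fcluster(Z, t=k, criterion="maxclust") for k in range(2, max_k + 1)]
    return max(candidates, key=lambda labels: federated_aware_score(S, labels))
```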

Loss & Training

Standard cross-entropy loss is aggregated within each cluster. Weight optimization employs entropy-based regularization to promote binarization of the similarity matrix. Gradients are transmitted in compressed form to reduce communication overhead; each client computes gradients for at most one model per round.
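
For concreteness, one plausible form of the entropy-based regularizer (an assumption; the paper defines the exact loss) treats each fused similarity \(S_{ij}\) as a probability and penalizes uncertain values, so that minimizing over \(w_i\) pushes entries toward 0 or 1:

\[
\mathcal{L}_{ent}(w_i) = -\sum_{j \neq i} \left[ S_{ij} \log S_{ij} + (1 - S_{ij}) \log(1 - S_{ij}) \right]
\]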

Key Experimental Results

Main Results

| Algorithm | Technique | CIFAR-10 | FMNIST |
|---|---|---|---|
| PACFL | Data (D) | 90.45±0.30 | 94.41±0.31 |
| CFL | Gradient (G) | 72.80±0.66 | 86.97±0.23 |
| IFCA | Gradient (G) | 89.68±0.17 | 94.03±0.09 |
| FedDAG | D+G+Global Feature Sharing | 94.53±0.12 | 96.82±0.18 |

Ablation Study

| Configuration | CIFAR-10 | Note |
|---|---|---|
| FedDAG (Full) | 94.53 | Complete framework |
| Data similarity only | ~91.0 | Degenerates to PACFL++ |
| Gradient similarity only | ~88.5 | Degenerates to improved CFL |
| Without dual-encoder | ~92.0 | No cross-cluster features |
| Without adaptive cluster count | ~93.0 | Uses predefined cluster count |

Key Findings

  • FedDAG outperforms the strongest baseline, PACFL, by over 4 percentage points on CIFAR-10.
  • Fusing data and gradient signals consistently surpasses either signal alone, especially under concept drift.
  • The dual-encoder architecture yields a 2–3 percentage-point improvement over single-encoder variants, confirming the value of cross-cluster knowledge sharing.
  • The framework is effective across all four heterogeneity types: label skew, feature skew, concept drift, and quantity imbalance.

Highlights & Insights

  • Class-wise Similarity Computation: Refining similarity to the class level is a natural approach to handling concept drift and is more robust than global subspace comparison.
  • Responsibility Separation in the Dual-Encoder Design: Assigning distinct signal sources to the primary and secondary encoders avoids the noisy mixing problem inherent in soft-clustering methods.

Limitations & Future Work

  • The dual-encoder architecture increases model parameter count and computational overhead.
  • Class-level comparison incurs growing computational cost as the number of classes increases.
  • Clients must upload a small amount of information for similarity computation; though compressed, this still poses privacy risks.
  • The approach has not been evaluated in real-world federated scenarios such as cross-device FL.

Comparison with Related Work

  • vs. PACFL: PACFL compares global subspaces via principal angles; FedDAG refines this to class-level comparison with weighted fusion, yielding a more comprehensive similarity measure.
  • vs. FedSoft/FedRC: These methods employ soft clustering that lets clients mix multiple cluster models, potentially introducing noise. FedDAG's dual-encoder structurally separates the two signal sources, avoiding this issue.

Rating

  • Novelty: ⭐⭐⭐ Class-wise fusion and the dual-encoder design represent reasonable but incremental contributions.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Evaluation across four heterogeneity types is comprehensive.
  • Writing Quality: ⭐⭐⭐ Content is thorough but the structure is somewhat complex.
  • Value: ⭐⭐⭐ Offers practical improvements for clustered FL, though the scope is relatively specific.