FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments¶

Conference: ICLR 2026
arXiv: 2602.23504
Code: https://tinyurl.com/2rbkb3zu
Area: Optimization / Federated Learning
Keywords: Clustered Federated Learning, Data Heterogeneity, Dual-Encoder Architecture, Cross-Cluster Knowledge Sharing, Adaptive Clustering

TL;DR¶

FedDAG is a clustered federated learning framework. It clusters clients more accurately by fusing class-wise data and gradient similarity with learned weights, and it transfers features across clusters through a dual-encoder architecture. It consistently outperforms existing baselines across diverse heterogeneity settings.

Background & Motivation¶

Background: Federated learning (FL) enables collaborative model training without sharing raw data, but client data heterogeneity (non-IID) leads to slow convergence and degraded accuracy. Clustered FL addresses this by grouping similar clients, with each cluster training its own model.

Limitations of Prior Work: Existing clustered FL methods suffer from four key limitations: 1) similarity is computed from either data or gradient signals alone, so it captures only part of how clients relate; 2) knowledge sharing is restricted to clients within a cluster, so diverse representations learned by other clusters go unused; 3) typically only label skew is addressed, while concept drift and quantity imbalance are neglected; 4) the number of clusters must be specified in advance.

Key Challenge: Data-based and gradient-based similarity measures each have blind spots — gradient similarity in high-dimensional spaces can produce false positives, while data similarity is insensitive to concept drift. Neither signal alone can accurately characterize true inter-client similarity.

Goal: Dynamically cluster clients by jointly leveraging data and gradient information, while enabling cross-cluster representation sharing.

Key Insight: Refine similarity computation to the class level, learn adaptive weights for data and gradient signals, and employ a dual-encoder architecture to facilitate cross-cluster feature transfer.

Core Idea: Perform more accurate client clustering via class-wise weighted fusion of data and gradient similarity, and enable cross-cluster knowledge sharing while preserving cluster specialization through a dual-encoder architecture.

Method¶

Overall Architecture¶

FedDAG consists of two core components: (1) Similarity Computation and Adaptive Clustering — fuses data and gradient information to compute a weighted inter-client similarity matrix, generates candidate groupings via hierarchical clustering, and automatically determines the optimal number of clusters using a federated-aware metric; (2) Dual-Encoder Training — each cluster model contains a primary encoder (updated with intra-cluster data) and a secondary encoder (updated with gradients from complementary clusters), whose outputs are concatenated before being passed to the classifier.

Key Designs¶

Weighted Class-wise Data-Gradient Fusion Similarity:
- Function: Computes a more accurate inter-client similarity matrix by jointly leveraging data and gradient information.
- Mechanism: Extends the data similarity approach of PACFL to the class level — comparing per-class data subspaces rather than global subspaces. Each client learns a weight \(w_i\) controlling the contribution of data vs. gradient signals, yielding a final similarity of \(S_{ij} = w_i \cdot S_{ij}^{data} + (1 - w_i) \cdot S_{ij}^{grad}\). Weights are optimized by minimizing an entropy-based loss to sharpen the adjacency matrix.
- Design Motivation: Class-level comparison naturally handles concept drift (same label with different semantics), while weighted fusion allows each client to adaptively emphasize the most informative signal source.
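
The fusion rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `fuse_similarity` and `adjacency_entropy` are hypothetical, similarities are assumed to lie in [0, 1], and the summed binary entropy is only an assumed form of the entropy-based loss, whose exact definition is not given here.

```python
import math

def fuse_similarity(s_data, s_grad, weights):
    """Row-wise fusion: S[i][j] = w_i * S_data[i][j] + (1 - w_i) * S_grad[i][j]."""
    n = len(s_data)
    return [
        [weights[i] * s_data[i][j] + (1.0 - weights[i]) * s_grad[i][j]
         for j in range(n)]
        for i in range(n)
    ]

def adjacency_entropy(sim, eps=1e-12):
    """Sum of binary entropies over all entries (assumed loss form).

    Lower values mean entries sit closer to 0 or 1, i.e. a sharper matrix.
    """
    total = 0.0
    for row in sim:
        for p in row:
            p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
            total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total

# Toy example with three clients (values are made up for illustration).
s_data = [[1.0, 0.9, 0.5], [0.9, 1.0, 0.5], [0.5, 0.5, 1.0]]
s_grad = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.1], [0.1, 0.1, 1.0]]

fused_a = fuse_similarity(s_data, s_grad, [0.0, 0.0, 0.0])  # gradient signal only
fused_b = fuse_similarity(s_data, s_grad, [0.5, 0.5, 0.5])  # equal mix
print(adjacency_entropy(fused_a), adjacency_entropy(fused_b))
```

Because each row uses its own weight \(w_i\), the fused matrix is not necessarily symmetric in this sketch; how the weights are actually optimized is left to the paper.
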
Dual-Encoder Cross-Cluster Knowledge Sharing:
- Function: Enables each cluster model to simultaneously learn intra-cluster specialized features and inter-cluster complementary features.
- Mechanism: Each cluster model comprises a primary encoder \(\phi^{(1)}\) and a secondary encoder \(\phi^{(2)}\). The primary encoder is updated with aggregated gradients \(\Theta_z^{1f}\) from clients within the cluster; the secondary encoder is updated with gradients \(\Theta_z^{2f}\) from complementary clusters. Their outputs are concatenated along the feature dimension and fed into the classifier: \(F_z(\cdot) = \psi(\phi^{(1)}(\cdot; \Theta_z^{1f}), \phi^{(2)}(\cdot; \Theta_z^{2f}); \Theta_z^c)\).
- Design Motivation: Existing methods either restrict knowledge sharing within clusters or employ soft clustering that introduces noisy mixing. The dual-encoder design preserves cluster specialization via the primary encoder while introducing a complementary perspective via the secondary encoder, without mutual contamination.
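
To make the data flow of \(F_z\) concrete, here is a small pure-Python sketch under simplifying assumptions: both encoders are single linear layers followed by ReLU, the classifier is a linear layer, and updates are plain SGD steps. The class `DualEncoderModel` and its method names are hypothetical and chosen only for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def relu(vector):
    return [max(0.0, v) for v in vector]

def sgd_step(params, grads, lr):
    """Return params - lr * grads, element-wise."""
    return [[p - lr * g for p, g in zip(p_row, g_row)]
            for p_row, g_row in zip(params, grads)]

class DualEncoderModel:
    """Cluster model with a primary encoder, a secondary encoder, and a classifier."""

    def __init__(self, primary, secondary, classifier):
        self.primary = primary        # plays the role of Theta_z^{1f}
        self.secondary = secondary    # plays the role of Theta_z^{2f}
        self.classifier = classifier  # plays the role of Theta_z^{c}

    def forward(self, x):
        h1 = relu(matvec(self.primary, x))    # intra-cluster features
        h2 = relu(matvec(self.secondary, x))  # complementary features
        # Concatenate along the feature dimension, then classify.
        return matvec(self.classifier, h1 + h2)

    def apply_updates(self, grad_intra, grad_complementary, lr=0.1):
        # Primary encoder: aggregated gradients from clients in this cluster.
        self.primary = sgd_step(self.primary, grad_intra, lr)
        # Secondary encoder: gradients originating from complementary clusters.
        self.secondary = sgd_step(self.secondary, grad_complementary, lr)

model = DualEncoderModel(
    primary=[[1.0, 0.0], [0.0, 1.0]],
    secondary=[[1.0, 1.0]],
    classifier=[[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]],
)
logits = model.forward([1.0, 2.0])
```

Keeping the two encoders as separate parameter sets is what lets each receive a different gradient source; the classifier sees both feature vectors only after concatenation.
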
Federated-Aware Adaptive Clustering:
- Function: Automatically determines the optimal number of clusters without requiring it to be specified in advance.
- Mechanism: Hierarchical clustering generates candidate groupings at multiple granularities. A proposed federated-aware metric evaluates each candidate by rewarding compact clusters and penalizing over-partitioning (degenerate clusters with too few clients). The grouping with the highest metric score is selected as the final partition.
- Design Motivation: Prespecifying the number of clusters is impractical in real-world deployments, and hierarchical clustering in FL settings tends to over-split.
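
The selection procedure can be illustrated as follows. The sketch assumes average-linkage agglomerative clustering on a similarity matrix, and the scoring function is a hypothetical stand-in for the paper's federated-aware metric (its exact formula is not given here): it rewards size-weighted intra-cluster similarity and subtracts a penalty proportional to the share of clusters smaller than `min_size`.

```python
def average_link(sim, a, b):
    """Mean similarity between members of cluster a and cluster b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

def candidate_partitions(sim):
    """Agglomerative merging; returns one partition per granularity."""
    clusters = [[i] for i in range(len(sim))]
    candidates = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the most similar pair of clusters and merge it.
        _, p, q = max(
            (average_link(sim, clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
        candidates.append([list(c) for c in clusters])
    return candidates

def partition_score(sim, partition, min_size=2, penalty=0.5):
    """Hypothetical metric: compactness minus a penalty for tiny clusters."""
    n = len(sim)
    compactness = 0.0
    for cluster in partition:
        pairs = [(i, j) for i in cluster for j in cluster if i < j]
        # A singleton is trivially compact; the penalty term handles it.
        cohesion = sum(sim[i][j] for i, j in pairs) / len(pairs) if pairs else 1.0
        compactness += cohesion * len(cluster) / n
    tiny = sum(1 for cluster in partition if len(cluster) < min_size)
    return compactness - penalty * tiny / len(partition)

def select_partition(sim):
    """Pick the candidate grouping with the highest score."""
    return max(candidate_partitions(sim), key=lambda part: partition_score(sim, part))

# Four clients forming two obvious groups (values are made up).
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
best = select_partition(sim)
```

Without the penalty term, the all-singletons partition would score highest on compactness alone, which is the over-splitting behavior the metric is meant to counter.
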

Loss & Training¶

Each cluster model is trained with a standard cross-entropy loss, aggregated over the clients in that cluster. The fusion weights are optimized with an entropy-based regularizer that pushes entries of the similarity matrix toward binary values. Gradients are transmitted in compressed form to reduce communication overhead, and each client computes gradients for at most one model per round.
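
The compression scheme is not specified in this summary. As one common possibility, the sketch below uses top-k magnitude sparsification: the client sends only the k largest-magnitude gradient entries as (index, value) pairs, and the server rebuilds a dense vector with zeros elsewhere. This is an assumed stand-in rather than FedDAG's actual scheme, and the function names are hypothetical.

```python
def top_k_sparsify(grad, k):
    """Keep the k entries with the largest magnitude as (index, value) pairs."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return sorted((i, grad[i]) for i in order[:k])

def densify(pairs, size):
    """Rebuild a dense gradient; entries that were not sent become 0.0."""
    dense = [0.0] * size
    for index, value in pairs:
        dense[index] = value
    return dense

grad = [0.1, -2.0, 0.3, 1.5]
payload = top_k_sparsify(grad, k=2)        # what the client would upload
restored = densify(payload, len(grad))     # what the server would aggregate
```
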

Key Experimental Results¶

Main Results¶

Algorithm	Technique	CIFAR-10 accuracy (%)	FMNIST accuracy (%)
PACFL	Data (D)	90.45±0.30	94.41±0.31
CFL	Gradient (G)	72.80±0.66	86.97±0.23
IFCA	Gradient (G)	89.68±0.17	94.03±0.09
FedDAG	D+G+Global Feature Sharing	94.53±0.12	96.82±0.18

Ablation Study¶

Configuration	CIFAR-10 accuracy (%)	Note
FedDAG (Full)	94.53	Complete framework
Data similarity only	~91.0	Degenerates to PACFL++
Gradient similarity only	~88.5	Degenerates to improved CFL
Without dual-encoder	~92.0	No cross-cluster features
Without adaptive cluster count	~93.0	Uses predefined cluster count
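
For a quick read of the ablation, the drop of each variant relative to the full framework can be computed directly from the table. The numbers below are copied from the rows above; the ablated values are approximate (marked with ~ in the table), so the resulting gaps are approximate as well.

```python
full = 94.53  # FedDAG (Full), CIFAR-10

ablations = {
    "Data similarity only": 91.0,
    "Gradient similarity only": 88.5,
    "Without dual-encoder": 92.0,
    "Without adaptive cluster count": 93.0,
}

# Percentage-point drop relative to the full framework.
drops = {name: round(full - acc, 2) for name, acc in ablations.items()}

for name, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: -{drop:.2f} points")
```
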

Key Findings¶

FedDAG outperforms the strongest baseline, PACFL, by 4.08 percentage points on CIFAR-10 (94.53 vs. 90.45) and by 2.41 percentage points on FMNIST (96.82 vs. 94.41).
Fusing data and gradient signals consistently surpasses either signal alone, especially under concept drift.
The dual-encoder architecture yields a 2–3 percentage-point improvement over single-encoder variants (94.53 vs. ~92.0 on CIFAR-10 in the ablation), confirming the value of cross-cluster knowledge sharing.
The framework is effective across all four heterogeneity types: label skew, feature skew, concept drift, and quantity imbalance.

Highlights & Insights¶

Class-wise Similarity Computation: Refining similarity to the class level is a natural approach to handling concept drift and is more robust than global subspace comparison.
Responsibility Separation in the Dual-Encoder Design: Assigning distinct signal sources to the primary and secondary encoders avoids the noisy mixing problem inherent in soft-clustering methods.

Limitations & Future Work¶

The dual-encoder architecture increases model parameter count and computational overhead.
Class-level comparison incurs growing computational cost as the number of classes increases.
Clients must upload a small amount of information for similarity computation; though compressed, this still poses privacy risks.
The approach has not been evaluated in real-world federated scenarios such as cross-device FL.

Comparison with Related Work¶

vs. PACFL: PACFL compares global subspaces via principal angles; FedDAG refines this to class-level comparison with weighted fusion, yielding a more comprehensive similarity measure.
vs. FedSoft/FedRC: These methods employ soft clustering to allow clients to mix multiple cluster models, potentially introducing noise. FedDAG's dual-encoder structurally separates the two signal sources, avoiding this issue.

Rating¶

Novelty: ⭐⭐⭐ Class-wise fusion and the dual-encoder design represent reasonable but incremental contributions.
Experimental Thoroughness: ⭐⭐⭐⭐ Evaluation across four heterogeneity types is comprehensive.
Writing Quality: ⭐⭐⭐ Content is thorough but the structure is somewhat complex.
Value: ⭐⭐⭐ Offers practical improvements for clustered FL, though the scope is relatively specific.