Towards Effective Federated Graph Foundation Model via Mitigating Knowledge Entanglement
Conference: NeurIPS 2025 arXiv: 2505.12684 Code: N/A Area: Model Compression Keywords: Federated Graph Learning, Graph Foundation Model, Knowledge Entanglement, Vector Quantization, Prompt Learning
TL;DR
This work is the first to propose the Federated Graph Foundation Model (FedGFM) paradigm, which integrates the distributed collaborative capability of federated graph learning with the cross-domain generalization capability of graph foundation models. Two modules — AncDAI (Anchor-based Domain-Aware Initialization) and AdaDPP (Adaptive Domain-sensitive Prompt Pool) — are introduced to mitigate knowledge entanglement, achieving state-of-the-art performance on 8 cross-task, cross-domain datasets against 20 baselines.
Background & Motivation
Graph machine learning faces inherent limitations under two dominant paradigms:
Limitations of Federated Graph Learning (FGL):
Data heterogeneity: Graphs across different clients vary greatly in feature dimensionality, label space, and topological patterns; most FGL methods are restricted to collaborative training on subsets of a single dataset.
Task heterogeneity: Existing FGL methods assume a unified graph granularity and downstream task (node-level / subgraph-level / graph-level), making multi-task collaboration difficult.
Limitations of Graph Foundation Models (GFM):
Multi-domain data silos: Training GFMs requires multi-domain graph data, but in practice data is distributed across institutions and cannot be shared due to privacy regulations.
Neglect of cross-institution resources: Centralized training cannot leverage storage and compute resources distributed across multiple institutions.
Complementary relationship: FGL provides a distributed training paradigm for GFMs, while GFMs provide a unified feature encoding and pre-train–fine-tune framework for FGL. Combining the two is therefore a natural choice.
Knowledge entanglement challenge: Naively porting gVQ-VAE pre-training to a federated setting causes multi-domain knowledge to be encoded into indistinguishable representations. Empirical evidence (Figure 2(b)) shows that GFT (centralized training) produces clearly differentiated inter-domain cosine similarities, whereas GFT* (federated training) yields inter-domain similarities close to 1, indicating a collapse of domain-specific representations.
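As a rough illustration of this diagnostic (not the paper's evaluation code), the sketch below computes pairwise cosine similarities between per-domain mean embeddings; off-diagonal values near 1 correspond to the entangled regime observed for GFT*.

```python
import torch
import torch.nn.functional as F

def interdomain_similarity(prototypes: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity between domain prototypes.

    prototypes: (K, d) tensor, one mean-pooled embedding per domain.
    Off-diagonal entries near 1 indicate collapsed (entangled) domain
    representations; lower values indicate healthy separation.
    """
    normed = F.normalize(prototypes, dim=1)
    return normed @ normed.T

# Toy check: orthogonal prototypes vs. nearly identical ones.
separated = torch.eye(3, 8)                              # off-diagonal sims ~ 0
collapsed = torch.ones(3, 8) + 0.01 * torch.randn(3, 8)  # off-diagonal sims ~ 1
print(interdomain_similarity(separated))
print(interdomain_similarity(collapsed))
```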
Method
Overall Architecture
FedGFM+ adopts a federated pre-training followed by fine-tuning paradigm:

1. Federated pre-training: Each client performs self-supervised learning on its private graph; the server aggregates local models to build a global graph foundation model.
2. Fine-tuning: The global model serves as a GFM and is adapted to specific downstream tasks via supervised learning.
3. Dual-perspective disentanglement: AncDAI (global) and AdaDPP (local) collaboratively mitigate knowledge entanglement.
The backbone network is gVQ-VAE, chosen because: (1) it jointly encodes graph structure and textual attributes into discrete semantic representations; and (2) it has an extremely small parameter count (e.g., GFT has only 7M parameters), making it naturally suited to communication-constrained federated settings.
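A minimal round-level sketch of this pipeline, assuming a hypothetical `Client` object exposing `local_pretrain` (runs a few self-supervised epochs in place) and `num_nodes` (the weighting term \(N_k\)); the paper's actual training loop may differ:

```python
import copy
from typing import List

import torch.nn as nn

def federated_pretrain(global_model: nn.Module, clients: List, rounds: int) -> nn.Module:
    """Federated gVQ-VAE pre-training skeleton (sketch, not the authors' code).

    Each round: broadcast the global model, run local self-supervised
    training on every private graph, then aggregate with a size-weighted
    average. AdaDPP prompts live inside each client and are deliberately
    never part of the aggregated state.
    """
    for _ in range(rounds):
        states, sizes = [], []
        for client in clients:
            local = copy.deepcopy(global_model)   # broadcast global weights
            client.local_pretrain(local)          # hypothetical local SSL step
            states.append(local.state_dict())
            sizes.append(client.num_nodes)        # hypothetical N_k
        total = float(sum(sizes))
        agg = {name: sum(s[name].float() * (n / total) for s, n in zip(states, sizes))
               for name in states[0]}             # FedAvg (assumes float params)
        global_model.load_state_dict(agg)
    return global_model
```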
Key Designs
1. AncDAI: Anchor-based Domain-Aware Initialization (Global Perspective)
Core Idea: Before pre-training, domain prototypes from each client are used as semantic anchors to initialize the global codebook, injecting domain-discriminative inductive bias.
Steps:

1. Each client encodes its local graph with the globally initialized model: \(\mathbf{Z}^k = f_{\theta^{glb}}(\mathbf{X}^k, \mathbf{A}^k)\).
2. Mean pooling yields the domain prototype: \(\mathbf{p}^k = \frac{1}{|\mathcal{V}^k|} \sum_{i \in \mathcal{V}^k} \mathbf{z}_i^k\).
3. Theoretical guarantee: even with randomly initialized shared parameters, domain prototypes remain distinguishable across clients (Theorem B.1).
4. Perturbed embeddings are generated around each anchor: \(\tilde{\mathbf{p}}_i^k = \mathbf{p}^k + \sigma \epsilon_i,\ \epsilon_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\).
5. Synthetic embeddings from all domains are aggregated to initialize the global codebook.
Design Motivation: To endow the codebook with a domain-aware structure from the outset, laying the foundation for maintaining inter-domain separation throughout federated training.
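A minimal sketch of steps 1–5 above; `codes_per_domain` and `sigma` are illustrative hyperparameters, not values from the paper:

```python
import torch

def ancdai_codebook_init(client_embeddings, codes_per_domain=32, sigma=0.1):
    """AncDAI-style codebook initialization (sketch).

    client_embeddings: list of (|V^k|, d) tensors Z^k produced by the shared,
    randomly initialized encoder on each client's private graph. Returns a
    (K * codes_per_domain, d) tensor used to seed the global VQ codebook.
    """
    synthetic = []
    for z_k in client_embeddings:
        p_k = z_k.mean(dim=0)                             # domain prototype (mean pooling)
        eps = torch.randn(codes_per_domain, p_k.numel())
        synthetic.append(p_k.unsqueeze(0) + sigma * eps)  # perturb around the anchor
    return torch.cat(synthetic, dim=0)                    # aggregate across domains

# Toy usage: three synthetic "domains" with shifted feature means.
clients = [torch.randn(50, 16) + i for i in range(3)]
codebook = ancdai_codebook_init(clients)
print(codebook.shape)  # torch.Size([96, 16])
```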
2. AdaDPP: Adaptive Domain-sensitive Prompt Pool (Local Perspective)
Pre-training phase: Each client independently learns a set of domain-specific prompts \(\Phi^k = \{\phi_i^k\}_{i=1}^\lambda\), which are excluded from federated aggregation. Node features are augmented with these prompts before being passed to the shared encoder.
Fine-tuning phase: Prompts from all clients are collected to build a global prompt pool \(\rho\). For a target graph, the most relevant prompts are selected via an attention mechanism and used to augment its features, as sketched below.
Design Motivation: Excluding prompts from federated aggregation prevents information mixing; combining them at fine-tuning time enables adaptive transfer of cross-domain knowledge, realizing a "preserve domain specificity first, then adaptively transfer" strategy.
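A sketch of the fine-tuning-time selection; the scaled dot-product attention against a graph-level mean query is one plausible instantiation assumed here, not taken verbatim from the paper:

```python
import torch
import torch.nn.functional as F

class PromptPool(torch.nn.Module):
    """AdaDPP-style prompt pool for fine-tuning (sketch)."""

    def __init__(self, client_prompts):              # list of K tensors, each (lambda, d)
        super().__init__()
        self.pool = torch.nn.Parameter(torch.cat(client_prompts, dim=0))  # (K*lambda, d)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        query = node_feats.mean(dim=0)               # graph-level query (assumption)
        scores = F.softmax(self.pool @ query / query.numel() ** 0.5, dim=0)
        prompt = (scores.unsqueeze(1) * self.pool).sum(dim=0)  # attention-weighted mix
        return node_feats + prompt                   # augment target-graph features

# Toy usage: prompts from 3 clients, 4 prompts each, feature dim 16.
pool = PromptPool([torch.randn(4, 16) for _ in range(3)])
print(pool(torch.randn(10, 16)).shape)  # torch.Size([10, 16])
```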
Loss & Training
Pre-training loss (self-supervised reconstruction, Eq. 2): \(\mathcal{L}_{pretrain} = \mathcal{L}_{feat} + \mathcal{L}_{topo} + \mathcal{L}_{align} + \mathcal{L}_{commit}\), where \(\mathcal{L}_{align}\) is the codebook alignment loss and \(\mathcal{L}_{commit}\) the commitment loss.
- \(\mathcal{L}_{feat}\): Node feature reconstruction (cosine similarity)
- \(\mathcal{L}_{topo}\): Topological reconstruction (adjacency matrix)
- A straight-through estimator is used to enable end-to-end gradient flow.
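A sketch of this objective, assuming the standard VQ-VAE mean-squared-error forms for the alignment and commitment terms (the paper's exact weighting is not restated here). In the full model, `x_rec` and `a_logits` would be decoded from the straight-through embedding `z_st`:

```python
import torch
import torch.nn.functional as F

def vq_pretrain_loss(z_e, codebook, x, x_rec, a_logits, a, beta=0.25):
    """gVQ-VAE pre-training objective (sketch).

    z_e:      (n, d) encoder outputs.   codebook: (C, d) code vectors.
    x, x_rec: original / reconstructed node features, both (n, d).
    a_logits: predicted adjacency logits; a: binary adjacency (float).
    """
    # Nearest-code assignment plus straight-through estimator: the forward
    # pass uses the quantized code, the backward pass copies gradients to z_e.
    idx = torch.cdist(z_e, codebook).argmin(dim=1)
    z_q = codebook[idx]
    z_st = z_e + (z_q - z_e).detach()

    l_feat = 1 - F.cosine_similarity(x_rec, x, dim=1).mean()   # feature reconstruction
    l_topo = F.binary_cross_entropy_with_logits(a_logits, a)   # topology reconstruction
    l_align = F.mse_loss(z_q, z_e.detach())                    # codebook alignment
    l_commit = beta * F.mse_loss(z_e, z_q.detach())            # commitment
    return l_feat + l_topo + l_align + l_commit, z_st
```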
Federated aggregation follows the FedAvg strategy: \(\Theta^g \leftarrow \sum_{k=1}^K \frac{N_k}{N} \Theta^k\), where \(N_k\) is the size of client \(k\)'s data and \(N = \sum_k N_k\).
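In code, this is simply a size-weighted average over client state dicts, mirroring the aggregation step in the pre-training skeleton above (a sketch assuming float parameters):

```python
import torch

def fedavg(state_dicts, client_sizes):
    """Theta_g = sum_k (N_k / N) * Theta_k over matching parameter names."""
    total = float(sum(client_sizes))
    return {
        name: torch.stack([sd[name].float() * (n / total)
                           for sd, n in zip(state_dicts, client_sizes)]).sum(dim=0)
        for name in state_dicts[0]
    }
```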
Key Experimental Results
Main Results
Performance comparison on 8 cross-domain, cross-task datasets:
| Method | Cora | PubMed | OGB-arxiv | WikiCS | FB15K | WN18RR | HIV | PCBA |
|---|---|---|---|---|---|---|---|---|
| GCN | 80.17 | 84.70 | 72.50 | 77.24 | 71.24 | 82.27 | 65.37 | 63.41 |
| FedAvg | 81.45 | 85.22 | 71.53 | 77.67 | 73.14 | 83.55 | 66.05 | 68.52 |
| GFT* (federated variant) | 81.07 | 84.24 | 73.19 | 78.81 | 73.52 | 86.30 | 66.32 | 72.81 |
| GQT* (federated variant) | 81.92 | 85.59 | 74.07 | 77.52 | 73.40 | 85.66 | 67.93 | 73.22 |
| FedGFM+ | 83.79 | 88.52 | 76.31 | 80.70 | 75.25 | 89.25 | 69.39 | 77.68 |
Gains over the best baseline: node classification ≥2.70%, edge classification ≥2.18%, graph classification ≥3.09%.
Ablation Study
| Configuration | Cora | PubMed | OGB-arxiv | HIV | PCBA | Note |
|---|---|---|---|---|---|---|
| w/o AncDAI | 81.55 | 85.56 | 75.19 | 67.52 | 74.81 | Largest performance drop |
| w/o AdaDPP | 83.17 | 87.42 | 75.83 | 67.84 | 76.72 | Without prompt pool |
| FedGFM+ | 83.79 | 88.52 | 76.31 | 69.39 | 77.68 | Full method |
Key Findings
- Knowledge entanglement is the core bottleneck: Naïve federated GFM variants (e.g., GFT*) can suffer negative transfer, underperforming isolated supervised models.
- AncDAI contributes more: Removing AncDAI leads to a larger performance drop than removing AdaDPP, indicating that global initialization is critical for combating entanglement.
- Communication efficiency: GFM parameters are on the order of millions (vs. billions for LLMs), making federated communication overhead acceptable.
- Hyperparameter robustness: Performance remains stable across a wide range of codebook sizes and prompt counts.
Highlights & Insights
- Paradigm-level contribution: FedGFM is the first systematic proposal to combine FGL and GFM, integrating their complementary strengths.
- Precise problem formulation: Empirical analysis (Figure 2) clearly identifies knowledge entanglement as a non-trivial challenge.
- Dual-perspective solution: The combination of global (initialization) and local (prompts) strategies is well-motivated and coherent.
- Theoretical support: The work proves that domain prototypes remain distinguishable under random initialization (Theorem B.1) and that the initialization strategy provides structured inductive bias (Theorem B.2).
Limitations & Future Work
- Privacy risks: The exchange of prototypes and prompts may expose partial semantic information; formal privacy analysis is needed.
- Client scalability: Current experiments assign each dataset to 3 clients; real-world scenarios may involve far more clients with greater heterogeneity.
- Privacy mitigation: Differential privacy (DP) or secure computation could be incorporated to protect the transmission of prototypes and prompts.
- Backbone coverage: Only the gVQ-VAE backbone is evaluated; whether the approach generalizes to larger-scale GFM architectures warrants further exploration.
Related Work & Insights
- The work transfers federated foundation model training ideas from NLP/CV to the graph domain, where heterogeneity is considerably stronger.
- Excluding prompt learning from federated aggregation is analogous to local fine-tuning strategies in personalized federated learning.
- The domain prototype concept in AncDAI can be generalized to other federated learning scenarios that require handling domain heterogeneity.
- The codebook initialization strategy has broader applicability to federated training of VQ-VAE-based models.
Rating
- Novelty: ⭐⭐⭐⭐⭐ (The FedGFM paradigm is proposed for the first time; both the problem formulation and method design are original.)
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ (8 datasets, 20 baselines, 3 task types, ablation and hyperparameter analysis.)
- Writing Quality: ⭐⭐⭐⭐ (Well-structured, though the method involves multiple interleaved concepts — federated learning, GFM, VQ-VAE, and prompt learning — making the paper information-dense.)
- Value: ⭐⭐⭐⭐ (Opens a promising research direction, but privacy and scalability challenges remain before real-world deployment.)