GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation
Conference: NeurIPS 2025 | arXiv: 2511.00097 | Code: GitHub | Area: Social Computing | Keywords: Graph Incremental Learning, Domain-Incremental Learning, LoRA, Knowledge Disentanglement, Catastrophic Forgetting
TL;DR
GraphKeeper is proposed to address catastrophic forgetting in Graph Domain-Incremental Learning (Graph Domain-IL) through three components: domain-specific LoRA parameter isolation, intra/inter-domain disentanglement, and ridge regression-based deviation-free knowledge preservation. It outperforms the second-best method by 6.5%–16.6% and can be seamlessly integrated with graph foundation models.
Background & Motivation
Graph Incremental Learning (GIL) requires models to continually update as new graph data arrives. Existing methods focus on Task-IL and Class-IL settings, both operating within a single graph domain. However, with the rise of Graph Foundation Models (GFMs), models must integrate graph data from multiple heterogeneous domains, i.e., the Domain-IL setting.
Unique challenges of Domain-IL:
Embedding Shifts: Learning new domains requires large parameter changes, causing embeddings of graphs from previous domains to shift.
Decision Boundary Deviations: In end-to-end training, the classifier is updated jointly with the encoder, corrupting the decision boundaries of old domains.
Empirical validation shows that SSM, a representative GIL method, performs well under Class-IL but fails severely under Domain-IL. The structural and semantic divergence across domains far exceeds intra-domain class differences, rendering existing methods inadequate.
Method
Overall Architecture
GraphKeeper consists of three modules targeting the two sources of catastrophic forgetting:
- Multi-domain Graph Disentanglement: Prevents embedding shifts and cross-domain confusion.
- Deviation-Free Knowledge Preservation: Maintains stable decision boundaries.
- Domain-aware Distribution Discrimination: Identifies the correct learned domain for test graphs of unknown origin at inference time.
Key Designs
Module 1: Multi-domain Graph Disentanglement
Multi-domain Feature Alignment: Since graph features across domains have different dimensionalities, truncated SVD is applied to project all features into a unified \(\bar{d}\)-dimensional space:

$$\tilde{F}_i = \text{Proj}(F_i), \quad \tilde{F}_i \in \mathbb{R}^{|G_i| \times \bar{d}}$$
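As a concrete illustration, here is a minimal sketch of this alignment step using scikit-learn's truncated SVD; the target dimension and the toy feature shapes are placeholders, not the paper's settings:

```python
# Hedged sketch: per-domain truncated-SVD projection into a shared space.
import numpy as np
from sklearn.decomposition import TruncatedSVD

def align_features(F_i: np.ndarray, d_bar: int) -> np.ndarray:
    """Project a |G_i| x d_i feature matrix into a shared d_bar-dim space."""
    return TruncatedSVD(n_components=d_bar, random_state=0).fit_transform(F_i)

# Toy usage: two domains whose raw feature widths differ
F_a = np.random.rand(2708, 1433)   # e.g. a citation-network domain
F_b = np.random.rand(5000, 300)    # e.g. a co-purchase domain
aligned = [align_features(F, d_bar=64) for F in (F_a, F_b)]
assert all(A.shape[1] == 64 for A in aligned)
```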
Domain-Specific LoRA: Independent LoRA modules are attached to the pretrained GNN for each domain:

$$h^l = \xi^l(h^{l-1}, W_i^l) + \phi_i^l(h^{l-1}, W_{i,\text{down}}^l W_{i,\text{up}}^l)$$

LoRA parameters of previous domains are frozen when learning a new domain, structurally preventing embedding shifts in old domains.
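A hedged PyTorch sketch of this pattern follows: a frozen base transformation plus one trainable low-rank adapter per domain. The layer shape, rank, and initialization are assumptions, and a plain linear layer stands in for the GNN transform the paper actually adapts:

```python
import torch
import torch.nn as nn

class DomainLoRALayer(nn.Module):
    """Frozen base weights plus a per-domain low-rank residual (sketch)."""
    def __init__(self, dim: int, num_domains: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)     # stands in for the pretrained GNN transform
        for p in self.base.parameters():
            p.requires_grad_(False)         # backbone is never updated
        # One (down, up) pair per domain: the residual is h @ W_down @ W_up
        self.down = nn.ParameterList(
            [nn.Parameter(torch.randn(dim, rank) * 0.01) for _ in range(num_domains)])
        self.up = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, dim)) for _ in range(num_domains)])

    def freeze_domain(self, i: int) -> None:
        """Freeze domain i's adapter once that domain has been learned."""
        self.down[i].requires_grad_(False)
        self.up[i].requires_grad_(False)

    def forward(self, h: torch.Tensor, domain: int) -> torch.Tensor:
        # Frozen base path plus the selected domain's low-rank correction
        return self.base(h) + h @ self.down[domain] @ self.up[domain]
```

Because old adapters are frozen and the base is shared, routing an old-domain graph through its own adapter reproduces its original embeddings exactly.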
Intra-domain Disentanglement: Contrastive learning is employed to enhance discriminability among different classes within the same domain:

$$\mathcal{L}_{\text{intra}} = -\sum_{j=1}^{|G_i|} \log \frac{\sum_{o \in S_j^{\text{pos}}} \exp(\text{sim}(x_j, x_o^{\text{aug}}))}{\sum_{o' \in S_j^{\text{pos}} \cup S_j^{\text{neg}}} \exp(\text{sim}(x_j, x_{o'}^{\text{aug}}))}$$

where \(S^{\text{pos}}\) denotes same-class nodes, \(S^{\text{neg}}\) denotes different-class nodes, and \(x^{\text{aug}}\) is drawn from augmented views.
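A minimal PyTorch sketch of this objective, assuming `x` and `x_aug` are embeddings of the original and augmented views and `labels` holds class ids; the temperature is an assumed detail not given above:

```python
import torch
import torch.nn.functional as F

def intra_domain_loss(x: torch.Tensor, x_aug: torch.Tensor,
                      labels: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Supervised contrastive loss between a view and its augmentation."""
    x, x_aug = F.normalize(x, dim=1), F.normalize(x_aug, dim=1)
    exp_sim = (x @ x_aug.T / tau).exp()                    # (N, N) pairwise similarities
    pos_mask = labels.unsqueeze(0) == labels.unsqueeze(1)  # S_j^pos: same-class pairs
    numer = (exp_sim * pos_mask).sum(dim=1)                # sum over positives
    denom = exp_sim.sum(dim=1)                             # positives and negatives
    return -(numer / denom).log().sum()

# Toy usage
x = torch.randn(32, 64)
loss = intra_domain_loss(x, x + 0.1 * torch.randn_like(x),
                         labels=torch.randint(0, 4, (32,)))
```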
Inter-domain Disentanglement: Current-domain node embeddings are pushed away from prototype embeddings of previous domains (obtained via clustering):

$$\mathcal{L}_{\text{inter}} = \frac{1}{|G_i|} \sum_{j=1}^{|G_i|} \sum_{k=1}^{|P|} \frac{1}{\|x_j - P_k\|_2^2 + \epsilon}$$

Minimizing this objective encourages sufficient separation between domains in the embedding space.
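A short sketch of this repulsion term; the \(\epsilon\) value is an assumed stability constant, and the prototypes are treated as fixed (no-gradient) tensors:

```python
import torch

def inter_domain_loss(x: torch.Tensor, prototypes: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    """Push current-domain embeddings x away from previous-domain prototypes."""
    d2 = torch.cdist(x, prototypes).pow(2)        # ||x_j - P_k||_2^2, shape (N, |P|)
    return (1.0 / (d2 + eps)).sum(dim=1).mean()   # small when every x_j is far from all P_k
```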
Module 2: Deviation-Free Knowledge Preservation
Core Idea: Decouple the classifier from the embedding model by replacing gradient-based updates with a closed-form ridge regression solution, thereby avoiding decision boundary deviations caused by backpropagation.
The optimal classifier weights after the \(i\)-th incremental domain are:

$$W_i = (X_{(1:i)}^T X_{(1:i)} + \lambda I)^{-1} X_{(1:i)}^T Y_{(1:i)}$$
Since historical data is inaccessible, a recursive update is applied:

$$W_i = [\,W_{i-1} - M_i X_i^T X_i W_{i-1} \,\|\, M_i X_i^T Y_i\,]$$

$$M_i = M_{i-1} - M_{i-1} X_i^T (I + X_i M_{i-1} X_i^T)^{-1} X_i M_{i-1}$$

where \(M_i\) is the inverse of the regularized covariance \(X_{(1:i)}^T X_{(1:i)} + \lambda I\) (updated via the Woodbury identity), and \([\,\cdot \,\|\, \cdot\,]\) denotes column-wise concatenation: existing classifier columns are corrected for the new data, and fresh columns are appended for the classes introduced by domain \(i\).
This guarantees an exact update equivalent to the full-data closed-form solution, requiring no storage of any historical graph data.
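The equivalence can be checked numerically. The following NumPy snippet (not the authors' code) treats each domain as contributing disjoint label columns, so the update appends new classifier columns exactly as in the concatenated formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, d = 1e-2, 16
X1, X2 = rng.standard_normal((100, d)), rng.standard_normal((80, d))
Y1 = np.eye(3)[rng.integers(0, 3, 100)]   # domain 1: 3 classes
Y2 = np.eye(2)[rng.integers(0, 2, 80)]    # domain 2: 2 new classes

# Recursive updates (only X_i, Y_i and the d x d matrix M are needed)
M1 = np.linalg.inv(X1.T @ X1 + lam * np.eye(d))
W1 = M1 @ X1.T @ Y1
M2 = M1 - M1 @ X2.T @ np.linalg.inv(np.eye(80) + X2 @ M1 @ X2.T) @ X2 @ M1
W2 = np.hstack([W1 - M2 @ X2.T @ X2 @ W1, M2 @ X2.T @ Y2])

# Full-data closed form for comparison (requires all historical data)
X = np.vstack([X1, X2])
Y = np.block([[Y1, np.zeros((100, 2))], [np.zeros((80, 3)), Y2]])
W_full = np.linalg.inv(X.T @ X + lam * np.eye(d)) @ X.T @ Y
assert np.allclose(W2, W_full)            # exact, not approximate, agreement
```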
Module 3: Domain-aware Distribution Discrimination
When the domain of a test graph is unknown, it must be matched to the correct domain-specific LoRA module. The procedure is:

1. Map features to a high-dimensional space using a randomly initialized and frozen GNN (to separate prototypes of similar domains).
2. Determine the domain of the test graph via nearest-prototype matching:

$$c_{\text{test}} = \arg\max_k \exp(-\|D_{\text{test}} - D_k\|_2^2)$$
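A simplified sketch of the matching step, with a frozen random linear map standing in for the random GNN (message passing is omitted for brevity, and prototype construction here is an assumption):

```python
import torch

torch.manual_seed(0)
proj = torch.nn.Linear(64, 1024, bias=False)   # randomly initialized, never trained
for p in proj.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def match_domain(test_embs: torch.Tensor, domain_protos: torch.Tensor) -> int:
    """Nearest-prototype matching in the high-dimensional projected space."""
    D_test = proj(test_embs).mean(dim=0)       # prototype of the test graph
    D_k = proj(domain_protos)                  # (num_domains, 1024)
    scores = torch.exp(-((D_test - D_k) ** 2).sum(dim=1))
    return int(scores.argmax())                # index of the matched LoRA module

# Toy usage: 3 learned domains, a batch of test-node embeddings
domain_id = match_domain(torch.randn(50, 64), torch.randn(3, 64))
```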
Loss & Training
The overall optimization objective is:

$$\mathcal{L} = \gamma_1 \mathcal{L}_{\text{intra}} + \gamma_2 \mathcal{L}_{\text{inter}}$$
Note: The decision module (ridge regression) is not updated through backpropagation of \(\mathcal{L}\); instead, it is computed directly via the closed-form solution after embedding learning is complete. This decoupled design is key to preventing decision boundary deviations.
Key Experimental Results
Main Results
Average accuracy (AA, higher is better) over 6 domain sequences under Domain-IL:
| Method | Group 1 AA↑ | Group 3 AA↑ | Group 5 AA↑ |
|---|---|---|---|
| Fine-Tune | 23.9 | 20.9 | 19.7 |
| Joint (upper bound) | 66.6 | 78.0 | 74.5 |
| EWC | 23.3 | 20.8 | 20.6 |
| ER-GNN | 23.3 | 28.7 | 24.8 |
| DeLoMe | 49.3 | 70.2 | 63.2 |
| PDGNNs | 52.4 | 65.5 | 64.3 |
| TPP | 52.6 | 57.1 | 56.7 |
| GraphKeeper | 69.2 | 80.6 | 75.5 |
GraphKeeper surpasses the second-best method by 6.5%–16.6% and exceeds the Joint baseline (which has access to all historical data).
Integration with Graph Foundation Models (few-shot Domain-IL):
| Method | Group 1 AA↑ | Group 3 AA↑ |
|---|---|---|
| GCOPE (original) | 20.6 | 13.2 |
| GCOPE + GraphKeeper | significant gain | significant gain |
| MDGPT (original) | low AA / high AF | low AA / high AF |
| MDGPT + GraphKeeper | high AA / low AF | high AA / low AF |
Ablation Study
Ablation of individual modules (inferred from paper analysis):

- Removing inter-domain disentanglement → increased domain embedding confusion, significant performance drop.
- Removing domain-specific LoRA → uncontrolled embedding shifts, severe degradation.
- Replacing ridge regression with a gradient-based classifier → decision boundary deviation, aggravated forgetting.
- Removing high-dimensional random projection → domain prototype confusion, increased domain misclassification at test time.
Key Findings
- Existing GIL methods fail comprehensively under Domain-IL: EWC, GEM, LWF, and similar methods show negligible improvement over Fine-Tune.
- GraphKeeper surpasses the Joint upper bound: This indicates that a single GNN struggles to effectively fuse knowledge from multiple domains, making parameter isolation necessary.
- Seamless integration with GFMs: GraphKeeper endows pretrained graph foundation models with continual learning capability while preserving their few-shot advantages.
- Near-zero forgetting: The AF (average forgetting) metric approaches 0, far outperforming all baselines including those with memory replay.
- The relatively stronger performance of DeLoMe/PDGNNs relies on SGC/APPNP backbones (trading plasticity for stability); replacing them with GCN leads to significant performance degradation.
Highlights & Insights
- Novel problem formulation: The first systematic study of Graph Domain-Incremental Learning (Domain-IL), distinguishing it from conventional Task-IL/Class-IL settings.
- Thorough disentanglement analysis: Forgetting is decomposed into two orthogonal dimensions—embedding shifts and decision boundary deviations—with targeted solutions for each.
- Elegant application of closed-form ridge regression: The recursive update formula guarantees exact equivalence to the full-data solution without storing any historical data, so memory cost stays constant no matter how much data has been seen.
- High-dimensional random projection for domain discrimination: Simple yet effective, leveraging random projections to separate domain prototypes.
- Promising integration with GFMs: Provides a viable path toward continual updating of graph foundation models.
Limitations & Future Work
- Each new domain requires an additional LoRA module, incurring linearly growing storage and inference overhead as the number of domains increases.
- Domain prototypes are obtained via clustering; clustering quality directly affects inter-domain disentanglement performance.
- Domain-aware discrimination relies on prototype distances and may fail when domain distributions overlap substantially.
- Feature alignment via truncated SVD inevitably incurs information loss.
- Validation is limited to node classification; graph-level tasks (graph classification, link prediction) are not addressed.
Related Work & Insights
- Graph Incremental Learning: Representative methods such as SSM and ER-GNN are shown to fail under Domain-IL in this work.
- Multi-domain Graph Pretraining: GFMs such as GCOPE and MDGPT are augmented with incremental learning capability through GraphKeeper.
- LoRA for Graph Learning: The domain-specific LoRA design draws inspiration from continual learning approaches in NLP.
- Insights: The recursive ridge regression knowledge preservation mechanism may be applicable to other incremental learning scenarios that admit closed-form updates, such as incremental updates in recommender systems.
Rating
- Novelty: ⭐⭐⭐⭐⭐ — First to define and systematically address Graph Domain-Incremental Learning.
- Technical Depth: ⭐⭐⭐⭐ — Each of the three modules is theoretically grounded; the recursive update derivation is rigorous.
- Experimental Thoroughness: ⭐⭐⭐⭐ — 15 datasets, multiple domain sequences, extensive baseline comparisons, and GFM integration experiments.
- Practicality: ⭐⭐⭐⭐ — Directly integrable with existing GFMs; code is publicly available.
- Overall: ⭐⭐⭐⭐