GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation
Conference: NeurIPS 2025 | arXiv: 2511.00097 | Code: GitHub | Area: Social Computing | Keywords: Graph Incremental Learning, Domain-Incremental Learning, LoRA, Knowledge Disentanglement, Catastrophic Forgetting
TL;DR
GraphKeeper is proposed to address catastrophic forgetting in Graph Domain-Incremental Learning (Graph Domain-IL) through three components: domain-specific LoRA parameter isolation, intra/inter-domain disentanglement, and ridge regression-based deviation-free knowledge preservation. It outperforms the second-best method by 6.5%–16.6% and can be seamlessly integrated with graph foundation models.
Background & Motivation
Graph Incremental Learning (GIL) requires models to continually update as new graph data arrives. Existing methods focus on Task-IL and Class-IL settings, both operating within a single graph domain. However, with the rise of Graph Foundation Models (GFMs), models must integrate graph data from multiple heterogeneous domains, i.e., the Domain-IL setting.
Unique challenges of Domain-IL:
Embedding Shifts: Learning new domains requires large parameter changes, causing embeddings of graphs from previous domains to shift.
Decision Boundary Deviations: In end-to-end training, the classifier is updated jointly with the encoder, corrupting the decision boundaries of old domains.
Empirical validation shows that SSM, a representative GIL method, performs well under Class-IL but fails severely under Domain-IL. The structural and semantic divergence across domains far exceeds intra-domain class differences, rendering existing methods inadequate.
Method
Overall Architecture
GraphKeeper consists of three modules targeting the two sources of catastrophic forgetting:
- Multi-domain Graph Disentanglement: Prevents embedding shifts and cross-domain confusion.
- Deviation-Free Knowledge Preservation: Maintains stable decision boundaries.
- Domain-aware Distribution Discrimination: Identifies the correct learned domain for test graphs of unknown origin at inference time.
Key Designs
Module 1: Multi-domain Graph Disentanglement
Multi-domain Feature Alignment: Since graph features across domains have different dimensionalities, truncated SVD is applied to project all features into a unified \(\bar{d}\)-dimensional space:

$$\tilde{F}_i = \text{Proj}(F_i), \quad \tilde{F}_i \in \mathbb{R}^{|G_i| \times \bar{d}}$$
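As a concrete illustration, here is a minimal sketch of this alignment step using scikit-learn's truncated SVD; the target dimension and the toy feature shapes are placeholders, not the paper's settings:

```python
# Hedged sketch: per-domain truncated-SVD projection into a shared space.
import numpy as np
from sklearn.decomposition import TruncatedSVD

def align_features(F_i: np.ndarray, d_bar: int) -> np.ndarray:
    """Project a |G_i| x d_i feature matrix into a shared d_bar-dim space."""
    return TruncatedSVD(n_components=d_bar, random_state=0).fit_transform(F_i)

# Toy usage: two domains whose raw feature widths differ
F_a = np.random.rand(2708, 1433)   # e.g. a citation-network domain
F_b = np.random.rand(5000, 300)    # e.g. a co-purchase domain
aligned = [align_features(F, d_bar=64) for F in (F_a, F_b)]
assert all(A.shape[1] == 64 for A in aligned)
```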
Domain-Specific LoRA: Independent LoRA modules are attached to the pretrained GNN for each domain:

$$h^l = \xi^l(h^{l-1}, W_i^l) + \phi_i^l(h^{l-1}, W_{i,\text{down}}^l W_{i,\text{up}}^l)$$

LoRA parameters of previous domains are frozen when learning a new domain, structurally preventing embedding shifts in old domains.
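A hedged PyTorch sketch of this pattern follows: a frozen base transformation plus one trainable low-rank adapter per domain. The layer shape, rank, and initialization are assumptions, and a plain linear layer stands in for the GNN transform the paper actually adapts:

```python
import torch
import torch.nn as nn

class DomainLoRALayer(nn.Module):
    """Frozen base weights plus a per-domain low-rank residual (sketch)."""
    def __init__(self, dim: int, num_domains: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)     # stands in for the pretrained GNN transform
        for p in self.base.parameters():
            p.requires_grad_(False)         # backbone is never updated
        # One (down, up) pair per domain: the residual is h @ W_down @ W_up
        self.down = nn.ParameterList(
            [nn.Parameter(torch.randn(dim, rank) * 0.01) for _ in range(num_domains)])
        self.up = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, dim)) for _ in range(num_domains)])

    def freeze_domain(self, i: int) -> None:
        """Freeze domain i's adapter once that domain has been learned."""
        self.down[i].requires_grad_(False)
        self.up[i].requires_grad_(False)

    def forward(self, h: torch.Tensor, domain: int) -> torch.Tensor:
        # Frozen base path plus the selected domain's low-rank correction
        return self.base(h) + h @ self.down[domain] @ self.up[domain]
```

Because old adapters are frozen and the base is shared, routing an old-domain graph through its own adapter reproduces its original embeddings exactly.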
Intra-domain Disentanglement: Contrastive learning is employed to enhance discriminability among different classes within the same domain:

$$\mathcal{L}_{\text{intra}} = -\sum_{j=1}^{|G_i|} \log \frac{\sum_{o \in S_j^{\text{pos}}} \exp(\text{sim}(x_j, x_o^{\text{aug}}))}{\sum_{o' \in S_j^{\text{pos}} \cup S_j^{\text{neg}}} \exp(\text{sim}(x_j, x_{o'}^{\text{aug}}))}$$

where \(S^{\text{pos}}\) denotes same-class nodes, \(S^{\text{neg}}\) denotes different-class nodes, and \(x^{\text{aug}}\) is drawn from augmented views.
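A minimal PyTorch sketch of this objective, assuming `x` and `x_aug` are embeddings of the original and augmented views and `labels` holds class ids; the temperature is an assumed detail not given above:

```python
import torch
import torch.nn.functional as F

def intra_domain_loss(x: torch.Tensor, x_aug: torch.Tensor,
                      labels: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Supervised contrastive loss between a view and its augmentation."""
    x, x_aug = F.normalize(x, dim=1), F.normalize(x_aug, dim=1)
    exp_sim = (x @ x_aug.T / tau).exp()                    # (N, N) pairwise similarities
    pos_mask = labels.unsqueeze(0) == labels.unsqueeze(1)  # S_j^pos: same-class pairs
    numer = (exp_sim * pos_mask).sum(dim=1)                # sum over positives
    denom = exp_sim.sum(dim=1)                             # positives and negatives
    return -(numer / denom).log().sum()

# Toy usage
x = torch.randn(32, 64)
loss = intra_domain_loss(x, x + 0.1 * torch.randn_like(x),
                         labels=torch.randint(0, 4, (32,)))
```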
Inter-domain Disentanglement: Current-domain node embeddings are pushed away from prototype embeddings of previous domains (obtained via clustering):

$$\mathcal{L}_{\text{inter}} = \frac{1}{|G_i|} \sum_{j=1}^{|G_i|} \sum_{k=1}^{|P|} \frac{1}{\|x_j - P_k\|_2^2 + \epsilon}$$

Minimizing this objective encourages sufficient separation between domains in the embedding space.
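A short sketch of this repulsion term; the \(\epsilon\) value is an assumed stability constant, and the prototypes are treated as fixed (no-gradient) tensors:

```python
import torch

def inter_domain_loss(x: torch.Tensor, prototypes: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    """Push current-domain embeddings x away from previous-domain prototypes."""
    d2 = torch.cdist(x, prototypes).pow(2)        # ||x_j - P_k||_2^2, shape (N, |P|)
    return (1.0 / (d2 + eps)).sum(dim=1).mean()   # small when every x_j is far from all P_k
```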
Module 2: Deviation-Free Knowledge Preservation
Core Idea: Decouple the classifier from the embedding model by replacing gradient-based updates with a closed-form ridge regression solution, thereby avoiding decision boundary deviations caused by backpropagation.
The optimal classifier weights after the \(i\)-th incremental domain are:

$$W_i = (X_{(1:i)}^T X_{(1:i)} + \lambda I)^{-1} X_{(1:i)}^T Y_{(1:i)}$$
Since historical data is inaccessible, a recursive update is applied:

$$W_i = [\,W_{i-1} - M_i X_i^T X_i W_{i-1} \,\|\, M_i X_i^T Y_i\,]$$

$$M_i = M_{i-1} - M_{i-1} X_i^T (I + X_i M_{i-1} X_i^T)^{-1} X_i M_{i-1}$$

where \(M_i\) is the inverse of the regularized covariance \(X_{(1:i)}^T X_{(1:i)} + \lambda I\) (updated via the Woodbury identity), and \([\,\cdot \,\|\, \cdot\,]\) denotes column-wise concatenation: existing classifier columns are corrected for the new data, and fresh columns are appended for the classes introduced by domain \(i\).
This guarantees an exact update equivalent to the full-data closed-form solution, requiring no storage of any historical graph data.
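The equivalence can be checked numerically. The following NumPy snippet (not the authors' code) treats each domain as contributing disjoint label columns, so the update appends new classifier columns exactly as in the concatenated formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, d = 1e-2, 16
X1, X2 = rng.standard_normal((100, d)), rng.standard_normal((80, d))
Y1 = np.eye(3)[rng.integers(0, 3, 100)]   # domain 1: 3 classes
Y2 = np.eye(2)[rng.integers(0, 2, 80)]    # domain 2: 2 new classes

# Recursive updates (only X_i, Y_i and the d x d matrix M are needed)
M1 = np.linalg.inv(X1.T @ X1 + lam * np.eye(d))
W1 = M1 @ X1.T @ Y1
M2 = M1 - M1 @ X2.T @ np.linalg.inv(np.eye(80) + X2 @ M1 @ X2.T) @ X2 @ M1
W2 = np.hstack([W1 - M2 @ X2.T @ X2 @ W1, M2 @ X2.T @ Y2])

# Full-data closed form for comparison (requires all historical data)
X = np.vstack([X1, X2])
Y = np.block([[Y1, np.zeros((100, 2))], [np.zeros((80, 3)), Y2]])
W_full = np.linalg.inv(X.T @ X + lam * np.eye(d)) @ X.T @ Y
assert np.allclose(W2, W_full)            # exact, not approximate, agreement
```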
Module 3: Domain-aware Distribution Discrimination
When the domain of a test graph is unknown, it must be matched to the correct domain-specific LoRA module. The procedure is:

1. Map features to a high-dimensional space using a randomly initialized and frozen GNN (to separate prototypes of similar domains).
2. Determine the domain of the test graph via nearest-prototype matching:

$$c_{\text{test}} = \arg\max_k \exp(-\|D_{\text{test}} - D_k\|_2^2)$$
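A simplified sketch of the matching step, with a frozen random linear map standing in for the random GNN (message passing is omitted for brevity, and prototype construction here is an assumption):

```python
import torch

torch.manual_seed(0)
proj = torch.nn.Linear(64, 1024, bias=False)   # randomly initialized, never trained
for p in proj.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def match_domain(test_embs: torch.Tensor, domain_protos: torch.Tensor) -> int:
    """Nearest-prototype matching in the high-dimensional projected space."""
    D_test = proj(test_embs).mean(dim=0)       # prototype of the test graph
    D_k = proj(domain_protos)                  # (num_domains, 1024)
    scores = torch.exp(-((D_test - D_k) ** 2).sum(dim=1))
    return int(scores.argmax())                # index of the matched LoRA module

# Toy usage: 3 learned domains, a batch of test-node embeddings
domain_id = match_domain(torch.randn(50, 64), torch.randn(3, 64))
```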
Loss & Training
The overall optimization objective is:

$$\mathcal{L} = \gamma_1 \mathcal{L}_{\text{intra}} + \gamma_2 \mathcal{L}_{\text{inter}}$$
Note: The decision module (ridge regression) is not updated through backpropagation of \(\mathcal{L}\); instead, it is computed directly via the closed-form solution after embedding learning is complete. This decoupled design is key to preventing decision boundary deviations.
Key Experimental Results
Main Results
Average accuracy (AA, higher is better) over 6 domain sequences under Domain-IL:
| Method | Group 1 AA↑ | Group 3 AA↑ | Group 5 AA↑ |
|---|---|---|---|
| Fine-Tune | 23.9 | 20.9 | 19.7 |
| Joint (upper bound) | 66.6 | 78.0 | 74.5 |
| EWC | 23.3 | 20.8 | 20.6 |
| ER-GNN | 23.3 | 28.7 | 24.8 |
| DeLoMe | 49.3 | 70.2 | 63.2 |
| PDGNNs | 52.4 | 65.5 | 64.3 |
| TPP | 52.6 | 57.1 | 56.7 |
| GraphKeeper | 69.2 | 80.6 | 75.5 |
GraphKeeper surpasses the second-best method by 6.5%–16.6% and exceeds the Joint baseline (which has access to all historical data).
Integration with Graph Foundation Models (few-shot Domain-IL):
| Method | Group 1 AA↑ | Group 3 AA↑ |
|---|---|---|
| GCOPE (original) | 20.6 | 13.2 |
| GCOPE + GraphKeeper | significant gain | significant gain |
| MDGPT (original) | low AA / high AF | low AA / high AF |
| MDGPT + GraphKeeper | high AA / low AF | high AA / low AF |
Ablation Study
Ablation of individual modules (inferred from paper analysis):

- Removing inter-domain disentanglement → increased domain embedding confusion, significant performance drop.
- Removing domain-specific LoRA → uncontrolled embedding shifts, severe degradation.
- Replacing ridge regression with a gradient-based classifier → decision boundary deviation, aggravated forgetting.
- Removing high-dimensional random projection → domain prototype confusion, increased domain misclassification at test time.
Key Findings
- Existing GIL methods fail comprehensively under Domain-IL: EWC, GEM, LWF, and similar methods show negligible improvement over Fine-Tune.
- GraphKeeper surpasses the Joint upper bound: This indicates that a single GNN struggles to effectively fuse knowledge from multiple domains, making parameter isolation necessary.
- Seamless integration with GFMs: GraphKeeper endows pretrained graph foundation models with continual learning capability while preserving their few-shot advantages.
- Near-zero forgetting: The AF (average forgetting) metric approaches 0, far outperforming all baselines including those with memory replay.
- The relatively stronger performance of DeLoMe/PDGNNs relies on SGC/APPNP backbones (trading plasticity for stability); replacing them with GCN leads to significant performance degradation.
Highlights & Insights
- Novel problem formulation: The first systematic study of Graph Domain-Incremental Learning (Domain-IL), distinguishing it from conventional Task-IL/Class-IL settings.
- Thorough disentanglement analysis: Forgetting is decomposed into two orthogonal dimensions—embedding shifts and decision boundary deviations—with targeted solutions for each.
- Elegant application of closed-form ridge regression: The recursive update formula guarantees exact equivalence to the full-data solution without storing any historical data, so memory cost stays constant no matter how much data has been seen.
- High-dimensional random projection for domain discrimination: Simple yet effective, leveraging random projections to separate domain prototypes.
- Promising integration with GFMs: Provides a viable path toward continual updating of graph foundation models.
Limitations & Future Work
- Each new domain requires an additional LoRA module, incurring linearly growing storage and inference overhead as the number of domains increases.
- Domain prototypes are obtained via clustering; clustering quality directly affects inter-domain disentanglement performance.
- Domain-aware discrimination relies on prototype distances and may fail when domain distributions overlap substantially.
- Feature alignment via truncated SVD inevitably incurs information loss.
- Validation is limited to node classification; graph-level tasks (graph classification, link prediction) are not addressed.
Related Work & Insights
- Graph Incremental Learning: Representative methods such as SSM and ER-GNN are shown to fail under Domain-IL in this work.
- Multi-domain Graph Pretraining: GFMs such as GCOPE and MDGPT are augmented with incremental learning capability through GraphKeeper.
- LoRA for Graph Learning: The domain-specific LoRA design draws inspiration from continual learning approaches in NLP.
- Insights: The recursive ridge regression knowledge preservation mechanism may be applicable to other incremental learning scenarios that admit closed-form updates, such as incremental updates in recommender systems.
Rating
- Novelty: ⭐⭐⭐⭐⭐ — First to define and systematically address Graph Domain-Incremental Learning.
- Technical Depth: ⭐⭐⭐⭐ — Each of the three modules is theoretically grounded; the recursive update derivation is rigorous.
- Experimental Thoroughness: ⭐⭐⭐⭐ — 15 datasets, multiple domain sequences, extensive baseline comparisons, and GFM integration experiments.
- Practicality: ⭐⭐⭐⭐ — Directly integrable with existing GFMs; code is publicly available.
- Overall: ⭐⭐⭐⭐