Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models¶
Conference: AAAI 2026 · arXiv: 2511.17982 · Code: RingBDStack/GFM-BA · Area: AI Security · Keywords: backdoor attack, graph foundation model, GNN security, trigger generation, adversarial ML
TL;DR¶
This paper proposes GFM-BA, the first systematic backdoor attack method targeting the pre-training phase of Graph Foundation Models (GFMs). It addresses three core challenges — effectiveness, stealthiness, and persistence — through three modules: label-free trigger association, node-adaptive trigger generation, and persistent backdoor anchoring.
Background & Motivation¶
GFMs are pre-trained on multi-domain graph data and subsequently adapted to downstream tasks; users commonly deploy open-source pre-trained models directly. This creates a realistic attack surface: an adversary who controls the pre-training stage can inject a backdoor and release the poisoned model.
Fundamental differences between traditional GNN backdoor attacks and the GFM setting:
| Condition | Traditional GNN | GFM |
|---|---|---|
| Downstream labels available | ✓ | ✗ |
| Same-domain train/inference | ✓ | ✗ (cross-domain) |
| Fixed model parameters | ✓ | ✗ (downstream fine-tuning) |
These differences give rise to three core challenges:
Effectiveness: Downstream labels are unavailable at pre-training time — how can triggers be designed to induce targeted misclassification?
Stealthiness: Node feature distributions vary substantially across domains, making fixed triggers easily detectable by anomaly detection.
Persistence: Downstream fine-tuning may erase the injected backdoor behavior (backdoor forgetting).
Limitations of prior work: GCBA requires downstream labels; CrossBA cannot control the target label and degrades to an adversarial evasion attack.
Method¶
Module 1: Label-Free Trigger Association¶
- Node embeddings of the pre-training graph are extracted using the pre-trained GNN, and \(k\) prototype embeddings are selected via Farthest Point Sampling (FPS).
- The greedy strategy of FPS ensures prototypes are spread across the embedding space. Proposition 1 proves that, when inter-class separation is sufficiently large, the FPS prototypes are likely to cover multiple downstream classes.
- At downstream injection time, the attacker identifies the prototype corresponding to the desired target label through a small number of probe queries.
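The greedy FPS step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the toy two-cluster embeddings, and the random choice of the starting point are all assumptions for the example.

```python
import numpy as np

def farthest_point_sampling(embeddings: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedy FPS: select k rows of `embeddings` that are maximally spread out.

    Returns the indices of the selected prototype embeddings.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]  # arbitrary starting prototype (an assumption)
    # Distance from every node embedding to its nearest already-selected prototype.
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # greedily pick the point farthest from the set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return np.array(selected)

# Toy check: two well-separated clusters of 5 points each; with k=2 the greedy
# rule necessarily places one prototype in each cluster, mirroring the coverage
# argument of Proposition 1.
emb = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
protos = farthest_point_sampling(emb, k=2)
```

Because within-cluster distances are zero here, the second pick always lands in the opposite cluster, which is exactly the coverage behavior the attack relies on.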
Module 2: Node-Adaptive Trigger Generator¶
- An MLP dynamically generates trigger features from the target node feature \(\mathbf{x}_i\) and target embedding \(\mathbf{e}_j\): \(\mathbf{x}_{ij}^{tri} = \text{MLP}([\mathbf{x}_i \| \mathbf{e}_j])\)
- The trigger is designed as a 3-node fully connected subgraph inserted into the target node's neighborhood.
- Dual-objective optimization: \(\mathcal{L}_{eff}\) aligns the triggered node's embedding with the target prototype; \(\mathcal{L}_{ste}\) enforces similarity between trigger features and target node features to preserve graph homophily.
- Notably, pre-trained model parameters are not modified; the method exploits latent backdoor logic already present in the encoder.
Module 3: Persistent Backdoor Anchoring¶
- Empirical observation: the majority of pre-trained parameters change minimally during downstream fine-tuning.
- Graph mixup is used to synthesize cross-domain graphs that simulate potential downstream distributions.
- Fine-tuning-sensitive parameters are identified via model-pruning-based importance estimation.
- Random perturbations \(\theta_k \leftarrow \theta_k + \epsilon|\theta_k|\) are applied to sensitive parameters, and the trigger generator is trained to remain effective under such perturbations.
- Persistence loss: \(\mathcal{L}_{per} = \text{Var}(\{\mathcal{L}_{eff}^j\}) + \text{Mean}(\{\mathcal{L}_{eff}^j\})\)
Key Experimental Results¶
Attack Effectiveness (ASR %, Target-Controlled Setting)¶
| Method | Cora | CiteSeer | PubMed | Photo | Computers |
|---|---|---|---|---|---|
| GCBA_M (GCOPE) | 4.77 | 5.98 | 21.65 | 3.48 | 4.62 |
| CrossBA (GCOPE) | 14.29 | 16.67 | 33.33 | 9.25 | 7.98 |
| GFM-BA (GCOPE) | 90.40 | 89.06 | 100.00 | 84.53 | 78.54 |
| CrossBA (SAMGPT) | 13.61 | 16.67 | 33.33 | 12.10 | 9.20 |
| GFM-BA (SAMGPT) | 100.00 | 100.00 | 100.00 | 99.80 | 100.00 |
Target-Controlled ASR exceeds the strongest baseline (CrossBA) by 66–91 percentage points.
Stealthiness (ASR After Edge Purification Defense)¶
GFM-BA maintains high ASR after edge purification (100% on GCOPE), outperforming baselines by an average of 36.81 percentage points (GCOPE), 19.98 (MDGPT), and 36.73 (SAMGPT), with no degradation in clean accuracy.
Persistence (ASR Drop After Fine-Tuning, percentage points)¶
| Method | Cora | Photo | Computers |
|---|---|---|---|
| CrossBA (SAMGPT) | ↓4.74 | ↓9.40 | ↓0.60 |
| GFM-BA (SAMGPT) | ↓1.34 | ↓4.00 | ↓1.40 |
| CrossBA (MDGPT) | ↓1.36 | ↓4.60 | ↓2.40 |
| GFM-BA (MDGPT) | ↓0.68 | ↓0.60 | ↓0.80 |
ASR degradation after fine-tuning is minimal (mostly under 2 percentage points), demonstrating substantially superior persistence over baselines.
Highlights & Insights¶
- Label-free attack paradigm: Selecting prototype embeddings via FPS bypasses the dependency on downstream labels, representing a key breakthrough for GFM backdoor attacks.
- Adaptive trigger generation: The node-adaptive design preserves graph homophily, significantly enhancing stealthiness.
- No modification to model parameters: The method exploits latent logic in the pre-trained encoder without affecting clean accuracy.
- Theoretical grounding: Propositions 1 and 2 provide theoretical foundations for FPS coverage and parameter-insensitive anchoring, respectively.
Limitations & Future Work¶
- Evaluation is limited to node classification; graph classification and link prediction scenarios are not addressed.
- FPS prototype coverage may degrade under highly imbalanced class distributions.
- The attack assumes the adversary can perform a small number of downstream probe queries to match prototypes to labels, which may be infeasible in certain settings.
- Defense evaluation relies solely on simple edge purification; stronger defenses such as spectral filtering and model pruning are not assessed.
- The optimality of the fixed 3-node trigger structure is not investigated.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — First systematic treatment of the three core GFM backdoor attack challenges; the label-free design is a genuine breakthrough.
- Experimental Thoroughness: ⭐⭐⭐⭐ — 5 datasets × 3 victim GFMs × 3 baselines, with ablation studies and hyperparameter analysis.
- Writing Quality: ⭐⭐⭐⭐ — Problem motivation and challenge analysis are clearly articulated; methodological descriptions are rigorous.
- Value: ⭐⭐⭐⭐ — Exposes security vulnerabilities in GFMs and advances research in trustworthy AI.