Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models

Conference: AAAI 2026 · arXiv: 2511.17982 · Code: RingBDStack/GFM-BA · Area: AI Security · Keywords: backdoor attack, graph foundation model, GNN security, trigger generation, adversarial ML

TL;DR

This paper proposes GFM-BA, the first systematic backdoor attack method targeting the pre-training phase of Graph Foundation Models (GFMs). It addresses three core challenges — effectiveness, stealthiness, and persistence — through three modules: label-free trigger association, node-adaptive trigger generation, and persistent backdoor anchoring.

Background & Motivation

GFMs are pre-trained on multi-domain graph data and subsequently adapted to downstream tasks; users commonly deploy open-source pre-trained models directly. This creates a realistic attack surface: an adversary who controls the pre-training stage can inject a backdoor and release the poisoned model.

Fundamental differences between traditional GNN backdoor attacks and the GFM setting:

| Condition | Traditional GNN | GFM |
| --- | --- | --- |
| Downstream labels available | ✓ | ✗ |
| Same-domain training/inference | ✓ | ✗ (cross-domain) |
| Fixed model parameters | ✓ | ✗ (downstream fine-tuning) |

These differences give rise to three core challenges:

Effectiveness: Downstream labels are unavailable at pre-training time — how can triggers be designed to induce targeted misclassification?

Stealthiness: Node feature distributions vary substantially across domains, making fixed triggers easily detectable by anomaly detection.

Persistence: Downstream fine-tuning may erase the injected backdoor behavior (backdoor forgetting).

Limitations of prior work: GCBA requires downstream labels; CrossBA cannot control the target label and degrades to an adversarial evasion attack.

Method

Module 1: Label-Free Trigger Association

  • Node embeddings of the pre-training graph are extracted using the pre-trained GNN, and \(k\) prototype embeddings are selected via Farthest Point Sampling (FPS).
  • The greedy strategy of FPS spreads prototypes across the embedding space; Proposition 1 proves that when inter-class separation is sufficiently large, FPS is likely to cover multiple downstream classes.
  • At downstream injection time, the attacker identifies the prototype corresponding to the desired target label through a small number of probe queries.
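The prototype-selection step above is standard Farthest Point Sampling applied to node embeddings. A minimal sketch (the function name, random starting point, and Euclidean distance are assumptions; the paper does not specify these details):

```python
import numpy as np

def farthest_point_sampling(embeddings: np.ndarray, k: int, seed: int = 0) -> list:
    """Greedily select k prototype indices whose embeddings are maximally spread.

    embeddings: (n, d) array of node embeddings from the pre-trained GNN.
    Returns the indices of the k selected prototype nodes.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]  # arbitrary starting prototype
    # distance from every node to its nearest already-selected prototype
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # node farthest from the current prototype set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected
```

Because each step picks the point farthest from all previous picks, well-separated embedding clusters (which Proposition 1 assumes) are covered before any cluster is sampled twice, which is why the selected prototypes tend to span multiple downstream classes.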

Module 2: Node-Adaptive Trigger Generator

  • An MLP dynamically generates trigger features from the target node feature \(\mathbf{x}_i\) and target embedding \(\mathbf{e}_j\): \(\mathbf{x}_{ij}^{tri} = \text{MLP}([\mathbf{x}_i \| \mathbf{e}_j])\)
  • The trigger is designed as a 3-node fully connected subgraph inserted into the target node's neighborhood.
  • Dual-objective optimization: \(\mathcal{L}_{eff}\) aligns the triggered node's embedding with the target prototype; \(\mathcal{L}_{ste}\) enforces similarity between trigger features and target node features to preserve graph homophily.
  • Notably, pre-trained model parameters are not modified; the method exploits latent backdoor logic already present in the encoder.
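The generator and its dual objective can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single-hidden-layer MLP, the cosine-similarity form of both losses, and the weighting `lam` are assumptions filled in where the paper leaves details unspecified.

```python
import numpy as np

def mlp_forward(params, x):
    """One-hidden-layer ReLU MLP (the actual generator depth is unspecified)."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

def trigger_features(params, x_i, e_j):
    """x_tri = MLP([x_i || e_j]): trigger features adapted to the target node."""
    return mlp_forward(params, np.concatenate([x_i, e_j]))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def dual_objective(triggered_embedding, e_target, x_tri, x_i, lam=1.0):
    """L_eff + lam * L_ste, each written as (1 - cosine similarity).

    L_eff pulls the triggered node's embedding toward the target prototype;
    L_ste keeps trigger features close to the host node's own features,
    preserving homophily so the trigger subgraph does not stand out.
    """
    l_eff = 1.0 - cosine(triggered_embedding, e_target)
    l_ste = 1.0 - cosine(x_tri, x_i)
    return l_eff + lam * l_ste
```

In training, only the generator parameters would be updated against this objective; the frozen encoder supplies `triggered_embedding`, consistent with the point above that pre-trained model parameters are never modified.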

Module 3: Persistent Backdoor Anchoring

  • Empirical observation: the majority of pre-trained parameters change minimally during downstream fine-tuning.
  • Graph mixup is used to synthesize cross-domain graphs that simulate potential downstream distributions.
  • Fine-tuning-sensitive parameters are identified via model-pruning-based importance estimation.
  • Random perturbations \(\theta_k \leftarrow \theta_k + \epsilon|\theta_k|\) are applied to sensitive parameters, and the trigger generator is trained to remain effective under such perturbations.
  • Persistence loss: \(\mathcal{L}_{per} = \text{Var}(\{\mathcal{L}_{eff}^j\}) + \text{Mean}(\{\mathcal{L}_{eff}^j\})\)
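The perturbation step and the persistence loss above can be sketched directly. Assumptions not stated in the paper: \(\epsilon\) is drawn per-entry from a symmetric uniform distribution, and sensitivity is represented as a binary mask over parameters.

```python
import numpy as np

def perturb_sensitive(theta: np.ndarray, sensitive_mask: np.ndarray,
                      eps_scale: float, rng) -> np.ndarray:
    """Apply theta_k <- theta_k + eps * |theta_k| to sensitive entries only.

    sensitive_mask: 1 where pruning-based importance flags the parameter as
    fine-tuning-sensitive, 0 elsewhere. The uniform sampling of eps in
    [-eps_scale, eps_scale] is an assumption.
    """
    eps = rng.uniform(-eps_scale, eps_scale, size=theta.shape)
    return theta + sensitive_mask * eps * np.abs(theta)

def persistence_loss(eff_losses) -> float:
    """L_per = Var({L_eff^j}) + Mean({L_eff^j}) over j perturbed encoder copies.

    The mean term keeps the attack effective on average; the variance term
    penalizes sensitivity to the perturbations, so the backdoor survives
    parameter drift induced by downstream fine-tuning.
    """
    eff = np.asarray(eff_losses, dtype=float)
    return float(eff.var() + eff.mean())
```

Training the trigger generator against `persistence_loss` over several perturbed copies is what simulates, and anchors against, downstream fine-tuning.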

Key Experimental Results

Attack Effectiveness (ASR %, Target-Controlled Setting)

| Method | Cora | CiteSeer | PubMed | Photo | Computers |
| --- | --- | --- | --- | --- | --- |
| GCBA_M (GCOPE) | 4.77 | 5.98 | 21.65 | 3.48 | 4.62 |
| CrossBA (GCOPE) | 14.29 | 16.67 | 33.33 | 9.25 | 7.98 |
| GFM-BA (GCOPE) | 90.40 | 89.06 | 100.00 | 84.53 | 78.54 |
| CrossBA (SAMGPT) | 13.61 | 16.67 | 33.33 | 12.10 | 9.20 |
| GFM-BA (SAMGPT) | 100.00 | 100.00 | 100.00 | 99.80 | 100.00 |

Target-controlled ASR exceeds the strongest baseline (CrossBA) by 66–91 percentage points.

Stealthiness (ASR After Edge Purification Defense)

GFM-BA maintains high ASR after edge purification (100% on GCOPE), outperforming baselines by an average of 36.81% (GCOPE), 19.98% (MDGPT), and 36.73% (SAMGPT), with no degradation in clean accuracy.

Persistence (ASR Drop After Fine-Tuning)

| Method | Cora | Photo | Computers |
| --- | --- | --- | --- |
| CrossBA (SAMGPT) | ↓4.74 | ↓9.40 | ↓0.60 |
| GFM-BA (SAMGPT) | ↓1.34 | ↓4.00 | ↓1.40 |
| CrossBA (MDGPT) | ↓1.36 | ↓4.60 | ↓2.40 |
| GFM-BA (MDGPT) | ↓0.68 | ↓0.60 | ↓0.80 |

ASR degradation after fine-tuning is minimal (mostly <2%), demonstrating substantially superior persistence over baselines.

Highlights & Insights

  • Label-free attack paradigm: Selecting prototype embeddings via FPS bypasses the dependency on downstream labels, representing a key breakthrough for GFM backdoor attacks.
  • Adaptive trigger generation: The node-adaptive design preserves graph homophily, significantly enhancing stealthiness.
  • No modification to model parameters: The method exploits latent logic in the pre-trained encoder without affecting clean accuracy.
  • Theoretical grounding: Propositions 1 and 2 provide theoretical foundations for FPS coverage and parameter-insensitive anchoring, respectively.

Limitations & Future Work

  • Evaluation is limited to node classification; graph classification and link prediction scenarios are not addressed.
  • FPS prototype coverage may degrade under highly imbalanced class distributions.
  • The attack assumes the adversary can perform a small number of downstream probe queries to match prototypes to labels, which may be infeasible in certain settings.
  • Defense evaluation relies solely on simple edge purification; stronger defenses such as spectral filtering and model pruning are not assessed.
  • The optimality of the fixed 3-node trigger structure is not investigated.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First systematic treatment of the three core GFM backdoor attack challenges; the label-free design is a genuine breakthrough.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — 5 datasets × 3 victim GFMs × 3 baselines, with ablation studies and hyperparameter analysis.
  • Writing Quality: ⭐⭐⭐⭐ — Problem motivation and challenge analysis are clearly articulated; methodological descriptions are rigorous.
  • Value: ⭐⭐⭐⭐ — Exposes security vulnerabilities in GFMs and advances research in trustworthy AI.