Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection

TL;DR

This paper proposes the APF framework, which addresses the dual challenges of label scarcity and homophily disparity in graph anomaly detection through Rayleigh quotient-guided anomaly-aware pre-training and granularity-adaptive fine-tuning.

Background & Motivation

Core Problem

Graph anomaly detection (GAD) faces two key challenges:

Label scarcity: Annotation costs are high, and labeled nodes are extremely scarce in real-world scenarios.

Homophily disparity: This manifests at two levels — node-level (large variation in individual nodes' local homophily) and class-level (anomalous nodes consistently exhibit lower local homophily).

Limitations of Prior Work

  • Generic graph pre-training strategies (DGI, GraphMAE) extract only task-agnostic semantics and fail to capture anomaly-relevant cues.
  • Pseudo-label and synthetic sample methods are unstable under label scarcity.
  • Globally uniform approaches (edge reweighting, spectral filtering) lack node-adaptive mechanisms.

Key Findings

Local homophily \(h_i = \frac{|\{v_j \in \mathcal{N}_i : y_j = y_i\}|}{|\mathcal{N}_i|}\) varies drastically across nodes, and the average local homophily of anomalous nodes \(h^a\) is consistently lower than that of normal nodes \(h^n\). Existing methods exhibit inconsistent performance across nodes grouped by local homophily.
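As an illustration, here is a minimal sketch of computing local homophily from an adjacency list and binary labels; the function name and toy graph are ours, not the paper's:

```python
import numpy as np

def local_homophily(adj_list, labels):
    """Fraction of each node's neighbors sharing its label, i.e. h_i above."""
    h = np.zeros(len(labels))
    for i, neighbors in enumerate(adj_list):
        if neighbors:  # isolated nodes keep h_i = 0 by convention
            h[i] = np.mean([labels[j] == labels[i] for j in neighbors])
    return h

# Toy graph: node 2 is an anomaly (label 1) surrounded by normal nodes.
adj_list = [[1, 2], [0, 2], [0, 1, 3], [2]]
labels = np.array([0, 0, 1, 0])
print(local_homophily(adj_list, labels))  # [0.5 0.5 0.  0. ]: node 2 (the anomaly) has h_i = 0
```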

Method

1. Anomaly-Aware Pre-Training

Label-Free Anomaly Indicator — Rayleigh Quotient

The Rayleigh quotient is employed as a label-free anomaly measure:

\[RQ(\boldsymbol{x}, \boldsymbol{L}) = \frac{\boldsymbol{x}^T \boldsymbol{L} \boldsymbol{x}}{\boldsymbol{x}^T \boldsymbol{x}} = \frac{\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2}{\sum_{i=1}^n x_i^2}\]

Rationale: The Rayleigh quotient measures the inconsistency between node attributes and local graph structure; anomalous nodes yield higher Rayleigh quotient values (spectral energy right-shift phenomenon).

For each node \(v_i\), MRQSampler extracts a 2-hop subgraph \(\mathcal{G}_i^{RQ}\) that maximizes the subgraph's Rayleigh quotient.
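A small sketch of the label-free indicator follows: the Rayleigh quotient of a graph signal, computed from the unnormalized Laplacian \(\boldsymbol{L} = \boldsymbol{D} - \boldsymbol{A}\). The MRQSampler search over the 2-hop neighborhood is not reproduced here; the helper below only evaluates the quantity it maximizes.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """RQ(x, L) = x^T L x / x^T x with L = D - A.

    Equivalently (1/2) * sum_ij A_ij (x_i - x_j)^2 / sum_i x_i^2.
    """
    L = np.diag(A.sum(axis=1)) - A
    return float(x @ L @ x) / float(x @ x)

# A signal with an abrupt, anomaly-like deviation carries more high-frequency
# energy, hence a larger Rayleigh quotient.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
smooth = np.array([1.0, 1.0, 1.0, 1.0])
spiky = np.array([1.0, 1.0, 5.0, 1.0])
print(rayleigh_quotient(A, smooth))  # 0.0
print(rayleigh_quotient(A, spiky))   # > 0 (about 1.71)
```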

Dual-Filter Encoding

Learnable Chebyshev polynomial spectral filters are employed, comprising a low-pass and a high-pass filter:

\[g_L(\hat{\boldsymbol{L}}) = \sum_{k=0}^{K} w_k^L T_k(\hat{\boldsymbol{L}}), \quad g_H(\hat{\boldsymbol{L}}) = \sum_{k=0}^{K} w_k^H T_k(\hat{\boldsymbol{L}})\]
  • Low-pass encoder: Captures general semantic patterns \(\boldsymbol{Z}_L = f_{\theta_L}(g_L(\hat{\boldsymbol{L}})\boldsymbol{X})\)
  • High-pass encoder: Captures subtle anomaly cues \(\boldsymbol{Z}_H = f_{\theta_H}(g_H(\hat{\boldsymbol{L}})\boldsymbol{X})\)
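A hedged PyTorch sketch of the dual Chebyshev filters above: the decaying and sign-alternating initializations are only one plausible way to bias the two filters toward low- and high-pass behavior, and the encoders \(f_{\theta_L}, f_{\theta_H}\) are omitted.

```python
import torch

def cheby_basis(L_hat, X, K):
    """Stack T_0(L_hat)X ... T_K(L_hat)X via the recurrence T_k = 2*L_hat*T_{k-1} - T_{k-2}.
    L_hat is the rescaled normalized Laplacian with spectrum in [-1, 1]."""
    Ts = [X, L_hat @ X]
    for _ in range(2, K + 1):
        Ts.append(2 * (L_hat @ Ts[-1]) - Ts[-2])
    return torch.stack(Ts[:K + 1])                    # (K+1, n, d)

class DualFilter(torch.nn.Module):
    def __init__(self, K):
        super().__init__()
        decay = torch.tensor([0.5 ** k for k in range(K + 1)])
        sign = torch.tensor([(-1.0) ** k for k in range(K + 1)])
        self.w_L = torch.nn.Parameter(decay)          # low-pass-leaning initialization (assumption)
        self.w_H = torch.nn.Parameter(sign * decay)   # high-pass-leaning initialization (assumption)

    def forward(self, L_hat, X):
        Ts = cheby_basis(L_hat, X, len(self.w_L) - 1)
        Z_L = torch.einsum('k,knd->nd', self.w_L, Ts)  # g_L(L_hat) X
        Z_H = torch.einsum('k,knd->nd', self.w_H, Ts)  # g_H(L_hat) X
        return Z_L, Z_H
```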

Pre-Training Objective

Mutual information maximization based on DGI, augmented with an anomaly-aware objective:

\[\mathcal{L}_{pt} = -\frac{1}{n}\sum_i \left[\log\mathcal{D}(\boldsymbol{Z}_i^L, \boldsymbol{s}^L) + \log(1-\mathcal{D}(\tilde{\boldsymbol{Z}}_i^L, \boldsymbol{s}^L))\right] - \frac{1}{n}\sum_i \left[\log\mathcal{D}(\boldsymbol{Z}_i^H, \boldsymbol{s}_i^H) + \log(1-\mathcal{D}(\tilde{\boldsymbol{Z}}_i^H, \boldsymbol{s}_i^H))\right]\]

where \(\tilde{\boldsymbol{Z}}\) denotes representations of corrupted (negative) samples, \(\boldsymbol{s}^L\) is the global graph summary, and \(\boldsymbol{s}_i^H\) is the node-specific anomaly-aware summary derived from the Rayleigh quotient subgraph \(\mathcal{G}_i^{RQ}\).
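A compact sketch of one branch of this objective with a bilinear discriminator \(\mathcal{D}(\boldsymbol{z}, \boldsymbol{s}) = \sigma(\boldsymbol{z}^T \boldsymbol{W} \boldsymbol{s})\), a common DGI choice assumed here; the corruption step and the construction of \(\boldsymbol{s}_i^H\) are not shown.

```python
import torch

def dgi_branch_loss(Z, Z_tilde, s, W):
    """One branch of L_pt: BCE between discriminator scores on real pairs (Z_i, s)
    and corrupted pairs (Z~_i, s), with D(z, s) = sigmoid(z^T W s)."""
    if s.dim() == 1:                          # single global summary s^L: broadcast to all nodes
        s = s.unsqueeze(0).expand_as(Z)
    scores_pos = torch.sigmoid((Z * (s @ W.T)).sum(dim=-1))
    scores_neg = torch.sigmoid((Z_tilde * (s @ W.T)).sum(dim=-1))
    eps = 1e-8
    return -(torch.log(scores_pos + eps) + torch.log(1 - scores_neg + eps)).mean()

# Full objective: low-pass branch with the global summary s_L, plus the high-pass
# branch with per-node anomaly-aware summaries s_H (one row per node).
# loss_pt = dgi_branch_loss(Z_L, Z_L_tilde, s_L, W_L) + dgi_branch_loss(Z_H, Z_H_tilde, s_H, W_H)
```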

2. Granularity-Adaptive Fine-Tuning

Gated Fusion Network

Node- and dimension-level adaptive fusion:

\[\boldsymbol{Z} = \boldsymbol{C} \odot \boldsymbol{Z}_L + (1-\boldsymbol{C}) \odot \boldsymbol{Z}_H\]

Fusion coefficients are generated by a lightweight gating network:

\[\boldsymbol{C} = \sigma(\boldsymbol{X}\boldsymbol{W}_c + \boldsymbol{b}_c)\]

Compared with directly learning a per-node coefficient matrix, parameter complexity is reduced from \(\mathcal{O}(n \times e)\) to \(\mathcal{O}((d+1) \times e)\), where \(n\) is the number of nodes, \(d\) the input feature dimension, and \(e\) the embedding dimension.
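A minimal sketch of the gated fusion step; the module and argument names are illustrative.

```python
import torch

class GatedFusion(torch.nn.Module):
    """Node- and dimension-level gate C = sigmoid(X W_c + b_c), fusing Z_L and Z_H."""
    def __init__(self, d_in, d_emb):
        super().__init__()
        self.gate = torch.nn.Linear(d_in, d_emb)      # holds W_c (d_in x d_emb) and b_c (d_emb)

    def forward(self, X, Z_L, Z_H):
        C = torch.sigmoid(self.gate(X))               # (n, e): one coefficient per node and dimension
        Z = C * Z_L + (1 - C) * Z_H
        return Z, C                                   # C is reused by the regularization loss below
```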

Anomaly-Aware Regularization Loss

Encourages anomalous nodes to retain more high-pass (anomaly-relevant) information:

\[\mathcal{L}_{reg} = -\frac{1}{|\mathcal{V}^L|}\sum_{v_i \in \mathcal{V}^L,\, y_i=1}\left(p^a\log c_i + (1-p^a)\log(1-c_i)\right) - \frac{1}{|\mathcal{V}^L|}\sum_{v_i \in \mathcal{V}^L,\, y_i=0}\left(p^n\log c_i + (1-p^n)\log(1-c_i)\right)\]

where \(p^a \leq p^n\), guiding anomalous nodes to rely more heavily on high-pass representations.
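A sketch of this regularization term, assuming the per-node coefficient \(c_i\) is summarized as the mean gate over embedding dimensions and that \(p^a, p^n\) are hyperparameters (the default values below are illustrative):

```python
import torch

def anomaly_aware_reg(C, y, labeled_mask, p_a=0.2, p_n=0.8):
    """Soft-target BCE on the fusion gates of labeled nodes.

    Labeled anomalies (y = 1) are pulled toward the smaller target p_a, labeled
    normals toward p_n, so anomalies keep more high-pass information.
    """
    c = C.mean(dim=1)[labeled_mask]                   # per-node gate summary (assumption: mean over dims)
    target = torch.where(y[labeled_mask] == 1,
                         torch.full_like(c, p_a),
                         torch.full_like(c, p_n))
    eps = 1e-8
    return -(target * torch.log(c + eps) + (1 - target) * torch.log(1 - c + eps)).mean()
```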

3. Theoretical Guarantees

Theorem 1: Under the Anomaly Stochastic Block Model (ASBM), when low-pass and high-pass filters are applied to homophilic and heterophilic nodes respectively, there exist parameters such that all nodes are linearly separable with probability \(1-o_d(1)\).

Key Experimental Results

Experimental Setup

  • 10 GADBench datasets: Reddit, Weibo, Amazon, Yelp, T-Finance, Elliptic, etc.
  • Semi-supervised setting: Only 100 labeled nodes (20 anomalous + 80 normal)
  • Metrics: AUPRC, AUROC, Rec@K

Main Results (AUPRC, %)

Model     Reddit     Weibo      Amazon     T-Fin      Average
GCN       4.2        86.0       32.8       60.5       29.3
BWGNN     4.2        80.6       81.7       60.9       -
BernNet   4.9        66.6       81.2       51.8       31.1
APF       Best/2nd   Best/2nd   Best/2nd   Best/2nd   Highest

Ablation Study

  1. Rayleigh quotient-guided subgraph selection substantially improves anomaly-awareness.
  2. The dual-filter design outperforms single-filter alternatives.
  3. The gated fusion network surpasses direct parameter optimization.
  4. Anomaly-aware regularization yields more pronounced gains on datasets with larger class-level homophily disparity.

Highlights & Insights

  1. Novel label-free anomaly measure: The Rayleigh quotient serves as an anomaly signal during pre-training.
  2. Dual-granularity design: Adaptive mechanisms operate at the node level during pre-training and at the node-plus-dimension level during fine-tuning.
  3. Theoretical support: Linear separability is proven under the ASBM model.
  4. Comprehensive evaluation: Validated across 10 datasets.

Limitations & Future Work

  1. Pre-training relies on the DGI framework, which may not be optimal for all scenarios.
  2. The Rayleigh quotient assumes anomalies manifest as spectral energy right-shift, potentially limiting sensitivity to certain anomaly types.
  3. The values of \(p^a\) and \(p^n\) require manual specification.
  4. Optimization of the regularization loss may be unstable when labeled data is extremely scarce.

Related Work

  • Graph anomaly detection: PCGNN, AMNet, BWGNN — globally uniform homophily handling
  • Graph pre-training: DGI, GraphMAE, BGRL — task-agnostic semantics
  • Spectral methods: BernNet, ChebNet — learnable spectral filters

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of Rayleigh quotient and dual-filter pre-training is highly insightful.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensive evaluation across 10 datasets.
  • Writing Quality: ⭐⭐⭐⭐ — Theory and practice are tightly integrated.
  • Value: ⭐⭐⭐⭐ — Practically valuable in label-scarce settings.