Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection

TL;DR

This paper proposes the APF framework, which addresses the dual challenges of label scarcity and homophily disparity in graph anomaly detection through Rayleigh quotient-guided anomaly-aware pre-training and granularity-adaptive fine-tuning.

Background & Motivation

Core Problem

Graph anomaly detection (GAD) faces two key challenges:

Label scarcity: Annotation costs are high, and labeled nodes are extremely scarce in real-world scenarios.

Homophily disparity: This manifests at two levels — node-level (large variation in individual nodes' local homophily) and class-level (anomalous nodes consistently exhibit lower local homophily).

Limitations of Prior Work

  • Generic graph pre-training strategies (DGI, GraphMAE) extract only task-agnostic semantics and fail to capture anomaly-relevant cues.
  • Pseudo-label and synthetic sample methods are unstable under label scarcity.
  • Globally uniform approaches (edge reweighting, spectral filtering) lack node-adaptive mechanisms.

Key Findings

Local homophily \(h_i = \frac{|\{v_j \in \mathcal{N}_i : y_j = y_i\}|}{|\mathcal{N}_i|}\) varies drastically across nodes, and the average local homophily of anomalous nodes \(h^a\) is consistently lower than that of normal nodes \(h^n\). Existing methods exhibit inconsistent performance across nodes grouped by local homophily.
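As an illustration, here is a minimal sketch of computing local homophily from an adjacency list and binary labels; the function name and toy graph are ours, not the paper's:

```python
import numpy as np

def local_homophily(adj_list, labels):
    """Fraction of each node's neighbors sharing its label, i.e. h_i above."""
    h = np.zeros(len(labels))
    for i, neighbors in enumerate(adj_list):
        if neighbors:  # isolated nodes keep h_i = 0 by convention
            h[i] = np.mean([labels[j] == labels[i] for j in neighbors])
    return h

# Toy graph: node 2 is an anomaly (label 1) surrounded by normal nodes.
adj_list = [[1, 2], [0, 2], [0, 1, 3], [2]]
labels = np.array([0, 0, 1, 0])
print(local_homophily(adj_list, labels))  # [0.5 0.5 0.  0. ]: node 2 (the anomaly) has h_i = 0
```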

Method

1. Anomaly-Aware Pre-Training

Label-Free Anomaly Indicator — Rayleigh Quotient

The Rayleigh quotient is employed as a label-free anomaly measure:

\[RQ(\boldsymbol{x}, \boldsymbol{L}) = \frac{\boldsymbol{x}^T \boldsymbol{L} \boldsymbol{x}}{\boldsymbol{x}^T \boldsymbol{x}} = \frac{\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2}{\sum_{i=1}^n x_i^2}\]

Rationale: The Rayleigh quotient measures the inconsistency between node attributes and local graph structure; anomalous nodes yield higher Rayleigh quotient values (spectral energy right-shift phenomenon).

For each node \(v_i\), MRQSampler extracts a 2-hop subgraph \(\mathcal{G}_i^{RQ}\) that maximizes the subgraph's Rayleigh quotient.
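A small sketch of the label-free indicator follows: the Rayleigh quotient of a graph signal, computed from the unnormalized Laplacian \(\boldsymbol{L} = \boldsymbol{D} - \boldsymbol{A}\). The MRQSampler search over the 2-hop neighborhood is not reproduced here; the helper below only evaluates the quantity it maximizes.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """RQ(x, L) = x^T L x / x^T x with L = D - A.

    Equivalently (1/2) * sum_ij A_ij (x_i - x_j)^2 / sum_i x_i^2.
    """
    L = np.diag(A.sum(axis=1)) - A
    return float(x @ L @ x) / float(x @ x)

# A signal with an abrupt, anomaly-like deviation carries more high-frequency
# energy, hence a larger Rayleigh quotient.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
smooth = np.array([1.0, 1.0, 1.0, 1.0])
spiky = np.array([1.0, 1.0, 5.0, 1.0])
print(rayleigh_quotient(A, smooth))  # 0.0
print(rayleigh_quotient(A, spiky))   # > 0 (about 1.71)
```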

Dual-Filter Encoding

Learnable Chebyshev polynomial spectral filters are employed, comprising a low-pass and a high-pass filter:

\[g_L(\hat{\boldsymbol{L}}) = \sum_{k=0}^{K} w_k^L T_k(\hat{\boldsymbol{L}}), \quad g_H(\hat{\boldsymbol{L}}) = \sum_{k=0}^{K} w_k^H T_k(\hat{\boldsymbol{L}})\]
  • Low-pass encoder: Captures general semantic patterns \(\boldsymbol{Z}_L = f_{\theta_L}(g_L(\hat{\boldsymbol{L}})\boldsymbol{X})\)
  • High-pass encoder: Captures subtle anomaly cues \(\boldsymbol{Z}_H = f_{\theta_H}(g_H(\hat{\boldsymbol{L}})\boldsymbol{X})\)
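A hedged PyTorch sketch of the dual Chebyshev filters above: the decaying and sign-alternating initializations are only one plausible way to bias the two filters toward low- and high-pass behavior, and the encoders \(f_{\theta_L}, f_{\theta_H}\) are omitted.

```python
import torch

def cheby_basis(L_hat, X, K):
    """Stack T_0(L_hat)X ... T_K(L_hat)X via the recurrence T_k = 2*L_hat*T_{k-1} - T_{k-2}.
    L_hat is the rescaled normalized Laplacian with spectrum in [-1, 1]."""
    Ts = [X, L_hat @ X]
    for _ in range(2, K + 1):
        Ts.append(2 * (L_hat @ Ts[-1]) - Ts[-2])
    return torch.stack(Ts[:K + 1])                    # (K+1, n, d)

class DualFilter(torch.nn.Module):
    def __init__(self, K):
        super().__init__()
        decay = torch.tensor([0.5 ** k for k in range(K + 1)])
        sign = torch.tensor([(-1.0) ** k for k in range(K + 1)])
        self.w_L = torch.nn.Parameter(decay)          # low-pass-leaning initialization (assumption)
        self.w_H = torch.nn.Parameter(sign * decay)   # high-pass-leaning initialization (assumption)

    def forward(self, L_hat, X):
        Ts = cheby_basis(L_hat, X, len(self.w_L) - 1)
        Z_L = torch.einsum('k,knd->nd', self.w_L, Ts)  # g_L(L_hat) X
        Z_H = torch.einsum('k,knd->nd', self.w_H, Ts)  # g_H(L_hat) X
        return Z_L, Z_H
```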

Pre-Training Objective

Mutual information maximization based on DGI, augmented with an anomaly-aware objective:

\[\mathcal{L}_{pt} = -\frac{1}{n}\sum_i \left[\log\mathcal{D}(\boldsymbol{Z}_i^L, \boldsymbol{s}^L) + \log(1-\mathcal{D}(\tilde{\boldsymbol{Z}}_i^L, \boldsymbol{s}^L))\right] - \frac{1}{n}\sum_i \left[\log\mathcal{D}(\boldsymbol{Z}_i^H, \boldsymbol{s}_i^H) + \log(1-\mathcal{D}(\tilde{\boldsymbol{Z}}_i^H, \boldsymbol{s}_i^H))\right]\]

where \(\tilde{\boldsymbol{Z}}\) denotes representations of corrupted (negative) samples, \(\boldsymbol{s}^L\) is the global graph summary, and \(\boldsymbol{s}_i^H\) is the node-specific anomaly-aware summary derived from the Rayleigh quotient subgraph \(\mathcal{G}_i^{RQ}\).
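A compact sketch of one branch of this objective with a bilinear discriminator \(\mathcal{D}(\boldsymbol{z}, \boldsymbol{s}) = \sigma(\boldsymbol{z}^T \boldsymbol{W} \boldsymbol{s})\), a common DGI choice assumed here; the corruption step and the construction of \(\boldsymbol{s}_i^H\) are not shown.

```python
import torch

def dgi_branch_loss(Z, Z_tilde, s, W):
    """One branch of L_pt: BCE between discriminator scores on real pairs (Z_i, s)
    and corrupted pairs (Z~_i, s), with D(z, s) = sigmoid(z^T W s)."""
    if s.dim() == 1:                          # single global summary s^L: broadcast to all nodes
        s = s.unsqueeze(0).expand_as(Z)
    scores_pos = torch.sigmoid((Z * (s @ W.T)).sum(dim=-1))
    scores_neg = torch.sigmoid((Z_tilde * (s @ W.T)).sum(dim=-1))
    eps = 1e-8
    return -(torch.log(scores_pos + eps) + torch.log(1 - scores_neg + eps)).mean()

# Full objective: low-pass branch with the global summary s_L, plus the high-pass
# branch with per-node anomaly-aware summaries s_H (one row per node).
# loss_pt = dgi_branch_loss(Z_L, Z_L_tilde, s_L, W_L) + dgi_branch_loss(Z_H, Z_H_tilde, s_H, W_H)
```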

2. Granularity-Adaptive Fine-Tuning

Gated Fusion Network

Node- and dimension-level adaptive fusion:

\[\boldsymbol{Z} = \boldsymbol{C} \odot \boldsymbol{Z}_L + (1-\boldsymbol{C}) \odot \boldsymbol{Z}_H\]

Fusion coefficients are generated by a lightweight gating network:

\[\boldsymbol{C} = \sigma(\boldsymbol{X}\boldsymbol{W}_c + \boldsymbol{b}_c)\]

Compared with directly learning a per-node coefficient matrix, parameter complexity is reduced from \(\mathcal{O}(n \times e)\) to \(\mathcal{O}((d+1) \times e)\), where \(n\) is the number of nodes, \(d\) the input feature dimension, and \(e\) the embedding dimension.
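A minimal sketch of the gated fusion step; the module and argument names are illustrative.

```python
import torch

class GatedFusion(torch.nn.Module):
    """Node- and dimension-level gate C = sigmoid(X W_c + b_c), fusing Z_L and Z_H."""
    def __init__(self, d_in, d_emb):
        super().__init__()
        self.gate = torch.nn.Linear(d_in, d_emb)      # holds W_c (d_in x d_emb) and b_c (d_emb)

    def forward(self, X, Z_L, Z_H):
        C = torch.sigmoid(self.gate(X))               # (n, e): one coefficient per node and dimension
        Z = C * Z_L + (1 - C) * Z_H
        return Z, C                                   # C is reused by the regularization loss below
```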

Anomaly-Aware Regularization Loss

Encourages anomalous nodes to retain more high-pass (anomaly-relevant) information:

\[\mathcal{L}_{reg} = -\frac{1}{|\mathcal{V}^L|}\sum_{v_i \in \mathcal{V}^L,\, y_i=1}\left(p^a\log c_i + (1-p^a)\log(1-c_i)\right) - \frac{1}{|\mathcal{V}^L|}\sum_{v_i \in \mathcal{V}^L,\, y_i=0}\left(p^n\log c_i + (1-p^n)\log(1-c_i)\right)\]

where \(p^a \leq p^n\), guiding anomalous nodes to rely more heavily on high-pass representations.
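A sketch of this regularization term, assuming the per-node coefficient \(c_i\) is summarized as the mean gate over embedding dimensions and that \(p^a, p^n\) are hyperparameters (the default values below are illustrative):

```python
import torch

def anomaly_aware_reg(C, y, labeled_mask, p_a=0.2, p_n=0.8):
    """Soft-target BCE on the fusion gates of labeled nodes.

    Labeled anomalies (y = 1) are pulled toward the smaller target p_a, labeled
    normals toward p_n, so anomalies keep more high-pass information.
    """
    c = C.mean(dim=1)[labeled_mask]                   # per-node gate summary (assumption: mean over dims)
    target = torch.where(y[labeled_mask] == 1,
                         torch.full_like(c, p_a),
                         torch.full_like(c, p_n))
    eps = 1e-8
    return -(target * torch.log(c + eps) + (1 - target) * torch.log(1 - c + eps)).mean()
```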

3. Theoretical Guarantees

Theorem 1: Under the Anomaly Stochastic Block Model (ASBM), when low-pass and high-pass filters are applied to homophilic and heterophilic nodes respectively, there exist parameters such that all nodes are linearly separable with probability \(1-o_d(1)\).

Key Experimental Results

Experimental Setup

  • 10 GADBench datasets: Reddit, Weibo, Amazon, Yelp, T-Finance, Elliptic, etc.
  • Semi-supervised setting: Only 100 labeled nodes (20 anomalous + 80 normal)
  • Metrics: AUPRC, AUROC, Rec@K

Main Results (AUPRC, %)

Model     Reddit     Weibo      Amazon     T-Fin      Average
GCN       4.2        86.0       32.8       60.5       29.3
BWGNN     4.2        80.6       81.7       60.9       -
BernNet   4.9        66.6       81.2       51.8       31.1
APF       Best/2nd   Best/2nd   Best/2nd   Best/2nd   Highest

Ablation Study

  1. Rayleigh quotient-guided subgraph selection substantially improves anomaly-awareness.
  2. The dual-filter design outperforms single-filter alternatives.
  3. The gated fusion network surpasses direct parameter optimization.
  4. Anomaly-aware regularization yields more pronounced gains on datasets with larger class-level homophily disparity.

Highlights & Insights

  1. Novel label-free anomaly measure: The Rayleigh quotient serves as an anomaly signal during pre-training.
  2. Dual-granularity design: Adaptive mechanisms operate at the node level during pre-training and at the node-plus-dimension level during fine-tuning.
  3. Theoretical support: Linear separability is proven under the ASBM model.
  4. Comprehensive evaluation: Validated across 10 datasets.

Limitations & Future Work

  1. Pre-training relies on the DGI framework, which may not be optimal for all scenarios.
  2. The Rayleigh quotient assumes anomalies manifest as spectral energy right-shift, potentially limiting sensitivity to certain anomaly types.
  3. The values of \(p^a\) and \(p^n\) require manual specification.
  4. Optimization of the regularization loss may be unstable when labeled data is extremely scarce.

Related Work

  • Graph anomaly detection: PCGNN, AMNet, BWGNN — globally uniform homophily handling
  • Graph pre-training: DGI, GraphMAE, BGRL — task-agnostic semantics
  • Spectral methods: BernNet, ChebNet — learnable spectral filters

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of Rayleigh quotient and dual-filter pre-training is highly insightful.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensive evaluation across 10 datasets.
  • Writing Quality: ⭐⭐⭐⭐ — Theory and practice are tightly integrated.
  • Value: ⭐⭐⭐⭐ — Practically valuable in label-scarce settings.