Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection¶
Paper Information¶
- Conference: ICLR 2026
- arXiv: 2504.14250
- Code: https://github.com/Cloudy1225/APF
- Area: Graph Anomaly Detection
- Keywords: GAD, pre-training & fine-tuning, Rayleigh quotient, homophily disparity, dual filters, gated fusion
TL;DR¶
This paper proposes the APF framework, which addresses the dual challenges of label scarcity and homophily disparity in graph anomaly detection through Rayleigh quotient-guided anomaly-aware pre-training and granularity-adaptive fine-tuning.
Background & Motivation¶
Core Problem¶
Graph anomaly detection (GAD) faces two key challenges:
- Label scarcity: Annotation costs are high, and labeled nodes are extremely scarce in real-world scenarios.
- Homophily disparity: This manifests at both the node level (large variation in individual nodes' local homophily) and the class level (anomalous nodes consistently exhibit lower local homophily than normal nodes).
Limitations of Prior Work¶
- Generic graph pre-training strategies (DGI, GraphMAE) extract only task-agnostic semantics and fail to capture anomaly-relevant cues.
- Pseudo-label and synthetic sample methods are unstable under label scarcity.
- Globally uniform approaches (edge reweighting, spectral filtering) lack node-adaptive mechanisms.
Key Findings¶
Local homophily \(h_i = \frac{|\{v_j \in \mathcal{N}_i : y_j = y_i\}|}{|\mathcal{N}_i|}\) varies drastically across nodes, and the average local homophily of anomalous nodes \(h^a\) is consistently lower than that of normal nodes \(h^n\). Existing methods exhibit inconsistent performance across nodes grouped by local homophily.
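The local homophily measure above can be computed directly; here is a minimal NumPy sketch on a toy graph (the graph and labels are illustrative, not from the paper):

```python
import numpy as np

def local_homophily(adj, labels):
    """h_i = |{v_j in N_i : y_j = y_i}| / |N_i|: fraction of neighbors sharing node i's label."""
    n = len(labels)
    h = np.zeros(n)
    for i in range(n):
        nbrs = np.nonzero(adj[i])[0]
        if len(nbrs) == 0:
            continue  # isolated node: leave h_i = 0 by convention
        h[i] = np.mean(labels[nbrs] == labels[i])
    return h

# Toy graph: nodes 0-3 are normal (label 0), node 4 is anomalous (label 1)
# and is wired into the normal cluster, so its local homophily is 0.
adj = np.array([
    [0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
])
labels = np.array([0, 0, 0, 0, 1])
h = local_homophily(adj, labels)
print(h)  # the anomaly (node 4) gets h = 0.0; its normal neighbors also see reduced h
```

Note how the anomaly both has low homophily itself and drags down the homophily of its neighbors, which is exactly the node-level variation the paper highlights.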
Method¶
1. Anomaly-Aware Pre-Training¶
Label-Free Anomaly Indicator — Rayleigh Quotient¶
The Rayleigh quotient is employed as a label-free anomaly measure:

\[ R(\mathcal{G}, \boldsymbol{x}) = \frac{\boldsymbol{x}^\top \boldsymbol{L} \boldsymbol{x}}{\boldsymbol{x}^\top \boldsymbol{x}} \]

where \(\boldsymbol{L}\) is the graph Laplacian and \(\boldsymbol{x}\) a node attribute signal.
Rationale: The Rayleigh quotient measures the inconsistency between node attributes and local graph structure; anomalous nodes yield higher Rayleigh quotient values (the spectral energy right-shift phenomenon).
For each node \(v_i\), MRQSampler extracts a 2-hop subgraph \(\mathcal{G}_i^{RQ}\) that maximizes the subgraph's Rayleigh quotient.
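As a concrete illustration (not the paper's implementation), the Rayleigh quotient of a graph signal can be computed with the symmetric normalized Laplacian; a spike-like attribute at one node, the signature of an attribute anomaly, raises it markedly:

```python
import numpy as np

def rayleigh_quotient(adj, x):
    """R(x) = x^T L x / x^T x with L = I - D^{-1/2} A D^{-1/2} (symmetric normalized Laplacian)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(x)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return float(x @ L @ x / (x @ x))

# Path graph on 5 nodes
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1

smooth = np.ones(5)                      # smooth signal: low Rayleigh quotient
spiky = np.array([1., 1., 5., 1., 1.])   # attribute spike at node 2 (an "anomaly")
print(rayleigh_quotient(adj, smooth))    # small
print(rayleigh_quotient(adj, spiky))     # noticeably larger
```

The higher value for the spiky signal is the label-free cue MRQSampler exploits: a subgraph whose signal has a high Rayleigh quotient concentrates spectral energy in high frequencies, which correlates with anomaly presence.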
Dual-Filter Encoding¶
Learnable Chebyshev polynomial spectral filters are employed, comprising a low-pass and a high-pass filter:
- Low-pass encoder: Captures general semantic patterns \(\boldsymbol{Z}_L = f_{\theta_L}(g_L(\hat{\boldsymbol{L}})\boldsymbol{X})\)
- High-pass encoder: Captures subtle anomaly cues \(\boldsymbol{Z}_H = f_{\theta_H}(g_H(\hat{\boldsymbol{L}})\boldsymbol{X})\)
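A minimal sketch of the dual-filter idea, substituting fixed first-order filters \(g_L(\hat{\boldsymbol{L}}) = \boldsymbol{I} - \hat{\boldsymbol{L}}/2\) and \(g_H(\hat{\boldsymbol{L}}) = \hat{\boldsymbol{L}}/2\) for the paper's learnable Chebyshev polynomials, and omitting the encoders \(f_{\theta}\):

```python
import numpy as np

def norm_laplacian(adj):
    deg = adj.sum(axis=1)
    d = np.where(deg > 0, deg ** -0.5, 0.0)
    return np.eye(len(adj)) - d[:, None] * adj * d[None, :]

def dual_filter(adj, X):
    """Fixed stand-ins for learnable Chebyshev filters: g_L keeps low frequencies
    (smooth, homophilic structure), g_H keeps high frequencies (anomaly cues)."""
    L = norm_laplacian(adj)
    g_L = np.eye(len(adj)) - 0.5 * L   # low-pass response
    g_H = 0.5 * L                      # high-pass response
    return g_L @ X, g_H @ X            # Z_L, Z_H (encoders f_theta omitted)

adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1
X = np.array([[1.], [1.], [5.], [1.], [1.]])  # attribute spike at node 2
Z_L, Z_H = dual_filter(adj, X)
```

The high-pass channel responds most strongly at the spiky node, while the low-pass channel smooths it toward its neighbors, which is why the two channels carry complementary information worth fusing later.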
Pre-Training Objective¶
Mutual information maximization based on DGI, augmented with an anomaly-aware objective that contrasts each node's representation against an anomaly-aware summary \(\boldsymbol{s}_i^H\) derived from its Rayleigh quotient subgraph \(\mathcal{G}_i^{RQ}\).
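A schematic sketch of the DGI-style objective's form, heavily simplified: a single global summary and a fixed bilinear discriminator stand in for the paper's per-node anomaly-aware summaries \(\boldsymbol{s}_i^H\):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dgi_style_loss(Z, Z_corrupt, s, W):
    """Binary cross-entropy separating (z_i, s) positive pairs from corrupted negatives.
    In APF's anomaly-aware term, s would be the per-node summary s_i^H from the
    Rayleigh-quotient subgraph; here it is simplified to one global readout."""
    pos = sigmoid(Z @ W @ s)          # discriminator scores for real embeddings
    neg = sigmoid(Z_corrupt @ W @ s)  # scores for corrupted embeddings
    eps = 1e-9
    return float(-(np.log(pos + eps).mean() + np.log(1 - neg + eps).mean()) / 2)

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 4))
Z_corrupt = Z[rng.permutation(6)]     # row shuffle as the corruption function
s = Z.mean(axis=0)                    # mean readout as the summary vector
W = np.eye(4)                         # bilinear discriminator weight
loss = dgi_style_loss(Z, Z_corrupt, s, W)
```

Minimizing this loss maximizes a lower bound on the mutual information between node embeddings and the summary; APF's contribution is making the summary anomaly-aware rather than purely global.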
2. Granularity-Adaptive Fine-Tuning¶
Gated Fusion Network¶
Fusion of \(\boldsymbol{Z}_L\) and \(\boldsymbol{Z}_H\) is adaptive at both the node and dimension level: the fusion coefficients are produced by a lightweight gating network rather than learned as free per-node parameters.
Parameter complexity is reduced from \(\mathcal{O}(n \times e)\) to \(\mathcal{O}((d+1) \times e)\).
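A sketch of the gating idea under the assumption (not stated in these notes) that the gate reads the low-pass representation; it also shows where the \((d+1) \times e\) parameter count comes from, namely one weight matrix plus one bias instead of a free \(n \times e\) coefficient table:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(Z_L, Z_H, W, b):
    """Per-node, per-dimension fusion: alpha = sigmoid(Z_L @ W + b),
    Z = alpha * Z_L + (1 - alpha) * Z_H. W is (d, e) and b is (e,), so the gate
    costs (d+1)*e parameters regardless of the number of nodes n."""
    alpha = sigmoid(Z_L @ W + b)
    return alpha * Z_L + (1 - alpha) * Z_H, alpha

n, d, e = 5, 4, 4  # embedding dims d == e here so the elementwise fusion lines up
rng = np.random.default_rng(0)
Z_L = rng.normal(size=(n, d))
Z_H = rng.normal(size=(n, d))
W = rng.normal(size=(d, e)) * 0.1
b = np.zeros(e)
Z, alpha = gated_fusion(Z_L, Z_H, W, b)
```

Because the gate is a function of the representations rather than a lookup table, it generalizes to unseen nodes and stays cheap as \(n\) grows.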
Anomaly-Aware Regularization Loss¶
Encourages anomalous nodes to retain more high-pass (anomaly-relevant) information by steering their fusion coefficients toward a target \(p^a\), where \(p^a \leq p^n\), so that anomalous nodes rely more heavily on the high-pass representations than normal nodes do.
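The exact loss is not reproduced in these notes; one plausible form consistent with the description, with hypothetical target values \(p^a = 0.3\) and \(p^n = 0.7\), is:

```python
import numpy as np

def anomaly_gate_regularizer(alpha, y, p_a=0.3, p_n=0.7):
    """Hypothetical form of the regularizer: push each labeled node's mean gate
    value (its share of low-pass information) toward p_a for anomalies and p_n
    for normals. p_a <= p_n steers anomalies toward high-pass representations."""
    mean_gate = alpha.mean(axis=1)           # average low-pass share per node
    target = np.where(y == 1, p_a, p_n)      # y == 1 marks labeled anomalies
    return float(((mean_gate - target) ** 2).mean())

alpha = np.array([[0.8, 0.7], [0.3, 0.2], [0.6, 0.7]])  # toy gate values
y = np.array([0, 1, 0])                                  # node 1 is a labeled anomaly
loss = anomaly_gate_regularizer(alpha, y)
```

The quadratic form and the specific targets are assumptions for illustration; the point is that the loss only needs the (scarce) labeled nodes and acts on the gate, not on the representations directly.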
3. Theoretical Guarantees¶
Theorem 1: Under the Anomaly Stochastic Block Model (ASBM), when low-pass and high-pass filters are applied to homophilic and heterophilic nodes respectively, there exist parameters such that all nodes are linearly separable with probability \(1-o_d(1)\).
Key Experimental Results¶
Experimental Setup¶
- 10 GADBench datasets: Reddit, Weibo, Amazon, Yelp, T-Finance, Elliptic, etc.
- Semi-supervised setting: Only 100 labeled nodes (20 anomalous + 80 normal)
- Metrics: AUPRC, AUROC, Rec@K
Main Results (AUPRC)¶
| Model | | Amazon | T-Fin | | Average |
|---|---|---|---|---|---|
| GCN | 4.2 | 86.0 | 32.8 | 60.5 | 29.3 |
| BWGNN | 4.2 | 80.6 | 81.7 | 60.9 | - |
| BernNet | 4.9 | 66.6 | 81.2 | 51.8 | 31.1 |
| APF | Best/2nd | Best/2nd | Best/2nd | Best/2nd | Highest |

(Two dataset column headers were lost in extraction and are left blank.)
Ablation Study¶
- Rayleigh quotient-guided subgraph selection substantially improves anomaly-awareness.
- The dual-filter design outperforms single-filter alternatives.
- The gated fusion network surpasses direct parameter optimization.
- Anomaly-aware regularization yields more pronounced gains on datasets with larger class-level homophily disparity.
Highlights & Insights¶
- Novel label-free anomaly measure: The Rayleigh quotient serves as an anomaly signal during pre-training.
- Dual-granularity design: Adaptive mechanisms operate at the node level during pre-training and at the node-plus-dimension level during fine-tuning.
- Theoretical support: Linear separability is proven under the ASBM model.
- Comprehensive evaluation: Validated across 10 datasets.
Limitations & Future Work¶
- Pre-training relies on the DGI framework, which may not be optimal for all scenarios.
- The Rayleigh quotient assumes anomalies manifest as spectral energy right-shift, potentially limiting sensitivity to certain anomaly types.
- The values of \(p^a\) and \(p^n\) require manual specification.
- Optimization of the regularization loss may be unstable when labeled data is extremely scarce.
Related Work & Insights¶
- Graph anomaly detection: PCGNN, AMNet, BWGNN — global homophily processing
- Graph pre-training: DGI, GraphMAE, BGRL — task-agnostic semantics
- Spectral methods: BernNet, ChebNet — learnable spectral filters
Rating¶
- Novelty: ⭐⭐⭐⭐ — The combination of Rayleigh quotient and dual-filter pre-training is highly insightful.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensive evaluation across 10 datasets.
- Writing Quality: ⭐⭐⭐⭐ — Theory and practice are tightly integrated.
- Value: ⭐⭐⭐⭐ — Practically valuable in label-scarce settings.