Sentient: Detecting APTs Via Capturing Indirect Dependencies and Behavioral Logic
Conference: AAAI 2026 | arXiv: 2502.06521 | Code: None | Area: Graph Learning / Cybersecurity | Keywords: APT Detection, Provenance Graph, Graph Transformer, Mamba, Behavioral Intent Analysis
TL;DR
This paper proposes Sentient, an APT detection method combining Graph Transformer pre-training and bidirectional Mamba2 intent analysis. Trained exclusively on benign data, it captures indirect dependencies, removes contextual noise, and correlates behavioral logic, achieving an average 44% reduction in false positive rate across three standard benchmarks.
Background & Motivation
- Background: Advanced Persistent Threats (APTs) are notoriously difficult to detect due to their stealthiness and complexity. Provenance graph-based methods represent the current state of the art, leveraging entity relationships in system audit logs to uncover attack traces.
- Limitations of Prior Work: (a) Missing indirect dependencies — GNN-based methods are constrained by the receptive field of neighborhood aggregation, failing to capture relationships between non-directly connected nodes; (b) Noise in complex scenarios — infected entities continue to perform numerous benign tasks, causing neighborhood aggregation to erroneously incorporate weakly related activities; (c) Missing behavioral logic correlation — isolated system behaviors exhibit contextual ambiguity (e.g., sshd writing a log appears normal in isolation), yet their combination reveals malicious intent.
- Key Challenge: GNN local aggregation cannot reach indirect dependencies, introduces noise through indiscriminate neighbor aggregation, and is unable to establish logical correlations between distant behaviors.
- Goal: Design a globally aware APT detection method capable of understanding behavioral logic.
- Key Insight: Employ global attention in Graph Transformers to capture indirect dependencies, construct denoised behavior sequences via random walks, and leverage bidirectional Mamba2 to mine logical correlations among behaviors.
- Core Idea: Graph Transformer for global node embeddings + bidirectional Mamba2 for intent logic over behavior sequences = addressing the three challenges of indirect dependencies, noise, and logical correlation.
Method
Overall Architecture
Five components: (1) Graph Construction — builds a provenance graph from system logs, initializing nodes with Word2Vec semantic encoding and Laplacian positional encoding; (2) Pre-training — a Graph Transformer reconstructs key node information to learn globally structured semantic embeddings; (3) Intent Analysis Module (IAM) — random walks construct behavior sequences, and bidirectional Mamba2 mines logical correlations; (4) Threat Detection — an MLP reconstructs behavioral actions, with behaviors whose reconstruction error exceeds a threshold flagged as malicious; (5) Attack Investigation — clusters behaviors with similar intent.
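The node initialization above combines a semantic code (Word2Vec over log tokens) with a Laplacian positional code. As a minimal NumPy sketch of the positional half, the snippet below computes Laplacian eigenvector positional encodings for a hypothetical 5-node provenance graph; the graph and dimensions are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical 5-node provenance graph (processes/files), undirected adjacency.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

def laplacian_pe(A, k=2):
    """k smallest non-trivial eigenvectors of the symmetric normalized Laplacian."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1:k + 1]          # drop the trivial constant eigenvector

pe = laplacian_pe(A, k=2)
print(pe.shape)  # (5, 2): one k-dim positional code per node
```

In practice the sign of each eigenvector is arbitrary, so implementations typically fix or randomize signs during training; the paper's \(\beta\) corresponds to rows of `pe`.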
Key Designs
- Graph Transformer Pre-training
- Function: Learns global node embeddings that capture indirect dependencies, circumventing the receptive field limitations of GNNs.
- Mechanism: The initial embedding \(h_i^0 = \sigma((A^0\alpha + a^0) + (B^0\beta + b^0))\) combines semantic encoding \(\alpha\) (Word2Vec) and positional encoding \(\beta\) (Laplacian eigenvectors). Multi-head attention allows each node to attend to all others in the graph (\(w_{ij} = \text{softmax}(Q h_i \cdot K h_j / \sqrt{d_k})\)), with residual connections and FFN producing final embeddings. The pre-training objective is node type reconstruction (weighted cross-entropy to handle class imbalance).
- Design Motivation: Attack behaviors in provenance graphs involve multi-hop relationships (e.g., file read → execution → network transmission), which GNNs require multiple layers to reach — incurring over-smoothing in deep settings. The global attention of Graph Transformers resolves this in a single pass.
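The global-attention step \(w_{ij} = \text{softmax}(Q h_i \cdot K h_j / \sqrt{d_k})\) can be sketched in a few lines of NumPy. This is a single-head toy version with random weights, purely to show that every node attends to every other node in one pass; dimensions and weight matrices are assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, dk = 6, 8, 8                  # nodes, embedding dim, key dim (illustrative)

H = rng.normal(size=(N, d))         # node embeddings h_i after initialization
Wq, Wk, Wv = (rng.normal(size=(d, dk)) for _ in range(3))

def global_attention(H):
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(dk)            # pairwise scores for ALL (i, j)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)         # row-wise softmax -> weights w_ij
    return w @ V, w

out, w = global_attention(H)
```

Because `scores` is dense over all node pairs, a node can weight a multi-hop-distant node directly, which is exactly the indirect-dependency capture a k-layer GNN would need k rounds of aggregation to approximate.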
- Intent Analysis Module (IAM)
- Function: Mines logical correlations among behaviors in a denoised context to understand behavioral intent.
- Mechanism: Using pre-trained embeddings \(h\), random walks over the provenance graph construct behavior sequences \(\lambda_i = \{e_1, ..., e_W\}\), where each behavior \(e_t\) is represented as the concatenation of source and target node embeddings \([h_{\phi(e_t)}; h_{\psi(e_t)}]\). Random walks naturally build a target-node-centric local context, filtering irrelevant neighbors (denoising). Bidirectional Mamba2 then processes the sequence: \(\lambda^{\ell+1} = \mathbf{F}(\mathbf{E}(\lambda^\ell) + \mathcal{R}(\mathbf{E}(\mathcal{R}(\lambda^\ell))), \lambda^\ell)\), where \(\mathcal{R}\) denotes sequence reversal and \(\mathbf{E}\) denotes the Mamba2 state space model operation. Bidirectional processing ensures both forward and backward contextual logic are captured.
- Design Motivation: Isolated behaviors appear benign but reveal malicious intent only in combination. Mamba2's long-sequence modeling capability surpasses RNNs, and its linear complexity suits large-scale log processing. Bidirectional modeling is necessary because attack behaviors may depend on both preceding and subsequent context.
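The sequence-building and bidirectional-fusion pipeline can be sketched as follows. The random walk and the \([h_{\phi(e_t)}; h_{\psi(e_t)}]\) concatenation follow the description above; the state-space op \(\mathbf{E}\) is replaced here by a toy exponential-decay scan (a stand-in, not actual Mamba2), and the fusion \(\mathbf{F}\) by a simple sum, so only the data flow is faithful.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 5-node provenance graph and pre-trained embeddings h (dim 4)
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
])
H = rng.normal(size=(5, 4))

def random_walk(A, start, W):
    """Sample a walk of W edges; each traversed edge is one behavior e_t."""
    walk = [start]
    for _ in range(W):
        nbrs = np.flatnonzero(A[walk[-1]])
        walk.append(int(rng.choice(nbrs)))
    return walk

walk = random_walk(A, start=0, W=4)
# behavior e_t represented as the concatenation [h_source ; h_target]
seq = np.stack([np.concatenate([H[u], H[v]]) for u, v in zip(walk, walk[1:])])

def ssm_scan(seq, decay=0.9):
    """Toy causal scan standing in for the Mamba2 state-space op E(.)."""
    out, state = [], np.zeros(seq.shape[1])
    for x in seq:
        state = decay * state + x
        out.append(state.copy())
    return np.stack(out)

fwd = ssm_scan(seq)                # E(lambda)
bwd = ssm_scan(seq[::-1])[::-1]    # R(E(R(lambda)))
fused = fwd + bwd                  # simple stand-in for the fusion F(., .)
```

The reversal trick is the key point: a causal scan only sees the past, so running it once forward and once on the reversed sequence (then reversing back) gives each behavior both preceding and subsequent context.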
- Threat Detection and Attack Investigation
- Function: Detects anomalies based on deviation from benign patterns and clusters attack behaviors into attack narratives.
- Mechanism: During training, key behavioral information (read/write/execute) is masked to learn benign behavioral reconstruction patterns. During inference, behaviors whose reconstruction error \(RE = \text{CrossEntropy}(\mathbf{P}(a_t), L(a_t))\) exceeds the threshold (mean + 1.5 standard deviations) are flagged as malicious. For attack investigation, the concatenation of behavioral intent embedding \(h_e\) with source/target node embeddings is clustered as \(C_k = \{e_i | \arg\min_k \|h_{behavior}^{(i)} - \mu_k\|^2\}\), merging alerts with similar intent to reduce analyst burden.
- Design Motivation: Training solely on benign data avoids the scarcity of attack samples. Reconstruction error naturally quantifies the degree of behavioral anomaly.
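The thresholding logic is simple enough to sketch end to end: collect reconstruction errors over the benign training period, set the cutoff at mean + 1.5σ, and flag any behavior whose inference-time error exceeds it. The error distribution and action probabilities below are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def reconstruction_error(probs, action):
    """RE = CrossEntropy(P(a_t), L(a_t)) for a single behavior."""
    return -np.log(probs[action] + 1e-12)

# Hypothetical reconstruction errors collected over the benign training period
train_errors = rng.gamma(shape=2.0, scale=0.1, size=1000)
threshold = train_errors.mean() + 1.5 * train_errors.std()

# Inference: a behavior the model reconstructs confidently vs. one it cannot
benign_probs    = np.array([0.05, 0.05, 0.90])  # true action = 2, high confidence
anomalous_probs = np.array([0.98, 0.01, 0.01])  # true action = 2, model surprised

flag_benign = reconstruction_error(benign_probs, 2) > threshold     # False
flag_attack = reconstruction_error(anomalous_probs, 2) > threshold  # True
```

Because the threshold is estimated purely from benign data, no attack samples are needed at any point, which matches the paper's training assumption.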
Loss & Training
The pre-training loss is weighted cross-entropy (node type reconstruction); the detection loss is cross-entropy (behavior type reconstruction). The anomaly threshold is set to mean + 1.5 standard deviations computed over the training period.
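For the pre-training objective, the class weighting matters because node types in provenance graphs are heavily imbalanced (e.g., far more files than sockets). A minimal sketch of weighted cross-entropy, with made-up weights and predictions:

```python
import numpy as np

def weighted_ce(probs, labels, class_weights):
    """Weighted cross-entropy for node-type reconstruction under class imbalance."""
    w = class_weights[labels]
    picked = probs[np.arange(len(labels)), labels]
    return np.mean(-w * np.log(picked + 1e-12))

# Hypothetical 3 node types; rare type 2 upweighted 5x
weights = np.array([1.0, 1.0, 5.0])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])
loss = weighted_ce(probs, labels, weights)  # ~1.72: rare-class error dominates
```

The upweighting prevents the reconstruction head from trivially predicting the majority node type, which would otherwise hollow out the pre-trained embeddings for rare but security-critical entities.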
Key Experimental Results
Main Results
Results on Streamspot, Unicorn Wget, and DARPA E3 datasets:
| Dataset | Method | Precision | Recall | F-score | FPR |
|---|---|---|---|---|---|
| Streamspot | Threatrace | 98% | 99% | 98% | 0.4% |
| Streamspot | Sentient | 99% | 99% | 99% | 0.2% |
| Unicorn Wget | Threatrace | 93% | 98% | 95% | 7.4% |
| Unicorn Wget | Sentient | 96% | 99% | 97% | 4.1% |
| DARPA Cadets | Flash | 92% | 99% | 95% | 0.3% |
| DARPA Cadets | Slot | 94% | 96% | 95% | 0.2% |
| DARPA Cadets | Sentient | 96% | 99% | 97% | 0.2% |
| DARPA Theia | Flash | 91% | 99% | 95% | 0.8% |
| DARPA Theia | Sentient | 95% | 99% | 97% | 0.4% |
| DARPA Trace | Flash | 93% | 99% | 96% | 0.4% |
| DARPA Trace | Sentient | 97% | 99% | 98% | 0.2% |
Ablation Study
| Configuration | Precision Change | Notes |
|---|---|---|
| w/o Pre-training (PT) | −20.75% | Loss of indirect dependency information |
| w/o Intent Analysis (IAM) | −31.59% | Loss of behavioral logic correlation; largest impact |
| w/o Laplacian PE | −8.2% | Loss of topological positional information |
| w/o Semantic Encoding | −12.3% | Loss of node attribute semantics |
Key Findings
- IAM contributes the most — removing it causes a 31.59% drop in precision, underscoring the critical importance of behavioral logic correlation for APT detection.
- Advantages are most pronounced in complex scenarios (Unicorn Wget, DARPA Theia), where noise and indirect dependencies are more prevalent.
- Achieving state-of-the-art detection using only benign training data is a significant practical advantage for real-world deployment.
- Runtime overhead is acceptable: processing one day of logs requires only 63.6 seconds with a peak memory footprint of 2.01 GB.
Highlights & Insights
- Graph Transformer + Sequential SSM combination: Using Graph Transformer for global representation and Mamba2 for sequential logic correlation addresses long-range dependencies at both the graph and sequence levels. This combination strategy is transferable to other graph-plus-sequence tasks.
- Random walk as a denoising mechanism: Random walks naturally construct a target-node-centric context that filters irrelevant neighbors — an elegant denoising design.
- Clustering for attack investigation: Beyond anomaly detection, clustering behaviors with similar intent into "attack stories" substantially reduces the workload of security analysts.
Limitations & Future Work
- The anomaly threshold (mean + 1.5σ) is heuristically defined; an adaptive threshold may yield better results.
- The random walk sequence length \(W\) is fixed; adaptive length selection could offer greater flexibility.
- Robustness under concept drift (i.e., evolving system behavior patterns over time) has not been evaluated.
- The clustering method for attack investigation is relatively simple (K-means); more sophisticated clustering approaches could generate higher-quality attack narratives.
Related Work & Insights
- vs. Flash/Threatrace: Employ GNN (GraphSAGE) neighborhood aggregation, which fails to capture indirect dependencies and introduces noise. Sentient addresses this via global attention in Graph Transformers.
- vs. Slot: Uses graph reinforcement learning to adaptively select neighbors, but remains constrained by the GNN receptive field. Sentient bypasses the neighborhood aggregation paradigm entirely.
- vs. Atlas: Requires attack data for training; Sentient requires only benign data.
Rating
- Novelty: ⭐⭐⭐⭐ The combination of Graph Transformer, bidirectional Mamba2, and random-walk-based denoising is novel
- Experimental Thoroughness: ⭐⭐⭐⭐ Three datasets covering real and simulated attacks; complete ablation study
- Writing Quality: ⭐⭐⭐⭐ Problem definition is clear; challenges are illustrated intuitively with figures
- Value: ⭐⭐⭐⭐ Offers practical deployment value for real-world cybersecurity