StRap: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization

Conference: NeurIPS 2025 arXiv: 2505.19547 Code: None Area: Spatio-Temporal Forecasting / Graph Neural Networks Keywords: Spatio-Temporal Graph Neural Networks, OOD Generalization, Retrieval-Augmented Learning, Continual Learning, Pattern Memory Bank

TL;DR

This paper proposes StRap, a framework that constructs a multi-dimensional pattern memory bank comprising spatial, temporal, and spatio-temporal key-value pairs. At inference time, StRap retrieves historical patterns most similar to the current input and adaptively fuses them into the model representation, effectively addressing the Spatio-Temporal Out-Of-Distribution (STOOD) problem in streaming spatio-temporal data.

Background & Motivation

State of the Field

Spatio-temporal graph neural networks (STGNNs) have achieved notable success in domains such as traffic forecasting and climate prediction, yet they face severe generalization challenges under streaming data settings.

The STOOD Problem

The Spatio-Temporal Out-Of-Distribution (STOOD) problem encompasses three categories of distribution shift: spatial structural changes (node addition/deletion), temporal dynamic changes (trend drift), and joint spatio-temporal changes.

Limitations of Prior Work

  • Backbone methods (direct application / retraining): catastrophic forgetting.
  • Architecture-based methods (e.g., TrafficStream): difficulty balancing stability and plasticity.
  • Regularization methods (e.g., EWC): over-constrain adaptability.
  • Replay methods: unable to distinguish relevant from irrelevant historical knowledge.

Key Question

How can one identify the most valuable portion of historical knowledge for the current prediction task?

Method

Overall Architecture

StRap follows an "explicit storage + retrieval-based fusion" paradigm:

  1. Subgraph Partitioning: Decompose the original spatio-temporal graph into a spatial subgraph \(\mathcal{G}_S\), a temporal subgraph \(\mathcal{G}_T\), and a spatio-temporal subgraph \(\mathcal{G}_{ST}\).
  2. Multi-Dimensional Pattern Bank Construction: Build a key-value memory bank \(\mathcal{B}_S, \mathcal{B}_T, \mathcal{B}_{ST}\) for each subgraph type.
  3. Similarity-Based Retrieval + Adaptive Fusion: At inference time, match the current input against stored patterns and adaptively fuse the retrieved representations into the current encoding.
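The retrieve-and-fuse step above can be sketched as follows. This is a minimal toy interpretation, not the authors' implementation: the bank names (`S`, `T`, `ST`), dimensions, softmax weighting, and the `retrieve`/`fuse` helpers are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pattern memory banks, one per subgraph type (toy sizes:
# 32 stored patterns, 16-dim keys, 64-dim values).
def make_bank():
    return {"keys": rng.normal(size=(32, 16)), "values": rng.normal(size=(32, 64))}

banks = {dim: make_bank() for dim in ("S", "T", "ST")}

def retrieve(query, bank, top_k=4):
    """Cosine-similarity retrieval of a weighted top-k pattern value."""
    keys, values = bank["keys"], bank["values"]
    sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-8)
    idx = np.argsort(sims)[-top_k:]                       # top-k matches
    weights = np.exp(sims[idx]) / np.exp(sims[idx]).sum() # softmax over matches
    return weights @ values[idx]                          # similarity-weighted value

def fuse(h, query, alpha=0.5):
    """Adaptively mix the current encoding h with retrieved patterns."""
    retrieved = np.mean([retrieve(query, b) for b in banks.values()], axis=0)
    return (1 - alpha) * h + alpha * retrieved

h = rng.normal(size=64)   # current STGNN encoding (toy)
q = rng.normal(size=16)   # query key built from the current input (toy)
fused = fuse(h, q)
print(fused.shape)        # (64,)
```

In the paper the fusion is learned (a prompting mechanism) rather than a fixed `alpha`; the sketch only shows the data flow of step 3.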

Key Designs

  1. Multi-Dimensional Pattern Key Generation:

    • Function: Design discriminative keys for each of the three subgraph types (spatial, temporal, spatio-temporal).
    • Mechanism:
      • Spatial key \(\mathbf{k}_S\): node degree statistics + clustering coefficients + shortest path statistics + Forman-Ricci curvature.
      • Temporal key \(\mathbf{k}_T\): statistical moments + spectral components + wavelet transform + autocorrelation + entropy.
      • Spatio-temporal key: interaction features derived from spatial and temporal keys.
    • Design Motivation: Keys must be sufficiently discriminative yet robust to distribution shift; geometric and topological features exhibit greater stability than learned features.
  2. Adaptive Fusion and Knowledge-Balanced Training:

    • Function: Retrieved pattern values are injected into the model via a plug-and-play prompting mechanism.
    • Mechanism: A knowledge balancing objective is introduced to dynamically calibrate the relative influence of historical patterns and current observations.
    • Design Motivation: Over-reliance on historical patterns leads to overfitting past cases, while over-reliance on current observations foregoes the information gain from historical knowledge.
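To make the key design above concrete, the sketch below computes simplified spatial and temporal keys from an adjacency matrix and a raw signal. It covers only a subset of the listed features (degree statistics and a global clustering estimate; moments, leading spectral magnitudes, lag-1 autocorrelation, histogram entropy) and omits shortest paths, Forman-Ricci curvature, and wavelets; the function names and feature choices are illustrative assumptions.

```python
import numpy as np

def spatial_key(A):
    """Toy spatial key: degree statistics plus a global clustering estimate
    (shortest-path and curvature terms from the paper are omitted)."""
    deg = A.sum(axis=1)
    triangles = np.trace(A @ A @ A) / 6.0           # closed triangles
    triples = (deg * (deg - 1)).sum() / 2.0         # connected triples
    clustering = 3.0 * triangles / triples if triples > 0 else 0.0
    return np.array([deg.mean(), deg.std(), deg.max(), clustering])

def temporal_key(x, n_freqs=3):
    """Toy temporal key: moments, leading spectral magnitudes,
    lag-1 autocorrelation, and histogram entropy."""
    xc = x - x.mean()
    spec = np.abs(np.fft.rfft(xc))[1:n_freqs + 1]   # first non-DC frequencies
    acf1 = (xc[:-1] @ xc[1:]) / (xc @ xc + 1e-8)    # lag-1 autocorrelation
    p, _ = np.histogram(x, bins=8)
    p = p / p.sum()
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()  # histogram entropy
    return np.concatenate([[x.mean(), x.std()], spec, [acf1, entropy]])

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)  # triangle graph
x = np.sin(np.linspace(0, 4 * np.pi, 48))               # periodic toy signal
print(spatial_key(A))        # degree stats + clustering (1.0 for a triangle)
print(temporal_key(x).shape) # (7,)
```

These hand-crafted statistics illustrate the design motivation: they are cheap, deterministic, and insensitive to retraining, which is why such geometric/statistical keys can be more shift-robust than learned embeddings.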

Loss & Training

  • Training phase: Construct and optimize the pattern memory bank; update key-value pairs based on retrieved values.
  • Inference phase: Retrieve the most relevant patterns via similarity matching and fuse them into the STGNN backbone through prompting.
  • Knowledge balancing objective: Coordinate new information with retrieved knowledge to prevent catastrophic forgetting.
  • StRap is a plug-and-play framework compatible with any STGNN backbone.
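One plausible reading of the knowledge balancing objective is a weighted sum of a current-data task loss and a retention term that keeps the fused representation close to the retrieved historical patterns. The sketch below is our interpretation, not the paper's exact loss; `lam` and the MSE retention term are assumptions.

```python
import numpy as np

def knowledge_balanced_loss(pred, target, h_fused, h_retrieved, lam=0.1):
    """Hypothetical knowledge-balancing objective: fit current observations
    (task term) while staying consistent with retrieved history (retain term)."""
    task = np.mean((pred - target) ** 2)            # error on current data
    retain = np.mean((h_fused - h_retrieved) ** 2)  # drift from retrieved patterns
    return task + lam * retain

rng = np.random.default_rng(0)
pred, target = rng.normal(size=12), rng.normal(size=12)
h_fused, h_retrieved = rng.normal(size=64), rng.normal(size=64)
loss = knowledge_balanced_loss(pred, target, h_fused, h_retrieved)
print(float(loss))
```

Setting `lam` too high reproduces the over-reliance-on-history failure mode described above; too low, and the model ignores the retrieved knowledge.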

Key Experimental Results

Main Results (Table)

| Method        | Dataset           | MAE      | RMSE     | Relative Improvement |
| ------------- | ----------------- | -------- | -------- | -------------------- |
| Pretrain      | Streaming Traffic | Baseline | Baseline | -                    |
| TrafficStream | Streaming Traffic | -        | -        | +5–8%                |
| EWC           | Streaming Traffic | -        | -        | +3–5%                |
| StRap         | Streaming Traffic | Best     | Best     | +10–15%              |

Ablation Study

  • Removing the spatial pattern bank: notable performance degradation under spatial distribution shift.
  • Removing the temporal pattern bank: most severe degradation under temporal distribution shift.
  • Removing the knowledge balancing objective: catastrophic forgetting observed across multiple consecutive data segments.
  • StRap yields consistent improvements across different backbones (GCN, GAT, STGCN).

Key Findings

  • StRap consistently outperforms state-of-the-art baselines across all tested streaming graph datasets.
  • Explicit pattern storage retains more historical information than implicit parametric memory.
  • Generalization is maintained even without task-specific fine-tuning.

Highlights & Insights

  • Novel combination of retrieval augmentation and spatio-temporal learning: the first work to systematically introduce RAG-style thinking into continual learning for STGNNs.
  • The three-dimensional subgraph partitioning design elegantly decouples spatial, temporal, and spatio-temporal distribution shifts.
  • The plug-and-play design allows StRap to serve as a general-purpose enhancement module attachable to existing STGNNs.
  • Theoretical analysis demonstrates the effectiveness of the pattern extraction and retrieval mechanisms under distribution shift.

Limitations & Future Work

  • The size of the pattern memory bank grows with accumulated data, necessitating pruning or compression strategies.
  • Key extraction involves graph topology computations (e.g., shortest paths, curvature), which may introduce efficiency bottlenecks for large graphs.
  • Cross-dataset and cross-domain transferability of the pattern bank remains unexplored.
  • Hyperparameter selection for temporal key features such as wavelet transforms may require domain expertise.
Related Work Comparison

  • RAFT (retrieval-augmented time series forecasting) addresses only the temporal dimension; StRap extends this to the joint spatio-temporal setting.
  • TrafficStream employs a replay strategy but does not distinguish between useful and irrelevant historical samples.
  • At a high level, this work transfers the RAG paradigm from the LLM domain to spatio-temporal forecasting.

Rating

  • Theoretical Innovation: ⭐⭐⭐⭐
  • Experimental Thoroughness: ⭐⭐⭐⭐
  • Value: ⭐⭐⭐⭐
  • Writing Quality: ⭐⭐⭐⭐
  • Overall: ⭐⭐⭐⭐