
Meta Dynamic Graph for Traffic Flow Prediction

Conference: AAAI 2026
arXiv: 2601.10328
Code: zouyiqing-221/MetaDG
Authors: Yiqing Zou, Hanning Yuan, Qianyu Yang, Ziqiang Yuan, Shuliang Wang, Sijie Ruan (Beijing Institute of Technology)
Area: Autonomous Driving
Keywords: Traffic Flow Prediction, Spatio-Temporal Graph, Dynamic Graph, Meta-Learning, GCN, GRU, Spatio-Temporal Heterogeneity

TL;DR

This paper proposes MetaDG, a framework that generates a dynamic node representation at each time step and enhances it with spatio-temporal correlations. Earlier work used dynamism only to update the adjacency matrix. MetaDG instead uses the dynamic node representations to generate meta-parameters, adjacency matrices, and edge-weight adjustment matrices simultaneously. This unifies spatio-temporal heterogeneity modeling (ST-unification) and achieves state-of-the-art performance on four benchmark datasets: PEMS03, PEMS04, PEMS07, and PEMS08.

Background & Motivation

Traffic flow prediction is a canonical spatio-temporal forecasting problem whose core challenge lies in modeling complex spatio-temporal dependencies. Existing methods such as STGCN and GWNet combine temporal models (RNN/CNN) with spatial models (GCN/GAT) to capture temporal and spatial dependencies separately. This "ST-isolated" decoupled architecture struggles to capture complex cross-dimensional spatio-temporal interactions.

Recent studies show that dynamism modeling can effectively bridge the spatial and temporal dimensions. DGCRN generates a dynamic adjacency matrix at each time step, and PDFormer explicitly models propagation delays. However, these methods restrict dynamism to spatial topology (i.e., changes in the adjacency matrix) and neglect the dynamics of latent semantics. The authors argue that focusing solely on spatial topology while ignoring these implicit semantics substantially limits model performance.

On the other hand, meta-learning approaches such as AGCRN, MegaCRN, and HimNet model spatio-temporal heterogeneity by generating adaptive node representations, but their underlying model architectures remain ST-isolated, meaning heterogeneity modeling still suffers from the same spatio-temporal separation problem. This paper argues that dynamism modeling can simultaneously address both the ST-isolated problem and heterogeneity separation, pushing both toward ST-unification.

Core Problem

How can dynamism modeling be extended beyond spatial topology (the adjacency matrix) to a broader scope that includes meta-parameters and edge weights? And can this be done while also unifying spatio-temporal heterogeneity modeling through a single dynamic node representation?

Method

Overall Architecture

MetaDG adopts the Graph Convolutional Recurrent Unit (GCRU) as the fundamental building block of an encoder-decoder structure and dynamically generates an adjacency matrix and meta-parameters at each time step. The framework consists of three core modules:

Dynamic Node Generation (DNG): Generates raw dynamic node embeddings.
Spatio-Temporal Correlation Enhancement (STCE): Enhances spatio-temporal correlations in node representations.
Dynamic Graph Qualification (DGQ): Refines the adjacency matrix based on message-passing reliability.

DNG Module

A learnable static node embedding \(\boldsymbol{N} \in \mathbb{R}^{N \times d_s}\) is combined with the current time embedding \(\boldsymbol{T}_t\) and the previous hidden state \(\boldsymbol{H}_{t-1}\) via a time-driven dynamic gate \(\boldsymbol{\gamma}_t\) to produce the dynamic node embedding:

\[\boldsymbol{N}_t = \boldsymbol{\gamma}_t \odot \boldsymbol{N} + (1 - \boldsymbol{\gamma}_t) \odot \hat{\boldsymbol{H}}_{t-1}\]

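where \(\boldsymbol{\gamma}_t = \text{sigmoid}(\hat{\boldsymbol{T}}_t \boldsymbol{\Gamma})\) and \(\boldsymbol{\Gamma}\) is a projection matrix. The hats denote transformed versions of the time embedding and the hidden state. A low \(\boldsymbol{\gamma}_t\) places more weight on the dynamic hidden state, which gives the node embedding more flexibility. A high \(\boldsymbol{\gamma}_t\) keeps the embedding close to the static embedding \(\boldsymbol{N}\).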
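The gate can be illustrated with a minimal NumPy sketch. It assumes the transformed time embedding \(\hat{\boldsymbol{T}}_t\) and hidden state \(\hat{\boldsymbol{H}}_{t-1}\) are already available as per-node arrays. The function name `dynamic_node_embedding` and all shapes are illustrative assumptions, not taken from the MetaDG code.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def dynamic_node_embedding(N_static, T_hat, H_hat_prev, Gamma):
    """Hypothetical sketch of N_t = gamma_t * N + (1 - gamma_t) * H_hat_{t-1}.

    N_static:   (num_nodes, d_s) learnable static node embedding
    T_hat:      (num_nodes, d_tau) transformed time embedding (assumed per node)
    H_hat_prev: (num_nodes, d_s) transformed previous hidden state (assumed shape)
    Gamma:      (d_tau, d_s) gate projection matrix
    """
    gamma = sigmoid(T_hat @ Gamma)  # time-driven gate, every entry in (0, 1)
    N_t = gamma * N_static + (1.0 - gamma) * H_hat_prev
    return N_t, gamma


rng = np.random.default_rng(0)
num_nodes, d_s, d_tau = 4, 8, 6
N_t, gamma = dynamic_node_embedding(
    rng.normal(size=(num_nodes, d_s)),
    rng.normal(size=(num_nodes, d_tau)),
    rng.normal(size=(num_nodes, d_s)),
    rng.normal(size=(d_tau, d_s)),
)
print(N_t.shape)  # (4, 8)
```

The gate is elementwise. In this sketch, each node and each embedding dimension can therefore lean toward the static or the dynamic source independently.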

STCE Module

Spatial Correlation Enhancement (SCE): Cross-attention is employed to allow each node to aggregate information from the global node representations of the previous time step:

\[\text{Attn}(\boldsymbol{Q}_t, \boldsymbol{K}_t, \boldsymbol{V}_t) = \text{Softmax}\left(\frac{\boldsymbol{Q}_t \boldsymbol{K}_t^T}{\sqrt{d'}}\right) \boldsymbol{V}_t\]

where \(\boldsymbol{Q}_t\) is derived from \(\boldsymbol{N}_t\), \(\boldsymbol{K}_t\) and \(\boldsymbol{V}_t\) are derived from \(\boldsymbol{N}_{t-1}\), and \(d'\) is the attention dimension. Within the RNN architecture, standard Dropout is replaced by Variational Dropout, which samples one dropout mask per sequence and reuses it at every time step.
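SCE's cross-attention and the variational dropout mask can be sketched in NumPy as follows. The projection matrices `W_q`, `W_k`, `W_v` and the helper names are hypothetical. Applying the mask to the attention inputs is also an assumption; the summary does not say where MetaDG applies dropout.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_cross_attention(N_t, N_prev, W_q, W_k, W_v):
    """Each node at step t attends to all node representations at step t-1."""
    Q = N_t @ W_q     # (num_nodes, d')  queries from the current step
    K = N_prev @ W_k  # (num_nodes, d')  keys from the previous step
    V = N_prev @ W_v  # (num_nodes, d_v) values from the previous step
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (num_nodes, num_nodes) score matrix
    return softmax(scores, axis=-1) @ V


def variational_dropout_mask(shape, p, rng):
    """Sample ONE inverted-dropout mask per sequence; reuse it at every time step."""
    keep = rng.random(shape) >= p
    return keep.astype(float) / (1.0 - p)


rng = np.random.default_rng(0)
n, d, d_att = 5, 8, 4
W_q, W_k, W_v = (rng.normal(size=(d, d_att)) for _ in range(3))
mask = variational_dropout_mask((n, d), p=0.1, rng=rng)  # fixed for the whole sequence
N_prev = rng.normal(size=(n, d))
for t in range(3):
    N_t = rng.normal(size=(n, d))  # stand-in for the DNG output at step t
    # assumption: the same mask is applied to the inputs at every step
    enhanced = spatial_cross_attention(N_t * mask, N_prev * mask, W_q, W_k, W_v)
    N_prev = N_t
```

The `scores` matrix has one entry per node pair. This is where the \(O(N^2)\) cost of SCE comes from.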

Temporal Correlation Enhancement (TCE): The update gate \(\boldsymbol{z}_{t-1}\) from the GRU is used to fuse node representations across adjacent time steps:

\[\boldsymbol{N}_t^{T_*} = \hat{\boldsymbol{z}}_{t-1} \odot \boldsymbol{N}_{t-1} + (1 - \hat{\boldsymbol{z}}_{t-1}) \odot \boldsymbol{N}_t\]
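The TCE fusion is an elementwise convex blend, sketched below. The sketch assumes \(\hat{\boldsymbol{z}}_{t-1}\) has already been mapped from the GRU hidden size to the node-embedding shape. That mapping, and the function name, are hypothetical.

```python
import numpy as np


def temporal_correlation_enhancement(N_t, N_prev, z_hat_prev):
    """Sketch of N_t^{T*} = z_hat_{t-1} * N_{t-1} + (1 - z_hat_{t-1}) * N_t.

    z_hat_prev: previous GRU update gate, assumed already reshaped to N_t's shape.
    """
    if np.any(z_hat_prev < 0.0) or np.any(z_hat_prev > 1.0):
        raise ValueError("update-gate values must lie in [0, 1]")
    return z_hat_prev * N_prev + (1.0 - z_hat_prev) * N_t


N_prev = np.array([[1.0, 1.0]])
N_t = np.array([[3.0, 3.0]])
z_hat = np.array([[0.25, 0.75]])
print(temporal_correlation_enhancement(N_t, N_prev, z_hat))  # [[2.5 1.5]]
```

In this sketch, a gate entry of 1 copies the previous step's representation and 0 keeps the current one. The smoothing strength therefore follows the GRU's own update gate.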

SCE and TCE are chained to form STCE in a "fuse-then-smooth" order (SCE → TCE). Global aggregation over the previous step's nodes comes first, followed by temporal smoothing.

DGQ Module

The reliability of message passing along each edge is measured by the similarity between node representations across time steps, which yields an edge-weight adjustment matrix \(\boldsymbol{\phi}_t\). Edges whose reliability exceeds a threshold are amplified proportionally, and edges below it are attenuated:

\[\tilde{\boldsymbol{A}_t} = \text{asym}(\boldsymbol{\phi}_t \odot \boldsymbol{A}_t)\]

where \(\text{asym}(\cdot)\) denotes asymmetric normalization. A scaling coefficient \(\boldsymbol{\beta}_t\) is computed adaptively via InstanceNorm and an exponential function, which avoids the limitations of a fixed coefficient.
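The idea of reweighting edges by reliability can be sketched under several explicit assumptions. None of them are confirmed by the summary above:

- Reliability of edge \(j \to i\) is the cosine similarity between node \(i\) at step \(t\) and node \(j\) at step \(t-1\).
- \(\boldsymbol{\phi}_t = \exp(\beta(S - \tau))\), so edges above the threshold \(\tau\) get a factor greater than 1 and edges below it get a factor less than 1.
- \(\text{asym}\) is row normalization \(D^{-1}A\).
- \(\beta\) is passed in as a constant instead of being computed adaptively via InstanceNorm and an exponential.

The names `qualify_graph`, `tau` and `beta` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X, Y, eps=1e-8):
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + eps)
    return Xn @ Yn.T  # (n, n); entry (i, j) compares node i at t with node j at t-1


def asym_normalize(A, eps=1e-8):
    """Assumed asymmetric (row) normalization D^{-1} A."""
    return A / (A.sum(axis=1, keepdims=True) + eps)


def qualify_graph(A_t, M_t, M_prev, tau, beta):
    """Hypothetical DGQ step: reweight edges by a reliability score, then normalize."""
    S = cosine_similarity_matrix(M_t, M_prev)  # hypothetical reliability score
    phi = np.exp(beta * (S - tau))  # > 1 above threshold tau, < 1 below it
    return asym_normalize(phi * A_t), phi


# Toy case: each node is similar only to itself across steps.
A_tilde, phi = qualify_graph(np.ones((3, 3)), np.eye(3), np.eye(3), tau=0.5, beta=2.0)
print(np.round(A_tilde, 3))
```

In the toy case, the self-edges have the most similar endpoints and receive most of each row's weight after normalization. The cross edges are attenuated.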

Meta-DGCRU

The three separately enhanced node representations \(\boldsymbol{N}_t^p\), \(\boldsymbol{N}_t^g\), and \(\boldsymbol{N}_t^m\) are used to generate:

Meta-parameters: \(\boldsymbol{\theta}_t = \boldsymbol{N}_t^p \boldsymbol{\Theta}\)
Raw adjacency matrix: \(\boldsymbol{A}_t = \text{ReLU}(\boldsymbol{N}_t^g \cdot {\boldsymbol{N}_t^g}^T)\)
Edge-weight adjustment matrix: \(\boldsymbol{\phi}_t\), generated from \(\boldsymbol{N}_t^m\)
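A Meta-DGCRU step can be sketched roughly as follows. This is an assumption-laden sketch, not the authors' implementation:

- Following the AGCRN-style convention, \(\boldsymbol{\Theta}\) is a parameter pool, so \(\boldsymbol{N}_t^p \boldsymbol{\Theta}\) yields one weight matrix per node.
- Each GRU gate gets its own pool (an assumption).
- A plain row normalization stands in for the DGQ-refined graph.

All function names are hypothetical. The returned update gate `z_t` is the kind of quantity TCE reuses at the next step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def generate_meta_params(N_p, Theta):
    """theta_t = N_t^p Theta: one (c_in, c_out) weight matrix per node."""
    return np.einsum("nd,dio->nio", N_p, Theta)  # Theta: (d_p, c_in, c_out)


def generate_adjacency(N_g):
    """A_t = ReLU(N_t^g N_t^g^T)."""
    return np.maximum(N_g @ N_g.T, 0.0)


def node_graph_conv(A, X, W):
    """Aggregate neighbours with A, then apply each node's own weight matrix."""
    return np.einsum("ni,nio->no", A @ X, W)


def meta_gcru_step(X_t, H_prev, A, W_z, W_r, W_c):
    """One GRU step whose linear maps are node-specific graph convolutions."""
    XH = np.concatenate([X_t, H_prev], axis=1)
    z = sigmoid(node_graph_conv(A, XH, W_z))  # update gate
    r = sigmoid(node_graph_conv(A, XH, W_r))  # reset gate
    c = np.tanh(node_graph_conv(A, np.concatenate([X_t, r * H_prev], axis=1), W_c))
    H_t = z * H_prev + (1.0 - z) * c
    return H_t, z


rng = np.random.default_rng(0)
n, d_p, c_in, hid = 5, 4, 2, 8
N_p, N_g = rng.normal(size=(n, d_p)), rng.normal(size=(n, d_p))
A = generate_adjacency(N_g)
A = A / (A.sum(axis=1, keepdims=True) + 1e-8)  # stand-in for the DGQ-refined graph
W_z, W_r, W_c = (generate_meta_params(N_p, rng.normal(size=(d_p, c_in + hid, hid)))
                 for _ in range(3))
H_t, z_t = meta_gcru_step(rng.normal(size=(n, c_in)), np.zeros((n, hid)), A, W_z, W_r, W_c)
print(H_t.shape)  # (5, 8)
```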

Key Experimental Results

Overall Performance (12-step prediction)

Method	PEMS03 MAE	PEMS04 MAE	PEMS07 MAE	PEMS08 MAE
STGCN	15.91	19.64	21.89	16.09
GWNet	14.62	18.54	20.53	14.41
AGCRN	15.36	19.34	20.57	15.31
DGCRN	14.63	19.09	19.87	14.59
HimNet	15.14	18.31	19.50	13.57
PDFormer	14.92	18.32	19.88	13.64
ST-SSDL	14.56	18.13	19.24	13.88
MetaDG	14.29	17.80	18.79	13.04
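How large the margins are can be checked with a few lines of arithmetic. The script below uses only the MAE values from the table above: for each dataset it finds the strongest baseline and computes MetaDG's relative MAE reduction. It says nothing about statistical significance.

```python
# MAE values copied from the overall-performance table (PEMS03, PEMS04, PEMS07, PEMS08)
DATASETS = ["PEMS03", "PEMS04", "PEMS07", "PEMS08"]
BASELINES = {
    "STGCN": [15.91, 19.64, 21.89, 16.09],
    "GWNet": [14.62, 18.54, 20.53, 14.41],
    "AGCRN": [15.36, 19.34, 20.57, 15.31],
    "DGCRN": [14.63, 19.09, 19.87, 14.59],
    "HimNet": [15.14, 18.31, 19.50, 13.57],
    "PDFormer": [14.92, 18.32, 19.88, 13.64],
    "ST-SSDL": [14.56, 18.13, 19.24, 13.88],
}
METADG = [14.29, 17.80, 18.79, 13.04]


def improvement_over_best(baselines, ours):
    """Per dataset: lowest-MAE baseline and MetaDG's relative MAE reduction in percent."""
    rows = []
    for k in range(len(ours)):
        best_name = min(baselines, key=lambda m: baselines[m][k])
        best = baselines[best_name][k]
        rows.append((best_name, best, 100.0 * (best - ours[k]) / best))
    return rows


for name, ours, (best_name, best, pct) in zip(DATASETS, METADG, improvement_over_best(BASELINES, METADG)):
    print(f"{name}: {best_name} {best:.2f} -> {ours:.2f} ({pct:.2f}% lower MAE)")
# PEMS03: ST-SSDL 14.56 -> 14.29 (1.85% lower MAE)
# PEMS04: ST-SSDL 18.13 -> 17.80 (1.82% lower MAE)
# PEMS07: ST-SSDL 19.24 -> 18.79 (2.34% lower MAE)
# PEMS08: HimNet 13.57 -> 13.04 (3.91% lower MAE)
```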

Ablation Study

Variant	PEMS03 MAE	PEMS04 MAE	PEMS07 MAE	PEMS08 MAE
MetaDG	14.29	17.80	18.79	13.04
w/o SCE	14.88	18.20	19.39	13.33
w/o TCE	14.35	17.87	18.95	13.06
w/o STCE	14.98	18.17	19.28	13.37
w/o DGQ	14.48	17.88	18.91	13.06
TSCE (reversed order, TCE → SCE)	14.33	17.92	18.91	13.04
Joined	14.55	18.00	18.93	13.04
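The ablation table is easier to read as MAE changes relative to the full model. The script below computes those differences directly from the table values; positive numbers mean the variant is worse.

```python
# MAE values copied from the ablation table (PEMS03, PEMS04, PEMS07, PEMS08)
FULL = [14.29, 17.80, 18.79, 13.04]
VARIANTS = {
    "w/o SCE": [14.88, 18.20, 19.39, 13.33],
    "w/o TCE": [14.35, 17.87, 18.95, 13.06],
    "w/o STCE": [14.98, 18.17, 19.28, 13.37],
    "w/o DGQ": [14.48, 17.88, 18.91, 13.06],
    "TSCE": [14.33, 17.92, 18.91, 13.04],
    "Joined": [14.55, 18.00, 18.93, 13.04],
}


def mae_increase(variants, full):
    """MAE change of each variant relative to full MetaDG (positive = worse)."""
    return {name: [v - f for v, f in zip(vals, full)] for name, vals in variants.items()}


for name, deltas in mae_increase(VARIANTS, FULL).items():
    print(f"{name:9s}", " ".join(f"{d:+.2f}" for d in deltas))
```

No variant beats the full model on any dataset. Removing SCE or the whole STCE costs the most, +0.29 to +0.69 MAE. Removing TCE or DGQ costs at most +0.19.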

Efficiency Comparison (PEMS03)

Model	Parameters	Training Time	Inference Time
DGCRN	208K	287s	33s
HimNet	2742K	175s	19s
ST-SSDL	234K	172s	19s
MetaDG	666K	250s	23s

Highlights & Insights

Expanded scope of dynamism modeling: Dynamism is extended from affecting only the adjacency matrix to simultaneously generating meta-parameters, adjacency matrices, and edge-weight adjustment matrices, enabling more comprehensive dynamic modeling.
ST-unification paradigm: By enhancing dynamic node representations with spatio-temporal correlations, the previously separated modeling of spatio-temporal heterogeneity is unified into a single framework—a conceptually clean and effective contribution.
Message-passing reliability refinement (DGQ): The module targets the quality of information propagation in GCRU graph convolution. It uses an edge-weight adjustment matrix to reinforce reliable edges and attenuate unreliable ones.
Empirically supported SCE → TCE ordering: The ablation compares the fuse-then-smooth ordering (global historical aggregation before temporal smoothing) with the reversed order (TSCE). The chosen ordering is better on PEMS03, PEMS04, and PEMS07 and ties on PEMS08.
Stronger advantage on long-horizon prediction: Per-time-step analysis shows that MetaDG's performance gains over baselines become increasingly pronounced at distant future steps.

Limitations & Future Work

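Not the lowest computational cost: On PEMS03, ST-SSDL uses 234K parameters, 172s of training time, and 19s of inference time. MetaDG uses more parameters (666K) and needs longer training (250s) and inference (23s), likely due in part to the three independent STCE branches. It is still about 4× smaller than HimNet (2742K).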
Validated only on traffic flow data: Generalizability to other spatio-temporal forecasting tasks (e.g., air quality, crowd flow, energy consumption) has not been demonstrated.
\(O(N^2)\) complexity of cross-attention: The cross-attention in SCE may become a bottleneck for large-scale road networks with a large number of nodes \(N\).
Hyperparameter sensitivity: Each dataset requires its own embedding dimension configuration (\(d_s\), \(d_{tod}\), \(d_{dow}\), \(d_c\)). The authors claim MetaDG reduces the burden of hyperparameter search, but manual tuning remains necessary.

Comparison with Related Work

vs. AGCRN/HimNet (meta-learning methods): MetaDG generates meta-parameters from dynamic node representations rather than static embeddings, outperforming these static meta-learning approaches across all datasets.
vs. DGCRN (dynamic method): DGCRN generates only a dynamic adjacency matrix, whereas MetaDG extends this to meta-parameters and edge weights, reducing MAE on PEMS03 from 14.63 to 14.29.
vs. PDFormer: PDFormer models dynamism via self-attention and propagation delay but offers less flexibility; MetaDG's per-step dynamic graph generation is more adaptive.
vs. ST-SSDL: The most recent self-supervised bias learning method; MetaDG surpasses it on all four datasets (e.g., PEMS04 MAE: 18.13 → 17.80).

The core insight of this paper—that dynamism modeling can advance ST-isolated architectures toward ST-unification—has broader applicability. In many spatio-temporal modeling scenarios, temporal and spatial dimensions are processed in isolation, and bridging them through dynamic node representations is an elegant unification strategy. The message-passing reliability assessment in DGQ is also generalizable to other GNN-based models, particularly in settings where the graph structure itself is noisy. The three-branch STCE design, where separate node representations are enhanced for different model components, reflects the principle that different objectives benefit from different perspectives.

Rating

Novelty: ⭐⭐⭐⭐ — The ST-unification idea is original and the DGQ module is creative, though the overall framework is still built upon GCRU.
Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive evaluation on four standard datasets with complete ablation and efficiency analyses, though cross-domain validation is absent.
Writing Quality: ⭐⭐⭐⭐ — Motivation is clearly articulated; the narrative from ST-isolated to ST-unification is persuasive.
Value: ⭐⭐⭐⭐ — Open-source code, state-of-the-art results on four datasets, and solid reference value for the spatio-temporal prediction community.