Adaptive Riemannian Graph Neural Networks

Conference: AAAI 2026
arXiv: 2508.02600
Code: Available (public repository, built on PyG + Geoopt)
Area: Graph Neural Networks / Geometric Deep Learning
Keywords: Riemannian geometry, adaptive metric tensor, Ricci flow regularization, geometric heterogeneity, message passing

TL;DR

This paper proposes ARGNN, a framework that learns a continuous, anisotropic diagonal Riemannian metric tensor for each node in a graph, enabling adaptive capture of local geometric properties across different graph regions (hierarchical structures vs. dense communities). ARGNN unifies and outperforms geometric GNN methods based on fixed curvature or discrete mixed-curvature spaces.

Background & Motivation

Real-world graph data commonly exhibit geometric heterogeneity: a single network may contain both tree-like hierarchical structures best represented in hyperbolic space and dense cyclic communities more suited to spherical space. Existing geometric GNN methods either embed the entire graph into a single fixed-curvature space (Euclidean/Hyperbolic/Spherical) or resort to discrete product spaces (e.g., CUSP's \(\mathbb{H} \times \mathbb{S} \times \mathbb{E}\)), neither of which can adequately express continuous geometric variation at the node level. While \(\kappa\)-GCN learns a scalar curvature per node, it remains isotropic and fails to capture directional geometric information.

The authors' visualization of the Wisconsin network shows substantial curvature variation across regions (from flat to strongly curved), confirming that any single fixed geometric space inevitably introduces severe distortion in some areas.

Core Problem

How can one learn a symmetric positive definite (SPD) metric tensor for each node of a graph, at feasible computational cost, so that message passing adapts to local geometry while training remains stable and expressiveness is provably guaranteed?

Method

Overall Architecture

ARGNN comprises three core components: (1) learning of node-level diagonal metric tensors, (2) geometric message passing based on the learned metrics, and (3) Ricci flow–inspired geometric regularization. The geometric field and node representations are jointly learned end-to-end.

Key Designs

1. Diagonal Metric Tensor Parameterization

The metric tensor for node \(i\) is parameterized as a diagonal matrix \(\mathbf{G}_i = \text{diag}(\mathbf{g}_i)\), where \(\mathbf{g}_i \in \mathbb{R}^d_{++}\). This is not merely a computational simplification (reducing parameters from \(O(d^2)\) to \(O(d)\) per node); it corresponds to an anisotropic conformal transformation that assigns each feature dimension an independent local scaling factor, a pragmatic middle ground between the full metric tensor and a single scalar curvature.

The metric vector is generated by a small MLP: node features \(\mathbf{h}_i\) and neighborhood aggregation features \(\mathbf{a}_i\) are concatenated and passed through a softplus activation to guarantee strict positivity:

\[\mathbf{g}_i = \text{softplus}\left(f_\theta^{(g)}([\mathbf{h}_i; \mathbf{a}_i])\right)\]
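
A minimal PyTorch sketch of this generator; the module name, hidden width, and depth are illustrative assumptions rather than the authors' exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetricGenerator(nn.Module):
    """Sketch of the node-level diagonal metric generator f_theta^{(g)}.

    Layer sizes are illustrative assumptions, not the paper's exact design.
    """

    def __init__(self, d: int, hidden: int = 64):
        super().__init__()
        # Small MLP over the concatenation [h_i ; a_i].
        self.mlp = nn.Sequential(
            nn.Linear(2 * d, hidden),
            nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, h: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # h, a: [num_nodes, d] node features and neighborhood aggregates.
        # Softplus keeps every g_{i,k} strictly positive, so
        # G_i = diag(g_i) is SPD by construction.
        return F.softplus(self.mlp(torch.cat([h, a], dim=-1)))
```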

2. Geometric Message Passing

  • Geodesic distance: Under metric \(\mathbf{G}_i\), this becomes the weighted Euclidean distance \(d_{\mathbf{G}_i}(\mathbf{h}_i, \mathbf{h}_j) = \sqrt{\sum_k g_{i,k}(h_{i,k} - h_{j,k})^2}\)
  • Geometric modulation coefficient \(\tau_{ij}\): Projects the direction vector onto the principal axes and uses \(\tanh(-\log g_{i,k})\) as a curvature switch; large \(g_{i,k}\) yields values near \(-1\) (spatial contraction), while small \(g_{i,k}\) yields values near \(+1\) (spatial expansion)
  • Geometric attention \(\alpha_{ij}\): Computes cosine similarity under each node's respective metric, with norms measured in the node's own metric space

The message update is \(\mathbf{m}_{ij} = \tau_{ij} \cdot \sigma(\alpha_{ij}) \cdot \mathbf{W}_m \mathbf{h}_j\).
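A PyTorch sketch of these ingredients, operating on per-edge tensors of shape [E, d]; the exact projection inside \(\tau_{ij}\) and the choice of \(\sigma\) as a sigmoid are our reading of the description above, not confirmed details:

```python
import torch

def geodesic_dist(h_i, h_j, g_i):
    # d_{G_i}(h_i, h_j) = sqrt(sum_k g_{i,k} (h_{i,k} - h_{j,k})^2):
    # a weighted Euclidean distance under G_i = diag(g_i).
    return torch.sqrt(((h_i - h_j) ** 2 * g_i).sum(-1).clamp_min(1e-12))

def modulation(h_i, h_j, g_i):
    # tau_ij: project the unit direction vector onto the coordinate
    # axes and gate each axis with tanh(-log g_{i,k}), which acts as a
    # curvature switch (near -1: contraction, near +1: expansion).
    direction = h_j - h_i
    direction = direction / direction.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    switch = torch.tanh(-torch.log(g_i))
    return (direction ** 2 * switch).sum(-1)  # squared projections sum to 1

def attention(h_i, h_j, g_i, g_j):
    # Cosine-style similarity; each norm is measured in its own node's
    # metric space, and the inner product uses g_i (our reading).
    inner = (g_i * h_i * h_j).sum(-1)
    n_i = torch.sqrt((g_i * h_i ** 2).sum(-1).clamp_min(1e-12))
    n_j = torch.sqrt((g_j * h_j ** 2).sum(-1).clamp_min(1e-12))
    return inner / (n_i * n_j)

def message(h_i, h_j, g_i, g_j, W_m):
    # m_ij = tau_ij * sigma(alpha_ij) * W_m h_j.
    tau = modulation(h_i, h_j, g_i)
    alpha = torch.sigmoid(attention(h_i, h_j, g_i, g_j))
    return tau.unsqueeze(-1) * alpha.unsqueeze(-1) * (h_j @ W_m.T)
```

In the full model the geodesic distance would also enter the neighborhood aggregation; it is shown standalone here.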

3. Ricci Flow Regularization

The discrete Ricci curvature in the \(k\)-th dimension is approximated as

\[\text{Ric}_{kk}^{(i)} = \frac{1}{2|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \frac{g_{i,k} - g_{j,k}}{d_{\text{graph}}(i,j)}\]

with two regularization terms:

  • \(\mathcal{L}_{\text{Ricci}}\): Penalizes the sum of squared Ricci curvatures, encouraging Ricci flatness
  • \(\mathcal{L}_{\text{smooth}}\): Penalizes differences in metric vectors between adjacent nodes, ensuring smoothness of the geometric field
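
A hedged sketch of the two regularizers; the edge representation (directed pairs, both directions for undirected graphs) and the choice of sums rather than means are our assumptions:

```python
import torch

def ricci_losses(g, edge_index, d_graph):
    """Sketch of the Ricci and smoothness regularizers.

    g:          [N, d] learned metric vectors
    edge_index: [2, E] directed edges (i -> j), j in N(i)
    d_graph:    [E] float graph distances d_graph(i, j); 1 for direct edges
    """
    src, dst = edge_index
    # Per-edge finite difference of the metric field, scaled by the
    # graph distance: (g_{i,k} - g_{j,k}) / d_graph(i, j).
    diff = (g[src] - g[dst]) / d_graph.unsqueeze(-1)
    # Average over each node's neighborhood, with the 1/(2|N(i)|) factor.
    deg = torch.zeros(g.size(0), device=g.device).index_add_(
        0, src, torch.ones_like(d_graph))
    ric = torch.zeros_like(g).index_add_(0, src, diff)
    ric = ric / (2.0 * deg.clamp_min(1).unsqueeze(-1))
    # L_Ricci: sum of squared Ricci curvatures (drives Ricci flatness).
    l_ricci = (ric ** 2).sum()
    # L_smooth: squared metric differences across edges (smooth field).
    l_smooth = ((g[src] - g[dst]) ** 2).sum()
    return l_ricci, l_smooth
```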

Loss & Training

\[\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \alpha \mathcal{L}_{\text{Ricci}} + \beta \mathcal{L}_{\text{smooth}}\]

The authors theoretically derive the relationship between optimal hyperparameters and the homophily ratio \(\mathcal{H}\): \(\alpha^* \propto \mathcal{H}/L\), \(\beta^* \propto d/|\mathcal{V}|\). Constants are set as \(c_1 = (1 - \mathcal{H}) + 0.1\) and \(c_2 = 0.1(1 + \mathcal{H})\); experiments confirm that deviations from grid-search optima remain within 0.5%. Optimization employs Adam combined with Riemannian Adam (Geoopt).
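
One plausible reading of this schedule, assuming \(c_1, c_2\) are the constants of proportionality, i.e. \(\alpha = c_1 \mathcal{H}/L\) and \(\beta = c_2 d/|\mathcal{V}|\) (the summary above states only the proportionalities):

```python
def theory_guided_hparams(H: float, L: int, d: int, num_nodes: int):
    """Hedged reading of the theory-guided hyperparameter formulas.

    Assumes c1 and c2 are the proportionality constants in
    alpha* ~ H/L and beta* ~ d/|V|; the exact composition is not
    spelled out in this summary.
    """
    c1 = (1.0 - H) + 0.1
    c2 = 0.1 * (1.0 + H)
    alpha = c1 * H / L
    beta = c2 * d / num_nodes
    return alpha, beta

# Example with the paper's optimal depth/width (L = 3, d = 128) on a
# hypothetical homophilic graph (H = 0.8, 2500 nodes):
# alpha, beta = theory_guided_hparams(0.8, 3, 128, 2500)
```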

Key Experimental Results

Node classification and link prediction are evaluated on 9 benchmark datasets:

| Metric | Cora | Actor | Wisconsin |
| --- | --- | --- | --- |
| F1 (classification), ARGNN | 86.83 | 42.18 | 90.65 |
| F1 (classification), CUSP | 83.45 | 41.91 | 88.30 |
| AUROC (link prediction), ARGNN | 91.03 | 76.40 | 77.48 |
| AUROC (link prediction), CUSP | 89.85 | 74.20 | 74.50 |

  • Node classification: best on all 9 datasets; Cora surpasses CUSP by 3.38%, Wisconsin exceeds the best baseline by 2.35%
  • Link prediction: best on all 9 datasets; Actor achieves 76.40% vs. GNRF's 73.50%
  • Efficiency: approximately 35% faster than CUSP, ~40% lower memory than full-tensor methods, comparable to HGCN

Ablation Study

  • Removing Ricci regularization: larger performance drops on heterophilic graphs (Actor \(-1.3\%\), Wisconsin \(-1.8\%\))
  • Removing smoothness regularization: Wisconsin drops by 3.5% (mixed homophilic structures require smooth geometric transitions)
  • Fixed vs. adaptive geometry: ARGNN outperforms fixed geometry by ~5% on heterophilic graphs (Actor)
  • Theory-guided hyperparameters deviate from grid-search optima by only 0.3–0.5%
  • Optimal configuration: \(L = 3\) layers, embedding dimension \(d = 128\)

Highlights & Insights

  1. Elegant parameterization: The diagonal metric tensor strikes a refined balance among computational efficiency, geometric expressiveness, and interpretability; each \(g_{i,k}\) directly quantifies the geometric importance of the \(k\)-th feature dimension for node \(i\)
  2. Strong theoretical completeness: Convergence guarantees, universal approximation (unifying Euclidean, Hyperbolic, Spherical, and Product spaces as special cases), generalization bounds, and robustness proofs are all provided
  3. Theory-guided practice: The homophily-based hyperparameter formulas prove highly effective empirically, reducing tuning overhead by roughly \(100\times\)
  4. Interpretable learned geometry: The learned curvature distributions align closely with graph homophily ratios; heterophilic graphs exhibit larger curvatures and higher metric variance, and visualizations provide intuitive geometric insight into graph structure

Limitations & Future Work

  1. Expressiveness bottleneck of the diagonal constraint: Diagonal metrics cannot capture inter-dimensional correlations or rotational geometry, which may limit performance on highly entangled feature spaces. The authors acknowledge that a low-rank factorization \(\mathbf{G}_i = \mathbf{L}_i \mathbf{L}_i^T\) is a worthwhile intermediate direction (see the sketch after this list)
  2. Scalability: Although complexity is on par with standard GNNs, maintaining \(O(d)\) metric vectors per node imposes nontrivial overhead on very large graphs (the authors suggest metric sharing via clustering as a potential remedy)
  3. Deep network degradation: Performance degrades for \(L > 3\), indicating that geometric regularization does not fully resolve the over-smoothing problem
  4. Limited experimental scope: Evaluation covers only node classification and link prediction, without graph-level tasks or heterogeneous graphs
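
A hypothetical sketch of that low-rank direction, purely illustrative (the paper does not implement it; the module name, rank, and \(\epsilon\)-regularization are our assumptions):

```python
import torch
import torch.nn as nn

class LowRankMetric(nn.Module):
    """Hypothetical low-rank SPD metric G_i = L_i L_i^T + eps*I.

    This is the intermediate direction the authors mention, not their
    implementation; rank and eps are illustrative choices.
    """
    def __init__(self, d: int, rank: int = 4, eps: float = 1e-4):
        super().__init__()
        self.factor = nn.Linear(d, d * rank)  # produces L_i from h_i
        self.d, self.rank, self.eps = d, rank, eps

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        L = self.factor(h).view(-1, self.d, self.rank)
        eye = torch.eye(self.d, device=h.device)
        # L_i L_i^T is PSD; adding eps*I makes G_i strictly SPD while
        # capturing cross-dimension correlations a diagonal metric cannot.
        return L @ L.transpose(-1, -2) + self.eps * eye
```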

Comparison with Prior Methods

| Method | Geometry Type | Node-Adaptive | Anisotropic | End-to-End |
| --- | --- | --- | --- | --- |
| HGCN | Fixed hyperbolic | ✗ | ✗ | ✓ |
| \(\kappa\)-GCN | Scalar curvature | ✓ | ✗ | ✓ |
| CUSP | Discrete product space | Partial | ✗ | ✓ |
| GNRF | Fixed geometry + Ricci evolution | ✗ | ✗ | ✓ |
| ARGNN | Continuous diagonal metric field | ✓ | ✓ | ✓ |

ARGNN is the first graph learning framework to learn a continuous, anisotropic metric tensor field; it represents hierarchical and community structure more faithfully than prior methods while theoretically unifying them as special cases.

Beyond graph learning, the diagonal metric tensor paradigm generalizes naturally to other geometric data such as point clouds and molecular graphs, replacing global metrics with per-point anisotropic ones. The discretization of Ricci flow used for regularization carries over to other manifold learning settings (e.g., curvature control in representation learning), and the homophily-aware hyperparameter selection strategy transfers readily to other methods that rely on graph structural priors.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First work to introduce a continuous anisotropic Riemannian metric field into GNNs, representing a fundamental conceptual advance in the geometric GNN paradigm
  • Theoretical Depth: ⭐⭐⭐⭐⭐ — Convergence, universality, generalization bounds, and robustness proofs are all provided, forming a tight theory–experiment loop
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive ablations and multi-metric evaluation, though graph-level tasks and larger-scale datasets are absent
  • Value: ⭐⭐⭐⭐ — Computationally efficient with theory-guided hyperparameter selection, though applicability to industrial-scale settings requires further validation
  • Writing Quality: ⭐⭐⭐⭐⭐ — Clear structure, thorough motivation, and natural integration of theory and experiments