What Expressivity Theory Misses: Message Passing Complexity for GNNs¶
- Conference: NeurIPS 2025
- arXiv: 2509.01254
- Code: https://www.cs.cit.tum.de/daml/message-passing-complexity/
- Area: GNN Theory / Expressivity
- Keywords: message passing complexity, lossyWL, GNN expressivity, over-squashing, continuous metric
TL;DR¶
This paper argues that the binary expressivity theory of GNNs cannot explain practical performance differences, and proposes MPC, a continuous, task-specific complexity measure grounded in the probabilistic lossyWL test. MPC attains a Spearman correlation of ρ = −1 with accuracy (versus ρ = 0 for classical WLC) and explains why GCN with virtual nodes outperforms higher-expressivity higher-order models on long-range tasks.
Background & Motivation¶
State of the Field¶
- Background: GNN expressivity theory centers on the WL isomorphism test, and a large body of work pursues expressivity beyond WL.
- Limitations of Prior Work: Expressivity theory is binary (a model can or cannot distinguish two graphs), so it cannot explain performance differences among models of equal expressivity; moreover, nearly all graph pairs in standard benchmarks are already distinguished by WL, so higher expressivity should yield no additional benefit there.
- Key Challenge: GCN with virtual nodes (which does not increase WL expressivity) outperforms beyond-WL higher-order models on long-range tasks, a phenomenon expressivity theory cannot explain.
- Goal: Propose a continuous complexity framework that preserves impossibility conclusions while explaining practical performance differences.
- Key Insight: Probabilistically relax the WL test so that each message survives independently with a random-walk probability (lossyWL).
- Core Idea: Define MPC as the negative log of the probability that lossyWL retains the task-relevant information, yielding a task-specific, architecture-specific, continuous measure of learning difficulty.
Method¶
Overall Architecture¶
lossyWL is a probabilistic variant of standard WL in which each message \(m_{u \to v}\) survives with probability \(I_{vu}\). MPC is the negative log-probability that the resulting (random) node coloring still determines the task output: \(MPC(f_v, G) = -\log P[\text{lossyWL}_v^L(G) \vDash f_v(G)]\).
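A minimal Monte Carlo sketch of this definition, under the assumption (spelled out in the supplementary details below) that \(I_{vu}\) is the random-walk transition probability of the graph with self-loops. The function name `mpc_propagation` and the exact simulation details are illustrative, not the authors' code; the paper's own MPC computation is likewise Monte Carlo based.

```python
import numpy as np

rng = np.random.default_rng(0)

def survival_probs(adj: np.ndarray) -> np.ndarray:
    """I[v, u]: probability that message u -> v survives one layer, taken here
    as the random-walk transition probability of the graph with self-loops."""
    a = adj + np.eye(len(adj))
    return a / a.sum(axis=1, keepdims=True)

def mpc_propagation(adj: np.ndarray, src: int, dst: int, L: int,
                    n_samples: int = 20_000) -> float:
    """Estimate MPC for propagating x_src to node dst through L lossyWL layers:
    -log P[dst's final coloring still depends on x_src]."""
    I = survival_probs(adj)
    n, hits = len(adj), 0
    for _ in range(n_samples):
        carriers = np.zeros(n, dtype=bool)  # nodes whose state encodes x_src
        carriers[src] = True
        for _ in range(L):
            alive = rng.random((n, n)) < I  # alive[v, u]: message u -> v survived
            carriers = (alive & carriers[None, :]).any(axis=1)
        hits += bool(carriers[dst])
    p = hits / n_samples
    return float("inf") if p == 0 else -np.log(p)  # inf if event never sampled
```

Setting `src == dst` gives the feature-retention task analyzed in the experiments; its estimated complexity grows with depth, consistent with the \(\mathbb{E}[MPC] \in \Omega(L)\) lemma cited in the supplementary details.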
Key Designs¶
- lossyWL:
  - Function: Models the lossy nature of message passing in practical MPNNs.
  - Mechanism: At each layer, each message survives independently with probability \(I_{vu}\), making node colorings random variables.
  - Design Motivation: Standard WL assumes lossless propagation, whereas over-squashing, over-smoothing, and under-reaching cause information loss in practice.
- Theoretical Properties of MPC:
  - Impossibility preservation: \(MPC = \infty\) if and only if WL also fails to distinguish.
  - Function refinement: More fine-grained tasks yield higher complexity.
  - Over-squashing lower bound: \(MPC \geq -\log(I^L_{vu})\).
- Generalization to Arbitrary Architectures:
  - Virtual nodes, rewiring, higher-order graphs, etc., are handled by executing lossyWL on the transformed message-passing graph (see the sketch after this list).
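As a concrete illustration of the last point, the transformed message-passing graph can simply be fed to the same estimator. The sketch below reuses the hypothetical `mpc_propagation` from above; adding a virtual node leaves WL expressivity unchanged but sharply lowers the MPC of a long-range propagation task, which is exactly the GCN-VN vs. GCN contrast in the ablation below.

```python
import numpy as np

def add_virtual_node(adj: np.ndarray) -> np.ndarray:
    """Transformed message-passing graph: one extra node connected to all others."""
    n = len(adj)
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = adj
    out[:n, n] = out[n, :n] = 1.0
    return out

# 8-node path: the endpoints are 7 hops apart, so a plain GCN must squeeze the
# signal through every intermediate edge; the virtual node is a 2-hop shortcut.
path = np.diag(np.ones(7), 1)
path += path.T
print(mpc_propagation(path, src=0, dst=7, L=8))                    # high MPC
print(mpc_propagation(add_virtual_node(path), src=0, dst=7, L=8))  # much lower
```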
Key Experimental Results¶
Feature Retention Task (Over-Smoothing Proxy)¶
Main Results¶
| Architecture | Spearman ρ (MPC vs. accuracy) | Spearman ρ (WLC vs. accuracy) |
|---|---|---|
| GCN / GIN / GCN-VN / GSN | −1 (perfect negative correlation) | 0 |
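For concreteness, this is how such a rank correlation is computed from paired (complexity, accuracy) values; the numbers below are made up for illustration and are not taken from the paper.

```python
from scipy.stats import spearmanr

# Hypothetical pairs: higher MPC should come with lower accuracy.
mpc = [0.4, 0.9, 1.7, 3.2]
accuracy = [0.97, 0.88, 0.71, 0.55]
rho, _ = spearmanr(mpc, accuracy)
print(rho)  # -1.0: the rankings are perfectly reversed
```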
Long-Range Information Propagation Task¶
Ablation Study¶
| Architecture | Expressivity | MPC | Performance |
|---|---|---|---|
| GCN | WL-equivalent | High | Poor |
| GCN-VN | WL-equivalent | Low | Good |
| CIN | Beyond WL | High | Poor |
| FragNet | Beyond WL | High | Poor |
Key Findings¶
- MPC is perfectly negatively correlated with accuracy; WLC yields ρ = 0.
- GCN-VN surpasses higher-expressivity models on long-range tasks.
Highlights & Insights¶
- "Pursuing higher expressivity may be misleading"—the key to success is minimizing task-specific MPC.
- The design of lossyWL is elegant: modifying a single assumption transforms a binary theory into a continuous one.
Limitations & Future Work¶
- Does not model learning dynamics (gradients / optimization difficulty).
- Monte Carlo simulation may be computationally expensive for large graphs.
- Currently limited to node-level tasks.
Related Work & Insights¶
- vs. WL Expressivity Theory: The binary theory is shown to have limited explanatory power for practical performance.
- vs. Over-Squashing Literature: Over-squashing provides a lower bound on MPC.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ lossyWL + MPC represents a significant advance in GNN theory.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Validated across multiple tasks and architectures, with transfer to ZINC.
- Writing Quality: ⭐⭐⭐⭐⭐ Arguments are sharp and theory is rigorous.
- Value: ⭐⭐⭐⭐⭐ Redefines how GNNs are understood and improved.
Supplementary Technical Details¶
- The Bernoulli survival probability \(I_{vu}\) in lossyWL corresponds to elements of the normalized adjacency matrix, i.e., random-walk transition probabilities.
- Impossibility theorem for MPC: \(MPC = \infty\) if and only if there exist \(G', w\) such that \(f_v(G) \neq f_w(G')\) but \(M_v(G) = M_w(G')\) for all \(M \in \mathcal{M}\), i.e., exactly when WL cannot distinguish node \(v\) in \(G\) from node \(w\) in \(G'\).
- Function refinement theorem: If \(f \vDash g\) (f is more fine-grained than g), then \(MPC(f_v, G) \geq MPC(g_v, G)\).
- Task triangle inequality: \(MPC(f_v \| g_v, G) \leq MPC(f_v, G) + MPC(g_v, G)\).
- Validated on 3-regular random graphs; results transfer to ZINC and the Long Range Graph Benchmark.
- The complexity of the feature-retention task grows at least linearly in depth: \(\mathbb{E}[MPC] \in \Omega(L)\) (the paper's feature-retention lemma).
- Analysis covers 8 architectures: MLP, GCN, GIN, GraphSAGE, GCN-VN, GSN, FragNet, and CIN.
- CIN and GraphSAGE achieve perfect accuracy due to explicit residual connections, an implementation detail that the MPC framework abstracts away.
- Table 1 shows that WL already distinguishes nearly all graph pairs in popular benchmarks, casting doubt on the practical necessity of higher expressivity.
- MPC can be computed via Monte Carlo simulation, yielding a complexity value per (graph, node, task) tuple.
- Virtual nodes do not increase WL expressivity but reduce MPC (every node is one hop from the VN, so any pair of nodes is connected within two hops), explaining their empirical effectiveness.
- Per-tuple MPC computation is fast; large-scale evaluation requires sampling.
- The MPC framework naturally generalizes to arbitrary MP graph transformations (including rewiring and higher-order graphs); see Appendix B.1.
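Finally, a self-contained numeric check of the over-squashing lower bound \(MPC \geq -\log(I^L_{vu})\). Under the random-walk assumption used in the sketches above, \(I^L_{vu}\) is an entry of the \(L\)-th power of the transition matrix, so the bound has a closed form (illustrative, not the authors' code):

```python
import numpy as np

def transition(adj: np.ndarray) -> np.ndarray:
    """Random-walk transition matrix of the graph with self-loops."""
    a = adj + np.eye(len(adj))
    return a / a.sum(axis=1, keepdims=True)

path = np.diag(np.ones(7), 1)
path += path.T  # 8-node path graph
I = transition(path)
for L in (8, 12, 16, 24):
    # Lower bound on the MPC of propagating node 0's feature to node 7.
    bound = -np.log(np.linalg.matrix_power(I, L)[7, 0])
    print(L, round(float(bound), 2))
```

Any Monte Carlo MPC estimate for the same graph, node pair, and depth should sit above this bound; for \(L\) smaller than the graph distance the matrix entry is zero and the bound is infinite, recovering under-reaching.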