Improving the Effective Receptive Field of Message-Passing Neural Networks¶

Conference: ICML 2025
arXiv: 2505.23185
Code: https://github.com/BGU-CS-VIL/IM-MPNN
Area: LLM Evaluation
Keywords: MPNN, Over-squashing, Effective Receptive Field, Multiscale, Graph Coarsening

TL;DR¶

This paper formalizes the concept of Effective Receptive Field (ERF) in MPNNs, proves that node contributions decay exponentially with distance (modeled as a binomial distribution), and proposes the IM-MPNN architecture. By utilizing multiscale graph coarsening and cross-scale information interleaving, IM-MPNN expands the ERF and achieves significant improvements on long-range dependency benchmarks such as LRGB.

Background & Motivation¶

Background: MPNNs update node representations via local message passing, but they are limited by over-squashing, making it difficult to aggregate long-range information effectively.

Limitations of Prior Work: Graph rewiring methods (e.g., SDRF) and Transformer-based methods increase the computational complexity of dense communication; most methods are constrained by the original graph resolution.

Key Challenge: How to enable MPNNs to effectively capture long-range dependencies without significantly increasing computational complexity?

Key Insight: Analogous to multiscale approaches in CNNs, graph coarsening can be leveraged to achieve long-range communication using only a few layers at coarser scales.

Core Idea: For a linear graph, after \(\ell\) layers, node contributions follow a \(B(\ell, 1/2)\) binomial distribution, which exhibits exponential decay; a coarser scale is equivalent to increasing the value of \(\kappa\).

Method¶

Overall Architecture¶

Input graph → Encoding → Graclus coarsening \(S\) times to obtain multiscale graphs → Parallel MP on \(S+1\) scales → Scale-mix information interleaving → Repeat \(L\) times → Unpooling + Concatenation → Task head.

Key Designs¶

ERF Theoretical Analysis:
- The feature of the central node on a linear graph after \(\ell\) layers of uniform-weight convolution is: \(x_0^\ell = \sum_{i=0}^\ell \binom{\ell}{i} v_{2i-\ell}\)
- After normalization, this corresponds to the binomial distribution \(B(\ell, 1/2)\); the exponential decay of tail contributions can be derived from Hoeffding's inequality.
- Continuous diffusion PDE analysis yields Gaussian kernel decay: \(x(p,t) \propto \exp(-\|p-p_0\|^2 / 4\kappa t)\)
Multiscale Processing:
- The Laplacian on a coarser scale is equivalent to a larger \(\kappa\) (coarsening factor of 2 \(\to\) multiplying \(\kappa\) by 4).
- 1 hop on a coarser scale is equivalent to approximately 4 hops on the original graph, while preventing excessive information compression.
Scale-Mix Layer:
- Each node receives information from its parent node (coarser scale) and child nodes (finer scale).
- Different scales utilize MPNN backbones with independent parameters.

Loss & Training¶

Task-related standard loss, which can be combined with different backbones such as GCN, GINE, or GatedGCN.

Key Experimental Results¶

Main Results¶

Dataset	Metric	IM-GatedGCN	GatedGCN	Gain
Peptides-func	AP↑	0.684	0.653	+3.1%
Peptides-struct	MAE↓	0.244	0.256	-4.7%
PascalVOC-SP	F1↑	0.397	0.367	+3.0%

Ablation Study¶

Configuration	Peptides-func AP	Description
GCN baseline	0.594	No multiscale
IM-GCN S=1	0.623	1-level coarsening
IM-GCN S=2	0.645	2-level coarsening
IM-GCN S=3	0.659	3-level coarsening

Key Findings¶

Increasing the number of scales \(S\) continuously improves performance (visualized in Figure 1), indicating that multiscale processing effectively expands the ERF.
IM-MPNN is a general framework that can boost the performance of various MPNN backbones.
The computational efficiency is superior to fully-connected Transformer-based methods.

Highlights & Insights¶

The theoretical analysis of ERF is clear and elegant: a dual perspective from discrete binomial distributions to continuous diffusion equations.
Translating multiscale methods from CNNs to GNNs is a natural yet well-validated idea.
The framework is highly general and can serve as a plug-and-play enhancement for any MPNN.

Limitations & Future Work¶

Graclus coarsening is a fixed preprocessing step and is not learnable.
For highly heterogeneous graph topologies, coarsening quality may be inconsistent.
There is a lack of comprehensive comparison with the latest Graph Transformer methods.

Rating¶

Novelty: ⭐⭐⭐⭐ Novel ERF analysis, incremental methodology.
Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive LRGB benchmarks.
Writing Quality: ⭐⭐⭐⭐⭐ Clear theoretical motivation.
Value: ⭐⭐⭐⭐ High practical value for addressing long-range dependency issues in GNNs.