ICML2025 Graph Learning AI paper notes paper summaries GNNs LLM Layout & Composition RAG Continual Learning Reasoning

🕸️ Graph Learning¶

🧪 ICML2025 · 31 paper notes

📌 Same area in other venues: 📷 CVPR2026 (8) · 🔬 ICLR2026 (118) · 💬 ACL2026 (24) · 🧪 ICML2026 (35) · 🤖 AAAI2026 (37) · 🧠 NeurIPS2025 (54)

🔥 Top topics: GNNs ×6 · LLM ×3

A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks: Cognac is proposed as the first effective corrective unlearning method for GNNs. By alternating between Contrastive Unlearning on Graph Neighborhoods (CoGN) and AsCent DesCent DeCoupled (AC⚡DC), it restores performance close to the oracle (trained on fully clean data) while identifying only 5% of the manipulated entities, achieving 8× higher efficiency than retraining from scratch.
A General Graph Spectral Wavelet Convolution via Chebyshev Order Decomposition: Proposed WaveGC, a multi-resolution graph spectral convolutional network that constructs learnable graph wavelets strictly satisfying the admissibility condition by Separating Odd and Even Chebyshev terms and combines them with matrix-valued filter kernels, achieving consistent improvements across both short- and long-range graph tasks (with a 15.7% gain on VOC).
A Recipe for Causal Graph Regression: Confounding Effects Revisited: This paper systematically extends causal graph learning from classification to regression tasks for the first time. By acknowledging the predictive capacity of confounding subgraphs via an Enhanced Graph Information Bottleneck (Enhanced GIB) and substituting discrete-label-dependent causal intervention methods with contrastive learning, the proposed method significantly outperforms existing approaches on graph-level OOD regression benchmarks.
Balancing Efficiency and Expressiveness: Subgraph GNNs with Walk-Based Centrality: Proposed HyMN: an efficient framework that uses walk-based centrality (Subgraph Centrality) to sample subgraph bags for subgraph GNNs. With only 1-2 subgraphs, it matches the performance of full-bag subgraph GNNs while using centrality as a structural encoding to further enhance discriminative power, scaling subgraph methods to graphs hundreds of times larger for the first time.
Banyan: Improved Representation Learning with Explicit Structure: Banyan introduces two innovations, entangled hierarchical tree structures and diagonalised message passing. With only 14 non-embedding parameters, it outperforms large-scale Transformer models on semantic textual similarity tasks, offering an efficient and viable alternative for semantic representation learning in low-resource languages.
Beyond Message Passing: Neural Graph Pattern Machine: This paper proposes the Neural Graph Pattern Machine (GPM), which samples graph patterns using random walks. It utilizes dual encoders for semantic and anonymous paths to capture node features and topological structures, respectively. A Transformer is then employed to identify task-relevant pattern tokens, bypassing the message-passing paradigm entirely and outperforming SOTA methods across node-, edge-, and graph-level tasks.
CoDy: Counterfactual Explainers for Dynamic Graphs: Proposes CoDy—the first counterfactual explanation method for Temporal Graph Neural Networks (TGNNs), which efficiently explores the space of possible explanatory subgraphs by combining Monte Carlo Tree Search (MCTS) with spatio-temporal heuristic policies, achieving a 16% improvement in AUFSC+ across multiple datasets.
Diss-l-ECT: Dissecting Graph Data with Local Euler Characteristic Transforms: This paper proposes the Local Euler Characteristic Transform (\(\ell\)-ECT), extending classical ECT topological invariants to the local neighborhoods of graphs to generate lossless topological-geometric fingerprints for each node. It outperforms standard GNNs on node classification tasks, particularly on highly heterophilous graphs, while providing theoretical invertibility guarantees and interpretability.
Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis: Provides the first comprehensive theoretical framework for Graph Prompts from the perspective of "data operation": proves that prompts can map the original graph to a "bridge graph" by simulating graph data transformations to adapt the frozen model to downstream tasks, and derives the error upper bounds and distributions in both single-graph and multi-graph scenarios.
EvoMesh: Adaptive Physical Simulation with Hierarchical Graph Evolutions: EvoMesh proposes a fully differentiable hierarchical graph evolution framework that adaptively constructs a multi-scale graph hierarchy evolving over time based on physical inputs using Anisotropic Message Passing (AMP) and Gumbel-Softmax-based differentiable node selection (DiffSELECT), outperforming fixed-hierarchy methods by an average of approximately 20% across five physical simulation benchmarks.
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models: HippoRAG 2 is proposed to comprehensively outperform standard RAG across factual memory, semantic understanding, and associative reasoning. This is achieved by integrating passage nodes into knowledge graphs, utilizing query-to-triple deep contextualized linking, and employing LLM-driven recognition memory filtering, moving a step closer to non-parametric continual learning for LLMs.
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models: This paper proposes Graph-constrained Reasoning (GCR), which encodes knowledge graphs into a KG-Trie and embeds it into the LLM decoding process to achieve faithful reasoning with zero hallucinations, achieving SOTA on KGQA benchmarks and demonstrating zero-shot cross-graph transferability.
Graph Attention is Not Always Beneficial: A Theoretical Analysis of Graph Attention Mechanisms via Contextual Stochastic Block Models: This paper theoretically analyzes the effectiveness conditions of graph attention mechanisms under the Contextual Stochastic Block Model (CSBM) framework. It proves that attention is beneficial when structural noise outweighs feature noise, and harmful otherwise. Furthermore, it designs a multi-layer GAT that relaxes the SNR requirement for perfect node classification from \(\omega(\sqrt{\log n})\) to \(\omega(\sqrt{\log n}/\sqrt[3]{n})\).
GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers: Proposes GrokFormer, which utilizes Fourier series-parameterized Kolmogorov-Arnold learnable activation functions to adaptively learn filter bases on multi-order spectra of the graph Laplacian. It possesses both spectral-order adaptability and spectral adaptability, making it currently the only graph Transformer filter learnable in both dimensions.
HGOT: Self-supervised Heterogeneous Graph Neural Network with Optimal Transport: This paper proposes HGOT, which introduces optimal transport theory into heterogeneous graph self-supervised learning for the first time. It replaces data augmentation and positive/negative sample selection in traditional contrastive learning with a Fused Gromov-Wasserstein transport plan between the branch view (metapath view) and the central view (aggregated view), achieving an average improvement of over 6% in node classification.
Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations: Graph message passing is modeled as a system of hyperbolic partial differential equations. It is proven that the solution space of node features is spanned by the eigenvectors of the Laplacian matrix, thereby embedding topological structure information into node representations. A bridge to spectral GNNs is established through polynomial approximation to enhance their performance.
Is Complex Query Answering Really Complex?: This paper reveals that up to 98% of "complex" queries in existing benchmarks for complex query answering (CQA) over knowledge graphs can actually be simplified to simple single-link prediction problems, leading to a severe overestimation of research progress. The authors propose new benchmarks with balanced sampling (FB15k237+H, NELL995+H, ICEWS18+H) and introduce a hybrid solver, CQD-Hybrid, to validate this finding. On these new benchmarks, the MRR of all SOTA methods drops significantly (by up to more than 30 percentage points).
L-STEP: Learnable Spatial-Temporal Positional Encoding for Link Prediction: L-STEP is proposed as a lightweight temporal link prediction model based on learnable spatial-temporal positional encoding. It captures the temporal evolution of positional encoding through Discrete Fourier Transform, replacing Transformer attention mechanisms with MLPs to achieve SOTA performance with faster execution.
LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification: Analyzes the internal mechanisms of the "LLM Enhancer + GNN" paradigm from the perspective of causal mechanism identification, finding that LLM enhancers primarily provide node-level/raw-data-level information, and subsequently proposes an Attention Transmission (AT) module to optimize the information transfer between them.
Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes: This work investigates quiver mutation equivalence classes using graph neural networks (GNNs) and explainability techniques, independently rediscovering the combinatorial characterization theorem for mutation classes of type \(\tilde{D}\) quivers, demonstrating the value of ML as a tool for mathematical research.
Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification: Proposes GOKU (Densification-Sparsification Rewiring paradigm), which treats the input graph as a spectral sparsifier of an unknown dense latent graph and solves the inverse sparsification problem. It effectively mitigates the over-squashing problem in GNNs by enhancing graph connectivity while explicitly preserving the Laplacian spectrum.
Mixed-Curvature Decision Trees and Random Forests: Extends classical decision tree and random forest algorithms from Euclidean space to mixed-curvature product manifolds (hyperbolic \(\times\) spherical \(\times\) Euclidean). By utilizing angular reformulation to construct split criteria that respect manifold geometry, the proposed method achieves outstanding performance across 57 classification, regression, and link prediction tasks (ranking 1st in 29 tasks and top-2 in 41 tasks).
GlycanAA: Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training: This work proposes GlycanAA, the first all-atom glycan modeling approach. Glycans are represented as heterogeneous graphs containing atom nodes and monosaccharide nodes. A hierarchical message passing scheme captures multi-scale information ranging from local atomic interactions to global monosaccharide interactions. This is further enhanced by multi-scale masked prediction pre-training (PreGlycanAA), achieving top performance across 11 tasks on the GlycanML benchmark.
On Measuring Long-Range Interactions in Graph Neural Networks: This paper formally defines "long-range interactions" in graph tasks from first principles for the first time. It derives a unique range measure \(\hat{\rho}_u = \mathbb{E}_{v \sim I_u}[d_G(u,v)]\) that satisfies four axioms. After validating its effectiveness through synthetic experiments, the authors utilize this measure to reveal that the peptides tasks in the LRGB benchmark are actually short-range.
Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction: This work introduces visual perception into message-passing neural networks (MPNNs) for the first time. By visualizing subgraphs as images and leveraging a vision encoder to extract Visual Structural Features (VSFs), the proposed GVN/E-GVN framework achieves state-of-the-art (SOTA) performance across 7 link prediction benchmarks.
Positional Encoding meets Persistent Homology on Graphs: Theoretically proves that positional encoding (PE) and persistent homology (PH) on graphs are mutually incomparable in terms of distinguishing non-isomorphic graphs. This paper proposes PiPE (Persistence-informed Positional Encoding), which unifies both via message-passing networks and is provably more expressive than either method alone, consistently outperforming pure PE and pure PH baselines on multiple benchmarks including ZINC, Alchemy, DrugOOD, and BREC.
TINED: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation: The authors propose TINED, which directly injects the parameters of feature transformations (FT) in GNNs into MLPs (Teacher Injection) and transfers the opposing smoothing properties of FT and graph propagation (GP) in GNN layers using Dirichlet energy distillation. TINED outperforms GNN teachers across 7 datasets while achieving a 94x speedup in inference.
TopInG: Topologically Interpretable Graph Learning via Persistent Rationale Filtration: TopInG proposes a topologically interpretable graph learning framework based on persistent homology. By learning a "rationale filtration", it identifies stable and persistent rationale subgraphs. It introduces a "topological discrepancy" constraint to reinforce the topological distinction between rationale and irrelevant subgraphs, significantly outperforming existing methods in handling variform rationale subgraphs.
Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees: The authors propose Task-Tree as a unified learning instance, aligning node/edge/graph-level tasks into a single representation space by introducing virtual task nodes. Combined with a reconstruction objective for GNN pre-training, they build the graph foundation model GIT, achieving cross-domain and cross-task generalization across fine-tuning, in-context learning, and zero-shot settings on 32 graphs from 5 domains.
Unifews: You Need Fewer Operations for Efficient Graph Neural Networks: Unifews proposes a unified element-wise sparsification framework that treats graph propagation and feature transformation in GNNs as matrix operations. By simultaneously pruning graph edges and model weights based on magnitude thresholds, it provides bounded approximation error guarantees via spectral graph smoothing theory, achieving up to 100x speedup on billion-edge graphs without accuracy loss.
WILTing Trees: Interpreting the Distance Between MPNN Embeddings: This paper discovers that the embedding distances learned by MPNNs align with task-related functional distances rather than structural distances. Consequently, it proposes an optimal transport distance based on Labeled Weisfeiler-Leman Trees (WILT) to distill and interpret MPNN distances, where edge weights reveal that a small number of key subgraphs dominate the metric structure of the embedding space.