
🕸️ Graph Learning

🔬 ICLR2026 · 118 paper notes

📌 Same area in other venues: 📷 CVPR2026 (8) · 💬 ACL2026 (24) · 🧪 ICML2026 (35) · 🤖 AAAI2026 (37) · 🧠 NeurIPS2025 (54) · 📹 ICCV2025 (1)

🔥 Top topics: GNNs ×14 · LLM ×7 · Diffusion Models ×6 · Reasoning ×5 · Alignment/RLHF ×4

A Graph Meta-Network for Learning on Kolmogorov–Arnold Networks

This paper demonstrates that Kolmogorov–Arnold Networks (KAN) share the same neuron permutation symmetries as MLPs. Based on this, it encodes trained KANs into "KAN-graphs" (where nodes represent neurons and edges carry parameters of 1D functions). It proposes WS-KAN, the first weight-space architecture designed for KANs using a bidirectional message-passing GNN, which significantly outperforms symmetry-agnostic baselines in tasks such as accuracy prediction, INR classification, and pruning mask prediction.
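The permutation symmetry can be checked numerically on a toy two-layer KAN. This is a minimal sketch, not WS-KAN. Each edge function is a hypothetical stand-in φ(t) = a·sin(t) + b·t (practical KANs typically use splines), and `kan_forward` / `permute_hidden` are illustrative names. Relabeling the hidden neurons, while carrying their incoming and outgoing edge functions along with them, leaves the output unchanged.

```python
import math

def edge_fn(coeffs, t):
    """Hypothetical univariate edge function phi(t) = a*sin(t) + b*t (stand-in for a spline)."""
    a, b = coeffs
    return a * math.sin(t) + b * t

def kan_forward(layers, x):
    """layers[l][j][i] holds the coefficients of the function on edge i -> j of layer l."""
    for layer in layers:
        x = [sum(edge_fn(row[i], xi) for i, xi in enumerate(x)) for row in layer]
    return x

def permute_hidden(layers, perm):
    """Relabel hidden neurons of a 2-layer KAN: permute rows of layer 0 and columns of layer 1."""
    first, second = layers
    new_first = [first[p] for p in perm]
    new_second = [[row[p] for p in perm] for row in second]
    return [new_first, new_second]

# 2 inputs -> 3 hidden neurons -> 1 output
first = [[(0.5, 1.0), (-0.3, 0.2)],
         [(1.2, -0.7), (0.4, 0.9)],
         [(-0.8, 0.1), (0.6, -0.5)]]
second = [[(0.3, 0.5), (-1.0, 0.2), (0.7, -0.4)]]
x = [0.4, -1.3]

y = kan_forward([first, second], x)
y_perm = kan_forward(permute_hidden([first, second], [2, 0, 1]), x)
print(y, y_perm)  # equal up to floating-point summation order
```

A weight-space model that ignores this symmetry has to learn that all such relabelings describe the same function, which is what an equivariant GNN over the KAN-graph gets for free.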

Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference

This paper presents a large-scale, controlled empirical study systematically comparing three "interaction modes" for LLMs to process textual graphs: direct prompting, ReAct-style tool calling, and Graph-as-Code (where the LLM writes code to query the graph). The study finds that allowing the LLM to write code for graph operations (rather than stuffing the graph into the prompt) is overall superior for node classification, especially on dense graphs with long text or high degrees, as it enables adaptive switching between structural, feature, and label signals.

Adaptive Mixture of Disentangled Experts for Dynamic Graph Out-of-Distribution Generalization

Addressing the phenomenon that "the distribution shift itself evolves over time" on dynamic graphs, this paper proposes AdaMix. It uses a spatio-temporal distribution detector to perceive the shift at each time step in real time, routes adaptively among prototype-guided disentangled experts (each a different GNN architecture) according to the detected shift, and applies a distribution-aware intervention mechanism to mine invariant patterns. AdaMix significantly outperforms fixed-architecture SOTA methods on real and synthetic dynamic graph datasets.

AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability

This paper characterizes the expressive power of spectral GNNs from the perspective of "node distinguishability." It proves that the lower bound of distinguishable nodes is jointly determined by the number of distinct eigenvalues of the graph matrix and the number of non-zero frequency components of node features. Based on this, the authors propose AdaSpec, a plug-and-play adaptive graph matrix generation module that significantly enhances the ability of spectral GNNs to distinguish nodes in heterophilic graphs without increasing the order of computational complexity or violating permutation equivariance.

AdS-GNN - a Conformally Equivariant Graph Neural Network

This paper "lifts" point clouds from flat Euclidean space to a higher-dimensional Anti-de Sitter (AdS) space. Leveraging the correspondence in physics between AdS isometry transformations and boundary conformal transformations, the authors construct AdS-GNN, the first Graph Neural Network equivariant to the full conformal group (including translations, rotations, scaling, and non-affine special conformal transformations). The model demonstrates stronger scale generalization on tasks such as SuperPixel MNIST, shape segmentation, and Ising model correlation functions, and can directly read out physically meaningful universal quantities like conformal dimensions from the trained network.

Are We Measuring Oversmoothing in Graph Neural Networks Correctly?

This work argues that the widely used Dirichlet energy metric fails to capture oversmoothing correctly in practical GNN settings, and proposes the numerical/effective rank (Erank) of the feature representations as an alternative metric. When a separate model is trained at each depth (from 2 to 24), Erank achieves an average correlation of 0.91 with accuracy, with a consistently positive sign, whereas Dirichlet energy averages −0.72 and the sign of its correlation fluctuates across datasets (it fails notably on the large-scale OGB-Arxiv). The paper further proves that for linear GNNs and for a family of non-linear GNNs with non-negative weights, the numerical rank of the feature matrix converges to 1 (rank collapse), thereby redefining oversmoothing as rank collapse rather than eigenvector alignment.
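The two metrics can disagree sharply on a toy feature matrix. This sketch assumes the Roy & Vetterli definition of effective rank (exponential of the entropy of normalized singular values) and an unnormalized edge-sum Dirichlet energy; the paper's exact normalizations may differ. A rank-1 matrix has effective rank 1 (fully collapsed) even though its Dirichlet energy is large.

```python
import numpy as np

def effective_rank(X, eps=1e-12):
    """Effective rank: exp of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / max(s.sum(), eps)
    p = p[p > eps]  # drop numerically-zero singular values (0 * log 0 = 0)
    return float(np.exp(-(p * np.log(p)).sum()))

def dirichlet_energy(X, edges):
    """Unnormalized Dirichlet energy: sum over undirected edges (i, j) of ||x_i - x_j||^2."""
    return float(sum(np.sum((X[i] - X[j]) ** 2) for i, j in edges))

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Every row is a multiple of the same vector, so the feature matrix has rank 1 ...
X_collapsed = np.outer(np.arange(1.0, 5.0), [1.0, 2.0, 3.0])
print(effective_rank(X_collapsed))           # ~1.0
# ... yet its Dirichlet energy is far from zero.
print(dirichlet_energy(X_collapsed, cycle))  # 168.0
print(effective_rank(np.eye(3)))             # ~3.0
```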

AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

AtlasKV directly converts each triple in a knowledge graph (KG) into Q-K-V data for injection into LLMs via attention. By employing hierarchical key-value pruning, it reduces complexity from linear to sub-linear, enabling LLMs to access billion-scale (1B triples) knowledge graphs within 20GB of VRAM without external retrievers, long context windows, or retraining for new knowledge.

Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling

This paper proposes the "entity-attribute duality" principle, atomizing all attributes in a Heterogeneous Information Network (HIN) into entity nodes to obtain an "Atomic HIN", a canonical form with maximal expressiveness. After a genetic algorithm performs binary selection over node/edge types (schema refinement), a minimal version of RGCN (sRGCN) achieves SOTA performance on node classification and link prediction across 8 datasets.

Beyond Entity Correlations: Disentangling Event Causal Puzzles in Temporal Knowledge Graphs

This paper proposes HEDRA, the first representation learning framework for heterogeneous causal disentanglement at the event level in Temporal Knowledge Graphs (TKGs). By using three modules—counterfactual detection, instrumental variable guidance, and evolutionary orthogonality—it sequentially strips away non-causal and pseudo-causal relations while separating dynamic and static causality, achieving SOTA on five real-world datasets.

Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs

This paper proposes GMS, the first neural combinatorial optimization routing method for multigraphs. It includes two variants: GMS-EB, which performs edge-level autoregressive construction directly on multigraphs, and GMS-DH, a dual-head approach that learns to prune multigraphs before node-level routing. GMS achieves performance close to the exact solver LKH on asymmetric multi-objective TSP and CVRP while being dozens of times faster.
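For contrast with the learned pruning in GMS-DH, the classical non-learned baseline is a Pareto filter over parallel edges. With additive objectives, a tour that uses a dominated parallel edge is itself dominated (swap in the dominating edge), so such edges can be dropped safely. The objective tuples and node names below are made up for illustration; this is not the paper's method.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def prune_parallel_edges(multigraph):
    """Keep only the Pareto-optimal parallel edges for each ordered node pair.

    multigraph maps (u, v) -> list of objective tuples, e.g. (distance, travel_time).
    """
    return {
        pair: [e for e in edges if not any(dominates(o, e) for o in edges)]
        for pair, edges in multigraph.items()
    }

# Asymmetric toy multigraph: (u, v) and (v, u) carry different parallel edges.
multigraph = {
    ("A", "B"): [(10, 5), (8, 7), (12, 6), (9, 9)],
    ("B", "A"): [(11, 4)],
}
print(prune_parallel_edges(multigraph))
# {('A', 'B'): [(10, 5), (8, 7)], ('B', 'A'): [(11, 4)]}
```

A Pareto filter can still leave many edges per pair when objectives conflict, which is one motivation for learning which edges to keep.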

Bridging Input Feature Spaces Towards Graph Foundation Models

ALL-IN utilizes "random Gaussian projection + node covariance operators" to unify graph node features with varying dimensions, semantics, and ranges into a shared representation independent of the original feature space. This enables a single pre-trained GNN to transfer to unseen datasets with entirely new input features without architectural changes or retraining.

Bridging ML and Algorithms: Comparison of Hyperbolic Embeddings

This is an empirical benchmark paper that "bridges academic silos." The authors bring together 14 hyperbolic embedding methods from the Machine Learning (ML), Network Theory (NT), and Algorithms communities—which have long ignored each other—to compete on 38 real-world networks + 600 simulated networks. They find that the near-linear BFKL algorithm from the 2016 Algorithms community is approximately 100x faster than the popular Poincaré/Lorentz embeddings in the ML community while maintaining comparable or superior quality. They also propose a new quality metric, ICV, which penalizes high-dimensional and high-radius embeddings.

Bures-Wasserstein Flow Matching for Graph Generation

Addressing the issue where existing graph diffusion/flow models "decouple nodes and edges for independent linear interpolation," leading to non-smooth probability paths and convergence difficulties, this paper models graphs as coupled colored Gaussian systems using Markov Random Fields (MRF). By constructing smooth, closed-form, simulation-free probability paths based on Bures-Wasserstein (BW) displacement between graph distributions, the proposed BWFlow framework achieves superior performance, faster convergence, and highly efficient sampling in planar graph and molecular generation.

Canonical Tree Cover Neural Networks for Expressive and Invariant Graph Learning

Canonicalizing a graph by compressing it into a single sequence distorts graph distances, and its expressivity is limited by the node labeler used. To address these issues, this paper proposes CTNN, which represents a graph as a set of canonical spanning tree covers. Each tree is processed by a powerful recursive tree encoder, and the tree embeddings are then aggregated. Theoretically, the approach preserves invariance, preserves distances better, and is strictly more expressive than sequence canonicalization. It consistently outperforms invariant GNNs and sequence-canonicalization baselines on sparse molecular/protein graph classification.

CheckMate! Watermarking Graph Diffusion Models in Polynomial Time

CheckWate is the first sampling-time watermarking framework for graph diffusion models. It embeds watermarks into the eigenvalues of noise latent variables (as eigenvalues are invariant to graph isomorphism), thereby bypassing the NP-hard obstacles of Graph Isomorphism (GI) and Graph Edit Distance (GED). It achieves \(O(N^3)\) polynomial-time watermark verification with stable detection across four datasets and four types of graph attacks, whereas baselines adapted from image/tabular watermarks almost entirely fail under isomorphism attacks.
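The property the watermark relies on is that relabeling nodes (A ↦ PAPᵀ for a permutation matrix P) leaves the spectrum unchanged. The sketch below checks this on a random adjacency matrix; it illustrates the invariance only and does not embed or detect a watermark in diffusion noise latents as the paper does.

```python
import numpy as np

def spectrum(adj):
    """Eigenvalues of a symmetric matrix in ascending order (eigvalsh already sorts them)."""
    return np.linalg.eigvalsh(adj)

rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (upper + upper.T).astype(float)   # random undirected graph, no self-loops
P = np.eye(n)[rng.permutation(n)]     # random permutation matrix
A_relabeled = P @ A @ P.T             # isomorphic copy with node IDs shuffled

print(np.allclose(spectrum(A), spectrum(A_relabeled)))  # True
```

Equal spectra are necessary but not sufficient for isomorphism (cospectral non-isomorphic graphs exist), and a dense eigendecomposition costs O(N³), in line with the polynomial verification cost quoted above.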

CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering

CLAUSE treats the problem of "what context to retrieve" in multi-hop KGQA as a budgeted sequential decision-making process. Three collaborative neuro-symbolic agents (Architect, Navigator, and Curator) are jointly optimized under three types of resource constraints (edges, steps, and tokens) using the proposed LC-MAPPO (Lagrangian-Constrained Multi-Agent PPO). A single checkpoint can adjust the "accuracy-latency-cost" trade-off based on per-query budgets or prices without retraining.
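The budget constraints follow the general Lagrangian recipe for constrained RL: agents maximize a reward penalized by multiplier-weighted costs, and each multiplier rises while its budget is exceeded. This is a generic sketch, not LC-MAPPO: PPO updates, the three agents, and per-query budget conditioning are omitted, and the budgets, costs, and learning rate are invented.

```python
def shaped_reward(task_reward, costs, lambdas):
    """Lagrangian reward the agents maximize: task reward minus multiplier-weighted resource costs."""
    return task_reward - sum(lambdas[k] * costs[k] for k in lambdas)

def dual_update(lambdas, avg_costs, budgets, lr=0.01):
    """Projected dual ascent: grow a multiplier while its budget is exceeded, never below zero."""
    return {k: max(0.0, lambdas[k] + lr * (avg_costs[k] - budgets[k])) for k in lambdas}

lambdas = {"edges": 0.0, "steps": 0.5, "tokens": 0.2}
budgets = {"edges": 50.0, "steps": 4.0, "tokens": 800.0}
avg_costs = {"edges": 40.0, "steps": 3.0, "tokens": 900.0}  # measured on a rollout batch

lambdas = dual_update(lambdas, avg_costs, budgets)
print(lambdas)  # edges stays at 0.0, steps drops to ~0.49, tokens rises to ~1.2
```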

Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering

CoCo utilizes graph convolutional filtering to extract complementary node representations from two views: local (adjacency graph) and global (graph diffusion matrix). It then employs a shared low-rank subspace to compress these into compact embeddings for redundancy and noise removal (Compactness). Finally, a cross-view similarity distribution consistency loss aligns the semantics of both sides (Consistency), outperforming existing SOTA methods across five graph clustering benchmarks.

Confident Block Diagonal Structure-Aware Invariable Graph Completion for Incomplete Multi-view Clustering

Addressing Incomplete Multi-view Clustering (IMVC) with partial view missing, this paper employs a "confident block diagonal regularizer" to recover strictly consistent local block diagonal structures across all views. It utilizes an "invariable graph completion" term to infer the latent structures of missing instances and jointly learns a consensus spectral clustering representation. The method outperforms existing IMVC approaches on benchmarks including BBCSport, COIL-20, Caltech-7, and BUAA.

Constant Degree Matrix-Driven Incomplete Multi-View Clustering via Connectivity-Structure and Embedding Tensor Learning

CAMEL unifies graph connectivity constraints and low-rank constraints of latent embedding tensors into a single objective. By replacing the data-dependent degree matrix in the Laplacian with a constant degree matrix \(D=\beta I\), it reduces the degree matrix construction complexity from \(O(n^2)\) to \(O(1)\). It performs k-means directly on latent embeddings without SVD post-processing, achieving high speed and accuracy across nine incomplete multi-view datasets.

Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells

This paper identifies that inclusion-based forward persistent homology (PH) in mainstream Graph Neural Networks (GNNs) suffers from expressivity and metric limitations. It proposes using "contraction" to retroactively extinguish immortal topological features and interleaves inclusion and contraction into Hourglass Persistence. This method is proven to be more expressive, measurable, and stable. The authors provide a differentiable algorithm that, when integrated into GNNs, consistently outperforms existing PH methods across multiple graph datasets.
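The "immortal feature" limitation is easy to see in the simplest inclusion-based setting: 0-dimensional persistence of a vertex-filtered graph, computed with union-find and the elder rule. This is the standard textbook construction, not the paper's contraction or Hourglass Persistence; on a connected graph one component never dies and receives an infinite death time.

```python
import math

def zero_dim_persistence(f, edges):
    """0-dim persistence pairs (birth, death) of a graph filtered by vertex values f.

    Vertices appear at f[v]; edge (u, v) appears at max(f[u], f[v]).
    Union-find with the elder rule: at a merge, the younger component dies.
    """
    parent = list(range(len(f)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for u, v in sorted(edges, key=lambda e: max(f[e[0]], f[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle; 0-dim homology is unchanged
        t = max(f[u], f[v])
        # each root is the oldest vertex of its component
        young, old = (ru, rv) if f[ru] > f[rv] else (rv, ru)
        pairs.append((f[young], t))
        parent[young] = old
    roots = {find(v) for v in range(len(f))}
    pairs.extend((f[r], math.inf) for r in roots)  # immortal classes
    return sorted(pairs)

print(zero_dim_persistence([0.0, 2.0, 1.0, 3.0], [(0, 1), (1, 2), (2, 3)]))
# [(0.0, inf), (1.0, 2.0), (2.0, 2.0), (3.0, 3.0)]
```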

Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs

This paper proposes CtrlHGen, which upgrades abductive reasoning on knowledge graphs (inferring plausible logical hypotheses from observed entities) into a "controllable" task, allowing users to specify the semantic direction and structural complexity of hypotheses. Data augmentation via sub-logic decomposition mitigates the "scarcity of long hypothesis samples," and a smooth semantic reward based on Dice/Overlap, combined with a condition-following reward, addresses "reward oversensitivity." On three KG datasets, CtrlHGen adheres better to control signals and achieves higher semantic similarity than baselines.

Cooperative Sheaf Neural Networks

This paper proposes defining in/out-degree Laplacians for cellular sheaves on directed graphs to construct the Cooperative Sheaf Neural Network (CSNN). This allows nodes to independently choose information propagation or reception strategies, simultaneously mitigating oversquashing and addressing heterophilic tasks.

CORDS - Continuous Representations of Discrete Structures

The task of "predicting a set of objects with unknown cardinality" is reformulated as inference over continuous fields. CORDS employs an invertible mapping to encode discrete object sets into a density field (encoding position and count) and a feature field (carrying attributes). The model learns entirely within the field space and performs precise decoding back to discrete sets when necessary. This allows handling variable cardinality in tasks such as molecule generation, object detection, and simulation inference without the need for padding or specialized counting heads.
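The key observation behind a density field is that cardinality becomes an integral: placing a unit-mass kernel at each object makes the field integrate to the object count. This 1-D sketch assumes Gaussian kernels and a hypothetical bandwidth `sigma`; the feature field and the paper's exact invertible decoding are not shown.

```python
import math

def density_field(positions, sigma=0.1):
    """Return rho(x) = sum of unit-mass Gaussians centred at the object positions (1-D)."""
    def rho(x):
        return sum(math.exp(-0.5 * ((x - p) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
                   for p in positions)
    return rho

def count_from_field(rho, lo=-1.0, hi=2.0, steps=3000):
    """Recover the cardinality by integrating the density field (midpoint rule)."""
    h = (hi - lo) / steps
    return sum(rho(lo + (k + 0.5) * h) for k in range(steps)) * h

rho = density_field([0.2, 0.5, 0.9])
count = count_from_field(rho)
print(round(count))  # 3
```

Because the count is read from the field rather than fixed by the output shape, sets of different sizes need no padding.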

DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS

DAMR models KGQA as a Monte Carlo Tree Search (MCTS) guided by an LLM planner. During the expansion step, the LLM selects the top-\(k\) relevant relations to prune the search space, while path scoring is delegated to a lightweight Transformer scorer that jointly encodes the question and relation sequence via cross-attention. The scorer is fine-tuned online using pseudo-paths generated during the search. DAMR outperforms prior SOTA methods on WebQSP (Hits@1 94.0) and CWQ (78.0) while reducing LLM calls by over 50% and token consumption by over 75%.

DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning

DHG-Bench is the first comprehensive benchmark for Hypergraph Neural Networks (HNNs). Under a unified experimental protocol, it systematically evaluates 17 SOTA HNN algorithms on 22 datasets (covering node-, hyperedge-, and hypergraph-level tasks) along four dimensions: effectiveness, efficiency, robustness, and fairness. Through extensive controlled experiments, it reveals systemic shortcomings of existing HNNs, such as "performance collapse when switching data/tasks," "inability to handle large graphs," "vulnerability to feature/label noise," and "lower fairness compared to MLPs."

Differentiable Lifting for Topological Neural Networks

The authors propose ∂lift (DiffLift), an end-to-end learnable graph "lifting" framework. It uses GNN node embeddings to parameterize the "distribution of candidate high-order cells" and employs Bernoulli sampling combined with a Straight-Through Estimator (STE) to determine which cells enter the topological structure. This transforms hypergraph, simplicial, or cell complex structures—originally determined by prior heuristics—into structures learned under downstream task supervision. It achieves performance gains of up to 45% over static lifting across 12 datasets and 4 TNN architectures.

Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding

This paper proposes Semi-Simplicial Neural Networks (SSNs), the first topological deep learning model to operate directly on "semi-simplicial sets." Using a relational algebra induced by face maps, SSNs subsume networks on graphs, directed graphs, and simplicial complexes and achieve strictly higher theoretical expressivity. On brain activity decoding tasks using biologically realistic cortical microcircuits, SSNs outperform the runner-up model by up to 27% and message-passing GNNs by up to 50%.

Discrete Bayesian Sample Inference for Graph Generation

This paper proposes GraphBSI, extending Bayesian Sample Inference (BSI) from continuous to discrete categorical data. This allows the model to iteratively refine beliefs about the graph in the "distribution parameter space on the probability simplex" rather than directly evolving discrete graphs. It formulates this process as a family of SDEs with adjustable noise \(\gamma\), achieving SOTA in a one-shot manner on Moses and GuacaMol molecule generation benchmarks.

Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning

DiSCO delegates the task of "identifying causal subgraphs" in graph Out-of-Distribution (OOD) generalization to a set of experts (MoE). Each expert extracts a distinct candidate causal subgraph, and a learned sparse gating mechanism selects the most appropriate expert for each instance. It requires no environment labels and makes no assumptions about the independence between spurious subgraphs and labels, achieving first place on average in the GOOD benchmark.

DR-GGAD: Dual Residual Centering for Mitigating Anomaly Non‑Discriminativity in Generalist Graph Anomaly Detection

To address the long-standing issue where normal and anomalous node representations become entangled when a trained graph anomaly detector transfers to a new graph, this paper proposes a quantifiable metric, AnD (Anomaly non-Discriminativity). It further introduces Dual Residual Centering (Hyper Residual + Affinity Residual) to mitigate this by comparing each node to domain-invariant residual centers rather than directly comparing nodes. With frozen parameters and zero target-domain fine-tuning, the method achieves an average AUROC improvement of 5.14% over prior state-of-the-art generalist methods across 8 target graphs.

Dual-Branch Representations with Dynamic Gated Fusion and Triple-Granularity Alignment for Deep Multi-View Clustering

DREAM explicitly decouples semantic and structural information, which are often treated with imbalanced emphasis in multi-view clustering, into two parallel branches using VAE and GCN. It employs gated fusion to adaptively adjust weights based on the dataset and utilizes triple-granularity alignment (cross-view, intra-sample, and inter-cluster) to unify heterogeneous embedding spaces, outperforming eight SOTA methods across six benchmarks.

Dynamic Multi-sample Mixup with Gradient Exploration for Open-set Graph Anomaly Detection

Addressing the challenge of open-set graph anomaly detection (GAD)—where models only see a few anomaly types during training but must detect never-before-seen anomalies during inference—this paper proposes DEMO. It uses dynamic multi-sample Mixup to fuse seen anomalies into diverse synthetic anomalies to expand decision boundaries, employs energy gradient feedback to dynamically reweight samples, and utilizes memory-guided class-adaptive thresholds for reliable pseudo-labeling. DEMO consistently outperforms various GAD baselines across six graph datasets.

Efficient Learning on Large Graphs using a Densifying Regularity Lemma

This paper proposes "Intersecting Block Graphs" (IBG)—a low-rank decomposition representing large directed sparse graphs as a superposition of \(K \ll N\) intersecting bipartite blocks. It proves a "densifying" version of the Weak Regularity Lemma, ensuring that the required number of blocks \(K\) depends only on the approximation precision and is independent of graph scale or sparsity. Consequently, IBG-based neural networks achieve SOTA performance on node classification, spatio-temporal prediction, and knowledge graph completion with \(O(N)\) (rather than \(O(E)\)) time and space complexity.
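The linear cost comes from never materializing the N×N matrix: with A ≈ Σₖ rₖ 1_{Uₖ} 1_{Vₖ}ᵀ, each block first pools features over its target set Vₖ and then broadcasts the pooled vector to its source set Uₖ. The sketch below assumes the blocks are already given (fitting them is the paper's contribution and is not shown); the block weights and sets are invented, and a dense reference confirms the result.

```python
def block_matvec(blocks, X):
    """Multiply sum_k r_k * 1_{U_k} 1_{V_k}^T by node features X without densifying it.

    blocks: list of (r_k, U_k, V_k) with U_k (rows/sources) and V_k (columns/targets) as index sets.
    X: list of N feature rows of dimension d.
    """
    n, d = len(X), len(X[0])
    out = [[0.0] * d for _ in range(n)]
    for r, U, V in blocks:
        pooled = [sum(X[j][c] for j in V) for c in range(d)]  # 1_V^T X: one pass over V
        for i in U:                                            # broadcast r * pooled to rows in U
            for c in range(d):
                out[i][c] += r * pooled[c]
    return out

def dense_matvec(blocks, X):
    """Reference: build the N x N matrix explicitly, then multiply (quadratic cost)."""
    n, d = len(X), len(X[0])
    A = [[0.0] * n for _ in range(n)]
    for r, U, V in blocks:
        for i in U:
            for j in V:
                A[i][j] += r
    return [[sum(A[i][j] * X[j][c] for j in range(n)) for c in range(d)] for i in range(n)]

blocks = [(0.5, {0, 1, 2}, {2, 3}), (-1.0, {3, 4}, {0, 1, 4}), (2.0, {1}, {1, 2, 3, 4})]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [1.0, 1.0], [-1.0, 3.0]]
print(block_matvec(blocks, X))
```

Each block costs O((|Uₖ| + |Vₖ|)·d), so for a fixed number of blocks K the product is linear in N regardless of how many edges the original graph has.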

\(\ell_1\) Latent Distance Based Continuous-Time Graph Representation

This work replaces the squared \(\ell_2\) distance in existing continuous-time graph representations—which violates the triangle inequality—with the \(\ell_1\) distance. It derives closed-form piecewise exponential integrals and addresses non-differentiability via subgradient methods, outperforming eight baselines including GRASSP across 11 datasets and three evaluation tasks.
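A three-point example shows why the squared ℓ₂ distance is not a metric while ℓ₁ is. The points are arbitrary collinear latent positions chosen for illustration.

```python
def sq_l2(a, b):
    """Squared Euclidean distance (not a metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l1(a, b):
    """Manhattan distance (a metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def triangle_holds(dist, a, b, c):
    """Check d(a, c) <= d(a, b) + d(b, c)."""
    return dist(a, c) <= dist(a, b) + dist(b, c)

a, b, c = (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)
print(triangle_holds(sq_l2, a, b, c))  # False: 4 > 1 + 1
print(triangle_holds(l1, a, b, c))     # True:  2 <= 1 + 1
```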

Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding

The authors propose EDT-Former (Entropy-guided Dynamic Token Transformer), which establishes efficient alignment between a frozen graph encoder and an LLM through an entropy-guided dynamic token generation mechanism. It achieves SOTA performance on benchmarks including molecular QA, molecular instructions, and property prediction without fine-tuning the LLM backbone.

Escaping the Homophily Trap: A Threshold-free Graph Outlier Detection Framework via Clustering-guided Edge Reweighting

Addressing the "homophily trap" where graph convolution pollutes normal node representations with outliers through neighbor aggregation, this paper proposes CER-GOD. It employs a learnable mask to adaptively weaken edge weights between heterophilic neighbors and utilizes an unsupervised binary clustering detector to generate pseudo-labels. These labels guide the mask optimization and provide threshold-free outlier scores. Combined with a diversity loss to prevent cluster collapse, it achieves a new SOTA across 8 benchmarks (e.g., 96.98% AUC on the Email dataset, over 12% higher than the runner-up).

EvA: Evolutionary Attacks on Graphs

EvA uses a carefully designed genetic algorithm to search for adversarial perturbations directly in the discrete edge-flip space, bypassing gradient relaxation and differentiable proxy losses. In node classification attacks it achieves an average accuracy drop ~11% larger than that of the SOTA PRBCD, and it mounts the first graph structural attacks on conformal prediction and robustness certificates.

Exchangeability of GNN Representations with Applications to Graph Retrieval

This paper discovers and proves a novel probabilistic symmetry: node embeddings of standardly trained GNNs are exchangeable random variables along the dimensional axis. Leveraging this property, the authors approximate high-dimensional transportation similarity as one-dimensional sorted Euclidean similarity, designing GraphHash—the first unified LSH retrieval framework for asymmetric graph similarities.
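The reason a one-dimensional reduction is cheap is that optimal transport on the real line is solved by sorting: for equal-size point sets with a convex cost such as (x − y)², matching the i-th smallest to the i-th smallest is optimal. The sketch verifies this against brute force on invented values; it illustrates the 1-D fact only, not GraphHash's similarity or hashing scheme.

```python
from itertools import permutations

def sorted_transport_cost(a, b):
    """1-D optimal transport cost between equal-size point sets: match in sorted order, O(n log n)."""
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b)))

def brute_force_transport_cost(a, b):
    """Reference: minimum over all one-to-one matchings (factorial time)."""
    return min(sum((x - y) ** 2 for x, y in zip(a, perm)) for perm in permutations(b))

a = [0.3, -1.2, 2.5, 0.9]
b = [1.1, 0.0, -0.4, 3.0]
print(sorted_transport_cost(a, b), brute_force_transport_cost(a, b))  # equal up to rounding
```

In general dimension the same assignment problem needs something like the Hungarian algorithm (cubic time), which is why reducing to sorted 1-D comparisons matters for retrieval.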

Federated Graph-Level Clustering Network with Dual Knowledge Separation

FGCN-DKS decomposes each graph into a "cluster-oriented shared invariant subgraph" and a "client-private personalized subgraph." Only the pattern summaries of the invariant subgraphs are uploaded to the server, and personalized aggregation is performed by calculating inter-cluster affinity using graph kernels. This addresses the challenge of "consensus failure" in federated graph-level clustering caused by attempts to share all information.

FLOCK: A Knowledge Graph Foundation Model via Learning on Random Walks

FLOCK replaces the conventional message passing and deterministic equivariance constraints of Knowledge Graph Foundation Models (KGFMs) with a paradigm of "sampling random walks → anonymizing into sequences → encoding with sequence models → consensus pooling." By leveraging probabilistic node-relation equivariance, it maintains cross-graph generalization while breaking symmetry to distinguish "structurally isomorphic but semantically opposite" relations. As a universal approximator for link-invariant functions, it achieves SOTA performance across 54 KGs.
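Anonymization replaces every entity (or relation) in a walk by the index of its first occurrence, so walks with the same structural pattern map to the same sequence across different KGs. In this sketch the toy KG, `random_walk`, and the choice to anonymize entities and relations separately are assumptions for illustration; FLOCK's sequence encoder and consensus pooling are not shown.

```python
import random

def anonymize(seq):
    """Replace each element by the index of its first occurrence in the sequence."""
    first_seen = {}
    return [first_seen.setdefault(x, len(first_seen)) for x in seq]

def random_walk(kg, start, length, rng):
    """Sample a walk over (relation, tail) out-edges; returns entity and relation sequences."""
    entities, relations = [start], []
    node = start
    for _ in range(length):
        out = kg.get(node, [])
        if not out:
            break
        rel, node = rng.choice(out)
        relations.append(rel)
        entities.append(node)
    return entities, relations

kg = {
    "paris": [("capital_of", "france")],
    "france": [("borders", "spain"), ("has_capital", "paris")],
    "spain": [("borders", "france")],
}
ents, rels = random_walk(kg, "paris", 4, random.Random(0))
print(anonymize(ents), anonymize(rels))
print(anonymize(["paris", "france", "paris", "spain"]))  # [0, 1, 0, 2]
```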

FlowSymm: Physics–Aware, Symmetry–Preserving Graph Attention for Network Flow Completion

This work decomposes the "missing flow completion" inverse problem into two phases: first, spanning a feasible solution subspace using a set of algebraic group actions that preserve node conservation and freeze observed edges; second, using graph attention to select correction directions within this physically valid basis. Finally, a differentiable Tikhonov convex solver is employed to absorb noise, thereby strictly maintaining physical conservation laws while completing missing flows.

Forest-Based Graph Learning for Semi-Supervised Node Classification

This work reinterprets message passing on graphs as "transmission across multiple spanning trees (forests)." By using homophily-guided sampling to select high-quality trees and a linear-time tree aggregator, the method achieves a global receptive field with \(O(n+m)\) complexity, outperforming both deep GNNs and Graph Transformers in semi-supervised node classification.

FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing

This paper generalizes the classic "parameter sharing" scheme in equivariant networks to KANs, proposing FS-KAN which shares learnable univariate functions (rather than scalar weights) based on group actions. It unifies various existing equivariant KANs and proves that its expressive power is equivalent to parameter-sharing MLPs, thereby achieving significantly higher sample efficiency in low-data scenarios.

FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation

To address graph node feature imputation under extreme sparsity (up to 99.5% missing rate), FSD-CAP utilizes a fractional diffusion operator with adjustable "sharpness" for locally adaptive propagation, suppresses error accumulation via progressive subgraph diffusion expanding outward from observed nodes, and injects semantic structure through class-aware propagation driven by pseudo-labels and neighborhood entropy. The imputed features allow GCNs to approach or even exceed the performance of models trained on complete features in node classification and link prediction tasks.

Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective

This paper treats full-graph GNN training as the special case of mini-batch training in which both batch size and fan-out size are maximized. Analyzing convergence, generalization, and computational efficiency in terms of these two hyperparameters, it reaches a counter-intuitive conclusion: full-graph training is not always superior to carefully tuned mini-batch training with small batches.
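The two hyperparameters can be seen in a GraphSAGE-style neighbor sampler: the seed set plays the role of the batch, and each layer keeps at most `fanout` neighbors per node. When every node is a seed and no fan-out cap applies, the sampled computation graph is the full graph. This is a generic sketch with a made-up adjacency list, not the sampler used in the paper's experiments.

```python
import random

def sample_computation_graph(adj, seeds, fanouts, rng):
    """Neighbor sampling with one fan-out per GNN layer; a fan-out of None keeps every neighbor.

    Returns, for each layer, the list of sampled (target, neighbor) edges.
    """
    frontier, layers = set(seeds), []
    for fanout in fanouts:
        edges = []
        for v in sorted(frontier):
            nbrs = adj[v]
            picked = nbrs if fanout is None or fanout >= len(nbrs) else rng.sample(nbrs, fanout)
            edges.extend((v, u) for u in picked)
        layers.append(edges)
        frontier |= {u for _, u in edges}  # expand to the sampled neighbors for the next hop
    return layers

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
rng = random.Random(0)
full = sample_computation_graph(adj, list(adj), [None, None], rng)  # batch = N, no fan-out cap
mini = sample_computation_graph(adj, [0], [1, 1], rng)              # batch = 1, fan-out = 1
print([len(layer) for layer in full], [len(layer) for layer in mini])  # [8, 8] [1, 2]
```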

G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation

G-Merging targets multi-task graph learning scenarios by synthesizing multiple task models, fine-tuned from the same pre-trained GNN, into a shared backbone via task arithmetic. It then employs topology-aware alignment to train lightweight task adapters and utilizes a training-free MoE routing during inference to dynamically combine adapters, preserving multi-task knowledge with parameter overhead close to a single model.
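The first step, task arithmetic, merges models fine-tuned from a shared base by adding scaled task vectors: θ_merged = θ₀ + λ Σₜ (θₜ − θ₀). In this sketch, scalars stand in for weight tensors and the scale λ is a hypothetical hyperparameter; the topology-aware alignment, adapters, and MoE routing are not shown.

```python
def task_arithmetic_merge(base, finetuned, scale=0.3):
    """theta_merged = theta_0 + scale * sum_t (theta_t - theta_0), applied per named parameter."""
    merged = {}
    for name, w0 in base.items():
        task_vectors = [ft[name] - w0 for ft in finetuned]
        merged[name] = w0 + scale * sum(task_vectors)
    return merged

base = {"w": 1.0, "b": 0.0}
finetuned = [{"w": 1.5, "b": 0.2}, {"w": 0.5, "b": 0.4}]
print(task_arithmetic_merge(base, finetuned, scale=0.5))  # w stays ~1.0, b becomes ~0.3
```

Opposing task vectors (as for `w` here) cancel, which is one reason merged weights alone can lose task-specific knowledge and lightweight adapters are added on top.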

GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning

To address the gap in "Generative Dynamic Text-Attributed Graph (DyTAG) Learning," the authors construct the GDGB benchmark with 8 high-quality text datasets, define two new generative tasks (TDGG and IDGG) with multi-dimensional evaluation protocols, and propose GAG-General, an LLM multi-agent framework, as a reproducible unified baseline.

Gelato: Graph Edit Distance via Autoregressive Neural Combinatorial Optimization

GELATO reformulates the approximate solution of Graph Edit Distance (GED) as an autoregressive decision-making process that incrementally constructs node matches. By using a GNN equipped with matching history to iteratively select the next source-target node pair, it achieves higher exact hit rates and faster inference speeds across multiple GED benchmarks.

Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies

By encoding the hierarchy transitivity constraint of "parent → query → child" into hyperbolic space and augmenting semantic context via a frozen LLM, a 110M DistilBERT (SS-MONO) outperforms frozen large models like GPT-4o mini and Gemma-2-9B, as well as domain-specific baselines, on scientific taxonomy expansion tasks.

Geometric Graph Neural Diffusion for Stable Molecular Dynamics Simulations

This paper introduces graph heat diffusion equations into geometric graph neural networks, utilizing "equivariant gradient operators + equivariant diffusion operators" to perform all-to-all node information flow on fully connected molecular graphs. Acting as a plug-and-play module, it captures geometric topological invariant features insensitive to conformational changes, thereby enabling machine learning force fields to run stable long-range MD simulations even on unseen conformations.

Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion

For text-attributed graphs, this paper moves away from applying LLMs uniformly to all nodes. Instead, a lightweight router decides to "glance" at the LLM only for heterophilous or low-degree nodes, where GNNs typically fail. This non-differentiable routing decision is trained with counterfactual advantage signals, significantly reducing LLM calls while improving accuracy on heterophilous nodes by up to 13%.

Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models

DyGRASP utilizes the implicit reasoning of LLMs to capture "recent semantic dependencies" and explicit reasoning to capture "global semantic evolution". By fusing these with temporal GNNs, it improves destination node retrieval Hit@10 by up to 34% on Dynamic Text-Attributed Graphs (DyTAG) while reducing LLM reasoning complexity from \(O(|E|\cdot d)\) to \(O(|E|)\).

Global and Local Topology-Aware Graph Generation via Dual Conditioning Diffusion

DualDiff decomposes graphs into node-level (local) and cluster-level (global) diffusion branches. By employing a "bidirectional conditioning" mechanism, global and local information alternately serve as conditions during the denoising process. This joint modeling of \(p(Z_l, Z_g)\) in a unified latent space significantly improves the generation quality of both general and molecular graphs.

gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity

This paper disentangles GNN over-squashing into two independent failure modes: "sensitivity limitation" and "storage capacity saturation." It proposes the NAR synthetic task to measure capacity bottlenecks separately and introduces the gLSTM architecture, which incorporates associative memory and gating mechanisms from xLSTM into message passing to explicitly increase node storage capacity.

GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback

GNNs with structural inductive bias act as "judges" to filter reliable pseudo-labels using agreement/disagreement signals between LLM and GNN predictions. A weakly-supervised algorithm combining "Instruction Tuning + Preference Tuning" distills pseudo-label knowledge into the LLM, significantly improving node classification performance on text-attributed graphs where annotations are extremely scarce.

GNN Explanations that do not Explain and How to find Them

This paper reveals a critical failure mode of self-explaining Graph Neural Networks (SE-GNNs): models can output "degenerate explanations" that are completely unrelated to their actual reasoning process while maintaining optimal accuracy. It proves that most existing faithfulness metrics fail to identify such explanations. To address this, the authors construct a controllable benchmark and propose a new metric, EST, which reliably identifies these degenerate explanations as unfaithful.

GRAPHITE: Graph Homophily Booster — Reimagining the Role of Discrete Features in Heterophilic Graph Learning

This paper proposes GRAPHITE, a non-learning graph transformation method that directly boosts graph homophily by introducing "feature nodes" as hubs that indirectly connect nodes sharing common features. It is the first to address heterophily by modifying the graph structure rather than the GNN architecture, and it significantly outperforms 27 SOTA methods on difficult benchmarks such as Actor.

Graph Random Features for Scalable Gaussian Processes

Sparse and unbiased graph node kernel estimates are constructed using random walk-based Graph Random Features (GRF), reducing Bayesian inference for Gaussian processes on graphs from \(\mathcal{O}(N^3)\) to \(\mathcal{O}(N^{3/2})\) with probabilistic accuracy guarantees, enabling Bayesian optimization on graphs exceeding 1 million nodes on a single GPU.

Graph Representational Learning: When Does More Expressivity Hurt Generalization?

This paper proposes a family of pseudometrics \(\zeta\)-TMD parameterized by graph invariants and derives a data-dependent generalization bound based on "train-test graph structural similarity + model complexity + training set size." It theoretically explains "when higher expressivity GNNs generalize worse"—only when the added expressivity aligns with the structure-label correlation of the task is it beneficial; otherwise, it merely increases complexity and degrades generalization.

Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation

This paper reinterprets the multi-head recursion of Mamba2 as a graph filter bank on a line graph. It proposes HADES, a hierarchical structure of "shared low-pass filters + expert high-pass filters" via spectral residual-based delta modulation, achieving or exceeding Mamba2 performance with only 58.9% of the parameters.

Graph Tokenization for Bridging Graphs and Transformers

The GraphTokenizer framework is proposed to convert graphs into symbolic sequences through reversible frequency-guided serialization, followed by BPE to learn a graph substructure vocabulary. This allows standard Transformers (e.g., BERT/GTE) to process graph data directly without architectural modifications, achieving SOTA results across 14 benchmarks.

Graphon Cross-Validation: Assessing Models on Network Data

Traditional cross-validation (CV) fails on network data because edges are not independent. This paper proposes CV-imputation: edges in the validation set are treated as missing values and filled with Bernoulli random variables of a fixed probability to form the training graph, and an affine transformation then recovers the edge-probability matrix from the estimate fitted on that graph. This enables hyperparameter tuning and model selection for graphon estimation methods while preserving the graph topology, with a theoretical guarantee that the CV score is asymptotically parallel to the true estimation error.
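A minimal sketch of the split, impute, and invert steps, assuming each off-diagonal entry is held out independently with probability \(q\) and refilled with a Bernoulli(\(p\)) draw, so that \(\mathbb{E}[A_{\text{train}}] = (1-q)P + qp\). The function names and the squared-error score are hypothetical illustrations, not the paper's code; `estimator` is any callable mapping an adjacency matrix to an estimated probability matrix.

```python
import numpy as np

def cv_imputation_split(A, holdout_frac, p_fill, rng):
    """Hide a random subset of off-diagonal entries and refill them with Bernoulli(p_fill) draws."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    held = rng.random(len(iu[0])) < holdout_frac            # True = validation entry
    rows, cols = iu[0][held], iu[1][held]
    fills = (rng.random(len(rows)) < p_fill).astype(float)  # imputed Bernoulli values
    A_train = A.astype(float)                                # astype returns a copy
    A_train[rows, cols] = fills
    A_train[cols, rows] = fills                              # keep the graph undirected
    return A_train, (rows, cols)

def undo_affine(P_train, holdout_frac, p_fill):
    """Invert E[A_train] = (1 - q) P + q * p_fill to map the fitted estimate back to P."""
    P = (P_train - holdout_frac * p_fill) / (1.0 - holdout_frac)
    return np.clip(P, 0.0, 1.0)

def cv_score(A, estimator, holdout_frac=0.1, p_fill=0.5, seed=0):
    """Squared error of the recovered probabilities on the held-out entries (hypothetical score)."""
    rng = np.random.default_rng(seed)
    A_train, (rows, cols) = cv_imputation_split(A, holdout_frac, p_fill, rng)
    P_hat = undo_affine(estimator(A_train), holdout_frac, p_fill)
    return float(np.mean((A[rows, cols] - P_hat[rows, cols]) ** 2))
```

For model selection, one would compute `cv_score` for each candidate estimator (or hyperparameter value) on the same graph and keep the lowest score.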

GraphUniverse: Synthetic Graph Generation for Evaluating Inductive Generalization

The authors propose GraphUniverse, a framework for generating graph families with persistent semantic communities across a global universe. This allows for the first systematic evaluation of inductive generalization capability in graph learning models, revealing the critical finding that transductive performance does not reliably predict inductive generalization capacity.

HarmonyGNNs: Harmonizing Heterophily and Homophily in GNNs via Self-Supervised Node Encoding

HarmonyGNNs achieves goal harmony via "Teacher-Student Prediction SSL (JEPA-style) + Node Difficulty-Driven Dynamic Masking" and representation harmony via "Linear/MLP Projection + Weighted GCN + Feature-level Self-Attention + Hierarchical Fusion," allowing a single framework trained without labels to achieve SOTA on both homophilic and heterophilic graphs simultaneously.

HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature

A two-stage framework with ~300M parameters is proposed: Z-NERD utilizes "Orthogonal Semantic Decomposition + Multi-scale TCQK Attention" for domain-agnostic multi-word entity recognition, while HGNet employs "Parent/Child/Peer three-channel message passing + Differentiable Hierarchy Loss + Continuous Abstraction Field Loss" to constrain relation extraction into a logically consistent and geometrically ordered Directed Acyclic Graph (DAG), achieving new SOTA results on SciERC, SciER, and SPHERE.

HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs

HYPER is the first foundation model for link prediction on knowledge hypergraphs. By encoding "positional interactions between relations" into transferable base relations, the model achieves zero-shot generalization to hypergraphs containing entirely new entities, new relations, and arbitrary arities.

Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics

Information-preserving Graph Neural Simulators (IGNS) are proposed to maintain non-dissipative information flow on graphs using port-Hamiltonian dynamics. Combining warmup initialization, geometric encoding, and multi-step training objectives, IGNS consistently outperforms existing graph neural simulators across six physical simulation benchmarks.

Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities

To address emerging entities with "no historical interactions" in Temporal Knowledge Graphs (TKG), TransFIR utilizes a BERT-based text embedding combined with a learnable VQ codebook to assign entities to semantic clusters. It then transfers interaction chain patterns from semantically similar known entities to avoid representation collapse, achieving an average MRR improvement of 28.6% across four benchmarks.
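The cluster-assignment step can be pictured as a nearest-codeword lookup. In this sketch the codebook is a fixed array and the embeddings are arbitrary vectors; in TransFIR the codebook is learned and the embeddings come from a BERT-based text encoder. `known_peers` is a hypothetical helper showing how known entities that share an emerging entity's cluster could be found.

```python
import numpy as np

def vq_assign(embeddings, codebook):
    """Assign each embedding to its nearest codeword (squared Euclidean distance)."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, computed for all pairs at once
    d2 = (
        (embeddings ** 2).sum(1, keepdims=True)
        - 2.0 * embeddings @ codebook.T
        + (codebook ** 2).sum(1)
    )
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

def known_peers(new_emb, known_embs, codebook):
    """Indices of known entities that fall in the same cluster as the emerging entity."""
    new_id, _ = vq_assign(new_emb[None, :], codebook)
    known_ids, _ = vq_assign(known_embs, codebook)
    return np.flatnonzero(known_ids == new_id[0])
```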

Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting

This paper constructs the first systematic benchmark for graph unlearning, evaluating 10 categories of mainstream methods across 7 datasets based on three dimensions: efficiency, utility, and forgetting quality. The study reaches a dispiriting yet pragmatic conclusion: on large-scale graphs, most unlearning methods are neither faster than retraining from scratch nor thorough in forgetting. Retraining remains the most reliable option at present.

Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning

KRLM unifies Knowledge Graph (KG) structural representations and LLM internal knowledge into a "Knowledge Reasoning Language" (KRL). Through a KRL tokenizer, a KRL attention layer with knowledge memory, and a structure-aware next-entity predictor, it suppresses "knowledge distortion" caused by sparse KG contexts and out-of-scope hallucinations in inductive KGR tasks.

Latent Geometry-Driven Network Automata for Complex Network Dismantling

This paper proposes the LGD-NA framework, which utilizes "network cellular automata rules" based solely on local topology to approximate latent geometric distances. By using node geometric centrality as the dismantling priority, the method outperforms all existing dismantling algorithms (except the global NBC) across 1,475 real-world networks. It supports GPU acceleration and can also be applied in reverse to enhance network robustness.

LEAP: Local ECT-Based Learnable Positional Encodings for Graphs

LEAP transforms the "Local Euler Characteristic Transform" (ℓ-ECT) into an end-to-end trainable local structural positional encoding: it computes a differentiable ECT matrix for the m-hop neighborhood of each node, which is then compressed into low-dimensional vectors via learnable projections to be integrated into GNNs. This approach simultaneously encodes geometric and topological information with a complexity on the same order as a single message-passing step.
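To make the ℓ-ECT concrete, here is a sketch of a plain (non-learnable) local Euler characteristic transform on a graph: for each direction \(v\) and threshold \(t\) it computes V - E over the sublevel set of the m-hop neighbourhood, using node features as coordinates and giving each edge the larger height of its two endpoints. The optional `sharpness` argument replaces the hard indicator with a sigmoid to illustrate one way of making the count differentiable; LEAP's actual relaxation and its learnable projection are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def k_hop_nodes(adj, center, m):
    """Nodes within m hops of `center` (adj: dense symmetric 0/1 matrix)."""
    reached, frontier = {center}, {center}
    for _ in range(m):
        frontier = {int(j) for i in frontier for j in np.flatnonzero(adj[i])} - reached
        reached |= frontier
    return sorted(reached)

def local_ect(adj, coords, center, m, directions, thresholds, sharpness=None):
    """ECT matrix (directions x thresholds) of the m-hop subgraph around `center`."""
    nodes = k_hop_nodes(adj, center, m)
    sub = adj[np.ix_(nodes, nodes)]
    X = coords[nodes] - coords[center]                 # centre the neighbourhood
    edges = np.argwhere(np.triu(sub, k=1) > 0)         # each undirected edge once
    out = np.zeros((len(directions), len(thresholds)))
    for a, v in enumerate(directions):
        h_node = X @ v                                 # node heights along direction v
        h_edge = np.maximum(h_node[edges[:, 0]], h_node[edges[:, 1]])
        for b, t in enumerate(thresholds):
            if sharpness is None:                      # hard sublevel-set counts
                nv, ne = (h_node <= t).sum(), (h_edge <= t).sum()
            else:                                      # sigmoid relaxation of the indicator
                nv = (1.0 / (1.0 + np.exp(-sharpness * (t - h_node)))).sum()
                ne = (1.0 / (1.0 + np.exp(-sharpness * (t - h_edge)))).sum()
            out[a, b] = nv - ne                        # Euler characteristic V - E
    return out
```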

Learning Concept Bottleneck Models from Mechanistic Explanations

This paper proposes Mechanistic CBM (M-CBM), which extracts concepts from the features learned by the black-box model itself using Sparse Autoencoders (SAEs). These concepts are then named and labeled by a Multimodal LLM to construct an interpretable Concept Bottleneck Model. M-CBM significantly outperforms existing CBM methods while effectively controlling information leakage.

Learning from Historical Activations in Graph Neural Networks

The authors propose HISTOGRAPH, a two-stage attention readout layer that pools the "historical activations" of GNN layers (rather than just the final layer) as a trajectory sequence. By applying inter-layer attention followed by inter-node attention, it significantly mitigates over-smoothing and improves graph classification performance in deep GNNs.
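A sketch of a two-stage readout over layer histories, assuming a single query vector per stage instead of full multi-head attention (`q_layer` and `q_node` are hypothetical learnable parameters). Stage one lets each node weight its own activations across layers; stage two pools the resulting node summaries with attention over nodes.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def history_readout(H, q_layer, q_node):
    """H: (L, N, d) activations of every GNN layer; returns a (d,) graph embedding."""
    scale = np.sqrt(H.shape[-1])
    # Stage 1: each node attends over its own trajectory of layer activations.
    layer_scores = np.einsum("lnd,d->nl", H, q_layer) / scale
    layer_w = softmax(layer_scores, axis=1)                   # (N, L)
    node_repr = np.einsum("nl,lnd->nd", layer_w, H)           # (N, d)
    # Stage 2: attention over nodes pools the per-node summaries.
    node_w = softmax(node_repr @ q_node / scale)              # (N,)
    return node_w @ node_repr, layer_w, node_w
```

With zero queries both stages reduce to uniform averaging, i.e. mean pooling over all layers and nodes, which is a convenient sanity check.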

Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors

By migrating the Prior-Fitted Network (PFN) paradigm from tabular data to graphs, the authors pre-train NodePFN on thousands of synthetic graphs generated from controllable priors. This allows for training-free, single-forward-pass general node classification on arbitrary real-world graphs, achieving a 71.27% average accuracy across 23 benchmarks.

Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment

Addressing the prevalent "Dual-level Noisy Correspondence" (entity-attribute level + cross-graph level) in multi-modal entity alignment, this paper proposes the RULE framework. It estimates the reliability of each correspondence using "uncertainty + consensus" criteria to suppress noise during attribute fusion and cross-graph alignment. Additionally, it leverages MLLM reasoning during test-time to uncover implicit attribute associations, achieving an average H@1 improvement of over 5 points across five benchmarks compared to the runner-up.

LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks

LogicXGNN proposes a post-hoc framework for extracting interpretable first-order logic rules from trained Graph Neural Networks (GNNs). By identifying predicates via graph structural hashing and hidden layer embedding patterns, determining discriminative DNF rule structures with decision trees, and grounding abstract predicates back to the input space, it generates a rule-based classifier that can substitute the original GNN and serve as a controllable graph generative model.

Low-Rank Few-Shot Node Classification by Node-Level Graph Diffusion

This work utilizes a node-level graph diffusion model, FGDM, to synthesize "realistic" support set nodes and their edges for augmenting few-shot tasks. It further incorporates a low-rank transductive classifier—inspired by the Low-Frequency Property (LFP) and backed by generalization bounds—to resist diffusion noise, achieving SOTA performance in few-shot node classification.

LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning

The authors construct a provable and controllable long-range graph learning benchmark using the well-studied long-range Ising model from statistical physics (consisting of 10 datasets, 256 to 65k nodes). The task involves predicting the energy change \(\Delta E\) for each spin flip, where the ground truth mathematically necessitates interactions with distant nodes, providing a reliable metric for evaluating long-range modeling gains.
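Why \(\Delta E\) needs distant nodes follows from the energy. Assuming the standard form \(E(s) = -\sum_{i<j} J_{ij} s_i s_j - h \sum_i s_i\) with power-law couplings \(J_{ij} = J_0 / |i-j|^{\alpha}\) on a 1-D chain (the benchmark's exact lattice, boundary conditions, and \(\alpha\) may differ), flipping spin \(k\) changes the energy by \(\Delta E_k = 2 s_k \left(\sum_j J_{kj} s_j + h\right)\), a sum over every other spin. The sketch below checks this closed form against brute-force recomputation.

```python
import numpy as np

def coupling_matrix(n, alpha=1.5, J0=1.0):
    """Power-law couplings J_ij = J0 / |i - j|^alpha on a 1-D chain (open boundaries)."""
    idx = np.arange(n)
    r = np.abs(idx[:, None] - idx[None, :]).astype(float)
    J = np.zeros((n, n))
    off = r > 0
    J[off] = J0 / r[off] ** alpha
    return J

def energy(s, J, h=0.0):
    """E(s) = -sum_{i<j} J_ij s_i s_j - h sum_i s_i (J symmetric with zero diagonal)."""
    return -0.5 * s @ J @ s - h * s.sum()

def flip_delta_e(s, J, h=0.0):
    """Energy change for flipping each spin on its own: dE_k = 2 s_k (sum_j J_kj s_j + h)."""
    return 2.0 * s * (J @ s + h)
```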

MobileKGQA: On-Device KGQA System on Dynamic Mobile Environments

MobileKGQA compresses high-dimensional LLM embeddings into binary hash codes for a GNN reasoning module and pairs it with a step-by-step automatic label generation method. This allows a Knowledge Graph Question Answering (KGQA) system to train directly on mobile/edge devices and adapt to accumulating user data for the first time, achieving a 20.3% performance improvement with only 30.4% energy consumption on Jetson Orin Nano.
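As a rough illustration of why binary codes suit edge devices, the sketch below uses random-hyperplane LSH (bit k is the sign of a random projection) as a stand-in for MobileKGQA's own hashing, which is not reproduced; `make_hasher` and `hamming` are hypothetical names.

```python
import numpy as np

def make_hasher(dim_in, n_bits, seed=0):
    """Random-hyperplane LSH: bit k of the code is [x . r_k > 0]."""
    R = np.random.default_rng(seed).normal(size=(dim_in, n_bits))
    return lambda X: (X @ R > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))
```

For example, a 768-dimensional float32 embedding occupies 3,072 bytes, whereas a 64-bit code packed with `np.packbits` occupies 8 bytes, and similarity comparisons become Hamming distances.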

Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models

This paper reconstructs multi-domain graph pre-training from the perspective of "manifold gluing" in differential geometry: it fuses arbitrary graph datasets onto a unified and smooth Riemannian manifold. This provides the first rigorous theoretical characterization of "how knowledge is integrated and transferred across domains" and leads to the GRAPHGLUE framework, which features quantifiable transfer difficulty and geometric scaling laws.

Multi-Scale Diffusion-Guided Graph Learning with Power-Smoothing Random Walk Contrast for Multi-View Clustering

The proposed MANGO framework uses "entropy-guided multi-scale graph diffusion" to dynamically fuse similarity matrices of different step lengths, balancing local and global structures. It further employs "random walk + \(\beta\) power-smoothing" to correct false negatives in contrastive learning and mitigates the contradiction between consistency and specificity through a shared structural embedding module, achieving new SOTA results across 12 datasets.

Neural Graduated Assignment for Maximum Common Edge Subgraphs

This paper reformulates the NP-complete Maximum Common Edge Subgraph (MCES) problem as a Quadratic Assignment Problem (QAP) on an Associated Common Graph. It introduces a "Neural Graduated Assignment" network with fully learnable high-dimensional temperature parameters to approximate the optimal solution in polynomial time without supervision, outperforming traditional search solvers in both speed and accuracy.

On The Expressive Power of GNN Derivatives

This paper discovers that feeding high-order derivatives of a base MPNN with respect to input node features as additional structural features into a downstream GNN strictly enhances expressivity. The proposed HOD-GNN is theoretically aligned with the WL hierarchy and shown to be depth-equivalent to subgraph GNNs and Random Walk Structural Encodings. By utilizing a differentiable message-passing-style derivative computation algorithm that exploits graph sparsity, the model consistently ranks in the top two across seven to eight graph benchmarks and scales to large graphs where traditional subgraph GNNs struggle.

On the Expressive Power of GNNs for Boolean Satisfiability

This work rigorously proves, from the perspective of the Weisfeiler-Leman (WL) test, that the complete WL hierarchy cannot distinguish between satisfiable and unsatisfiable 3-SAT instances. It reveals the theoretical limits of GNN expressivity for SAT solving while identifying families of positive instances, such as planar SAT and random SAT, that GNNs can successfully distinguish.

On the Trade-off Between Expressivity and Privacy in Graph Representation Learning

This paper theoretically characterizes the fundamental tension between "expressivity of graph representations" and "edge-level differential privacy" for the first time. It proposes using noisy homomorphism density vectors as graph embeddings: these maintain full discriminative power in expectation while injecting calibrated noise based on the smooth sensitivity of each density to satisfy formal differential privacy guarantees. It further proves an explicit trade-off where more expressive pattern classes require higher noise levels.
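A minimal sketch of noisy homomorphism-density embeddings with three patterns (edge, 2-path, triangle). It substitutes worst-case global edge sensitivity for the paper's smooth sensitivity and splits the privacy budget evenly across patterns via sequential composition; both are simplifying assumptions, and the function names are hypothetical. Even so, it shows the direction of the trade-off: each added pattern shrinks every pattern's share of \(\varepsilon\), and larger patterns such as triangles have larger sensitivity here, so the injected noise grows.

```python
import numpy as np

def hom_densities(A):
    """Homomorphism densities t(F, G) = hom(F, G) / n^|V(F)| for edge, 2-path, triangle."""
    n = A.shape[0]
    deg = A.sum(1)
    return np.array([
        A.sum() / n**2,                  # hom(K2, G) = 2|E|
        (deg**2).sum() / n**3,           # hom(P3, G) = sum_i deg_i^2
        np.trace(A @ A @ A) / n**3,      # hom(K3, G) = 6 * #triangles
    ])

def global_edge_sensitivity(n):
    """Worst-case change of each density when a single edge is added or removed."""
    return np.array([2 / n**2, (4 * n - 6) / n**3, 6 * (n - 2) / n**3])

def private_embedding(A, epsilon, rng):
    """Laplace mechanism per pattern, with epsilon split evenly across patterns."""
    n = A.shape[0]
    t = hom_densities(A)
    per_pattern_eps = epsilon / len(t)
    scale = global_edge_sensitivity(n) / per_pattern_eps
    return t + rng.laplace(0.0, scale)
```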

On the Universality and Complexity of GNN for Solving Second-order Cone Programs

This paper designs a graph representation for Second-Order Cone Programming (SOCP) that decomposes nonlinear cone constraints into four types of nodes, together with a matching three-sublayer message-passing GNN. It proves universal approximation capability for SOCP feasibility and optimal solutions, and provides the first Rademacher-based sample complexity bound for WL-type L2O-GNNs. Experiments achieve higher prediction accuracy with significantly fewer parameters than fully connected networks (e.g., 0.35Mb vs. 110Mb on a 500-dimensional problem, ~300× compression).

One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype

UniImb employs a unified framework of "Dynamic Balanced Prototypes + Load Balancing Regularization" to simultaneously address class imbalance (too few samples for minority classes) and topological imbalance (small graphs being overwhelmed by large graphs) in graph classification. It achieves comprehensive leads across 19 datasets compared to 23 baselines.

Out-of-Distribution Graph Models Merging

This paper proposes OGMM to study the novel problem of "Out-of-Distribution Graph Model Merging." With no access to source or target domain data, and allowing the pre-trained GNNs to have heterogeneous architectures, it first generates a small batch of labeled synthetic graphs from each pre-trained GNN by inverting the model. A sparse MoE with masked experts then fine-tunes and merges these models into a unified model capable of generalizing to unseen distributions.

Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding

The authors propose HMAGAT, which replaces the pairwise message passing of GNNs with a directed hypergraph attention network to model group interactions in multi-agent pathfinding. It outperforms SOTA models with 85M parameters using only 1M parameters and 1% of the training data.

PRISM: Partial-label Relational Inference with Spatial and Spectral Cues

PRISM addresses the Partial-label Graph Learning (PLGL) problem, where each graph is assigned a candidate label set containing the ground truth. By extracting spatial cues through prototype-guided substructure alignment and spectral cues via multi-band spectral attention, the model constructs a hybrid relationship graph. It then performs iterative label propagation under candidate constraints to effectively disambiguate labels, significantly outperforming existing weakly supervised graph classification methods across various noise levels.

RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation

This paper proposes the RAS framework, which dynamically constructs query-specific knowledge graphs at inference time. Through a three-stage process of iterative retrieval planning, text-to-triple transformation, and graph-augmented answering, RAS achieves structured reasoning. It delivers improvements of up to 7.0% and 8.7% for open-source and closed-source LLMs, respectively, across 7 knowledge-intensive benchmarks.

ReLaSH: Reconstructing Joint Latent Spaces for Efficient Generation of Synthetic Hypergraphs with Hyperlink Attributes

ReLaSH decomposes the generation of "attributed hypergraphs" into two steps: first, an interpretable likelihood-based embedding model compresses hyperedges and their attributes into a low-dimensional joint latent space; second, a distribution-free score-based diffusion generator reconstructs the data distribution within this low-dimensional space. This approach bypasses the curse of dimensionality associated with high-dimensional discrete structures and significantly outperforms general baselines such as VAE, GAN, and Diffusion on medical records, co-authorship, and recipe datasets.

Relational Graph Transformer

RelGT is proposed as the first Graph Transformer specifically designed for relational databases. By utilizing multi-element tokenization (a 5-tuple of features/type/hop/time/local structure) and a hybrid local-global attention mechanism, it consistently outperforms GNN baselines across 21 tasks in the RelBench benchmark, achieving gains of up to 18%.

Relatron: Automating Relational Machine Learning over Relational Databases

This paper systematically compares the performance of Relational Deep Learning (RDL/GNN) and Deep Feature Synthesis (DFS) on relational database (RDB) prediction tasks. Finding that each has distinct advantages and that the better choice is highly task-dependent, the authors propose Relatron, a meta-selector based on task embeddings that achieves automatic architecture selection through RDB task homophily and affinity embeddings, yielding gains of up to 18.5% in joint architecture-hyperparameter search.

Rethinking the Gold Standard: Why Discrete Curvature Fails to Fully Capture Over-squashing in GNNs?

This paper systematically refutes the "high negative curvature = over-squashing" gold standard in graph learning. By constructing a counterexample graph family, it proves that high negative curvature is a sufficient but not necessary condition for over-squashing. The authors propose the MOSR metric to quantify that curvature misses 30%–40% of squashed edges and introduce a new weighted curvature, WAF3, along with a linear-time MinHash approximation algorithm (23.6 seconds for a graph with 5 million edges, 133.7× faster than existing methods).
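The MinHash idea behind such linear-time approximations can be sketched as a generic estimator of neighbourhood Jaccard overlap: under a random permutation, two sets share the same minimum with probability equal to their Jaccard similarity, and computing all signatures with k permutations costs \(O(k \sum_v \deg v) = O(k|E|)\). How WAF3 combines these overlaps into a curvature is not reproduced; the names are hypothetical, and at scale hash functions would replace explicit permutations.

```python
import numpy as np

def minhash_signatures(neighbors, num_hashes, num_nodes, seed=0):
    """Signature of each (non-empty) neighbour set: its minimum rank under each random permutation."""
    rng = np.random.default_rng(seed)
    perms = np.array([rng.permutation(num_nodes) for _ in range(num_hashes)])  # (k, n)
    sig = np.empty((len(neighbors), num_hashes), dtype=np.int64)
    for v, nbrs in enumerate(neighbors):
        nb = np.fromiter(nbrs, dtype=np.int64)
        sig[v] = perms[:, nb].min(axis=1)
    return sig

def jaccard_estimate(sig, u, v):
    """Fraction of permutations whose minima agree, estimating |N(u) & N(v)| / |N(u) | N(v)|."""
    return float(np.mean(sig[u] == sig[v]))
```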

Revisiting Node Affinity Prediction in Temporal Graphs

This paper identifies that existing Temporal Graph Neural Networks (TGNNs) surprisingly underperform against simple heuristics like Moving Average in "node affinity prediction." The root cause is their inability to express moving averages and their use of cross-entropy loss, which is ill-suited for ranking. Consequently, the authors propose NAVIS—a learnable linear State Space Model (SSM) that generalizes heuristics as special cases. Combined with a ranking loss, NAVIS outperforms both heuristics and all existing TGNNs on the Temporal Graph Benchmark (TGB).
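To see why a linear SSM can subsume the moving-average heuristic, note that an exponential moving average is the special case \(A = \lambda I,\ B = (1-\lambda) I,\ C = I\) of \(h_t = A h_{t-1} + B x_t,\ y_t = C h_t\). A minimal sketch, taking \(x_t\) to be a node's affinity vector at step \(t\); NAVIS's actual parameterisation and its ranking loss are not reproduced.

```python
import numpy as np

def linear_ssm(xs, A, B, C, h0=None):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a sequence of affinity vectors."""
    h = np.zeros(A.shape[0]) if h0 is None else h0
    ys = []
    for x in xs:
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.array(ys)

def ema(xs, lam):
    """Exponential moving average heuristic: m_t = lam * m_{t-1} + (1 - lam) * x_t."""
    m = np.zeros(xs.shape[1])
    out = []
    for x in xs:
        m = lam * m + (1 - lam) * x
        out.append(m)
    return np.array(out)
```

A learnable SSM can therefore start from (or fall back to) the heuristic while remaining free to learn other decay patterns.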

Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses

This paper provides the first horizontal comparison of classic GNNs, Robust GNNs (RGNNs), and GraphLLMs within a unified adversarial robustness evaluation framework for Text-Attributed Graphs (TAGs). It reveals a "text-structure trade-off"—where models can defend against either text or structural attacks but rarely both—and proposes SFT-auto, a defense framework that leverages LLM reasoning to integrate "attack detection + recovery + prediction" into a single model, achieving balanced and superior robustness across both attack types.

SAGA: Structural Aggregation Guided Alignment with Dynamic View and Neighborhood Order Selection for Multiview Graph Domain Adaptation

SAGA addresses unsupervised graph domain adaptation on multi-relational graphs by proposing Structural Aggregation Distance to dynamically select the most transferable combination of views and neighborhood orders during training. This combination guides cross-view and cross-domain alignment, significantly outperforming existing GDA methods on ACM and MAG multi-view graph node classification tasks.

Scaling Knowledge Graph Construction through Synthetic Data Generation and Distillation

Addressing the dilemma of "large models being expensive and small models being poor" in document-level Knowledge Graph (KG) construction, this paper proposes a multi-step synthetic pipeline, SynthKG (chunking → decontextualization → entity/proposition/triple extraction), to generate 100,000 high-quality document-KG training pairs. This multi-step process is distilled into an 8B small model, Distill-SynthKG, enabling single-step inference to produce KGs comparable to models eight times its size, while outperforming GraphRAG and HippoRAG in retrieval and multi-hop QA tasks.

Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs

Self-interpretable GNNs (SI-GNNs) optimize cross-entropy and sparsity during training but are evaluated on faithfulness, creating a training-evaluation misalignment. This paper posits that faithfulness is essentially equivalent to "explanation self-consistency." By introducing a self-consistency (SC) loss that aligns the original explanation with a secondary explanation generated after feeding the first back into the model, a model-agnostic fine-tuning approach is proposed to simultaneously improve explanation quality across consistency, accuracy, faithfulness, and informativeness.

Sheaves Reloaded: A Directional Awakening

This paper proposes Directed Cellular Sheaves, which encode edge directions into phases using complex-valued, direction-aware restriction maps. This construction forms a Hermitian Directed Sheaf Laplacian \(L_{\tilde F}\), leading to DSNN—the first Sheaf Neural Network to embed directional inductive biases into its architecture. It achieves SOTA results on 10 out of 12 node classification benchmarks.

Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design via Graph Transformers

Si-GT models chip interconnects as coupled RC circuit graphs. It utilizes a graph Transformer customized for crosstalk effects (mesh structural encoding + virtual NET tokens + intra/inter-net attention bias) to directly predict crosstalk delay and glitches. The accuracy surpasses existing GNNs and graph Transformers, with an inference time of only 4ms, which is two orders of magnitude faster than SPICE simulation.

<SOG\(_k\)>: One LLM Token for Explicit Graph Structural Understanding

This paper compresses the topology of an entire graph or target node into a single discrete structural token <SOG\(_k\)> that coexists with the native vocabulary of the LLM. By aligning this token with text tokens through structural QA, the method significantly enhances the graph structural understanding of LLMs in molecular graph and node classification tasks with minimal token overhead.

Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs

By constructing paired citation graphs for 10,000 papers (Human vs. GPT-4o Generated vs. Random Baseline), it is found that LLM-generated references are nearly indistinguishable from human ones in terms of graph topology (RF achieves only 60% accuracy). However, they can be effectively detected using semantic embeddings (RF 83%, GNN 93%), indicating that LLMs precisely mimic citation topology while leaving detectable semantic fingerprints.

Structure-Aware Graph Hypernetworks for Neural Program Synthesis

This paper recasts "program synthesis" as continuous optimization within the weight space of a fixed network architecture and proposes Meta-GNN, a structure-aware graph hypernetwork. By representing the target network as a "neural graph" (weights as edges, biases as nodes) and tying encoding/message/decoding parameters within permutation equivalence groups, it collapses redundant supervision caused by neuron permutations. This enables direct one-shot generation of entire weight sets from user intent and significantly improves OOD generalization to unseen intents.

Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton

TGT distills an entire temporal graph sequence into a static "thumbnail" (global evolutionary skeleton). It characterizes structural evolution via von Neumann graph entropy and feature evolution via the Donsker-Varadhan mutual information estimator. This thumbnail then serves as an Information Bottleneck (IB) constraint to guide representation learning. On Bitcoin, MathOverflow, and MOOC datasets, it significantly outperforms SOTA in link prediction under both clean and various noisy perturbations, particularly in fast-evolving and heavy-noise scenarios.
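The structural-evolution signal can be sketched with the standard von Neumann graph entropy: normalise the combinatorial Laplacian \(L = D - A\) by its trace and take the Shannon entropy of its eigenvalues. This assumes an undirected snapshot with at least one edge; how TGT aggregates per-snapshot entropies into the thumbnail is not shown.

```python
import numpy as np

def von_neumann_entropy(A):
    """S = -sum_i mu_i log mu_i, where mu_i are the eigenvalues of L / tr(L) and L = D - A."""
    L = np.diag(A.sum(1)) - A
    mu = np.linalg.eigvalsh(L) / np.trace(L)
    mu = mu[mu > 1e-12]                  # convention 0 log 0 = 0; also drops round-off negatives
    return float(-(mu * np.log(mu)).sum())
```

Applied to a sequence of snapshots, e.g. `[von_neumann_entropy(A_t) for A_t in snapshots]`, this yields one entropy curve per temporal graph. For the complete graph \(K_n\) the value is \(\log(n-1)\).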

TGM: A Modular and Efficient Library for Machine Learning on Temporal Graphs

TGM is the first temporal graph learning research framework to unify Continuous-Time Dynamic Graphs (CTDG) and Discrete-Time Dynamic Graphs (DTDG) under the same data abstraction. By using "event streams + time granularity iteration" to unify both paradigms and a composable Hook mechanism to standardize data transformations, it achieves an average end-to-end training speedup of 7.8× over the widely used library DyGLib, with graph discretization being 175× faster on average.

TopoFormer: Topology Meets Attention for Graph Learning

TopoFormer segments a graph into a sequence of local topological slices based on node or edge filtration functions. It constructs a short token sequence using Betti numbers and scale statistics from each slice, which is then processed by a Transformer to learn graph-level representations. This approach achieves performance comparable to or exceeding strong baselines in graph classification and molecular property prediction at a lower computational cost.
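A sketch of slice-wise topological tokens, assuming cumulative sublevel slices \(\{v : f(v) \le t\}\) of a node filtration function \(f\) (TopoFormer's exact slicing and scale statistics may differ, and the names are hypothetical). For a graph, \(b_0\) is the number of connected components and \(b_1 = |E| - |V| + b_0\) counts independent cycles; here each token is \((b_0, b_1, \#\text{nodes}, \#\text{edges})\), and the Transformer that consumes the token sequence is omitted.

```python
import numpy as np

def betti_0_1(nodes, edges):
    """Betti numbers of a graph as a 1-D complex: b0 = #components, b1 = |E| - |V| + b0."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for u, w in edges:
        parent[find(u)] = find(w)            # union the two endpoints' components
    b0 = len({find(v) for v in nodes})
    return b0, len(edges) - len(nodes) + b0

def filtration_tokens(f, edges, thresholds):
    """One token per sublevel slice {v : f(v) <= t}: (b0, b1, #nodes, #edges)."""
    tokens = []
    for t in thresholds:
        nodes = [v for v in range(len(f)) if f[v] <= t]
        keep = set(nodes)
        sub_edges = [(u, w) for u, w in edges if u in keep and w in keep]
        b0, b1 = betti_0_1(nodes, sub_edges)
        tokens.append([b0, b1, len(nodes), len(sub_edges)])
    return np.array(tokens, dtype=float)
```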

Topological Anomaly Quantification for Semi-Supervised Graph Anomaly Detection

Aiming at semi-supervised graph anomaly detection with "only normal node labels," TAQ-GAD quantifies the "anomaly degree" of each labeled normal node using two pure topological indicators (Boundary Score NBS + Isolation Score PIS). It filters high-quality pseudo-anomaly nodes and utilizes a Topological Anomaly Enhancement (TAE) module to generate virtual anomaly centers and reconnect graph structures. The model is jointly trained on the augmented graph, consistently outperforming SOTA methods such as GGAD across 6 datasets.

Topological Flow Matching

By reinterpreting flow matching as a "degenerate Schrödinger bridge in the zero-noise limit" and augmenting its reference process with a heat diffusion drift derived from the Hodge Laplacian, the authors propose Topological Flow Matching (TFM). TFM is a topology-aware generative framework that retains simulation-free training objectives and deterministic sampling paths, serving as a plug-and-play replacement for standard flow matching. It significantly outperforms flow matching and topological Schrödinger bridges on structured signals such as brain fMRI, ocean currents, earthquakes, and traffic.

Topology Matters in RTL Circuit Representation Learning

Addressing the issue where existing RTL representation learning treats Verilog as ordinary code and ignores hardware topology, TopoRTL decomposes circuits into register cones and constructs a "Graph + Text Summary" dual-modality. By injecting three topology-aware positional encodings into the attention mechanism and employing topology-guided cross-modal alignment, it surpasses 7B-parameter Large Language Models in PPA prediction and circuit retrieval tasks with only 29M parameters.

Towards a Foundation Model for Crowdsourced Label Aggregation

CrowdFM upgrades the task of "inferring ground truth from noisy crowdsourced labels" from "estimating parameters per individual dataset" to "a single pre-trained bipartite graph neural network that handles everything zero-shot." By pre-training an attention-based GNN that explicitly models workers, tasks, and options on domain-randomized synthetic crowdsourced data, it matches or even surpasses dataset-specific customized methods across 22 real-world datasets without any retraining, with an inference time of only 0.53 seconds per dataset.

Towards Improved Sentence Representations using Token Graphs

The authors propose Glot, a lightweight structure-aware pooling module that constructs latent similarity graphs from the token-level hidden states of frozen LLMs. These are refined via GNNs and aggregated into sentence representations, achieving performance competitive with fine-tuning on GLUE/MTEB while requiring 20× fewer parameters and 100× faster training.

Towards Quantifying Long-Range Interactions in Graph Machine Learning: A Large Graph Dataset and a Measurement

This paper constructs City-Networks, a million-node scale, large-diameter transductive dataset based on the road networks of four real-world cities. It uses "local eccentricity" to label node classification tasks that naturally require long-range information. Building on this, it proposes a Jacobian-based "hop-wise influence" metric to directly quantify the distance of the neighbor information utilized by any GNN/GT. This replaces the traditional indirect argument of "global attention vs. local aggregation" performance gaps with a measurable, comparable, and theoretically grounded solution.

Training-Free Counterfactual Explanation for Temporal Graph Model Inference

TemGX is a training-free, model-agnostic, and queryable explanation framework for Temporal Graph Neural Networks (TGNNs). It formalizes the question "which historical subgraph led to the current prediction" as a sliding-window counterfactual analysis. Using an explainability score that integrates "cascade influence + temporal effective resistance + temporal decay," it identifies nodes that truly impact decisions. The framework efficiently generates explanations via a greedy "Select-Verify" algorithm with a \((1-1/e)\) approximation guarantee, significantly outperforming existing TGNN explainers in fidelity and speed.
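TemGX's explainability score is not reproduced here. As a stand-in, the sketch maximises a set-coverage function, which is monotone submodular, so the greedy rule attains at least \((1-1/e)\) of the optimum under a cardinality budget. `verify` is a hypothetical stopping test standing in for the counterfactual check (for example, whether removing the selected events flips the prediction), and candidates are assumed to be integer ids.

```python
def greedy_select_verify(candidates, gain_sets, budget, verify=None):
    """Greedy max-coverage with an optional early-stopping check after each pick.

    For a monotone submodular objective under a cardinality budget, this greedy
    rule is guaranteed to reach at least (1 - 1/e) of the optimal value.
    """
    selected, covered = [], set()
    remaining = set(candidates)
    while remaining and len(selected) < budget:
        # Largest marginal gain; ties broken toward the smaller candidate id.
        best = max(remaining, key=lambda c: (len(gain_sets[c] - covered), -c))
        if not gain_sets[best] - covered:
            break                                  # no marginal gain left
        selected.append(best)
        covered |= gain_sets[best]
        remaining.discard(best)
        if verify is not None and verify(selected):
            break                                  # explanation already sufficient
    return selected, covered
```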

UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction

UrbanGraph encodes known physical laws—such as solar shading, vegetation evapotranspiration, and convective diffusion—directly into the graph topology. By reconstructing a sparse dynamic heterogeneous graph hourly based on physical equations and utilizing an RGCN+LSTM architecture to decouple spatial and temporal features, it achieves SOTA performance on urban microclimate prediction (\(R^2=0.8542\)). Compared to the implicit dynamic graph baseline LRGCN, it reduces FLOPs by 73.8% and accelerates training by 21%.

WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks

WATS is a post-hoc calibration framework for node classification. It predicts a personalized temperature for each node using heat-kernel graph wavelet features with adjustable scales to scale logits. Without retraining the model or relying on neighbor logits, WATS aligns GNN confidence with true accuracy, reducing ECE by up to 41.2% across 9 datasets.
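A sketch of per-node temperature scaling driven by heat-kernel features. It assumes the symmetric normalised Laplacian, uses the diagonal of \(e^{-sL}\) at each scale as a node's wavelet feature, and maps features to a temperature with a hypothetical linear head followed by softplus; WATS's exact features and head may differ, and fitting `w`, `b` on validation data (e.g. by minimising NLL) is omitted. An equal-width-bin ECE helper is included for evaluation.

```python
import numpy as np

def heat_wavelet_features(A, scales):
    """Diagonal of the heat kernel exp(-s L) per scale, L = I - D^-1/2 A D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(1))                 # assumes no isolated nodes
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L)
    # diag(U exp(-s Lambda) U^T)_i = sum_k U_ik^2 exp(-s lambda_k)
    return np.stack([(U ** 2) @ np.exp(-s * lam) for s in scales], axis=1)  # (N, S)

def node_temperatures(features, w, b):
    """Per-node temperature T_i = softplus(phi_i . w + b) + 1e-3 (hypothetical head)."""
    return np.logaddexp(0.0, features @ w + b) + 1e-3

def calibrated_probs(logits, temps):
    """Softmax of logits divided by each node's own temperature."""
    z = logits / temps[:, None]
    z = z - z.max(1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(1, keepdims=True)

def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width confidence bins."""
    conf = probs.max(1)
    correct = probs.argmax(1) == labels
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            total += m.mean() * abs(correct[m].mean() - conf[m].mean())
    return float(total)
```

Because every temperature is positive, scaling never changes a node's predicted class; it only reshapes confidence, which is what ECE measures.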