Skip to content

SPHINX: Structural Prediction using Hypergraph Inference Network

Conference: ICML2025
arXiv: 2410.03208
Code: To be confirmed
Area: Structure Learning / Hypergraph Inference
Keywords: Hypergraph Inference, Higher-Order Relations, k-subset Sampling, Slot Attention, Trajectory Prediction

TL;DR

This paper proposes SPHINX, an unsupervised hypergraph inference model that frames hyperedge discovery as a sequential soft clustering problem. By employing differentiable k-subset sampling, SPHINX generates discrete, sparse hypergraph structures that can be seamlessly integrated into any hypergraph neural network. SPHINX achieves a 90% overlap rate in hypergraph reconstruction on synthetic data and outperforms existing methods in NBA trajectory prediction and 3D object classification.

Background & Motivation

Background: Graph Neural Networks (GNNs) are widely used for relational data modeling but are limited to representing pairwise relationships. Many real-world systems (in neuroscience, chemistry, biology, and team sports) naturally exhibit multi-entity, high-order interactions, which necessitate hypergraph representations. However, hypergraph datasets are extremely scarce—current data acquisition techniques often only detect pairwise relationships, or the collected high-order data is simplified into pairwise formats when released.

Limitations of Prior Work: Hypergraph inference is significantly more challenging than graph inference, as the number of candidate edges explodes from \(O(N^2)\) to \(O(2^N)\). Existing methods suffer from multiple drawbacks: (1) most of them only work in the transductive setting (optimizing a single fixed hypergraph) and cannot generalize to inductive tasks; (2) using Gumbel-Softmax sampling prevents control over the hyperedge size, leading to unstable optimization and the need for extra regularization; (3) top-k selection only propagates partial gradients, leading to insufficient optimization.

Key Challenge: Hypergraph inference must simultaneously satisfy three conflicting requirements: end-to-end differentiability (requiring continuous relaxation), discrete and sparse output (as hypergraph networks require discrete incidence matrices), and controllable sparsity (to obtain reasonable hyperedge sizes without auxiliary regularization).

Key Insight: This paper reformulates hyperedge discovery as a clustering problem—where each hyperedge corresponds to a cluster of node subsets—and leverages recent breakthroughs in differentiable k-subset sampling to generate discrete hyperedges of exactly k nodes while preserving differentiability.

Core Idea: Using sequential slot attention for soft clustering to generate hyperedge probability distributions, followed by differentiable k-subset sampling to produce discrete, sparse hypergraphs. The entire framework is trained end-to-end in a weakly supervised manner using only downstream task losses.

Method

Overall Architecture

SPHINX takes a node feature matrix \(\mathbf{X} \in \mathbb{R}^{N \times f}\) and outputs a discrete hypergraph \(\mathcal{H} = (V, E)\) alongside node predictions \(\hat{y}_i\). The pipeline consists of three phases: (A) a hypergraph predictor converts node features into a probabilistic incidence matrix \(\mathbf{P} \in \mathbb{R}^{N \times M}\) via sequential clustering; (B) a differentiable k-subset sampler transforms the probabilistic incidence into a discrete binary incidence matrix; and (C) an arbitrary hypergraph neural network performs message passing over the inferred hypergraph to generate final predictions.

Key Designs

  1. Sequential Slot Attention Hyperedge Predictor:

    • Function: Converts node features into a probabilistic membership distribution over \(M\) hyperedges, while escaping the ambiguity of multiple slots converging to the same hyperedge.
    • Mechanism: Drawing inspiration from the Slot Attention algorithm used for unsupervised object discovery in computer vision, this module creates \(M\) randomly initialized slot vectors \(s_j \in \mathbb{R}^f\). Slots are iteratively updated by weighted aggregation based on learnable dot-product similarity with node features: \(s_j^{q+1} = \sum_i \sigma(f_1(s_j^q)^T f_2(x_i)) f_3(x_i)\). The slot representations mature after \(Q\) iterations. The key improvement is sequential inference: rather than inferring \(M\) hyperedges simultaneously (which causes slots to compete and collapse into duplicate hyperedges), they are inferred one by one. When inferring the \(j\)-th hyperedge, each node feature is concatenated with a binary vector \(b_i \in \{0,1\}^{M-1}\) indicating the node's membership in previously inferred hyperedges. This informs subsequent slots about what has already been discovered, promoting more diverse hyperedges.
    • Design Motivation: In standard Slot Attention, slots are symmetric, causing multiple slots to focus on the same prominent hyperedge. Sequential inference combined with historical membership information breaks this symmetry.
  2. Differentiable k-subset Discrete Sampling:

    • Function: Converts the continuous probability distribution \(p_{:,j}\) of each hyperedge into a discrete subset containing exactly \(k\) nodes.
    • Mechanism: Unlike Gumbel-Softmax (which samples each node-hyperedge incidence independently and cannot control hyperedge size) or top-k selection (which allows size control but is not fully differentiable), the k-subset sampler guarantees outputs containing exactly \(k\) elements while providing gradient estimates for backpropagation. The concrete implementation utilizes SIMPLE or AIMLE gradient estimators. The parameter \(k\) is set as a hyperparameter, allowing different hyperedges to have different \(k\) values.
    • Design Motivation: Unconstrained sampling in the early stages of training can produce hyperedges that are too large or too small, severely disrupting the optimization process. The exact cardinality constraint of k-subset sampling provides a stable optimization environment, eliminating the need for sparsity regularization and other training tricks required by Gumbel-Softmax methods.
  3. Plug-and-play Hypergraph Processing Compatibility:

    • Function: Ensures that the inferred hypergraph can be fed into any existing hypergraph network for downstream tasks.
    • Mechanism: Since the output is a standard discrete incidence matrix \(\mathcal{H} = (V, E)\)—which matches the format expected by any hypergraph network—SPHINX can be seamlessly coupled with various hypergraph processing architectures, such as AllDeepSets, HGNN, and HCHA. The experiments mostly use a two-stage message-passing mechanism: node-to-hyperedge aggregation \(z_j = f_{V\to E}(\{x_i | v_i \in e_j\})\) and hyperedge-to-node distribution \(x_i = f_{E\to V}(\{z_j | v_i \in e_j\})\).
    • Design Motivation: Hypergraph inference should serve as a general-purpose front-end module that is decoupled from specific downstream architectures. The discrete output (rather than soft-weighted incidences) ensures compatibility with any standard hypergraph network.

Loss & Training

The entire model is trained end-to-end, relying solely on downstream task losses (such as MSE for trajectory prediction or Cross-Entropy for classification) without any supervision over the hypergraph structure. The hypergraph inference module receives gradient signals via gradient estimators for the k-subset sampler. The model uses the Adam optimizer. It is trained for 1000 epochs on the particle simulation task and 300 epochs on the NBA dataset, with an initial learning rate of 0.001 that decays by \(0.1\times\) upon reaching a plateau.

Key Experimental Results

NBA Trajectory Prediction: Inductive Hypergraph Predictor Comparison (ADE/FDE)

Method 1sec 2sec 3sec 4sec
SPHINX w/o sequential 0.62/0.94 1.18/2.08 1.73/3.14 2.25/4.08
SPHINX w_Gumbel 1.29/1.81 2.10/3.34 2.81/4.60 3.45/5.66
TDHNN 0.68/1.03 1.27/2.22 1.84/3.31 2.30/4.29
GroupNet 0.65/1.03 1.38/2.61 2.15/4.11 2.83/5.15
SPHINX 0.59/0.92 1.12/2.06 1.65/3.13 2.14/4.09

Transductive Classification (NTU/ModelNet40, Accuracy %)

Model NTU GVCNN NTU GV+MV ModelNet GVCNN ModelNet GV+MV
HGNN (static) 82.50±1.62 83.64±0.37 91.80±1.73 96.96±1.43
DHGNN (dynamic) 82.30±0.98 85.13±0.26 92.13±1.55 96.99±1.46
TDHNN* (dynamic) 92.70±1.61 92.70±1.26 96.96±0.34 98.69±0.30
SPHINX 94.80±0.90 94.67±0.71 97.29±0.29 98.92±0.21

Key Findings

  • Hypergraph inference is highly aligned with ground-truth structures: In synthetic particle simulations, hypergraphs inferred by SPHINX in an unsupervised manner achieve a 90% overlap with the ground-truth high-order connections (vs. 67% for TDHNN), despite the model never being explicitly optimized for hypergraph reconstruction.
  • Sequential prediction is critical: In the non-sequential variant, multiple slots lock onto the same hyperedge (which is clearly illustrated in the experimental visualization). The sequential version avoids this issue by leveraging historical membership information.
  • Massive gap between k-subset and Gumbel-Softmax: The Gumbel-Softmax variant yields an ADE of up to 3.45 on NBA 4sec (vs. 2.14 for SPHINX), demonstrating that unconstrained sampling leads to unstable optimization.
  • Inferred hypergraphs steadily improve throughout training (Fig 2c) and remain independent of the downstream hypergraph processing architecture (AllDeepSets/HGNN/HCHA)—validating the general plug-and-play capability.
  • Qualitative analysis of TDHNN reveals that all of its hyperedges eventually contain all nodes—inherently learning attention weights rather than a true sparse structure.

Highlights & Insights

  • Reformulating hyperedge discovery as sequential clustering is an intuitive yet highly practical insight. Migrating Slot Attention from the computer vision domain to the hypergraph domain is remarkably natural—where "objects" correspond to "hyperedges" and "pixels" to "nodes". This cross-domain migration strategy is highly inspirational.
  • The decisive impact of k-subset sampling on the quality of hypergraph inference is impressive. A seemingly pure engineering choice of sampling strategy actually determines the success of structural inference. This highlights the vital role of discretization strategies in structure learning.
  • Achieving a 90% structural reconstruction purely through unsupervised inference suggests that downstream task losses already embody sufficient high-order structural information, bypassing the need for explicit structural supervision.
  • The qualitative comparison with TDHNN is particularly compelling: TDHNN's hyperedges degenerate into fully connected cliques (containing all nodes), implying its performance stems from attention weights rather than structural inference. In contrast, the sparse and discrete structures generated by SPHINX represent genuine high-order relationship modeling.
  • End-to-end training is elegant and clean, requiring no auxiliary regularization losses or complex training tricks.

Limitations & Future Work

  • The hyperedge cardinality \(k\) and the number of hyperedges \(M\) are hyperparameters that must be predefined, which can be difficult to select without prior domain knowledge.
  • Sequential prediction introduces order dependency, where the quality of the first hyperedge affects all subsequent ones. Whether there are better symmetry-breaking mechanisms remains an open question.
  • Efficiency on large-scale graphs (\(N > 10^4\)) is unverified, as the \(O(NM)\) complexity of slot attention may become a bottleneck.
  • The model is only validated in static hypergraph inference scenarios; adaptation to temporal, dynamic hypergraphs (e.g., social network evolution) has not been explored.
  • It assumes all hyperedges contain the same number of nodes (\(k\) is fixed), whereas in real-world scenarios, hyperedge sizes typically vary significantly.
  • vs EvolveHypergraph (Li et al. 2022): EvolveHypergraph uses Gumbel-Softmax sampling, requiring sparsity regularization and an auxiliary pairwise loss. SPHINX eliminates the need for these tricks by leveraging k-subset sampling.
  • vs TDHNN (Zhou et al. 2023): TDHNN employs differentiable clustering paired with top-k selection, but top-k only partially propagates gradients, and qualitative analysis reveals its hyperedges degenerate into containing all nodes.
  • vs NRI (Fetaya et al. 2018): NRI is a pioneer in graph (pairwise relation) inference. SPHINX generalizes this to the hypergraph domain, which faces the challenge of an exponentially large candidate space.
  • Insight: One can potentially combine SPHINX's hypergraph inference with graph structure learning to build a hierarchical relation inference framework.

Rating

  • Novelty: ⭐⭐⭐⭐ The combination of sequential slot attention and k-subset sampling is both original and highly effective.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Includes synthetic, inductive, and transductive scenarios, with detailed ablation studies and convincing qualitative analyses.
  • Writing Quality: ⭐⭐⭐⭐ The design motivation and desiderata are clearly articulated.
  • Value: ⭐⭐⭐⭐ Provides a significant boost to the fields of high-order relationship learning and structure inference.
  • Overall: ⭐⭐⭐⭐ Highly versatile, requiring no auxiliary regularization while producing interpretable structures. It stands as a landmark work in unsupervised hypergraph inference.