GFlowNets for Learning Better Drug-Drug Interaction Representations

Conference: NeurIPS 2025
arXiv: 2508.06576
Code: N/A
Area: Medical AI / Drug Discovery
Keywords: Drug-drug interaction, GFlowNet, variational graph autoencoder, class imbalance, graph generation

TL;DR

To address the severe class imbalance in drug-drug interaction (DDI) prediction, this paper combines a GFlowNet with a variational graph autoencoder (VGAE). Through reward-guided generative sampling, the framework synthesizes training samples for rare interaction types, improving predictive performance on infrequent yet clinically critical interaction categories.

Background & Motivation

Background: DDI prediction is a critical task for drug safety. Existing approaches leverage diverse features such as chemical structures and biological networks to construct predictive models.

Limitations of Prior Work: DDI datasets suffer from severe class imbalance — common interaction types (e.g., synergistic effects) dominate the data, while rare but clinically important interaction types are substantially underrepresented, leading to poor model performance on low-frequency classes.

Key Challenge: Prevailing SOTA methods predominantly formulate DDI prediction as a binary classification problem (interaction / no interaction), overlooking the semantic heterogeneity across interaction types and thereby amplifying bias toward frequent categories.

Goal: To improve coverage and prediction accuracy for rare interaction types without sacrificing performance on frequent categories.

Key Insight: Exploiting the reward-proportional sampling property of GFlowNets to selectively generate synthetic DDI samples for low-frequency classes.

Core Idea: A GFlowNet is trained to generate synthetic DDI samples according to a reward function defined as "rarity × plausibility," thereby restoring class balance in the training data.

Method

Overall Architecture

The framework consists of a three-stage pipeline: (1) pre-train a VGAE on the original imbalanced data to learn drug embeddings; (2) train a GFlowNet to learn a policy for generating synthetic DDI samples; (3) augment the original data with synthetic samples and retrain the VGAE to obtain the final model.

Key Designs

  1. Variational Graph Autoencoder (VGAE):

    • Function: Learns graph-structured latent representations of drugs and predicts DDI types.
    • Design Motivation: Graph structures naturally model the multi-relational interaction network among drugs.
    • Mechanism: The encoder is a Relational Graph Convolutional Network (R-GCN) that outputs a variational posterior for each drug, \(q_\phi(\mathbf{z}_i | \mathcal{G}) = \mathcal{N}(\mathbf{z}_i | \boldsymbol{\mu}_i, \text{diag}(\boldsymbol{\sigma}_i^2))\); the decoder employs DistMult or MLP to predict interaction type probabilities.
    • Training Objective: Maximizes the ELBO, comprising a reconstruction term and KL divergence regularization.
  2. GFlowNet Synthetic DDI Generation:

    • Function: Samples synthetic DDI triples \((d_i, d_j, t)\) in proportion to a reward signal.
    • Design Motivation: GFlowNets learn policies under which the generation probability is proportional to the reward, making them naturally suited for sampling biased toward rare categories.
    • Mechanism: A three-step trajectory is defined: select interaction type \(t\) → select the first drug \(d_i\) → select the second drug \(d_j\) from the \(K\)-nearest neighbors of \(d_i\). The reward function is \[R(t, d_i, d_j) = \underbrace{\left(\frac{1}{n_t + 1}\right)^\alpha}_{\text{rarity}} \times \underbrace{p_\theta(t | \mathbf{z}_i, \mathbf{z}_j)}_{\text{plausibility}}\] where \(n_t\) denotes the frequency of type \(t\) and \(\alpha\) controls the strength of preference toward rare classes.
    • Novelty: Unlike simple oversampling or SMOTE, samples generated by GFlowNet are constrained by the VGAE plausibility score, preventing the synthesis of implausible drug pairs.
  3. Trajectory Balance (TB) Loss Training:

    • Function: Trains the forward policy network of the GFlowNet.
    • Design Motivation: The TB loss enforces flow-matching conditions over complete trajectories, ensuring the sampling distribution converges to one proportional to the reward.
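    • Mechanism: \[\mathcal{L}_{\text{TB}}(\psi) = \left(\log \frac{Z_\psi \prod_{s \to s' \in \tau} P_F(s'|s;\psi)}{R(s_f)}\right)^2\] where \(Z_\psi\) is a learnable partition function (total flow), \(\tau\) is a complete trajectory, and \(s_f\) is its terminal state. No backward-policy term appears because every state in the type → drug → drug trajectory has exactly one parent, so the backward probability is 1.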
    • Novelty: Compared to the Detailed Balance (DB) loss, TB operates over complete trajectories and exhibits greater training stability.
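
The reward and the TB loss above can be made concrete with a small numeric sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the type counts, the plausibility scores, and the forward-policy probabilities are all made up, and the backward policy is taken to be 1 as in the formula above.

```python
import numpy as np

def reward(n_t, plausibility, alpha=1.0):
    """R = (1 / (n_t + 1))^alpha * plausibility, as in the reward definition."""
    rarity = (1.0 / (n_t + 1.0)) ** alpha
    return rarity * plausibility

def tb_loss(log_Z, log_pf_steps, log_reward):
    """Squared log-ratio of Z * prod(P_F) to R for one complete trajectory.

    Assumes a backward policy of 1 (each state has a single parent).
    """
    return float((log_Z + np.sum(log_pf_steps) - log_reward) ** 2)

# Hypothetical counts and decoder scores, chosen only for illustration.
r_common = reward(n_t=999, plausibility=0.9)  # frequent type, high plausibility
r_rare = reward(n_t=9, plausibility=0.6)      # rare type, lower plausibility

# Hypothetical forward probabilities for the three steps: type, d_i, d_j.
log_pf = np.log([0.5, 0.1, 0.2])

# With an arbitrary log Z the trajectory is unbalanced and the loss is positive.
loss = tb_loss(log_Z=0.0, log_pf_steps=log_pf, log_reward=np.log(r_rare))

# Choosing log Z so that Z * prod(P_F) = R drives the loss to zero.
balanced_log_Z = np.log(r_rare) - np.sum(log_pf)
loss_balanced = tb_loss(balanced_log_Z, log_pf, np.log(r_rare))
```

In this toy setting the rare triple receives a much larger reward than the common one even though its plausibility is lower, which is the bias the rarity term is meant to introduce.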

Loss & Training

  • Stage 1: Pre-train the VGAE on the original imbalanced data to obtain drug embeddings \(\mathbf{Z}\) and decoder \(p_\theta\).
  • Stage 2: Freeze the VGAE; use its embeddings and decoder to compute rewards and train the GFlowNet policy.
  • Stage 3: Sample \(N\) synthetic DDI triples using the trained GFlowNet, merge them with the original data, and retrain the VGAE.
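
The augmentation step in Stage 3 can be sketched on a toy problem. The sketch below makes a strong simplifying assumption: instead of training a GFlowNet, it samples directly from the reward-proportional distribution that a perfectly trained GFlowNet would reach, which is feasible here only because the toy space is small enough to enumerate. The counts, the random plausibility scores standing in for the frozen decoder, and all variable names are hypothetical, and the \(K\)-nearest-neighbor restriction is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_types, n_drugs, alpha = 3, 4, 1.0

# Hypothetical per-type counts n_t: one frequent, one medium, one rare type.
counts = np.array([500, 50, 5])

# Stand-in for the frozen decoder p_theta(t | z_i, z_j); random for illustration.
plaus = rng.uniform(0.1, 1.0, size=(n_types, n_drugs, n_drugs))

# Reward R(t, d_i, d_j) = rarity(t) * plausibility(t, d_i, d_j).
R = ((1.0 / (counts + 1.0)) ** alpha)[:, None, None] * plaus
R[:, np.arange(n_drugs), np.arange(n_drugs)] = 0.0  # disallow self-pairs

# Target distribution of an ideal reward-proportional sampler.
target = R / R.sum()
type_marginal = target.sum(axis=(1, 2))

# Draw N synthetic triples (d_i, d_j, t) from the target distribution.
N = 1000
flat = rng.choice(R.size, size=N, p=target.ravel())
t, i, j = np.unravel_index(flat, R.shape)
synthetic = np.stack([i, j, t], axis=1)

# Merge with a (hypothetical) original training set before retraining the VGAE.
original = np.array([[0, 1, 0], [1, 2, 0], [2, 3, 1]])
augmented = np.concatenate([original, synthetic], axis=0)
```

Here `type_marginal` places most of its mass on the rarest type, so the synthetic triples mostly fill in the under-represented class while the plausibility score still shapes which drug pairs are chosen.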

Key Experimental Results

Main Results

Dataset: DrugBank (1,703 drugs, 191,870 drug pairs, 86 DDI types)

| Metric | w/o GFlowNet | w/ GFlowNet |
|---|---|---|
| AUROC | 0.99081 | 0.99071 |
| Accuracy | 0.96859 | 0.96792 |
| AUPRC | 0.98861 | 0.98922 |
| F1 Score | 0.98982 | 0.99914 |
| Shannon Entropy (SE) ↑ | 1.23 | 1.69 |
| Jensen-Shannon Divergence (JSD) ↓ | 0.35 | 0.12 |
| Coverage ↑ | 0.2441 | 0.7709 |

Ablation Study

No ablation table is provided; key conclusions are drawn by comparing classification metrics against diversity metrics:

| Evaluation Dimension | Observation |
|---|---|
| Classification Performance | AUROC/Accuracy remain essentially unchanged (~0.99), indicating that synthetic augmentation does not harm majority classes. |
| Diversity | SE increases from 1.23 to 1.69 (+37%), reflecting a more uniform distribution. |
| Distribution Alignment | JSD decreases from 0.35 to 0.12 (−66%), indicating closer alignment between synthetic and real distributions. |
| Coverage | Increases from 0.2441 to 0.7709 (+216%), substantially improving coverage of rare interaction types. |
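
For readers who want to reproduce this kind of analysis, the sketch below shows one common way to compute the three diversity metrics and checks the relative changes quoted above. The exact definitions used in the paper are not given in this summary, so the choices here are assumptions: entropy and JSD use the natural logarithm over a discrete type distribution, and coverage is taken as the fraction of interaction types that appear at least once. All function names are hypothetical.

```python
import numpy as np

def shannon_entropy(p):
    """Entropy (in nats) of a discrete distribution; zero entries are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return float((a[mask] * np.log(a[mask] / b[mask])).sum())

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def coverage(type_ids, n_types):
    """Assumed definition: share of the n_types that occur at least once."""
    return len(set(type_ids)) / n_types

def relative_change(before, after):
    """Relative change from before to after, as a fraction."""
    return (after - before) / before

# Relative changes implied by the reported values in the table above.
se_change = relative_change(1.23, 1.69)
jsd_change = relative_change(0.35, 0.12)
cov_change = relative_change(0.2441, 0.7709)
```

Rounded to whole percentages, the three relative changes agree with the +37%, −66%, and +216% figures in the table.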

Key Findings

  • Conventional classification metrics (AUROC, Accuracy) remain nearly unchanged, as they are dominated by high-frequency classes.
  • Meaningful improvements manifest in diversity metrics: Coverage increases from 24.4% to 77.1%, indicating that the model can cover most interaction types.
  • The GFlowNet reward design ensures that generated samples are both biased toward rare classes (via the rarity term) and remain plausible (via the VGAE decoder score).

Highlights & Insights

  • Precise Problem Framing: The paper targets the often-neglected class imbalance problem in DDI prediction rather than simply pursuing overall classification accuracy.
  • Elegant Framework Design: The reward-proportional sampling of GFlowNets aligns naturally with the data augmentation objective; the composite reward of rarity × plausibility is concise and effective.
  • Appropriate Evaluation Metrics: The use of Shannon Entropy and JSD, rather than relying solely on classification metrics, more faithfully reveals improvements in class distribution coverage.
  • Modular Design: The decoupled design of VGAE and GFlowNet makes the framework generalizable to other imbalanced graph classification problems.

Limitations & Future Work

  • Validation is conducted on a single dataset (DrugBank), with no cross-dataset generalization experiments.
  • Classification metrics show negligible change; per-class F1 scores would better demonstrate improvements on rare interaction types.
  • No comparison with alternative data augmentation methods (e.g., SMOTE, GANs, mixup) is provided.
  • Sensitivity analysis for GFlowNet hyperparameters (\(\alpha\), candidate set size \(K\), number of synthetic samples \(N\)) is absent.
  • The experimental section is relatively thin, comprising only one main results table and lacking thorough ablation studies and analysis.

Related Work & Insights

  • GFlowNet (Bengio et al., 2023): Provides the theoretical foundation for reward-proportional sampling.
  • VGAE (Kipf & Welling): Variational graph autoencoder serves as the backbone for drug representation learning.
  • DDI Prediction: Existing methods such as MFConv and GraphDTA focus on binary classification; this paper complements them with a multi-class perspective.
  • Insights: The combination of GFlowNet with domain-specific graph models is transferable to other imbalanced biomedical problems, such as rare disease modeling and adverse drug reaction prediction.

Rating

  • Novelty: ⭐⭐⭐⭐ Applying GFlowNet to DDI data augmentation is a novel combination; the reward function design is elegant.
  • Experimental Thoroughness: ⭐⭐ Single dataset, lacking comparative baselines and ablation studies.
  • Writing Quality: ⭐⭐⭐ Method description is clear, but the experimental section is overly brief.
  • Value: ⭐⭐⭐ The approach is conceptually valuable, but insufficient experimental validation limits its persuasiveness.