GFlowNets for Learning Better Drug-Drug Interaction Representations

Conference: NeurIPS 2025
arXiv: 2508.06576
Code: N/A
Area: Medical AI / Drug Discovery
Keywords: Drug-drug interaction, GFlowNet, variational graph autoencoder, class imbalance, graph generation

TL;DR

To address the severe class imbalance in drug-drug interaction (DDI) prediction, this paper combines a GFlowNet with a variational graph autoencoder (VGAE). Through reward-guided generative sampling, the framework synthesizes training samples for rare interaction types, improving predictive coverage of infrequent yet clinically critical interaction categories.

Background & Motivation

Background: DDI prediction is a critical task for drug safety. Existing approaches leverage diverse features such as chemical structures and biological networks to construct predictive models.

Limitations of Prior Work: DDI datasets suffer from severe class imbalance — common interaction types (e.g., synergistic effects) dominate the data, while rare but clinically important interaction types are substantially underrepresented, leading to poor model performance on low-frequency classes.

Key Challenge: Prevailing SOTA methods predominantly formulate DDI prediction as a binary classification problem (interaction / no interaction), overlooking the semantic heterogeneity across interaction types and thereby amplifying bias toward frequent categories.

Goal: To improve coverage and prediction accuracy for rare interaction types without sacrificing performance on frequent categories.

Key Insight: Exploiting the reward-proportional sampling property of GFlowNets to selectively generate synthetic DDI samples for low-frequency classes.

Core Idea: A GFlowNet is trained to generate synthetic DDI samples according to a reward function defined as "rarity × plausibility," thereby restoring class balance in the training data.

Method

Overall Architecture

The framework consists of a three-stage pipeline: (1) pre-train a VGAE on the original imbalanced data to learn drug embeddings; (2) train a GFlowNet to learn a policy for generating synthetic DDI samples; (3) augment the original data with synthetic samples and retrain the VGAE to obtain the final model.

Key Designs

Variational Graph Autoencoder (VGAE):
- Function: Learns graph-structured latent representations of drugs and predicts DDI types.
- Design Motivation: Graph structures naturally model the multi-relational interaction network among drugs.
- Mechanism: The encoder is a Relational Graph Convolutional Network (R-GCN) that outputs a variational posterior for each drug, $q_\phi(\mathbf{z}_i | \mathcal{G}) = \mathcal{N}(\mathbf{z}_i | \boldsymbol{\mu}_i, \text{diag}(\boldsymbol{\sigma}_i^2))$; the decoder employs DistMult or MLP to predict interaction type probabilities.
- Training Objective: Maximizes the ELBO, comprising a reconstruction term and KL divergence regularization.
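
The VGAE pieces described above can be sketched numerically. This is a minimal illustration, not the paper's implementation: the R-GCN encoder is replaced by randomly generated `mu` and `log_var` arrays, the prior is assumed to be a standard normal (as in the original VGAE), and the DistMult scores are assumed to be turned into type probabilities with a softmax. All function and variable names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) per drug, summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def distmult_type_probs(z_i, z_j, rel):
    """Softmax over DistMult scores s_t = sum_k z_i[k] * rel[t, k] * z_j[k]."""
    scores = rel @ (z_i * z_j)          # shape: (num_types,)
    scores = scores - scores.max()      # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def negative_elbo(mu, log_var, rel, edges, rng):
    """Single-sample estimate of -ELBO over observed (i, j, t) triples."""
    z = reparameterize(mu, log_var, rng)
    recon = -sum(np.log(distmult_type_probs(z[i], z[j], rel)[t]) for i, j, t in edges)
    return recon + kl_to_standard_normal(mu, log_var).sum()

rng = np.random.default_rng(0)
num_drugs, latent_dim, num_types = 5, 4, 3
mu = rng.normal(size=(num_drugs, latent_dim))       # stand-in for the R-GCN encoder mean
log_var = np.full((num_drugs, latent_dim), -1.0)    # stand-in for the R-GCN encoder log-variance
rel = rng.normal(size=(num_types, latent_dim))      # one DistMult relation vector per type
edges = [(0, 1, 2), (2, 3, 0)]                      # toy (d_i, d_j, t) triples
loss = negative_elbo(mu, log_var, rel, edges, rng)
```

Minimizing `loss` corresponds to maximizing the ELBO: the first term rewards correct type reconstruction and the second keeps each posterior close to the prior.
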
GFlowNet Synthetic DDI Generation:
- Function: Samples synthetic DDI triples $(d_i, d_j, t)$ in proportion to a reward signal.
- Design Motivation: GFlowNets learn policies under which the generation probability is proportional to the reward, making them naturally suited for sampling biased toward rare categories.
- Mechanism: A three-step trajectory is defined — select interaction type $t$ → select the first drug $d_i$ → select the second drug $d_j$ from the $K$-nearest neighbors of $d_i$. The reward function is: $$R(t, d_i, d_j) = \underbrace{\left(\frac{1}{n_t + 1}\right)^\alpha}_{\text{rarity}} \times \underbrace{p_\theta(t | \mathbf{z}_i, \mathbf{z}_j)}_{\text{plausibility}}$$ where $n_t$ denotes the frequency of type $t$ and $\alpha$ controls the strength of preference toward rare classes.
- Novelty: Unlike simple oversampling or SMOTE, samples generated by GFlowNet are constrained by the VGAE plausibility score, preventing the synthesis of implausible drug pairs.
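
To make the reward concrete, the sketch below evaluates the formula for one common and one rare interaction type. The counts, plausibility scores, and the value of `alpha` are hypothetical numbers chosen only for illustration; in the framework the plausibility would come from the frozen VGAE decoder.

```python
def rarity(n_t, alpha):
    """Rarity term (1 / (n_t + 1)) ** alpha from the reward definition."""
    return (1.0 / (n_t + 1.0)) ** alpha

def reward(n_t, plausibility, alpha):
    """R(t, d_i, d_j) = rarity(n_t) * p_theta(t | z_i, z_j)."""
    return rarity(n_t, alpha) * plausibility

# Hypothetical counts and decoder probabilities, for illustration only.
common = reward(n_t=9999, plausibility=0.9, alpha=0.5)   # about 0.009
rare = reward(n_t=99, plausibility=0.6, alpha=0.5)       # about 0.06

# A converged GFlowNet samples terminal states in proportion to their reward,
# so between these two candidates the rare triple is drawn about 87% of the time.
total = common + rare
target_probs = {"common": common / total, "rare": rare / total}
```

Even though the rare triple is less plausible under the decoder, its rarity term dominates. Setting `alpha = 0` removes the rarity preference entirely and leaves the plausibility score as the reward.
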
Trajectory Balance (TB) Loss Training:
- Function: Trains the forward policy network of the GFlowNet.
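- Design Motivation: The TB loss imposes a balance condition on complete trajectories rather than on individual states or transitions; when the loss is zero for all trajectories, the sampling distribution is proportional to the reward.
- Mechanism: $$\mathcal{L}_{\text{TB}}(\psi) = \left(\log \frac{Z_\psi \prod_{s \to s' \in \tau} P_F(s'|s;\psi)}{R(s_f)}\right)^2$$ where $\tau$ is a complete trajectory, $P_F(s'|s;\psi)$ is the forward policy with parameters $\psi$, $s_f$ is the terminal state of $\tau$, and $Z_\psi$ is a learnable partition function (total flow).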
- Novelty: Compared to the Detailed Balance (DB) loss, TB operates over complete trajectories and exhibits greater training stability.
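
The TB objective above can be evaluated for a single three-step trajectory as follows. This is a sketch with hypothetical numbers: the forward probabilities stand for the three choices (type, first drug, second drug), and the reward is assumed to be strictly positive. The formula as given has no backward-policy term, which is consistent with a state space in which each state has exactly one parent (an assumption that holds if triples are treated as ordered).

```python
import math

def trajectory_balance_loss(log_z, forward_probs, reward):
    """Squared log-ratio between Z * prod(P_F) and R(s_f) for one trajectory."""
    log_forward = sum(math.log(p) for p in forward_probs)
    return (log_z + log_forward - math.log(reward)) ** 2

# Z * P_F(t) * P_F(d_i | t) * P_F(d_j | t, d_i) = 0.5 * 0.5 * 0.4 * 0.5 = 0.05
balanced = trajectory_balance_loss(math.log(0.5), [0.5, 0.4, 0.5], reward=0.05)

# Same policy, but the reward is four times larger than the flow the policy assigns.
unbalanced = trajectory_balance_loss(math.log(0.5), [0.5, 0.4, 0.5], reward=0.2)
```

`balanced` is zero up to floating-point error because the flow assigned to the trajectory equals its reward, while `unbalanced` equals $(\log 4)^2$ and would push the policy to raise the probability of that trajectory.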

Loss & Training

Stage 1: Pre-train the VGAE on the original imbalanced data to obtain drug embeddings $\mathbf{Z}$ and decoder $p_\theta$.
Stage 2: Freeze the VGAE; use its embeddings and decoder to compute rewards and train the GFlowNet policy.
Stage 3: Sample $N$ synthetic DDI triples using the trained GFlowNet, merge them with the original data, and retrain the VGAE.
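
Stage 3 can be sketched as a sampling loop followed by a merge. The code below is an illustration under several assumptions that the summary does not confirm: neighbors are found by Euclidean distance in the embedding space, the learned policy is represented by fixed probability tables `p_type` and `p_first`, the second drug is drawn uniformly from the $K$ candidates as a placeholder for the learned policy, and synthetic triples that duplicate existing ones are dropped. All names are hypothetical.

```python
import numpy as np

def knn_candidates(Z, i, k):
    """Indices of the k drugs closest to drug i (Euclidean), excluding i itself."""
    dist = np.linalg.norm(Z - Z[i], axis=1)
    dist[i] = np.inf
    return np.argsort(dist)[:k]

def sample_triple(Z, p_type, p_first, k, rng):
    """One trajectory: pick type t, then drug d_i, then d_j among d_i's neighbors."""
    t = rng.choice(len(p_type), p=p_type)
    i = rng.choice(len(p_first[t]), p=p_first[t])
    j = rng.choice(knn_candidates(Z, i, k))  # placeholder for the learned policy
    return int(i), int(j), int(t)

def augment(original, Z, p_type, p_first, k, n, rng):
    """Draw n trajectories and append the triples not already present."""
    seen = set(original)
    synthetic = []
    for _ in range(n):
        triple = sample_triple(Z, p_type, p_first, k, rng)
        if triple not in seen:
            seen.add(triple)
            synthetic.append(triple)
    return list(original) + synthetic

rng = np.random.default_rng(0)
Z = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])  # toy embeddings
p_type = np.array([0.2, 0.8])             # toy policy favoring the rarer type 1
p_first = np.full((2, 4), 0.25)           # toy policy over the first drug, per type
original = [(0, 1, 0), (2, 3, 0)]         # observed (d_i, d_j, t) triples
augmented = augment(original, Z, p_type, p_first, k=2, n=10, rng=rng)
```

The augmented list would then be used to retrain the VGAE. Because duplicates are dropped, fewer than `n` new triples may be added.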

Key Experimental Results

Main Results

Dataset: DrugBank (1,703 drugs, 191,870 drug pairs, 86 DDI types)

Metric	w/o GFlowNet	w/ GFlowNet
AUROC	0.99081	0.99071
Accuracy	0.96859	0.96792
AUPRC	0.98861	0.98922
F1 Score	0.98982	0.99914
Shannon Entropy (SE) ↑	1.23	1.69
Jensen-Shannon Divergence (JSD) ↓	0.35	0.12
Coverage ↑	0.2441	0.7709
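
The three distribution-level metrics in the table are not defined in this summary, so the sketch below uses common textbook definitions as an assumption: Shannon entropy (natural log) of the empirical interaction-type distribution, Jensen-Shannon divergence between two type distributions, and coverage as the fraction of interaction types that appear at least once. The paper's exact definitions, log base, and reference distribution may differ.

```python
import numpy as np

def type_distribution(types, num_types):
    """Empirical distribution over interaction types from a list of type ids."""
    counts = np.bincount(types, minlength=num_types).astype(float)
    return counts / counts.sum()

def shannon_entropy(p):
    """H(p) = -sum p log p in nats, ignoring zero-probability types."""
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def kl_divergence(p, q):
    """KL(p || q), restricted to the support of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m) with m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def coverage(types, num_types):
    """Fraction of interaction types that occur at least once."""
    return len(set(types)) / num_types

skewed = type_distribution([0, 0, 0, 0, 0, 0, 1, 1], num_types=4)
uniform = np.full(4, 0.25)
```

Under these definitions, a more balanced type distribution has higher entropy, a lower divergence from a balanced reference, and higher coverage, which matches the direction of the arrows in the table.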

Ablation Study

No ablation table is provided; key conclusions are drawn by comparing classification metrics against diversity metrics:

Evaluation Dimension	Observation
Classification Performance	AUROC (~0.99) and Accuracy (~0.97) remain essentially unchanged, indicating that synthetic augmentation does not harm majority classes.
Diversity	SE increases from 1.23 to 1.69 (+37%), reflecting a more uniform distribution.
Distribution Alignment	JSD decreases from 0.35 to 0.12 (−66%), indicating closer alignment between synthetic and real distributions.
Coverage	Increases from 0.2441 to 0.7709 (+216%), substantially improving coverage of rare interaction types.
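
The relative changes quoted in the observations can be reproduced directly from the main results table. The helper name below is hypothetical; the input values are the ones reported above.

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before

reported = {
    "Shannon Entropy": (1.23, 1.69),
    "Jensen-Shannon Divergence": (0.35, 0.12),
    "Coverage": (0.2441, 0.7709),
}
changes = {name: relative_change(b, a) for name, (b, a) in reported.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
# Shannon Entropy: +37.4%
# Jensen-Shannon Divergence: -65.7%
# Coverage: +215.8%
```

These values round to the +37%, −66%, and +216% figures in the table.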

Key Findings

Conventional classification metrics (AUROC, Accuracy) remain nearly unchanged, as they are dominated by high-frequency classes.
Meaningful improvements appear in the diversity metrics: Coverage increases from 24.4% to 77.1%, indicating that the model now covers more than three quarters of the interaction types.
The GFlowNet reward design ensures that generated samples are both biased toward rare classes (via the rarity term) and remain plausible (via the VGAE decoder score).

Highlights & Insights

Precise Problem Framing: The paper targets the often-neglected class imbalance problem in DDI prediction rather than simply pursuing overall classification accuracy.
Elegant Framework Design: The reward-proportional sampling of GFlowNets aligns naturally with the data augmentation objective; the composite reward of rarity × plausibility is concise and effective.
Appropriate Evaluation Metrics: The use of Shannon Entropy and JSD, rather than relying solely on classification metrics, more faithfully reveals improvements in class distribution coverage.
Modular Design: The decoupled design of VGAE and GFlowNet makes the framework generalizable to other imbalanced graph classification problems.

Limitations & Future Work

Validation is conducted on a single dataset (DrugBank), with no cross-dataset generalization experiments.
Classification metrics show negligible change; per-class F1 scores would better demonstrate improvements on rare interaction types.
No comparison with alternative data augmentation methods (e.g., SMOTE, GANs, mixup) is provided.
Sensitivity analysis for GFlowNet hyperparameters ($\alpha$, candidate set size $K$, number of synthetic samples $N$) is absent.
The experimental section is relatively thin, comprising only one main results table and lacking thorough ablation studies and analysis.

Related Work & Insights

GFlowNet (Bengio et al., 2023): Provides the theoretical foundation for reward-proportional sampling.
VGAE (Kipf & Welling): Variational graph autoencoder serves as the backbone for drug representation learning.
DDI Prediction: Existing methods such as MFConv and GraphDTA focus on binary classification; this paper complements them with a multi-class perspective.
Insights: The combination of GFlowNet with domain-specific graph models is transferable to other imbalanced biomedical problems, such as rare disease modeling and adverse drug reaction prediction.

Rating

Novelty: ⭐⭐⭐⭐ Applying GFlowNet to DDI data augmentation is a novel combination; the reward function design is elegant.
Experimental Thoroughness: ⭐⭐ Single dataset, lacking comparative baselines and ablation studies.
Writing Quality: ⭐⭐⭐ Method description is clear, but the experimental section is overly brief.
Value: ⭐⭐⭐ The approach is conceptually valuable, but insufficient experimental validation limits its persuasiveness.