Improved Exploration in GFlowNets via Enhanced Epistemic Neural Networks

Conference: ICML 2025
arXiv: 2506.16313
Code: None
Area: Others
Keywords: GFlowNets, epistemic uncertainty, Epistemic Neural Networks, Thompson sampling, exploration

TL;DR

This paper integrates Epistemic Neural Networks (ENN/epinet) into GFlowNets to achieve uncertainty-driven exploration, proposing the ENN-GFN-Enhanced algorithm. It significantly improves mode discovery efficiency and distribution learning quality on HyperGrid and sequence generation tasks.

Background & Motivation

Background: GFlowNets are a class of generative models that construct objects sequentially and sample them with probability proportional to a reward. They have important applications in scientific discovery, such as molecular design.

Limitations of Prior Work: (a) GFlowNets are prone to mode collapse during training, getting attracted to prematurely discovered modes; (b) conventional exploration strategies (on-policy, $\epsilon$-noisy) are inefficient; (c) existing Thompson Sampling (TS-GFN) approximates the posterior using ensembles, which entails high computational overhead and limited joint prediction quality.

Key Challenge: The performance of GFlowNets depends heavily on the quality of sampled trajectories, yet effective exploration requires awareness of epistemic uncertainty, that is, knowing "what is not known".

Key Insight: Use ENN (epinet) instead of ensembles to obtain more efficient joint prediction and uncertainty quantification.

Core Idea: Attach a lightweight epinet module to the GFlowNet's policy network to implement implicit Thompson Sampling via epistemic index sampling.

Method

Overall Architecture

Input: GFlowNet + Reward function $\rightarrow$ Attach epinet after the policy network $\rightarrow$ Sample epistemic index $z \sim P_Z$ $\rightarrow$ Generate uncertainty-aware policy $\rightarrow$ Sample trajectories $\rightarrow$ Update with TB loss $\rightarrow$ Iteratively improve distribution learning.
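
The sampling half of this pipeline can be sketched as follows. This is only an illustration under stated assumptions: the toy 2D grid, the action set, and `toy_logits` (a stand-in for the epinet-augmented policy) are hypothetical and not taken from the paper. The point it shows is that one epistemic index $z$ is drawn per trajectory and held fixed for every step of that rollout, which is what makes the procedure resemble Thompson Sampling.

```python
import math
import random

H = 4  # side length of a toy 2D grid (assumption, not the paper's setting)
ACTIONS = ("right", "up", "stop")


def toy_logits(state, z):
    """Hypothetical stand-in for the uncertainty-aware policy f_theta(s, z)."""
    x, y = state
    base = [0.1 * x, 0.1 * y, 0.0]   # plays the role of the base output mu(s)
    bump = [z[0], z[1], 0.0]         # plays the role of the epinet term sigma(s, z)
    return [b + e for b, e in zip(base, bump)]


def masked_softmax(logits, mask):
    """Softmax restricted to the actions whose mask entry is True."""
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectory(rng):
    """Roll out one trajectory with a single epistemic index z held fixed."""
    z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # z ~ P_Z, once per trajectory
    state, traj, log_pf = (0, 0), [(0, 0)], 0.0
    while True:
        # Moves that would leave the grid are masked out; "stop" is always legal.
        mask = [state[0] < H - 1, state[1] < H - 1, True]
        probs = masked_softmax(toy_logits(state, z), mask)
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        log_pf += math.log(probs[a])
        if ACTIONS[a] == "stop":
            return traj, log_pf, z
        state = (state[0] + (a == 0), state[1] + (a == 1))
        traj.append(state)
```

The returned `log_pf` is the summed forward log-probability of the trajectory, which is the quantity a TB-style update would consume; the update itself is left out of this sketch.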

Key Designs

ENN and Epinet:
- The ENN output additionally depends on an epistemic index $z$: $f_\theta(x,z) = \mu_\zeta(x) + \sigma_\eta(\text{sg}[\phi_\zeta(x)], z)$
- $\mu_\zeta(x)$: base network output
- $\sigma_\eta$: the epinet, formed as the sum of a learnable part (a lightweight MLP) and a fixed prior function $\sigma_P$
- sg denotes stop-gradient to prevent epinet from influencing base network feature learning
- Joint prediction: $\hat{P}^{\text{ENN}}(y_{1:\tau}) = \int_z P_Z(dz) \prod_t \text{softmax}(f_\theta(x_t, z))_{y_t}$
- Design Motivation: ENN achieves calibrated epistemic uncertainty estimation with minimal computational overhead.
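
The epinet form and the joint prediction listed above can be sketched in plain Python. Everything concrete here is an assumption made for illustration: the class name `TinyEpinet`, the layer sizes, the use of a single linear map in place of the epinet MLP, and the Monte Carlo estimate of the integral over $z$ are not taken from the paper. With no autodiff in this sketch, the stop-gradient is only marked by a comment.

```python
import itertools
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyEpinet:
    """f(x, z) = mu(x) + sigma(sg[phi(x)], z), with sigma = learnable + fixed prior."""

    def __init__(self, in_dim, feat_dim, n_classes, z_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_classes, self.z_dim = n_classes, z_dim
        self.W_phi = mat(feat_dim, in_dim)                      # base features phi(x)
        self.W_mu = mat(n_classes, feat_dim)                    # base head mu(x)
        self.W_learn = mat(n_classes * z_dim, feat_dim + z_dim)  # trainable epinet part
        self.W_prior = mat(n_classes * z_dim, feat_dim + z_dim)  # frozen prior part

    def _epi(self, W, phi, z):
        # Map [phi, z] to an (n_classes x z_dim) table, then contract it with z.
        g = matvec(W, phi + z)
        return [sum(g[c * self.z_dim + j] * z[j] for j in range(self.z_dim))
                for c in range(self.n_classes)]

    def logits(self, x, z):
        phi = [math.tanh(h) for h in matvec(self.W_phi, x)]
        mu = matvec(self.W_mu, phi)
        # In an autodiff framework phi would be detached here (stop-gradient).
        learn = self._epi(self.W_learn, phi, z)
        prior = self._epi(self.W_prior, phi, z)
        return [m + l + p for m, l, p in zip(mu, learn, prior)]


def joint_prob(model, xs, ys, zs):
    """Monte Carlo estimate of the joint prediction, averaging over indices zs."""
    total = 0.0
    for z in zs:
        p = 1.0
        for x, y in zip(xs, ys):
            p *= softmax(model.logits(x, z))[y]
        total += p
    return total / len(zs)
```

Because each index $z$ is shared across all inputs inside the product, the resulting joint distribution is in general not the product of its marginals, which is the property the joint prediction is meant to capture.
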
ENN-GFN (Basic Version):
- Attach epinet to the forward policy network of GFlowNet.
- Sample a set of $z$ for each trajectory, combining prior network outputs via weighted sum.
- Design Motivation: Direct application of the ENN framework to GFlowNets.
ENN-GFN-Enhanced (Enhanced Version):
- Key difference: Instead of weighted sum, randomly select one prior ensemble member.
- Maintains an approximate posterior policy similar to TS-GFN.
- However, uncertainty estimation comes from the epinet rather than an ensemble.
- Design Motivation: Combining the explorative advantage of Thompson Sampling with the efficient uncertainty estimation of ENN.
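
The difference between the two variants, as described above, comes down to how the outputs of the prior ensemble are combined. The following sketch contrasts the two rules. The helper names, the toy prior members (fixed random tables scaled by a scalar feature), and the uniform choice of member are assumptions for illustration; the learnable part of the epinet is omitted.

```python
import random


def make_prior_ensemble(n_members, n_actions, seed=0):
    """Build fixed, untrained prior members (hypothetical toy functions)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_actions)]
               for _ in range(n_members)]

    def member(i):
        # Each member maps a scalar state feature to a vector of action logits.
        return lambda feat: [w * feat for w in weights[i]]

    return [member(i) for i in range(n_members)]


def prior_weighted_sum(priors, feat, z):
    """Basic variant: combine all member outputs, weighted by the index z."""
    outs = [p(feat) for p in priors]
    n_actions = len(outs[0])
    return [sum(z_i * o[a] for z_i, o in zip(z, outs)) for a in range(n_actions)]


def prior_random_member(priors, feat, rng):
    """Enhanced variant: use the output of one randomly selected member."""
    i = rng.randrange(len(priors))
    return i, priors[i](feat)
```

Selecting a single member corresponds to a one-hot index in the weighted sum, so each rollout commits to one hypothesis about the prior rather than to a blend of all of them.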

Loss & Training

Trajectory Balance (TB) Loss, for a trajectory $\tau = (s_0, \dots, s_n)$: $$\mathcal{L}_{\text{TB}}(\tau;\theta) = \left(\log \frac{Z_\theta \prod_{t=0}^{n-1} P_F(s_{t+1}|s_t)}{R(s_n) \prod_{t=0}^{n-1} P_B(s_t|s_{t+1})}\right)^2$$
Simultaneously update parameters of both the base network and epinet.
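
As a small numeric illustration of the TB loss, the function below evaluates it in log space for a single trajectory. The function name and argument layout are hypothetical; in practice the log-probabilities would come from the policy networks and the loss would be minimized by gradient descent over $\theta$ and $\log Z_\theta$.

```python
import math


def trajectory_balance_loss(log_Z, log_pf_steps, log_pb_steps, reward):
    """Squared log-ratio between forward flow and reward-weighted backward flow."""
    log_ratio = (log_Z + sum(log_pf_steps)
                 - math.log(reward) - sum(log_pb_steps))
    return log_ratio ** 2


# A trajectory that is exactly balanced: Z * P_F products == R * P_B products.
balanced = trajectory_balance_loss(
    log_Z=math.log(4.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[0.0, 0.0],   # P_B = 1 at every step
    reward=1.0,
)
```

Here `balanced` is zero up to floating-point error, since $4 \cdot 0.5 \cdot 0.5 = 1 = R(s_n)$; any mismatch between the two sides is penalized quadratically in log space.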

Key Experimental Results

Main Results (2D HyperGrid, 8×8, $R_0=10^{-4}$)

Algorithm	$L_1$ Distance (↓)	4 Modes Discovered	Description
Default-GFN	High	Partial	Only finds modes near the starting corner
TS-GFN	High	Partial	Ensemble brings improvement but is insufficient
ENN-GFN	Low	✓ All	Epinet provides better uncertainty
ENN-GFN-Enhanced	Lowest	✓ All	Combines TS strategy for best results
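
For context on the two columns of this table, the sketch below builds a HyperGrid target distribution and computes an $L_1$ distance between it and a set of samples. The notes do not give the reward formula or the exact metric, so both are assumptions: the reward uses the commonly cited HyperGrid definition with $R_1 = 0.5$ and $R_2 = 2$, and the distance is taken as the mean absolute difference between the empirical and target probabilities over all states.

```python
import itertools


def hypergrid_reward(state, H, R0=1e-4, R1=0.5, R2=2.0):
    """Commonly used HyperGrid reward (assumed; not stated in these notes)."""
    u = [abs(x / (H - 1) - 0.5) for x in state]
    outer = all(v > 0.25 for v in u)        # plateau near the corners
    band = all(0.3 < v < 0.4 for v in u)    # narrow high-reward band (the modes)
    return R0 + R1 * outer + R2 * band


def target_distribution(H, dim, R0=1e-4):
    """Normalize the reward over all grid states to get the target distribution."""
    states = list(itertools.product(range(H), repeat=dim))
    rewards = [hypergrid_reward(s, H, R0) for s in states]
    Z = sum(rewards)
    return {s: r / Z for s, r in zip(states, rewards)}


def l1_distance(target, samples):
    """Mean absolute gap between empirical sample frequencies and the target."""
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    n = len(samples)
    return sum(abs(p - counts.get(s, 0) / n) for s, p in target.items()) / len(target)


target = target_distribution(H=8, dim=2)
top = max(target.values())
modes = sorted(s for s, p in target.items() if p == top)
```

Under these assumptions the 8×8 grid has four highest-reward cells, one near each corner, which matches the "4 Modes Discovered" column. A sampler stuck near the starting corner leaves most of that probability mass uncovered and therefore scores a high $L_1$ distance.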

Ablation Study

Configuration	4D Grid ($R_0=10^{-4}, H=8$)	Description
ENN-GFN-Enhanced	Lowest $L_1$	Rapid discovery of all modes
ENN-GFN	Close to optimal	Slightly worse than Enhanced
TS-GFN	Moderate	Ensemble is effective but inefficient
Default-GFN	Highest	Insufficient exploration

Environment (2D, $R_0=10^{-5}$)	DB-GFN	ENN-GFN	ENN-GFN-Enhanced
64×64 ($L_1 \times 10^{-5}$)	Baseline	Improved	Optimal
128×128 ($L_1 \times 10^{-5}$)	Baseline	Improved	Optimal

Valid Bit Sequences (Transformer)

Sequence Length	With Epinet	Without Epinet
2N=12	260	207
2N=14	216	189
2N=16	135	120

Key Findings

ENN-GFN-Enhanced consistently performs best across all environments.
The advantage is more pronounced in larger and sparser environments.
Epinet also brings significant diversity improvements to Transformer architectures.
ENN-GFN performs well in small environments but may degrade on the 16×16 grid.

Highlights & Insights

Lightweight uncertainty: epinet only adds small MLPs in the last few layers, with computational overhead far smaller than an ensemble.
Importance of joint prediction: Uncertainty-driven exploration requires joint predictions rather than marginal predictions.
Elegance of the Enhanced version: Randomly selecting ensemble members instead of adopting a weighted sum better simulates Thompson Sampling.

Limitations & Future Work

Evaluated only on toy environments (HyperGrid, Bit Sequences), lacking real-world applications such as molecular design.
The reason for the performance degradation of ENN-GFN in larger environments warrants deeper analysis.
Comparisons with other exploration-enhancement methods like GAFN, RND, etc., are not comprehensive enough.

Thompson Sampling for GFlowNets (Rector-Brooks et al. 2023) is a direct predecessor.
ENN (Osband et al. 2023) provides the core technical components.
Random Network Distillation is an alternative exploration-enhancement technique.
Insight: Reliable uncertainty estimation is the foundation of efficient exploration.

Rating

Novelty: ⭐⭐⭐⭐ The integration of ENN and GFlowNet, along with the Enhanced variant, represents a novel contribution.
Experimental Thoroughness: ⭐⭐⭐ Environments are somewhat simplistic, lacking practical application scenarios.
Writing Quality: ⭐⭐⭐⭐ Background introduction is comprehensive and the methodology is clear.
Value: ⭐⭐⭐⭐ Provides a lightweight and effective solution for the GFlowNet exploration problem.
