Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets

Conference: AAAI 2026 arXiv: 2505.15251 Code: Available Area: Generative Flow Networks / Exploration Keywords: GFlowNet, mode collapse, auxiliary agent, loss-guided, diversity sampling

TL;DR

This paper proposes LGGFN (Loss-Guided GFlowNets), in which the exploration of an auxiliary GFlowNet is directly driven by the training loss of the primary GFlowNet. The auxiliary agent's reward is defined as \(R_{aux}(x) = R(x) + \lambda \cdot L_{main}(x)\), prioritizing regions the primary model has learned least well. On grid, sequence generation, and Bayesian structure learning tasks, LGGFN discovers 40× more unique modes and reduces exploration error by 99%.

Background & Motivation

Background: GFlowNets are designed to sample from multimodal distributions proportional to a reward function—rather than merely finding the optimum—theoretically avoiding mode collapse. In practice, however, on-policy training still suffers from mode collapse, as the model is attracted to high-reward modes discovered early in training.

Limitations of Prior Work: - Existing exploration techniques rely on heuristic novelty signals (e.g., state counting, RND), which are decoupled from the model's actual learning state. - Novelty signals may direct exploration toward irrelevant regions (novel but unimportant). - There is no mechanism that leverages the model's own training signal to guide exploration.

Key Challenge: GFlowNets require broad exploration to learn the complete distribution, yet on-policy sampling is dominated by high-reward regions—a mechanism is needed to redirect sampling toward regions the model has not yet learned well.

Goal: Use the primary model's training loss as the exploration signal for an auxiliary agent, realizing a strategy of "explore where you understand least."

Key Insight: High training loss regions = regions poorly understood by the primary model → the auxiliary agent prioritizes exploration there → the collected samples are fed back to train the primary model.

Core Idea: Auxiliary GFlowNet reward = primary model reward + primary model loss → directed exploration of weak regions.
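The core idea can be sketched in a few lines. This is a minimal illustration with toy numbers, not the paper's implementation: `trajectory_tb_loss` stands in for the primary agent's per-trajectory Trajectory Balance loss, and the function names are ours.

```python
import math

def trajectory_tb_loss(log_Z, log_pf_sum, log_pb_sum, log_reward):
    # Trajectory Balance loss for a single trajectory:
    # (log Z + sum log P_F - log R(x) - sum log P_B)^2
    return (log_Z + log_pf_sum - log_reward - log_pb_sum) ** 2

def auxiliary_reward(reward, main_loss, lam=1.0):
    # R_aux(x) = R(x) + lambda * L_main(x)
    return reward + lam * main_loss

# Toy numbers: a trajectory the primary model fits poorly incurs a
# large TB loss, so the auxiliary agent sees a boosted reward there.
loss = trajectory_tb_loss(log_Z=0.0, log_pf_sum=-2.0, log_pb_sum=0.0, log_reward=-5.0)
r_aux = auxiliary_reward(reward=math.exp(-5.0), main_loss=loss, lam=0.1)
```

A trajectory whose raw reward is tiny (\(e^{-5}\)) but whose primary-model loss is large ends up with a much higher auxiliary reward, which is exactly the "explore where you understand least" effect.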

Method

Overall Architecture

A dual-GFlowNet architecture consisting of a primary agent (learning the target distribution) and an auxiliary agent (learning a loss-weighted distribution of the primary agent). The primary agent's training data is the union of on-policy samples and auxiliary agent samples. The auxiliary agent periodically updates its reward to reflect the current weaknesses of the primary agent.
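The dual-agent loop described above can be sketched as follows. The interfaces (`sample_primary`, `train_primary`, etc.) are illustrative assumptions, not the paper's actual API; the structure just mirrors the described data flow.

```python
def lggfn_round(sample_primary, sample_aux, train_primary, reward,
                retrain_aux, alpha=0.5, batch_size=8, lam=1.0):
    """One round of the dual-agent loop (interfaces are hypothetical):
      1. draw alpha*B on-policy samples and (1-alpha)*B auxiliary samples,
      2. train the primary agent on the union, collecting per-sample losses,
      3. refresh the auxiliary agent's targets R(x) + lam * L_main(x)."""
    n_on = round(alpha * batch_size)
    batch = ([sample_primary() for _ in range(n_on)]
             + [sample_aux() for _ in range(batch_size - n_on)])
    losses = train_primary(batch)  # per-sample primary losses
    aux_targets = [reward(x) + lam * l for x, l in zip(batch, losses)]
    retrain_aux(batch, aux_targets)
    return batch, aux_targets
```

Step 3 is what makes the scheme adaptive: as the primary agent's loss on a region shrinks, the auxiliary reward there decays back toward \(R(x)\).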

Key Designs

  1. Loss-Guided Auxiliary Reward:

    • Function: Directs the auxiliary agent to prioritize regions where the primary agent incurs high loss.
    • Mechanism: \(R_{aux}(x) = R(x) + \lambda \cdot L_{main}(x)\), where \(L_{main}\) is the primary GFlowNet's training loss on the trajectory leading to \(x\).
    • Design Motivation: High loss = primary model uncertainty/poor understanding → the region most in need of additional data. This is more direct than novelty signals, as it explicitly measures "learned poorly" rather than "visited or not."
    • Implementation Details: The loss can be trajectory-based TB loss, transition-based FM loss, or sub-trajectory loss, making the method agnostic to the specific training objective. \(\lambda\) is chosen so that the loss term stays on the same scale as \(R\), preventing training instability.
  2. Mixed Training Strategy:

    • Function: Enables the primary agent to learn from two sources.
    • Mechanism: Training batch = \(\alpha \cdot\) on-policy samples \(+ (1-\alpha) \cdot\) auxiliary agent samples. \(\alpha\) controls the exploration–exploitation trade-off.
    • Design Motivation: Purely auxiliary sampling may deviate too far from the reward distribution; mixing maintains stable distribution learning.
  3. Exploitation of Neural Network Generalization:

    • Function: Leverages the generalization capacity of neural networks to propagate the loss signal across neighboring states.
    • Mechanism: High loss on a given trajectory → through network generalization, neighboring trajectories also exhibit higher loss → auxiliary agent exploration is region-wide rather than point-wise.
    • Design Motivation: This is more efficient than state counting—a single informative exploration step covers an entire "uncertain region."
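One practical detail from Design 1 above is keeping the loss term on the same scale as the reward. A simple normalization heuristic (our assumption; the paper does not prescribe this exact rule) is to match the mean loss contribution to the mean reward over a recent batch:

```python
def scale_lambda(rewards, losses, target_ratio=1.0):
    # Heuristic: pick lambda so that lam * mean(loss) ~= target_ratio * mean(reward),
    # keeping R_aux = R + lam * L_main on the same scale as R.
    mean_r = sum(rewards) / len(rewards)
    mean_l = sum(losses) / len(losses)
    return target_ratio * mean_r / mean_l if mean_l > 0 else 0.0
```

Recomputing \(\lambda\) periodically this way also compensates for the natural decay of the primary loss over training.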

Loss & Training

  • Primary agent: Standard GFlowNet training loss (TB/DB/SubTB).
  • Auxiliary agent: TB loss with reward \(= R + \lambda L_{main}\).
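Concretely, the auxiliary agent's TB objective plugs the augmented reward into the standard trajectory-balance form. A minimal sketch (function names are ours; real implementations typically clamp the log-reward and batch this over trajectories):

```python
import math

def aux_tb_loss(log_Z_aux, log_pf_sum, log_pb_sum, reward, main_loss, lam=1.0):
    # Auxiliary TB loss against R_aux = R + lam * L_main:
    # (log Z_aux + sum log P_F - log R_aux - sum log P_B)^2
    log_r_aux = math.log(reward + lam * main_loss)
    return (log_Z_aux + log_pf_sum - log_r_aux - log_pb_sum) ** 2
```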

Key Experimental Results

Main Results

| Task | Metric | LGGFN | Best Baseline | Gain |
| --- | --- | --- | --- | --- |
| Hypergrid 128×128 | L1 error ↓ | 0.83±0.21 | 0.92±0.36 (AdaTeachers) | 10% |
| Sequence Generation | Unique Modes ↑ | 40× more | Baseline | 40× |
| Sequence Generation | Exploration Error ↓ | 99% reduction | Baseline | 99% |
| Bayesian Structure Learning | — | +10% | Baseline | 10% |

Ablation Study

| Configuration | Performance |
| --- | --- |
| On-policy only (no auxiliary) | Severe mode collapse |
| Random exploration (unguided auxiliary) | Moderate improvement |
| Novelty-guided | Improvement but unstable |
| Loss-guided (LGGFN) | Best and most stable |

Key Findings

  • Loss signal > novelty signal: Loss directly measures "learned poorly," which is more precise than "visited or not."
  • 40× more modes (sequence generation): Demonstrates that LGGFN genuinely resolves mode collapse rather than merely improving approximation accuracy.
  • "Loss decay" of the auxiliary agent: As the primary model improves, the auxiliary agent's exploration direction naturally shifts—an adaptive mechanism.

Highlights & Insights

  • The idea of directly driving exploration via training loss is simple yet powerful—"explore where you understand least" is the most natural exploration strategy.
  • The approach has transfer value for any generative model requiring diversity sampling (e.g., drug design, materials discovery).
  • The dual-model framework of GFlowNet + auxiliary agent is generalizable to other RL settings.

Limitations & Future Work

  • The auxiliary agent's \(\lambda\) hyperparameter requires tuning.
  • The dual-GFlowNet architecture increases computational and memory overhead.
  • Validation is limited to discrete-space tasks.
  • Future directions: loss-guided exploration could be extended to diversity generation in RLHF and applied to adversarial example mining.

Comparison with Related Work

  • vs. Adaptive Teachers (Jain et al.): Uses a teacher distribution for guidance; LGGFN is more precise by directly leveraging the loss signal.
  • vs. RND / Curiosity: Novelty-driven exploration focuses on state visitation frequency; LGGFN's loss-driven approach more accurately targets regions where the model has genuinely failed to learn.
  • vs. Diverse RL: Diversity RL typically uses intrinsic rewards or information-theoretic regularization, whereas LGGFN's explicit auxiliary-agent guidance is more controllable and interpretable.

Rating

  • Novelty: ⭐⭐⭐⭐ The concept of loss-guided exploration is concise and powerful; directing the primary agent's exploration via high-loss regions through an auxiliary agent is a highly natural decoupled design.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Three task categories (hypergrid, sequence generation, Bayesian structure learning), multiple baselines, and complete ablations; cross-domain validation is persuasive.
  • Writing Quality: ⭐⭐⭐⭐ The motivation chain is clear, and the logical derivation from the mode collapse problem to the loss-guided solution is well-structured.
  • Value: ⭐⭐⭐⭐ Directly applicable to multimodal sampling and molecular design; represents an important advance for the GFlowNet community in addressing mode collapse.

Additional Notes

  • The loss-guided auxiliary agent design is not limited to GFlowNets and can be generalized to any generative model scenario requiring diversity-driven exploration.