BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants

Conference: NeurIPS 2025 arXiv: 2507.00846 Code: Available Area: Generative Models / Molecular Simulation Keywords: Boltzmann distribution, noise contrastive estimation, stochastic interpolants, molecular conformation, free energy

TL;DR

BoltzNCE trains an Energy-Based Model (EBM) via a hybrid Score Matching + InfoNCE objective to approximate the likelihood of a Boltzmann Generator, eliminating expensive Jacobian trace computations. On alanine dipeptide conformation generation, it achieves a 100× inference speedup with a free energy error of only 0.02 \(k_BT\).

Background & Motivation

Background: Boltzmann Generators (BGs) are deep generative models designed to sample from Boltzmann distributions \(p(x) \propto \exp(-E(x)/k_BT)\) defined by an energy function \(E(x)\). Normalizing flow- and diffusion-based approaches can generate samples, but computing likelihoods requires expensive Jacobian determinants or trace estimation.

Limitations of Prior Work: Likelihood evaluation via Jacobian trace estimation (e.g., the Hutchinson estimator) is prohibitively slow at inference time (equivariant CNFs require 9.37 hours for alanine dipeptide), creating a bottleneck for applying BGs to free energy calculations and importance sampling.

Key Challenge: High sampling quality requires accurate flow/diffusion models, yet likelihood evaluation cost is independent of sampling quality: even when sampling is good, each likelihood evaluation remains expensive.

Goal: Decouple sampler quality from likelihood tractability by training a separate EBM to approximate the likelihood, thereby avoiding Jacobian computation entirely.

Key Insight: Decompose the BG pipeline into two stages — first train a Boltzmann Emulator via flow matching, then train an EBM to approximate its density using NCE combined with score matching.

Core Idea: Use NCE-trained EBMs as fast likelihood surrogates for Boltzmann Generators, achieving a 100× inference speedup.

Method

Overall Architecture

The framework consists of two stages: (1) train a Boltzmann Emulator using stochastic interpolants and flow matching to generate samples from the Boltzmann distribution; (2) train an EBM \(\hat{U}(x)\) on the generated samples via a hybrid Score Matching + InfoNCE objective, such that \(\exp(\hat{U}(x))\) approximates the true density.
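As a minimal sketch of the stage-1 interpolant construction: the linear schedule \(\alpha_t = 1 - t\), \(\beta_t = t\) below is an illustrative assumption, not necessarily the paper's choice of coefficients.

```python
import numpy as np

def interpolant(x0, x1, t, alpha=lambda t: 1.0 - t, beta=lambda t: t):
    """Stochastic interpolant I_t = alpha_t * x0 + beta_t * x1.

    x0: noise sample, x1: data (Boltzmann) sample, t in [0, 1].
    The linear schedule is an assumption for illustration; the paper
    may use different alpha_t / beta_t coefficients.
    """
    return alpha(t) * x0 + beta(t) * x1

# Boundary conditions: I_0 = x0 (pure noise), I_1 = x1 (pure data).
rng = np.random.default_rng(0)
x0, x1 = rng.standard_normal(3), rng.standard_normal(3)
assert np.allclose(interpolant(x0, x1, 0.0), x0)
assert np.allclose(interpolant(x0, x1, 1.0), x1)
```

Any schedule with \(\alpha_0 = \beta_1 = 1\) and \(\alpha_1 = \beta_0 = 0\) satisfies the same boundary conditions, which is what makes the path a valid bridge from noise to data.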

Key Designs

  1. Stochastic Interpolant Framework:

    • Function: Constructs a smooth interpolation path \(I_t = \alpha_t x_0 + \beta_t x_1\) between noise \(x_0\) and Boltzmann samples \(x_1\).
    • Mechanism: A vector field \(v_t\) is trained to match the probability flow induced by the interpolant, enabling a mapping from noise to the Boltzmann distribution.
    • Design Motivation: Stochastic interpolants naturally bridge ODE flows and diffusion processes, and provide a well-defined score function that facilitates subsequent EBM training.
  2. BoltzNCE Hybrid Training:

    • Function: Jointly trains the EBM using Score Matching and InfoNCE.
    • Score Matching Loss: \(\mathcal{L}_{SM} = \mathbb{E}[|\alpha_t \nabla\hat{U}_t(\tilde{I}_t) + x_0|^2]\) — enforces that the EBM gradient matches the score of the interpolation process.
    • InfoNCE Loss: \(\mathcal{L}_{InfoNCE} = -\mathbb{E}[\log\frac{\exp(\hat{U}_t(\tilde{I}_t))}{\sum_{t'}\exp(\hat{U}_{t'}(\tilde{I}_t))}]\) — learns density through contrastive learning across time steps.
    • Combined: \(\mathcal{L}_{BoltzNCE} = \mathcal{L}_{SM} + \mathcal{L}_{InfoNCE}\)
    • Design Motivation: InfoNCE alone yields good global density estimates but inaccurate gradients; Score Matching alone yields accurate gradients but poor global density calibration — the two objectives are complementary.
  3. Free Energy Estimation via Importance Reweighting:

    • Function: Uses EBM likelihoods for importance sampling to correct sampling bias.
    • Mechanism: \(\hat{Z} = \sum_i w_i\), \(w_i = \exp(-E(x_i)/k_BT) / \hat{\rho}(x_i)\); free energy difference: \(\Delta F = -k_BT \ln(\hat{Z}_A / \hat{Z}_B)\).
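The reweighting in design 3 can be sketched numerically. The energies and densities below are synthetic placeholders; in practice \(E\) comes from the molecular force field and \(\hat{\rho}(x) = \exp(\hat{U}(x))\) from the trained EBM. Note that \(\hat{Z} = \sum_i w_i\) estimates the partition function only up to a constant \(1/N\) factor, which cancels in the ratio \(\hat{Z}_A / \hat{Z}_B\).

```python
import numpy as np

def free_energy_difference(E_A, rho_A, E_B, rho_B, kBT=1.0):
    """Importance-weighted free energy difference between states A and B.

    w_i = exp(-E(x_i)/kBT) / rho_hat(x_i);  Z_hat = sum_i w_i;
    Delta F = -kBT * ln(Z_hat_A / Z_hat_B).
    All inputs here are synthetic placeholders for illustration.
    """
    w_A = np.exp(-np.asarray(E_A) / kBT) / np.asarray(rho_A)
    w_B = np.exp(-np.asarray(E_B) / kBT) / np.asarray(rho_B)
    return -kBT * np.log(w_A.sum() / w_B.sum())

# Sanity check: identical samples for A and B give Delta F = 0.
E, rho = [1.0, 2.0, 0.5], [0.3, 0.2, 0.5]
assert abs(free_energy_difference(E, rho, E, rho)) < 1e-12
```

For numerical stability with realistic energies, the sums would normally be computed in log space (log-sum-exp) rather than as raw exponentials.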

Loss & Training

Two-stage training. Stage 1: flow matching with an equivariant vector field. Stage 2: Score Matching + InfoNCE; total EBM training time is approximately 12 hours.
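The hybrid objective can be sketched with NumPy on toy arrays. The shapes, the convention that the true time index sits in column 0, and the plain softmax over time steps are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def boltznce_losses(grad_U, x0, alpha_t, U_matrix):
    """Toy evaluation of the BoltzNCE loss terms.

    grad_U:   (B, D) gradient of U_t at the interpolated points I_t.
    x0:       (B, D) noise samples used to build the interpolants.
    alpha_t:  (B,)   noise coefficients at the sampled times.
    U_matrix: (B, T) energies U_{t'}(I_t) across T candidate time steps;
              by convention here, column 0 holds the true time t.
    """
    # Score-matching term: E[ || alpha_t * grad U_t(I_t) + x0 ||^2 ]
    sm = np.mean(np.sum((alpha_t[:, None] * grad_U + x0) ** 2, axis=1))
    # InfoNCE term: -E[ log softmax over time steps at the true time ].
    log_Z = np.log(np.sum(np.exp(U_matrix), axis=1))
    nce = np.mean(-(U_matrix[:, 0] - log_Z))
    return sm, nce, sm + nce
```

Both terms are non-negative (a squared norm and a negative log-softmax), so the combined loss is minimized only when the EBM gradient matches the interpolant score and the energies discriminate the true time step.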

Key Experimental Results

Main Results (Alanine Dipeptide)

| Method | \(\Delta F\) (\(k_BT\)) | Error (\(k_BT\)) | Inference Time |
|---|---|---|---|
| Umbrella Sampling (ground truth) | 4.10 ± 0.26 | – | – |
| ECNF (exact likelihood) | 4.09 ± 0.05 | 0.01 | 9.37 h |
| GVP Vector Field | 4.38 ± 0.67 | 0.28 | 18.4 h |
| BoltzNCE | 4.08 ± 0.13 | 0.02 | 0.09 h |

Ablation Study

| Configuration | KL Divergence (8-mode Gaussian) | Notes |
|---|---|---|
| InfoNCE only | 0.2395 | Inaccurate gradients |
| Score Matching only | 0.2199 | Poor global density |
| BoltzNCE (combined) | 0.0150 | ~15× improvement |

| Configuration | KL Divergence (Checkerboard) | Notes |
|---|---|---|
| InfoNCE only | 3.8478 | Struggles to align multimodal distributions |
| BoltzNCE (combined) | 0.1987 | ~19× improvement |

Key Findings

  • The hybrid objective improves KL divergence by 15–19× over either individual loss, confirming that the two objectives are genuinely complementary.
  • Inference is 100× faster (0.09h vs. 9.37h), as EBM forward passes are substantially cheaper than Jacobian trace computation.
  • Free energy error of 0.02 \(k_BT\) is comparable to the exact method (0.01 \(k_BT\)).
  • The approach generalizes to 7 dipeptide systems with acceptable error (0.43 \(k_BT\)) and a 6× speedup.

Highlights & Insights

  • Decoupling Sampling from Likelihood: Separating a high-quality sampler from a fast likelihood estimator is an elegant design choice — the generative model need not be likelihood-tractable; an independent EBM serves as a surrogate. This paradigm is broadly applicable to any setting where likelihoods are required but the generative model does not provide them.
  • Complementarity of NCE and SM: InfoNCE provides global density alignment through contrastive comparison across time steps, while Score Matching ensures local gradient accuracy. Their combination yields a 15× improvement in 2D experiments, offering strong empirical evidence of complementarity.
  • Practical Acceleration for Scientific Computing: A 100× inference speedup has substantial practical value for molecular simulation, making free energy calculations feasible on a single GPU rather than requiring a compute cluster.

Limitations & Future Work

  • Validation is limited to small molecules (dipeptides); scalability to large proteins and complex systems remains unexplored.
  • Generalization to unseen dipeptide systems incurs larger errors (0.43 vs. 0.02 \(k_BT\)), suggesting that fine-tuning may be necessary.
  • EBM training itself requires approximately 12 hours, which, while a one-time cost, is non-negligible.
  • Approximation errors in the learned likelihood may be amplified in the extreme tails of the distribution.
Comparison to Related Work

  • vs. ECNF (Köhler et al.): ECNF computes exact Jacobians but is extremely slow; BoltzNCE uses an approximate EBM and is 100× faster, with comparable free energy accuracy.
  • vs. Targeted Free Energy Perturbation: Classical reweighting methods rely on sufficient distributional overlap; BoltzNCE improves overlap by learning the likelihood directly.
  • vs. Flow Matching (Lipman et al., 2023): Stage 1 of BoltzNCE employs flow matching to construct the sampler, while Stage 2 — EBM training via the hybrid objective — constitutes the primary novel contribution.

Rating

  • Novelty: ⭐⭐⭐⭐ First application of sampling–likelihood decoupling with a hybrid NCE/SM objective in the context of Boltzmann generation.
  • Experimental Thoroughness: ⭐⭐⭐⭐ 2D synthetic benchmarks + alanine dipeptide + generalization across 7 dipeptide systems.
  • Writing Quality: ⭐⭐⭐⭐ Theoretical derivations are clear; 2D experiments intuitively illustrate the complementarity of the two loss terms.
  • Value: ⭐⭐⭐⭐ Significant practical impact for molecular simulation and free energy calculation.