Scalable Equilibrium Sampling with Sequential Boltzmann Generators

Conference: ICML2025
arXiv: 2502.18462
Code: GitHub
Area: Computational Biology
Keywords: Boltzmann Generators, Normalizing Flows, Molecular Sampling, Annealed Langevin Dynamics, Protein Peptides

TL;DR

SBG is the first Boltzmann generator to efficiently sample the equilibrium distribution of a hexapeptide (66 atoms) in Cartesian coordinates. It combines a Transformer-based normalizing flow (TarFlow) with sequential Monte Carlo driven by annealed Langevin dynamics.

Background & Motivation

Limitations of Prior Work

Molecular systems have multiple metastable states separated by high energy barriers, so transitions between states are rare. Conventional MCMC and MD therefore need extremely long simulations, since MD integration is limited to femtosecond-scale timesteps.

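Bottlenecks of Existing Boltzmann Generators

Architectural inefficiency: equivariant continuous normalizing flows are expensive, because both sampling and exact likelihood evaluation require integrating an ODE (the likelihood additionally needs the divergence of the vector field).
Poor overlap between proposal and target distributions: self-normalized importance sampling (SNIS) then suffers from very high weight variance and a very small effective sample size (ESS).
Prior state-of-the-art BG methods only scale to dipeptides (2 amino acids, 22 atoms).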
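To make the overlap problem concrete, here is a minimal sketch in which 1D Gaussians stand in for the flow proposal and the Boltzmann target (an assumption; real systems are high-dimensional). It computes SNIS log-weights and the Kish effective sample size, ESS/N = (Σw)² / (N Σw²), as the proposal drifts away from the target. All helper names are hypothetical.

```python
import numpy as np

def log_normal(x, mu, sigma):
    """Log-density of the 1D Gaussian N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def normalized_ess(log_w):
    """Kish ESS divided by N (in [1/N, 1]) from unnormalized log-weights."""
    w = np.exp(log_w - log_w.max())   # shift by the max for numerical stability
    w /= w.sum()                      # self-normalize the weights
    return 1.0 / (len(w) * np.sum(w ** 2))

def snis_ess_for_shift(shift, n=10_000, seed=0):
    """Toy: target N(0, 1), proposal N(shift, 1). Larger shift = worse overlap."""
    rng = np.random.default_rng(seed)
    x = rng.normal(shift, 1.0, size=n)                        # proposal samples
    log_w = log_normal(x, 0.0, 1.0) - log_normal(x, shift, 1.0)
    return normalized_ess(log_w)

for shift in (0.0, 1.0, 3.0):
    print(f"shift={shift}: ESS/N = {snis_ess_for_shift(shift):.4f}")
```

For two unit-variance Gaussians whose means differ by δ, E_q[w²] = exp(δ²), so ESS/N is roughly exp(-δ²): about 1, 0.37 and 1e-4 for the three shifts (the empirical estimate for the largest shift is noisy). For product-form mismatches, small per-coordinate errors compound across dimensions in the same way, which is why SNIS degrades quickly as systems grow.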

Scaling Along Two Axes

Training-time improvement: a scalable non-equivariant architecture (TarFlow) replaces equivariant flows.
Inference-time improvement: annealed Langevin dynamics progressively transports flow samples toward the target distribution.

Method

Key Designs

Key Design 1: Softly Equivariant Normalizing Flows

Strict equivariance is dropped in favor of TarFlow, a Transformer autoregressive flow built from stacked, causally masked Vision Transformer blocks that operate on patches of the input.
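The efficiency argument for an autoregressive flow rests on its affine transform: the Jacobian is triangular, so the log-determinant is just a sum, and the inverse is exact. The sketch below shows one such block in NumPy, assuming a strictly lower-triangular linear map as a hypothetical stand-in for TarFlow's causal Transformer conditioner; it is not TarFlow's actual parameterization.

```python
import numpy as np

D = 6                                    # e.g. 2 atoms x 3 Cartesian coordinates
rng = np.random.default_rng(0)
# Hypothetical stand-in for a causal Transformer: strictly lower-triangular
# linear maps, so the shift/log-scale for dimension i depend only on x[:i].
W_mu = np.tril(rng.normal(size=(D, D)), k=-1)
W_alpha = 0.1 * np.tril(rng.normal(size=(D, D)), k=-1)

def forward(x):
    """Data -> latent in one parallel pass; also returns log|det dz/dx|."""
    mu, alpha = W_mu @ x, W_alpha @ x
    z = (x - mu) * np.exp(-alpha)
    # Triangular Jacobian with diagonal exp(-alpha): the log-det is a cheap sum.
    return z, -alpha.sum()

def inverse(z):
    """Latent -> data, one dimension at a time (x[i] needs x[:i] first)."""
    x = np.zeros_like(z)
    for i in range(D):
        # Row i is zero from column i onward, so only the decoded x[:i] contributes.
        mu_i, alpha_i = W_mu[i] @ x, W_alpha[i] @ x
        x[i] = z[i] * np.exp(alpha_i) + mu_i
    return x

def log_prob(x):
    """Exact log-density by change of variables with a standard normal base."""
    z, log_det = forward(x)
    return -0.5 * z @ z - 0.5 * D * np.log(2 * np.pi) + log_det
```

Likelihood evaluation (`forward`) is a single parallel pass, while sampling (`inverse`) takes D sequential steps; practical autoregressive flows stack several such blocks with different variable orderings. By contrast, a continuous flow must integrate an ODE in both directions.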

Soft equivariance implementation:
- Rotational equivariance: random-rotation data augmentation during training.
- Translational equivariance: center-of-mass noise plus compensation via Proposition 1 during inference.
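Rotational soft equivariance through augmentation can be sketched as follows, assuming each configuration is an (n_atoms, 3) array and that centering uses the unweighted centroid (an assumption; a mass-weighted center would work the same way). Only the rotation part is shown, because the center-of-mass noise scale and its Proposition 1 correction are not specified in these notes. Function names are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Uniformly distributed 3x3 rotation matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))     # sign fix makes q uniform (Haar) on O(3)
    if np.linalg.det(q) < 0:        # flip one column to move from O(3) to SO(3)
        q[:, 0] = -q[:, 0]
    return q

def augment(coords, rng):
    """Center one (n_atoms, 3) configuration, then apply a random rotation."""
    centered = coords - coords.mean(axis=0)     # unweighted centroid (assumption)
    return centered @ random_rotation(rng).T    # rotates every atom's row vector
```

Because a fresh rotation is drawn for every training example, the flow sees all orientations and can learn an approximately rotation-invariant density without equivariant layers.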

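Key Design 2: Annealed Langevin Dynamics

The energy is interpolated from the proposal flow's energy (its negative log-density) to the target Boltzmann energy, and the Jarzynski equality tracks the importance weights along the path. Starting the annealing from the trained flow rather than from the simple prior means samples already lie close to the target, so the transport is shorter and the weights are better behaved.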
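The mechanism can be illustrated on a toy problem with a minimal annealed importance sampling (AIS) loop. Assumptions, all hypothetical choices rather than the paper's settings: a 1D Gaussian "flow" q = N(0, 1) with energy E0 = -log q, an unnormalized Gaussian target energy E1, the linear path E_t = (1 - t) E0 + t E1, unadjusted Langevin moves, and no resampling (a full SMC scheme would typically also resample particles when the ESS drops; that step is omitted).

```python
import numpy as np

def E0(x):
    """Proposal energy -log q(x) for q = N(0, 1); exactly normalized."""
    return 0.5 * x**2 + 0.5 * np.log(2 * np.pi)

def E1(x):
    """Toy target energy: unnormalized N(1, 0.5^2)."""
    return 2.0 * (x - 1.0) ** 2

def grad_E0(x):
    return x

def grad_E1(x):
    return 4.0 * (x - 1.0)

def ess(log_w):
    """Normalized effective sample size from unnormalized log-weights."""
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / (len(w) * np.sum(w ** 2))

def snis(n=5000, seed=0):
    """Baseline: plain self-normalized importance sampling from the proposal."""
    x = np.random.default_rng(seed).normal(size=n)
    return x, E0(x) - E1(x)                      # log w = log p~(x) - log q(x)

def annealed_langevin(n=5000, n_temps=500, n_langevin=2, step=0.02, seed=0):
    """AIS along E_t = (1 - t) E0 + t E1 with unadjusted Langevin moves."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                       # exact proposal samples (t = 0)
    log_w = np.zeros(n)                          # weights start at 1: E0 = -log q
    ts = np.linspace(0.0, 1.0, n_temps + 1)
    for t_prev, t in zip(ts[:-1], ts[1:]):
        # Jarzynski/AIS increment E_{t_prev}(x) - E_t(x), evaluated before moving.
        log_w += (t - t_prev) * (E0(x) - E1(x))
        # Langevin moves approximately targeting exp(-E_t).
        for _ in range(n_langevin):
            grad = (1.0 - t) * grad_E0(x) + t * grad_E1(x)
            x = x - step * grad + np.sqrt(2.0 * step) * rng.normal(size=n)
    return x, log_w

def weighted_mean(x, log_w):
    w = np.exp(log_w - log_w.max())
    return np.sum(w * x) / w.sum()

def log_Z(log_w):
    """Log of the estimated normalizer ratio Z1 / Z0 (Z0 = 1 here)."""
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

x_is, lw_is = snis()
x_ais, lw_ais = annealed_langevin()
print("SNIS ESS:", ess(lw_is), " AIS ESS:", ess(lw_ais))
print("AIS mean:", weighted_mean(x_ais, lw_ais), "(exact 1.0)")
print("AIS log Z:", log_Z(lw_ais), "(exact", 0.5 * np.log(np.pi / 2), ")")
```

On this toy the annealed weights should have a clearly higher ESS than the SNIS baseline, qualitatively matching the ablation trend. Without a Metropolis correction the Langevin moves carry a small step-size bias, which is one reason to keep the step small.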

Theoretical Guarantees

Proposition 1 proves that ESS strictly improves after center-of-mass adjustment.

Loss & Training

The flow is trained by maximum likelihood on molecular dynamics samples, with random rotations applied as data augmentation (Key Design 1).

Key Experimental Results

Peptide System Sampling Capability

Main Results

System	Number of Atoms	SBG-SNIS	SBG-AIS	Continuous BG
Dipeptide	22	Excellent	Excellent	Feasible
Tripeptide	33	Good	Excellent	Failed
Tetrapeptide	44	Feasible	Good	Failed
Hexapeptide	66	-	Feasible	Failed

ESS Improvement

Ablation Study

System	SNIS ESS	AIS ESS	Gain
Tripeptide	~0.3	>0.8	2.7x
Tetrapeptide	~0.1	>0.5	5x
Hexapeptide	~0	Non-negligible	From infeasible to feasible

Key Findings

Annealing is key: A significant performance leap occurs after incorporating annealing.
Soft equivariance is effective: Flexible parameterization outperforms strict geometric constraints.
Transitioning from dipeptide to hexapeptide represents a qualitative leap.

Highlights & Insights

Dropping strict equivariance in favor of scalable Transformers mirrors a broader trend in ML.
Inference-time annealing substantially improves sampling quality: ESS gains of roughly 2.7x to 5x in the ablation, and the hexapeptide goes from infeasible to feasible.
A clean combination of physics and ML: the Boltzmann distribution and the Jarzynski equality paired with TarFlow and SMC.
Center-of-mass adjustment is rigorously proven theoretically (Proposition 1).
Open-source code.

Limitations & Future Work

The hexapeptide (66 atoms) is the largest system demonstrated; real proteins (thousands of atoms) remain far off.
The computational cost of annealing is significantly higher than that of SNIS.
Langevin dynamics requires gradients of the target energy, so black-box (non-differentiable) potentials cannot be used directly.
Selection of center-of-mass noise parameters is not fully discussed.
No comparison was made against diffusion model sampling methods.

Future Vision

Combine TarFlow architecture with more efficient equivariant operations, leveraging the strengths of both.
Explore multi-resolution annealing strategies to dynamically adjust step sizes.
Validate on larger all-atom proteins.
Integrate with structure prediction models like AlphaFold to leverage their priors.
Validation on decapeptides and larger systems is the natural next milestone.
Attempt to apply SBG to drug-protein binding free energy estimation.

Related Work & Insights

Noé et al. 2019: the original BG framework. NETS: flow matching + non-equilibrium sampling.
TarFlow: this paper is its first application to molecular systems.
Insight: The philosophy that soft constraints outperform rigid constraints can be generalized to scientific ML.

Rating

Novelty: 4.5/5
Experimental Thoroughness: 4.0/5
Writing Quality: 4.0/5
Value: 4.0/5

Supplementary Analysis

Method Comparison Summary Table

Method	Uses Energy	Exact Likelihood	Uses Data	Annealing
DEM	Yes	No	No	No
NETS	Yes	Yes	No	Yes
BG	Yes	Yes	Yes	No
SBG (this paper)	Yes	Yes	Yes	Yes

Among these methods, only SBG has all four characteristics.

Alanine Dipeptide Validation

In the 22-atom alanine dipeptide, Onsager–Machlup (OM) action optimization yields two candidate transition pathways. Using these pathways as collective variables for umbrella sampling, the free-energy barrier is estimated at approximately 6 kcal/mol.