
MetaDNS: Enhancing Exploration in Discrete Neural Samplers via Well-Tempered Metadynamics

Conference: ICML 2026
arXiv: 2605.21722
Code: https://github.com/xiaochendu/metadns
Area: Statistical Physics / Neural Samplers / Enhanced Sampling
Keywords: Discrete Diffusion, Metadynamics, Mode Collapse, Free Energy Reconstruction, Boltzmann Sampling

TL;DR

This work brings "well-tempered metadynamics" from molecular dynamics into discrete neural samplers. A history-dependent bias potential \(V_t(s)\), accumulated along a low-dimensional collective variable (CV), flattens the energy basins that have already been visited, which pushes MDNS-like models across energy barriers so that they cover multi-modal Boltzmann distributions. Importance reweighting then recovers unbiased estimates under the original target.

Background & Motivation

Background: In materials science and statistical physics, predicting phenomena such as ordered/disordered phase transitions in alloys or magnetic order parameters requires sampling from a discrete Boltzmann distribution \(\pi(x)\propto e^{-\beta E(x)}\). Traditional methods rely on MCMC (Metropolis–Hastings, Glauber, Swendsen–Wang). Recently, "discrete neural samplers" (MDNS, PDNS, LEAPS, DNFS, etc.) have emerged, using CTMC or any-order autoregressive models to learn samplers from energy functions, aiming for scalability in high-dimensional spaces.

Limitations of Prior Work: Discrete neural samplers trained with reverse KL divergence suffer from severe "mode collapse" at low temperatures. Probability mass concentrates in a single energy basin discovered early in training, and the sampler never produces configurations on the far side of high energy barriers. This causes two problems: (i) the missing modes bias estimates of equilibrium observables; (ii) without barrier-crossing configurations, the free energy surface \(F(s)\) cannot be computed. Both problems persist even with extended training or warm-start strategies.

Key Challenge: Existing methods rely on the natural convergence from a prior to the target distribution or fixed annealing paths, yet they lack a mechanism to encourage the generator to actively exit visited regions. Furthermore, evaluating \(E(x)\) in materials systems is extremely expensive (e.g., DFT or MLFF), making redundant evaluations in known low-energy regions a missed opportunity for discovering new phases.

Goal: To enable discrete neural samplers to actively cross energy barriers and cover all modes without relying on MCMC chains or compromising Boltzmann asymptotic correctness, while simultaneously reconstructing the free energy surface.

Key Insight: Well-tempered metadynamics (WT-MetaD) in continuous molecular dynamics is a classic tool for this purpose. It accumulates "bias hills" along a CV to flatten the energy landscape. WT-MetaD is traditionally a sequential MCMC paradigm limited by chain autocorrelation, whereas neural samplers can generate independent samples in parallel, so grafting the bias mechanism onto a neural sampler should in principle combine the benefits of both approaches.

Core Idea: Maintain a bias potential \(V_t(s)\) along a low-dimensional CV \(s=\xi(x)\). The neural sampler is trained on a "biased Boltzmann" distribution \(\pi_{V_t}(x)\propto e^{-\beta[E(x)+V_t(\xi(x))]}\). During training, Gaussian hills are added to \(V_t\) at a well-tempered rate based on the current sample distribution. During inference, self-normalized importance sampling with \(w_i=\exp(\beta V(\xi(x_i)))\) (with \(V\) in energy units, matching the biased density above) is used to recover the true Boltzmann distribution.
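The reweighting factor follows from the ratio of the target density to the biased density. The short derivation below uses only the definitions above; the normalizing constants \(Z\) and \(Z_V\) and the observable \(O\) are notation introduced here for illustration, and the result assumes the sampler has actually converged to \(\pi_V\).

\[
\frac{\pi(x)}{\pi_V(x)}
= \frac{e^{-\beta E(x)}/Z}{e^{-\beta[E(x)+V(\xi(x))]}/Z_V}
= \frac{Z_V}{Z}\,e^{\beta V(\xi(x))}
\]

\[
\mathbb{E}_{\pi}[O] \approx \frac{\sum_i w_i\,O(x_i)}{\sum_i w_i},
\qquad w_i = e^{\beta V(\xi(x_i))},
\qquad x_i \sim \pi_V
\]

The unknown ratio \(Z_V/Z\) cancels in the self-normalized estimator, which is why only the bias at each sample's CV value is needed. If the bias is stored in units of \(k_BT\) rather than energy, the factor \(\beta\) is absorbed and the weight reads \(e^{V}\).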

Method

Overall Architecture

MetaDNS utilizes a nested dual-loop training framework. Given an energy function \(E(x)\), inverse temperature \(\beta\), and a manually selected CV mapping \(\xi:\mathcal{X}\to\mathcal{S}\), the neural sampler \(q_\theta\) and an initial zero bias \(V_0\equiv 0\) are initialized.

In each iteration of the outer loop: (1) The inner loop fixes \(V_{t-1}\), samples \(M_\text{inner}\) configurations from \(q_\theta\), and updates \(\theta\) for \(N_\text{inner}\) steps using a biased energy \(E_\text{biased}(x)=E(x)+V_{t-1}(\xi(x))\) with discrete neural sampler losses (e.g., WDCE). (2) The outer loop then samples \(M_\text{outer}\) configurations from the updated \(q_\theta\) and deposits "hills" onto \(V_t\) based on their CV positions. By the end of training, \(V_{N_\text{outer}}\) flattens the energy basins, and \(q_\theta\) learns the flattened distribution. Inference is performed by sampling from \(q_\theta\) and reweighting with \(w_i=\exp(\beta V_{N_\text{outer}}(\xi(x_i)))\), the ratio of the target density to the biased density up to a constant.
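The loop structure can be sketched on a toy problem. The example below is not the authors' implementation: it replaces the neural sampler with exact sampling from the biased Boltzmann distribution of a fully enumerated 8-spin Ising ring, which amounts to assuming the inner loop converges perfectly. It also deposits each hill into a single integer CV bin rather than using a smooth kernel. All function names and default values (`train_toy`, `hill=0.01`, and so on) are hypothetical choices for illustration.

```python
import itertools
import numpy as np


def enumerate_ring(n_spins):
    """All 2^n states of an Ising ring (J = 1), their energies and up-spin counts."""
    states = np.array(list(itertools.product([-1, 1], repeat=n_spins)))
    energy = -np.sum(states * np.roll(states, 1, axis=1), axis=1).astype(float)
    n_up = np.sum(states == 1, axis=1)  # CV bin index in 0..n_spins
    return states, energy, n_up


def boltzmann(energy, beta):
    """Normalized Boltzmann probabilities for a vector of energies."""
    logp = -beta * energy
    p = np.exp(logp - logp.max())
    return p / p.sum()


def train_toy(energy, n_up, beta, gamma=5.0, hill=0.01,
              n_outer=200, m_outer=64, seed=0):
    """Outer loop of the sketch: sample from the biased model, then deposit hills."""
    rng = np.random.default_rng(seed)
    bias = np.zeros(n_up.max() + 1)
    for _ in range(n_outer):
        # Inner loop stand-in (assumption): q_theta equals pi_{V_{t-1}} exactly.
        q = boltzmann(energy + bias[n_up], beta)
        # Outer loop: draw independent samples and count visits per CV bin.
        idx = rng.choice(energy.size, size=m_outer, p=q)
        counts = np.bincount(n_up[idx], minlength=bias.size)
        # Well-tempered deposition: hills shrink where the bias is already large.
        bias += hill * np.exp(-beta * bias / gamma) * counts
    return bias


def reweighted_mean(observable, energy, n_up, bias, beta,
                    n_samples=20000, seed=1):
    """Self-normalized importance sampling estimate under the unbiased target."""
    rng = np.random.default_rng(seed)
    q = boltzmann(energy + bias[n_up], beta)
    idx = rng.choice(energy.size, size=n_samples, p=q)
    logw = beta * bias[n_up[idx]]
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return np.sum(w * observable[idx]) / np.sum(w)
```

A typical use is to call `enumerate_ring(8)`, pass the energies and up-spin counts to `train_toy`, and compare `reweighted_mean` of an observable such as the absolute magnetization against the exact average computed from `boltzmann(energy, beta)`. In a real run the exact biased sampler would be replaced by a few WDCE gradient steps on \(q_\theta\).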

Key Designs

  1. Well-tempered hill deposition:

    • Function: Adds history-dependent Gaussian bumps to the bias potential, which automatically attenuate in visited regions to prevent infinite bias growth.
    • Mechanism: In round \(t\), the update for each CV bin \(s\) is \(V_t(s)\leftarrow V_{t-1}(s)+\sum_j h\,\exp(-V_{t-1}(s)/(\gamma k_B T))\,K(s,\xi(x_j))\), where \(h\) is the hill height, \(\gamma>1\) is the bias factor, \(K\) is a Gaussian kernel of width \(\sigma\) centered at each sample's CV value, and \(k_BT=1/\beta\). The exponential factor shrinks each new hill as \(V\) grows, so the bias saturates instead of growing without bound, asymptotically satisfying \(V^\star(s)\approx-(1-1/\gamma)F(s)+c\).
    • Design Motivation: The well-tempered form ensures convergence and allows the free energy surface \(F(s)\) to be fitted "for free."
  2. Dual-loop training for discrete neural metadynamics:

    • Function: Adapts the "sample-bias-resample" cycle of WT-MetaD into a setup where the inner loop learns the biased distribution and the outer loop updates the bias.
    • Mechanism: The inner loop pushes \(q_\theta\) toward \(\pi_{V_{t-1}}\) using path-measure alignment losses (like WDCE in MDNS), while the outer loop updates \(V\) after forward sampling.
    • Design Motivation: Unlike MCMC chains, samples from neural samplers are independent, meaning each step of hill deposition in the outer loop contains more information. This avoids the need to re-burn long chains after every bias update, which is a significant advantage over MCMC-based WT-MetaD.
  3. Dual-track importance reweighting:

    • Function: Restores the true \(\pi(x)\) from configurations sampled from \(q_\theta\) after training.
    • Mechanism: (i) Bias-based weights \(w_i=\exp(\beta V(\xi(x_i)))\) are applicable to all samplers, rely only on low-dimensional CVs, and have low variance. (ii) Likelihood-based weights \(\tilde w_i=\exp(-\beta E(x_i))/q_\theta(x_i)\) provide an exact-density interpretation but require explicit likelihood calculation (available in autoregressive models or via path-likelihood decomposition in MDNS).
    • Design Motivation: Bias-based weights are essential for samplers without tractable likelihoods, whereas likelihood-based weights are asymptotically unbiased and used when exactness is required.
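The three designs above can be sketched with a few NumPy helpers. This is a minimal illustration for a one-dimensional CV on a fixed grid, not the paper's code: the function names, the linear interpolation of the bias between grid points, and the convention \(1/(k_BT)=\beta\) are assumptions made here. The deposition rule follows the update formula as written in the Mechanism bullet, and the free energy estimate simply inverts the stated asymptotic relation \(V^\star\approx-(1-1/\gamma)F+c\).

```python
import numpy as np


def deposit_hills(bias, grid, cv_samples, hill_height, sigma, gamma, beta):
    """One well-tempered update of a 1D bias stored on `grid`."""
    # Gaussian kernel K(s, s_j) between every grid point and every sample CV.
    diff = grid[:, None] - np.asarray(cv_samples, dtype=float)[None, :]
    kernel = np.exp(-0.5 * (diff / sigma) ** 2)
    # Attenuation exp(-V(s) / (gamma k_B T)), using beta = 1 / (k_B T).
    damping = np.exp(-beta * bias / gamma)
    return bias + hill_height * damping * kernel.sum(axis=1)


def free_energy_from_bias(bias, gamma):
    """Invert V* = -(1 - 1/gamma) F + c; F is returned with its minimum at 0."""
    f = -bias / (1.0 - 1.0 / gamma)
    return f - f.min()


def bias_weights(bias, grid, cv_samples, beta):
    """Normalized bias-based weights, proportional to exp(beta * V(cv))."""
    v = np.interp(cv_samples, grid, bias)  # assumption: linear interpolation
    logw = beta * v
    w = np.exp(logw - logw.max())
    return w / w.sum()


def likelihood_weights(energies, log_q, beta):
    """Normalized likelihood-based weights, proportional to exp(-beta E) / q."""
    logw = -beta * np.asarray(energies, dtype=float) - np.asarray(log_q, dtype=float)
    w = np.exp(logw - logw.max())
    return w / w.sum()
```

Both weight functions work in log space and subtract the maximum before exponentiating, which avoids overflow when the bias or the energies are large. `likelihood_weights` needs `log_q`, the sampler's log-probability of each sample, so it only applies when that quantity is available.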

Loss & Training

The inner loop employs the original Weighted Denoising Cross-Entropy (WDCE) from MDNS, replacing the target distribution energy \(E\) with \(E_\text{biased}=E+V_{t-1}\circ\xi\). Key hyperparameters include the bias factor \(\gamma\), initial hill height \(h\), Gaussian kernel width \(\sigma\), and the ratio \(N_\text{inner}/N_\text{outer}\). CVs are system-dependent: the fraction of up-spins for Ising, occupancy fractions for Potts, and the Au atomic fraction for Cu-Au alloys.
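The collective variables and the biased energy are simple to write down. The sketch below assumes particular encodings that the summary does not specify: Ising spins as \(\pm 1\), Potts states as integers \(0,\dots,q-1\), and alloy sites as species labels. The helper names are hypothetical, and `biased_energy` only illustrates the composition \(E+V_{t-1}\circ\xi\) that replaces \(E\) in the loss.

```python
import numpy as np


def up_spin_fraction(spins):
    """Ising CV: fraction of sites with spin +1 (spins assumed to be +1/-1)."""
    return np.mean(np.asarray(spins) == 1)


def occupancy_fractions(states, n_states):
    """Potts CV: fraction of sites in each of the n_states integer states."""
    counts = np.bincount(np.asarray(states).ravel(), minlength=n_states)
    return counts / counts.sum()


def species_fraction(species, target):
    """Alloy CV: fraction of sites occupied by `target`, e.g. "Au"."""
    return np.mean(np.asarray(species) == target)


def biased_energy(energy_fn, cv_fn, bias_fn):
    """Return x -> E(x) + V(xi(x)), the energy the inner loop trains against."""
    def e_biased(x):
        return energy_fn(x) + bias_fn(cv_fn(x))
    return e_biased
```

Because the bias enters only through `cv_fn`, the underlying energy model (an analytic Hamiltonian, DFT, or an MLFF) is called exactly as before and the loss itself is unchanged.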

Key Experimental Results

Main Results

The authors evaluated the method on Ising, Potts, and Cu-Au alloy systems, comparing it against MDNS (baseline neural sampler) and MCMC-based WT-MetaD (gold standard). Reference values were obtained via Swendsen–Wang or long MCMC chains.

Setup (\(L=16\) Ising) MDNS MDNS warm-start MetaDNS SW ground truth
High \(T\) (\(\beta=0.28\)), \(x_\uparrow\) JS↓ 1.7e-2 1.7e-2
Critical (\(\beta=0.4407\)), \(x_\uparrow\) JS↓ 3.6e-2 4.2e-2
Low \(T\) (\(\beta=0.60\)), \(x_\uparrow\) JS↓ 2.2e-1 (Collapsed) 4.8e-3 4.6e-2
Low \(T\) (\(\beta=0.60\)), Magnetization 0.974 0.972 0.974 0.973

In low-temperature Ising systems, MetaDNS achieves a JS divergence for \(x_\uparrow\) that is approximately 5× lower than vanilla MDNS. For the Potts model, MetaDNS reaches a free energy accuracy of \(1 k_BT\) RMSE in significantly fewer bias deposition steps compared to MCMC-based WT-MetaD (e.g., 50k vs 94.5k steps at low temperature). In the Cu-Au alloy system, MetaDNS successfully identifies the Cu\(_3\)Au phase which vanilla MDNS misses.
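The JS figures above compare the sampled distribution of \(x_\uparrow\) with a reference distribution. The summary does not state the binning or the logarithm base, so the sketch below is one plausible way to compute such a number: it uses natural logarithms (so the maximum value is \(\ln 2\)) and a fixed number of equal-width bins on \([0,1]\). Both function names are hypothetical.

```python
import numpy as np


def weighted_histogram(cv_values, weights, n_bins):
    """Normalized histogram of a CV in [0, 1], with importance weights."""
    hist, _ = np.histogram(cv_values, bins=n_bins, range=(0.0, 1.0), weights=weights)
    return hist / hist.sum()


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a = 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For reweighted samples, the importance weights are passed to `weighted_histogram` and the result is compared with the reference histogram through `js_divergence`; unweighted samples use a vector of ones.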

Ablation Study

Dimension | MDNS | MetaDNS | MCMC WT-MetaD
Low-T Mode Coverage | Collapsed | Full Modes | Full Modes
Bias steps to converge (Potts) | - | 14k–50k | 36k–107k
Training wall-time (Cu-Au) | - | 1.5 h | 1.75 h
Training wall-time (Potts) | - | 20 h | 1 h
10k Sample Generation Time | - | <1 min | ≈30–40 min

Key Findings

  • Mode collapse primarily manifests when \(L \ge 8\) and \(\beta > \beta_\text{crit}\); it is not observable in smaller systems like \(L=4\).
  • While warm-starting MDNS can partially mitigate JS divergence issues for \(x_\uparrow\), it often degrades energy JS and two-point correlations.
  • The comparative advantage in training wall-time is determined by the cost of \(E(x)\) evaluation. For the Cu-Au system where evaluations are expensive, MetaDNS is faster in both training and inference.
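  • MetaDNS provides a large amortized inference speedup over MCMC: generating 10k samples takes under 1 minute versus roughly 30–40 minutes, a factor of more than 30–40×, because sampling needs only a forward pass through the trained model rather than a long chain.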

Highlights & Insights

  • This work represents the first complete migration of the "memory-type bias potential" from molecular dynamics to discrete neural samplers, providing an exploration mechanism that is agnostic to the underlying sampler architecture.
  • The natural emergence of the free energy surface as a training byproduct bridges the gap between raw sampling and thermodynamic analysis.
  • The use of "bias deposition steps" as a resource unit (per energy evaluation) provides a compelling metric for comparing neural samplers with traditional MCMC chains.
  • The introduction of the Cu-Au binary alloy benchmark connects machine learning sampling with real-world materials science, moving beyond toy models like Ising/Potts.

Limitations & Future Work

  • CV Design: Collective variables still require manual design, which is a bottleneck for complex systems like high-entropy alloys. Future work may explore automated CV discovery.
  • Dimensionality: The bias potential is subject to the curse of dimensionality regarding the number of CVs and bins, typically limiting the method to 2-3 dimensions.
  • Reweighting Bias: If the sampler \(q_\theta\) is not sufficiently expressive or converged, bias-based reweighting may introduce errors. Theoretical convergence guarantees remain an open question.
  • Computational Overhead: For systems where energy evaluation is extremely cheap (e.g., Potts), the overhead of neural network training makes MetaDNS slower in terms of wall-clock time compared to classical MCMC.

Related Work

  • Comparison with MDNS/PDNS/LEAPS/DNFS: These methods rely on implicit exploration through loss alignment. MetaDNS provides an orthogonal enhancement by adding an explicit history-dependent bias.
  • Comparison with Continuous Samplers (e.g., WT-ASBS): While WT-ASBS applies similar concepts in continuous spaces, MetaDNS addresses the unique challenges of discrete spaces, such as CTMC training objectives and CV discretization.
  • Comparison with classical WT-MetaD: MetaDNS replaces MCMC/MD with a neural sampler, reducing the number of bias deposition steps required by leveraging independent samples.

Rating

  • Novelty: ⭐⭐⭐⭐
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐
  • Writing Quality: ⭐⭐⭐⭐
  • Value: ⭐⭐⭐⭐