Mirror Mean-Field Langevin Dynamics¶

Conference: ICML 2026
arXiv: 2505.02621
Code: Not released
Area: Optimization
Keywords: mean-field Langevin, mirror descent, constrained sampling, propagation of chaos, logarithmic Sobolev inequality

TL;DR¶

This paper merges mean-field Langevin dynamics (MFLD) with mirror Langevin dynamics (MLD) into "Mirror Mean-Field Langevin Dynamics" (MMFLD), the first algorithm with global convergence guarantees for minimizing the entropy-regularized functional \(\mathcal{L}(\mu)=F(\mu)+\lambda\,\mathrm{Ent}(\mu)\) over distributions on a convex constrained domain \(X\subseteq\mathbb{R}^d\). In continuous time, it proves linear convergence at rate \(e^{-2C_{\mathrm{LSI}}\lambda t}\) under a uniform mirror LSI; for the discretization, it proves uniform-in-time propagation of chaos for an \(N\)-particle system with Euler-Maruyama time stepping.

Background & Motivation¶

Background: The distribution optimization objective \(\mathcal{L}(\mu)=F(\mu)+\lambda\,\mathrm{Ent}(\mu)\) casts many machine learning problems (infinite-width two-layer neural networks, tensor decomposition, sparse deconvolution, density estimation, discrepancy minimization) as convex optimization in Wasserstein space. When \(X=\mathbb{R}^d\), MFLD (the McKean-Vlasov process \(dX_t=-\nabla\frac{\delta F(\mu_t)}{\delta \mu}(X_t)dt+\sqrt{2\lambda}dB_t\)) combined with a uniform LSI already gives the linear convergence \(\mathcal{L}(\mu_t)-\mathcal{L}(\mu^\ast)\le e^{-2C_{\mathrm{LSI}}\lambda t}(\mathcal{L}(\mu_0)-\mathcal{L}(\mu^\ast))\) and has a mature propagation-of-chaos analysis.
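As a point of reference, the unconstrained MFLD update can be sketched with an Euler-Maruyama step on \(N\) particles. This is only an illustration: the functional \(F(\mu)=\tfrac12\|\mathbb{E}_\mu x-q\|^2\), the target `q`, and the function name `mfld_step` are assumptions chosen for the example, not taken from the paper. For this \(F\), \(\nabla\frac{\delta F(\mu)}{\delta\mu}(x)=\mathbb{E}_\mu x-q\) for every \(x\).

```python
import numpy as np

def mfld_step(x, q, eta, lam, rng):
    """One Euler-Maruyama step of N-particle MFLD for the toy functional
    F(mu) = 0.5 * ||E_mu[x] - q||^2 (an illustrative choice, not the paper's)."""
    # Gradient of the first variation: the same vector for every particle,
    # with E_mu[x] estimated by the empirical mean of the particles.
    drift = x.mean(axis=0) - q
    # Full-space Gaussian noise: nothing keeps particles inside a bounded set.
    noise = rng.standard_normal(x.shape)
    return x - eta * drift + np.sqrt(2.0 * lam * eta) * noise

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(512, 2))  # particles start inside [0, 1]^2
q = np.array([0.2, 0.8])
for _ in range(200):
    x = mfld_step(x, q, eta=0.05, lam=0.05, rng=rng)

# Fraction of particles that ended up outside the unit square.
outside = np.mean(np.any((x < 0.0) | (x > 1.0), axis=1))
print(f"fraction of particles outside [0, 1]^2: {outside:.2f}")
```

Because the noise is isotropic Gaussian, running this sketch on a bounded set such as the unit square gives no guarantee that particles stay inside, which is the gap the constrained setting has to close.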

Limitations of Prior Work: In practice, many \(X\) are bounded convex sets (trajectory inference requires a probability simplex, Wasserstein barycenter requires bounded support, mean-matching in discrepancy minimization is often constrained to a simplex or spectral shape, and norm-constrained neural networks require parameters within a ball). Simply adding projections to MFLD accumulates mass at the boundary \(\partial X\), while single-particle mirror Langevin cannot handle cases where \(F\) is a distribution functional (non-linear \(\frac{\delta F}{\delta\mu}\)). This leaves an open question: Is there a mean-field algorithm with global convergence guarantees for constrained distribution optimization objectives \(\mathcal{L}\)?

Key Challenge: The diffusion in MFLD is "full-space Gaussian," which inevitably forces mass out of \(X\). The mirror map in MLD changes the geometry to confine diffusion within \(X\), but it is designed to sample a fixed \(\mu^\ast\propto e^{-f/\lambda}\) and cannot handle mean-field coupling where the target distribution depends on the current distribution \(\mu_t\). These two mechanisms are disconnected.

Goal: (1) Propose a unified SDE where diffusion stays within \(X\) while the drift handles the mean-field term \(\frac{\delta F(\mu_t)}{\delta \mu}\); (2) prove global exponential convergence in continuous time using mirror LSI; (3) prove uniform-in-time propagation of chaos for the \(N\)-particle time-discretized algorithm, with the LSI constant decoupled from the particle number; (4) extend convergence proofs to include stochastic gradients.

Key Insight: The authors observe that the dual-space SDE of MLD, \(dY_t=-\nabla f(X_t)dt+\sqrt{2\lambda\nabla^2\phi(X_t)}dB_t\), becomes a mean-field dynamics once \(\nabla f\) is replaced with \(\nabla\frac{\delta F(\mu_t)}{\delta\mu}\), the same drift that MFLD uses. Making this replacement yields the mean-field version of mirror dynamics, and the "configuration space + entropy sandwich" proof from Nitanda 2024 is then adapted to the mirror geometry.

Core Idea: Treat the mirror map \(\nabla\phi\) as a tool to "fold" constrained geometry into diffusion. Upgrade all theoretical components of MFLD (Wasserstein gradient flow, entropy sandwich, uniform LSI, propagation of chaos) to the Hessian metric \(\nabla^2\phi\), resulting in a unified "Mirror MFLD."

Method¶

Overall Architecture¶

To minimize \(\mathcal{L}(\mu)=F(\mu)+\lambda\,\mathrm{Ent}(\mu)\) over \(\mu\in\mathcal{P}_2(X)\) with convex \(X\subseteq\mathbb{R}^d\), the authors choose a thrice-differentiable Legendre barrier \(\phi:X\to\mathbb{R}\). Typical choices are \(\phi(x)=\sum_i x_i\log x_i\) for the simplex, \(\phi(\Sigma)=\mathrm{Tr}(\Sigma\log\Sigma-\Sigma)\) for the spectraplex, and \(\phi(z)\propto-\log(1-\|z\|^2)\) for the ball. Because \(\nabla\phi\) blows up at \(\partial X\), the diffusion stays within \(X\).

The continuous-time SDE of MMFLD is \(X_t=\nabla\phi^\ast(Y_t)\), \(dY_t=-\nabla\tfrac{\delta F(\mu_t)}{\delta\mu}(X_t)\,dt+\sqrt{2\lambda\nabla^2\phi(X_t)}\,dB_t\). Its Fokker-Planck equation is \(\partial_t\mu_t=\lambda\nabla\cdot(\mu_t[\nabla^2\phi]^{-1}\nabla\log(\mu_t/\hat\mu_t))\), where \(\hat\mu_t\propto\exp(-\tfrac{1}{\lambda}\tfrac{\delta F(\mu_t)}{\delta\mu})\) is the proximal Gibbs distribution. This form preserves the mean-field coupling (\(\mu_t\) enters the drift) and uses the Hessian metric to remain within \(X\).

The \(N\)-particle algorithm (Algorithm 1) discretizes this SDE with a mirror gradient step followed by Euler-Maruyama: each particle \(X_k^i\) enters the dual space via the mirror map \(\nabla\phi\), takes the drift step \(-\eta_k\nabla\frac{\delta F(\mu_k)}{\delta\mu}(X_k^i)\), simulates the pure diffusion \(dY_t^i=\sqrt{2\lambda[\nabla^2\phi^\ast(Y_t^i)]^{-1}}dB_t\), and returns to \(X\) via \(\nabla\phi^\ast\).
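The three stages of the particle update (map to the dual space, drift, diffuse, map back) can be sketched as follows. This is a minimal sketch under several assumptions that are not from the paper: the domain is the open box \((0,1)^d\) with the coordinate-wise barrier \(\phi(x)=\sum_i[x_i\log x_i+(1-x_i)\log(1-x_i)]\), chosen because its mirror map (logit) and inverse (sigmoid) are elementary; the functional is the toy \(F(\mu)=\tfrac12\|\mathbb{E}_\mu x-q\|^2\); the diffusion is approximated by a single Euler-Maruyama step; and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # floating-point guard so particles stay strictly inside (0, 1)

def grad_phi(x):
    """Mirror map for phi(x) = sum_i [x_i log x_i + (1 - x_i) log(1 - x_i)]."""
    return np.log(x) - np.log1p(-x)  # coordinate-wise logit

def grad_phi_star(y):
    """Inverse mirror map (coordinate-wise sigmoid), computed via tanh."""
    return np.clip(0.5 * (1.0 + np.tanh(0.5 * y)), EPS, 1.0 - EPS)

def hess_phi_diag(x):
    """Diagonal of the Hessian of phi; it grows without bound near the boundary."""
    return 1.0 / (x * (1.0 - x))

def mmfld_step(x, q, eta, lam, rng):
    """One MMFLD-style step for the toy functional F(mu) = 0.5 * ||E_mu[x] - q||^2."""
    # Gradient of the first variation: E_mu[x] - q, identical for all particles.
    drift = x.mean(axis=0) - q
    # 1) Map to the dual space and take the drift step there.
    y = grad_phi(x) - eta * drift
    # 2) One Euler-Maruyama step of the pure diffusion, using
    #    [hess phi*(y)]^{-1} = hess phi(grad phi*(y)) frozen at the current point.
    x_mid = grad_phi_star(y)
    scale = np.sqrt(2.0 * lam * eta * hess_phi_diag(x_mid))
    y = y + scale * rng.standard_normal(x.shape)
    # 3) Map back to the primal space; the result lies in (0, 1)^d by construction.
    return grad_phi_star(y)

rng = np.random.default_rng(0)
x = rng.uniform(0.4, 0.6, size=(256, 2))
q = np.array([0.2, 0.8])
for _ in range(400):
    x = mmfld_step(x, q, eta=0.05, lam=0.01, rng=rng)
print("particle mean:", x.mean(axis=0))
```

The noise is injected in the dual space and the sigmoid maps every dual point back into the box, so no projection is needed. The `EPS` clip only guards against floating-point saturation of the sigmoid.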

Key Designs¶

1. Continuous-time convergence: mirror entropy sandwich + uniform mirror LSI

The first part proves the exponential convergence \(\mathcal{L}(\mu_t)-\mathcal{L}(\mu^\ast)\le e^{-2C_{\mathrm{LSI}}\lambda t}(\mathcal{L}(\mu_0)-\mathcal{L}(\mu^\ast))\) (Theorem 3.2) by lifting MFLD's convergence proof to the constrained geometry. First, Assumption 5 (relative Lipschitzness/smoothness with respect to the local norm \(\|\cdot\|_{[\nabla^2\phi(x)]^{-1}}\)) is used to prove that the unique minimizer satisfies the fixed-point condition \(\mu^\ast\propto\exp(-\tfrac{1}{\lambda}\frac{\delta F(\mu^\ast)}{\delta\mu})\) (Theorem 3.1). Next, the proximal Gibbs distribution \(\hat\mu\) is assumed to satisfy a mirror LSI: for any \(\mu\in\mathcal{P}_2(X)\),

\[\mathrm{KL}(\mu\|\hat\mu)\le \frac{1}{2C_{\mathrm{LSI}}}\,\mathrm{FI}_\phi(\mu\|\hat\mu),\qquad \mathrm{FI}_\phi(\mu\|\nu)=\mathbb{E}_\mu\big[\langle\nabla\log(\mu/\nu),[\nabla^2\phi]^{-1}\nabla\log(\mu/\nu)\rangle\big].\]

Finally, the Nitanda–Chizat entropy sandwich (Lemma C.2) bounds \(\mathcal{L}(\mu_t)-\mathcal{L}(\mu^\ast)\) in terms of \(\mathrm{KL}(\mu_t\|\hat\mu_t)\), and a Lyapunov estimate of \(\frac{d}{dt}\mathcal{L}(\mu_t)\) yields exponential decay. The framework transfers because the mirror LSI follows from a classic LSI together with \(\alpha\)-strong convexity of \(\phi\) (with constant \(C_0/\alpha\)), and the entropy sandwich remains valid under constraints.
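
The Lyapunov step can be spelled out in one chain. This is a sketch, not the paper's proof: it assumes enough regularity for the integration by parts to produce no boundary terms, and it uses the upper half of the entropy sandwich in the form familiar from unconstrained MFLD, \(\mathcal{L}(\mu)-\mathcal{L}(\mu^\ast)\le\lambda\,\mathrm{KL}(\mu\|\hat\mu)\). Since \(\frac{\delta\mathcal{L}(\mu_t)}{\delta\mu}=\lambda\log(\mu_t/\hat\mu_t)\) up to an additive constant, inserting the Fokker-Planck equation \(\partial_t\mu_t=\lambda\nabla\cdot(\mu_t[\nabla^2\phi]^{-1}\nabla\log(\mu_t/\hat\mu_t))\) and integrating by parts gives

\[\frac{d}{dt}\mathcal{L}(\mu_t)=\int\frac{\delta\mathcal{L}(\mu_t)}{\delta\mu}\,\partial_t\mu_t\,dx=-\lambda^2\,\mathrm{FI}_\phi(\mu_t\|\hat\mu_t)\le-2C_{\mathrm{LSI}}\lambda^2\,\mathrm{KL}(\mu_t\|\hat\mu_t)\le-2C_{\mathrm{LSI}}\lambda\,\big(\mathcal{L}(\mu_t)-\mathcal{L}(\mu^\ast)\big).\]

The first inequality is the mirror LSI and the second is the entropy sandwich. Grönwall's inequality applied to \(t\mapsto\mathcal{L}(\mu_t)-\mathcal{L}(\mu^\ast)\) then gives the factor \(e^{-2C_{\mathrm{LSI}}\lambda t}\).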

2. Discretization + uniform-in-time propagation of chaos

For practical \(N\)-particle systems, the challenge is to obtain an approximation error that stays controlled uniformly in time and whose \(1/N\) term does not depend on the LSI constant. The configuration space approach defines \(\mathcal{L}^{(N)}(\mu^{(N)})=N\mathbb{E}_{X\sim\mu^{(N)}}[F(\mu_X)]+\lambda\mathrm{Ent}(\mu^{(N)})\), where \(\mu_X\) is the empirical distribution of the \(N\) particles \(X\), with the Gibbs optimizer \(\mu^{(N)}_\ast\propto\exp(-\tfrac{N}{\lambda}F(\mu_{\mathbf{x}}))\). Theorem 4.1 provides an LSI-free approximation error \(\tfrac{1}{N}\mathcal{L}^{(N)}(\mu^{(N)}_\ast)-\mathcal{L}(\mu^\ast)\le \tfrac{LR^2}{2N}\). Combining the Ahn–Chewi forward discretization with self-concordance \(|\nabla^3\phi^\ast[u,u,u]|\le 2c_1\langle u,\nabla^2\phi u\rangle^{3/2}\) and the uniform-in-\(N\) mirror LSI, Theorem 4.2 controls the discretization bias \(\delta_\eta\). The key is the LSI-free term: the coefficient of \(1/N\) depends only on \(LR^2\), so the particle approximation error vanishes as \(N\to\infty\) regardless of the LSI constant. The stochastic gradient version (Theorem 4.3) keeps this structure and adds a \(\sigma^2/c_2\) term.
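To make the \(\tfrac{LR^2}{2N}\) bound of Theorem 4.1 concrete, the short helper below evaluates it and inverts it for a target tolerance. The summary does not define \(L\) and \(R\) beyond their appearance in the bound, so they are treated here as given positive constants; the function names and numeric values are hypothetical.

```python
import math

def particle_gap_bound(l_const, r_const, n_particles):
    """Right-hand side L * R^2 / (2N) of the approximation bound."""
    return l_const * r_const**2 / (2.0 * n_particles)

def particles_for_tolerance(l_const, r_const, tol):
    """Smallest N for which L * R^2 / (2N) <= tol."""
    n = max(1, math.ceil(l_const * r_const**2 / (2.0 * tol)))
    # Guard against floating-point rounding in the ceil above.
    while particle_gap_bound(l_const, r_const, n) > tol:
        n += 1
    return n

# Hypothetical constants: the bound halves whenever N doubles,
# independently of any LSI constant.
for n in (10, 20, 40):
    print(n, particle_gap_bound(1.0, 2.0, n))
```

Because the bound is exactly proportional to \(1/N\), halving the tolerance doubles the required particle count.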

3. Mirror geometry selection and boundary handling

Algorithm 1 is instantiated with a mirror map matched to each domain: the entropy mirror \(\phi(x)=\sum_i x_i\log x_i\) for the simplex \(\Delta^d\), the von Neumann mirror \(\phi(\Sigma)=\mathrm{Tr}(\Sigma\log\Sigma-\Sigma)\) for the spectraplex, and the log-barrier for the unit ball. In each case, diffusion is simulated via \(dY_t=\sqrt{2\lambda[\nabla^2\phi^\ast(Y_t)]^{-1}}dB_t\). This design contrasts with projection onto \(X\), which accumulates mass at \(\partial X\). Because the mirror map \(\nabla\phi\) blows up at \(\partial X\), particles are naturally kept away from the boundary, and the constraint is internalized in the geometry.
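For two of these domains the mirror map and its inverse have simple closed forms, sketched below. The assumptions are: the proportionality constant in the ball barrier is taken to be 1, i.e. \(\phi(z)=-\log(1-\|z\|^2)\); the simplex map is written up to the additive constant that softmax ignores; and the function names are hypothetical. The inverse for the ball follows from solving \(y=2z/(1-\|z\|^2)\) for \(z\).

```python
import numpy as np

def simplex_to_dual(x):
    """grad phi for phi(x) = sum_i x_i log x_i, up to an additive constant."""
    return np.log(x)

def dual_to_simplex(y):
    """Inverse map onto the simplex: softmax (invariant to constant shifts of y)."""
    z = np.exp(y - np.max(y, axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def ball_to_dual(z):
    """grad phi for phi(z) = -log(1 - ||z||^2) on the open unit ball."""
    r2 = np.sum(z * z, axis=-1, keepdims=True)
    return 2.0 * z / (1.0 - r2)

def dual_to_ball(y):
    """Closed-form inverse: y / (1 + sqrt(1 + ||y||^2)), always of norm below 1."""
    s = np.sqrt(1.0 + np.sum(y * y, axis=-1, keepdims=True))
    return y / (1.0 + s)

x = np.array([0.2, 0.5, 0.3])   # a point in the simplex
z = np.array([0.6, -0.3])       # a point in the unit ball
print(dual_to_simplex(simplex_to_dual(x)))  # round trip on the simplex
print(dual_to_ball(ball_to_dual(z)))        # round trip on the ball
```

Every dual point maps back to a feasible primal point, so noise added in the dual space cannot push a particle across \(\partial X\) (up to floating-point precision for very large dual norms).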

Loss & Training¶

Key hyperparameters include temperature \(\lambda\) (regularization strength), learning rate \(\eta_k\), and particle number \(N\). Constants \(c_1, c_2\) from self-concordance and strong-convexity determine the magnitude of the discretization bias \(\delta_\eta\).

Key Experimental Results¶

Experiments consist of qualitative sanity checks on low-dimensional synthetic domains.

Main Results¶

Experiment	Domain \(X\) / mirror map	Objective / task	MMFLD vs projected MFLD
Simplex mean-matching	\(\Delta^3\) / \(\phi(x)=\sum_i x_i\log x_i\)	\(F(\mu)=\|\mathbb{E}_\mu x-q\|^2+\beta\mathbb{E}_\mu \sum_i\log(1/x_i)\)	Projected MFLD accumulates mass at \(\partial\Delta^3\); MMFLD achieves lower loss and a uniform distribution.
Spectraplex density matching	\(\{\Sigma\succeq 0:\mathrm{Tr}\Sigma=1\}\subset \mathcal{S}^{10}\) / von Neumann	\(F(\mu)=\tfrac12\|\mathbb{E}_\mu \Sigma-\Sigma^\ast\|_F^2+\tfrac{1}{2\gamma}\mathbb{E}_\mu\|\Sigma\|_F^2\)	Projected MFLD stalls; MMFLD continuously improves toward optimality.
Norm-constrained 2-layer ReLU	Unit ball / \(\phi(z)\propto-\log(1-\|z\|^2)\)	XOR classification with noise	MMFLD loss drops faster and neurons align with XOR; projected MFLD stalls after 30-50 epochs as neurons hit the boundary.

Ablation Study¶

Configuration	Key Findings	Description
Projected MFLD (baseline)	Mass accumulation at boundaries; zero progress on spectraplex; neurons hit \(\|w\|=1\).	Projection disrupts Wasserstein geometry.
Projected MFLD + boundary barrier	Particles are repelled from boundary, but distribution is worse than no barrier.	Ad-hoc repair; unstable effect.
MMFLD with one-step diffusion	No significant difference from multi-step simulation.	Forward discretization is sufficient; runtime ≈ MFLD.
MMFLD with stochastic gradient	Adds \(\sigma^2/c_2\) term; linear convergence remains.	Supports differential privacy / batched training.

Key Findings¶

Projections are ill-suited for mean-field optimization: each projection step can erase progress made in Wasserstein geometry. Mirroring internalizes constraints, allowing sustained improvement.
One-step discretization for diffusion is sufficient for maintaining convergence rates, meaning MMFLD has comparable runtime to projected MFLD.
MMFLD aligns neurons with the decision boundaries in XOR classification, whereas projected MFLD drives neurons outward until they hit the \(\|w\|=1\) constraint, a visible qualitative difference in the learned representation.

Highlights & Insights¶

Successfully integrates MFLD and MLD into a unified framework with quantified, verifiable LSI-free propagation of chaos.
Theorem 4.2 contains both an LSI-free \(LR^2/N\) term and a self-concordance-controlled \(\delta_\eta\) term, resulting in a clean and theoretically robust structure.
Conceptually, the "geometry folding" perspective is highly instructive for future work: any constrained mean-field problem (private synthesis, Wasserstein barycenters, entropic OT) can adopt this template to avoid ad-hoc projection or barrier method artifacts.

Limitations & Future Work¶

Experiments are restricted to low-dimensional synthetic tasks (\(d=2,3,10\)); scalability to large-scale deep MFNN remains unverified.
Convergence rates rely on the abstract uniform-in-\(N\) mirror LSI assumption; quantitative constants for complex constraints like spectraplex remain an open problem.
The discretization theory guarantees an \(O(\eta)\) bias, which vanishes only as \(\eta\to 0\), whereas the experiments suggest that a single Euler-Maruyama diffusion step already suffices in practice.
Future work involves extending the analysis from mirror LSI to mirror Poincaré inequalities, completing the theoretical puzzle for slower convergence regimes.

Related Work¶

vs Chewi et al. 2020 / Ahn & Chewi 2021: These works sample from a fixed \(\mu^\ast\). This paper extends the setting to mean-field coupling, where the target distribution depends on the current \(\mu_t\).
vs Nitanda et al. 2022 / 2024 / Suzuki et al. 2023: This paper inherits their entropy sandwich and LSI-free frameworks but upgrades all metrics to the Hessian metric \(\nabla^2\phi\) for constrained domains.
vs Hsieh et al. 2018: They address single-particle mirror Langevin with stochastic gradients; Theorem 4.3 in this paper is the first to use stochastic gradients in mean-field mirror dynamics.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ First clean integration of MFLD and MLD for constrained distribution optimization.
Experimental Thoroughness: ⭐⭐⭐ Sufficient for sanity checks but lacks large-scale MFNN verification.
Writing Quality: ⭐⭐⭐⭐⭐ Clear progression from preliminaries to the proposed framework.
Value: ⭐⭐⭐⭐ Provides a standard algorithm for applications like trajectory inference and constrained MFNN.