Diffusion Generative Modeling on Lie Group Representations
- Conference: NeurIPS 2025
- arXiv: 2502.02513
- Code: None
- Area: Image Generation / Diffusion Models / Lie Groups
- Keywords: Lie group representations, generalized score matching, stochastic differential equations, molecular conformation generation, manifold diffusion
TL;DR
This paper proposes a novel theoretical framework for constructing diffusion processes on the representation space of Lie groups (rather than on the Lie groups themselves). By mapping the curved dynamics of non-Abelian Lie groups into Euclidean space via generalized score matching, the framework enables simulation-free training of Lie group diffusion models, and demonstrates that standard score matching is a special case corresponding to the translation group.
Background & Motivation
- Problem: Many scientific data types (molecular conformations, rotations, rigid-body transformations) are naturally distributed on curved manifolds (e.g., SO(3), SE(3)) rather than Euclidean space.
- Existing Difficulties: Diffusion algorithms on manifolds face two major challenges: (1) parameterizing vector fields on general manifolds remains unsolved; (2) Langevin updates require projection back onto the manifold to preserve geometric structure.
- Practical Gap: Even when data symmetry is well-defined (e.g., the torsion angle space of proteins), AlphaFold3 still performs diffusion in Cartesian coordinates, as manifold diffusion offers no clear performance advantage.
- Core Problem: Can the flat-space advantages of Euclidean geometry be preserved while exploiting the symmetry structure of Lie groups? This paper answers affirmatively by constructing diffusion on the representation space \(\text{Im}(\rho_X) \subseteq GL(X)\).
Method
Overall Architecture
Rather than performing diffusion directly on the Lie group \(G\) or on the data space \(X\), the framework leverages the action representation \(\rho_X: G \to GL(X)\) of \(G\) on \(X\):
- Parameterize data points \(\mathbf{x} \in X\) via flow coordinates \(\boldsymbol{\tau}\) (i.e., identify Lie group transformation parameters).
- Define the forward diffusion in flow coordinate space (Euclidean, amenable to Gaussian noise).
- Map flow coordinates back to data space via the Lie group action.
- Train a network to predict the generalized score (the derivative of the log-density along fundamental vector field directions).
- The reverse SDE is guided by the generalized score to generate samples in data space \(X\).
Key Designs
1. Generalized Score Matching
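The standard score function \(\nabla \log p(\mathbf{x})\) is replaced by derivatives along the fundamental vector fields of the Lie algebra:
\[\mathcal{L}\log p(\mathbf{x}) = \mathbf{\Pi}(\mathbf{x})^\top \nabla \log p(\mathbf{x}),\]
where \(\mathbf{\Pi}(\mathbf{x}) = (A_1\mathbf{x}, \dots, A_n\mathbf{x})\) is the fundamental vector field matrix, whose columns are the fundamental vector fields of the Lie algebra generators \(A_i\). The generalized Fisher divergence between a model score \(\mathbf{s}_\theta\) and the true generalized score is:
\[J(\theta) = \frac{1}{2}\,\mathbb{E}_{p(\mathbf{x})}\left[\left\|\mathbf{s}_\theta(\mathbf{x}) - \mathcal{L}\log p(\mathbf{x})\right\|^2\right].\]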
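As a minimal sketch of what a generalized score looks like in practice, the snippet below assumes the linear action of \(SO(2) \times \mathbb{R}_+\) on \(\mathbb{R}^2\) and takes the generalized score to be \(\mathbf{\Pi}(\mathbf{x})^\top \nabla \log p(\mathbf{x})\). The function names and the Gaussian test densities are illustrative choices, not taken from the paper.

```python
import numpy as np

# Generators of the assumed group action on R^2 (illustrative choice).
A_SCALE = np.eye(2)                          # radial scaling
A_ROT = np.array([[0.0, -1.0], [1.0, 0.0]])  # planar rotation


def fundamental_matrix(x):
    """Stack the fundamental vector fields A_i x as columns of Pi(x)."""
    return np.stack([A_SCALE @ x, A_ROT @ x], axis=1)


def gaussian_score(x, cov):
    """Ordinary score of a zero-mean Gaussian: grad log p(x) = -cov^{-1} x."""
    return -np.linalg.solve(cov, x)


def generalized_score(x, cov):
    """Directional derivatives of log p along each fundamental vector field."""
    return fundamental_matrix(x).T @ gaussian_score(x, cov)


x = np.array([1.0, 2.0])
iso = generalized_score(x, cov=4.0 * np.eye(2))
aniso = generalized_score(x, cov=np.diag([1.0, 4.0]))
print("isotropic  :", iso)    # angular component vanishes by rotational symmetry
print("anisotropic:", aniso)  # both components are non-zero
```

For the isotropic density only the radial component is non-zero, which is the kind of symmetry-induced simplification the framework aims to exploit.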
2. Three Sufficient Conditions
For generalized score matching to be applicable to Langevin dynamics, three conditions must be satisfied:
| Condition | Meaning | Mathematical Requirement |
|---|---|---|
| Completeness | \(\mathbf{\Pi}\) retains all information about the density | \(\dim(G/G_\mathbf{x}) \geq \dim(X)\) almost everywhere |
| Homogeneity | Any two points are connected by a \(G\)-transformation | \(X\) is a homogeneous space of \(G\) |
| Commutativity | Langevin updates in each direction are mutually independent | \([\mathcal{L}_A, \mathcal{L}_B]f(\mathbf{x}) = 0\) |
A key insight is that non-Abelian groups can also satisfy the commutativity condition: elements that do not commute in the Lie algebra may induce commuting flows on \(X\) (e.g., \(\mathfrak{so}(3)\) under certain representations).
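The commutativity condition can be checked numerically for a concrete action. The sketch below assumes a linear action, for which commuting generator matrices are sufficient for the induced flows on \(X\) to commute, and uses the scaling and rotation generators on \(\mathbb{R}^2\) as an illustrative pair. All names are hypothetical.

```python
import numpy as np

A_SCALE = np.eye(2)                          # generator of radial scaling
A_ROT = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator of planar rotation


def commutator(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return a @ b - b @ a


def scale_flow(tau, x):
    """Closed form of exp(tau * A_SCALE) x."""
    return np.exp(tau) * x


def rot_flow(tau, x):
    """Closed form of exp(tau * A_ROT) x."""
    c, s = np.cos(tau), np.sin(tau)
    return np.array([[c, -s], [s, c]]) @ x


x0 = np.array([0.3, -1.2])
scale_after_rot = scale_flow(0.7, rot_flow(1.1, x0))
rot_after_scale = rot_flow(1.1, scale_flow(0.7, x0))

generators_commute = np.allclose(commutator(A_SCALE, A_ROT), 0.0)
flows_commute = np.allclose(scale_after_rot, rot_after_scale)
print("generators commute:", generators_commute)
print("flows commute     :", flows_commute)
```

Because the order of the two flows does not matter, each flow coordinate can be perturbed independently, which is what the Langevin updates in the table above require.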
3. Core Theorem: Paired SDEs
Theorem 3.1: There exists a pair of SDEs in which the forward process admits an exact solution and the reverse process is guided by the generalized score:
Forward SDE:
\[d\mathbf{x} = \left[\beta(t)\mathbf{\Pi}(\mathbf{x})\mathbf{f}(\mathbf{x}) + \frac{\gamma(t)^2}{2}\rho_X(\Omega)\mathbf{x}\right]dt + \gamma(t)\mathbf{\Pi}(\mathbf{x})d\mathbf{W}\]
Exact Solution: \(\mathbf{x}(t) = \prod_{i=1}^n e^{\tau_i(t)A_i} \mathbf{x}(0)\), where the flow coordinates \(\boldsymbol{\tau}(t)\) follow a simple SDE in Euclidean space.
The reverse SDE contains three additional terms:
- Quadratic Casimir element \(\rho_X(\Omega) = \sum_i A_i^2\): compensates for orbital deviation induced by curvature of the flow coordinates.
- Divergence correction \(\mathbf{\Pi}\nabla^\top \cdot \mathbf{\Pi}\): corrects the probability flow of the non-constant-coefficient SDE.
- Generalized score \(\mathbf{\Pi}\mathcal{L}\log p_t\): guides sampling.
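To illustrate the role of the Casimir term, the following sketch simulates the driftless case (\(\mathbf{f} = 0\)) for \(G = SO(2)\) acting on \(\mathbb{R}^2\) with Euler-Maruyama steps, with and without the Casimir drift \(\frac{\gamma^2}{2}\rho_X(\Omega)\) applied to \(\mathbf{x}\). It assumes a constant \(\gamma\), a single rotation generator (so \(\rho_X(\Omega) = A^2 = -I\)), and arbitrary particle and step counts; none of these settings come from the paper.

```python
import numpy as np

A_ROT = np.array([[0.0, -1.0], [1.0, 0.0]])
CASIMIR = A_ROT @ A_ROT  # equals -I for a single planar rotation generator


def simulate(n_particles, n_steps, gamma, t_end, use_casimir, seed=0):
    """Euler-Maruyama for dx = (gamma^2 / 2) Casimir x dt + gamma A x dW."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.tile(np.array([1.0, 0.0]), (n_particles, 1))  # start on the unit circle
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=(n_particles, 1))
        drift = 0.5 * gamma**2 * (x @ CASIMIR.T) if use_casimir else 0.0
        x = x + drift * dt + gamma * (x @ A_ROT.T) * dw
    return x


def exact(n_particles, gamma, t_end, seed=0):
    """Exact solution x(t) = exp(tau A) x(0) with tau = gamma * W_t."""
    rng = np.random.default_rng(seed)
    tau = gamma * rng.normal(0.0, np.sqrt(t_end), size=n_particles)
    return np.stack([np.cos(tau), np.sin(tau)], axis=1)


r_with = np.linalg.norm(simulate(2000, 1000, 1.0, 1.0, True), axis=1).mean()
r_without = np.linalg.norm(simulate(2000, 1000, 1.0, 1.0, False), axis=1).mean()
r_exact = np.linalg.norm(exact(2000, 1.0, 1.0), axis=1).mean()
print(f"mean radius with Casimir drift   : {r_with:.3f}")
print(f"mean radius without Casimir drift: {r_without:.3f}")
print(f"mean radius of exact solution    : {r_exact:.3f}")
```

Without the Casimir drift the simulated points spiral outward, whereas with it the mean radius stays close to that of the exact solution, which remains on the unit circle by construction.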
4. Standard Score Matching as a Special Case
When \(G = T(n)\) (the translation group), \(\mathbf{\Pi} = I\) and both the Casimir and divergence correction terms vanish, so the framework recovers standard score-based diffusion (DDPM).
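The reduction can be checked directly. As a sketch, assume translations are represented in homogeneous coordinates \(\tilde{\mathbf{x}} = (\mathbf{x}, 1)\), a choice made here for illustration, with \(\mathbf{e}_i\) the \(i\)-th standard basis vector:

\[
A_i = \begin{pmatrix} 0 & \mathbf{e}_i \\ 0 & 0 \end{pmatrix}, \qquad
A_i \tilde{\mathbf{x}} = \begin{pmatrix} \mathbf{e}_i \\ 0 \end{pmatrix}, \qquad
A_i^2 = 0 .
\]

The fundamental vector fields are therefore the constant basis vectors, so \(\mathbf{\Pi} = I\) on the \(\mathbf{x}\)-block, \(\rho_X(\Omega) = \sum_i A_i^2 = 0\), and the divergence correction vanishes because \(\mathbf{\Pi}\) does not depend on \(\mathbf{x}\). Under these assumptions the forward SDE reduces to the familiar Euclidean form:

\[
d\mathbf{x} = \beta(t)\,\mathbf{f}(\mathbf{x})\,dt + \gamma(t)\,d\mathbf{W}.
\]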
5. \(SO(2) \times \mathbb{R}_+\) Example
Taking \(G = SO(2) \times \mathbb{R}_+\) acting on \(X = \mathbb{R}^2\):
- Fundamental vector field matrix: \(\mathbf{\Pi}(\mathbf{x}) = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}\), whose columns generate scaling and rotation, respectively.
- The SDE decomposes along radial (scaling) and angular (rotation) directions.
- Samples from the asymptotic distribution \(p_T\) take the form \((e^{\eta_r}\cos\eta_\theta, e^{\eta_r}\sin\eta_\theta)\), obtained by sampling two Gaussian variables \(\eta_r\) and \(\eta_\theta\).
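This example can be sketched numerically. The snippet assumes the flow-coordinate parameterization \(\mathbf{x} = e^{\tau_r} R(\tau_\theta)\mathbf{x}_0\), where \(R\) is a planar rotation, and unit-variance Gaussians for the asymptotic coordinates. The function names, test point and variances are illustrative.

```python
import numpy as np


def apply_flow(x0, tau_r, tau_theta):
    """Act on x0 with a scaling by exp(tau_r) and a rotation by tau_theta."""
    c, s = np.cos(tau_theta), np.sin(tau_theta)
    rotation = np.array([[c, -s], [s, c]])
    return np.exp(tau_r) * (rotation @ x0)


def fundamental_matrix(x):
    """Pi(x) for SO(2) x R_+: columns are the scaling and rotation fields."""
    return np.array([[x[0], -x[1]], [x[1], x[0]]])


def sample_asymptotic(n, rng, std_r=1.0, std_theta=1.0):
    """Draw samples (e^{eta_r} cos eta_theta, e^{eta_r} sin eta_theta)."""
    eta_r = rng.normal(0.0, std_r, size=n)
    eta_theta = rng.normal(0.0, std_theta, size=n)
    return np.stack([np.exp(eta_r) * np.cos(eta_theta),
                     np.exp(eta_r) * np.sin(eta_theta)], axis=1)


rng = np.random.default_rng(0)
samples = sample_asymptotic(5, rng)

# The columns of Pi(x) are the derivatives of the flow at tau = 0,
# checked here with a forward finite difference.
x = np.array([0.5, 2.0])
eps = 1e-6
d_scale = (apply_flow(x, eps, 0.0) - x) / eps
d_rot = (apply_flow(x, 0.0, eps) - x) / eps
jacobian = np.stack([d_scale, d_rot], axis=1)
matches = np.allclose(jacobian, fundamental_matrix(x), atol=1e-4)
print("Pi(x) matches the flow derivative:", matches)
```

Each asymptotic sample is the reference point \((1, 0)\) transported by Gaussian flow coordinates, which is why two Gaussian draws suffice.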
Loss & Training
- The training objective follows the standard denoising score matching formulation, but the regression target is the conditional generalized score rather than the standard score.
- Since the conditional generalized score admits a closed-form solution \(-\Sigma(t)^{-1}\boldsymbol{\eta}_t\) under Gaussian assumptions, training reduces to noise prediction (analogous to DDPM).
- A variance-preserving scheduler is used; training and sampling algorithms are given in Algorithms 1 and 2.
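The noise-prediction reduction can be sketched as follows. For simplicity this assumes a driftless, isotropic perturbation kernel in flow coordinates, \(\boldsymbol{\eta}_t \sim \mathcal{N}(0, \sigma_t^2 I)\) so that \(\Sigma(t) = \sigma_t^2 I\), together with the \(SO(2) \times \mathbb{R}_+\) action on \(\mathbb{R}^2\). The paper's variance-preserving schedule and Algorithm 1 are not reproduced here, and `score_model` is a hypothetical stand-in for the network.

```python
import numpy as np


def perturb(x0, eta):
    """Apply flow coordinates eta = (eta_r, eta_theta) to a batch x0 of shape (B, 2)."""
    scale = np.exp(eta[:, :1])
    c, s = np.cos(eta[:, 1]), np.sin(eta[:, 1])
    rotated = np.stack([c * x0[:, 0] - s * x0[:, 1],
                        s * x0[:, 0] + c * x0[:, 1]], axis=1)
    return scale * rotated


def make_training_pair(x0, sigma, rng):
    """Return noisy inputs x_t and the regression target -eta / sigma^2."""
    eta = rng.normal(0.0, sigma, size=x0.shape)
    return perturb(x0, eta), -eta / sigma**2


def dsm_loss(score_model, x0, sigma, rng):
    """Monte Carlo estimate of the denoising generalized score matching loss."""
    x_t, target = make_training_pair(x0, sigma, rng)
    residual = score_model(x_t, sigma) - target
    return np.mean(np.sum(residual**2, axis=1))


def score_model(x_t, sigma):
    """Hypothetical placeholder network: always predicts a zero score."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
x0 = rng.normal(size=(1024, 2))
loss = dsm_loss(score_model, x0, sigma=0.5, rng=rng)
print(f"loss of the zero predictor: {loss:.2f}")
```

In a real training loop `score_model` would be a neural network and the loss would be averaged over noise levels drawn from the schedule; the zero predictor is only there to make the sketch runnable.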
Key Experimental Results
Main Results
2D/3D/4D Distribution Generation
- Using \(G = SO(d) \times \mathbb{R}_+\) on \(X = \mathbb{R}^d\).
- Successfully models mixture-of-Gaussians, tori, Möbius strips, and other distributions.
- Symmetry exploitation reduces the learning dimensionality (e.g., only the radial score needs to be learned for radially symmetric distributions).
Rotated MNIST Bridging
| Model | Mean Accuracy ↑ | Mean FID ↓ |
|---|---|---|
| GSM (Ours) | 0.96 ± 0.02 | 85.8 ± 15.7 |
| BBDM | 0.80 ± 0.10 | 133.4 ± 19.0 |
- Only a 1-dimensional score (rotation angle) is learned, substantially simplifying the problem.
- BBDM operates in the full pixel space and may generate incorrect digit identities.
QM9 Molecular Conformation Generation
- \(G = (SO(3) \times \mathbb{R}_+)^N\) acting on \(X = \mathbb{R}^{3N}\).
- UFF energies of generated conformations are close to, and marginally lower than, those of ground-truth conformations.
- Lie group diffusion: \(\Delta_\theta = -0.2159\) vs. standard diffusion: \(\Delta_\gamma = -0.2144\).
CrossDocked2020 Molecular Docking
| Method | RMSD (Å) ↓ |
|---|---|
| GSM (Ours, SE(3)) | 2.91 ± 1.0 |
| RSGM (Riemannian diffusion) | 5.6 ± 1.2 |
| BBDM (Euclidean) | 2.92 ± 1.57 |
Ablation Study
- In W2 distance comparisons on 2D mixture-of-Gaussians, Lie group diffusion and standard diffusion exhibit equivalent expressive capacity (both can model arbitrary distributions).
- However, when the chosen Lie group symmetry matches the data structure, the effective dimensionality is reduced and learning efficiency improves.
Key Findings
- An appropriate choice of Lie group can reduce the effective dimensionality (from 784 dimensions in rotated MNIST to 1 dimension).
- The framework supports bridging between non-trivial distributions (which BBDM cannot achieve).
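- SE(3)-guided molecular docking substantially outperforms the Riemannian diffusion baseline (2.91 Å vs. 5.6 Å RMSD for RSGM) and matches the Euclidean BBDM baseline in mean RMSD (2.91 Å vs. 2.92 Å) with a smaller spread.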
- The framework extends naturally to flow matching (Appendix F).
Highlights & Insights
- Theoretical Unification: Standard diffusion (translation group), Riemannian diffusion (diffusion on the group), and the proposed method (diffusion in representation space) are all instances of the same unified framework.
- Geometric Intuition of the Casimir Element: It compensates for orbital deviation caused by curvature — in SO(2), tangential motion causes a point to drift off its circular orbit, and the Casimir element restores it.
- Dimensionality Compression: When the data structure aligns with the group symmetry, the effective dimensionality of score learning can be substantially reduced.
- Simulation-Free Training: The exact solution of the forward SDE avoids the numerical simulation difficulties associated with diffusion on non-Abelian groups.
Limitations & Future Work
- The commutativity condition restricts applicable group–space combinations, requiring careful selection.
- Molecular docking experiments are conducted at a relatively small scale (no direct comparison with methods such as DiffDock).
- Computational efficiency for high-dimensional groups (e.g., SO(n) for large \(n\)) remains to be verified.
- Current validation is limited to common groups such as SO(2), SO(3), and SE(3); more complex symmetry groups warrant further exploration.
- Generalization to non-homogeneous spaces (restricting generation within orbits) is theoretically supported but lacks experimental validation.
Related Work & Insights
- De Bortoli et al. (2022): Diffusion on Riemannian manifolds; the dynamics are formally similar to the proposed method in the Lie group setting, but require numerical simulation.
- Corso et al. (2023) DiffDock: SE(3) diffusion for molecular docking, operating directly on the group.
- Kim et al. (2022): Linearization of nonlinear problems via bijective mappings, conceptually related to the representation space perspective of this work.
- Insight: Selecting the appropriate symmetry group can reduce complex problems to lower-dimensional ones — a principle with broad applicability across scientific computing.
Rating
| Dimension | Score | Comment |
|---|---|---|
| Novelty | ★★★★★ | Entirely new theoretical framework unifying multiple diffusion paradigms with mathematical elegance |
| Technical Depth | ★★★★★ | Rigorous mathematical derivations; exact solutions for a novel class of SDEs are established |
| Experimental Thoroughness | ★★★☆☆ | Experiments are limited in scale; comparison with state-of-the-art methods is insufficient |
| Practical Value | ★★★★☆ | Promising applications in molecular conformation and docking, though engineering implementation is non-trivial |
| Writing Quality | ★★★★☆ | Mathematically rigorous but demanding to read; requires a solid background in differential geometry |