The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Networks
Conference: ICML 2025
arXiv: 2506.13523
Code: https://github.com/atomicarchitects/PriceofFreedom
Area: Equivariant Neural Networks / Scientific Computing
Keywords: E(3)-equivariance, tensor product, expressivity, Gaunt tensor product, MACE, interatomic potentials
TL;DR
This paper systematically analyzes the tradeoff between expressivity and runtime across tensor product operations in \(E(3)\)-equivariant neural networks, shows that theoretical complexity can differ substantially from measured performance, and proposes a simplified spherical-grid implementation of the Gaunt tensor product that achieves a 30% speedup in MACE interatomic potential training.
Background & Motivation
Background: \(E(3)\)-equivariant networks have achieved great success in 3D modeling tasks such as molecular simulation and materials science. The tensor product is a primitive operation in these networks, which interacts two geometric features in an equivariant manner to create new features.
Limitations of Prior Work: The high computational complexity of the tensor product (\(O(l_{\max}^6)\)) is a bottleneck in equivariant networks. Luo et al. (2024) proposed the Gaunt tensor product (GTP), claiming significant speedups, but did not fully address the accompanying loss of expressivity. A systematic comparison of the various tensor product variants has been lacking.
Key Challenge: Speedups usually come at the cost of expressivity, but the existing literature lacks a clear understanding of this tradeoff. Different tensor products actually perform different operations.
Goal: To systematically compare the expressivity, interactability, and runtime of various tensor product operations.
Key Insight: Introducing quantitative measures for expressivity and interactability, accompanied by systematic micro-benchmarks.
Core Idea: The speedup of GTP is not a free lunch, since it comes with reduced expressivity. The same operation can be implemented more simply with a spherical grid, which is also faster in practice.
Method
Overall Architecture
This work presents a systematic analysis combined with engineering optimization. The analyzed objects include:
- CG (Clebsch-Gordan) tensor product (standard, fully parameterized)
- GTP (Gaunt tensor product)
- Spherical grid method (the simplified scheme proposed in this work)
- Other variants
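As a rough illustration of why these variants are not interchangeable, the sketch below counts the coupling paths \((l_1, l_2, l)\) available to each. It assumes single-channel inputs with degrees up to \(l_{\max}\), keeps every output degree up to \(2 l_{\max}\), and relies on the standard selection rules: CG allows any triplet satisfying the triangle inequality, while Gaunt coefficients additionally vanish when \(l_1 + l_2 + l\) is odd. The function name `allowed_paths` is hypothetical and not taken from the paper's code.

```python
def allowed_paths(l_max, even_only=False):
    """List (l1, l2, l) triplets allowed by the triangle inequality.

    even_only=True additionally applies the Gaunt parity rule
    (l1 + l2 + l must be even). Assumption: one channel per degree.
    """
    paths = []
    for l1 in range(l_max + 1):
        for l2 in range(l_max + 1):
            for l in range(abs(l1 - l2), l1 + l2 + 1):
                if even_only and (l1 + l2 + l) % 2 == 1:
                    continue  # path removed by the parity selection rule
                paths.append((l1, l2, l))
    return paths


for l_max in (1, 2, 3):
    n_cg = len(allowed_paths(l_max))
    n_gaunt = len(allowed_paths(l_max, even_only=True))
    print(f"l_max={l_max}: CG paths={n_cg}, Gaunt-type paths={n_gaunt}")
```

For \(l_{\max}=1\) this gives 6 CG paths and 5 Gaunt-type paths. The missing path, \((1, 1, 1)\), is the vector cross product, which a Gaunt-type product cannot produce.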
Key Designs
- Expressivity Measures:
    - Function: Defines the output space dimension of different tensor products as a measure of expressivity.
    - Mechanism: A fully parameterized CG tensor product can represent any equivariant bilinear map. Every allowed \((l_1, l_2, l)\) triplet contributes one independent path, and the outputs over all paths have total dimension \(\sum_{l_1, l_2, l} (2l+1)\) (summed over all allowed triplets). Restricted tensor products such as GTP span only a subspace of these maps.
    - Design Motivation: Quantifying the "price of speed", that is, how much expressivity each tensor product gives up.
- Interactability:
    - Function: Measures the degree of coupling between different input channels during the tensor product operation.
    - Mechanism: Defined as the number of effectively coupled input channel pairs in the output. The CG tensor product allows all channels to interact, whereas GTP restricts certain interactions.
    - Design Motivation: Inter-channel interactions are fundamental for message passing; low interactability may limit representation learning capability.
- Spherical Grid Simplification Scheme:
    - Function: Replaces the complex implementation of GTP with discrete sample points on a sphere.
    - Mechanism: Evaluates each input signal (given by its spherical harmonic coefficients) at \(N\) discrete points on the sphere, multiplies the two signals point-wise, and projects the product back onto spherical harmonic coefficients. This matches GTP's asymptotic complexity (\(O(l_{\max}^3)\), versus \(O(l_{\max}^6)\) for CG) while being simpler to implement and having a smaller constant factor.
    - Design Motivation: GTP's original implementation is comparatively involved (Luo et al. work through a 2D Fourier basis and FFT-based convolution); the spherical grid method is conceptually simpler and faster in practice.
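The grid mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it hard-codes real spherical harmonics up to \(l=2\), restricts inputs to \(l \le 1\) so that the projection is exact, and uses a Gauss-Legendre grid in \(\cos\theta\) with uniform points in \(\phi\). The function names `real_sh_l2`, `sphere_grid`, and `grid_tensor_product` are hypothetical.

```python
import numpy as np


def real_sh_l2(xyz):
    """Orthonormal real spherical harmonics up to l=2 at unit vectors; shape (N, 9)."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    c1 = np.sqrt(3 / (4 * np.pi))
    c2 = 0.5 * np.sqrt(15 / np.pi)
    return np.stack([
        np.full_like(x, 0.5 / np.sqrt(np.pi)),       # l=0
        c1 * y, c1 * z, c1 * x,                      # l=1
        c2 * x * y, c2 * y * z,                      # l=2, m=-2,-1
        0.25 * np.sqrt(5 / np.pi) * (3 * z**2 - 1),  # l=2, m=0
        c2 * x * z, 0.5 * c2 * (x**2 - y**2),        # l=2, m=1,2
    ], axis=1)


def sphere_grid(n_theta=4, n_phi=8):
    """Quadrature points and weights on the sphere (weights sum to 4*pi)."""
    z, wz = np.polynomial.legendre.leggauss(n_theta)
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    zz, pp = np.meshgrid(z, phi, indexing="ij")
    s = np.sqrt(1 - zz**2)
    xyz = np.stack([s * np.cos(pp), s * np.sin(pp), zz], axis=-1).reshape(-1, 3)
    weights = np.repeat(wz, n_phi) * (2 * np.pi / n_phi)
    return xyz, weights


def grid_tensor_product(a, b):
    """Multiply two signals on the grid and project back to coefficients.

    a, b: length-9 coefficient vectors (assumed nonzero only for l <= 1).
    """
    xyz, w = sphere_grid()
    Y = real_sh_l2(xyz)        # (N, 9) basis values on the grid
    f, g = Y @ a, Y @ b        # coefficients -> samples on the sphere
    return Y.T @ (w * f * g)   # point-wise product -> coefficients


# Two pure l=1 inputs: the l=1 part of the output vanishes by parity.
u = np.zeros(9); u[1:4] = [1.0, 2.0, 3.0]
v = np.zeros(9); v[1:4] = [-1.0, 0.5, 2.0]
print(np.round(grid_tensor_product(u, v), 6))
```

Because the operation is a point-wise product, it is symmetric in its inputs, and the product of two \(l=1\) signals has no \(l=1\) component. This is one concrete form of the expressivity restriction discussed above.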
Loss & Training
Evaluation is performed on the MACE model (interatomic potentials) using a standard joint energy-force loss.
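For readers unfamiliar with this kind of objective, a generic joint energy-force loss can be sketched as a weighted sum of a per-atom energy error and a force error. The exact form and weights used in the MACE runs are not stated in this summary, so the function name `energy_force_loss`, the per-atom normalization, and the default weights below are all assumptions for illustration only.

```python
import numpy as np


def energy_force_loss(e_pred, e_ref, f_pred, f_ref, n_atoms,
                      w_energy=1.0, w_force=10.0):
    """Weighted sum of squared per-atom energy error and squared force error.

    e_pred, e_ref: (B,) total energies per structure
    f_pred, f_ref: (B, N, 3) forces per atom
    n_atoms:       (B,) number of atoms per structure
    w_energy, w_force: hypothetical weights, not taken from the paper
    """
    energy_term = np.mean(((e_pred - e_ref) / n_atoms) ** 2)
    force_term = np.mean((f_pred - f_ref) ** 2)
    return w_energy * energy_term + w_force * force_term


# Toy batch: 2 structures with 3 atoms each.
e_ref = np.array([-10.0, -12.0])
f_ref = np.zeros((2, 3, 3))
n_atoms = np.array([3, 3])
print(energy_force_loss(e_ref + 0.3, e_ref, f_ref + 0.1, f_ref, n_atoms))
```

In practice the predicted forces are obtained as the negative gradient of the predicted energy with respect to atomic positions, which requires an automatic differentiation framework and is omitted here.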
Key Experimental Results
Main Results (MACE training on rMD17)
| Tensor Product Method | Energy MAE (meV) | Force MAE (meV/Å) | Training Time/epoch | Speedup |
|---|---|---|---|---|
| CG (Full) | 3.2 | 8.1 | 120s | 1x |
| GTP (Original) | 3.5 | 8.8 | 95s | 1.26x |
| Spherical Grid (Ours) | 3.5 | 8.8 | 84s | 1.43x |
| Spherical Grid (Large Grid) | 3.3 | 8.4 | 92s | 1.30x |
Ablation Study (Micro-benchmarks, individual tensor product runtime)
| Method | \(l_{\max}=2\) (μs) | \(l_{\max}=4\) (μs) | \(l_{\max}=6\) (μs) | Description |
|---|---|---|---|---|
| CG (e3nn) | 15 | 180 | 2400 | \(O(l^6)\) growth |
| CG (cuEquivariance) | 8 | 45 | 320 | GPU optimized |
| GTP (Original) | 12 | 50 | 150 | \(O(l^3)\) theory |
| Spherical Grid | 10 | 35 | 95 | Lower constant factor |
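One way to read the table above is to fit an empirical scaling exponent to each row and compare it with the theoretical one. The sketch below does a log-log least-squares fit on the three tabulated runtimes. With only three small values of \(l_{\max}\), the fitted slopes are rough and far from the asymptotic regime, so they should be treated as a sanity check rather than a measurement. The helper `fitted_exponent` is a hypothetical name.

```python
import numpy as np

# Runtimes in microseconds, copied from the micro-benchmark table above.
l_max = np.array([2, 4, 6])
runtime_us = {
    "CG (e3nn)": [15, 180, 2400],
    "CG (cuEquivariance)": [8, 45, 320],
    "GTP (Original)": [12, 50, 150],
    "Spherical Grid": [10, 35, 95],
}


def fitted_exponent(l_values, times):
    """Slope p of a least-squares fit of log(time) = p * log(l_max) + c."""
    slope, _intercept = np.polyfit(np.log(l_values), np.log(times), 1)
    return slope


exponents = {name: fitted_exponent(l_max, t) for name, t in runtime_us.items()}
for name, p in exponents.items():
    print(f"{name}: runtime grows roughly as l_max^{p:.2f}")
```

On these numbers the fitted exponents fall below the theoretical values of 6 for CG and 3 for GTP, which is consistent with the point that asymptotic complexity alone does not predict runtime at the small \(l_{\max}\) used in practice.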
Key Findings
- Significant Gap Between Theory and Practice: Theoretically, GTP should provide order-of-magnitude speedups over CG, but practical speedups depend heavily on \(l_{\max}\) and implementation details.
- Expressivity Loss Demands Caution: The restricted expressivity of GTP/spherical grids can lead to degradation in accuracy on certain tasks.
- Spherical Grid Outperforms GTP: It is conceptually simpler and more computationally efficient, offering a 30% training speedup in MACE.
- The gap between methods is narrow at low \(l_{\max}\), while speedups become significant at high \(l_{\max}\).
Highlights & Insights
- The first systematic benchmark of equivariant tensor products, providing practitioners with a clear selection guide.
- Dispels the illusion of "free speedups": the loss of expressivity that accompanies a faster tensor product can affect downstream tasks.
- The spherical grid approach is simple and elegant, achieving the best empirical performance with the simplest formulation.
- Directly delivers engineering value to the equivariant network community.
Limitations & Future Work
- The impact of expressivity loss from restricted tensor products varies across tasks, necessitating task-specific evaluations.
- Optimal sampling schemes for spherical grids are not yet fully understood.
- End-to-end experiments were conducted solely on the MACE architecture; results might differ in other equivariant network designs.
Related Work & Insights
- GTP by Luo et al. (2024) is the direct baseline/comparison target.
- Closely related to equivariant network frameworks such as e3nn, MACE, and NequIP.
- Insight: The theoretical complexity of an algorithm does not equate to practical performance; empirical benchmarking remains indispensable.
Rating
- Novelty: ⭐⭐⭐⭐ Spherical grid simplification + systematic analysis provides strong novelty.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Highly comprehensive with micro-benchmarks + end-to-end training.
- Writing Quality: ⭐⭐⭐⭐⭐ Thorough analysis and rich illustrations.
- Value: ⭐⭐⭐⭐⭐ Direct guiding significance for the practice of equivariant networks.