Deep Legendre Transform¶
- Conference: NeurIPS 2025
- arXiv: 2512.19649
- Code: GitHub
- Area: Others (Numerical Methods / Convex Optimization / Deep Learning)
- Keywords: Convex conjugate, Legendre transform, ICNN, KAN, a posteriori error estimation, Hamilton-Jacobi equations
TL;DR¶
DLT exploits the implicit Fenchel representation of convex conjugates, \(f^*(\nabla f(x)) = \langle x, \nabla f(x) \rangle - f(x)\), to reformulate conjugate computation as a standard regression problem, thereby avoiding max/min-max optimization. The method also admits a posteriori error estimation, and when combined with KAN, yields exact closed-form solutions.
Background & Motivation¶
Background: The convex conjugate (Legendre–Fenchel transform) is a fundamental operation in convex analysis, optimization, physics, and economics, defined as \(f^*(y) = \sup_{x \in C}\{\langle x,y\rangle - f(x)\}\).
Limitations of Prior Work:
- Classical grid-based methods have complexity \(O(N^d)\) and suffer from the curse of dimensionality, being practical only for \(d \leq 4\).
- Neural network-based approaches (used in optimal transport) require solving complex max-min problems.
- Learning the inverse map of \(\nabla f\) (Korotin et al.) demands additional optimization design.
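To make the \(O(N^d)\) cost concrete, here is a minimal sketch of the classical grid-based approach in \(d = 1\); the test function and grid size are illustrative, not taken from the paper:

```python
import numpy as np

# Classical grid-based conjugate in d = 1: f*(y) = sup_x [x*y - f(x)],
# approximated by maximizing over N grid points. In d dimensions the
# grid has N^d points, which is the source of the curse of dimensionality.
f = lambda x: 0.5 * x**2
xs = np.linspace(-5.0, 5.0, 10001)  # N = 10001 grid points

def conjugate_grid(y):
    # O(N) work per query point y; O(N^d) in general dimension.
    return np.max(xs * y - f(xs))

# Sanity check: the true conjugate of 0.5*x^2 is 0.5*y^2.
for y in [-2.0, 0.0, 1.5]:
    assert abs(conjugate_grid(y) - 0.5 * y**2) < 1e-5
```

Even this toy version needs a fine grid covering the region where the supremum is attained, which is exactly what becomes infeasible beyond \(d \approx 4\).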
Key Challenge: High-dimensional convex conjugate computation lacks efficient, scalable numerical methods with error guarantees.
Key Insight: For differentiable convex functions, the implicit identity \(f^*(\nabla f(x)) = \langle x, \nabla f(x)\rangle - f(x)\) holds—directly providing a training objective that requires no optimization problem to be solved.
Method¶
Implicit Fenchel Formula¶
For a differentiable convex function \(f: C \to \mathbb{R}\) (where \(C \subseteq \mathbb{R}^d\) is an open convex set), the implicit Fenchel formula holds:

\[f^*(\nabla f(x)) = \langle x, \nabla f(x)\rangle - f(x) \quad \text{for all } x \in C.\]

This yields exact target values at the points \(\nabla f(x)\) without requiring knowledge of the analytic form of \(f^*\).
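The identity is easy to verify numerically on a case where the conjugate is known in closed form; the choice \(f(x) = \tfrac{1}{2}\|x\|^2\) below is an illustrative example, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = 0.5 * ||x||^2 has gradient ∇f(x) = x and known conjugate
# f*(y) = 0.5 * ||y||^2, so both sides of the identity are computable.
f = lambda x: 0.5 * np.sum(x**2)
grad_f = lambda x: x
f_star = lambda y: 0.5 * np.sum(y**2)

x = rng.normal(size=5)
y = grad_f(x)

# Implicit Fenchel formula: f*(∇f(x)) = <x, ∇f(x)> - f(x)
lhs = f_star(y)
rhs = x @ y - f(x)
assert np.isclose(lhs, rhs)
```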
DLT Training¶
A parametric model \(g_\theta: D \to \mathbb{R}\) is trained to approximate \(f^*\) by minimizing the regression loss

\[\min_\theta \; \frac{1}{|\mathcal{X}_{\text{train}}|} \sum_{x \in \mathcal{X}_{\text{train}}} \big(g_\theta(\nabla f(x)) - \langle x, \nabla f(x)\rangle + f(x)\big)^2.\]
Key advantages:
- Training targets are exact evaluations of \(f^*(\nabla f(x))\), with no approximation error.
- The objective is standard MSE regression, solvable directly via SGD.
- Using an ICNN as \(g_\theta\) guarantees that the learned approximation is convex.
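The training pipeline can be sketched end to end. The sketch below substitutes a linear model on quadratic features for the paper's MLP/ICNN/KAN architectures, and uses the illustrative \(f(x) = \tfrac{1}{2}\|x\|^2\); only the construction of the exact regression targets follows the method as described:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Illustrative problem: f(x) = 0.5 * ||x||^2 on R^d, with ∇f(x) = x.
f = lambda x: 0.5 * np.sum(x**2, axis=-1)
grad_f = lambda x: x

# Sample training points in C and form exact regression targets via the
# implicit Fenchel formula: t(x) = <x, ∇f(x)> - f(x) = f*(∇f(x)).
X = rng.uniform(-1.0, 1.0, size=(2000, d))
Y = grad_f(X)                              # inputs live in gradient space D
T = np.einsum("ij,ij->i", X, Y) - f(X)     # exact values of f* at Y

# Minimal stand-in for g_θ: least squares over quadratic features of y.
feats = lambda Y: np.concatenate([np.ones((len(Y), 1)), Y, Y**2], axis=1)
theta, *_ = np.linalg.lstsq(feats(Y), T, rcond=None)
g = lambda Y: feats(Y) @ theta

# For this f the true conjugate f*(y) = 0.5 * ||y||^2 is known, so we
# can check the fit (in general this check is impossible; see below).
Y_test = rng.uniform(-1.0, 1.0, size=(500, d))
rmse = np.sqrt(np.mean((g(Y_test) - 0.5 * np.sum(Y_test**2, axis=-1))**2))
assert rmse < 1e-6
```

Note that no optimization problem is solved to build the targets `T`; the supremum in the definition of \(f^*\) never appears.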
A Posteriori Error Estimation¶
The same implicit formula enables estimation of the \(L^2\) error on an independent test set \(\mathcal{X}_{\text{test}}\):

\[\|g_\theta - f^*\|_{L^2(\nu)}^2 \approx \frac{1}{|\mathcal{X}_{\text{test}}|} \sum_{x \in \mathcal{X}_{\text{test}}} \big(g_\theta(\nabla f(x)) - \langle x, \nabla f(x)\rangle + f(x)\big)^2,\]

where \(\nu = \mu \circ (\nabla f)^{-1}\) is the pushforward of the sampling measure \(\mu\) under \(\nabla f\). This allows accurate error evaluation even when \(f^*\) is unknown, a unique advantage of the proposed method.
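A small sketch of the estimator, again on the illustrative quadratic where the true error is also computable for comparison (on this example \(\nabla f\) is the identity, so the estimator matches the true error exactly; in general it is a Monte Carlo estimate under \(\nu\)):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

f = lambda x: 0.5 * np.sum(x**2, axis=-1)
grad_f = lambda x: x

# A deliberately imperfect approximation of f* (true f*(y) = 0.5*||y||^2).
g = lambda y: 0.5 * np.sum(y**2, axis=-1) + 0.01 * np.sum(y, axis=-1)

# A posteriori estimate: residual against the implicit Fenchel targets.
# No knowledge of the analytic f* is needed to compute this.
X_test = rng.uniform(-1.0, 1.0, size=(10000, d))
Y_test = grad_f(X_test)
residual = g(Y_test) - (np.einsum("ij,ij->i", X_test, Y_test) - f(X_test))
est = np.sqrt(np.mean(residual**2))

# Ground-truth L^2(ν) error, computable here only because f* is known.
true_err = np.sqrt(np.mean((g(Y_test) - 0.5 * np.sum(Y_test**2, axis=-1))**2))
assert np.isclose(est, true_err)
```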
Sampling in Gradient Space¶
When \(\nabla f\) severely distorts \(C\), uniform sampling over \(C\) leads to non-uniform coverage of \(D\). The proposed solution:
1. Train a network \(h_\vartheta: D \to \mathbb{R}^d\) to approximate the generalized inverse of \(\nabla f\).
2. Pre-training: \(\min_\vartheta \sum_{x \in \mathcal{W}_{\text{train}}} \|\nabla f(h_\vartheta(\nabla f(x))) - \nabla f(x)\|_2^2\).
3. Fine-tuning: add \(\min_\vartheta \sum_{y \in \mathcal{Y}_{\text{train}}} \|\nabla f(h_\vartheta(y)) - y\|_2^2\) over points sampled directly in \(D\).
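The two losses can be written down concretely. The sketch below uses a one-dimensional example, \(f(x) = e^x\) with \(\nabla f(x) = e^x\) and generalized inverse \(\log y\), chosen for illustration only; the loss definitions mirror steps 2 and 3 above:

```python
import numpy as np

# 1-D illustration: f(x) = e^x on C = R, so ∇f(x) = e^x, D = (0, ∞),
# and the generalized inverse of ∇f is log(y).
grad_f = np.exp

def pretrain_loss(h, X_train):
    """Step 2: push x through ∇f, then h, then ∇f again, and match ∇f(x)."""
    Y = grad_f(X_train)
    return np.mean((grad_f(h(Y)) - Y) ** 2)

def finetune_loss(h, Y_train):
    """Step 3: enforce the same identity at points sampled directly in D."""
    return np.mean((grad_f(h(Y_train)) - Y_train) ** 2)

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=1000)   # samples in C
Y = rng.uniform(0.5, 2.0, size=1000)    # samples directly in D

# The exact inverse drives both losses to (numerically) zero;
# a biased candidate does not.
assert pretrain_loss(np.log, X) < 1e-20
assert finetune_loss(np.log, Y) < 1e-20
assert pretrain_loss(lambda y: np.log(y) + 0.1, X) > 1e-4
```

In the paper \(h_\vartheta\) is a trained network rather than a closed-form inverse; the losses above are what its training would minimize.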
Model Selection¶
- MLP/ResNet: General-purpose approximation; ResNet achieves the best accuracy but does not guarantee convexity.
- ICNN: Guarantees convex output, suitable for downstream optimization tasks.
- KAN (Kolmogorov–Arnold Networks): Enables symbolic regression to obtain exact closed-form solutions.
Key Experimental Results¶
DLT vs. Direct Learning (\(d=10\) and \(d=50\))¶
| Function | Model | RMSE (DLT / Direct), \(d=10\) | RMSE (DLT / Direct), \(d=50\) |
|---|---|---|---|
| Quadratic | MLP | 2.96e-5 / 2.86e-5 | 3.55e-3 / 3.59e-3 |
| Quadratic | ResNet | 2.43e-5 / 3.00e-5 | 2.92e-3 / 3.25e-3 |
| Neg-Log | ResNet | 7.34e-4 / 9.22e-4 | 1.05e-1 / 1.23e-1 |
| Neg-Entropy | ResNet | 6.62e-4 / 7.41e-4 | 6.93e-2 / 7.27e-2 |
DLT achieves accuracy on par with or superior to direct learning (especially with ResNet), with comparable training time. ICNN exhibits larger errors in high dimensions, and MLP-ICNN (without skip connections) performs worst.
Architecture Comparison¶
| Architecture | Accuracy (\(d=10\)) | Convexity Guarantee | Symbolic Regression |
|---|---|---|---|
| MLP | High | ✗ | ✗ |
| ResNet | Highest | ✗ | ✗ |
| ICNN | Moderate | ✓ | ✗ |
| KAN | High | ✗ | ✓ |
Exact Closed-Form Solutions via KAN¶
For specific convex functions (e.g., \(f(x) = \frac{1}{2}\sum x_i^2\)), KAN combined with symbolic regression recovers the exact conjugate \(f^*(y) = \frac{1}{2}\sum y_i^2\)—a result rarely achievable by purely numerical methods.
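For this quadratic example the recovered conjugate can be verified by hand from the definition:

```latex
f^*(y) = \sup_{x}\Big\{\langle x, y\rangle - \tfrac{1}{2}\|x\|^2\Big\}.
% The objective is concave in x; the first-order condition y - x = 0
% gives the maximizer x = y, hence
f^*(y) = \langle y, y\rangle - \tfrac{1}{2}\|y\|^2 = \tfrac{1}{2}\|y\|^2,
```

which is exactly the expression the KAN-plus-symbolic-regression pipeline recovers.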
Hamilton-Jacobi PDE Application¶
DLT is applied to compute initial conditions for Hamilton-Jacobi equations via the Hopf formula, demonstrating practical utility in PDE-related settings.
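For reference (the review above does not spell it out), the Hopf formula expresses the viscosity solution of \(\partial_t u + H(\nabla_x u) = 0\) with convex initial data \(u(x, 0) = g(x)\) as a convex conjugate:

```latex
u(x, t) = \big(g^* + t\,H\big)^*(x)
        = \sup_{y}\big\{\langle x, y\rangle - g^*(y) - t\,H(y)\big\},
```

so a fast, scalable conjugate solver such as DLT directly accelerates the evaluation of such solutions in high dimension.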
Highlights & Insights¶
- The implicit formula reframing is remarkably elegant: converting a classical \(\sup\) problem into MSE regression is conceptually simple yet previously overlooked.
- Free a posteriori error estimation is a distinctive advantage—the accuracy of any approximation can be assessed without knowledge of the true \(f^*\).
- KAN's ability to recover exact analytic solutions demonstrates the potential of deep learning in symbolic computation.
- Integration with ICNN guarantees convexity, offering direct value for downstream tasks such as optimal transport.
- The method mitigates the curse of dimensionality, remaining effective at \(d=50\) in the reported experiments.
Limitations & Future Work¶
- Applicable only to differentiable convex functions; non-smooth convex functions require alternative approaches.
- Requires access to \(f(x)\) and \(\nabla f(x)\); black-box functions are not supported.
- Sampling strategies become more complex when the image \(D\) of \(\nabla f\) is unbounded.
- ICNN accuracy degrades notably in high dimensions (\(d=50\)), indicating a non-trivial cost for enforcing convexity.
- No end-to-end comparison with optimal transport solvers is provided.
Rating¶
- Novelty: ⭐⭐⭐⭐ The ML application of the implicit Fenchel formula offers a fresh perspective; exact solutions via KAN are a notable highlight.
- Experimental Thoroughness: ⭐⭐⭐⭐ Covers multiple architectures × dimensions × function types × HJ-PDE applications.
- Writing Quality: ⭐⭐⭐⭐⭐ Mathematically rigorous, clearly presented, with strong integration of theory and experiments.
- Value: ⭐⭐⭐⭐ Provides practical utility to the convex analysis and optimal transport communities.