
Deep Legendre Transform

  • Conference: NeurIPS 2025
  • arXiv: 2512.19649
  • Code: GitHub
  • Area: Others (Numerical Methods / Convex Optimization / Deep Learning)
  • Keywords: Convex conjugate, Legendre transform, ICNN, KAN, a posteriori error estimation, Hamilton-Jacobi equations

TL;DR

DLT exploits the implicit Fenchel representation of convex conjugates, \(f^*(\nabla f(x)) = \langle x, \nabla f(x) \rangle - f(x)\), to reformulate conjugate computation as a standard regression problem, thereby avoiding max/min-max optimization. The method also admits a posteriori error estimation, and when combined with KAN, yields exact closed-form solutions.

Background & Motivation

Background: The convex conjugate (Legendre–Fenchel transform) is a fundamental operation in convex analysis, optimization, physics, and economics, defined as \(f^*(y) = \sup_{x \in C}\{\langle x,y\rangle - f(x)\}\).

Limitations of Prior Work:

  • Classical grid-based methods have complexity \(O(N^d)\) and suffer from the curse of dimensionality, being practical only for \(d \leq 4\).
  • Neural network-based approaches (used in optimal transport) require solving complex max-min problems.
  • Learning the inverse map of \(\nabla f\) (Korotin et al.) demands additional optimization design.

Key Challenge: High-dimensional convex conjugate computation lacks efficient, scalable numerical methods with error guarantees.

Key Insight: For differentiable convex functions, the implicit identity \(f^*(\nabla f(x)) = \langle x, \nabla f(x)\rangle - f(x)\) holds—directly providing a training objective that requires no optimization problem to be solved.

Method

Implicit Fenchel Formula

For a differentiable convex function \(f: C \to \mathbb{R}\) (where \(C \subseteq \mathbb{R}^d\) is an open convex set):

\[f^*(\nabla f(x)) = \langle x, \nabla f(x) \rangle - f(x), \quad x \in C\]

This yields exact target values at points \(\nabla f(x)\) without requiring knowledge of the analytic form of \(f^*\).
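
As a quick illustration (not from the paper), the identity can be checked numerically with a function whose conjugate is known in closed form, here the negative entropy:

```python
import torch

# Quick numerical check of the identity: the negative entropy
# f(x) = sum_i x_i log x_i has the known conjugate f*(y) = sum_i exp(y_i - 1).
x = torch.rand(1000, 10) * 2.0 + 0.1         # samples in C = (0, inf)^d
grad_f = x.log() + 1.0                       # grad f(x), elementwise

lhs = torch.exp(grad_f - 1.0).sum(-1)        # f*(grad f(x)), closed form
rhs = (x * grad_f).sum(-1) - (x * x.log()).sum(-1)   # <x, grad f(x)> - f(x)

print(torch.allclose(lhs, rhs, atol=1e-4))   # True
```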

DLT Training

A parametric model \(g_\theta: D \to \mathbb{R}\) is trained to approximate \(f^*\) by minimizing:

\[\min_{\theta \in \Theta} \sum_{x \in \mathcal{X}_{\text{train}}} \left(g_\theta(\nabla f(x)) + f(x) - \langle x, \nabla f(x)\rangle\right)^2\]

Key advantages:

  • Training targets are exact evaluations of \(f^*(\nabla f(x))\), with no approximation error.
  • The objective is standard MSE regression, solvable directly via SGD.
  • Using an ICNN as \(g_\theta\) guarantees a convex output.
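
A minimal training sketch of this objective, assuming a plain MLP for \(g_\theta\) and illustrative hyperparameters (the paper's exact architectures and settings may differ; an ICNN could be substituted for convexity):

```python
import torch
import torch.nn as nn

# Minimal DLT training sketch. Example: negative entropy on (0, 1)^d.
d = 10

def f(x):                                    # f(x) = sum_i x_i log x_i
    return (x * x.log()).sum(-1)

def grad_f(x):                               # grad f(x) = log x + 1, elementwise
    return x.log() + 1.0

# Plain MLP for g_theta; an ICNN could be dropped in to guarantee convexity.
g = nn.Sequential(nn.Linear(d, 128), nn.ReLU(),
                  nn.Linear(128, 128), nn.ReLU(),
                  nn.Linear(128, 1))
opt = torch.optim.Adam(g.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(512, d) * 0.9 + 0.05      # x ~ mu, bounded away from 0
    y = grad_f(x)                            # inputs live in D = grad f(C)
    target = (x * y).sum(-1) - f(x)          # exact values of f*(grad f(x))
    loss = ((g(y).squeeze(-1) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```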

A Posteriori Error Estimation

The same implicit formula enables estimation of the \(L^2\) error on an independent test set \(\mathcal{X}_{\text{test}}\):

\[\frac{1}{|\mathcal{X}_{\text{test}}|}\sum_{x \in \mathcal{X}_{\text{test}}} \left(g(\nabla f(x)) - f^*(\nabla f(x))\right)^2 \;\xrightarrow{\;|\mathcal{X}_{\text{test}}| \to \infty\;}\; \|g - f^*\|^2_{L^2(D, \nu)}\]

where \(\nu = \mu \circ (\nabla f)^{-1}\) is the pushforward of the sampling measure. This allows accurate error evaluation even when \(f^*\) is unknown—a unique advantage of the proposed method.
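
Continuing the training sketch above (reusing `f`, `grad_f`, `g`, and `d`), the same exact targets give a Monte Carlo estimate of this error on held-out samples:

```python
# A posteriori error estimate: the targets <x, grad f(x)> - f(x) are exact
# values of f*(grad f(x)), so the empirical RMSE estimates ||g - f*|| in
# L^2(D, nu) without ever knowing f* analytically.
with torch.no_grad():
    x_test = torch.rand(10_000, d) * 0.9 + 0.05     # fresh samples, x ~ mu
    y_test = grad_f(x_test)
    target = (x_test * y_test).sum(-1) - f(x_test)  # exact f*(grad f(x_test))
    est = ((g(y_test).squeeze(-1) - target) ** 2).mean().sqrt()
    print(f"estimated L2 error: {est.item():.3e}")
```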

Sampling in Gradient Space

When \(\nabla f\) severely distorts \(C\), uniform sampling over \(C\) leads to non-uniform coverage of \(D\). The proposed solution (sketched below):

  1. Train a network \(h_\vartheta: D \to \mathbb{R}^d\) to approximate the generalized inverse of \(\nabla f\).
  2. Pre-training: \(\min_\vartheta \sum_{x \in \mathcal{W}_{\text{train}}} \|\nabla f(h_\vartheta(\nabla f(x))) - \nabla f(x)\|_2^2\)
  3. Fine-tuning: add the term \(\sum_{y \in \mathcal{Y}_{\text{train}}} \|\nabla f(h_\vartheta(y)) - y\|_2^2\) to the objective, with \(y\) sampled directly in \(D\).
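
A hypothetical sketch of this procedure, using \(f(x) = \sum_i e^{x_i}\) so that \(\nabla f\) is invertible and \(D = (0, \infty)^d\); the architecture, sampling ranges, and step counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch of the inverse-gradient network h ~ (grad f)^(-1).
# Example: f(x) = sum_i exp(x_i), so grad f(x) = exp(x) and D = (0, inf)^d.
d = 10
grad_f = torch.exp                                # grad f, applied elementwise

h = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, d))
opt = torch.optim.Adam(h.parameters(), lr=1e-3)

# 1) Pre-training: push grad f(h(grad f(x))) toward grad f(x) for x in C.
for step in range(1000):
    x = torch.empty(512, d).uniform_(-1.0, 1.0)
    y = grad_f(x)
    loss = ((grad_f(h(y)) - y) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Fine-tuning: also match grad f(h(y)) to y for y drawn directly in D,
#    evening out coverage where grad f distorts the sampling of C.
for step in range(1000):
    y = torch.empty(512, d).uniform_(0.4, 2.7)    # a box inside D
    loss = ((grad_f(h(y)) - y) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```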

Model Selection

  • MLP/ResNet: General-purpose approximation; ResNet achieves the best accuracy but does not guarantee convexity.
  • ICNN: Guarantees convex output, suitable for downstream optimization tasks.
  • KAN (Kolmogorov–Arnold Networks): Enables symbolic regression to obtain exact closed-form solutions.

Key Experimental Results

DLT vs. Direct Learning (\(d=10\) and \(d=50\))

| Function    | Model  | RMSE \(d=10\) (DLT / Direct) | RMSE \(d=50\) (DLT / Direct) |
|-------------|--------|------------------------------|------------------------------|
| Quadratic   | MLP    | 2.96e-5 / 2.86e-5            | 3.55e-3 / 3.59e-3            |
| Quadratic   | ResNet | 2.43e-5 / 3.00e-5            | 2.92e-3 / 3.25e-3            |
| Neg-Log     | ResNet | 7.34e-4 / 9.22e-4            | 1.05e-1 / 1.23e-1            |
| Neg-Entropy | ResNet | 6.62e-4 / 7.41e-4            | 6.93e-2 / 7.27e-2            |

DLT achieves accuracy on par with or superior to direct learning (especially with ResNet), with comparable training time. ICNN exhibits larger errors in high dimensions, and MLP-ICNN (without skip connections) performs worst.

Architecture Comparison

| Architecture | Accuracy (\(d=10\)) | Convexity Guarantee | Symbolic Regression |
|--------------|---------------------|---------------------|---------------------|
| MLP          | High                | No                  | No                  |
| ResNet       | Highest             | No                  | No                  |
| ICNN         | Moderate            | Yes                 | No                  |
| KAN          | High                | No                  | Yes                 |

Exact Closed-Form Solutions via KAN

For specific convex functions (e.g., \(f(x) = \frac{1}{2}\sum x_i^2\)), KAN combined with symbolic regression recovers the exact conjugate \(f^*(y) = \frac{1}{2}\sum y_i^2\)—a result rarely achievable by purely numerical methods.
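This output is consistent with the implicit formula itself: since \(\nabla f(x) = x\) for this \(f\),

\[f^*(\nabla f(x)) = \langle x, x \rangle - \tfrac{1}{2}\|x\|^2 = \tfrac{1}{2}\|x\|^2,\]

i.e., \(f^*(y) = \frac{1}{2}\sum_i y_i^2\), matching the expression recovered by symbolic regression.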

Hamilton-Jacobi PDE Application

DLT is applied to Hamilton-Jacobi equations via the Hopf formula, which requires the convex conjugate of the initial condition, demonstrating practical utility in PDE-related settings.
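For context, the Hopf formula expresses the solution of \(u_t + H(\nabla_x u) = 0\), \(u(x, 0) = g(x)\), for convex initial data \(g\), through the conjugate of the initial condition:

\[u(x, t) = \sup_{y \in \mathbb{R}^d}\{\langle x, y\rangle - g^*(y) - t\,H(y)\} = (g^* + tH)^*(x),\]

so an accurate approximation of \(g^*\) is precisely the ingredient DLT supplies.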

Highlights & Insights

  • The implicit formula reframing is remarkably elegant: converting a classical \(\sup\) problem into MSE regression is conceptually simple yet previously overlooked.
  • Free a posteriori error estimation is a distinctive advantage—the accuracy of any approximation can be assessed without knowledge of the true \(f^*\).
  • KAN's ability to recover exact analytic solutions demonstrates the potential of deep learning in symbolic computation.
  • Integration with ICNN guarantees convexity, offering direct value for downstream tasks such as optimal transport.
  • The method sidesteps the curse of dimensionality that afflicts grid-based approaches, remaining effective at \(d=50\).

Limitations & Future Work

  • Applicable only to differentiable convex functions; non-smooth convex functions require alternative approaches.
  • Requires access to \(f(x)\) and \(\nabla f(x)\); black-box functions are not supported.
  • Sampling strategies become more complex when the image \(D\) of \(\nabla f\) is unbounded.
  • ICNN accuracy degrades notably in high dimensions (\(d=50\)), indicating a non-trivial cost for enforcing convexity.
  • No end-to-end comparison with optimal transport solvers is provided.

Rating

  • Novelty: ⭐⭐⭐⭐ The ML application of the implicit Fenchel formula offers a fresh perspective; exact solutions via KAN are a notable highlight.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Covers multiple architectures × dimensions × function types × HJ-PDE applications.
  • Writing Quality: ⭐⭐⭐⭐⭐ Mathematically rigorous, clearly presented, with strong integration of theory and experiments.
  • Value: ⭐⭐⭐⭐ Provides practical utility to the convex analysis and optimal transport communities.