# Lagrangian neural ODEs: Measuring the existence of a Lagrangian with Helmholtz metrics

**Conference:** NeurIPS 2025 · **arXiv:** 2510.06367 · **Code:** GitHub · **Area:** Physics-Informed Learning / Neural ODE · **Keywords:** Neural ODE, Lagrangian mechanics, Helmholtz conditions, physics regularization, Euler-Lagrange equations
## TL;DR
This paper proposes Helmholtz metrics — differentiable metrics derived from the Helmholtz conditions — to quantify how closely a given ODE approximates the Euler-Lagrange equations. These metrics are incorporated as regularization terms into second-order Neural ODE training, forming Lagrangian Neural ODEs that guide the model toward true physical laws with zero additional inference overhead.
## Background & Motivation
Neural ODEs are powerful tools for learning dynamical systems from data, capable of learning ODEs of the form \(\dot{s} = h_\theta(t, s)\). However, not all ODEs carry physical meaning — the stationary action principle fundamental to physics requires that system trajectories satisfy the Euler-Lagrange equations. Standard Neural ODEs provide no mechanism to ensure that the learned ODE constitutes an Euler-Lagrange equation, potentially yielding unphysical solutions.
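Concretely, a second-order system is learned by stacking position and velocity into the state \(s = (x, \dot{x})\). A minimal sketch of integrating such a system, with a fixed toy acceleration standing in for a trained network:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical stand-in for a trained acceleration network:
# here a fixed undamped oscillator, \ddot{x} = -x.
def f_theta(t, x, v):
    return -x

# The second-order ODE is integrated as the first-order system
# s = (x, v) with \dot{s} = h(t, s) = (v, f_theta(t, x, v)).
def h(t, s):
    x, v = s
    return [v, f_theta(t, x, v)]

sol = solve_ivp(h, (0.0, 2 * np.pi), y0=[1.0, 0.0], rtol=1e-8, atol=1e-8)
x_final = sol.y[0, -1]  # after one full period the state returns to x = 1
```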
The core problem has two aspects: (1) the identification problem: how to differentiably quantify how closely an ODE approximates the Euler-Lagrange equations; and (2) the learning problem: how to guide Neural ODEs during training to converge toward true Euler-Lagrange equations.
Existing approaches such as Lagrangian Neural Networks (LNNs) directly predict a Lagrangian and derive the ODE from it, but this requires computing the Euler-Lagrange equations in both forward and backward passes, incurring high computational cost and poor stability. This paper adopts an inverse approach: learn the ODE directly, then verify whether it satisfies the Lagrangian structure via the Helmholtz conditions.
## Method
### Overall Architecture
The model consists of three networks: \(f_{\theta_1}\) models the acceleration \(\ddot{x}\), \(g_{\theta_2}\) learns the Hessian of the Lagrangian with respect to the velocities, and \(\text{NN}_{\theta_3}\) predicts initial velocities from initial positions. Training jointly optimizes the regression loss \(\mathcal{L}_R\) and the Helmholtz metric regularization term \(\mathcal{L}_H\); at inference time, only \(f_{\theta_1}\) and \(\text{NN}_{\theta_3}\) are used.
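A structural sketch of this objective and the train/inference split, with trivial stand-ins for the three networks (the real models are small MLPs):

```python
import numpy as np

# Hypothetical stand-ins for the three networks described above.
f_theta1 = lambda t, x, v: -x                  # acceleration \ddot{x}
g_theta2 = lambda t, x, v: np.array([[1.0]])   # Lagrangian Hessian estimate
nn_theta3 = lambda x0: np.zeros_like(x0)       # initial-velocity predictor

def total_loss(x_pred, x_data, helmholtz_residuals):
    L_R = np.mean((x_pred - x_data) ** 2)    # regression loss
    L_H = np.mean(helmholtz_residuals ** 2)  # Helmholtz regularizer
    return L_R + L_H                         # L_tot = L_R + L_H

# Inference uses only f_theta1 and nn_theta3; g_theta2 is discarded.
x0 = np.array([1.0])
v0 = nn_theta3(x0)
loss = total_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]), np.zeros(3))
```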
### Key Designs
- **Differentiable Implementation of Helmholtz Metrics:**
    - Function: Turns the Helmholtz conditions into loss functions that can be optimized with neural networks.
    - Mechanism: Defines auxiliary quantities \(\Phi\), parameterizes the Hessian matrix \(g\) with a neural network \(g_{\theta_2}\), and minimizes the mean squared residuals of three Helmholtz conditions. The residuals are normalized by the minimum absolute eigenvalue \(\lambda_{\min}\) of \(g\), preventing the network from "cheating" by driving the eigenvalues toward zero.
    - Design Motivation: A differentiable, trainable metric is needed to quantify whether an ODE originates from a Lagrangian while avoiding degenerate solutions.
- **Multi-Objective Optimization Strategy:**
    - Function: Jointly optimizes the regression loss and the Helmholtz metric.
    - Mechanism: The total loss is \(\mathcal{L}_{\text{tot}} = \mathcal{L}_R + \mathcal{L}_H\). Clipping the regularizer gradient norm \(\|\nabla_{\theta_1} \mathcal{L}_H\|\) to \(c_1 \approx 0.05\) lets the data term dominate early in training, preventing convergence to an Euler-Lagrange equation inconsistent with the data.
    - Design Motivation: Overly strong regularization would drive the model toward physical laws that contradict the data.
- **Zero Additional Inference Overhead:**
    - Function: Helmholtz metrics are used only during training and are entirely absent at inference time.
    - Mechanism: \(g_{\theta_2}\) is evaluated and optimized only during training; at inference, only \(f_{\theta_1}\) is needed to evaluate the right-hand side of the ODE.
    - Design Motivation: This is a core advantage over LNNs, which must compute the Euler-Lagrange equations via automatic differentiation at inference time, incurring significant overhead.
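The paper's full matrix-valued conditions are beyond a short example, but the 1-D case illustrates the mechanism. For \(\ddot{x} = f(t, x, \dot{x})\), a scalar multiplier \(g\) must satisfy \(\dot{g} + g\,\partial f/\partial \dot{x} = 0\) along trajectories, and dividing by \(|g|\) plays the role of the \(\lambda_{\min}\) normalization. A minimal sketch (toy damped oscillator; \(g = e^{ct}\) solves the condition exactly, so the residual vanishes):

```python
import numpy as np

c = 0.3  # hypothetical damping coefficient

def f(t, x, v):
    return -x - c * v          # damped-oscillator acceleration

def g(t):
    return np.exp(c * t)       # time-dependent multiplier candidate

def helmholtz_residual(t, x, v, eps=1e-3):
    # Finite-difference stand-ins for the derivatives in the condition.
    dg_dt = (g(t + eps) - g(t - eps)) / (2 * eps)
    df_dv = (f(t, x, v + eps) - f(t, x, v - eps)) / (2 * eps)
    lam_min = abs(g(t))        # 1-D analogue of the minimum |eigenvalue|
    return (dg_dt + g(t) * df_dv) / lam_min

ts = np.linspace(0.0, 5.0, 50)
loss_H = float(np.mean([helmholtz_residual(t, 1.0, 0.5) ** 2 for t in ts]))
# loss_H is numerically ~0: this g satisfies the condition
```

A time-independent \(g\) would leave a residual of size \(|\partial f/\partial \dot{x}| = c\) here, which is why the damped oscillator fails the test in the time-independent setting.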
### Loss & Training
- Regression loss: \(\mathcal{L}_R = \text{MSE}(x_{\text{pred}}, x_{\text{data}})\)
- Helmholtz regularization: \(\mathcal{L}_H = \text{MSE}(\sum_i \mathcal{R}_i / \lambda_{\min})\)
- Training techniques: progressive time step inclusion (gradually increasing the number of time steps to avoid local minima); the output of \(g_{\theta_2}\) is processed with a \(\sinh\) transformation to handle exponential behavior.
- Network architecture: \(f_{\theta_1}\) (1 layer × 16), \(g_{\theta_2}\) (2 layers × 64), \(\text{NN}_{\theta_3}\) (3 layers × 16), Softplus activation; RAdam optimizer, batch size 128.
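Two of these techniques are easy to sketch in isolation (with hypothetical gradient values; only the threshold \(c_1 = 0.05\) comes from the paper):

```python
import numpy as np

C1 = 0.05  # clipping threshold c_1 for the regularizer gradient

def clip_norm(grad, max_norm=C1):
    """Rescale grad so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

# The L_H gradient is clipped before being added to the L_R gradient,
# so the data term dominates early in training.
g_R = np.array([0.8, -0.6])              # hypothetical grad of L_R
g_H = np.array([3.0, 4.0])               # hypothetical grad of L_H (norm 5)
step = g_R + clip_norm(g_H)              # clipped L_H contributes norm 0.05

# sinh output transform: moderate raw outputs u of g_theta2 map to
# exponentially large Hessian entries, matching multipliers like e^{ct}.
u = np.array([-5.0, 0.0, 5.0])
g_vals = np.sinh(u)
```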
## Key Experimental Results
### Main Results
| System | Helmholtz Metric Behavior | Remarks |
|---|---|---|
| Undamped oscillator | \(\mathcal{L}_H\) decreases significantly | Lagrangian exists |
| Kepler problem | \(\mathcal{L}_H\) decreases significantly | Lagrangian exists |
| Damped oscillator (time-independent \(g\)) | \(\mathcal{L}_H\) fails to decrease | No time-independent Lagrangian exists |
| Damped oscillator (time-dependent \(g\)) | \(\mathcal{L}_H\) decreases significantly | Time-dependent Lagrangian exists |
| Non-Lagrangian ODE | \(\mathcal{L}_H\) improves only marginally | Correctly identified as having no Lagrangian |
### Ablation Study
Comparison of 40 pairs of regularized vs. unregularized models, using the MSE ratio \(R = \exp(l_{\text{reg}} - l_{\text{unreg}})\), where \(l_{\text{reg}}\) and \(l_{\text{unreg}}\) are the log-MSEs of the paired models; \(R < 1\) means the regularized model achieves lower error:
| Evaluation Dimension | MSE Ratio \(R\) | Significance |
|---|---|---|
| Position \(x\) (in-distribution) | < 1 | Significant (Welch's t-test) |
| Velocity \(\dot{x}\) | << 1 | Highly significant |
| Acceleration \(\ddot{x}\) | << 1 | Highly significant |
| Extrapolation (2× training time) | << 1 | Highly significant |
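For illustration, the ratio and significance test can be computed as follows (with synthetic log-MSEs, not the paper's numbers; scipy's `ttest_ind` with `equal_var=False` is Welch's t-test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical log-MSEs for 40 paired regularized / unregularized runs.
l_unreg = rng.normal(loc=-2.0, scale=0.3, size=40)
l_reg = l_unreg - 1.0 + rng.normal(scale=0.1, size=40)

R = np.exp(l_reg - l_unreg)  # per-pair MSE ratio; R < 1 favors regularization
t_stat, p_value = stats.ttest_ind(l_reg, l_unreg, equal_var=False)  # Welch
significant = p_value < 0.05 and np.median(R) < 1.0
```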
### Key Findings
- Helmholtz metrics accurately distinguish between Lagrangian and non-Lagrangian systems.
- The learned \(g\) closely matches the analytical Lagrangian Hessian (Kepler problem: median error \(3.7 \times 10^{-4}\)).
- Regularization substantially improves learning accuracy for velocity and acceleration, with particularly notable gains in extrapolation performance.
## Highlights & Insights
- Elegant inverse approach: Rather than directly modeling the Lagrangian as in LNNs, this work learns the ODE and subsequently verifies the existence of a Lagrangian, avoiding the overhead of forward Euler-Lagrange computation.
- Physical diagnostic capability: Beyond improving learning, the framework can diagnose whether a system is physical — the Helmholtz metric fails to converge for damped systems under time-independent settings, correctly reflecting the non-fundamental nature of damping.
- Solid theoretical foundation: Grounded in Douglas's classical Helmholtz condition theory (1939/1941), the work bridges century-old mathematical tools with modern deep learning.
## Limitations & Future Work
- Validation is limited to low-dimensional (2D) toy systems; scalability to high-dimensional and more complex systems remains untested.
- No systematic quantitative comparison against LNNs or Hamiltonian Neural Networks is provided.
- Numerical stability may become problematic in higher dimensions (eigenvalue computation, robustness of gradient clipping).
- The expressive capacity of \(g_{\theta_2}\) may be insufficient when the system's Lagrangian takes a highly complex form.
## Related Work & Insights
- Lagrangian Neural Networks (LNNs): A forward approach — predicts the Lagrangian to derive the ODE; the present work takes the inverse approach.
- Hamiltonian Neural Networks: An analogous idea formulated within the equivalent Hamiltonian framework.
- Physics-Informed Neural Networks (PINNs): A broader paradigm for learning with physical constraints.
- Implications for ML in the physical sciences: Helmholtz metrics can serve as a general diagnostic tool for learning physical systems.
## Rating
- Novelty: ⭐⭐⭐⭐ Innovatively applies classical Helmholtz conditions as regularization for Neural ODEs.
- Experimental Thoroughness: ⭐⭐⭐ Validation systems are relatively simple; comparisons with competing methods are absent.
- Writing Quality: ⭐⭐⭐⭐ Mathematical derivations are clear; physical intuition is richly conveyed.
- Value: ⭐⭐⭐⭐ Introduces a new regularization paradigm for Physics-Informed ML.