Hamiltonian Neural PDE Solvers through Functional Approximation¶
- Conference: NeurIPS 2025
- arXiv: 2505.13275
- Code: GitHub
- Area: Scientific Computing / Neural PDE Solvers
- Keywords: Hamiltonian mechanics, PDE solving, functional approximation, neural fields, energy conservation
TL;DR¶
Grounded in the Riesz representation theorem, this work approximates infinite-dimensional Hamiltonian functionals via learnable integral kernel functionals (IKF). Functional derivatives are obtained through automatic differentiation, yielding an energy-conserving neural PDE solver (HNS) that demonstrates superior stability and generalization on 1D/2D PDEs.
Background & Motivation¶
Background: Neural PDE solvers (FNO, U-Net, etc.) have achieved notable progress in parametric PDE solving, yet the vast majority operate within a Newtonian framework—directly predicting the next-step state or time derivative—without exploiting the conservation structure inherent in physical systems.
Limitations of Prior Work: Hamiltonian Neural Networks (HNNs) have demonstrated the ability to enforce conservation laws in discrete particle systems, but HNNs are limited to finite-dimensional settings (e.g., \(N\)-body problems), where the Hamiltonian is a function \(\mathcal{H}(\mathbf{q}, \mathbf{p}): \mathbb{R}^{2n} \to \mathbb{R}\). Most practical PDEs, however, describe continuous fields (fluids, waves, elasticity) and require infinite-dimensional Hamiltonian mechanics, in which the Hamiltonian is a functional \(\mathcal{H}[u]: \mathcal{F}(\Omega) \to \mathbb{R}\) and evolution is governed by the functional derivative \(\delta\mathcal{H}/\delta u\).
Key Challenge: Extending the Hamiltonian framework to PDEs presents two fundamental challenges: (a) one must approximate a mapping from a function space to a scalar (a functional), for which traditional neural networks are not designed; and (b) the approximated functional must admit accurate functional derivatives to drive the temporal evolution via Hamilton's equations.
Goal: Design a neural network architecture capable of learning Hamiltonian functionals and accurately computing their functional derivatives, thereby constructing a neural PDE solver that respects the Hamiltonian structure.
Key Insight: By the Riesz representation theorem from functional analysis, any continuous linear functional on a Hilbert space can be written as an inner product with a fixed element, \(\mathcal{H}[u] = \langle u, \kappa \rangle\). Parameterizing that representer as a learnable kernel \(\kappa_\theta\) reduces the problem of functional approximation to one of function approximation, the latter being a natural strength of neural networks.
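For the linear case the reduction is explicit: writing the functional in Riesz form immediately gives its functional derivative,

\[
\mathcal{H}[u] = \langle u, \kappa \rangle = \int_\Omega \kappa(x)\, u(x)\, dx,
\qquad
\frac{\delta \mathcal{H}}{\delta u}(x) = \kappa(x),
\]

so a neural field representing \(\kappa\) determines both the Hamiltonian value and the derivative that drives Hamilton's equations.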
Core Idea: Approximate the Hamiltonian functional via an integral kernel parameterized by a neural field; obtain functional derivatives through automatic differentiation; and construct a conservation-respecting PDE solver.
Method¶
Overall Architecture¶
The HNS (Hamiltonian Neural Solver) proceeds as follows (a minimal sketch of one rollout step follows the list):
- Forward pass: Given the current field state \(\mathbf{u}^t\) and coordinates \(\mathbf{x}\), compute the Hamiltonian scalar \(\mathcal{H}_\theta\) via the Integral Kernel Functional (IKF).
- Backward pass: Compute the functional derivative \(\delta\mathcal{H}_\theta / \delta u\) via automatic differentiation.
- Temporal evolution: Apply the known linear operator \(\mathcal{J}\) (e.g., \(\partial_x\)) to obtain the time derivative \(\partial u / \partial t = \mathcal{J}(\delta\mathcal{H}_\theta / \delta u)\).
- Numerical integration: Advance the state \(\mathbf{u}^{t+1}\) using a second-order Adams–Bashforth scheme.
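A minimal sketch of one such step, assuming a PyTorch implementation on a uniform 1D grid; `hamiltonian` stands for the IKF described below, `J` for the known linear operator, and the function names and signatures are illustrative rather than the paper's actual code.

```python
import torch

def hns_step(u, x, dt, hamiltonian, J, rhs_prev=None):
    """One HNS rollout step (sketch, uniform 1D grid assumed).

    hamiltonian: callable (u, x) -> scalar H_theta, e.g. the IKF sketched below
    J:           callable applying the known linear operator (e.g. a finite-difference d/dx)
    rhs_prev:    J(dH/du) from the previous step, used by Adams-Bashforth
    """
    dx = x[1] - x[0]
    u = u.detach().requires_grad_(True)

    # Forward pass: scalar Hamiltonian of the current field state.
    H = hamiltonian(u, x)

    # Backward pass: the gradient w.r.t. the field values, rescaled by the
    # quadrature weight dx, approximates the functional derivative dH/du.
    dH_du = torch.autograd.grad(H, u)[0] / dx

    # Hamilton's equation for fields: du/dt = J(dH/du).
    rhs = J(dH_du)

    # Second-order Adams-Bashforth update (forward Euler on the first step).
    u_next = u + dt * rhs if rhs_prev is None else u + dt * (1.5 * rhs - 0.5 * rhs_prev)
    return u_next.detach(), rhs.detach()
```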
Key Designs¶
1. Integral Kernel Functional (IKF)¶
- Function: Represents the functional \(\mathcal{H}[u]\) in an integral kernel form.
- Mechanism: By the Riesz representation theorem, \(\mathcal{H}[u] = \int_\Omega \kappa_\theta(x, u(x)) \cdot u(x) \, dx\), discretized via Riemann summation with quadrature weights \(\mu_i\) as \(\mathcal{H}_\theta \approx \sum_i \kappa_\theta(x_i, u_i) \cdot u_i \cdot \mu_i \Delta x\) (see the sketch after this list).
- Design Motivation: While superficially similar to the integral kernel operators in neural operators, the IKF is fundamentally distinct—operator outputs are functions requiring per-query summation (necessitating Fourier truncation for efficiency), whereas the IKF outputs a scalar computed by a single Riemann sum, preserving full accuracy.
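A minimal sketch of the IKF evaluation under these definitions, assuming PyTorch; `kernel` stands for the SIREN/FiLM network \(\kappa_\theta\) described next, and the plain Riemann weights \(\mu_i = 1\) could be swapped for trapezoidal ones (0.5 at the endpoints).

```python
import torch

def ikf(kernel, u, x):
    """Integral Kernel Functional: H_theta ~= sum_i kappa_theta(x_i, u_i) * u_i * mu_i * dx.

    kernel: network mapping grid coordinates x_i and field values u_i to kappa_theta(x_i, u_i)
    u, x:   field values and grid coordinates, both of shape (n,)
    """
    dx = x[1] - x[0]
    mu = torch.ones_like(x)                 # mu_i = 1: plain Riemann sum; trapezoid weights are a drop-in change
    kappa = kernel(x, u)                    # kappa_theta(x_i, u_i), shape (n,)
    return torch.sum(kappa * u * mu * dx)   # single quadrature sum -> scalar Hamiltonian
```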
2. SIREN + FiLM-Conditioned Kernel Parameterization¶
- Function: Parameterizes the kernel function \(\kappa_\theta\) using a neural field architecture.
- Mechanism: A sinusoidal representation network (SIREN) serves as the kernel backbone, with each layer conditioned via FiLM (Feature-wise Linear Modulation): \(\kappa_\theta^{(l)}(x_i, u_i) = \gamma_\theta^{(l)}(u_i) \sin(Wx_i + b) + \beta_\theta^{(l)}(u_i)\).
- Design Motivation: SIREN not only fits functions effectively but also yields accurate gradient representations (the derivative of a sine remains a sine), which facilitates precise functional derivative computation. FiLM enables local or global conditioning, allowing the kernel to depend on the input field (a layer sketch follows this list).
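A minimal sketch of such a kernel under local conditioning, assuming PyTorch; the class names, layer widths, and the \(\omega_0\) frequency scaling are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FiLMSineLayer(nn.Module):
    """One SIREN layer with FiLM conditioning: gamma(u) * sin(omega_0 * (W h + b)) + beta(u)."""

    def __init__(self, in_dim, out_dim, cond_dim, omega_0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.gamma = nn.Linear(cond_dim, out_dim)   # gamma_theta^{(l)}(u_i)
        self.beta = nn.Linear(cond_dim, out_dim)    # beta_theta^{(l)}(u_i)
        self.omega_0 = omega_0                      # SIREN frequency scaling

    def forward(self, h, cond):
        return self.gamma(cond) * torch.sin(self.omega_0 * self.linear(h)) + self.beta(cond)


class FiLMSiren(nn.Module):
    """Kernel kappa_theta(x_i, u_i): stacked FiLM-conditioned sine layers (local conditioning)."""

    def __init__(self, hidden=64, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [FiLMSineLayer(1 if i == 0 else hidden, hidden, cond_dim=1) for i in range(depth)]
        )
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, u):
        h, cond = x.unsqueeze(-1), u.unsqueeze(-1)  # condition each query point on its own u_i
        for layer in self.layers:
            h = layer(h, cond)
        return self.out(h).squeeze(-1)              # kappa_theta(x_i, u_i), shape (n,)
```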
3. Local vs. Global Conditioning¶
- Local conditioning: FiLM parameters depend solely on \(u_i = u(x_i)\); suitable for PDEs whose Hamiltonians contain only local terms (e.g., Advection).
- Global conditioning: FiLM parameters depend on the entire field \(\mathbf{u}^t\) (via a shallow 1D CNN); required for PDEs with nonlocal terms such as \(u_{xx}\) (e.g., KdV, SWE). A sketch of such a conditioner follows.
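A minimal sketch of a global conditioner, again assuming PyTorch; the channel counts, kernel sizes, and pooling are assumptions for illustration. Its output vector would replace the pointwise \(u_i\) as the FiLM input at every query point.

```python
import torch
import torch.nn as nn

class GlobalConditioner(nn.Module):
    """Shallow 1D CNN mapping the whole field u^t to a global FiLM conditioning code."""

    def __init__(self, cond_dim=16, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=5, padding=2),
            nn.GELU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),              # pool over the spatial grid
        )
        self.proj = nn.Linear(channels, cond_dim)

    def forward(self, u):
        z = self.net(u.unsqueeze(0).unsqueeze(0))  # (n,) -> (1, 1, n) -> (1, C, 1)
        return self.proj(z.flatten(1)).squeeze(0)  # global code of shape (cond_dim,)
```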
4. Implicit Functional Derivative Learning¶
- Key Finding: Training the IKF solely on functional values \(\mathcal{H}[u]\) (scalar supervision) implicitly yields accurate functional derivatives \(\delta\mathcal{H}/\delta u\) through automatic differentiation.
- Intuition: For a linear IKF, \(\nabla_\mathbf{u} \sum_i \kappa_\theta(x_i) u_i \mu_i \Delta x = [\kappa_\theta(x_1) \mu_1 \Delta x, \ldots, \kappa_\theta(x_n) \mu_n \Delta x]\); up to the quadrature weights, the discrete gradient is exactly the discretization of the learned kernel, i.e., of the functional derivative \(\delta\mathcal{H}/\delta u = \kappa_\theta(x)\) (verified numerically in the sketch below).
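A toy numerical check of this identity, under the same uniform-grid assumptions as the sketches above (the specific kernel is an arbitrary stand-in):

```python
import torch

# For a linear IKF, autograd on the scalar H recovers the kernel on the grid.
n = 128
x = torch.linspace(0.0, 1.0, n)
u = torch.sin(2 * torch.pi * x).requires_grad_(True)
dx = x[1] - x[0]

kappa = torch.cos(2 * torch.pi * x)      # stand-in for a learned kernel kappa_theta(x)
H = torch.sum(kappa * u * dx)            # H_theta ~= sum_i kappa(x_i) u_i dx

(grad_u,) = torch.autograd.grad(H, u)    # discrete gradient of the scalar functional
dH_du = grad_u / dx                      # divide out the quadrature weights

print(torch.allclose(dH_du, kappa))      # True: dH/du(x_i) == kappa(x_i)
```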
Loss & Training¶
- Training loss: \(\mathcal{L} = \|\delta\mathcal{H}_\theta / \delta u - \delta\mathcal{H} / \delta u\|^2\), optimizing directly in the functional derivative domain.
- Supervision targets are obtained from the analytic functional derivative expressions of the true Hamiltonian, with any spatial derivatives of \(u\) evaluated by finite differences on the grid.
- Training in the functional derivative domain rather than in the temporal domain is preferred, as it directly optimizes the kernel \(\kappa_\theta\) (a training-step sketch follows this list).
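A minimal sketch of one such training step under the assumptions of the earlier sketches (PyTorch, uniform grid, the `ikf` helper above); `kernel`, `optimizer`, and the target tensor are supplied by the surrounding training loop.

```python
import torch

def train_step(kernel, optimizer, u, x, dHdu_target):
    """One training step in the functional-derivative domain (sketch).

    dHdu_target: analytic functional derivative of the true Hamiltonian on the grid
    """
    dx = x[1] - x[0]
    u = u.detach().requires_grad_(True)

    H = ikf(kernel, u, x)                       # scalar H_theta via the IKF sketched earlier
    # create_graph=True keeps the derivative differentiable w.r.t. the kernel parameters.
    dHdu_pred = torch.autograd.grad(H, u, create_graph=True)[0] / dx

    loss = torch.mean((dHdu_pred - dHdu_target) ** 2)   # MSE form of || dH_theta/du - dH/du ||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```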
Key Experimental Results¶
Main Results¶
Toy: Linear/Nonlinear Functional Approximation (Table 1)¶
| Metric | MLP | FNO | IKF |
|---|---|---|---|
| Linear functional \(\mathcal{F}_l[u]\) (Base) | 2.47e-5 | 2.76e-4 | 3.00e-16 |
| Linear functional derivative (Base) | 0.083 | 0.066 | 1.15e-3 |
| Linear functional (OOD) | 0.046 | 0.268 | 2.13e-7 |
| Nonlinear functional \(\mathcal{F}_{nl}[u]\) (Base) | 0.029 | 0.016 | 2.05e-3 |
| Nonlinear functional derivative (Base) | 1.33 | 2.10 | 0.045 |
IKF substantially outperforms MLP and FNO on both functional values and functional derivatives, while using the fewest parameters (4.3K vs. MLP 7.5K vs. FNO 10.9K).
1D PDEs: Advection + KdV (Table 2)¶
| Model | Adv Rollout Err ↓ | KdV Corr Time ↑ |
|---|---|---|
| FNO | 0.83 ± 0.17 | 68.0 ± 10.5 |
| Unet | 0.40 ± 0.29 | 134.0 ± 10.6 |
| FNO(du/dt) | 0.048 ± 0.007 | 77.6 ± 3.6 |
| Unet(du/dt) | 0.057 ± 0.024 | 113.3 ± 15.0 |
| HNS | 0.0039 ± 0.0008 | 151.1 ± 3.0 |
HNS surpasses all baselines with roughly half the parameters (Adv: 32K vs. 65K; KdV: 87K vs. 135K). The Advection rollout error is an order of magnitude lower than the best baseline.
2D Shallow Water Equations (Table 3)¶
| Model | Sines (in-dist) ↓ | Pulse (OOD) ↓ | Params |
|---|---|---|---|
| Transolver | 0.084 | 0.122 | 4M |
| FNO | 0.057 | 0.117 | 7M |
| PINO | 0.053 | 0.114 | 7M |
| Unet | 0.010 | 0.042 | 3M |
| HNS | 0.026 | 0.021 | 3M |
HNS achieves a substantial lead in OOD generalization on Pulse initial conditions (0.021 vs. Unet 0.042) while maintaining energy conservation.
Ablation Study¶
| Ablation | Key Finding |
|---|---|
| Kernel type (linear vs. nonlinear) | Nonlinear kernels are necessary for complex Hamiltonians |
| Conditioning (local vs. global) | Global conditioning is required for PDEs containing \(u_{xx}\) terms (e.g., KdV) |
| SIREN vs. MLP kernel | SIREN yields more accurate gradient representations and superior performance |
| Quadrature method | The trapezoidal rule provides sufficient accuracy |
Key Findings¶
- Energy conservation: The Hamiltonian along HNS-predicted trajectories remains nearly constant over time, while those of FNO/U-Net exhibit significant drift.
- Temporal extrapolation: Trained on short time windows (Adv: 0–4 s; KdV: 0–25 s), HNS stably extrapolates to much longer horizons (Adv: 0–20 s; KdV: 0–100 s).
- OOD generalization: On 2D SWE, HNS maintains Hamiltonian conservation even under OOD initial conditions, demonstrating a strong inductive bias.
- Parameter efficiency: Kernel weights are shared across all input points \((x_i, u_i)\), resulting in approximately half the parameter count of baselines.
Highlights & Insights¶
- Elegant theoretical foundation: Reducing functional approximation to function approximation via the Riesz representation theorem is a natural and profound insight.
- Implicit derivative learning: Accurate functional derivatives are attained from scalar supervision alone—a capability unavailable to MLP/FNO architectures—making the Hamiltonian framework viable for PDE solving.
- Unified perspective: IKF simultaneously connects the literature on functional analysis, neural fields, and neural operators.
- Practical parameter efficiency: The weight-sharing mechanism enables HNS to achieve better performance with fewer parameters.
- Instructive negative result: Adding an auxiliary loss on the Hamiltonian value is detrimental (the model may learn a trivial constant mapping), revealing subtle constraints in enforcing Hamiltonian conservation.
Limitations & Future Work¶
- Incomplete theory for nonlinear functionals: The Riesz theorem guarantees approximation only for linear functionals; the nonlinear case lacks rigorous theoretical support.
- Inference overhead: Computing functional derivatives via backpropagation and evaluating the linear operator \(\mathcal{J}\) makes inference slower than direct prediction methods.
- Scalability: Reliance on numerical integration (Riemann summation) raises concerns about scaling to high-resolution grids.
- Restricted to Hamiltonian systems: The method cannot directly handle dissipative systems; extensions (e.g., the GENERIC framework) require additional work.
- Numerical error in \(\mathcal{J}\): Finite-difference approximations of \(\mathcal{J}\) may introduce numerical errors, particularly for higher-order derivative terms.
- Future directions: Integration with symplectic integrators; extension to 3D or more complex geometries; exploration of learning \(\mathcal{J}\).
Related Work & Insights¶
- HNN family: Greydanus et al. (2019) introduced discrete HNNs; the present work is a natural extension to the infinite-dimensional setting.
- Neural operators: The integral kernel operator in FNO shares a deep connection with IKF (IKF can be viewed as a functional version of an operator evaluated at a single point).
- Neural fields (SIREN, NeRF, etc.): Provide mature tools for kernel parameterization.
- ML for DFT: Learning energy functionals in density functional theory is conceptually analogous to IKF.
- Insights: The Hamiltonian framework not only provides conservation guarantees but also embodies a paradigm of "learning energy rather than forces," with potentially significant implications for molecular dynamics, atmospheric simulation, and related fields.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — First extension of the Hamiltonian framework to infinite-dimensional neural PDE solving; theoretically novel perspective.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Progressive validation from toy problems to 2D SWE, though 3D experiments and larger-scale validation are absent.
- Writing Quality: ⭐⭐⭐⭐⭐ — Theoretical derivations are clear; the narrative from discrete to continuous is exceptionally coherent.
- Value: ⭐⭐⭐⭐ — Introduces a new paradigm for physics-informed ML, though practical applicability is limited to Hamiltonian systems.