# Integration Matters for Learning PDEs with Backward SDEs
**Conference:** NeurIPS 2025 · **arXiv:** 2505.01078 · **Code:** https://github.com/sungje-park/heunbsde
**Area:** Scientific Computing / PDE Solving · **Keywords:** BSDE, PDE solving, Stratonovich integration, Heun method, discretization bias
## TL;DR
This paper identifies the root cause of why standard BSDE methods underperform PINNs: an irreducible discretization bias introduced by Euler-Maruyama integration. It proposes Heun-BSDE, which uses the Stratonovich formulation to eliminate this bias entirely, achieving performance competitive with PINNs on high-dimensional PDEs.
## Background & Motivation
Background: Two major deep learning approaches exist for solving high-dimensional PDEs: PINNs (directly minimizing PDE residuals) and BSDE methods (reformulating PDEs as forward-backward SDEs and simulating trajectories). BSDE methods have natural advantages for problems with underlying dynamics, such as stochastic optimal control.
Limitations of Prior Work: Empirically, BSDE methods perform significantly worse than PINNs, yet the underlying reason has remained unclear. Prior work [33] proposed a hybrid interpolation loss to narrow the gap, but introduced hyperparameters requiring tuning without explaining the root cause.
Key Challenge: When the standard BSDE method discretizes the one-step consistency loss with Euler-Maruyama (EM) integration, it introduces an irreducible bias term that is independent of the step size \(\tau\), \(\text{Bias}(\theta) = \frac{1}{2T}\int_0^T \mathbb{E}\,\text{tr}((H \cdot \nabla^2 u_\theta)^2)\,dt\), so the optimization objective deviates from the true solution no matter how finely time is discretized.
Goal: (1) Identify the root cause of the BSDE vs. PINNs performance gap; (2) Propose a bias-free integration scheme.
Key Insight: Reinterpret the BSDE as a Stratonovich SDE (rather than an Itô SDE) and apply stochastic Heun integration, which converges to the Stratonovich solution, thereby eliminating the bias.
Core Idea: Replace Itô + EM integration with Stratonovich + Heun integration to eliminate the discretization bias in the BSDE loss.
## Method

### Overall Architecture
The forward-backward SDE system for PDE solving is reformulated in Stratonovich form and discretized using the stochastic Heun method (a two-step predictor-corrector scheme), yielding an unbiased one-step consistency loss.
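To make the contrast concrete, here is a minimal NumPy sketch (not the paper's code) of a single Euler-Maruyama step versus a single stochastic Heun step, where `f` and `g` denote the drift and diffusion coefficients of the SDE:

```python
import numpy as np

def em_step(x, t, f, g, dw, dt):
    """Euler-Maruyama step; converges to the Ito solution."""
    return x + f(x, t) * dt + g(x, t) * dw

def heun_step(x, t, f, g, dw, dt):
    """Stochastic Heun step (predictor-corrector); converges to the
    Stratonovich solution."""
    x_pred = x + f(x, t) * dt + g(x, t) * dw           # predictor: plain EM
    drift = 0.5 * (f(x, t) + f(x_pred, t + dt)) * dt   # trapezoidal drift
    noise = 0.5 * (g(x, t) + g(x_pred, t + dt)) * dw   # averaged diffusion
    return x + drift + noise
```

For multiplicative noise \(dX = \sigma X \circ dW\), the Heun iterates approach the Stratonovich solution \(x_0 e^{\sigma W_T}\), whereas EM applied to the same coefficients approaches the Itô solution \(x_0 e^{\sigma W_T - \sigma^2 T/2}\); the two limits differ, which is exactly why the choice of scheme matters.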
### Key Designs
- Bias Analysis (Theorem 4.2):
- Function: Theoretically proves that the bias in the EM one-step loss is irreducible.
- Mechanism: \(\tau^{-2} \cdot \ell_{\text{EM},\tau}(\theta,x,t) = (R[u_\theta])^2 + \frac{1}{2}\text{tr}((H \cdot \nabla^2 u_\theta)^2) + O(\tau^{1/2})\), where the second term is a \(\tau\)-independent bias that persists even as \(\tau \to 0\).
- Design Motivation: Explains why reducing the step size fails to improve EM-BSDE.
- Stratonovich-Heun BSDE Loss (Theorem 4.4):
- Function: Proposes an unbiased loss function.
- Mechanism: \(L_{\text{Heun},\tau}(\theta) = \frac{1}{T}\int_0^T \mathbb{E}[(R[u_\theta])^2]dt + O(\tau^{1/2})\) — the bias term is only \(O(\tau^{1/2})\) and can be eliminated by reducing the step size.
- Design Motivation: The Heun method is a second-order predictor-corrector scheme that naturally converges to the Stratonovich solution.
- Efficient Sub-sampling Implementation:
- Function: Accelerates training.
- Mechanism: A full forward SDE trajectory is rolled out (with stop gradient), followed by random sub-sampling of \(B\) time steps to compute the loss, rather than using all \(N\) steps.
- Design Motivation: Reduces per-step computational cost while maintaining performance.
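The sub-sampling idea can be sketched as follows; this is a minimal illustration under placeholder names (`drift`, `sigma`, `one_step_loss` are assumptions, not the paper's API). The full trajectory is simulated once, and the loss is then averaged over only \(B\) randomly chosen steps rather than all \(N\):

```python
import numpy as np

def subsampled_bsde_loss(x0, n_steps, b, dt, drift, sigma, one_step_loss, rng):
    # 1) Roll out the full forward SDE trajectory once. In the actual
    #    method no gradients flow through this rollout (stop gradient).
    x, t = x0, 0.0
    states = []                                   # (x_i, t_i, dw_i) per step
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=np.shape(x0))
        states.append((x, t, dw))
        x = x + drift(x, t) * dt + sigma(x, t) * dw
        t += dt
    # 2) Sub-sample b << n_steps time indices and average the one-step
    #    consistency loss over them instead of over every step.
    idx = rng.choice(n_steps, size=b, replace=False)
    return float(np.mean([one_step_loss(*states[i], dt) for i in idx]))
```

Because the rollout is detached, the per-iteration gradient cost scales with \(B\) rather than \(N\).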
### Loss & Training
- Heun discretization requires one additional function evaluation (predictor step + corrector step) but permits larger step sizes.
- The one-step loss (\(k=1\)) suffices; multi-step loss hyperparameter tuning is unnecessary.
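As a schematic illustration (a simplified NumPy sketch, not the paper's implementation), the EM one-step consistency residual compares the network value at the next state against the EM update of \(Y_t = u_\theta(x, t)\); the Heun variant replaces the update below with the predictor-corrector step. All names here are illustrative, and the forward drift is omitted for brevity:

```python
import numpy as np

def em_one_step_residual(u, grad_u, x, t, dw, dt, f, sigma):
    """Squared one-step consistency residual of the EM discretization.

    u, grad_u : network value and its spatial gradient (illustrative)
    f         : BSDE driver, so that dY = -f dt + Z . dW with Z = sigma^T grad u
    """
    z = sigma(x, t).T @ grad_u(x, t)    # Z_t = sigma^T grad u
    x_next = x + sigma(x, t) @ dw       # forward EM update (drift omitted)
    y_pred = u(x, t) - f(x, t, u(x, t), z) * dt + z @ dw
    return (u(x_next, t + dt) - y_pred) ** 2
```

For a linear \(u\) and zero driver the residual vanishes identically, which is a quick sanity check on the signs in the update.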
## Key Experimental Results

### Main Results (relative \(L^2\) error, one-step loss)
| PDE | PINNs | FS-PINNs | EM-BSDE | Heun-BSDE |
|---|---|---|---|---|
| 100D HJB | 0.1260 | 0.0737 | 0.3626 | 0.0493 |
| 100D BSB | 1.5066 | 0.0497 | 0.3735 | 0.0535 |
| 10D BZ | 3.8566 | 0.0351 | 0.1903 | 0.0228 |
### Ablation Study (varying step count \(k\))
| Steps \(k\) | EM-BSDE (100D HJB) | Heun-BSDE (100D HJB) |
|---|---|---|
| \(k=1\) | 0.3626 | 0.0493 |
| \(k=5\) | 0.2117 | 0.0640 |
| \(k=50\) | 0.0858 | 0.0601 |
### Key Findings
- The EM-BSDE bias is indeed the root cause of the performance gap: EM-BSDE is substantially outperformed by Heun-BSDE across all experiments.
- Heun-BSDE is competitive with PINN-based methods: it outperforms FS-PINNs on HJB and BZ, nearly matches them on BSB (0.0535 vs. 0.0497), and beats vanilla PINNs on all three benchmarks.
- EM-BSDE requires multiple steps to partially mitigate bias, but multi-step formulations introduce optimization difficulties; Heun-BSDE achieves strong results with a single step.
- Sub-sampling incurs negligible performance loss while substantially accelerating training.
## Highlights & Insights
- A neglected algorithmic detail determines overall method success: The choice of integration scheme (EM vs. Heun) has never been seriously studied in the BSDE literature, yet it is the root cause of the performance gap. This serves as a reminder that implementation details may matter more than methodological innovations.
- Theory-driven algorithmic improvement: The problem is identified and the solution designed through rigorous bias analysis rather than empirical trial-and-error, with theory and experiments in perfect agreement.
- The importance of Stratonovich vs. Itô in numerical implementation: Although the two formulations are equivalent in the continuous limit, the Stratonovich form is more amenable to numerical discretization.
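For reference, the scalar conversion between the two formulations (a standard identity, not specific to this paper) makes the equivalence precise:

\[
dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t
\quad\Longleftrightarrow\quad
dX_t = \Big(\mu(X_t) - \tfrac{1}{2}\,\sigma(X_t)\,\sigma'(X_t)\Big)\,dt + \sigma(X_t)\circ dW_t.
\]

The two calculi describe the same process with different drifts; when \(\sigma\) is state-dependent, a discretization scheme implicitly selects one of the two limits (EM selects Itô, Heun selects Stratonovich).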
## Limitations & Future Work
- The Heun method requires two function evaluations per step (vs. one for EM), roughly doubling per-step cost, though this is partly offset by the larger step sizes it permits.
- Experiments are conducted on only three standard PDE benchmarks and have not been validated on more complex real-world problems.
- All methods perform poorly on the 100D BZ problem (relative \(L^2\) error > 1.7), indicating that high-dimensional coupled FBSDEs remain highly challenging.
- Integration with adaptive step-size strategies is not discussed.
## Related Work & Insights
- vs. PINNs: PINNs directly minimize PDE residuals and do not suffer from integration bias, but require explicit knowledge of the PDE; BSDE methods can learn from simulation.
- vs. interpolation loss in [33]: [33] mitigates bias by tuning the optimal number of steps; Heun-BSDE eliminates bias fundamentally, requiring no such tuning.
## Rating
- Novelty: ⭐⭐⭐⭐ Identifies and explains a previously overlooked yet important issue with an elegant solution.
- Experimental Thoroughness: ⭐⭐⭐⭐ Theory and experiments are mutually corroborating, with thorough ablations.
- Writing Quality: ⭐⭐⭐⭐⭐ Mathematically rigorous and clearly presented.
- Value: ⭐⭐⭐⭐ Makes an important contribution to the BSDE-PDE community.