# One-Shot Transfer Learning for Nonlinear PDEs with Perturbative PINNs
- Conference: NeurIPS 2025
- arXiv: 2511.11137
- Code: None
- Area: Scientific Computing
- Keywords: PINNs, perturbation theory, transfer learning, partial differential equations, closed-form solution
## TL;DR
This work combines perturbation theory with PINNs to decompose a nonlinear PDE into a sequence of linear subproblems. A Multi-Head PINN learns the latent space of the shared linear operator, after which transfer to a new PDE instance reduces to a closed-form solve that takes under 0.2 seconds and attains errors on the order of \(10^{-3}\).
## Background & Motivation
- Background: PINNs solve PDEs by embedding physical laws into neural networks, but each new problem instance typically requires retraining from scratch.
- Limitations of Prior Work: Strategies such as Multi-Head PINNs enable shared computation across multiple instances, but their generalization to nonlinear PDEs remains limited, and cross-instance transfer still requires iterative optimization.
- Goal: Extend one-shot transfer learning for nonlinear problems from the ODE setting to PDEs.
- Core Idea: Treat the nonlinear term \(\epsilon P(u)\) as a perturbation, expand the solution in powers of \(\epsilon\) to obtain a sequence of linear subproblems, and exploit a shared latent-space representation to enable closed-form transfer.
## Method

### Overall Architecture
For a nonlinear PDE \(\mathcal{D}u + \epsilon P(u) = f(x,t)\), where \(\mathcal{D}\) is a linear operator, \(P(u)\) is a polynomial perturbation, and \(\epsilon < 1\), the solution is expanded as \(u \approx \sum_{i=0}^p \epsilon^i u_i\), yielding \(p+1\) linear subproblems \(\mathcal{D}u_j = f_j\), where each \(f_j\) depends only on \(u_0, \ldots, u_{j-1}\). A Multi-Head PINN solves these linear subproblems and learns a latent-space representation of the operator \(\mathcal{D}\); the network backbone is then frozen, and closed-form weights for new instances are computed directly.
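To make the decomposition concrete, the symbolic sketch below (a minimal illustration assuming a quadratic perturbation \(P(u) = u^2\) and truncation order \(p = 3\); the symbols \(u_0, \dots, u_p\) stand for the yet-unknown subproblem solutions) collects powers of \(\epsilon\) and prints the source terms \(f_j\) of the linear subproblems.

```python
# Minimal symbolic sketch of the perturbative decomposition, assuming
# P(u) = u^2 and truncation order p = 3 (both are illustrative choices).
import sympy as sp

eps = sp.Symbol('epsilon')
p = 3
u = sp.symbols(f'u0:{p + 1}')                       # placeholders for u_0, ..., u_p
u_series = sum(eps**i * u[i] for i in range(p + 1))

# The nonlinear term eps * P(u), expanded in powers of eps.
nonlinear = sp.expand(eps * u_series**2)

# Order eps^0 gives D u_0 = f (the original forcing). For j >= 1, the source
# term of D u_j = f_j is minus the eps^j coefficient of the nonlinear term,
# and it depends only on the lower-order solutions u_0, ..., u_{j-1}.
for j in range(1, p + 1):
    f_j = -nonlinear.coeff(eps, j)
    print(f"f_{j} =", sp.expand(f_j))
```

Running the sketch gives, for example, \(f_1 = -u_0^2\) and \(f_2 = -2u_0u_1\), so each right-hand side is available as soon as the lower-order solutions are known.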
### Key Designs
- Perturbative Expansion:
    - Function: Systematically reduces a nonlinear PDE to a sequence of linear subproblems.
    - Mechanism: The solution is expanded in powers of the perturbation parameter \(\epsilon\) as \(u = \sum_{i=0}^p \epsilon^i u_i\). Substituting into the original equation and collecting terms of equal order yields \(p+1\) linear PDEs of the form \(\mathcal{D}u_j = f_j\). The initial and boundary conditions are imposed entirely on the zeroth-order problem \(\mathcal{D}u_0 = f\); higher-order problems use homogeneous conditions.
    - Design Motivation: All linear subproblems share the same linear operator \(\mathcal{D}\), enabling reuse of its latent-space representation.
    - Theoretical Basis: When \(\epsilon < 1\) and the polynomial coefficients satisfy \(P_l \leq 1\), higher-order terms decay progressively and the series converges.
- Multi-Head PINN + Closed-Form Transfer:
    - Function: Learns a reusable latent space and enables zero-iteration transfer.
    - Mechanism: The network outputs \(u_k = H(x,t)W_k\), where \(H\) is the shared hidden-layer activation and \(W_k\) are head-specific weights. For a new instance \((f^*, g^*, B^*)\), \(H\) is frozen and the loss becomes a quadratic function of \(W^*\). Since every term is convex in \(W^*\), setting \(\partial\mathcal{L}/\partial W^* = 0\) yields the closed-form solution \(W^* = M^{-1}(\cdot)\); a minimal sketch is given after this list.
    - Key Advantage: The matrix \(M\) depends only on \(\mathcal{D}\) and the sampling strategy, not on the specific \((f^*, g^*, B^*)\); therefore \(M^{-1}\) can be precomputed and reused across all new instances.
    - Design Motivation: Eliminates retraining for each new instance, enabling sub-second adaptation.
- Iterative Solve and Combination:
    - Function: Constructs the final nonlinear solution.
    - Mechanism: \(u_0\) is obtained first via the MH-PINN or closed-form transfer, then substituted into the expression for \(f_1\) to solve for \(u_1\), and so on. The final solution is \(u = \sum_{i=0}^p \epsilon^i u_i\).
    - Design Motivation: Each step requires solving only one linear problem, keeping the computational cost tractable.
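Below is a minimal linear-algebra sketch of the closed-form transfer and its reuse across orders. It assumes the frozen backbone has already been evaluated: `Phi_pde` holds \(\mathcal{D}H\) at the collocation points, `Phi_ic` and `Phi_bc` hold \(H\) at the initial- and boundary-condition points, and `f_star`, `g_star`, `B_star` carry the data of the new instance. These names, and the small ridge term, are illustrative assumptions rather than the paper's notation.

```python
# Hedged sketch of closed-form head transfer with precomputed feature matrices
# for the frozen backbone H; all names here are illustrative placeholders.
import numpy as np

def precompute_M_inv(Phi_pde, Phi_ic, Phi_bc, w_pde=1.0, w_ic=1.0, w_bc=1.0, reg=1e-10):
    """M depends only on the operator and the sampling, so M^{-1} is reusable."""
    M = (w_pde * Phi_pde.T @ Phi_pde
         + w_ic * Phi_ic.T @ Phi_ic
         + w_bc * Phi_bc.T @ Phi_bc
         + reg * np.eye(Phi_pde.shape[1]))    # tiny ridge term for numerical stability
    return np.linalg.inv(M)

def transfer_head(M_inv, Phi_pde, Phi_ic, Phi_bc, f_star, g_star, B_star,
                  w_pde=1.0, w_ic=1.0, w_bc=1.0):
    """Closed-form head weights for a new instance (f*, g*, B*): W* = M^{-1} b*."""
    b = (w_pde * Phi_pde.T @ f_star
         + w_ic * Phi_ic.T @ g_star
         + w_bc * Phi_bc.T @ B_star)
    return M_inv @ b                          # no gradient updates, just a matrix product
```

Because \(M^{-1}\) is fixed, solving the whole perturbative sequence for a new instance amounts to calling `transfer_head` once per order \(j\), each time assembling \(f_j\) from the already-computed lower-order solutions, and then summing \(u = \sum_j \epsilon^j u_j\).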
### Loss & Training
The physics-informed loss of each head combines weighted PDE, initial-condition, and boundary-condition residuals: \(\mathcal{L}_k = w_{pde}(\mathcal{D}u_k - f_k)^2 + w_{IC}(u_k(x,0) - g_k(x))^2 + w_{BC}\sum_\mu (u_k(\mu,t) - B_{\mu,k}(t))^2\), and the total loss averages over the heads: \(\mathcal{L} = \frac{1}{K}\sum_{k=0}^K \mathcal{L}_k\).
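As a concrete illustration, the sketch below assembles one head's loss in PyTorch for the simple operator \(\mathcal{D}u = u_t - u_{xx}\); the operator choice, the `trunk` backbone, and all tensor names are assumptions made for the example, not the paper's exact setup.

```python
# Hedged sketch of one head's physics-informed loss, assuming the example
# operator D u = u_t - u_xx; `trunk` plays the role of the shared backbone H.
import torch

def head_loss(trunk, W_k, xt, f_k, xt0, g_k, xtb, B_k,
              w_pde=1.0, w_ic=1.0, w_bc=1.0):
    xt = xt.clone().requires_grad_(True)               # collocation points, columns (x, t)
    u = trunk(xt) @ W_k                                # u_k = H(x, t) W_k
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = du[:, 0:1], du[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    residual = u_t - u_xx - f_k                        # PDE residual D u_k - f_k
    loss_pde = (residual ** 2).mean()
    loss_ic = ((trunk(xt0) @ W_k - g_k) ** 2).mean()   # u_k(x, 0) = g_k(x)
    loss_bc = ((trunk(xtb) @ W_k - B_k) ** 2).mean()   # boundary values B_{mu,k}(t)
    return w_pde * loss_pde + w_ic * loss_ic + w_bc * loss_bc

# The total training loss averages the per-head losses as in the formula above.
```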
## Key Experimental Results

### Main Results
| PDE Type | Parameters | Relative Error | Transfer Time | Notes |
|---|---|---|---|---|
| KPP-Fisher | \(n_1=n_2=1\) | \(1.1 \times 10^{-3}\) | 0.149 s | \(p=10\), canonical parameters |
| KPP-Fisher | varying \(n_1,n_2\) | \(1.2 \times 10^{-3}\) | ~0.15 s | Different dynamics |
| KPP-Fisher | larger \(\epsilon\) | \(1.9 \times 10^{-2}\) | ~0.15 s | Near the boundary of validity |
| Wave equation | standard | comparable accuracy | ~0.15 s | Validates cross-operator capability |
### Ablation Study
| Configuration | Key Finding | Notes |
|---|---|---|
| \(p\) vs. error | Error decreases then plateaus as \(p\) increases | Fully converged at \(p=10\) |
| \(\epsilon\) vs. error | A clear threshold exists | Solution diverges beyond the threshold |
| Classical solver comparison | Comparable accuracy | SciPy solve_ivp: 0.162 s |
### Key Findings
- When \(\epsilon\) exceeds a threshold, the error grows sharply; the threshold correlates with solution amplitude—larger amplitudes require smaller \(\epsilon\).
- High-degree polynomial perturbations (degree > 10) require smaller values of \(\epsilon\) because the perturbation terms grow faster.
- Transfer speed (0.149 s) is comparable to classical numerical solvers (0.162 s), with the advantage lying in reusing precomputed quantities across instances.
- Different values of \(n_1\) and \(n_2\) affect the propagation speed toward the equilibrium state: increasing \(n_2\) accelerates propagation toward the equilibrium value of 1, while increasing \(n_1\) decelerates it.
## Highlights & Insights
- Closed-form transfer is the central contribution: once the latent space of the linear operator has been learned, adapting to a new instance requires only applying a precomputed matrix inverse, with no gradient updates. This is an important step in moving transfer learning from "few-shot fine-tuning" toward "zero-shot computation."
- The method's scope of applicability is well-defined: \(\epsilon\) must be sufficiently small, and failure cases are easy to identify (solution divergence), a self-diagnostic property of considerable value in scientific computing.
- The approach extends to transfer between related operators: the latent space remains reusable when coefficients \(a_\alpha(t)\) vary slightly.
## Limitations & Future Work
- Restricted to polynomial perturbation terms; nonlinear terms involving derivatives are not yet supported.
- Validated only on 2D PDEs (1D space + time); extension to higher dimensions is a future direction.
- The effective range of \(\epsilon\) is problem-dependent.
## Rating
- Novelty: ⭐⭐⭐⭐ The combination of perturbation theory and PINN-based transfer is original.
- Experimental Thoroughness: ⭐⭐⭐ Validation scenarios are limited.
- Writing Quality: ⭐⭐⭐⭐ Mathematical derivations are clear.
- Value: ⭐⭐⭐ Offers specific value within the scientific computing domain.