Supervised Metric Regularization Through Alternating Optimization for Multi-Regime PINNs

Metadata

  • Conference: ICLR 2026
  • arXiv: 2602.09980
  • Code: Not released
  • Area: Scientific Computing / Physics-Informed Neural Networks
  • Keywords: PINN, metric learning, alternating optimization, bifurcation systems, Duffing oscillator, topology-aware

TL;DR

This paper proposes a Topology-Aware PINN (TAPINN) that structures the latent space via supervised metric regularization (Triplet Loss) and stabilizes training through an alternating optimization schedule. On the multi-regime Duffing oscillator benchmark, TAPINN reduces physics residuals by approximately 49% (0.082 vs. 0.160) and gradient variance by 2.18× compared to baselines.

Background & Motivation

Physics-Informed Neural Networks (PINNs) have shown promise for solving parameterized dynamical systems, but face fundamental challenges in systems with sharp regime transitions (e.g., bifurcations):

Spectral Bias: Standard MLPs struggle to approximate discontinuous or non-smooth solution dependencies on system parameters.

Mode Collapse: Networks tend to average across distinct physical behaviors rather than discriminating among them.

Jacobian Singularity: The system Jacobian becomes singular at bifurcation points, leading to ill-conditioned optimization.

Limitations of existing approaches:

  • HyperPINNs: Hypernetwork-generated weights incur high parameter counts (39,169 vs. 8,003).
  • MoE: Routing instability.
  • Both introduce additional architectural complexity.

Core Idea: Rather than adopting more complex architectures, TAPINN structures the latent space via metric learning so that it mirrors the separation of physical regimes.

Method

Overall Architecture

TAPINN = LSTM Encoder \(E\) + PINN Generator \(G\)

  • Encoder: \(z = E(\mathbf{x}_{\text{obs}})\), maps an observation window (first 100 time steps) to a latent vector \(z\).
  • Generator: \(\hat{\mathbf{x}}(t) = G(t, z)\), a 4-layer MLP (32 hidden units, tanh activation).

Key distinction: TAPINN infers regime information solely from the observation window, without requiring the known parameter \(\lambda\) (unlike parameterized baselines and HyperPINN).
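The encoder/generator split can be sketched as follows. This is a minimal, dependency-free illustration, not the paper's implementation: the real encoder is an LSTM, whereas here a mean-pooled linear map stands in for it, and all dimensions other than the stated 4-layer/32-unit/tanh generator are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x_obs, W_enc):
    """Stand-in for the LSTM encoder E: maps a (100, d) observation
    window to a latent vector z. The paper uses an LSTM; a mean-pooled
    linear map keeps this sketch self-contained."""
    return np.tanh(x_obs.mean(axis=0) @ W_enc)

def generator(t, z, params):
    """PINN generator G(t, z): a 4-layer MLP with 32 hidden units
    and tanh activations, conditioned on the latent code z."""
    h = np.concatenate([[t], z])
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b  # predicted state x_hat(t)

d_obs, d_z, d_h = 2, 8, 32          # assumed dimensions
W_enc = rng.normal(scale=0.1, size=(d_obs, d_z))
sizes = [1 + d_z, d_h, d_h, d_h, 1]
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x_obs = rng.normal(size=(100, d_obs))  # first 100 observed time steps
z = encoder(x_obs, W_enc)
x_hat = generator(0.5, z, params)
print(z.shape, x_hat.shape)
```

Note that \(\lambda\) appears nowhere in the forward pass: regime information enters only through \(z\).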

Composite Loss Function

\[\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{data}} + \alpha \mathcal{L}_{\text{physics}} + \beta \mathcal{L}_{\text{metric}}\]
  • Data Loss \(\mathcal{L}_{\text{data}}\): Reconstruction error over the observation window.
  • Physics Loss \(\mathcal{L}_{\text{physics}} = \frac{1}{N_c}\sum\|\mathcal{N}[\hat{\mathbf{x}}(t_i);\lambda]\|^2_2\): ODE residual evaluated at \(N_c = 10^4\) collocation points.
  • Metric Loss \(\mathcal{L}_{\text{metric}} = \max(0, d(z_a, z_p) - d(z_a, z_n) + m)\): Triplet Loss with margin \(m = 0.2\).

Alternating Optimization (AO) Schedule

To mitigate gradient conflicts between the metric and physics objectives:

  1. Phase I (Metric Alignment, 5 epochs): Optimize the encoder only using the Triplet Loss to organize the latent space.
  2. Phase II (Physics Reconstruction, 20 epochs): Freeze the encoder and optimize the generator only.
  3. Alternating Joint Fine-tuning: Joint updates every \(k=5\) batches (~20% of steps) over \(\mathcal{L}_{\text{total}}\).

Intuition: Stabilize the latent manifold first (separating embeddings of different regimes), then train the solver conditioned on a stable \(z\).
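The three-phase schedule can be sketched as a generator of training steps. Epoch counts (5, 20) and \(k=5\) are from the paper; the number of fine-tuning batches is an assumption for illustration.

```python
def ao_schedule(n_metric=5, n_physics=20, k=5, n_joint_batches=50):
    """Yield (phase, step) pairs for the AO schedule: Phase I trains
    the encoder on the triplet loss, Phase II freezes the encoder and
    trains the generator on data + physics, then fine-tuning applies
    a joint update over L_total every k-th batch (~20% of steps)."""
    for e in range(n_metric):
        yield ("metric/encoder-only", e)
    for e in range(n_physics):
        yield ("physics/generator-only", e)
    for b in range(n_joint_batches):
        if b % k == 0:
            yield ("joint/total-loss", b)

steps = list(ao_schedule())
joint = [s for s in steps if s[0] == "joint/total-loss"]
print(len(joint), "joint updates out of 50 fine-tuning batches")
```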

Triplet Construction

The driving amplitude \(F_0\) serves as a proxy for regime similarity:

  • Anchor/Positive: Share the same \(F_0\).
  • Negative: Differs in \(F_0\).
  • Triplets are constructed within each batch using Euclidean distance, without hard or semi-hard mining.
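In-batch triplet construction with \(F_0\) as the regime label can be sketched as below; since the paper reports no hard or semi-hard mining, positives and negatives are drawn uniformly at random within the batch (the random-sampling details are assumptions).

```python
import numpy as np

def build_triplets(z, F0, seed=0):
    """For each anchor i, sample a positive sharing F0[i] and a
    negative with a different F0, uniformly within the batch
    (no hard/semi-hard mining)."""
    rng = np.random.default_rng(seed)
    triplets = []
    for i in range(len(z)):
        pos = [j for j in range(len(z)) if j != i and F0[j] == F0[i]]
        neg = [j for j in range(len(z)) if F0[j] != F0[i]]
        if pos and neg:
            triplets.append((i, rng.choice(pos), rng.choice(neg)))
    return triplets

z = np.zeros((4, 2))                 # latent vectors (contents unused here)
F0 = np.array([0.3, 0.3, 0.8, 0.8])  # driving amplitudes as regime labels
trips = build_triplets(z, F0)
for a, p, n in trips:
    assert F0[a] == F0[p] and F0[a] != F0[n]
print(len(trips), "triplets")
```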

Experiments

Test Problem: Duffing Oscillator

\[\ddot{x} + \delta\dot{x} + \alpha x + \beta x^3 = F_0 \cos(\omega t)\]

Standard parameters \(\delta=0.3, \alpha=-1, \beta=1, \omega=1\); \(F_0 \in [0.3, 0.8]\) spans periodic to chaotic regimes.
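A reference trajectory for this benchmark is easy to generate numerically. The sketch below integrates the Duffing ODE as a first-order system with classical RK4 using the stated standard parameters; the initial condition, horizon, and step count are assumptions, not the paper's data-generation setup.

```python
import numpy as np

# Duffing oscillator: x'' + delta*x' + alpha*x + beta*x^3 = F0*cos(w*t)
delta, alpha, beta, w = 0.3, -1.0, 1.0, 1.0

def rhs(t, s, F0):
    """First-order form: s = (x, x')."""
    x, v = s
    return np.array([v, F0 * np.cos(w * t) - delta * v - alpha * x - beta * x**3])

def rk4(F0, s0=(1.0, 0.0), t1=50.0, n=5000):
    """Classical fixed-step RK4 integration."""
    ts = np.linspace(0.0, t1, n + 1)
    h = ts[1] - ts[0]
    traj = [np.array(s0, dtype=float)]
    for t in ts[:-1]:
        s = traj[-1]
        k1 = rhs(t, s, F0)
        k2 = rhs(t + h / 2, s + h / 2 * k1, F0)
        k3 = rhs(t + h / 2, s + h / 2 * k2, F0)
        k4 = rhs(t + h, s + h * k3, F0)
        traj.append(s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return ts, np.array(traj)

ts, traj = rk4(F0=0.3)   # low driving amplitude: periodic regime
print(traj.shape)
```

Sweeping `F0` across \([0.3, 0.8]\) with this integrator reproduces the periodic-to-chaotic transition the benchmark exploits.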

Main Results

| Method | Physics Res. ↓ | # Params | Data MSE ↓ |
|---|---|---|---|
| Parametric Baseline | 0.160 | 8,577 | 0.392 |
| Multi-Output (Sobolev) | 0.192 | 8,069 | 0.426 |
| HyperPINN | 0.158 | 39,169 | 0.281 |
| TAPINN (Ours) | 0.082 | 8,003 | 0.425 |

Key Findings

  1. Lowest Physics Residual: TAPINN achieves a physics residual of 0.082, 49% lower than the parametric baseline (0.160).
  2. Parameter Efficiency: Only 8,003 parameters vs. HyperPINN's 39,169 (~5× fewer).
  3. HyperPINN Overfitting: HyperPINN achieves the lowest Data MSE (0.281) but a high physics residual (0.158), indicating memorization of data at the expense of physical consistency.
  4. Training Stability: Gradient norm mean is 2.14× lower and variance is 2.18× lower compared to the multi-output baseline.
  5. Latent Space Structure: t-SNE visualization reveals well-separated clusters for distinct regimes; linear probe regression of \(F_0\) achieves MSE of only \(3.5 \times 10^{-4}\).
  6. Necessity of AO: Joint training without AO yields physics residuals of ~0.158, comparable to standard baselines, demonstrating that metric regularization alone is insufficient.

Highlights & Insights

  • Elegant formulation: metric learning is used to structure the latent space for regime transitions rather than increasing architectural complexity.
  • The AO schedule is well-motivated and effectively resolves gradient conflicts between metric and physics objectives.
  • TAPINN achieves the best physics residual with only 1/5 the parameters of HyperPINN.
  • Reveals a "memorization pathology" in HyperPINN: data fitting at the cost of physical law violation.

Limitations & Future Work

  • Validation is limited to the Duffing oscillator (1D ODE); no experiments on PDE systems or higher-dimensional problems.
  • Lacks statistical validation across multiple random seeds.
  • Sensitivity to observation window length is not analyzed.
  • Hyperparameters \(\alpha\) and \(\beta\) are selected via grid search without an adaptive strategy.
  • No comparison against domain decomposition methods (XPINNs) or operator learning frameworks (Fourier Neural Operator).
  • Although physics residuals are low, the Data MSE (0.425) is higher than HyperPINN's (0.281), so trajectory reconstruction accuracy remains to be verified.
Related Work

  • Parameterized PINNs: Standard approaches that take \(\lambda\) as input directly fail near bifurcations.
  • HyperPINNs: Almeida et al. — weight generation for regime transitions, at the cost of high parameter counts.
  • MoE-PINN: Bischof & Kraus — mixture-of-experts routing, subject to routing instability.
  • PINN Optimization Pathologies: Krishnapriyan et al. — characterizes failure modes of PINNs.
  • Gradient Pathology Mitigation: Wang et al. — gradient flow ill-conditioning in PINNs.

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of metric learning and PINNs is novel.
  • Technical Depth: ⭐⭐⭐⭐ — Method design is principled with sufficient ablation evidence.
  • Experimental Thoroughness: ⭐⭐⭐ — Only one test problem (Duffing oscillator); limited scale.
  • Value: ⭐⭐⭐⭐ — Provides a lightweight solution for multi-regime PINNs.