
Thompson Sampling in Function Spaces via Neural Operators

Conference: NeurIPS 2025 arXiv: 2506.21894 Code: None Area: Reinforcement Learning / Bayesian Optimization Keywords: Thompson Sampling, Function Space, Neural Operators, Bayesian Optimization, PDE

TL;DR

This paper extends Thompson Sampling (TS) from finite-dimensional parameter spaces to infinite-dimensional function spaces, leveraging neural operators as approximate samplers of Gaussian process posteriors to efficiently solve functional optimization problems involving partial differential equations (PDEs).

Background & Motivation

Functional Optimization Problems

Many scientific and engineering problems can be formulated as optimizing a known functional \(J\) applied to the output of an unknown operator \(\mathcal{G}\). Examples include:

  • Designing initial temperature distributions that optimize heat conduction
  • Finding boundary conditions that minimize fluid drag
  • Optimizing control functions that maximize chemical reaction yield

These problems share several characteristics:

  • High query cost: evaluating the operator \(\mathcal{G}\) (e.g., running high-fidelity simulators or physical experiments) is extremely expensive
  • Cheap functional evaluation: once the output of \(\mathcal{G}\) is known, computing \(J\) is inexpensive
  • Infinite-dimensional inputs and outputs: the optimization space itself is infinite-dimensional

Limitations of Thompson Sampling

Traditional Thompson Sampling has been widely and successfully applied in finite-dimensional parameter spaces, but extending it to function spaces poses fundamental challenges:

  • Defining and sampling from posterior distributions over infinite-dimensional spaces is non-trivial
  • Gaussian process (GP) inference in function spaces incurs prohibitive computational costs
  • No established theoretical connection existed between neural operators and GPs before this work

Method

Overall Architecture

The algorithm adopts a sample-then-optimize strategy:

Round t:
1. Train neural operator f_t ≈ sample from GP posterior p(G|D_{1:t})
2. Select the next query function u_{t+1} by optimizing J(f_t(·)) over the sampled operator f_t
3. Query the true operator: v_{t+1} = G(u_{t+1})
4. Update dataset: D_{1:t+1} = D_{1:t} ∪ {(u_{t+1}, v_{t+1})}
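The loop above can be sketched end-to-end on a toy problem. Everything here is a hypothetical stand-in, not the paper's implementation: the "unknown operator" is a fixed matrix acting on a discretized input function, and the "neural operator" is a ridge-regression surrogate pulled toward a fresh random initialization each round, mimicking the random-init stochasticity the method relies on:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16

# Hypothetical stand-ins: the unknown operator G acts on a discretized
# input function u (a length-n vector); J scores its output cheaply.
A_true = rng.normal(size=(n, n)) / np.sqrt(n)
G = lambda u: A_true @ u                    # expensive black-box operator
J = lambda v: -np.sum((v - 1.0) ** 2)       # known, cheap functional

def train_sampled_operator(U, V, lam=1e-2):
    """Ridge fit pulled toward a fresh random init Theta0 -- a crude
    linear proxy for 'train a randomly initialized neural operator'."""
    Theta0 = rng.normal(size=(n, n)) / np.sqrt(n)   # new init every round
    if not U:
        return Theta0
    Um, Vm = np.array(U).T, np.array(V).T           # shapes (n, t)
    # argmin_Theta ||Theta Um - Vm||^2 + lam ||Theta - Theta0||^2
    return (Vm @ Um.T + lam * Theta0) @ np.linalg.inv(Um @ Um.T + lam * np.eye(n))

U, V = [], []
for t in range(20):
    Theta_t = train_sampled_operator(U, V)          # ~ posterior sample
    u = rng.normal(size=n)                          # inner problem: maximize
    for _ in range(200):                            # J(Theta_t u) over ||u|| <= 1
        u += 0.05 * (-2.0 * Theta_t.T @ (Theta_t @ u - 1.0))
        u /= max(np.linalg.norm(u), 1.0)
    U.append(u)
    V.append(G(u))                                  # query the true operator

best = max(J(v) for v in V)
```

Swapping the ridge surrogate for a real DeepONet/FNO trained with the regularized loss from the "Loss & Training" section recovers the actual algorithm.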

Key Designs

Neural Operators as GP Posterior Samplers

The core contribution is treating a trained neural operator as an approximate sample from the GP posterior:

  • Theoretical connection: Under appropriate prior distributions and training procedures, a neural operator with random initialization trained to a specified accuracy can be interpreted as an approximate sample from the GP posterior
  • Avoiding explicit uncertainty quantification: There is no need to maintain a full GP posterior (which is computationally prohibitive); training the neural operator suffices
  • Source of stochasticity: Different random initializations are used in each round to introduce the necessary exploratory behavior

Theoretical Framework in the Infinite-Dimensional Setting

The paper establishes the following theoretical results:

Proposition (Neural Operator–GP Connection): Let \(\mathcal{G}: \mathcal{U} \to \mathcal{V}\) be an unknown operator and \(f_\theta\) a neural operator with parameters \(\theta\). Under specific regularity conditions and prior distributions, the trained \(f_\theta\) approximately follows the marginal distribution of the GP posterior.

Theorem (Regret Bound): The cumulative regret of the proposed algorithm grows sublinearly: \[R_T = \sum_{t=1}^T \bigl[J(\mathcal{G}(u^*)) - J(\mathcal{G}(u_t))\bigr] \leq \tilde{O}\bigl(\sqrt{T \gamma_T}\bigr)\] where \(u^*\) is the maximizer of \(J(\mathcal{G}(\cdot))\) and \(\gamma_T\) denotes the maximum information gain.
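For context, in the GP-bandit literature \(\gamma_T\) is typically defined as the largest mutual information that \(T\) observations can carry about the unknown function; the following is that standard scalar-GP definition, which may differ in detail from the operator-valued version used in the paper:

```latex
\gamma_T \;=\; \max_{\{u_1,\dots,u_T\} \subset \mathcal{U}} I\bigl(v_{1:T};\, \mathcal{G}\bigr)
        \;=\; \max_{\{u_1,\dots,u_T\}} \tfrac{1}{2}\log\det\bigl(I + \sigma^{-2} K_T\bigr)
```

where \(K_T\) is the kernel (Gram) matrix of the queried inputs and \(\sigma^2\) the observation-noise variance.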

Functional Optimization Sub-step

Given the sampled neural operator \(f_t\), the inner optimization problem is \[u_{t+1} = \arg\max_{u \in \mathcal{U}} J(f_t(u))\] Since \(f_t\) is a differentiable neural network, this can be solved directly via gradient-based optimization.
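A minimal sketch of this sub-step, with hypothetical stand-ins for \(f_t\) (a fixed smoothing kernel on a discretized function) and \(J\) (negative squared distance to a target profile); central finite differences stand in for the autodiff gradient a real neural operator would provide:

```python
import numpy as np

def grad_ascent(objective, u0, lr=0.1, iters=300, eps=1e-5):
    """Maximize objective(u) by gradient ascent, using central-difference
    gradients as a stand-in for autodiff through a neural operator."""
    u = u0.astype(float).copy()
    for _ in range(iters):
        g = np.zeros_like(u)
        for i in range(u.size):
            e = np.zeros_like(u)
            e[i] = eps
            g[i] = (objective(u + e) - objective(u - e)) / (2 * eps)
        u += lr * g
    return u

# Hypothetical sampled operator f_t and functional J: f_t smooths the
# input function; J rewards matching a target output profile.
n = 8
K = np.exp(-0.5 * np.subtract.outer(np.arange(n), np.arange(n)) ** 2)
K /= K.sum(axis=1, keepdims=True)           # row-stochastic smoothing kernel
f_t = lambda u: K @ u
target = np.sin(np.linspace(0.0, np.pi, n))
J = lambda v: -np.sum((v - target) ** 2)

u_next = grad_ascent(lambda u: J(f_t(u)), np.zeros(n))
```

In practice one would differentiate through the trained DeepONet/FNO with a framework's autodiff rather than finite differences.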

Loss & Training

  • Neural operator training objective: minimize prediction error on the observed data \[\mathcal{L}(\theta) = \sum_{i=1}^{t} \| f_\theta(u_i) - v_i \|_{\mathcal{V}}^2 + \lambda \|\theta - \theta_0\|^2\] where the second term is a regularizer and \(\theta_0\) denotes the randomly initialized parameters
  • Neural operator architecture: DeepONet or Fourier Neural Operator (FNO)
  • Retraining each round: Ensures independence across successive samples
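The objective above can be written down directly. This sketch uses a linear toy model \(f_\theta(u) = \theta u\) (a hypothetical stand-in for a DeepONet/FNO) and plain gradient descent, just to make the data-fit term and the pull toward the random init \(\theta_0\) concrete:

```python
import numpy as np

def loss_and_grad(theta, theta0, U, V, lam):
    """L(theta) = sum_i ||f_theta(u_i) - v_i||^2 + lam * ||theta - theta0||^2
    for the toy linear operator f_theta(u) = theta @ u."""
    resid = theta @ U - V                           # residuals, one column per pair
    loss = np.sum(resid ** 2) + lam * np.sum((theta - theta0) ** 2)
    grad = 2.0 * resid @ U.T + 2.0 * lam * (theta - theta0)
    return loss, grad

rng = np.random.default_rng(1)
n, t, lam = 6, 4, 0.1
A = rng.normal(size=(n, n))                         # ground-truth operator
U = rng.normal(size=(n, t))
V = A @ U                                           # observed (u_i, v_i) pairs
theta0 = rng.normal(size=(n, n))                    # random init: the stochasticity

theta = theta0.copy()
for _ in range(2000):
    loss, grad = loss_and_grad(theta, theta0, U, V, lam)
    theta -= 1e-3 * grad                            # gradient descent on L(theta)
```

Because each round starts from a fresh \(\theta_0\) and the regularizer anchors the solution to it, retraining yields a different (approximately posterior-distributed) operator every round.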

Key Experimental Results

Main Results

Comparison with Bayesian optimization baselines on PDE functional optimization tasks:

| Method | Darcy Flow (obj. ↑) | Heat Conduction (obj. ↑) | Burgers' Eq. (obj. ↑) | Query Budget |
| --- | --- | --- | --- | --- |
| Random Search | 0.32 | 0.28 | 0.25 | 100 |
| GP-UCB (finite-dim) | 0.58 | 0.53 | 0.47 | 100 |
| EI (finite-dim) | 0.61 | 0.55 | 0.49 | 100 |
| Functional GP-TS | 0.67 | 0.62 | 0.54 | 100 |
| Ours (NO-TS) | 0.78 | 0.73 | 0.65 | 100 |

Sample efficiency comparison (number of queries to reach objective value 0.6):

| Method | Darcy Flow | Heat Conduction | Burgers' Eq. | Avg. Speedup |
| --- | --- | --- | --- | --- |
| GP-UCB | 78 | 85 | 92 | 1.0× |
| EI | 72 | 80 | 88 | 1.1× |
| Functional GP-TS | 55 | 63 | 70 | 1.4× |
| Ours (NO-TS) | 35 | 42 | 48 | 2.1× |
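The "Avg. Speedup" column is consistent with averaging per-task query ratios relative to GP-UCB (an assumption about how the averages were computed; the query counts are taken from the table above):

```python
# Queries to reach objective 0.6, from the table (Darcy, Heat, Burgers).
queries = {
    "GP-UCB":           [78, 85, 92],
    "EI":               [72, 80, 88],
    "Functional GP-TS": [55, 63, 70],
    "Ours (NO-TS)":     [35, 42, 48],
}
base = queries["GP-UCB"]
# Per-task ratio (baseline queries / method queries), averaged over tasks.
speedup = {m: sum(b / q for b, q in zip(base, qs)) / len(qs)
           for m, qs in queries.items()}
# Rounded to one decimal this recovers 1.0x, 1.1x, 1.4x, and 2.1x.
```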

Ablation Study

| Component | Darcy Flow (obj.) | Obj. Change vs. Full |
| --- | --- | --- |
| Full NO-TS | 0.78 | baseline |
| Fixed initialization (no stochasticity) | 0.52 | −33% |
| No regularization | 0.71 | −9% |
| DeepONet → FNO | 0.76 | −3% |
| Reduced training epochs | 0.68 | −13% |
| Increased regularization \(\lambda\) | 0.74 | −5% |

Key Findings

  1. Substantial improvement in sample efficiency: The proposed method reduces the number of required queries by approximately 50% compared to conventional Bayesian optimization approaches
  2. Random initialization is critical: Fixing the initialization eliminates exploratory behavior and leads to significant performance degradation
  3. Dual role of regularization: Regularization simultaneously controls overfitting and maintains the approximation quality between the neural operator and GP posterior sampling
  4. Architectural robustness: DeepONet and FNO yield comparable performance, indicating that the method is insensitive to the choice of neural operator architecture

Highlights & Insights

  1. Deep integration of theory and practice: The paper establishes a rigorous mathematical connection between neural operators and GPs in the infinite-dimensional setting, and derives formal regret bounds
  2. Avoiding explicit uncertainty quantification: Training stochasticity is cleverly leveraged as a surrogate for costly posterior inference
  3. Broad applicability: The framework applies to any operator optimization problem amenable to neural operator modeling
  4. An important bridge for scientific computing: Modern Bayesian optimization methods are brought to bear on PDE-constrained optimization
  5. Sublinear regret guarantee: Theoretical convergence guarantees are provided

Limitations & Future Work

  1. Full retraining each round: The computational overhead is non-trivial; incremental training strategies could be explored
  2. Prior assumptions: Theoretical results rely on specific prior distributions and regularity conditions that may not be fully satisfied in practice
  3. High-dimensional function spaces: When the input function space has very high dimensionality, the inner optimization sub-step may become intractable
  4. Lack of adaptive exploration–exploitation balance: Unlike UCB-type methods, the proposed approach does not offer explicit control over exploration intensity
  5. Limited experimental scale: Validation is conducted on only a few PDE benchmarks; performance on larger-scale problems remains to be examined

Related Work & Notes

  • Neural Operators: DeepONet, FNO, and related architectures provide practical tools for learning in function spaces
  • Bayesian Optimization: Classical methods such as GP-UCB and EI perform well in finite dimensions but face significant challenges when generalized to function spaces
  • Thompson Sampling Theory: Theoretical analyses by Russo & Van Roy and others provide the foundation for the regret bounds derived in this work
  • Future Directions: Extending this framework to multi-objective or constrained optimization settings

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First work to combine TS with neural operators for function-space optimization
  • Theoretical Depth: ⭐⭐⭐⭐⭐ — Establishes a rigorous infinite-dimensional theoretical framework with regret bounds
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Validated across multiple PDE tasks
  • Practical Impact: ⭐⭐⭐⭐ — Significant potential value for scientific computing and engineering optimization
  • Writing Quality: ⭐⭐⭐⭐ — Theoretical sections are somewhat dense, but overall presentation is clear