Parametric Pareto Set Learning for Expensive Multi-Objective Optimization¶
Conference: AAAI 2026 arXiv: 2511.05815 Code: None Area: Model Compression Keywords: Pareto Set Learning, Multi-Objective Bayesian Optimization, Hypernetwork, LoRA, Parametric Multi-Objective Optimization
TL;DR¶
This paper proposes the PPSL-MOBO framework, which employs a hypernetwork + LoRA architecture to learn a unified mapping from preference vectors and extrinsic parameters to Pareto-optimal solutions. Combined with Gaussian process surrogate models and hypervolume improvement acquisition strategies, the framework efficiently addresses expensive parametric multi-objective optimization problems.
Background & Motivation¶
State of the Field¶
Pareto Set Learning (PSL) has achieved significant progress in recent years, enabling the learning of continuous mappings from preference vectors to Pareto-optimal solutions. Multi-Objective Bayesian Optimization (MOBO) leverages surrogate models and acquisition functions to efficiently tackle optimization problems with expensive evaluations.
Limitations of Prior Work¶
Parametric Multi-Objective Optimization (PMO) is largely overlooked: Existing PSL methods are limited to fixed problem instances and cannot handle scenarios where objective functions vary with external parameters.
Traditional methods are inefficient: Each new parameter value requires re-optimization from scratch, which is prohibitively costly under expensive evaluation budgets.
Lack of real-time adaptability: When parameters (e.g., operating conditions, patient characteristics, time) change, no mechanism exists for instant Pareto set inference.
Root Cause¶
PMO problems require learning a Pareto set across the entire parameter space under limited evaluation budgets, while demanding generalization to unseen parameter values.
Paper Goals¶
Design a unified framework that, after a single training phase, can instantly infer the complete Pareto set for arbitrary parameter values, substantially reducing the number of expensive evaluations.
Starting Point¶
PMO is formulated as learning the mapping \((\boldsymbol{\lambda}, \boldsymbol{t}) \mapsto \boldsymbol{x}^\star\), where \(\boldsymbol{\lambda}\) is the preference vector and \(\boldsymbol{t}\) is the extrinsic parameter. A hypernetwork is used to generate parameter-specific PS models, integrated with Bayesian optimization for efficient data acquisition.
Core Idea¶
Leveraging hypernetworks + LoRA to learn shared Pareto set structure across the parameter space, transforming "per-instance independent optimization" into "unified learning + instant inference".
Method¶
Overall Architecture¶
PPSL-MOBO consists of three tightly coupled components:
1. Hypernetwork-LoRA Architecture: Generates parameter-specific PS models.
2. Gaussian Process Surrogate Training: Employs GP surrogate models for scalable optimization.
3. Intelligent Data Acquisition: Parameter-space exploration based on hypervolume improvement.
The system forms a closed loop in which newly acquired data continuously refines both the surrogate models and the parametric PS representation.
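As a rough illustration of this closed loop, the overall procedure can be sketched as follows (a minimal Python sketch; all helper names such as `initial_design`, `train_ps_model`, and `acquire_by_hvi` are illustrative placeholders, not the authors' implementation):

```python
# Hypothetical high-level sketch of the PPSL-MOBO loop; helper functions are
# illustrative placeholders standing in for the three components listed above.
def ppsl_mobo_loop(problem, n_init, n_rounds, batch_size):
    X, T, Y = initial_design(problem, n_init)            # initial expensive evaluations
    ps_model = build_hypernetwork_lora_model(problem)    # parametric PS model (Design 1)
    for _ in range(n_rounds):
        gps = fit_surrogates(X, T, Y)                    # GP surrogates on z = [x, t] (Design 2)
        train_ps_model(ps_model, gps)                    # surrogate STCH training (Design 3)
        X_new, T_new = acquire_by_hvi(ps_model, gps, Y, batch_size)  # HVI acquisition (Design 4)
        Y_new = problem.evaluate(X_new, T_new)           # expensive evaluations of the new batch
        X, T, Y = append_data(X, T, Y, X_new, T_new, Y_new)
    return ps_model                                      # instant PS inference for any (lambda, t)
```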
Key Design 1: Hypernetwork + LoRA Architecture¶
Function: Efficiently adapts PS models to different parameter values.
Mechanism: For each layer \(l\) of the PS model, the weights are decomposed into shared base weights and a low-rank adaptation:
$$\boldsymbol{\theta}_{\text{ps}}^l(\boldsymbol{t}) = \boldsymbol{\theta}_0^l + \boldsymbol{B}^l(\boldsymbol{t}) \boldsymbol{A}^l(\boldsymbol{t})$$
where \(\boldsymbol{B}^l(\boldsymbol{t}) \in \mathbb{R}^{d^l \times r}\), \(\boldsymbol{A}^l(\boldsymbol{t}) \in \mathbb{R}^{r \times k^l}\), and rank \(r \ll \min(d^l, k^l)\). The hypernetwork generates only the low-rank matrices:
$$\boldsymbol{\theta}_{\text{lora}}(\boldsymbol{t}) = g_{\boldsymbol{\theta}_{\text{hn}}}(\boldsymbol{t})$$
Design Motivation: Directly generating full weights via a hypernetwork suffers from severe dimensional mismatch (low-dimensional parameters \(\to\) high-dimensional weights), making training difficult and prone to overfitting. LoRA provides a suitable inductive bias: Pareto sets across different parameters share most of their structure, with differences confined to a low-rank subspace. Parameter count is reduced from \(d^l k^l\) to \(r(d^l + k^l)\).
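To make the decomposition concrete, here is a minimal PyTorch sketch of one PS-model layer whose LoRA factors are generated by a hypernetwork; the layer sizes, rank, and MLP hypernetwork architecture are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class LoRAHyperLayer(nn.Module):
    """One PS-model layer with hypernetwork-generated LoRA adaptation.
    Sizes (d, k), rank r, and the MLP hypernetwork are illustrative choices."""

    def __init__(self, t_dim: int, d: int, k: int, r: int, hidden: int = 64):
        super().__init__()
        self.d, self.k, self.r = d, k, r
        # Shared base weight theta_0^l, trained jointly across all parameter values t.
        self.theta_0 = nn.Parameter(0.01 * torch.randn(d, k))
        # Hypernetwork g_{theta_hn}(t): emits only r * (d + k) numbers for this layer.
        self.hn = nn.Sequential(
            nn.Linear(t_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, r * (d + k)),
        )

    def weight(self, t: torch.Tensor) -> torch.Tensor:
        """theta_ps^l(t) = theta_0^l + B^l(t) @ A^l(t) with rank-r factors."""
        out = self.hn(t)
        B = out[: self.d * self.r].view(self.d, self.r)   # B^l(t): (d, r)
        A = out[self.d * self.r:].view(self.r, self.k)    # A^l(t): (r, k)
        return self.theta_0 + B @ A
```

For a sense of scale, with illustrative sizes \(d^l = k^l = 256\) and \(r = 4\), the hypernetwork emits \(4 \times 512 = 2{,}048\) values for the layer instead of the \(65{,}536\) entries of a full weight matrix.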
Key Design 2: Augmented-Space Gaussian Process Surrogate¶
Function: Constructs a parameter-aware surrogate model to replace expensive objective function evaluations.
Mechanism: An augmented input space \(\mathcal{Z} = \mathcal{X} \times \mathcal{T}\) is defined, with \(\boldsymbol{z} = [\boldsymbol{x}, \boldsymbol{t}]\). An independent GP is established for each objective:
$$f_i(\boldsymbol{z}) \sim \mathcal{GP}(\mu_i(\boldsymbol{z}), k_i(\boldsymbol{z}, \boldsymbol{z}'))$$
The lower confidence bound (LCB) serves as the surrogate objective:
$$\hat{\boldsymbol{f}}(\boldsymbol{x}; \boldsymbol{t}) = \hat{\boldsymbol{\mu}}(\boldsymbol{z}) - \beta \hat{\boldsymbol{\sigma}}(\boldsymbol{z})$$
Design Motivation: The input augmentation strategy allows the kernel \(k_i\) to automatically learn the influence of parameters \(\boldsymbol{t}\) on the objective functions and the interaction effects between \(\boldsymbol{x}\) and \(\boldsymbol{t}\). LCB provides a natural exploration–exploitation balance.
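A minimal sketch of the augmented-space surrogate using scikit-learn GPs is shown below; the data shapes, the Matérn kernel, and the value of \(\beta\) are illustrative assumptions rather than the paper's settings:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def fit_surrogates(X, T, Y):
    """Fit one independent GP per objective on the augmented inputs z = [x, t].
    X: (n, d_x) decisions, T: (n, d_t) parameters, Y: (n, m) objective values."""
    Z = np.hstack([X, T])                        # augmented inputs z = [x, t]
    return [GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
            .fit(Z, Y[:, i]) for i in range(Y.shape[1])]

def lcb(gps, X, T, beta=1.0):
    """Per-objective lower confidence bound mu(z) - beta * sigma(z)."""
    Z = np.hstack([X, T])
    mus, sigmas = zip(*[gp.predict(Z, return_std=True) for gp in gps])
    return np.stack(mus, axis=1) - beta * np.stack(sigmas, axis=1)
```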
Key Design 3: Smooth Tchebycheff-Based Surrogate Training¶
Function: End-to-end training of the hypernetwork and base weights using GP surrogate models.
Mechanism: The expected surrogate STCH loss is minimized:
$$\hat{\mathcal{L}}(\boldsymbol{\theta}_{\text{hn}}, \boldsymbol{\theta}_0) = \mathbb{E}_{\boldsymbol{t} \sim P_{\boldsymbol{t}},\, \boldsymbol{\lambda} \sim P_{\boldsymbol{\lambda}}} \left[ \hat{l}_{\text{stch}}\big(h_{\boldsymbol{\theta}_{\text{ps}}(\boldsymbol{t})}(\boldsymbol{\lambda}) \mid \boldsymbol{\lambda}, \boldsymbol{t}\big) \right]$$
where STCH is a smooth approximation of the Tchebycheff scalarization:
$$l_{\text{stch}}(\boldsymbol{x} \mid \boldsymbol{\lambda}, \nu) = \nu \log\left(\sum_{j=1}^m e^{\lambda_j(f_j(\boldsymbol{x}) - (z_j^\star - \varepsilon))/\nu}\right)$$
Design Motivation: The classical Tchebycheff scalarization can recover all (weakly) Pareto-optimal solutions (Theorem 1), but the max operator is non-differentiable. The smooth approximation STCH preserves the complete PS recovery guarantee (Theorem 2) while supporting efficient gradient backpropagation.
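A minimal differentiable implementation of the smooth Tchebycheff scalarization in PyTorch is sketched below; the smoothing constant \(\nu\) and shift \(\varepsilon\) are illustrative default values:

```python
import torch

def stch_loss(f, lam, z_star, nu=0.1, eps=1e-3):
    """Smooth Tchebycheff scalarization, following the formula above:
    nu * log( sum_j exp( lambda_j * (f_j - (z_j* - eps)) / nu ) ).
    f: (batch, m) objective values (e.g., LCB surrogate outputs),
    lam: (batch, m) preference vectors, z_star: (m,) ideal-point estimate."""
    shifted = lam * (f - (z_star - eps))                # lambda_j * (f_j - (z_j* - eps))
    return nu * torch.logsumexp(shifted / nu, dim=-1)   # smooth max over objectives
```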
Key Design 4: Hypervolume Improvement Data Acquisition¶
Function: Intelligently selects new evaluation points in the parameter space.
Mechanism:
1. A candidate pool \(\mathcal{C} = \{(\boldsymbol{x}_p, \boldsymbol{t}_p)\}\) is generated from the trained model.
2. Points that maximize the marginal hypervolume improvement are greedily selected for the evaluation batch:
$$\text{HVI}(\hat{\mathcal{Y}}_+, \mathcal{Y}) = \text{HV}(\hat{\mathcal{Y}}_+ \cup \mathcal{Y}) - \text{HV}(\mathcal{Y})$$
Design Motivation: Hypervolume is the only standard quality indicator in multi-objective optimization that simultaneously measures both convergence and diversity. Generating candidates from the PS model ensures that sampled points lie in promising regions of the parameter–decision space.
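A minimal sketch of the greedy batch selection using pymoo's hypervolume indicator (assuming minimization objectives; the candidate pool, reference point, and batch size are assumed inputs, and this is not the authors' implementation):

```python
import numpy as np
from pymoo.indicators.hv import HV   # assumes pymoo is available

def greedy_hvi_batch(cand_F, archive_F, ref_point, batch_size):
    """Greedily pick candidates with the largest marginal hypervolume improvement
    HVI = HV(Y ∪ {y}) - HV(Y).
    cand_F: (n, m) predicted objectives of the candidate pool,
    archive_F: (N, m) objectives of already-evaluated points."""
    hv = HV(ref_point=ref_point)
    selected, Y = [], archive_F.copy()
    remaining = list(range(len(cand_F)))
    for _ in range(batch_size):
        base = hv(Y)
        gains = [hv(np.vstack([Y, cand_F[i:i + 1]])) - base for i in remaining]
        best = remaining.pop(int(np.argmax(gains)))
        selected.append(best)
        Y = np.vstack([Y, cand_F[best:best + 1]])   # commit the chosen point
    return selected
```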
Loss & Training¶
The global training objective is to minimize the expected surrogate STCH loss (Equation 21). The expectation is approximated via Monte Carlo sampling, and gradients are backpropagated to update \(\boldsymbol{\theta}_0\) and \(\boldsymbol{\theta}_{\text{hn}}\).
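A rough sketch of one Monte Carlo training step under these definitions follows; the sampling helpers and the differentiable LCB surrogate `lcb_torch` are hypothetical stand-ins, and the ideal point is crudely estimated from the current batch rather than maintained as in the paper:

```python
def training_step(ps_model, lcb_torch, optimizer, n_mc=64):
    """One Monte Carlo step of the expected surrogate STCH loss: sample (t, lambda),
    predict solutions, score them with the GP-based LCB, and backpropagate."""
    t = sample_parameters(n_mc)                   # t ~ P_t          (hypothetical helper)
    lam = sample_preferences(n_mc)                # lambda ~ P_lambda (hypothetical helper)
    x = ps_model(lam, t)                          # h_{theta_ps(t)}(lambda)
    f_hat = lcb_torch(x, t)                       # (n_mc, m) differentiable surrogate objectives
    z_star = f_hat.detach().min(dim=0).values     # crude ideal-point estimate
    loss = stch_loss(f_hat, lam, z_star).mean()
    optimizer.zero_grad()
    loss.backward()                               # gradients flow into theta_0 and theta_hn
    optimizer.step()
    return loss.item()
```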
Key Experimental Results¶
Main Results: Multi-Objective Optimization with Shared Components¶
Hypervolume comparison on the RE21 problem (four decision variables) under different sharing configurations:
| Shared Variables | NSGA-II | qParEGO | qEHVI | PSL-MOBO | PPSL-MOBO |
|---|---|---|---|---|---|
| \((x_1)\) | 6.52e-1 | 6.96e-1 | 7.32e-1 | 7.34e-1 | 7.33e-1 |
| \((x_1,x_2)\) | 5.92e-1 | 6.19e-1 | 6.23e-1 | 6.23e-1 | 6.23e-1 |
| \((x_2,x_3,x_4)\) | 5.11e-1 | 5.15e-1 | 5.18e-1 | 5.18e-1 | 5.19e-1 |
Key observation: Baseline methods require 100 evaluations per parameter configuration (totaling 1,000 evaluations across 10 configurations), whereas PPSL-MOBO requires only 200 evaluations in total, and inference for new parameters takes only milliseconds.
Dynamic Multi-Objective Optimization¶
On the DF1/DF2 benchmarks, PPSL-MOBO instantly generates solution distributions that approximate the true Pareto front, while DNSGA-II fails to converge within a two-generation update window.
Ablation Study¶
Ablation studies confirm the contribution of each component: the LoRA adaptation, GP surrogate training, and HVI acquisition strategy are all individually necessary.
Key Findings¶
- Over 5× evaluation efficiency: Comparable performance is achieved with 200 evaluations versus 1,000.
- Instant inference: PS inference for new parameter values requires only milliseconds without retraining.
- The choice of LoRA rank \(r\) significantly affects performance — too small a rank fails to capture parameter variation, while too large leads to overfitting.
- The advantage of PPSL-MOBO is more pronounced in high-dimensional shared-variable configurations, where generalization across a larger parameter space is more critical.
Highlights & Insights¶
- Elegant architectural design: Hypernetwork + LoRA reformulates parametric PS learning as an efficient low-rank adaptation problem.
- Closed-loop system: Surrogate training \(\leftrightarrow\) data acquisition \(\leftrightarrow\) model update form a positive feedback cycle.
- Two high-impact applications: Shared-component design and dynamic optimization are both core requirements in practical engineering.
- Unified framework: A single model handles the entire parameter space, fundamentally departing from the traditional paradigm of per-instance optimization.
- Solid theoretical foundation: Complete PS recovery guarantees based on STCH (Theorems 1 and 2).
Limitations & Future Work¶
- GP surrogate models may suffer from reduced efficiency in high-dimensional input spaces (concatenated \(\boldsymbol{x}\) and \(\boldsymbol{t}\)).
- The current framework addresses only unconstrained multi-objective optimization; extension to constrained settings remains future work.
- Theoretical guarantees on sample complexity are absent.
- Training stability of the hypernetwork + LoRA combination may require careful hyperparameter tuning in practice.
- Test problems in the experiments are relatively low-dimensional (RE21 is 4-dimensional); scalability to higher-dimensional problems is not verified.
- In dynamic optimization experiments, comparison with methods specifically designed for DMOPs is not entirely fair, as the proposed method benefits from a global view of the parameter space.
Related Work & Insights¶
- PSL (Lin et al. 2022): Learns a mapping from preferences to Pareto solutions; this paper generalizes the idea to the parametric setting.
- LoRA (Hu et al. 2022): Parameter-efficient fine-tuning for large models; innovatively applied here to compress hypernetwork outputs.
- MOBO (qEHVI, qParEGO): Single-instance multi-objective Bayesian optimization methods; their core ideas are extended to the parameter space in this work.
- Insight: The "shared base + low-rank adaptation" paradigm of LoRA may generalize to broader optimization scenarios requiring cross-instance generalization.
Rating¶
⭐⭐⭐⭐ (4/5)
Strengths: Novel method design (an elegant combination of PSL + LoRA + MOBO), two practically valuable application scenarios, and significant improvement in evaluation efficiency.
Weaknesses: Lack of theoretical guarantees, experiments conducted at relatively small problem scales, and the dynamic optimization comparison is not entirely fair.