
Task-free Adaptive Meta Black-box Optimization

Conference: ICLR 2026 arXiv: 2601.21475 Code: None Area: Remote Sensing Keywords: Black-box optimization, meta-learning, evolutionary algorithms, adaptive parameter learning, zero-shot optimization

TL;DR

This paper proposes ABOM—a task-free adaptive meta black-box optimizer that eliminates the need for predefined training task distributions. By parameterizing evolutionary operators (selection, crossover, mutation) as differentiable attention modules and leveraging self-generated data for online parameter updates during optimization, ABOM achieves competitive zero-shot performance on synthetic benchmarks and UAV path planning tasks.

Background & Motivation

Background: Black-box optimization (BBO) is widely applied in scenarios such as hyperparameter tuning and neural architecture search. Traditional evolutionary algorithms (EAs) rely on hand-crafted operators and parameters. Meta-BBO methods automate optimizer configuration via meta-learning, but require pre-training on a manually designed training task distribution \(\mathcal{F}\).

Limitations of Prior Work: The core limitation of Meta-BBO methods lies in their dependence on hand-crafted training task distributions. In practice, the distribution of target tasks is often unknown or unique (e.g., specific engineering optimization problems), making it infeasible to obtain suitable training task sets.

Key Challenge: The No Free Lunch (NFL) theorem establishes that no universally optimal algorithm exists, necessitating adaptation. However, existing adaptive methods either require domain knowledge to design rules (traditional adaptive EAs) or require training task distributions (Meta-BBO). The fundamental question is how to achieve adaptation without domain knowledge or training tasks.

Goal: (a) Eliminate dependence on predefined training task distributions; (b) replace discrete algorithm selection spaces with continuous differentiable parameter spaces; (c) enable online parameter learning using self-generated data produced during optimization.

Key Insight: Parameterize evolutionary operators as attention mechanisms to make them differentiable, then use "encouraging offspring to approximate the elite archive" as a supervision signal for online parameter updates.

Core Idea: Parameterize evolutionary operators via attention mechanisms, transforming the meta-learning paradigm from "pre-train then deploy" to a closed-loop adaptive "learn while optimizing" framework.

Method

Overall Architecture

The input is a black-box objective function \(f_T(\mathbf{x})\) (query-only), and the output is an approximate optimal solution \(\mathbf{x}^*\). The ABOM optimization loop consists of five steps: (1) initialize the population via Latin hypercube sampling; (2) generate offspring using parameterized operators \(\pi_\theta\); (3) evaluate offspring fitness; (4) retain the top \(N\) individuals via elitism; (5) update operator parameters \(\theta\) via gradient descent. The entire process requires no pre-training and performs "learn while optimizing" directly on the target task.
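The five-step loop above can be sketched as a minimal skeleton in NumPy. This is an illustrative reconstruction, not the authors' code: the learned operators \(\pi_\theta\) are stubbed out as a plain Gaussian perturbation, the gradient update of step (5) is left as a comment, and the function names, sphere objective, and hyperparameter values are all assumptions.

```python
import numpy as np

def latin_hypercube(n, d, low=-5.0, high=5.0, rng=None):
    """Step 1: Latin hypercube sampling -- one sample per stratum per dimension."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Each column is a random permutation of strata 0..n-1, jittered within strata.
    u = (rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
         + rng.random((n, d))) / n
    return low + u * (high - low)

def abom_loop(f, d=5, pop=20, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    P = latin_hypercube(pop, d, rng=rng)                     # (1) initialize
    fit = np.apply_along_axis(f, 1, P)
    for _ in range(iters):
        offspring = P + 0.3 * rng.standard_normal(P.shape)   # (2) pi_theta stub
        off_fit = np.apply_along_axis(f, 1, offspring)       # (3) evaluate
        both = np.vstack([P, offspring])
        both_fit = np.concatenate([fit, off_fit])
        keep = np.argsort(both_fit)[:pop]                    # (4) elitist top-N
        P, fit = both[keep], both_fit[keep]
        # (5) in ABOM, theta <- theta - eta * grad L(t) would run here
    return P[np.argmin(fit)], float(fit.min())

sphere = lambda x: float(np.sum(x * x))     # toy objective (assumed)
best_x, best_f = abom_loop(sphere)
```

With elitism in step (4), the best fitness is non-increasing across iterations even with the random stub in place of the learned operators.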

Key Designs

  1. Dual-path Attention Selection:

    • Function: Compute the selection matrix \(\mathbf{A}^{(t)} \in \mathbb{R}^{N \times N}\), determining which individuals participate in crossover.
    • Mechanism: The spatial relationships and fitness rankings of solutions in the search space are encoded separately through two sets of Query-Key projections and fused into attention weights via softmax. \(\mathbf{A}^{(t)} = \text{softmax}\left(\frac{(\mathbf{P}\mathbf{W}^{QP})(\mathbf{P}\mathbf{W}^{KP})^\top + (\mathbf{F}\mathbf{W}^{QF})(\mathbf{F}\mathbf{W}^{KF})^\top}{\sqrt{d_A}}\right)\)
    • Design Motivation: Traditional selection relies solely on fitness ranking (e.g., tournament selection), neglecting spatial relationships between solutions. The dual-path design jointly considers "who is better" and "who is closer," enabling more targeted recombination.
  2. Differentiable Crossover:

    • Function: Generate intermediate population \(\mathbf{P}'^{(t)} = \mathbf{P}^{(t)} + \text{MLP}_{\theta_c}(\mathbf{A}^{(t)}\mathbf{P}^{(t)})\).
    • Mechanism: \(\mathbf{A}^{(t)}\mathbf{P}^{(t)}\) first performs attention-weighted mixing of parent individuals (an attention-weighted crossover pool), and the MLP further transforms this to generate offsets. Dropout (probability \(p_C\)) remains active during inference to provide continuous exploration stochasticity.
    • Design Motivation: The residual connection preserves parent information, the MLP learns nonlinear crossover patterns, and dropout replaces the crossover probability hyperparameter in traditional EAs.
  3. Gene-dimension Attention Mutation:

    • Function: Compute a mutation matrix \(\mathbf{M}_i^{(t)} \in \mathbb{R}^{d \times d}\) for each individual, modeling interactions between gene dimensions.
    • Mechanism: \(\mathbf{M}_i^{(t)}\) computes inter-dimensional dependency strengths via self-attention; \(\hat{\mathbf{p}}_i = \mathbf{p}'_i + \text{MLP}_{\theta_m}(\mathbf{M}_i\mathbf{p}'_i)\), enabling mutation to account for inter-variable correlations.
    • Design Motivation: Traditional mutation (e.g., Gaussian perturbation) treats each dimension independently, ignoring variable coupling. The attention mutation matrix can learn patterns such as "when dimension \(j\) is modified, dimension \(k\) should be adjusted accordingly."
  4. Adaptive Parameter Learning:

    • Function: Online update of all parameters \(\theta\).
    • Mechanism: The loss function is \(\mathcal{L}^{(t)} = \|\hat{\mathbf{P}}^{(t)} - \mathbf{E}^{(t)}\|^2\), encouraging offspring to approximate the elite archive. Parameters are updated via AdamW: \(\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}^{(t)}\).
    • Design Motivation: The elite archive encodes information about currently known optimal solutions; encouraging offspring to move toward the elite direction implements a gradient-based version of "survival of the fittest."
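A single forward pass through the three parameterized operators can be sketched as follows. This is a hedged reconstruction from the formulas above, not the authors' implementation: the weight shapes, the one-hidden-layer tanh MLPs, the use of raw fitness values as the feature matrix \(\mathbf{F}\), and the omission of dropout are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, dA = 8, 4, 16          # population size, genome dim, attention dim (assumed)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mlp(x, W1, W2):          # one-hidden-layer MLP (architecture assumed)
    return np.tanh(x @ W1) @ W2

P = rng.standard_normal((N, d))          # population positions
F = rng.standard_normal((N, 1))          # fitness features (raw values here)

# 1. Dual-path attention selection: spatial path + fitness path, fused by softmax
WQP, WKP = rng.standard_normal((d, dA)), rng.standard_normal((d, dA))
WQF, WKF = rng.standard_normal((1, dA)), rng.standard_normal((1, dA))
A = softmax(((P @ WQP) @ (P @ WKP).T + (F @ WQF) @ (F @ WKF).T) / np.sqrt(dA))

# 2. Differentiable crossover: residual + MLP on the attention-mixed parent pool
W1c, W2c = rng.standard_normal((d, dA)), rng.standard_normal((dA, d))
P_cross = P + mlp(A @ P, W1c, W2c)       # dropout (prob p_C) omitted for brevity

# 3. Gene-dimension attention mutation, one (d, d) matrix per individual
wq, wk = rng.standard_normal((1, dA)), rng.standard_normal((1, dA))
W1m, W2m = rng.standard_normal((d, dA)), rng.standard_normal((dA, d))
P_hat = np.empty_like(P_cross)
for i in range(N):
    p = P_cross[i:i+1].T                 # genes as a (d, 1) column of "tokens"
    M = softmax((p @ wq) @ (p @ wk).T / np.sqrt(dA))   # (d, d) mutation matrix
    P_hat[i] = P_cross[i] + mlp((M @ p).T, W1m, W2m)[0]
```

Each row of the selection matrix `A` is a probability distribution over parents, which is what makes the whole generation step differentiable end to end.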

Loss & Training

  • Loss function: \(\mathcal{L}^{(t)} = \|\hat{\mathbf{P}}^{(t)} - \mathbf{E}^{(t)}\|^2\), the L2 distance between offspring and the elite archive.
  • No pre-training; parameters are randomly initialized and learned online during optimization.
  • Theoretical guarantee: Under compact search spaces and continuous objective functions, ABOM guarantees global convergence.
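To make the online update concrete, the sketch below collapses the operators into a single linear map with parameter `W` and descends the loss \(\mathcal{L} = \|\hat{\mathbf{P}} - \mathbf{E}\|^2\). The closed-form gradient, plain SGD in place of AdamW, the fixed uniform attention, and the elite archive having the same shape as the population are all simplifications for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 6, 3
P = rng.standard_normal((N, d))      # current population
E = rng.standard_normal((N, d))      # elite archive (same shape, assumed)
A = np.full((N, N), 1.0 / N)         # fixed uniform attention, for illustration
W = 0.01 * rng.standard_normal((d, d))

def loss(W):
    P_hat = P + (A @ P) @ W          # offspring from a linear stand-in operator
    return float(np.sum((P_hat - E) ** 2))

def grad(W):
    # Closed-form gradient of ||P + A P W - E||^2 with respect to W
    R = P + (A @ P) @ W - E
    return 2.0 * (A @ P).T @ R

eta = 0.05                            # plain SGD step (the paper uses AdamW)
before = loss(W)
for _ in range(20):
    W -= eta * grad(W)
after = loss(W)
```

Because the loss is a convex quadratic in `W`, a small enough step size decreases it monotonically; in ABOM the same signal instead flows through the nonlinear attention operators via backpropagation.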

Key Experimental Results

Main Results (BBOB Synthetic Benchmark, \(d=500\))

Comparison against 10 baselines on 16 test functions (30 independent runs, Wilcoxon signed-rank test):

| Method Category | Representative Methods | Win/Tie/Loss vs. ABOM | Notes |
|---|---|---|---|
| Traditional EA | RS / PSO / DE | 0/0/16 | ABOM significantly outperforms on all functions |
| Adaptive EA | CMA-ES / JDE21 | 2–3 / 1–2 / 11–13 | ABOM significantly superior overall |
| MetaBBO | GLEET / RLDEAFL / LES / GLHF | 1–4 / 1–3 / 9–14 | ABOM matches or surpasses without training tasks |

Main Results (UAV Path Planning — 28 Problems)

| Metric | ABOM | Best MetaBBO | Best Adaptive EA |
|---|---|---|---|
| Normalized cost convergence speed | Fastest | Moderate | Slow |
| Final normalized cost | Lowest | Moderate | Higher |
| Runtime | GPU-accelerated, among fastest | Requires pre-training | CPU-bound |

Ablation Study

| Configuration | BBOB \(d=500\) Ranking | Notes |
|---|---|---|
| ABOM (full) | Best | Selection + crossover + mutation + adaptive learning |
| w/o adaptive learning | Significant drop | Fixed random parameters; degenerates to random search |
| w/o selection attention | Drop | Uniform selection, similar to random recombination |
| w/o mutation attention | Drop | Independent per-dimension mutation |

Key Findings

  • ABOM matches or surpasses MetaBBO methods that require training task distributions, without using any training tasks.
  • Visualization reveals that the selection matrix automatically learns a "survival of the fittest" pattern (higher weights for high-fitness individuals) while not always selecting the best individual (preserving diversity).
  • The mutation matrix evolves from random initialization into structured patterns, reflecting problem-specific gene interaction modes.
  • Performance is moderately sensitive to the dropout rates \(p_C, p_M\): values that are too low lead to premature convergence, while values that are too high result in excessively random search.

Highlights & Insights

  • Transforming meta-learning from "pre-train then deploy" to "learn while optimizing" is the core innovation: by using offspring-to-elite-archive approximation as a supervision signal, the unsupervised BBO problem is reformulated as online supervised learning. This idea is transferable to other meta-learning scenarios requiring online adaptation.
  • The analogy of attention mechanisms as evolutionary operators is highly natural: selection = inter-individual attention weights; crossover = weighted recombination + MLP transformation; mutation = inter-dimensional self-attention. A key detail is that dropout remains active during inference to maintain exploratory behavior.
  • The paper provides theoretical guarantees of global convergence, although practical convergence speed depends on problem structure.

Limitations & Future Work

  • Computational complexity is \(O(d^3)\) (where \(d\) is the search space dimensionality), making the method impractical for very high-dimensional problems (\(d > 1000\)).
  • The elite archive approximation loss may lead to loss of population diversity, as no explicit diversity preservation mechanism is incorporated.
  • Validation is limited to BBOB synthetic functions and UAV path planning; broader real-world application scenarios remain unexplored.
  • Performance gaps relative to traditional adaptive EAs (e.g., CMA-ES) persist on certain functions.

Comparison with Related Methods

  • vs. CMA-ES: CMA-ES adapts search directions via covariance matrix adaptation but requires domain knowledge for design. ABOM automatically learns analogous search strategies through attention mechanisms.
  • vs. GLHF/RLDEAFL: These MetaBBO methods require pre-training on training task distributions, a dependency that ABOM completely eliminates.
  • vs. EvoTorch/OpenELM: Existing differentiable evolutionary frameworks focus on GPU acceleration, whereas ABOM further achieves operator parameterization and online learning.

Rating

  • Novelty: ⭐⭐⭐⭐ — The idea of fully parameterizing evolutionary operators as differentiable attention modules is novel.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — BBOB across three dimensionalities + UAV application + ablation study + visualization.
  • Writing Quality: ⭐⭐⭐⭐ — The derivation from Meta-BBO to ABOM is clearly presented.
  • Value: ⭐⭐⭐⭐ — A significant contribution to the meta black-box optimization field.