DesignX: Human-Competitive Algorithm Designer for Black-Box Optimization

Conference: NeurIPS 2025 · arXiv: 2505.17866 · Code: GitHub · Area: Black-Box Optimization / Automated Algorithm Design · Keywords: Black-Box Optimization, Automated Algorithm Design, Dual-Agent Reinforcement Learning, MetaBBO, Transformer

TL;DR

This paper proposes DesignX, the first automated algorithm design framework that jointly learns two sub-tasks—optimizer workflow generation and dynamic hyperparameter control—through dual Transformer agents pre-trained at scale on 10k synthetic problems. DesignX surpasses human-designed optimizers on both synthetic benchmarks and real-world tasks including protein docking, AutoML, and UAV path planning.

Background & Motivation

Background: Black-box optimization (BBO) is a core problem in science and industry. Evolutionary computation (EC) is the dominant gradient-free paradigm, having produced a large family of variants—GA, DE, PSO, CMA-ES, etc.—over decades, each requiring expert-crafted adaptive operators and hyperparameter controllers.

Limitations of Prior Work:

  • Manually redesigning optimizers for each new BBO problem does not scale.
  • Although MetaBBO (Meta-Black-Box Optimization) introduces learning-based paradigms, existing methods learn only a single sub-task—either algorithm selection/workflow generation or hyperparameter control—and this separation leads to suboptimal designs.
  • LLM-based approaches can generate algorithm code but likewise handle only one sub-task at a time.

Key Challenge: Algorithm design inherently involves two coupled sub-tasks (workflow structure + dynamic hyperparameters); optimizing them separately cannot achieve joint optimality.

Key Insight: Construct a modular algorithm space (Modular-EC) and a dual-agent RL system for end-to-end joint learning.

Core Idea: Agent-1 autoregressively generates valid optimizer workflows; Agent-2 dynamically controls hyperparameters. Both agents are meta-trained on a distribution of 10k problems through a cooperative training objective.

Method

Overall Architecture

The input to the framework is a feature vector characterizing a BBO problem instance (dimensionality, search range, ELA statistical features, etc.). Agent-1 (Transformer) autoregressively generates a valid optimizer workflow—selecting from 116 modules in Modular-EC—conditioned on the problem features. Agent-2 (Transformer) dynamically adjusts the hyperparameters of all controllable modules in response to real-time feedback during optimization. The two agents are jointly trained via a cooperative reward objective.
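To make the pipeline concrete, here is a pseudocode-level sketch of the two-agent loop in Python syntax. The interfaces (agent1.generate, agent2.control, build_optimizer, and the problem/optimizer objects) are hypothetical stand-ins for the paper's components, not the authors' API:

```python
# Pseudocode-level sketch of DesignX's inference pipeline; all interfaces
# below are illustrative assumptions, not the released implementation.
def designx_solve(problem, agent1, agent2, max_generations=100):
    feats = problem.features()             # 13-dim: 4 basic attributes + 9 ELA features
    workflow = agent1.generate(feats)      # Agent-1: autoregressive Modular-EC module sequence
    optimizer = build_optimizer(workflow)  # assemble the selected modules into an optimizer

    population = optimizer.initialize(problem)
    for _ in range(max_generations):
        obs = optimizer.progress_features(population)  # 9-dim optimization-progress observation
        params = agent2.control(workflow, obs)         # Agent-2: per-module hyperparameter values
        population = optimizer.step(population, params, problem)
    return optimizer.best_solution(population)
```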

Key Designs

  1. Modular-EC: Modular Algorithm Space

    • Function: Decomposes EC optimizers into 10 module types (Initialization, Mutation, Crossover, Selection, Niching, …) with 116 module variants in total.
    • Mechanism: Each module has a unique 16-bit encoding and topology rules (defining valid successor modules), enabling autoregressive generation of valid workflows. Compared to its predecessor Modular-BBO (primarily targeting DE), Modular-EC adds ES/GA/PSO operators and the Other_Update module type.
    • Design Motivation: To unify decades of expert-designed algorithm components into a single encoding, providing the learning agent with a search space of millions of possible workflows.
  2. Agent-1: Workflow Generation

    • Function: Given problem features \(\mathcal{F}_p\) (13-dimensional, comprising 4 basic attributes and 9 ELA features), autoregressively samples a module sequence.
    • Mechanism: Built on a GPT-2 architecture, it ensures topological validity via masked softmax sampling: \(P(\mathcal{A}_p^{m+1} \mid \text{start}, \mathcal{A}_p^1, \ldots, \mathcal{A}_p^m) = \text{Softmax}(\text{mask}(\mathcal{A}_p^m) \odot (\mathcal{W}_\text{sample}^T \cdot H^{(m)}))\). The mask vector zeroes out the probabilities of illegal modules according to the current module's topology rules (see the sketch after this list).
    • Design Motivation: The Transformer's sequence modeling capacity is naturally suited to the ordered generation of workflows, and masked sampling guarantees that generated optimizers are always valid and executable.
  3. Agent-2: Dynamic Hyperparameter Control

    • Function: At each optimization step, generates hyperparameter values for all controllable modules based on the observation \(\mathcal{O}_t\) (a 9-dimensional progress feature vector).
    • Mechanism: Module IDs and observations are concatenated and encoded; another stack of GPT-2 blocks outputs the parameters of a normal distribution, \(\mu = \mathcal{W}_\mu^T \cdot H_{dec}\) and \(\Sigma = \mathcal{W}_\Sigma^T \cdot H_{dec}\), from which the hyperparameters are sampled: \(C_t^m \sim \mathcal{N}(\mu^{(m)}, \Sigma^{(m)})\) (see the sketch after this list).
    • Design Motivation: Hyperparameters directly govern the exploration–exploitation trade-off in EC optimizers; dynamic control allows adaptive adjustment across different stages of optimization.
  4. Cooperative Training Objective

    • Agent-1 is trained with REINFORCE (delayed reward); Agent-2 is trained with PPO (dense reward).
    • Unified objective: \(\mathcal{J}(\phi, \theta) = \mathbb{E}_{p \sim \mathcal{D}_{train}}[\sum_{t=1}^T r_t]\)
    • Per-step reward: \(r_t = \frac{f_p^{t-1,*} - f_p^{t,*}}{f_p^{0,*} - f_p^*}\) (normalized optimization progress)
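The following PyTorch sketch ties these pieces together: a topology mask built from Modular-EC's successor rules, Agent-1's masked softmax sampling, Agent-2's Gaussian hyperparameter head, and the per-step reward. Tensor shapes, names, and the softplus on the scale are our illustrative assumptions, not the paper's code:

```python
import torch

def make_topology_mask(current_module, topology_rules, n_modules=116):
    # topology_rules: dict mapping a module id to the set of valid successor ids
    # (an illustrative encoding of Modular-EC's topology rules).
    mask = torch.zeros(n_modules)
    mask[list(topology_rules[current_module])] = 1.0
    return mask

def sample_next_module(H_m, W_sample, mask):
    # Agent-1: masked softmax over the candidate modules.
    # H_m: (d,) hidden state after module m; W_sample: (d, n_modules).
    logits = W_sample.T @ H_m
    probs = torch.softmax(logits, dim=-1) * mask   # zero out illegal successors
    probs = probs / probs.sum()                    # renormalize to a valid distribution
    return torch.multinomial(probs, 1).item()      # invalid modules can never be drawn

def sample_hyperparams(H_dec, W_mu, W_sigma):
    # Agent-2: Gaussian head for one controllable module's hyperparameters.
    mu = W_mu.T @ H_dec
    sigma = torch.nn.functional.softplus(W_sigma.T @ H_dec)  # positivity: our choice
    return torch.normal(mu, sigma)

def step_reward(f_prev_best, f_curr_best, f_init_best, f_optimum):
    # Normalized per-step optimization progress, matching the reward above.
    return (f_prev_best - f_curr_best) / (f_init_best - f_optimum)
```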

Loss & Training

  • 12,800 synthetic problem instances (9,600 training / 3,200 test), constructed by combining 32 base functions through single/composition/hybrid schemes (an illustrative construction sketch follows this list).
  • Training runs for 6 days; the primary bottleneck is BBO simulation rather than neural network computation (optimization loops run on CPU).
  • At inference time, DesignX requires approximately 5.5 s per problem, comparable to CMA-ES (5.0 s).
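For intuition, here is a hedged sketch of how composition and hybrid instances could be assembled from a pool of base functions. The weighting, shifting, and dimension-partitioning details are our illustrative choices, not the paper's exact construction:

```python
import numpy as np

def make_composition(base_fns, rng, k=3):
    # 'Composition' instance: weighted sum of k randomly shifted base functions.
    idx = rng.choice(len(base_fns), size=k, replace=False)
    weights = rng.dirichlet(np.ones(k))        # random convex combination
    shifts = rng.uniform(-5.0, 5.0, size=k)    # random optimum shifts
    def f(x):
        return sum(w * base_fns[i](x - s) for w, i, s in zip(weights, idx, shifts))
    return f

def make_hybrid(base_fns, rng, dim=10, k=2):
    # 'Hybrid' instance: partition the dimensions into k blocks and apply a
    # different base function to each block.
    idx = rng.choice(len(base_fns), size=k, replace=False)
    cuts = np.sort(rng.choice(np.arange(1, dim), size=k - 1, replace=False))
    blocks = np.split(np.arange(dim), cuts)
    def f(x):
        return sum(base_fns[i](x[b]) for i, b in zip(idx, blocks))
    return f

# Example: compose two classic base functions.
rng = np.random.default_rng(0)
sphere = lambda x: float(np.sum(x ** 2))
rastrigin = lambda x: float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))
f = make_composition([sphere, rastrigin], rng, k=2)
print(f(np.zeros(10)))
```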

Key Experimental Results

Main Results: Synthetic Test Set (Selected)

Problem              Pre-2000    2000s       2010s       Post-2020   MetaBBO     DesignX
F1 (50D, 30K FEs)    6.60E+00    1.64E+00    1.27E+00    5.32E+00    2.80E+00    2.89E-01
F2068 (20D)          3.79E+01    2.32E+00    1.46E+01    1.65E+01    3.72E+01    5.16E-01
F2390 (10D)          3.93E+00    2.78E+00    6.34E+00    1.54E+00    2.04E+01    1.85E-03
Normalized Mean      2.94E-01    1.96E-01    1.54E-01    1.46E-01    1.32E-01    8.26E-02

DesignX ranks first on nearly all test instances; its normalized mean is 37% lower than that of the best MetaBBO baseline.
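Concretely, from the table above: \((1.32\times10^{-1} - 8.26\times10^{-2}) / (1.32\times10^{-1}) \approx 0.374\), i.e., roughly a 37% reduction relative to the MetaBBO column.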

Ablation Study

Configuration   Description               Normalized Performance
w/o A1+A2       Random dual agents        Worst
w/o A1          Train Agent-2 only        Poor
w/o A2          Train Agent-1 only        Moderate
SBS             Static workflow           Poor
DesignX         Dual-agent cooperation    Best

Key Findings

  • Agent-1 (workflow generation) contributes more to performance than Agent-2 (hyperparameter control), yet their cooperation yields results significantly superior to either sub-task alone.
  • The design strategies learned by DesignX are interpretable: it favors composite mutation strategies for multimodal problems and population-reduction mechanisms for small search ranges.
  • Notably, DesignX assigns negligible importance to the initialization strategy—a finding that may not align with conventional human intuition.
  • DE-related modules are selected most frequently by DesignX, suggesting that DE operator combinations offer the strongest general-purpose utility.
  • DesignX maintains its advantage on out-of-distribution real-world tasks including protein docking, AutoML, and UAV path planning.

Highlights & Insights

  • First end-to-end framework to jointly learn both algorithm design sub-tasks: unifying workflow generation and hyperparameter control breaks the single-sub-task bottleneck prevalent in the MetaBBO literature.
  • Masked Softmax for validity guarantees: the topology rules combined with the mask mechanism elegantly ensure that autoregressively generated optimizer workflows are always valid and executable.
  • Interpretability analysis is valuable: module importance factors and sub-module distribution analysis reveal non-trivial design principles learned by DesignX, providing reverse insights for human optimizer design.
  • The paradigm of large-scale training on synthetic data followed by zero-shot transfer to real tasks is worth adopting more broadly.

Limitations & Future Work

  • Modular-EC currently supports only EC-family optimizers (DE/PSO/GA/ES) and does not cover other BBO paradigms such as Bayesian optimization.
  • Training requires 6 days of CPU computation; scaling-law experiments are constrained by available compute.
  • Under rank-based comparison, DesignX performance remains close to CMA-ES, indicating room for further improvement.
  • Problem features are represented by only 13-dimensional ELA features, which may be insufficient to characterize high-dimensional or complex problems.
  • Training is conducted only with the smallest model configuration (a 1-layer GPT-2); the potential of larger models and larger training sets remains largely unexplored.
Comparison with Related Work

  • vs. ConfigX: ConfigX handles only DE hyperparameter control (a single sub-task); DesignX extends this to both sub-tasks and upgrades Modular-BBO to Modular-EC.
  • vs. ALDes: ALDes performs workflow generation but not dynamic hyperparameter control; DesignX unifies both.
  • vs. GLHF: GLHF simulates DE operators via gradient descent; DesignX directly learns module combinations through RL.
  • vs. LLM-based approaches: LLMs generate algorithm code but process only one sub-task per query and incur high inference costs; DesignX achieves more efficient end-to-end design with a compact model.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ First end-to-end dual-agent automated algorithm design framework, with innovations in both theoretical design and engineering implementation.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ 3,200 synthetic tests + 3 real-world scenarios + ablation + scaling law + interpretability analysis—extremely comprehensive.
  • Writing Quality: ⭐⭐⭐⭐ Clear structure with rich visualizations, though the heavy notation requires repeated cross-referencing.
  • Value: ⭐⭐⭐⭐⭐ Represents a paradigm-level advance for both MetaBBO and automated algorithm design.