
MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design

Conference: ICLR 2026
arXiv: 2602.14926
Code: GitHub
Area: Image Generation
Keywords: antimicrobial peptide design, multi-agent collaboration, closed-loop reinforcement learning, multi-objective optimization, LLM agent

TL;DR

This paper proposes MAC-AMP, the first closed-loop multi-agent collaboration system that reformulates antimicrobial peptide (AMP) design as a coordinated multi-agent optimization problem, achieving multi-objective optimization through AI-simulated peer review and adaptive reward design.

Background & Motivation

Antimicrobial Resistance (AMR) Crisis: AMR was directly responsible for approximately 1.14 million deaths in 2021, and projections exceed 39 million direct deaths between 2025 and 2050.
Limitations of Existing AMP Design Models:
- Most optimize only for antimicrobial activity, neglecting toxicity, stability, and novelty.
- Multi-objective optimization is unstable; static weights readily cause reward hacking or diversity collapse.
- Outputs are typically scattered scores or text, making it difficult to convert them into reproducible learning signals.
Limitations of Existing Multi-Agent Systems:
- Outputs are primarily in natural language, lacking trainable optimization signals.
- Most are open-loop systems that rely on human intervention.

Method

Overall Architecture

MAC-AMP comprises six interconnected modules: Input Module → Property Prediction → AI-Simulated Peer Review → RL Refinement → Peptide Generation → Output Module. Users need only provide the target bacterial name and an example dataset.

1. Property Prediction Module

Evaluates multiple AMP attributes, categorized into two types:
- Explicit Reward Signals $S$: Antimicrobial activity score $S_a$ (MIC predictor fine-tuned on ProtBERT), AMP likelihood score $S_b$ (Macrel 1.5).
- Auxiliary Evidence $V$: Toxicity score $V_a$ (ToxinPred 3.0), structural reliability $V_b$ (OmegaFold), physicochemical properties $V_c$ (ProtParam), template similarity $V_d$ (Foldseek).
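
One way to picture this module's output is a single record per peptide that keeps the explicit reward signals $S$ apart from the auxiliary evidence $V$. The sketch below is a hypothetical container, not the paper's data model: the class name, field names, example sequence, and all numeric values are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PropertyReport:
    """Hypothetical per-peptide record; names are illustrative assumptions."""
    sequence: str
    s_a: float  # antimicrobial activity score S_a
    s_b: float  # AMP likelihood score S_b
    v_a: float  # toxicity score V_a
    v_b: float  # structural reliability V_b
    v_c: dict   # physicochemical properties V_c
    v_d: float  # template similarity V_d

    def explicit_rewards(self) -> dict:
        """Signals S, which can enter a reward function directly."""
        return {"S_a": self.s_a, "S_b": self.s_b}

    def auxiliary_evidence(self) -> dict:
        """Evidence V, which is passed on to the reviewers as context."""
        return {"V_a": self.v_a, "V_b": self.v_b, "V_c": self.v_c, "V_d": self.v_d}


# Placeholder values only; they are not results from the paper.
report = PropertyReport(
    sequence="KKLLKKLLKK",
    s_a=0.9, s_b=0.8,
    v_a=0.2, v_b=0.85, v_c={"net_charge": 6.0}, v_d=0.4,
)
```

Keeping the two groups behind separate accessors mirrors the split described above: only $S$ is treated as a ready-made reward term, while $V$ is supporting evidence.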

2. AI-Simulated Peer Review Module

Reviewer Agents: Three independent reviewers (GPT-5, Gemini 2.5, Perplexity) evaluate each peptide across four dimensions: efficacy, safety, developmental structure, and originality.
Each dimension is associated with a weighted vocabulary sub-table, using the tag format $\text{ID}(\text{State}, \text{Weight})$ for structured annotation.
Area Chair Agent: Aggregates review results, resolves semantic conflicts, computes dimension-level meta-scores, and outputs meta-review text $T$ and average meta-score $S_c$.
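
The tag-to-score step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's aggregation rule: the state vocabulary (`pos`/`neg`), the scoring formula (weighted share of positive evidence), and the function names `parse_tags`, `dimension_score`, and `meta_review` are all hypothetical. Only the $\text{ID}(\text{State}, \text{Weight})$ tag shape and the idea of averaging dimension-level meta-scores into $S_c$ come from the description above.

```python
import re
from statistics import mean

# Matches tags shaped like ID(State, Weight), e.g. "E1(pos, 0.8)".
TAG = re.compile(r"(\w+)\(\s*(\w+)\s*,\s*([0-9]*\.?[0-9]+)\s*\)")
SIGN = {"pos": 1.0, "neg": -1.0}  # assumed state vocabulary


def parse_tags(text):
    """Return (id, state, weight) triples found in a reviewer annotation."""
    return [(i, s, float(w)) for i, s, w in TAG.findall(text)]


def dimension_score(text):
    """Weighted share of positive evidence in [0, 1]; 0.5 when no tags are present."""
    tags = parse_tags(text)
    total = sum(w for _, _, w in tags)
    if total == 0:
        return 0.5
    signed = sum(SIGN[s] * w for _, s, w in tags)
    return 0.5 * (1.0 + signed / total)


def meta_review(reviews):
    """reviews: one {dimension: annotation} dict per reviewer.

    Returns the per-dimension meta-scores and their average (S_c).
    """
    dims = list(reviews[0].keys())
    meta = {d: mean(dimension_score(r[d]) for r in reviews) for d in dims}
    return meta, mean(meta.values())
```

A real Area Chair Agent also resolves semantic conflicts and writes the meta-review text $T$; this sketch covers only the numeric aggregation.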

3. RL Refinement Module: Reward Design

CS-Based Reward Design Agent: Optimizes the reward function based on observable signals and mathematical properties.
Biomedical Reward Alignment Agent: Analyzes meta-review text and proposes revision recommendations grounded in domain knowledge.
Reward Selection: Candidate reward functions pass through three stages. A rule-based validator filters them, short-term sandbox training evaluates the survivors, and Pareto optimization selects the final reward function.
Phase-Adaptive Optimization: The reward function is redesigned every 15 epochs over 3 iterations.
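
The filtering and selection steps can be illustrated with a small non-dominated-set computation. This is a sketch under assumptions: the validity rule (all sandbox metrics finite), the metric names, and the convention that higher is better on every metric are placeholders, and the source does not say how a single reward function is picked from the Pareto front, so the sketch stops at returning the front.

```python
import math


def is_valid(metrics):
    """Stand-in rule-based check: every sandbox metric must be a finite number."""
    return all(
        isinstance(v, (int, float)) and math.isfinite(v) for v in metrics.values()
    )


def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(candidates):
    """candidates: {name: {metric: value}} from short sandbox runs.

    Returns the names of valid candidates that no other valid candidate dominates.
    """
    valid = {n: m for n, m in candidates.items() if is_valid(m)}
    return [
        name
        for name, m in valid.items()
        if not any(dominates(o, m) for other, o in valid.items() if other != name)
    ]


# Placeholder sandbox metrics, not results from the paper.
sandbox = {
    "r1": {"activity": 0.9, "safety": 0.6},
    "r2": {"activity": 0.7, "safety": 0.8},
    "r3": {"activity": 0.6, "safety": 0.5},           # dominated by r1
    "r4": {"activity": float("nan"), "safety": 0.9},  # rejected by the validator
}
front = pareto_front(sandbox)
```

Here `r1` and `r2` trade activity against safety, so both stay on the front; a further tie-breaking rule would be needed to choose between them.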

4. PPO Optimization

Normalized advantage: $A = \text{norm}(R - \bar{V}_\phi)$

Clipped surrogate loss:

\[L_{policy}(\theta) = \mathbb{E}[\min(r(\theta)A, \text{clip}(r(\theta), 1-\epsilon, 1+\epsilon)A)]\]

Total loss:

\[L = L_{policy} + c_v L_{value} - c_e L_{ent}\]

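where $r(\theta)$ is the probability ratio between the updated and old policies, $\epsilon$ is the clipping range, $L_{value}$ is the value regression loss, $L_{ent}$ is the entropy regularization term, and $c_v$ and $c_e$ are their weighting coefficients.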
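
The three formulas above can be combined in a few lines of plain Python. This is a minimal sketch, not the authors' implementation. It makes several assumptions: the value estimate is used as the baseline $\bar{V}_\phi$, normalization means zero-mean and unit-variance scaling over the batch, the clipped surrogate is negated so that the total can be minimized, and the defaults for `clip_eps`, `c_v`, and `c_e` are common PPO choices rather than values from the paper.

```python
import math
from statistics import mean, pstdev


def normalize(xs, eps=1e-8):
    """Zero-mean, unit-variance scaling over the batch."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / (sd + eps) for x in xs]


def clipped_term(ratio, adv, clip_eps=0.2):
    """min(r * A, clip(r, 1 - eps, 1 + eps) * A) for one sample."""
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * adv, clipped * adv)


def ppo_loss(logp_new, logp_old, rewards, values, entropies,
             clip_eps=0.2, c_v=0.5, c_e=0.01):
    """Total loss L = L_policy + c_v * L_value - c_e * L_ent for one batch."""
    adv = normalize([r - v for r, v in zip(rewards, values)])
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    surrogate = mean(clipped_term(r, a, clip_eps) for r, a in zip(ratios, adv))
    l_policy = -surrogate  # assumption: negate the objective so it can be minimized
    l_value = mean((v - r) ** 2 for v, r in zip(values, rewards))
    l_ent = mean(entropies)
    return l_policy + c_v * l_value - c_e * l_ent
```

When the new and old log-probabilities are equal, every ratio is 1 and the normalized advantages average to zero, so only the value and entropy terms contribute to the total.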

Key Experimental Results

Main Results: Target-Specific AMP Evaluation

Model	Antimicrobial Activity (↑)	AMP Likelihood (↑)	Toxicity (↓)	Structural Reliability (↑)
MAC-AMP	0.943±0.008	0.797±0.012	0.154±0.008	0.873±0.009
AMP-Designer	0.807±0.021	0.811±0.011	0.251±0.024	0.817±0.017
BroadAMP-GPT	0.831±0.025	0.821±0.018	0.246±0.033	0.763±0.023
PepGAN	0.823±0.023	0.572±0.035	0.247±0.064	0.637±0.026
Diff-AMP	0.822±0.006	0.554±0.036	0.235±0.072	0.752±0.020

Results on E. coli target

Broad-Spectrum Activity Evaluation

Model	E. coli	S. aureus	P. aeruginosa	K. pneumoniae	E. faecium
MAC-AMP	0.94	0.81	0.94	0.98	0.95
AMP-Designer	0.81	0.81	0.85	0.96	0.96
PepGAN	0.82	0.89	0.91	0.98	0.96

Key Findings

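On the E. coli target, MAC-AMP outperforms all baselines in antimicrobial activity (0.943), toxicity (0.154), and structural reliability (0.873), but trails AMP-Designer and BroadAMP-GPT in AMP likelihood (0.797 vs. 0.811 and 0.821).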
AMPs designed for E. coli generalize well to other Gram-negative bacteria (which share outer membrane structures).
Strong generalization is also demonstrated for E. faecium (a Gram-positive bacterium, 0.95), whereas activity against S. aureus is lower (0.81, below PepGAN's 0.89).
Training cost: 47.61 GPU hours, 853 API calls, API expenditure of $36.56.

Highlights & Insights

First Closed-Loop Multi-Agent System: Converts natural-language review consensus into executable RL reward signals, bridging the gap between output format and training signal.
End-to-End Interpretability: Overcomes black-box limitations through transparent logging, replay trajectories, and consensus-aware decision tracking.
Cross-Domain Transferability: The framework's generality is validated on English table-to-text generation tasks.
Multi-Objective Balance: Achieves multi-objective optimization through structured agent consensus rather than manually specified static weights.

Limitations & Future Work

Generated peptides have not yet been validated through in vitro experiments.
API call costs may limit large-scale deployment.
The peer review module relies on specific commercial LLMs, constraining reproducibility.
The phase interval (15 epochs) and iteration count (3 rounds) are hyperparameters that may require task-specific tuning.

Related Work

AMP Generation: AMPGAN v2, Diff-AMP, AMP-Designer
LLM Multi-Agent Collaboration: Virtual Lab, CAMEL, AutoGen, ReviewAgents
LLM-Augmented RL: RLAIF, Eureka

Rating

Novelty: ⭐⭐⭐⭐ — First framework to apply closed-loop multi-agent collaboration to molecular design.
Technical Depth: ⭐⭐⭐⭐ — Modular design is sophisticated with thorough multi-level validation.
Experimental Thoroughness: ⭐⭐⭐⭐ — Five bacterial targets, four baselines, and multi-dimensional ablation studies.
Practical Value: ⭐⭐⭐⭐ — Extensible to other molecular design tasks.