ProtoCycle: Reflective Tool-Augmented Planning for Text-Guided Protein Design

Conference: ACL 2026 (Findings) · arXiv: 2604.16896 · Code: N/A · Keywords: protein design, text-guided, reflective planning, tool augmentation, reinforcement learning

TL;DR

ProtoCycle proposes a reflective agent framework that positions an LLM as a planner coupled with a lightweight tool environment for text-guided protein sequence design. Through a multi-round, feedback-driven decision loop, trained first on supervised trajectories and then with online reinforcement learning, the framework achieves strong language alignment while maintaining competitive foldability.

Background & Motivation

Background: Designing proteins that satisfy natural-language functional requirements is a central goal of protein engineering. A straightforward approach is to fine-tune general-purpose instruction-tuned LLMs as text-to-sequence generators, but this is both data- and compute-intensive.

Limitations of Prior Work: (1) Direct text-to-sequence methods require substantial supervised data and computational resources; (2) Under limited supervision, LLMs can generate coherent textual plans but cannot reliably translate them into protein sequences—a plan-execute gap exists; (3) Protein design inherently requires iterative trial-and-error, whereas most existing methods perform single-shot generation.

Key Challenge: LLMs excel at understanding natural-language functional descriptions and generating plans, but struggle to map directly from text to valid protein sequences, especially under limited training data.

Goal: Construct a protein design framework that exploits the planning capabilities of LLMs while compensating for their weakness in sequence generation.

Key Insight: Draw on the iterative workflow of human protein engineers: rather than one-step generation, adopt a "plan → execute → feedback → revise" multi-round loop that positions the LLM as a planner rather than a generator.

Core Idea: Couple an LLM planner with a lightweight tool environment that provides sequence manipulation and evaluation capabilities. The LLM iteratively refines its design by reflecting on tool feedback, and agent capability is improved through supervised trajectory training followed by online reinforcement learning.

Method

Overall Architecture

ProtoCycle adopts an agent architecture: the LLM Planner receives a natural-language functional description → generates a design plan → invokes protein design/evaluation tools in the tool environment → receives feedback (e.g., structure prediction results, functional evaluation scores) → the LLM reflects on the feedback and revises its plan → the loop repeats until requirements are satisfied. Training proceeds in two stages: supervised learning (from expert trajectories) and online reinforcement learning (learning optimization strategies from tool feedback).
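
Concretely, the control flow reduces to a short loop. Below is a minimal, self-contained Python sketch of that loop; every name (ToolEnv, Planner, Feedback) and the toy random scoring are assumptions of this note (no code is released), and only the plan → execute → feedback → reflect structure follows the paper.

```python
"""Minimal sketch of a ProtoCycle-style decision loop.

All class names and the toy scoring are illustrative assumptions;
only the control flow (plan -> execute -> feedback -> reflect -> revise)
mirrors the paper's description.
"""
from dataclasses import dataclass
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

@dataclass
class Feedback:
    sequence: str
    score: float  # stand-in for combined alignment/foldability signals

class ToolEnv:
    """Toy tool environment: executes a point mutation and scores it."""

    def execute(self, sequence: str, site: int, residue: str) -> Feedback:
        mutated = sequence[:site] + residue + sequence[site + 1:]
        return Feedback(mutated, self._score(mutated))

    def _score(self, sequence: str) -> float:
        # Placeholder for structure prediction / functional evaluation.
        return random.random()

class Planner:
    """Stand-in for the LLM planner: proposes the next mutation."""

    def next_action(self, sequence: str, history: list) -> tuple:
        # A real planner would condition on the spec and the full history.
        return random.randrange(len(sequence)), random.choice(AMINO_ACIDS)

def design(seed: str, rounds: int = 10) -> str:
    env, planner = ToolEnv(), Planner()
    best = Feedback(seed, 0.0)
    history = []
    for _ in range(rounds):
        site, residue = planner.next_action(best.sequence, history)  # plan
        fb = env.execute(best.sequence, site, residue)               # execute
        history.append(fb)                                           # feedback
        if fb.score > best.score:                                    # reflect & revise
            best = fb
    return best.sequence

if __name__ == "__main__":
    print(design("MKTAYIAKQR"))
```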

Key Designs

1. Reflective Multi-Round Decision Loop

  • Function: Simulates the iterative trial-and-error process of human protein engineers.
  • Mechanism: In each round, the LLM generates actions from the current state and historical feedback (e.g., selecting templates, mutation sites, or modification strategies); the tool environment executes these actions and returns results (e.g., predicted structures, energy scores); the LLM then reflects on result quality and decides the next operation (see the interface sketch after this list).
  • Design Motivation: Protein design is inherently an iterative optimization process; single-shot generation rarely satisfies complex functional requirements. The LLM-driven reflection mechanism enables the agent to learn from failures and adjust strategies accordingly.
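
To make the per-round interface concrete, the actions and feedback can be modeled as small typed records. This is a hypothetical schema: the field names (plddt, energy, notes) are assumptions, while the action kinds (template selection, mutation, splicing) and feedback kinds (predicted structures, energy scores) come from the description above.

```python
# Hypothetical per-round planner/environment interface; field names are
# assumptions, action and feedback kinds follow the paper's description.
from dataclasses import dataclass
from typing import Union

@dataclass
class SelectTemplate:   # start the design from a known scaffold
    template_id: str

@dataclass
class Mutate:           # point mutation at a chosen site
    site: int
    residue: str

@dataclass
class Splice:           # graft a fragment over a segment
    start: int
    end: int
    fragment: str

Action = Union[SelectTemplate, Mutate, Splice]

@dataclass
class RoundFeedback:
    sequence: str        # sequence after the action was applied
    plddt: float         # structure-prediction confidence
    energy: float        # e.g., a folding/energy score
    notes: str           # free-text diagnostics the LLM reflects on

# The planner emits one Action per round; the environment returns RoundFeedback.
```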

2. Lightweight Tool Environment

  • Function: Provides core operations and evaluation capabilities required for protein design.
  • Mechanism: The tool environment encapsulates protein sequence manipulation tools (e.g., mutation, splicing) and evaluation tools (e.g., structure prediction, functional assessment), emulating the experimental and computational tools used by human engineers (see the registry sketch after this list).
  • Design Motivation: LLMs excel at high-level planning but are ill-suited for low-level sequence operations; the tool environment compensates for this weakness while making the design process interpretable and traceable.
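
One natural realization, borrowed from ReAct-style agent frameworks, is a registry mapping tool names to callables. The concrete signatures below are assumptions of this note; only the tool kinds (mutation, splicing, structure prediction) are those the paper names.

```python
# Sketch of a tool registry; signatures are assumptions, the tool kinds
# (mutation, splicing, structure prediction) are those named in the paper.
from typing import Callable, Dict

def mutate(seq: str, site: int, residue: str) -> str:
    """Sequence manipulation: substitute one residue."""
    return seq[:site] + residue + seq[site + 1:]

def splice(seq: str, start: int, end: int, fragment: str) -> str:
    """Sequence manipulation: replace a segment with a fragment."""
    return seq[:start] + fragment + seq[end:]

def predict_structure(seq: str) -> float:
    """Evaluation: stand-in for a folding model's mean confidence.

    In practice this would call a structure predictor (e.g., ESMFold or
    AlphaFold); a constant keeps the sketch runnable.
    """
    return 0.5

TOOLS: Dict[str, Callable] = {
    "mutate": mutate,
    "splice": splice,
    "predict_structure": predict_structure,
}

# The planner emits (tool_name, kwargs); the environment dispatches:
result = TOOLS["mutate"]("MKTAYIAKQR", site=3, residue="W")  # "MKTWYIAKQR"
```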

3. Supervised + Online Reinforcement Learning Training

  • Function: Trains the agent's planning and reflection capabilities in stages.
  • Mechanism: The first stage applies supervised fine-tuning on expert design trajectories, teaching the agent basic tool invocation and design workflows; the second stage applies online RL, letting the agent explore the tool environment autonomously and learn optimization strategies from feedback signals such as language alignment and foldability (see the reward sketch after this list).
  • Design Motivation: Supervised learning provides a cold-start capability; RL further optimizes strategies beyond the level of expert demonstrations.
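
Online RL needs a scalar reward per trajectory. A plausible shaping, given the feedback signals named above, is a weighted combination of language alignment and foldability with a small penalty on the number of rounds; the weights and the penalty below are assumptions, not values from the paper.

```python
# Hypothetical reward shaping for the online RL stage; the weights and
# round penalty are assumptions, only the signal names come from the paper.

def trajectory_reward(alignment: float, foldability: float, n_rounds: int,
                      w_align: float = 0.6, w_fold: float = 0.4,
                      round_penalty: float = 0.01) -> float:
    """Scalar reward for one completed design trajectory.

    alignment:   text-function match of the final sequence, in [0, 1]
    foldability: structure-prediction confidence, in [0, 1]
    n_rounds:    rounds used; lightly penalized to encourage efficiency
    """
    return w_align * alignment + w_fold * foldability - round_penalty * n_rounds

# Example: a well-aligned, foldable design found in 4 rounds.
print(trajectory_reward(alignment=0.9, foldability=0.8, n_rounds=4))  # 0.82
```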

Key Experimental Results

Main Results

  • Language alignment: strong (high functional match between text descriptions and generated sequences)
  • Foldability: competitive (generated sequences fold into stable 3D structures)
  • vs. direct LLM fine-tuning: outperforms direct text-to-sequence approaches

Ablation Study

  • Without the reflection mechanism: significant drop in sequence quality
  • Supervised learning only (no RL): lower performance than the full method
  • RL only (no supervised pre-training): unstable training

Key Findings

  1. Reflection is critical: Ablation experiments show that removing the LLM-driven reflection mechanism significantly degrades sequence quality, confirming that iterative reflection is essential to design quality.
  2. Two-stage training is complementary: Supervised learning provides foundational capabilities while RL further improves performance; neither stage is dispensable.
  3. LLM as planner outperforms LLM as direct generator: Positioning the LLM as a planner driving the decision loop, rather than as a generator directly outputting sequences, better leverages the LLM's reasoning capabilities.
  4. Informativeness of tool feedback: Structured tool feedback (e.g., specific energy scores, structural deviations) provides richer learning signals than end-to-end loss functions.

Highlights & Insights

  1. Cross-domain paradigm transfer: The "planning + tool invocation + reflection" paradigm from NLP/AI agent research is successfully transferred to protein design, demonstrating the cross-domain potential of agent frameworks.
  2. Bridging the plan-execute gap: The work explicitly identifies the "can plan but cannot execute" problem of LLMs in protein design and provides an elegant solution through the tool environment.
  3. Iterative optimization vs. one-shot generation: Protein design is ill-suited to single-step generation; multi-round feedback loops better align with actual domain workflows.
  4. Supervised + RL training strategy: Balancing imitation learning and exploratory learning in agent training constitutes an effective paradigm for training complex agents.

Limitations & Future Work

  1. Full text unavailable: this note is based on abstract-level information, so specific experimental data and methodological details are pending further review.
  2. Computational cost: Multi-round tool invocations and LLM inference may result in high per-design computational overhead.
  3. Tool environment fidelity: The accuracy of computational tools (e.g., structure prediction) directly impacts the quality of agent decision-making.
  4. Sequence space coverage: The sequence space explored by RL is limited and may miss high-quality designs far from the training distribution.
  5. Findings-only acceptance: As a Findings paper, certain aspects (e.g., experimental scale or baseline comparisons) may be less comprehensive than main-track publications.
  6. Absence of wet-lab validation: No experimental validation of generated proteins in vitro is reported.

Related Work

  1. Protein LLMs (ProtGPT2, ESM, etc.): Methods that directly generate protein sequences using LLMs; ProtoCycle instead positions the LLM as a planner.
  2. AlphaFold: A protein structure prediction tool that can serve as an evaluation component within ProtoCycle's tool environment.
  3. ReAct/OctoTools and related agent frameworks: Agent framework paradigms from NLP that ProtoCycle adapts for protein design.
  4. RLHF/Online RL: The training methodology draws on the RLHF paradigm from NLP, substituting tool feedback for human feedback.

Rating

  • Novelty: ⭐⭐⭐⭐ — Introducing the agent paradigm into protein design is an interesting cross-domain attempt; the reflective iterative design aligns well with domain intuition.
  • Experimental Thoroughness: ⭐⭐⭐ — Based on available information, experiments demonstrate method effectiveness, though full-text details are absent (ablations confirm the importance of reflection).
  • Writing Quality: ⭐⭐⭐⭐ — Problem definition is clear (plan-execute gap); framework design is intuitive.
  • Value: ⭐⭐⭐⭐ — Demonstrates the application potential of LLM agent frameworks in scientific discovery and establishes a new paradigm for protein design.
