Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement
Conference: ACL 2025
arXiv: 2410.04444
Code: https://github.com/Arvid-pku/Godel_Agent
Area: LLM Agent
Keywords: Self-Referential Agent, Recursive Self-Improvement, Monkey Patching, Meta-Learning, Agent Design Space Search
TL;DR
Introduces Gödel Agent, a self-referential agent framework inspired by the Gödel Machine, which can read and modify its own code (including modifying its own modification logic) at runtime via Python monkey patching to achieve recursive self-improvement. It outperforms hand-crafted and meta-learning-optimized agents on DROP, MGSM, MMLU, and GPQA.
Background & Motivation
Background: LLM agent systems are divided into two categories—hand-crafted agents with fixed workflows (e.g., CoT, Self-Refine, LLM Debate) and meta-learning optimized agents (e.g., Meta Agent Search, DSPy). The former relies entirely on human priors, while the latter can automatically optimize policies but employs a fixed meta-learning algorithm.
Limitations of Prior Work: (1) Hand-crafted agents cannot evolve once deployed; (2) Meta-learning agents are constrained by fixed meta-algorithms, leaving search spaces artificially restricted—for instance, only optimizing prompts or only collecting demonstrations; (3) Neither can search the complete agent design space.
Key Challenge: The agent's "optimization method" is itself an object that can be optimized. If the optimization algorithm is fixed, it remains impossible to discover better optimization algorithms.
Goal: Can an agent autonomously determine its own operational logic, module design, and even modify its own optimization methods?
Key Insight: The design draws on Schmidhuber's Gödel Machine theory: a system that can modify any part of itself, including its own modification logic, can in theory find the global optimum.
Core Idea: Implement complete self-reference: the agent can read all of its own code, modify any part of it (including decision functions and execution functions), and then recursively continue to improve itself.
Method
Overall Architecture
Gödel Agent = initial policy \(\pi_0\) + self-referential learning algorithm \(I_0\). The update is formalized as \(\pi_{t+1}, I_{t+1} = I_t(\pi_t, I_t, r_t, g)\), where \(r_t\) is the environment feedback at step \(t\) and \(g\) is the high-level goal. Because \(I_t\) takes itself as an input and returns \(I_{t+1}\), the optimization algorithm can modify not only \(\pi\) but also itself. The main logic is implemented with recursive functions (rather than loops), and code is modified dynamically at runtime via monkey patching.
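The recurrence above can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the names `pi_0`, `I_0`, `I_1`, `utility`, and `self_improve`, the two-item validation set, and the hard-coded "improvement" that stands in for an LLM-generated one. It is not the authors' implementation; it only shows how an improver can return both a new policy and a replacement for itself.

```python
from typing import Callable

# Hypothetical type: a policy maps a question string to an answer string.
Policy = Callable[[str], str]


def utility(policy: Policy) -> float:
    """Toy stand-in for U(E, pi): accuracy on a tiny validation set."""
    data = [("1+1", "2"), ("2+2", "4")]
    return sum(policy(q) == a for q, a in data) / len(data)


def pi_0(question: str) -> str:
    """Initial policy: always answers "2" (deliberately weak)."""
    return "2"


def I_0(policy, improver, reward, goal):
    """Initial improver: returns a better policy AND a replacement for itself.
    In the real framework an LLM would propose both; here they are hard-coded."""

    def better_policy(question: str) -> str:
        return str(sum(int(x) for x in question.split("+")))

    def I_1(policy, improver, reward, goal):
        # The rewritten improver simply keeps what it is given.
        return policy, improver

    return better_policy, I_1


def self_improve(policy, improver, goal, depth=0, max_depth=3):
    """Recursive form of pi_{t+1}, I_{t+1} = I_t(pi_t, I_t, r_t, g)."""
    if depth == max_depth:
        return policy, improver
    reward = utility(policy)
    new_policy, new_improver = improver(policy, improver, reward, goal)
    # The next level runs with the updated improver, not the original one.
    return self_improve(new_policy, new_improver, goal, depth + 1, max_depth)


final_policy, final_improver = self_improve(pi_0, I_0, goal="maximize accuracy")
print(utility(pi_0), utility(final_policy), final_improver.__name__)
```

The point of the sketch is the last line of `self_improve`: the recursive call receives `new_improver`, so any change the improver makes to itself governs the next step.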
Key Designs
- Self-Awareness:
  - Function: The agent reads all of its own code by inspecting the Python runtime memory (local/global variables)
  - Mechanism: The `SELF_INSPECT()` function extracts all variables, functions, and class definitions of the agent
  - Design Motivation: The prerequisite of self-reference is knowing one's current state, including the code logic
- Self-Modification:
  - Function: The agent generates new code using LLMs and writes it into runtime memory via monkey patching
  - Mechanism: The `self_update` action prompts the LLM to analyze the current policy and feedback, generate modified code, and dynamically replace running functions/classes
  - Design Motivation: Monkey patching allows replacing any function without restarting the program, achieving "modifying oneself while running"
- Recursive Self-Improvement:
  - Function: The main function is implemented using recursion instead of loops, with each deeper recursion applying the latest modifications
  - Mechanism: The `SELF_IMPROVE` function recursively calls itself; within each recursion layer, the agent can modify the `SELF_IMPROVE` function itself, which will be utilized by the next recursion layer
  - Design Motivation: The recursive structure allows modifications of the modification logic to take effect in the subsequent call, which is key to "recursive self-improvement"
- Four Initial Actions:
  - `self_inspect`: Inspects its own code
  - `interact`: Interacts with the environment to obtain feedback \(r = U(\mathcal{E}, \pi)\)
  - `self_update`: Modifies its own code
  - `continue_improve`: Recursively calls self-improvement
  - The action set itself can also be modified and expanded by the agent
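The self-inspection, self-update, and recursion steps described above can be sketched with plain Python monkey patching. This is a minimal illustration under stated assumptions, not the repository's code: the `Agent` class, its method names, and the hard-coded source strings `NEW_SOLVE` and `NEW_SELF_IMPROVE` (standing in for LLM-generated code) are all hypothetical.

```python
class Agent:
    """Toy agent whose methods can be replaced while it is running."""

    def __init__(self):
        # Source text of patched methods, kept so the agent can "read itself".
        self.sources = {}

    def solve(self, question):
        return "unknown"

    def self_inspect(self):
        """Return the names of the agent's current public methods."""
        return sorted(
            name for name in dir(self)
            if not name.startswith("_") and callable(getattr(self, name))
        )

    def self_update(self, name, source):
        """Compile `source` (which must define a function called `name`)
        and monkey patch it onto the class, replacing any existing method."""
        namespace = {}
        exec(source, namespace)  # runs generated code: unsafe outside a sandbox
        setattr(type(self), name, namespace[name])
        self.sources[name] = source

    def self_improve(self, proposals, depth=0):
        """Apply one proposal, then recurse. The recursive call is looked up
        on the class at call time, so a patched self_improve takes over."""
        if not proposals:
            return depth
        name, source = proposals[0]
        self.self_update(name, source)
        return self.self_improve(proposals[1:], depth + 1)


# Hypothetical "LLM output": a better policy and a rewritten improvement routine.
NEW_SOLVE = '''
def solve(self, question):
    return str(sum(int(x) for x in question.split("+")))
'''

NEW_SELF_IMPROVE = '''
def self_improve(self, proposals, depth=0):
    # Patched routine: same logic, but it also counts its own invocations.
    self.patched_calls = getattr(self, "patched_calls", 0) + 1
    if not proposals:
        return depth
    name, source = proposals[0]
    self.self_update(name, source)
    return self.self_improve(proposals[1:], depth + 1)
'''

agent = Agent()
depth = agent.self_improve([("solve", NEW_SOLVE), ("self_improve", NEW_SELF_IMPROVE)])
print(agent.self_inspect(), agent.solve("2+3"), depth, agent.patched_calls)
```

The second proposal replaces `self_improve` while the original version is still executing; the very next recursive call already runs the patched version, which is the effect the recursive design is meant to achieve.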
Auxiliary Designs
- Think Before Acting: Allows the agent to reason before taking action, enhancing decision quality
- Error Handling: Instead of terminating when a code modification fails, the execution proceeds to the next iteration along with the error traceback
- Extra Tools: Executing Python/Bash code, calling LLM APIs
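The error-handling idea above can be sketched as follows. The helper names `try_apply` and `improve_with_retries` and the fixed list of candidate patches are assumptions made for illustration; in the actual framework the traceback would be passed back to the LLM, which would then propose the next modification.

```python
import traceback


def try_apply(source, namespace):
    """Execute generated code; return (ok, feedback) instead of raising."""
    try:
        exec(source, namespace)  # runs generated code: unsafe outside a sandbox
        return True, "ok"
    except Exception:
        # Keep the full traceback so it can be fed into the next iteration.
        return False, traceback.format_exc()


def improve_with_retries(candidates, namespace):
    """Try candidate patches in order, recording the feedback from each attempt.
    A failure does not stop the loop; it only becomes feedback."""
    history = []
    for source in candidates:
        ok, feedback = try_apply(source, namespace)
        history.append(feedback)
        if ok:
            break
    return history


# Hypothetical sequence of generated patches: two broken, one valid.
candidates = [
    "def f(:\n    pass",        # syntax error
    "x = 1 / 0",                # runtime error
    "def f():\n    return 42",  # valid patch
]
namespace = {}
history = improve_with_retries(candidates, namespace)
print(len(history), history[-1], namespace["f"]())
```

Catching `Exception` around `exec` covers both compile-time failures (such as `SyntaxError`) and runtime failures, so a single bad patch cannot end the improvement cycle.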
Loss & Training
- No traditional training—the agent iteratively self-improves on a validation set
- 6 independent cycles per task, with a maximum of 30 iterations per cycle
- The initial policy is CoT, and GPT-3.5 is used for all tests
Key Experimental Results
Main Results
| Method | DROP (F1) | MGSM (Acc) | MMLU (Acc) | GPQA (Acc) |
|---|---|---|---|---|
| CoT | 64.2 | 28.0 | 65.4 | 29.2 |
| CoT-SC | 64.4 | 28.2 | 65.9 | 30.5 |
| Self-Refine | 59.2 | 27.5 | 63.5 | 31.6 |
| LLM Debate | 60.6 | 39.0 | 65.6 | 31.4 |
| Meta Agent Search | 79.4 | 53.4 | 69.6 | 34.6 |
| Gödel-base (GPT-3.5) | 80.9 | 64.2 | 70.9 | 34.9 |
| Gödel-free (Unconstrained) | 90.5 | 90.6 | 87.9 | 55.7 |
Ablation Study (MGSM)
| Configuration | Accuracy |
|---|---|
| Full | 64.2 |
| w/o think | 50.8 (-13.4) |
| w/o error handling | 49.4 (-14.8) |
| w/o code running | 57.1 (-7.1) |
| w/o LLM calling | 60.4 (-3.8) |
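As a quick arithmetic check, the drops in parentheses can be recomputed from the accuracies in the table. The variable names are illustrative; the numbers are copied from the table.

```python
# Accuracies copied from the MGSM ablation table.
full = 64.2
ablations = {
    "w/o think": 50.8,
    "w/o error handling": 49.4,
    "w/o code running": 57.1,
    "w/o LLM calling": 60.4,
}

# Drop relative to the full configuration, rounded to one decimal place.
drops = {name: round(full - acc, 1) for name, acc in ablations.items()}
largest = max(drops, key=drops.get)

for name, drop in drops.items():
    print(f"{name}: -{drop}")
print("largest drop:", largest)
```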
Key Findings
- Outperforms Meta Agent Search by 10.8 percentage points on MGSM (64.2 vs 53.4), suggesting mathematical reasoning tasks offer larger room for self-improvement
- Performance rises sharply in unconstrained mode: The agent spontaneously requests GPT-4o assistance, raising the GPQA score from 34.9 to 55.7
- Error Handling is extremely critical: Performance drops by 14.8 points when removed—since LLM-generated code often contains errors, fault-tolerance mechanisms are essential for sustained optimization
- Only 14% of the optimization trials ultimately failed: Although 92% of the trials showed temporary performance degradation along the way, most recovered and eventually outperformed the initial policy
- Game of 24 Case Study: The agent autonomously switched from LLM reasoning to search algorithms, achieving 100% accuracy—completely breaking free from the initial methodology
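To make the Game of 24 finding concrete, here is one way an exhaustive search for the puzzle can look. This is a generic brute-force sketch, not the code the agent actually produced; the function names `solve_24` and `_search` are hypothetical. Exact rational arithmetic via `fractions.Fraction` avoids floating-point errors in division.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Return an expression string that evaluates to `target`, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: one value left, check whether it hits the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two values, combine them with every operator, and recurse.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea}+{eb})"),
            (a * b, f"({ea}*{eb})"),
            (a - b, f"({ea}-{eb})"),
            (b - a, f"({eb}-{ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb}/{ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


print(solve_24([4, 7, 8, 8]))
print(solve_24([1, 1, 1, 1]))
```

Because the search enumerates every way of combining the four numbers, it finds a solution whenever one exists, which is consistent with the perfect accuracy reported for the search-based strategy.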
Highlights & Insights
- "The capacity to modify the modification of itself" (true meta-recursion): Unlike meta-learning, which only optimizes policies, Gödel Agent can modify the optimizer itself, which in theory lets it keep moving toward the global optimum
- Clever Utilization of Monkey Patching: Elegantly implements the abstract concept of "self-modification" using Python runtime features, rendering it simple and viable from an engineering perspective
- Insights from the Unconstrained Mode: The agent autonomously deciding to call stronger models is a "smart" strategy, hinting that future agents need to autonomously manage computing resources
Limitations & Future Work
- Constrained by the code generation capability of current LLMs—the agent struggles to invent entirely new algorithms that surpass SOTA (e.g., initialized from CoT, it cannot independently invent ToT)
- 4% of the trials terminated unexpectedly (usually because modifying the recursive self-improvement module itself prevented continuation)
- Performance gains in the unconstrained mode mainly stem from calling stronger LLMs rather than algorithmic innovation
- Safety Issues: Self-modifying agents may introduce unpredictable behaviors
- Tested only on GPT-3.5; the self-improvement headroom on stronger base models remains unknown
Related Work & Insights
- vs Meta Agent Search: Meta Agent Search optimizes agent modules using a fixed meta-search algorithm. Gödel Agent can modify even the meta-search algorithm itself, yielding a larger search space
- vs Self-Refine/Reflexion: They can only modify outputs, but not their own reasoning logic
- vs Gödel Machine (Schmidhuber 2003): A theoretical pioneer, but the original version requires formal mathematical proofs to carry out self-modification. Gödel Agent substitutes formal proofs with the heuristic capabilities of LLMs
Rating
- Novelty: ⭐⭐⭐⭐⭐ The first fully self-referential LLM agent framework, built on a genuinely novel concept
- Experimental Thoroughness: ⭐⭐⭐⭐ 4 benchmarks + detailed ablation studies + Game of 24 case analysis
- Writing Quality: ⭐⭐⭐⭐⭐ Clear theoretical formalization and accurate analogies to the Gödel Machine
- Value: ⭐⭐⭐⭐⭐ Opens up a completely new direction for agent self-improvement