LLMs Can Simulate Standardized Patients via Agent Coevolution
Conference: ACL 2025
arXiv: 2412.11716
Code: https://github.com/ZJUMAI/EvoPatient
Area: Medical NLP
Keywords: Standardized Patients, Multi-Agent Coevolution, Medical Education, Role-Playing, Dialogue Simulation
TL;DR
EvoPatient is a multi-agent coevolution framework in which patient and doctor agents hold autonomous simulated dialogues, so that LLMs learn to simulate standardized patients (SPs) without human supervision. It surpasses existing reasoning methods by more than 10% in requirement alignment.
Background & Motivation
Background: Standardized Patients (SPs) are critical in medical education, but training human SPs is costly and can negatively impact their mental health.
Limitations of Prior Work:
- Rule-based digital patients lack authenticity and diversity.
- Existing LLM-based SPs focus only on data retrieval or manual feedback-driven prompt adjustment, ignoring the critical need to learn standardized expression patterns.
- LLMs must play a dual role of possessing professional medical knowledge while behaving like a medical layperson, which cannot be solved by simple prompt engineering alone.
Key Challenge: How to enable LLMs to autonomously learn to transform clinical record data into human-like standardized patient responses under zero human supervision.
Goal: To design a framework that allows patient agents to automatically accumulate experience through simulated diagnostic processes, evolving from novices into qualified SPs.
Key Insight: Coevolution between doctor and patient agents—the more professional the doctor's questions become, the more standardized the patient's responses are, forming a positive feedback loop.
Core Idea: Empowering patient agents to automatically accumulate standardized expression experience through a coevolution mechanism in multi-agent simulated diagnostic processes.
Method
Overall Architecture
The input consists of real clinical records → Three core modules: (1) Simulated Flow models the diagnostic process into multi-stage dialogues → (2) Simulated Agent Pair performs autonomous multi-round dialogues (patient + multiple doctors) → (3) Coevolution Mechanism validates and stores high-quality dialogues to construct an attention library and a trajectories library → Output the trained virtual SP for human doctor training.
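
The staged flow described above can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the `Stage` class, the `max_rounds` cap, `run_flow`, and the stub agents are hypothetical names, and the stubs stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (doctor question, patient answer)


@dataclass
class Stage:
    """One phase of the simulated flow (hypothetical structure)."""
    name: str
    max_rounds: int  # assumed cap on dialogue rounds in this phase
    turns: List[Turn] = field(default_factory=list)


def run_flow(
    stages: List[Stage],
    ask: Callable[[str, List[Turn]], str],
    answer: Callable[[str, List[Turn]], str],
) -> List[Stage]:
    """Run each stage in order; every round is one question and one answer."""
    history: List[Turn] = []
    for stage in stages:
        for _ in range(stage.max_rounds):
            question = ask(stage.name, history)
            reply = answer(question, history)
            stage.turns.append((question, reply))
            history.append((question, reply))
    return stages


# Stub agents standing in for LLM calls (placeholders, not real model output).
def stub_doctor(stage_name: str, history: List[Turn]) -> str:
    return f"[{stage_name}] question {len(history) + 1}"


def stub_patient(question: str, history: List[Turn]) -> str:
    return f"answer to: {question}"


flow = run_flow(
    [Stage("chief complaint", 1), Stage("triage", 1),
     Stage("inquiry", 3), Stage("conclusion", 1)],
    stub_doctor,
    stub_patient,
)
total_turns = sum(len(stage.turns) for stage in flow)
```

Keeping each phase as its own object is one way to get the per-phase customization mentioned in the design motivation: a phase can be dropped, reordered, or given a different round budget without touching the loop.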
Key Designs
- Simulated Flow:
  - Function: Models the real diagnostic process as a structured, multi-stage dialogue sequence.
  - Mechanism: Divides the diagnostic workflow into phases such as chief complaint generation, triage, inquiry, and conclusion, \(\mathcal{F} = \langle \mathcal{S}^1, \mathcal{S}^2, \dots \rangle\), where each phase contains multi-round dialogues between doctor and patient.
  - Design Motivation: Structured workflows make the simulation resemble real-world scenarios while allowing flexible customization of different phases.
- Simulated Agent Pair:
  - Function: Enables autonomous multi-round diagnostic dialogues between the patient agent and multiple doctor agents.
  - Mechanism: The patient agent is equipped with 5,000 diverse background profiles (family, education, personality, etc.) and retrieves from clinical records via RAG. The doctor agent dynamically recruits agents from other departments through a multi-disciplinary consultation mechanism; the recruited agents form a directed acyclic graph (DAG), which prevents information backflow and increases question diversity.
  - Design Motivation: Questions from multi-department doctors cover different perspectives, prompting the patient agent to learn more comprehensive standardized expressions.
- Coevolution Mechanism:
  - Function: Automatically validates, stores, and utilizes high-quality dialogue experience.
  - Mechanism: Two experience libraries work together:
    - Attention Library: Decomposes SP requirements into question-level attention requirements and stores them as quadruples. At inference time, it retrieves similar questions to obtain refined requirements and few-shot demonstrations.
    - Trajectories Library: Stores high-quality dialogue trajectories \(t_i = \{(q_{j-1}, a_{j-1}, q_j, a_j)\}\), helping doctor agents learn more professional inquiry pathways.
  - Design Motivation: Enables both sides to learn from past experience: the patient learns standardized expressions, while the doctor learns professional questioning, forming a positive feedback loop.
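
The validate-store-retrieve cycle of the Attention Library can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's code: the `Experience` fields are hypothetical (the exact stored tuple is not spelled out here), the validation rule is a placeholder, and string similarity from the standard library's `difflib` is swapped in for whatever retrieval method the authors use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class Experience:
    # Hypothetical fields; the actual stored tuple may differ.
    question: str
    requirement: str
    answer: str


class AttentionLibrary:
    """Toy experience store: validate on write, rank by similarity on read."""

    def __init__(self, validate: Callable[[Experience], bool]):
        self.validate = validate
        self.items: List[Experience] = []

    def add(self, exp: Experience) -> bool:
        """Store only experiences that pass validation."""
        if self.validate(exp):
            self.items.append(exp)
            return True
        return False

    def retrieve(self, question: str, k: int = 2) -> List[Experience]:
        """Return the k stored experiences whose questions look most similar."""
        def score(exp: Experience) -> float:
            return SequenceMatcher(
                None, question.lower(), exp.question.lower()
            ).ratio()
        return sorted(self.items, key=score, reverse=True)[:k]


# Placeholder rule: reject answers in which the patient names a diagnosis.
def no_diagnosis(exp: Experience) -> bool:
    return "diagnosis" not in exp.answer.lower()


library = AttentionLibrary(no_diagnosis)
stored = [
    library.add(Experience("Where is the pain located?",
                           "Describe location in lay terms",
                           "It hurts on the left side of my chest.")),
    library.add(Experience("Do you have a family history of heart disease?",
                           "Answer only from the record",
                           "My father had heart problems.")),
    library.add(Experience("What do you think is wrong?",
                           "Do not reveal the diagnosis",
                           "My diagnosis is unstable angina.")),
]
top = library.retrieve("Where is the pain?", k=1)[0]
```

The retrieved `requirement` and `answer` would then be placed in the patient agent's prompt as a refined requirement and a few-shot demonstration.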
Loss & Training
Training-free: no model parameters are updated. The framework runs on GPT-4 / GPT-3.5-turbo. The coevolution process requires approximately 200 cases and reaches an optimal balance in 10 hours, demonstrating strong generalization capability.
Key Experimental Results
Main Results
| Method | Relevance | Faithfulness | Robustness | Ability (Comprehensive) |
|---|---|---|---|---|
| CoT | 0.716 | 0.557 | 0.671 | 0.648 |
| ToT | 0.747 | 0.714 | 0.771 | 0.744 |
| Self-Align | 0.721 | 0.727 | 0.815 | 0.754 |
| Few-shot | 0.725 | 0.742 | 0.821 | 0.763 |
| EvoPatient | 0.759 | 0.879 | 0.941 | 0.860 |
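
As a quick arithmetic check on the table above, the sketch below recomputes two things from the reported numbers. The Ability column appears consistent with the unweighted mean of the three metrics (an observation about the figures, not a definition taken from the paper), and the second loop computes EvoPatient's margin over the best baseline for each metric. Variable names are illustrative.

```python
# Scores copied from the Main Results table:
# (Relevance, Faithfulness, Robustness, Ability)
scores = {
    "CoT":        (0.716, 0.557, 0.671, 0.648),
    "ToT":        (0.747, 0.714, 0.771, 0.744),
    "Self-Align": (0.721, 0.727, 0.815, 0.754),
    "Few-shot":   (0.725, 0.742, 0.821, 0.763),
    "EvoPatient": (0.759, 0.879, 0.941, 0.860),
}
metrics = ("Relevance", "Faithfulness", "Robustness", "Ability")

# Check 1: does Ability match the mean of the three metrics to 3 decimals?
mean_matches = {
    method: abs((rel + faith + rob) / 3 - ability) < 1e-3
    for method, (rel, faith, rob, ability) in scores.items()
}

# Check 2: EvoPatient's margin over the best baseline, per metric.
margins = {}
for i, metric in enumerate(metrics):
    best_baseline = max(
        values[i] for method, values in scores.items() if method != "EvoPatient"
    )
    margins[metric] = round(scores["EvoPatient"][i] - best_baseline, 3)

for metric, margin in margins.items():
    print(f"{metric}: +{margin:.3f}")
```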
Ablation Study
| Configuration | Description |
|---|---|
| w/o Attention Library | Faithfulness and Robustness drop significantly |
| w/o Trajectories Library | Doctor inquiry quality degrades, indirectly affecting patient evolution |
| w/o Multi-disciplinary Consultation | Question diversity decreases, slowing down the improvement rate of SP capability |
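Key Findings
- Robustness reaches the highest score of any metric (0.941 vs. second-best 0.821, a gain of 0.120), suggesting that the coevolution mechanism effectively prevents the patient agent from leaking excessive medical information. Faithfulness shows the largest gain over the best baseline (0.879 vs. 0.742, a gain of 0.137).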
- Multi-disciplinary consultation contributes positively to the evolution of the patient agent—doctors from more disciplines ask more diverse questions.
- The evolution process stabilizes after about 200 cases, with reasonable resource consumption.
- In human doctor preference evaluation, EvoPatient achieves the highest score.
Highlights & Insights
- The coevolution intuition is compelling: the doctor and patient push each other to improve, eliminating the need for human annotation.
- The Attention Library decomposes SP requirements into question-level attention requirements, which is more practical than global requirements.
- The DAG topology design of the multi-disciplinary consultation is elegant, avoiding information backflow while ensuring question diversity.
- The framework is highly generalizable and can be transferred to other role-playing simulation scenarios such as education and law.
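
To make the DAG point above concrete, here is a minimal sketch of how an acyclic consultation graph rules out information backflow. The department names and the `has_backflow` helper are hypothetical, and the graph is not taken from the paper. It uses the standard library's `graphlib` (Python 3.9+), where each key maps to the set of nodes it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical consultation graph: each department lists the departments
# whose findings it is allowed to read (its predecessors).
consultation = {
    "cardiology": {"general medicine"},
    "neurology": {"general medicine"},
    "summary": {"cardiology", "neurology"},
}

# A valid speaking order: every department comes after the ones it reads from.
order = list(TopologicalSorter(consultation).static_order())


def has_backflow(graph: dict) -> bool:
    """Return True if the graph contains a cycle, i.e. information can flow back."""
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False


# Letting the first department read from the last one would create a cycle.
with_cycle = {**consultation, "general medicine": {"summary"}}
clean_ok = not has_backflow(consultation)
cycle_detected = has_backflow(with_cycle)
```

Because a topological order exists only for acyclic graphs, checking for one before a consultation starts is a cheap way to guarantee that no doctor agent reads output that depends on its own.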
Limitations & Future Work
- Reliance on high-quality real clinical records, which poses ethical and privacy hurdles for data acquisition.
- Evaluation metrics (Relevance/Faithfulness/Robustness) are primarily based on automatic evaluations, with a limited scale for human trials.
- Only GPT-4 and GPT-3.5 were evaluated, so the performance of open-source models has not been fully explored and running costs are high.
- Emotional expressions and non-verbal behaviors of patient agents are not yet addressed.
Related Work & Insights
- vs. Agent Hospital: Agent Hospital performs self-evolution in a simulated world but lacks training for standardized expressions; EvoPatient focuses explicitly on the standardization capabilities of SPs.
- vs. ExpeL: ExpeL accumulates experience from successful trajectories but works as a single agent, whereas EvoPatient coevolves multiple agents to improve both sides simultaneously.
- vs. Self-Align: Self-Align uses principle-driven reasoning for alignment, whereas EvoPatient accumulates experience through actual simulated dialogues.
Rating
- Novelty: ⭐⭐⭐⭐ The idea of multi-agent coevolution to train SPs is novel and practically meaningful.
- Experimental Thoroughness: ⭐⭐⭐⭐ Extensive baseline comparisons + ablation studies + human evaluations + resource analysis.
- Writing Quality: ⭐⭐⭐⭐ Clear structure with consistent formal notation.
- Value: ⭐⭐⭐⭐ A highly practical framework in the field of medical education, with transferable coevolution paradigms.