Beyond the Individual: Virtualizing Multi-Disciplinary Reasoning for Clinical Intake via Collaborative Agents

Conference: ACL 2026
arXiv: 2604.08927
Code: GitHub
Area: Medical AI / Multi-Agent Systems
Keywords: multidisciplinary team consultation, multi-agent systems, clinical intake, SOAP notes, dynamic topology

TL;DR

This paper proposes Aegle, a graph-structured multi-agent framework that virtualizes multidisciplinary team (MDT) consultation for clinical intake. By introducing decoupled parallel reasoning and dynamic topology into the outpatient interview workflow, Aegle surpasses state-of-the-art models across 53 metrics spanning 24 clinical departments.

Background & Motivation

Background: Clinical intake is a critical stage in medical decision-making, requiring physicians to transform unstructured patient narratives into SOAP-formatted Initial Progress Notes (IPN). Current LLM-assisted approaches fall into two categories: document generation (e.g., Med-PaLM 2) and interactive consultation (e.g., AMIE), both of which rely on single-model architectures.

Limitations of Prior Work: (1) Individual physicians or models under time pressure are prone to anchoring bias, over-focusing on salient symptoms while overlooking subtle cues; (2) existing interactive systems largely function as passive recipients, lacking the ability to ask proactive exclusionary questions; (3) while MDT consultation mitigates cognitive bias, it is costly and difficult to scale to routine outpatient settings.

Key Challenge: MDT-level consultation affords systematic reasoning depth, but real-time outpatient practice cannot bear its resource cost. In addition, multi-agent systems face the "flawed consensus" problem, in which agents mutually reinforce biases and suppress correct minority opinions.

Goal: To virtualize the cognitive advantages of MDT at low cost, enabling multi-perspective collaborative reasoning in real-time outpatient settings.

Key Insight: A graph-structured multi-agent architecture simulates MDT collaboration — decoupled parallel reasoning preserves hypothesis diversity, dynamic topology activates specialist agents on demand, and SOAP-structured state ensures traceable reasoning.

Core Idea: A three-tier architecture comprising an Orchestrator that dynamically activates specialist agents, specialist agents that perform decoupled parallel reasoning, and an Aggregator that integrates outputs and updates the structured clinical state — collectively virtualizing the MDT consultation process.

Method

Overall Architecture

Aegle is built on DeepSeek-V3.2 and employs a two-stage finite state machine to conduct clinical interviews. Stage I performs iterative history-taking (evidence collection), while Stage II handles diagnostic synthesis (generating diagnoses after the evidence set is frozen). Throughout the process, an incrementally updated structured clinical state \(\mathcal{S}_t = [\mathcal{F}_t, \mathcal{P}_t]\) is maintained, where \(\mathcal{F}\) corresponds to the S+O components of SOAP (factual evidence) and \(\mathcal{P}\) corresponds to the A+P components (diagnosis and plan).

Key Designs

Structured Clinical State:
- Function: Serves as a shared "blackboard" for all agents, separating evidence collection from diagnostic reasoning.
- Mechanism: The SOAP schema is formalized as \(\mathcal{S}_t = [\mathcal{F}_t, \mathcal{P}_t]\), where \(\mathcal{F}\) (Case Features) accumulates verifiable facts including demographic information, history of present illness, past medical history, and physical examination findings; \(\mathcal{P}\) (Diagnosis & Plan) is generated only after \(\mathcal{F}\) has stabilized. A strict unidirectional dependency is enforced: \(\mathcal{F} \to \mathcal{P}\).
- Design Motivation: To prevent premature commitment to immature diagnostic hypotheses and ensure that clinical conclusions remain traceable to specific evidence.
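
The one-way dependency \(\mathcal{F} \to \mathcal{P}\) can be illustrated with a small data structure. This is a minimal sketch, not the authors' implementation: the class names, field names, and the `freeze` / `set_plan` methods are hypothetical, and the example facts are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CaseFeatures:
    """F: verifiable facts (the S+O parts of SOAP). Field names are assumptions."""
    demographics: list = field(default_factory=list)
    present_illness: list = field(default_factory=list)
    past_history: list = field(default_factory=list)
    physical_exam: list = field(default_factory=list)

    def all_facts(self):
        return (self.demographics + self.present_illness
                + self.past_history + self.physical_exam)


@dataclass
class DiagnosisPlan:
    """P: assessment and plan (the A+P parts of SOAP)."""
    diagnoses: list
    plan: list
    evidence: list  # facts from F that the conclusions are traced back to


class ClinicalState:
    """S_t = [F_t, P_t]; P can only be written once F is frozen."""

    def __init__(self):
        self.features = CaseFeatures()
        self.plan = None
        self.frozen = False

    def add_fact(self, section, fact):
        if self.frozen:
            raise RuntimeError("F is frozen; no new evidence can be added")
        getattr(self.features, section).append(fact)

    def freeze(self):
        self.frozen = True

    def set_plan(self, diagnoses, plan, evidence):
        if not self.frozen:
            raise RuntimeError("P may only be written after F is frozen")
        # Traceability check: every cited fact must already be recorded in F.
        missing = [e for e in evidence if e not in self.features.all_facts()]
        if missing:
            raise ValueError(f"conclusions cite facts not in F: {missing}")
        self.plan = DiagnosisPlan(diagnoses, plan, evidence)
```

Writing \(\mathcal{P}\) before the freeze raises an error, which is one simple way to encode the "no premature diagnosis" constraint described above.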

Dynamic Multi-Agent Graph Topology:
- Function: Activates specialist agents on demand, avoiding unnecessary expert involvement.
- Mechanism: Three node types collaborate — the Orchestrator applies a routing policy \(\pi_{orch}\) to select an active specialist subset \(A_{sub}\) based on dialogue history and current evidence; Specialist Agents independently and in parallel analyze the case (decoupled reasoning); the Aggregator follows a "write-before-speak" protocol to integrate specialist recommendations, update the state \(\mathcal{S}_{t+1}\), and then generate patient-facing dialogue.
- Design Motivation: Decoupled parallel reasoning preserves hypothesis diversity (avoiding groupthink), while dynamic activation mirrors the on-demand expert recruitment characteristic of real MDT settings.
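
One round of the Orchestrator, Specialist, and Aggregator interaction can be sketched as follows. This is an illustration under stated assumptions: the keyword routing table stands in for \(\pi_{orch}\), the specialist functions are plain stubs rather than LLM calls, and every name and question string is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical specialists: each sees only the shared evidence snapshot,
# never another specialist's output (decoupled reasoning).
def cardiology(evidence):
    return "Does the pain worsen with exertion?" if "chest pain" in evidence else None


def gastroenterology(evidence):
    return "Is the pain related to meals?" if "chest pain" in evidence else None


def neurology(evidence):
    return "Any weakness or numbness?" if "headache" in evidence else None


SPECIALISTS = {
    "cardiology": cardiology,
    "gastroenterology": gastroenterology,
    "neurology": neurology,
}

# Assumed keyword routing; a stand-in for the routing policy pi_orch.
ROUTES = {
    "chest pain": ["cardiology", "gastroenterology"],
    "headache": ["neurology"],
}


def orchestrate(evidence):
    """Select the active specialist subset A_sub from the current evidence."""
    active = []
    for fact in evidence:
        for name in ROUTES.get(fact, []):
            if name not in active:
                active.append(name)
    return active


def run_specialists(active, evidence):
    """Run the active specialists in parallel on an immutable snapshot."""
    snapshot = tuple(evidence)
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(SPECIALISTS[name], snapshot) for name in active}
        return {name: fut.result() for name, fut in futures.items()}


def aggregate(state, suggestions):
    """Write-before-speak: update the state first, then derive the utterance from it."""
    state["pending_questions"] = [q for q in suggestions.values() if q]
    state["turn"] += 1
    return " ".join(state["pending_questions"])


evidence = ["chest pain"]
state = {"turn": 0, "pending_questions": []}
active = orchestrate(evidence)
utterance = aggregate(state, run_specialists(active, evidence))
```

Only the specialists selected by `orchestrate` are invoked, and because each receives the same frozen snapshot, no specialist can be influenced by another's suggestion within a round.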

Sequential Clinical Execution:
- Function: Strictly separates evidence acquisition from diagnostic reasoning, serving as an explicit bias control mechanism.
- Mechanism: In Stage I, the Orchestrator iteratively activates specialist agents to propose follow-up questions; the Aggregator integrates these suggestions to generate the next round of inquiry. Once sufficient evidence is gathered, the process transitions to Stage II, where \(\mathcal{F}\) is frozen and \(\mathcal{P}\) (diagnosis and treatment plan) is generated from the complete evidence set.
- Design Motivation: To prevent premature commitment to a diagnostic direction under incomplete evidence, emulating the real MDT practice of thoroughly discussing clinical findings before reaching consensus.
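
The two-stage execution can be sketched as a small finite state machine. The source only says the transition happens "once sufficient evidence is gathered"; the stopping rule used here (no further question proposed, or a turn cap) is an assumption, as are all function and parameter names.

```python
from enum import Enum


class Stage(Enum):
    HISTORY_TAKING = 1        # Stage I: evidence collection
    DIAGNOSTIC_SYNTHESIS = 2  # Stage II: diagnosis from frozen evidence
    DONE = 3


def run_intake(ask_patient, propose_question, synthesize, max_turns=10):
    """Drive the interview; all three callables are hypothetical stand-ins."""
    stage = Stage.HISTORY_TAKING
    evidence = []
    plan = None
    turns = 0
    while stage is not Stage.DONE:
        if stage is Stage.HISTORY_TAKING:
            question = propose_question(evidence)
            # Assumed stopping rule: nothing left to ask, or turn cap reached.
            if question is None or turns >= max_turns:
                evidence = tuple(evidence)  # freeze F before any diagnosis
                stage = Stage.DIAGNOSTIC_SYNTHESIS
            else:
                evidence.append((question, ask_patient(question)))
                turns += 1
        else:
            plan = synthesize(evidence)  # P is generated from frozen F only
            stage = Stage.DONE
    return evidence, plan


# Usage with scripted stubs in place of a patient and LLM agents.
script = {"Where is the pain?": "In the chest", "Since when?": "Two days"}


def propose(evidence):
    asked = {q for q, _ in evidence}
    remaining = [q for q in script if q not in asked]
    return remaining[0] if remaining else None


evidence, plan = run_intake(script.get, propose, lambda ev: f"{len(ev)} findings reviewed")
```

The diagnosis function is never called while the machine is in Stage I, so no conclusion can be formed on partial evidence.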

Loss & Training

Aegle is an inference-time framework rather than a trained model. It leverages the zero-shot capabilities of DeepSeek-V3.2 through structured prompting and role assignment. No additional training is required.

Key Experimental Results

Main Results

| Dataset | Metric | Aegle | DeepSeek-V3.2 | GPT-4o | Gain (vs. DeepSeek-V3.2) |
|---|---|---|---|---|---|
| ClinicalBench | IDEA | 63.80 | 50.51 | 41.05 | +13.3 |
| ClinicalBench | SOAP | 53.42 | 38.64 | 29.38 | +14.8 |
| ClinicalBench | READ | 76.20 | 71.73 | 67.66 | +4.5 |
| RAPID-IPN | IDEA | 67.31 | 54.35 | 44.70 | +13.0 |
| RAPID-IPN | SOAP | 60.09 | 47.39 | 34.79 | +12.7 |
| RAPID-IPN | READ | 80.18 | 72.14 | 69.89 | +8.0 |

Evaluation covers 24 clinical departments and 53 fine-grained metrics.
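
The Gain column matches the difference between Aegle and its own backbone (DeepSeek-V3.2), rounded to one decimal place. The short check below recomputes it from the reported scores; the dictionary layout and variable names are illustrative only.

```python
# (Aegle score, DeepSeek-V3.2 score) as reported in the main results.
results = {
    ("ClinicalBench", "IDEA"): (63.80, 50.51),
    ("ClinicalBench", "SOAP"): (53.42, 38.64),
    ("ClinicalBench", "READ"): (76.20, 71.73),
    ("RAPID-IPN", "IDEA"): (67.31, 54.35),
    ("RAPID-IPN", "SOAP"): (60.09, 47.39),
    ("RAPID-IPN", "READ"): (80.18, 72.14),
}

# Absolute gain in points over the same-backbone single-agent baseline.
gains = {key: round(aegle - base, 1) for key, (aegle, base) in results.items()}

for (dataset, metric), gain in gains.items():
    print(f"{dataset:14s} {metric:5s} +{gain:.1f}")
```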

Ablation Study

| Configuration | IDEA | SOAP | Notes |
|---|---|---|---|
| Aegle (full) | 63.80 | 53.42 | Complete framework |
| Single agent (DeepSeek-V3.2) | 50.51 | 38.64 | No MDT collaboration |
| MiniMax-M2 | 57.78 | 46.18 | Strongest single-model baseline |

Key Findings

- Aegle outperforms all baselines across all 53 metrics, including closed-source models such as GPT-4o and Gemini 2.5.
- With the same backbone (DeepSeek-V3.2), the multi-agent framework yields a +13.3-point gain on IDEA on ClinicalBench (63.80 vs. 50.51), which isolates the contribution of the collaborative architecture from that of the underlying model.
- The gains over the backbone carry over to the real-world clinical dataset RAPID-IPN (+13.0 IDEA, +12.7 SOAP) and are larger there on READ (+8.0 vs. +4.5 on ClinicalBench), suggesting that the framework generalizes to realistic settings.

Highlights & Insights

- Decoupled Parallel Reasoning: Each specialist agent analyzes the case independently, which guards against the "flawed consensus" problem and offers a more controllable alternative to debate-based multi-agent systems. The design is transferable to other domains that need multi-perspective analysis, such as legal review or financial risk assessment.
- SOAP-Structured State as Shared Blackboard: The framework turns a clinical documentation standard into a reasoning control mechanism, so that SOAP serves not only as a recording format but also as a bias control instrument. This "structure as constraint" pattern may be reusable wherever a domain already has a standard record format.
- Write-Before-Speak Protocol: The Aggregator updates the internal state before generating patient-facing dialogue, which keeps the technical record separate from patient communication. This separation matters for the deployability of medical AI systems.

Limitations & Future Work

- The framework relies entirely on the zero-shot capabilities of DeepSeek-V3.2; fine-tuning for clinical scenarios remains unexplored.
- Multi-agent invocation substantially increases inference cost (multiplicative API call overhead), so latency and cost need careful consideration in real deployments.
- Evaluation is primarily based on Chinese clinical data; cross-lingual and cross-cultural generalizability remains to be validated.
- Integration of multimodal information, including imaging and laboratory test results, is not addressed.

Comparison with Related Work

- vs. AMIE: AMIE is a single-model interactive consultation system susceptible to anchoring bias; Aegle expands the hypothesis space through multi-agent parallel reasoning.
- vs. MDAgents: MDAgents adapts topology based on task complexity, but agent interactions remain opaque; Aegle explicitly constrains the reasoning chain via SOAP-structured state.
- vs. MedAgents: MedAgents employs debate-style collaboration, which risks producing flawed consensus; Aegle's decoupled parallel reasoning with independent analysis prevents inter-agent interference.

Rating

- Novelty: ⭐⭐⭐⭐ The framework design for virtualizing MDT is novel; the formalization of SOAP-structured state as a reasoning mechanism is creative.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Evaluation spans 24 departments, 53 metrics, two datasets (ClinicalBench and the real-world RAPID-IPN), and multiple SOTA baselines.
- Writing Quality: ⭐⭐⭐⭐ The framework is clearly described, though the formal notation is heavy in places and could be simplified.