👥 Multi-Agent

🔬 ICLR2026 · 15 paper notes

📌 Same area in other venues: 🧪 ICML2026 (15) · 💬 ACL2026 (39) · 🤖 AAAI2026 (26) · 🧠 NeurIPS2025 (17)

🔥 Top topics: Agents ×13 · LLM ×4

AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems

This paper proposes AgentTrace, a framework that constructs causal graphs from the execution logs of multi-agent systems and localizes root-cause nodes via backward tracing combined with lightweight feature-based ranking (a weighted linear combination of five feature groups). On 550 synthetic fault scenarios, AgentTrace achieves a Hit@1 of 94.9% with a latency of 0.12 seconds, 69× faster than LLM-based analysis.
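
As a rough illustration of the two steps named above, the sketch below assumes that the causal graph is given as a mapping from each node to its direct causes and that every candidate node already has a feature vector. The node names, the two features, and their weights are hypothetical placeholders. They are not the paper's five feature groups or its weights.

```python
from collections import deque

def backward_trace(parents, failure_node):
    """Collect every ancestor of the failure node by walking cause edges backwards (BFS)."""
    seen = set()
    queue = deque([failure_node])
    while queue:
        node = queue.popleft()
        for cause in parents.get(node, []):
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen

def rank_candidates(candidates, features, weights):
    """Score each candidate with a weighted linear combination of its features, highest first."""
    def score(node):
        return sum(weights[name] * value for name, value in features[node].items())
    return sorted(candidates, key=score, reverse=True)

# Hypothetical causal graph built from an execution log: node -> list of direct causes
parents = {
    "failure": ["tester"],
    "tester": ["coder"],
    "coder": ["planner", "retriever"],
}
# Hypothetical per-node features and weights (illustrative only)
features = {
    "tester": {"error_signal": 0.2, "depth": 0.1},
    "coder": {"error_signal": 0.9, "depth": 0.5},
    "planner": {"error_signal": 0.1, "depth": 1.0},
    "retriever": {"error_signal": 0.4, "depth": 1.0},
}
weights = {"error_signal": 1.0, "depth": 0.3}

candidates = backward_trace(parents, "failure")
ranking = rank_candidates(candidates, features, weights)
print(ranking[0])  # coder
```

Hit@1 then asks whether `ranking[0]` is the true root cause. Because the scoring is a single weighted sum per node, ranking costs almost nothing compared with prompting an LLM over the full log.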

Auditing Cascading Risks in Multi-Agent Systems via Semantic–Geometric Co-evolution

This paper proposes SCCAL, a framework that models semantic–geometric co-evolution in multi-agent systems (MAS) by coupling semantic flow with the Ollivier–Ricci curvature (ORC) of interaction graphs. The joint prediction residual between the two modalities serves as an early warning signal for cascading risks, enabling anomaly detection several rounds before semantic violations become observable.

Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevance Assessment

This paper proposes DREAM, a multi-agent, multi-round debate framework for IR relevance annotation in which agents are initialized with opposing stances. Cases that reach consensus are labeled automatically, while disagreements are escalated to human annotators, who are aided by the debate history. DREAM achieves 95.2% balanced accuracy while escalating only 3.5% of cases to humans. Using this framework, the authors construct the BRIDGE benchmark, which uncovers 29,824 relevant annotations missing from existing benchmarks (equivalent to 428% of the original annotation count). These additions correct ranking bias in the evaluation of retrieval systems, as well as the misalignment between retrieval and generation performance in RAG evaluation.
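
The consensus-or-escalate rule can be sketched as follows. The sketch assumes that each agent ends the debate with a binary relevance label and that "consensus" means unanimous agreement. The LLM debate rounds themselves are abstracted away, and `route`, `Decision`, and the toy labels are hypothetical. Balanced accuracy is included because it is the metric quoted above. For binary labels it is the mean of the per-class recalls.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Decision:
    label: Optional[int]  # auto-assigned relevance label, or None when escalated
    escalated: bool
    history: List[str] = field(default_factory=list)  # debate transcript shown to the human

def route(final_labels, history):
    """Auto-label on unanimous agreement; otherwise escalate with the debate history."""
    if len(set(final_labels)) == 1:
        return Decision(label=final_labels[0], escalated=False)
    return Decision(label=None, escalated=True, history=list(history))

def escalation_rate(decisions):
    """Fraction of items sent to human annotators."""
    return sum(d.escalated for d in decisions) / len(decisions)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = relevant, 0 = not relevant)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = sum(t == 0 for t in y_true)
    return 0.5 * (tp / pos + tn / neg)

decisions = [
    route([1, 1], ["A: relevant", "B: relevant"]),
    route([0, 0], ["A: not relevant", "B: not relevant"]),
    route([1, 0], ["A: relevant", "B: not relevant"]),
    route([1, 1], ["A: relevant", "B: relevant"]),
]
print(escalation_rate(decisions))  # 0.25
print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
```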

HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatre

This paper proposes HAMLET, a multi-agent framework that decouples AI theatrical creation and live performance into an offline planning phase and an online performance phase. Through a narrative blueprint, a Perceive And Decide (PAD) module, and a hierarchical control system, HAMLET enables an AI theatre experience characterized by proactivity, physical environment interaction, and improvisational freedom.

KVComm: Enabling Efficient LLM Communication through Selective KV Sharing

This paper proposes KVComm, a framework that enables efficient inter-LLM communication via selective KV pair sharing. It identifies an "information concentration bias" in hidden states that renders them unsuitable for cross-model transfer, and designs a layer selection strategy combining attention importance scores with a Gaussian prior. Transmitting the KV pairs of only 30% of the layers suffices to outperform most baselines.
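
Here is a minimal sketch of one way attention importance and a Gaussian prior could be combined to choose the 30% of layers to transmit. The multiplicative combination, the prior's center `mu` and width `sigma`, and the toy importance scores are all assumptions made for illustration. The paper's exact scoring rule may differ.

```python
import math

def select_layers(attn_importance, mu, sigma, ratio=0.3):
    """Pick which layers' KV pairs to share (hypothetical scoring rule).

    Each layer's attention-importance score is multiplied by a Gaussian prior
    over layer depth centered at `mu`; the top `ratio` fraction of layers is kept.
    """
    num_layers = len(attn_importance)
    scores = [
        importance * math.exp(-((layer - mu) ** 2) / (2 * sigma ** 2))
        for layer, importance in enumerate(attn_importance)
    ]
    k = max(1, round(ratio * num_layers))  # e.g. 3 of 10 layers at ratio=0.3
    top = sorted(range(num_layers), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)  # layer indices in depth order

attn = [0.1, 0.2, 0.9, 0.8, 0.7, 0.6, 0.5, 0.95, 0.3, 0.2]
print(select_layers(attn, mu=4.5, sigma=2.0))  # [3, 4, 5]
```

In this toy run, layer 7 has the highest raw importance. It falls outside the prior's high-mass region, however, so the middle layers 3 to 5 are selected instead.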

LH-Deception: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions

This paper proposes LH-Deception, the first simulation framework for LLM deceptive behaviors in long-horizon interactions. It adopts a three-role multi-agent architecture comprising a performer, a supervisor, and a deception auditor, combined with a probabilistic event system grounded in social-science theory. Across 11 frontier models, the framework systematically quantifies deception frequency, severity, type distribution, and trust erosion effects, revealing an emergent "chain of deception" phenomenon that static, single-turn evaluations cannot capture.

MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design

This paper proposes MAC-AMP, the first closed-loop multi-agent collaboration system that reformulates antimicrobial peptide (AMP) design as a coordinated multi-agent optimization problem, achieving multi-objective optimization through AI-simulated peer review and adaptive reward design.

MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning

This paper proposes MMedAgent-RL, a multi-agent system that simulates clinical consultation workflows (triage → specialist → attending physician) optimized via reinforcement learning. The core innovation is Curriculum-guided Multi-Agent Reinforcement Learning (C-MARL) with entropy-aware exploration, enabling the attending physician agent to adopt differentiated explore–exploit strategies when faced with correct, conflicting, or erroneous specialist opinions. The system achieves state-of-the-art performance on 5 medical VQA benchmarks spanning both in-domain and out-of-domain settings.

Multi-agent Coordination via Flow Matching

This paper proposes MAC-Flow, which first learns a centralized joint behavior distribution via Flow Matching, then distills it into decentralized single-step policies through IGM (Individual-Global-Max) decomposition combined with Q-value maximization for behavior-regularized training. Evaluated across 4 benchmarks, 12 environments, and 34 datasets, MAC-Flow achieves approximately 14.5× inference speedup over diffusion-based methods while maintaining coordination performance comparable to diffusion policies.
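
For readers unfamiliar with flow matching, here is a one-dimensional toy sketch of the generic conditional flow-matching objective with straight-line interpolation paths. Draw noise x0 and a data sample x1, form x_t = (1 - t)·x0 + t·x1, and regress a velocity field onto x1 - x0. This is a textbook formulation assumed for illustration, not MAC-Flow's code. The toy dataset (every action equals `C`) and its closed-form velocity field stand in for a trained network. The IGM-based distillation and the Q-value term are not shown.

```python
import random

def fm_loss(velocity_fn, data, rng):
    """Monte Carlo estimate of the conditional flow-matching loss on straight-line paths."""
    total = 0.0
    for x1 in data:
        x0 = rng.gauss(0.0, 1.0)   # noise sample
        t = rng.random()            # time drawn uniformly from [0, 1)
        xt = (1 - t) * x0 + t * x1  # point on the straight path from x0 to x1
        target = x1 - x0            # constant velocity of that path
        total += (velocity_fn(xt, t) - target) ** 2
    return total / len(data)

def sample(velocity_fn, x0, steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += dt * velocity_fn(x, i * dt)
    return x

C = 2.0  # toy "dataset": every action equals C

def exact_velocity(x, t):
    """Exact marginal velocity field when all data sits at C (valid for t < 1)."""
    return (C - x) / (1 - t)

rng = random.Random(0)
print(fm_loss(exact_velocity, [C] * 100, rng))                  # ~0 (up to rounding)
print(sample(exact_velocity, x0=rng.gauss(0.0, 1.0), steps=1))  # ~2.0 after one step
```

Because the paths are perfectly straight in this toy case, a single Euler step already lands on `C`. This illustrates why straighter transport paths make few-step sampling cheap compared with iterative diffusion sampling.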

Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies

This paper proposes Multi-Agent System Search (MASS), a framework that automatically discovers high-performance multi-agent system (MAS) designs through a three-stage interleaved strategy of prompt and topology optimization: local prompt optimization → topology search → global prompt optimization.

Stochastic Self-Organization in Multi-Agent Systems

This paper proposes SelfOrg, a framework that dynamically constructs directed acyclic communication graphs (DAGs) based on the semantic similarity of agent responses and Shapley-value estimates of each agent's contribution, enabling self-organized collaboration in multi-agent systems. The approach is particularly effective when the agents are built on weaker models.
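
Shapley values attribute a team score to individual members by averaging each member's marginal contribution over all join orders. The sketch below computes them exactly for a toy three-agent team. It assumes a hypothetical value function `value(coalition)`, for example the answer quality obtained when only those agents' responses are aggregated. Exact enumeration is feasible only for a handful of agents, which is presumably why the paper relies on estimates. The `build_dag` rule is also a hypothetical illustration, not the paper's construction. It links agents whose pairwise response similarity clears a threshold and orients every edge from the higher-contribution agent to the lower-contribution one, which guarantees that the graph is acyclic.

```python
from itertools import permutations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    phi = {a: 0.0 for a in agents}
    for order in permutations(agents):
        coalition = frozenset()
        for agent in order:
            joined = coalition | {agent}
            phi[agent] += value(joined) - value(coalition)
            coalition = joined
    n_orders = factorial(len(agents))
    return {a: total / n_orders for a, total in phi.items()}

def build_dag(agents, phi, similarity, threshold):
    """Hypothetical rule: connect similar agents; edges point from higher to lower contribution."""
    order = sorted(agents, key=lambda a: (-phi[a], a))  # strict total order, ties broken by name
    position = {a: i for i, a in enumerate(order)}
    return [
        (src, dst)
        for src in agents
        for dst in agents
        if position[src] < position[dst]
        and similarity[frozenset((src, dst))] >= threshold
    ]

WEIGHTS = {"A": 0.5, "B": 0.3, "C": 0.1}  # toy standalone contributions

def value(coalition):
    """Toy team score: additive weights plus a synergy bonus when A and B collaborate."""
    score = sum(WEIGHTS[a] for a in coalition)
    if {"A", "B"} <= coalition:
        score += 0.2
    return score

agents = ["A", "B", "C"]
phi = shapley_values(agents, value)
similarity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.7,
}
print({a: round(v, 3) for a, v in phi.items()})  # {'A': 0.6, 'B': 0.4, 'C': 0.1}
print(build_dag(agents, phi, similarity, threshold=0.5))  # [('A', 'B'), ('B', 'C')]
```

The synergy bonus is split evenly between A and B, and the three values sum to the full team's score. This is the efficiency property that makes Shapley values a natural contribution measure.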

Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems

This paper proposes SupervisorAgent, a lightweight real-time adaptive supervision framework that actively intervenes at critical interaction nodes (error correction, guidance provision, observation purification) via an LLM-free adaptive filter, reducing token consumption of Smolagent on the GAIA benchmark by 29.68% without sacrificing success rate.

UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking

This paper identifies and formalizes the problem of Unindexed Information Seeking (UIS), i.e., seeking information in dynamic web pages, embedded files, and interactive content that search engines cannot directly retrieve. It proposes UIS-QA (110 questions), the first UIS benchmark, along with the multi-agent framework UIS-Digger. A ~30B-parameter model trained with SFT+RFT achieves 27.27% accuracy, surpassing systems built on o3 or GPT-4.1.

When Agents "Misremember" Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems

This paper presents the first systematic study of the Mandela effect (collective false memory) in LLM-based multi-agent systems. It introduces the ManBench benchmark (4,838 questions, 5 interaction protocols), demonstrates that all 13 evaluated LLMs are susceptible to this effect, and proposes prompt-level and model-level mitigation strategies that reduce false memory by 74.40% on average.

Which LLM Multi-Agent Protocol to Choose?

This paper introduces ProtocolBench and ProtocolRouter. ProtocolBench provides the first systematic comparison of multi-agent communication protocols (A2A, ACP, ANP, Agora, etc.) along four dimensions: task success rate, latency, message overhead, and robustness. ProtocolRouter is a learnable router that selects a protocol per scenario, reducing fault recovery time by up to 18.1%.