📄 multi_agent

💬 ACL2026 · 12 paper notes

🔥 Top topics: Agents ×12 · LLM ×4 · Reasoning ×2

BookAgent: Orchestrating Safety-Aware Visual Narratives via Multi-Agent Cognitive Calibration

BookAgent is a safety-aware multi-agent framework that generates high-quality, character-consistent, and content-safe picture books end-to-end from user drafts through a three-stage closed-loop architecture: Value-Aligned Storyboard (VAS) + Iterative Cross-Modal Refinement (ICR) + Temporal Cognitive Calibration (TCC).

Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games

This paper proposes a collaborative multi-agent framework for automatically generating high-quality murder mystery game scripts and training data. Through a two-stage training strategy (CoT fine-tuning + GRPO reinforcement learning with ScoreAgent reward shaping), it enhances VLM multi-hop reasoning under imperfect information, achieving significant improvements on WhodunitBench in narrative reasoning, fact extraction, and deception resistance.
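The GRPO stage mentioned above relies on group-relative advantages: several responses are sampled for the same prompt, each is scored, and every score is normalized against its own group. A minimal sketch of that normalization follows. The reward values stand in for whatever ScoreAgent would return (its scoring function is not described in this note), and the function name, the use of population standard deviation, and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one score per sampled response to the same prompt.
    The scores are placeholders for a ScoreAgent-style judge (assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical responses to one prompt, scored in [0, 1]
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

Responses scoring above the group mean receive positive advantages and are reinforced; those below receive negative ones. When all responses score the same, every advantage is zero and the group contributes no learning signal.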

Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation

By evaluating over 10,000 research proposals, this paper systematically documents "diversity collapse" in multi-agent LLM systems at three levels: model intelligence, agent cognition, and system dynamics. Stronger models, authority-driven role assignments, and dense communication topologies all suppress semantic diversity, and the root cause lies in the interaction structure rather than in insufficient model capability.

MAGEO: From Experience to Skill — Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

This paper reframes Generative Engine Optimization (GEO) from per-instance heuristic optimization to a strategy learning problem, proposing the MAGEO multi-agent framework. The execution layer consists of four collaborating agents — preference, planning, editing, and evaluation — operating in an iterative Generate-Evaluate-Select loop, while the learning layer distills validated edit patterns into reusable engine-specific strategy skills. A Twin Branch causal evaluation protocol and the DSV-CF dual-axis metric are introduced, achieving substantial improvements over heuristic baselines across three mainstream generative engines.

From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation

This paper introduces JurisCQAD—a large-scale dataset of 43,000+ real Chinese legal consultations—and proposes the JurisMA multi-agent framework, which performs structured task decomposition via a legal element graph and dynamic multi-agent collaboration (Manager Agent + Format Check + Law Search), achieving significant improvements over both general-purpose and law-specialized LLMs on LawBench.

MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering

This paper proposes MATA, a multi-agent framework for table question answering that employs a scheduler to prioritize reasoning paths (CoT/PoT/text2SQL), a confidence checker to filter candidate answers, and a judge agent for arbitration. The framework achieves model-agnostic, efficient, and accurate TableQA, with an average EM improvement of 40.1% across 10 LLMs.
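The scheduler, confidence checker, and judge described above can be pictured as a small control loop. The sketch below is one plausible arrangement, not MATA's actual implementation: the `Candidate` type, the `answer_question` function, the fixed confidence threshold, and the rule "call the judge only when surviving answers disagree" are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    path: str          # reasoning path that produced the answer, e.g. "CoT"
    answer: str
    confidence: float  # score in [0, 1] from a hypothetical confidence checker


Solver = Callable[[str], Tuple[str, float]]   # question -> (answer, confidence)
Judge = Callable[[str, List[Candidate]], str]  # question, candidates -> final answer


def answer_question(question: str, solvers: Dict[str, Solver],
                    priority: List[str], judge: Judge,
                    threshold: float = 0.5) -> str:
    # Scheduler step (assumed): run reasoning paths in the given priority order.
    candidates = [Candidate(p, *solvers[p](question)) for p in priority if p in solvers]
    if not candidates:
        raise ValueError("no reasoning path produced a candidate")

    # Confidence check (assumed): drop low-confidence candidates,
    # but fall back to the full list if nothing survives.
    kept = [c for c in candidates if c.confidence >= threshold]
    pool = kept or candidates

    # Arbitration (assumed): the judge is only consulted on disagreement.
    if len({c.answer for c in pool}) == 1:
        return pool[0].answer
    return judge(question, pool)
```

In this arrangement the judge, typically the most expensive call, is skipped whenever the confident paths already agree.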

Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data

This paper proposes MALMAS, a memory-augmented LLM-based multi-agent system for automated feature generation on tabular data. It employs six specialized agents to explore different dimensions of the feature space in parallel, coordinated by a Router Agent, and leverages a three-tier memory mechanism (procedural/feedback/conceptual) for cross-iteration experience accumulation and strategy refinement. MALMAS outperforms existing baselines on 16 classification and 7 regression datasets.

Preference Estimation via Opponent Modeling in Multi-Agent Negotiation

This paper proposes a preference estimation method that integrates LLM-extracted natural language preference signals into a Bayesian opponent modeling framework. In multi-party, multi-issue negotiations, it fuses qualitative cues and quantitative bid information via a linguistic likelihood function, improving the full agreement rate (FAR) from 37% to 62%.
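The fusion of qualitative and quantitative evidence described above can be illustrated with a single Bayesian update over candidate preference hypotheses. This is a minimal sketch under stated assumptions: the hypotheses, the likelihood values, and the function name `update_preference_belief` are hypothetical, and multiplying the bid likelihood by the linguistic likelihood assumes the two signals are conditionally independent given the hypothesis, which the paper may or may not do.

```python
from typing import Dict


def update_preference_belief(prior: Dict[str, float],
                             bid_likelihood: Dict[str, float],
                             linguistic_likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior over hypotheses, proportional to prior * P(bid | h) * P(utterance | h)."""
    unnormalized = {
        h: prior[h] * bid_likelihood[h] * linguistic_likelihood[h]
        for h in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all hypotheses have zero posterior mass")
    return {h: v / total for h, v in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical question: which issue does the opponent care about most?
    prior = {"price": 0.5, "delivery": 0.5}
    bid_lik = {"price": 0.6, "delivery": 0.4}   # from the opponent's last bid
    ling_lik = {"price": 0.9, "delivery": 0.1}  # from an LLM reading of their message
    print(update_preference_belief(prior, bid_lik, ling_lik))
```

Here the bid alone only mildly favors "price", while the linguistic cue shifts the belief strongly, which is the kind of effect the linguistic likelihood is meant to capture.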

SILO-BENCH: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems

This paper introduces SILO-BENCH, a role-agnostic benchmark for evaluating distributed coordination in multi-agent LLM systems. It comprises 30 algorithmic tasks across three communication complexity levels, run under 54 configurations for a total of 1,620 experiments (30 × 54). The benchmark reveals a critical communication-reasoning gap: agents spontaneously form reasonable communication topologies and actively exchange information, yet systematically fail to integrate distributed state into correct answers.

To Trust or Not to Trust: Attention-Based Trust Management for LLM Multi-Agent Systems

This paper proposes the first comprehensive definition of "trustworthiness" for LLM multi-agent systems (LLM-MAS), grounded in six orthogonal dimensions derived from Grice's Cooperative Principle. It demonstrates that LLM attention patterns can distinguish different types of trustworthiness violations, and on this basis introduces A-Trust, a lightweight attention-based evaluation method, and an end-to-end Trust Management System (TMS) that achieves malicious message detection rates of 77–90% across diverse attack scenarios.

Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Collaboration

This paper proposes SpreadsheetAgent, a two-stage multi-agent framework that achieves robust real-world spreadsheet understanding through progressive region-based reading and cross-validation across three formats—code execution, vision, and LaTeX—without exceeding LLM context limits.

Towards Self-Improving Error Diagnosis in Multi-Agent Systems

This paper proposes ErrorProbe, a framework that achieves self-improving semantic fault attribution in multi-agent systems through MAST taxonomy-driven structured decomposition, symptom-driven backward tracing, and a verified memory mechanism. The approach substantially outperforms baselines, particularly in step-level error localization.