MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation
Conference: ACL 2026 · arXiv: 2604.18509 · Code: None · Area: Information Retrieval / RAG
Keywords: Multi-Agent RAG, Evidence Synthesis, Training-Free, Multi-Perspective Filtering, Heterogeneous Evidence Fusion
TL;DR
This paper proposes MASS-RAG, a training-free multi-agent synthesis RAG framework. Three specialized filtering agents (Summarizer, Extractor, and Reasoner) process retrieved documents from complementary perspectives, and a Synthesis Agent then integrates the resulting evidence views or candidate answers. MASS-RAG consistently outperforms strong baselines across four benchmarks.
Background & Motivation
Background: RAG enhances the factuality of LLMs by introducing external knowledge at inference time. However, when retrieved contexts are noisy, incomplete, or heterogeneous, a single generation process struggles to effectively coordinate evidence.
Limitations of Prior Work: (1) Existing multi-agent RAG systems (e.g., Chang et al. 2024) rely on a single judge agent to filter context from a single perspective, failing to capture complementary or heterogeneous factual evidence; (2) irrelevant or redundant retrieved information degrades generation quality; (3) for questions requiring cross-document aggregation of complementary evidence, a single perspective is particularly inadequate.
Key Challenge: Retrieved documents may contain relevant evidence in different forms—some requiring summarization, some requiring precise extraction, and some requiring inferential reasoning—making a single filtering strategy insufficient to handle all cases.
Goal: To design a multi-perspective evidence filtering and synthesis mechanism that enables RAG systems to process and integrate retrieved documents from complementary angles.
Key Insight: Evidence processing is decomposed into three complementary perspectives—summarization (compressing while preserving semantics), extraction (verbatim extraction of precise evidence), and reasoning (inferring implicit relationships)—realized through a division of labor among multiple agents.
Core Idea: Different types of questions require different types of evidence processing. MASS-RAG generates multiple evidence views in parallel through multiple agents and then produces a more robust final answer via explicit comparison and integration.
Method
Overall Architecture
MASS-RAG operates in three stages: (1) Evidence Distillation—three agents (Summarizer, Extractor, Reasoner) independently extract denoised, query-relevant evidence from retrieved documents; (2) Candidate Answer Generation (optional)—an Answer Agent independently generates candidate answers based on each filtered result; (3) Final Synthesis—a Synthesis Agent integrates the three evidence views or three candidate answers to produce a unified final prediction.
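The three-stage dataflow can be sketched as a simple orchestration over interchangeable agent callables. This is a hypothetical interface (the paper releases no code); the agents below are stubs so the flow is runnable.

```python
# Minimal sketch of the MASS-RAG three-stage dataflow. All names are
# illustrative; each "agent" is just a callable over (query, input).

def mass_rag_pipeline(filter_agents, answer_agent, synthesis_agent,
                      query, docs, use_answer_agent=True):
    # Stage 1: each filtering agent distills its own evidence view.
    views = [agent(query, docs) for agent in filter_agents]
    # Stage 2 (optional): one candidate answer per evidence view.
    if use_answer_agent:
        views = [answer_agent(query, v) for v in views]
    # Stage 3: the synthesis agent integrates the views or candidates.
    return synthesis_agent(query, views)

# Stub agents illustrating the interface:
summarizer  = lambda q, d: f"summary of {len(d)} docs for {q!r}"
extractor   = lambda q, d: f"spans for {q!r}"
reasoner    = lambda q, d: f"inference for {q!r}"
answerer    = lambda q, v: f"answer from ({v})"
synthesizer = lambda q, vs: vs[0]  # trivially picks the first candidate

final = mass_rag_pipeline([summarizer, extractor, reasoner],
                          answerer, synthesizer, "who?", ["d1", "d2"])
```

In a real deployment each stub would be the same LLM backbone wrapped with a different role prompt, as the framework is training-free.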
Key Designs
- Three-Perspective Filtering Agent Design:
  - Function: Extract query-relevant evidence from three complementary perspectives.
  - Mechanism: The Summarizer compresses retrieved documents into concise, semantically coherent summaries \(R_i^{(s)} = \mathcal{A}_{\text{sum}}(q_i, D)\); the Extractor extracts precise factual spans verbatim \(R_i^{(e)} = \mathcal{A}_{\text{ext}}(q_i, D)\); the Reasoner infers implicit cross-document relationships \(R_i^{(r)} = \mathcal{A}_{\text{rea}}(q_i, D)\).
  - Design Motivation: Different question types suit different evidence-processing strategies: factoid questions call for precise extraction, synthesis questions call for reasoning, and informational questions call for summarization. Multi-perspective coverage ensures that at least one strategy captures the correct evidence.
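The filtering stage above can be sketched as one shared backbone wrapped with three role prompts. The prompts below paraphrase the paper's role descriptions (the actual prompts are not given), and the LLM is stubbed so the example runs.

```python
# Sketch of the three-perspective filtering stage. ROLE_PROMPTS are
# paraphrased, not the paper's actual prompts; `llm` is any
# text-in/text-out callable sharing one backbone across roles.

ROLE_PROMPTS = {
    "summarizer": ("Compress the documents into a concise, semantically "
                   "coherent summary of query-relevant content."),
    "extractor": ("Copy verbatim the exact spans that contain evidence "
                  "for the question. Do not paraphrase."),
    "reasoner": ("State the implicit relationships across documents that "
                 "are needed to answer the question."),
}

def distill_evidence(llm, query, docs):
    """Return one filtered evidence view per perspective: R^(s), R^(e), R^(r)."""
    context = "\n\n".join(docs)
    return {
        role: llm(f"{prompt}\n\nQuestion: {query}\n\nDocuments:\n{context}")
        for role, prompt in ROLE_PROMPTS.items()
    }

# With a stub backbone, each view simply echoes its role instruction:
stub_llm = lambda prompt: prompt.splitlines()[0]
views = distill_evidence(stub_llm, "Who founded X?", ["doc A", "doc B"])
```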
- Optional Answer Agent + Synthesis:
  - Function: Reconcile competing hypotheses through explicit comparison of intermediate candidate answers.
  - Mechanism: When the Answer Agent is enabled, a candidate answer is generated independently from each filtered result, \(A_i^{(j)} = \mathcal{A}_{\text{ans}}(q_i, R_i^{(j)})\), and the Synthesis Agent then compares and integrates the three candidates; when it is disabled, the three evidence representations are integrated directly.
  - Design Motivation: For factoid QA, candidate answers carry rich semantic signals, and different perspectives may yield complementary or competing hypotheses. For multiple-choice tasks, intermediate candidate-answer signals are limited and this step can be skipped.
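The task-adaptive toggle can be sketched as follows. The agent prompts are illustrative (not the paper's), and the LLM is stubbed so the example runs.

```python
# Sketch of the optional Answer Agent plus Synthesis Agent. With the
# agent enabled (factoid QA), one candidate answer is derived per
# evidence view before synthesis; disabled (multiple choice), the
# evidence views are integrated directly.

def synthesize(llm, query, evidence_views, use_answer_agent=True):
    if use_answer_agent:
        # One candidate answer per evidence view, then explicit comparison.
        inputs = [
            llm(f"Answer using only this evidence.\nQ: {query}\nEvidence: {v}")
            for v in evidence_views
        ]
        task = "Compare the candidate answers and output the best-supported one."
    else:
        # Skip intermediate answers; fuse the evidence views directly.
        inputs = list(evidence_views)
        task = "Integrate the evidence views and output the final answer."
    joined = "\n---\n".join(inputs)
    return llm(f"{task}\nQ: {query}\n{joined}")

# Stub backbone that echoes the instruction it was given:
stub_llm = lambda p: "OUT: " + p.splitlines()[0]
final = synthesize(stub_llm, "q", ["view1", "view2", "view3"])
```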
- Training-Free Role Specialization:
  - Function: Achieve agent role differentiation without fine-tuning.
  - Mechanism: Each agent is specialized through carefully designed role prompts and output constraints: the Summarizer is constrained to compression, the Extractor to verbatim extraction, and the Reasoner to generating intermediate inferential representations.
  - Design Motivation: A training-free design enables plug-and-play deployment on any LLM, lowering the barrier to adoption.
Loss & Training
MASS-RAG is a training-free framework. All agents share the same LLM backbone (Llama-3-8B and Llama-2-7B/13B in experiments) and are differentiated solely through role prompts.
Key Experimental Results
Main Results
Accuracy (%, approximate) with Llama-3-8B + retrieval, on three of the four benchmarks
| Benchmark | Vanilla RAG | Chang et al. (Single-Agent Filtering) | MASS-RAG |
|---|---|---|---|
| TriviaQA | ~70 | ~72 | ~74 |
| PopQA | ~48 | ~52 | ~55 |
| ARC-C | ~60 | ~63 | ~66 |
Key Findings
- MASS-RAG shows the greatest advantage in scenarios requiring cross-document aggregation of complementary evidence.
- The Answer Agent provides significant gains for factoid QA (TriviaQA/PopQA) but offers limited benefit for multiple-choice tasks (ARC-C).
- Among the three filtering agents, the Reasoner contributes most individually, indicating that cross-document reasoning is the primary bottleneck.
Highlights & Insights
- The multi-perspective filtering approach is intuitively well-motivated—different questions genuinely require different evidence processing strategies, and a one-size-fits-all filtering approach inevitably discards certain types of evidence.
- The training-free design enables plug-and-play deployment directly on any LLM.
- The optional Answer Agent provides task-adaptive flexibility.
Limitations & Future Work
- The multi-agent design increases inference cost (three filtering passes plus synthesis, and optionally three answer passes).
- All agents share the same LLM backbone, with role differentiation implemented solely through prompting, limiting the degree of specialization achievable.
- Fair comparison with training-based RAG methods (e.g., Self-RAG) under equivalent conditions is absent.
- Performance on long-document QA scenarios is insufficiently validated.
Related Work & Insights
- vs. Self-RAG: A training-based method; MASS-RAG is training-free but incurs additional inference overhead.
- vs. Chang et al.: Single-agent filtering; MASS-RAG compensates for the blind spots of a single perspective through multi-perspective coverage.
- vs. REPLUG/Self-RAG: Those works focus on retrieval strategy optimization; MASS-RAG focuses on post-retrieval evidence processing optimization.
Rating
- Novelty: ⭐⭐⭐ The multi-agent filtering idea is valuable but not a breakthrough.
- Experimental Thoroughness: ⭐⭐⭐⭐ Four benchmarks, ablations, and multiple model sizes, though fair comparison with training-based methods is lacking.
- Writing Quality: ⭐⭐⭐⭐ Clear framework presentation with well-motivated design choices.
- Value: ⭐⭐⭐⭐ Offers practical improvements to evidence processing in RAG systems.