🔍 Information Retrieval & RAG

🧪 ICML2025 · 6 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (81) · 💬 ACL2026 (73) · 🧪 ICML2026 (26) · 🤖 AAAI2026 (21) · 🧠 NeurIPS2025 (25) · 📹 ICCV2025 (5)

🔥 Top topics: RAG ×3

Don't Lag, RAG: Training-Free Adversarial Detection Using RAG: This paper proposes VRAG (Vision Retrieval-Augmented Generation), a training-free pipeline that combines an adversarial patch database, visual retrieval, and VLM inference. It efficiently detects a variety of adversarial patch attacks, with Gemini-2.0 reaching 98% accuracy and the open-source model UI-TARS-72B-DPO reaching 95%.
FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems: FedRAG is a fine-tuning framework for RAG systems that supports both centralized and federated architectures. It addresses the lack of unified fine-tuning tools in the RAG ecosystem and uses lightweight abstractions to enable a seamless transition from centralized to federated training.
POQD: Performance-Oriented Query Decomposer for Multi-Vector Retrieval: This paper proposes POQD, a performance-oriented query decomposition framework. An LLM-based Prompt Optimizer iteratively refines the query decomposition prompt, and an alternating training algorithm jointly optimizes the prompt and the downstream RAG model parameters. POQD significantly outperforms existing methods on retrieval and end-to-end QA tasks.
RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding: This paper proposes RAPID, a framework combining RAG with speculative decoding. A RAG drafter (an LLM running on compressed retrieval contexts) generates candidate tokens for a long-context target LLM, and test-time knowledge distillation enhances the target distribution. The combination delivers both a >2× speedup and improved generation quality in long-context inference.
Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length: Drawing on the Proactive Interference (PI) paradigm from cognitive science, this study finds that the information retrieval accuracy of LLMs decreases log-linearly to zero as the amount of interfering information increases. This reveals a "working memory" capacity bottleneck that is independent of context length and cannot be effectively mitigated by prompt engineering.
Understanding Synthetic Context Extension via Retrieval Heads: Through systematic experiments, this paper explains why synthetic context extension works: the "retrieval heads" learned from synthetic data overlap heavily with those learned from real data, and the recall of retrieval heads predicts downstream long-context task performance. Attention knockout and activation patching demonstrate that retrieval heads are mechanistically necessary.