🗣️ Dialogue Systems

🧪 ICML2026 · 5 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (10) · 💬 ACL2026 (26) · 🤖 AAAI2026 (5) · 🧠 NeurIPS2025 (8) · 🧪 ICML2025 (2) · 💬 ACL2025 (18)

🔥 Top topics: LLM ×2

Context-Driven Incremental Compression for Multi-Turn Dialogue Generation: Concatenating the full history at every turn of a multi-turn dialogue is expensive and can cause earlier cues to be lost. This paper proposes C-DIC, which treats a dialogue as interleaved "topic threads" and stores a revisable compressed state per thread in a compact memory. Each turn follows a lightweight "Retrieval → Revision → Write-back" cycle, and the model is trained with retrieval-aware truncated backpropagation through time (ra-TBPTT), keeping latency and perplexity stable over hundreds of turns.
DiscoverLLM: From Executing Intents to Discovering Them: DiscoverLLM formalizes the scenario in which the user has not yet clearly defined their goals as a progressive discovery process within a hierarchical intent tree. Using a rewardable hierarchical user simulator, the model is trained to explore divergently while goals are unclear and to converge on execution once they are clarified. On creative writing, technical writing, and SVG tasks, the method achieves a 10% improvement in satisfaction and a 40% reduction in dialogue length compared to baselines such as CollabLLM.
From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents: Addressing two major bottlenecks in post-training multi-turn interactive tool-using agents (expensive high-quality data and RL signal degradation from user-simulation noise), the authors propose "AReaL-SEA," a self-evolving multi-agent data synthesis pipeline that generates executable verifiers as rewards. Combined with an RL recipe featuring user model SFT, large batches, and dynamic filtering GRPO, Qwen3-235B achieves a pass^1 of 73.0 in Airline and 98.3 in Telecom on τ²-bench, matching or exceeding Claude/Gemini/GPT-5.
Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives: This paper models LLM-as-a-Service as a "principal-agent" problem and proves that the current mainstream "pay-per-token" mechanism naturally incentivizes service providers to re-segment the same string into a longer token sequence in order to overcharge. Furthermore, even if providers are forced to disclose next-token distributions, overcharging without detection is NP-hard rather than impossible: the authors provide a simple heuristic algorithm that increases reported tokens by up to 11.2% while maintaining plausibility. Finally, the paper proves that the only additive pricing mechanism that eliminates this incentive is "linear pay-per-character."
Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving: This paper points out that traditional Prefill-Decode (PD) disaggregated architectures are significantly inefficient in multi-turn dialogue scenarios due to the repeated P→D recomputation and transmission of KV caches for each turn. It proposes PPD (Prefill-capable Decode), a dynamic routing system that allows decode nodes to decide whether to process Turn 2+ append-prefills locally based on SLO weights, reducing Turn 2+ TTFT by approximately 68%.
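
The incentive described in the "Is Your LLM Overcharging You?" entry can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the six-token vocabulary `VOCAB`, the helper `segment`, and the prices are hypothetical, and the sketch ignores the plausibility constraint that the paper's heuristic must satisfy, so it is not the authors' algorithm. It only shows that the same string admits segmentations of different lengths, which changes a per-token bill but leaves a per-character bill unchanged.

```python
from typing import List, Optional, Set

# Hypothetical toy vocabulary; real tokenizers have far larger ones.
VOCAB: Set[str] = {"h", "e", "l", "o", "hell", "hello"}


def segment(text: str, vocab: Set[str], longest: bool) -> List[str]:
    """Return a segmentation of `text` into vocabulary tokens.

    Dynamic programming over prefixes: best[i] holds the segmentation of
    text[:i] with the fewest tokens (or the most, if `longest` is True).
    """
    best: List[Optional[List[str]]] = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            prev = best[start]
            piece = text[start:end]
            if prev is None or piece not in vocab:
                continue
            cand = prev + [piece]
            cur = best[end]
            if cur is None:
                best[end] = cand
            elif longest and len(cand) > len(cur):
                best[end] = cand
            elif not longest and len(cand) < len(cur):
                best[end] = cand
    result = best[len(text)]
    if result is None:
        raise ValueError("text cannot be segmented with this vocabulary")
    return result


def per_token_bill(tokens: List[str], price_per_token: int = 2) -> int:
    # Bill grows with the number of reported tokens.
    return price_per_token * len(tokens)


def per_char_bill(tokens: List[str], price_per_char: int = 1) -> int:
    # Bill depends only on the underlying string length.
    return price_per_char * sum(len(t) for t in tokens)


honest = segment("hello", VOCAB, longest=False)  # fewest tokens
padded = segment("hello", VOCAB, longest=True)   # most tokens, same string

print(honest, per_token_bill(honest), per_char_bill(honest))
print(padded, per_token_bill(padded), per_char_bill(padded))
```

Both segmentations decode to the same string, so a user who only sees the output text cannot tell them apart from the text alone; under this toy setup only the per-character bill is invariant to the provider's choice of segmentation.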