🗣️ Dialogue Systems
🧪 ICML2026 · 4 paper notes
📌 Same area in other venues: 💬 ACL2026 (27) · 📷 CVPR2026 (1) · 🔬 ICLR2026 (5) · 🤖 AAAI2026 (5) · 🧠 NeurIPS2025 (5)
🔥 Top topics: LLM ×2
- DiscoverLLM: From Executing Intents to Discovering Them
-
DiscoverLLM formalizes the scenario in which "users do not know exactly what they want" as progressive discovery within a hierarchical intent tree. The model is trained against a reward-providing hierarchical user simulator, which encourages divergent exploration while the intent is unclear and convergent execution once it is clear. On three tasks (Creative Writing, Technical Writing, and SVG), it improves satisfaction by 10% and shortens dialogues by 40% relative to baselines such as CollabLLM.
- From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents
-
Post-training multi-turn interactive tool-calling agents faces two major bottlenecks: high-quality data is expensive, and noise from user simulation corrupts the RL signal. To address both, the authors propose Self-Evolving Multi-Agent Data Synthesis (AReaL-SEA), paired with executable verifiers that serve as rewards. Combined with an RL recipe of "SFT-first user models + large batch + dynamic filtering GRPO," Qwen3-235B reaches pass^1 scores of 73.0 (Airline) and 98.3 (Telecom) on \(\tau^2\)-bench, matching or exceeding Claude/Gemini/GPT-5.
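The pass^1 scores above are the k = 1 case of the pass^k metric, which asks how often k independent trials of the same task all succeed. A minimal sketch follows, assuming the commonly used combinatorial estimator (for a task with c successes out of n trials, the per-task value is C(c, k) / C(n, k), averaged over tasks); the function name `pass_hat_k` and the sample counts are hypothetical and not taken from the paper.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Estimate pass^k: the probability that k i.i.d. trials of a task all succeed.

    successes_per_task: number of successful trials for each task (hypothetical data).
    n_trials: number of trials run per task.
    k: how many trials must all succeed.
    """
    if not 1 <= k <= n_trials:
        raise ValueError("k must satisfy 1 <= k <= n_trials")
    if not successes_per_task:
        raise ValueError("need at least one task")
    # comb(c, k) is 0 when c < k, so tasks with too few successes contribute 0.
    per_task = [comb(c, k) / comb(n_trials, k) for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Three hypothetical tasks, 4 trials each, with 4, 2, and 0 successes.
counts = [4, 2, 0]
p1 = pass_hat_k(counts, n_trials=4, k=1)  # plain average success rate: 0.5
p2 = pass_hat_k(counts, n_trials=4, k=2)  # stricter: both of 2 trials must pass
```

For k = 1 the estimator reduces to the mean success rate, which is why pass^1 can be read as an ordinary accuracy; larger k penalizes agents that succeed only intermittently.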
- Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
-
This paper models LLM-as-a-Service as a principal-agent problem and proves that the prevailing pay-per-token pricing gives providers an incentive to re-tokenize the same string into a longer token sequence and overcharge users. Even if providers are forced to disclose the next-token distribution, overcharging without detection is NP-hard rather than impossible: the authors give a simple heuristic algorithm that inflates billed tokens by up to 11.2% while keeping the reported tokenization plausible. Finally, they prove that the only additive pricing mechanism that removes this incentive is linear pay-per-character billing.
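The incentive can be illustrated with a toy example. The sketch below is not the paper's algorithm: the token splits, prices, and function names are hypothetical, and it only shows that two tokenizations of the same string yield different pay-per-token bills but identical pay-per-character bills.

```python
def per_token_bill(tokens, price_per_token):
    """Pay-per-token: the bill grows with the number of reported tokens."""
    return len(tokens) * price_per_token


def per_char_bill(tokens, price_per_char):
    """Linear pay-per-character: the bill depends only on the decoded string length."""
    return sum(len(t) for t in tokens) * price_per_char


# Two hypothetical tokenizations that decode to the same string.
honest = ["hello", " world"]
inflated = ["hel", "lo", " wor", "ld"]
assert "".join(honest) == "".join(inflated) == "hello world"

# Prices are in arbitrary integer units to avoid floating-point noise.
token_bills = (per_token_bill(honest, 3), per_token_bill(inflated, 3))  # (6, 12)
char_bills = (per_char_bill(honest, 1), per_char_bill(inflated, 1))     # (11, 11)
```

Because the character count is fixed by the output string itself, a provider cannot change the per-character bill by choosing a different tokenization, whereas the per-token bill doubles in this toy case.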
- Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving
-
This paper argues that traditional Prefill-Decode (PD) disaggregated architectures are inefficient for multi-turn dialogue because every turn requires KV recomputation and transfer. It proposes PPD (Prefill-capable Decode), a dynamic routing system in which decode nodes decide, based on SLO weights, whether to process the append-prefills of Turn 2 and later locally, reducing Turn 2+ TTFT by approximately 68%.