🗣️ Dialogue Systems

🤖 AAAI2026 · 5 paper notes

Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

This paper proposes the Auto-PRE framework, which selects qualified LLM evaluators through an automatic qualification exam across three dimensions—consistency, pertinence, and self-confidence—achieving state-of-the-art evaluation performance without human annotation while significantly reducing costs.
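The selection step can be sketched as a simple filter over exam scores. This is an illustrative assumption of how the qualification exam might gate candidates, not the paper's actual procedure; the model names, scores, and thresholds below are all hypothetical.

```python
# Hypothetical Auto-PRE-style evaluator selection: each candidate LLM takes a
# qualification exam and is scored on three dimensions; only candidates that
# clear every threshold join the peer-review evaluator pool.
THRESHOLDS = {"consistency": 0.8, "pertinence": 0.7, "self_confidence": 0.6}

def select_evaluators(exam_scores: dict) -> list:
    """Keep only models whose exam scores clear all three thresholds."""
    return [
        model for model, scores in exam_scores.items()
        if all(scores[dim] >= t for dim, t in THRESHOLDS.items())
    ]

scores = {
    "model_a": {"consistency": 0.9, "pertinence": 0.8, "self_confidence": 0.7},
    "model_b": {"consistency": 0.6, "pertinence": 0.9, "self_confidence": 0.9},
}
print(select_evaluators(scores))  # only model_a clears all three thresholds
```

Because no human annotation enters this loop, the only cost is running each candidate once through the exam.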

Chatsparent: An Interactive System for Detecting and Mitigating Cognitive Fatigue in LLMs

This paper presents Chatsparent, an interactive system that monitors three token-level fatigue signals during LLM inference in real time—attention decay, embedding drift, and entropy collapse—aggregates them into a unified fatigue index, and automatically applies lightweight interventions (prompt re-injection, attention reset, entropy-regularized decoding, self-reflection checkpoints) when fatigue thresholds are triggered, transforming passive chat interaction into an active diagnostic experience.
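The monitoring loop described above can be sketched as: normalize the three signals, average them into one index, and trigger an intervention when a threshold is crossed. The weights, threshold, and the choice of which intervention fires first are illustrative assumptions, not details from the paper.

```python
# Hypothetical Chatsparent-style fatigue monitor: three per-token signals
# (assumed already normalized to [0, 1]) are combined into a unified fatigue
# index; crossing a threshold triggers a lightweight intervention.
INTERVENTIONS = ["prompt_reinjection", "attention_reset",
                 "entropy_regularized_decoding", "self_reflection_checkpoint"]

def fatigue_index(attention_decay: float, embedding_drift: float,
                  entropy_collapse: float,
                  weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of the three fatigue signals."""
    signals = (attention_decay, embedding_drift, entropy_collapse)
    return sum(w * s for w, s in zip(weights, signals))

def maybe_intervene(index: float, threshold: float = 0.5):
    """Fire the first (cheapest) intervention once the index crosses the threshold."""
    return INTERVENTIONS[0] if index >= threshold else None

idx = fatigue_index(0.7, 0.6, 0.8)
print(round(idx, 2), maybe_intervene(idx))  # prints: 0.7 prompt_reinjection
```

A real system would track these signals per generation step rather than once per response, but the aggregate-then-threshold shape is the same.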

Emergent Persuasion: Will LLMs Persuade Without Being Prompted?

This paper investigates whether LLMs spontaneously exhibit persuasive behavior without being explicitly prompted to do so. It finds that activation steering fails to reliably induce persuasive tendencies, whereas supervised fine-tuning (SFT) on benign persuasion data causes models to exhibit emergent persuasive behavior on harmful topics, revealing latent post-training safety risks.

TalkSketch: Multimodal Generative AI for Real-time Sketch Ideation with Speech

This paper proposes TalkSketch, a system that integrates hand-drawn sketches with real-time speech input in a multimodal AI chatbot, enabling designers to draw and verbalize ideas simultaneously during early-stage ideation, thereby addressing how text-based prompting in existing GenAI tools disrupts the creative workflow.

Canoe: Teaching LLMs to Maintain Contextual Faithfulness via Synthetic Tasks and RL

This paper proposes the Canoe framework, which synthesizes four types of verifiable short-form QA data from Wikidata triples and applies Dual-GRPO (incorporating accuracy reward, long-form proxy reward, and format reward) to jointly optimize faithfulness in both short- and long-form generation. The approach improves Llama-3-8B by an average of 22.6% across 11 downstream tasks, surpassing GPT-4o.
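The three-part reward mix described for Dual-GRPO can be sketched as a weighted sum of an accuracy term, a long-form proxy term, and a format term. The weights and per-component definitions below are illustrative assumptions, not the paper's actual values.

```python
# Hypothetical Dual-GRPO-style reward: combine a binary accuracy reward for
# short-form QA, a scalar long-form proxy reward, and a binary format reward
# into one scalar used for RL optimization.
def dual_grpo_reward(answer_correct: bool, proxy_score: float,
                     well_formatted: bool,
                     w=(1.0, 1.0, 0.2)) -> float:
    """Weighted sum of the accuracy, long-form proxy, and format rewards."""
    r_acc = 1.0 if answer_correct else 0.0
    r_fmt = 1.0 if well_formatted else 0.0
    return w[0] * r_acc + w[1] * proxy_score + w[2] * r_fmt

print(dual_grpo_reward(True, 0.5, True))  # 1.0 + 0.5 + 0.2 = 1.7
```

Since the short-form QA data is synthesized from Wikidata triples, the accuracy term is directly verifiable, while the proxy term stands in for long-form faithfulness that has no single gold answer.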