🗣️ Dialogue Systems

💬 ACL2025 · 18 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (10) · 💬 ACL2026 (26) · 🧪 ICML2026 (5) · 🤖 AAAI2026 (5) · 🧠 NeurIPS2025 (8) · 🧪 ICML2025 (2)

🔥 Top topics: Dialogue ×16 · Sentiment Analysis ×3 · Personalized Generation ×2

DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling: This paper proposes dialogue element modeling (DEMO), a novel task that systematically defines a comprehensive element taxonomy across the dialogue lifecycle from "prelude" to "epilogue." Based on this, the authors construct the DEMO benchmark covering both element awareness and dialogue agent interaction capabilities, and train DEMO agents using imitation learning, achieving superior performance on both in-domain and out-of-domain tasks.
Dialogue Systems for Emotional Support via Value Reinforcement: This paper proposes ES-VR, the first method that integrates human value reinforcement into emotional support dialogue systems. By leveraging a target value detector and a reference generator (both trained on Reddit data), combined with a two-stage SFT + DPO training scheme, the supporter model not only alleviates the seeker's negative emotions but also explores and reinforces their positive values, achieving a deeper, internal transformation.
Dynamic Label Name Refinement for Few-Shot Dialogue Intent Classification: This paper proposes a dynamic label name refinement method for retrieval-based ICL intent classification. An LLM rewrites intent label names into more distinctive ones (e.g., "Verify PAN" → "Verify PAN card details") based on the retrieved examples, which reduces confusion between semantically similar intents and consistently improves accuracy by 2.07%-7.51% across 6 datasets.
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System: This paper proposes an immersive multimodal conversation system that endows chatbots with "eyes and ears." It constructs the M3C dataset, a multi-session multi-party dialogue dataset integrating vision and audio, and designs a dialogue model consisting of a dialogue module and a multimodal memory retrieval module, enabling dynamic, long-term conversations where multiple speakers share audiovisual experiences.
Enhancing Goal-oriented Proactive Dialogue Systems via Consistency Reflection and Correction: This paper proposes CRC (Consistency Reflection and Correction), a model-agnostic two-stage framework for goal-oriented proactive dialogue systems. The model is first prompted to reflect on inconsistencies between its generated response and the dialogue context, then corrects the response accordingly, which significantly improves consistency with the context.
EnSToM: Enhancing Dialogue Systems with Entropy-Scaled Steering Vectors for Topic Maintenance: This paper proposes EnSToM, a lightweight method based on entropy-scaled steering vectors. It dynamically adjusts steering intensity using differences in the entropy distributions of the LLM's internal layers, improving the topic maintenance of task-oriented dialogue systems without modifying model parameters.
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles: This paper proposes the USP (User Simulator with Implicit Profiles) framework. By extracting implicit user profiles from human-machine dialogues and combining conditional supervised fine-tuning with cycle-consistency-based reinforcement learning, USP significantly outperforms baseline methods across three dimensions: authenticity, consistency, and diversity, improving semantic similarity and style similarity by approximately 34% and 43%, respectively.
Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling: This paper proposes an Accountability Model for task-oriented dialogue systems, which integrates an additional accountability head as a binary classifier into LLMs to predict the probability of each slot in dialogue states. This enables the detection and self-correction of false positive and false negative errors, improving JGA on MultiWOZ from 64.34 to 70.51 (a relative gain of about 9.6%) and achieving SOTA.
KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors: This paper proposes KokoroChat, a Japanese psychological counseling dialogue dataset collected via role-playing by trained counselors, consisting of 6,589 long sessions and detailed client feedback ratings, designed to enhance the counseling response generation and dialogue evaluation capabilities of LLMs.
Exploring Persona Sentiment Sensitivity in Personalized Dialogue Generation: Large-scale analysis reveals that the quality of personalized dialogues generated by LLMs is highly sensitive to the sentiment polarity of user personas: negative personas lead to an overemphasis on persona traits that triggers contradictions, whereas positive personas yield higher-quality dialogues through selective persona integration. Based on these insights, the paper proposes mitigation strategies combining turn-by-turn generation, persona ranking, and sentiment-aware prompting.
PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants: This paper introduces PersonaLens, a comprehensive evaluation benchmark for the personalization capabilities of task-oriented AI assistants. It features 1,500 rich user personas, 111 tasks across 20 domains, a user simulator agent, and a judge agent. Through large-scale automated evaluation, it reveals significant deficiencies in the personalization capabilities of current LLM assistants.
ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework: This paper proposes ReflectDiffu, a lightweight empathetic dialogue framework that integrates emotion contagion (capturing emotion), an "intent twice" mechanism (Exploring-Sampling-Correcting, which maps emotion to behavioral intent), and diffusion-based generation. It outperforms existing baselines and Llama-3.1-8B in relevance, controllability, and informativeness.
SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie Script: This paper proposes SHARE, a long-term dialogue dataset constructed from movie scripts, introducing the concept of "shared memory" for the first time. It also designs the EPISODE dialogue framework to manage personal information, personal events, and shared memories, making long-term dialogues more intimate and engaging.
Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources: This paper systematically compares two strategies for generating job interview dialogues using LLMs: single-prompt (generating the entire dialogue at once) and dual-prompt (two agents role-playing as interviewer and candidate, conversing turn by turn). It finds that dialogues generated by the dual-prompt method achieve a win rate 2 to 10 times higher than that of the single-prompt method in terms of naturalness, albeit at approximately 6 times the token cost.
Sparse Rewards Can Self-Train Dialogue Agents: This paper proposes JOSH (Juxtaposed Outcomes for Simulation Harvesting), a self-alignment algorithm that enables LLM dialogue agents to autonomously improve their performance through simulated environments with sparse rewards without external human feedback. Additionally, the ToolWOZ sparse-reward tool-calling simulation environment is constructed for validation.
UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations: This work explores how to unify dense retrieval and response generation in conversational scenarios into a single LLM. Through three joint training objectives (conversational retrieval, response generation, and context identification instruction) and a data discrepancy mitigation mechanism, it achieves mutual reinforcement of retrieval and generation across five conversational search datasets, outperforming pipeline-based baselines.
When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation: This paper systematically investigates the impact of interlocutor information on target speaker generation quality in persona-based dialogue. Using an evaluation framework that masks or reveals interlocutor information, the study finds that models effectively adapt to the interlocutor's persona, that they generalize less well to unfamiliar interlocutors than to unfamiliar topics, and that LLMs tend to "copy-paste" persona details in zero-shot settings.
Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching: This paper proposes TRACER, a method that leverages decision tree models to plan dialogue paths, guiding two LLM agents (customer and seller) to generate natural and target-oriented e-commerce shopping dialogues. It also releases the Wizard of Shopping (WoS) dataset containing 3,600 dialogues, validating the effectiveness of the dataset on two downstream tasks: conversational query generation and product ranking.
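
As a quick sanity check on the numbers quoted in the "Know Your Mistakes" entry above, the snippet below recomputes the JGA gain on MultiWOZ from the two reported values (64.34 and 70.51). It is only an arithmetic sketch: the helper name `relative_gain` is hypothetical, not taken from the paper, and reading the reported 9.6% as a relative gain rather than a difference in points is an assumption that this calculation supports.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# JGA values as quoted in the entry above (hypothetical variable names).
jga_before, jga_after = 64.34, 70.51

absolute = jga_after - jga_before               # difference in JGA points
relative = relative_gain(jga_before, jga_after)  # difference relative to the baseline

print(f"absolute: {absolute:.2f} points, relative: {relative:.1f}%")
```

The absolute difference is about 6.17 JGA points, so the 9.6% figure only matches when it is read relative to the 64.34 baseline.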