Consistent Client Simulation for Motivational Interviewing-based Counseling¶
Conference: ACL 2025
arXiv: 2502.02802
Code: None
Authors: Yizhe Yang, Palakorn Achananuparp, Heyan Huang, Jing Jiang, John Pinto, Jenny Giam, Kit Phey Leng, Nicholas Gabriel Lim, Cameron Tan Shi Ern, Ee-peng Lim
Affiliations: Beijing Institute of Technology, Singapore Management University, Australian National University, etc.
Area: Other
Keywords: Client Simulation, Motivational Interviewing, Consistency, State Tracking, Action Selection, LLM Agent
TL;DR¶
This paper proposes a consistent client simulation framework for motivational interviewing (MI) counseling. It uses four modules (state transition, action selection, information selection, and response generation) to keep the simulated client's behavior aligned with its predefined profile (motivations, beliefs, change plans, receptivity), and it outperforms baseline methods in both automatic and expert evaluations.
Background & Motivation¶
Background: The training of psychological counselors traditionally requires the participation of human clients. To reduce costs, researchers have begun using LLM-simulated client agents. However, existing methods primarily focus on evaluating counselor agents, neglecting the consistency and quality of the client simulation.
Limitations of Prior Work:
- Simple profile prompting (Yosef et al., 2024; Wang et al., 2024a) or conversational example prompting (Chiu et al., 2024) leads to four types of inconsistency:
    - (a) Motivational Inconsistency: The client's reasons for agreeing to change do not align with their predefined motivations.
    - (b) Belief Inconsistency: The client fails to adhere to their predefined beliefs.
    - (c) Plan Inconsistency: The client accepts a change plan that does not exist in their profile.
    - (d) Receptivity Inconsistency: The client's level of cooperation with the counselor does not match their predefined settings.
- LLMs (such as ChatGPT) possess an inherent tendency to generate highly compliant responses (a side effect of safety/alignment), leading to overly-compliant simulated clients with monotonous behaviors.
Design Motivation: Simulated clients need fine-grained control over their state transitions and action selections across the stages of counseling, so that their behavior stays consistent with their profiles and resembles that of real clients.
Method¶
Overall Architecture¶
The framework comprises four core modules that are executed sequentially in each dialogue turn:
- State Transition Module -> Determines the client's next state.
- Action Selection Module -> Selects the type of client action.
- Information Selection Module -> Selects the profile information to disclose.
- Response Generation Module -> Generates the client's utterance.
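The sequential flow of one client turn can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `client_turn`, the callable signatures, and the toy stand-ins are all assumptions, and in the paper each module is backed by LLM prompting rather than plain functions.

```python
from typing import Callable, List, Optional, Tuple


def client_turn(
    state: str,
    history: List[str],
    profile: dict,
    next_state: Callable[[str, List[str], dict], str],
    select_action: Callable[[str, List[str], dict], str],
    select_info: Callable[[str, List[str], dict], Optional[str]],
    generate: Callable[[str, str, Optional[str], List[str]], str],
) -> Tuple[str, str]:
    """Run the four modules in order and return (new_state, utterance)."""
    state = next_state(state, history, profile)      # 1. state transition
    action = select_action(state, history, profile)  # 2. action selection
    info = select_info(action, history, profile)     # 3. information selection
    utterance = generate(state, action, info, history)  # 4. response generation
    return state, utterance


# Toy stand-ins (hypothetical) so the sketch runs without an LLM.
new_state, reply = client_turn(
    "precontemplation",
    ["Counselor: What brings you in today?"],
    {"persona": "works night shifts"},
    next_state=lambda s, h, p: s,
    select_action=lambda s, h, p: "Inform",
    select_info=lambda a, h, p: p["persona"],
    generate=lambda s, a, i, h: f"[{a}] {i}",
)
```

Passing the modules in as callables keeps the ordering explicit and lets each one be swapped for an LLM-backed version independently.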
Client Profile Definition¶
The client profile consists of the following elements:
| Element | Description |
|---|---|
| Behavioral Issue | The behavioral problem the client needs to address |
| Initial/Terminal State | Psychological state before and after counseling |
| Persona | Client background information |
| Motivation | Specific reasons driving the change |
| Beliefs | Beliefs that hinder the change |
| Plans | Specific behavioral changes the client may agree to |
| Receptivity | The level of cooperation with the counselor, rated 1-5 |
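One way to hold such a profile in code is a small dataclass. The field names, the choice of lists for persona, motivation, beliefs, and plans, and the example values are assumptions made for illustration; only the 1-5 receptivity range comes from the table above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClientProfile:
    """Hypothetical container mirroring the profile elements in the table."""
    behavioral_issue: str
    initial_state: str
    terminal_state: str
    persona: List[str] = field(default_factory=list)
    motivation: List[str] = field(default_factory=list)
    beliefs: List[str] = field(default_factory=list)
    plans: List[str] = field(default_factory=list)
    receptivity: int = 3  # 1 (least cooperative) to 5 (most cooperative)

    def __post_init__(self) -> None:
        if not 1 <= self.receptivity <= 5:
            raise ValueError("receptivity must be between 1 and 5")


# Made-up example profile, not taken from the dataset.
profile = ClientProfile(
    behavioral_issue="smoking",
    initial_state="precontemplation",
    terminal_state="preparation",
    persona=["works night shifts"],
    motivation=["wants to keep up with their children"],
    beliefs=["smoking is the only way to relax"],
    plans=["switch to nicotine patches"],
    receptivity=2,
)
```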
Key Design 1: State Transition¶
Based on the Transtheoretical Model, three core states plus termination are defined:
- Precontemplation -> Denying the issue. Transition to Contemplation occurs if the counselor mentions the motivations in the client profile.
- Contemplation -> Recognizing the issue but hesitating. Transition to Preparation occurs if the belief barriers in the profile are adequately addressed.
- Preparation -> Beginning to plan changes. Transition to Termination occurs if the preferred change plan has been discussed with the counselor.
- Termination -> Ending the session.
State transitions strictly depend on whether the counselor addresses key content in the client profile, thereby ensuring consistency.
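The transition rules above amount to a small state machine, sketched here. The three boolean flags are assumptions standing in for the judgment of whether the counselor has addressed the relevant profile content; how that judgment is made (for example, by prompting an LLM) is outside this sketch.

```python
from enum import Enum


class State(Enum):
    PRECONTEMPLATION = "precontemplation"
    CONTEMPLATION = "contemplation"
    PREPARATION = "preparation"
    TERMINATION = "termination"


def next_state(
    state: State,
    *,
    motivation_mentioned: bool = False,
    beliefs_addressed: bool = False,
    plan_discussed: bool = False,
) -> State:
    """Advance one stage only when the matching profile content was addressed."""
    if state is State.PRECONTEMPLATION and motivation_mentioned:
        return State.CONTEMPLATION
    if state is State.CONTEMPLATION and beliefs_addressed:
        return State.PREPARATION
    if state is State.PREPARATION and plan_discussed:
        return State.TERMINATION
    return state  # otherwise the client stays where they are


s = State.PRECONTEMPLATION
s = next_state(s, beliefs_addressed=True)      # wrong trigger: no change
s = next_state(s, motivation_mentioned=True)   # moves to CONTEMPLATION
```

Because each stage checks only its own trigger, a counselor who skips ahead (for instance, proposing a plan during Precontemplation) does not move the client forward.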
Key Design 2: Action Selection¶
A fusion strategy combining two action distributions is introduced:
- Context-aware distribution: Infers an appropriate action distribution based on the current dialogue context using the LLM.
- (State, Receptivity)-aware distribution: Learns the action distribution for each (state, receptivity) combination from the real AnnoMI dataset.
The final action distribution is the average of the two. The client's action is then sampled from it, restricted to the candidate actions relevant to the current state.
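The fusion step can be illustrated with a short sketch. The function name, the dictionary representation, and the probabilities are hypothetical; renormalizing after restricting to the candidate actions is also an assumption, since the summary does not state how the restriction is applied.

```python
import random
from typing import Dict, List, Optional, Tuple


def fuse_and_sample(
    context_dist: Dict[str, float],
    prior_dist: Dict[str, float],
    candidates: List[str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, Dict[str, float]]:
    """Average two action distributions over the candidates, then sample one."""
    rng = rng or random.Random()
    fused = {
        a: 0.5 * (context_dist.get(a, 0.0) + prior_dist.get(a, 0.0))
        for a in candidates
    }
    total = sum(fused.values())
    if total <= 0:
        raise ValueError("no probability mass on the candidate actions")
    probs = {a: w / total for a, w in fused.items()}  # renormalize (assumption)
    actions = list(probs)
    action = rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
    return action, probs


# Made-up distributions: one from the dialogue context, one from data.
context = {"Deny": 0.6, "Inform": 0.4}
prior = {"Deny": 0.2, "Inform": 0.4, "Plan": 0.4}
action, probs = fuse_and_sample(context, prior, ["Deny", "Inform"], random.Random(0))
# Deny: (0.6 + 0.2) / 2 = 0.4, Inform: (0.4 + 0.4) / 2 = 0.4, so 0.5 each after renormalizing.
```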
Key Design 3: Information Selection¶
Actions are categorized into two types:
- Type-1 (e.g., Deny, Engage, Accept): Generated without requiring profile information.
- Type-2 (e.g., Inform, Blame, Hesitate, Plan): Requires selecting relevant information from the profile.
The information selection module prevents the client from disclosing too much profile information at once, avoiding unrealistically shortened counseling sessions.
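The idea of rationing disclosure can be sketched as follows. This is a simplification under stated assumptions: the set of Type-1 actions contains only the examples listed above, the caller is assumed to supply the profile items relevant to the action, and picking the first undisclosed item replaces whatever relevance judgment the actual module makes.

```python
from typing import List, Optional, Set

# Only the Type-1 examples named in the text; the full action set is not listed there.
TYPE_1_ACTIONS = {"Deny", "Engage", "Accept"}


def select_information(
    action: str, relevant_items: List[str], disclosed: Set[str]
) -> Optional[str]:
    """Return at most one not-yet-disclosed profile item for a Type-2 action.

    `disclosed` is updated in place so later turns do not repeat the item.
    """
    if action in TYPE_1_ACTIONS:
        return None  # Type-1 actions need no profile information
    for item in relevant_items:
        if item not in disclosed:
            disclosed.add(item)
            return item
    return None  # everything relevant has already been shared


disclosed: Set[str] = set()
beliefs = ["smoking is the only way to relax", "quitting causes weight gain"]
first = select_information("Blame", beliefs, disclosed)
second = select_information("Blame", beliefs, disclosed)
```

Returning a single item per turn is what keeps the session from collapsing into one long disclosure.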
Data Annotation¶
Based on the AnnoMI dataset (consisting of real MI counseling dialogues), 86 clients and sessions were selected:
- GPT-4 was leveraged to annotate the four profile elements, states, actions, and receptivity.
- Quality verification of annotations: 87.31% accuracy for states, 85.20% for actions, and 80.32% for receptivity.
- All profile items were manually verified for factual accuracy.
Key Experimental Results¶
Experimental Setup¶
- LLM backbone model: gpt-3.5-turbo-0125
- Counselor agent: LLM agent based on MI knowledge prompting
- Facilitator agent: Monitors dialogue termination conditions
- 3 sessions generated per client profile -> 258 generated sessions in total
Baseline Methods¶
- Base: Simple prompt containing only the behavioral issue.
- Example-based (Chiu et al., 2024): Prompts containing real session examples.
- Profile-based (Yosef et al., 2024): Prompts containing client profiles.
- Pro+Act-based (Zhang et al., 2024): Profiles + action descriptions.
Profile Consistency Evaluation (Automatic Evaluation, GPT-4 Entailment Decision)¶
| Method | Persona↑ | Motivation↑ | Belief↑ | Plan↑ | Receptivity↑ |
|---|---|---|---|---|---|
| Base | 9.01 | 16.17 | 12.15 | 9.30 | −0.31 |
| Example-based | 53.68 | 45.73 | 45.55 | 33.53 | 0.25 |
| Profile-based | 61.97 | 53.44 | 67.17 | 54.67 | 0.31 |
| Pro+Act-based | 67.09 | 55.33 | 68.60 | 57.17 | 0.33 |
| Ours | 70.57 | 73.37 | 71.70 | 68.51 | 0.58 |
Ours achieves the best results across all five consistency dimensions.
Dialogue Behavior Analysis¶
| Method | Mean Receptivity | Motivation Rate (20 steps) | Mean Motivation Turn | Action KL↓ |
|---|---|---|---|---|
| Base | 4.42±0.47 | 1.00 | 6.60 | 0.39 |
| Profile-based | 4.12±0.64 | 0.96 | 9.76 | 0.15 |
| Pro+Act-based | 3.86±1.01 | 0.94 | 9.93 | 0.13 |
| Ours | 3.32±1.15 | 0.69 | 18.60 | 0.06 |
| Real Client | 3.27±1.12 | 0.48 | 27.56 | 0.00 |
Ours demonstrates behavioral distributions (receptivity, motivation rate, action KL divergence) closest to real clients.
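For reference, the Action KL column compares each method's distribution over client actions with that of real clients. A minimal sketch of such a computation follows; the direction KL(real || simulated), the smoothing constant, and the example frequencies are assumptions, since the summary specifies none of them.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-9) -> float:
    """KL(p || q) over the union of actions, with additive smoothing."""
    actions = sorted(set(p) | set(q))
    p_s = [p.get(a, 0.0) + eps for a in actions]
    q_s = [q.get(a, 0.0) + eps for a in actions]
    p_tot, q_tot = sum(p_s), sum(q_s)
    return sum(
        (pi / p_tot) * math.log((pi / p_tot) / (qi / q_tot))
        for pi, qi in zip(p_s, q_s)
    )


# Made-up action frequencies, for illustration only.
real = {"Deny": 0.3, "Inform": 0.4, "Accept": 0.3}
simulated = {"Deny": 0.1, "Inform": 0.3, "Accept": 0.6}
divergence = kl_divergence(real, simulated)
```

The smoothing term avoids a division by zero when a method never produces an action that real clients do.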
Expert Evaluation (1-5 scale, 6 clients × 4 profile dimensions × 3 annotators)¶
| Method | Persona | Belief | Motivation | Plan | Realism |
|---|---|---|---|---|---|
| Profile-based | 2.61 | 2.00 | 2.61 | 1.56 | 2.38 |
| Pro+Act-based | 2.65 | 2.22 | 2.78 | 1.56 | 2.50 |
| Ours | 3.33 | 2.89 | 3.00 | 2.27 | 3.16 |
| Real Client | 4.72 | 4.67 | 4.56 | 4.61 | 4.72 |
Receptivity Control Evaluation¶
| Method | Receptivity=1 | Receptivity=3 | Receptivity=5 | Correlation Coeff. |
|---|---|---|---|---|
| Pro+Act-based | 5.0 | 4.3 | 5.0 | 0.00 |
| Ours | 1.3 | 3.0 | 4.3 | 0.86 |
Ours effectively controls client behavior across different receptivity levels (Spearman correlation 0.86, p=0.0003), whereas baseline methods almost completely fail to differentiate receptivity settings.
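A Spearman coefficient of this kind is a Pearson correlation computed on ranks. The sketch below shows the calculation in plain Python; the summary does not say which individual ratings the reported coefficient was computed over, so the data here is made up and the result is not meant to reproduce 0.86.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical receptivity settings and per-session ratings.
settings = [1, 1, 3, 3, 5, 5]
ratings = [1, 2, 3, 3, 4, 5]
rho = spearman(settings, ratings)
```

A method that ignores the receptivity setting would produce ratings with little rank agreement with the settings, which is what a coefficient near zero reflects.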
Highlights & Insights¶
- First to systematically define consistency dimensions (motivation, beliefs, plans, receptivity) for client simulation in psychological counseling.
- Well-grounded state transition design: the Transtheoretical Model is embedded into a state machine, and state transitions are bound directly to profile content.
- Controllable receptivity is an important practical feature, since training counselors requires interacting with clients of varying difficulty levels.
- Action distribution fusion (context-aware + data-driven) balances dialogue fluency and profile consistency.
- Identification of the over-compliance issue in baseline methods, revealing the side effects of LLM alignment in role-playing scenarios.
Limitations & Future Work¶
- Error accumulation in multi-step prompting: The framework relies on sequential LLM prompts, exhibiting high sensitivity to prompt design.
- Exclusively focused on client simulation, without optimizing or evaluating counselor agents.
- Evaluated only on the AnnoMI dataset, restricted to the single counseling methodology of MI.
- A significant gap remains between simulated and real clients (expert realism score of 3.16 vs. 4.72).
- Relatively short conversation length: Baseline methods terminate rapidly due to over-compliance, whereas the proposed method can cause the counselor to give up early due to rigid state control.
Related Work¶
- Motivational Interviewing: Miller & Rollnick (2012), Transtheoretical Model (Prochaska & Velicer, 1997)
- Client Agent Simulation: PATIENT-Ψ (Wang et al., 2024b), State-Aware Patient Simulator (Liao et al., 2024)
- LLM Dialogue: Chiu et al. (2024), a computational framework for evaluating LLM counselors
Rating¶
⭐⭐⭐⭐ (4/5)
This work presents an in-depth study in the niche area of mental health client simulation. The multi-dimensional definition of consistency is highly valuable, and controllable receptivity is a notable highlight. While the evaluation is thorough (automatic + expert), the experiments are limited to a single dataset and counseling methodology, and the gap between simulated and real clients has yet to be bridged.