
ETHICMIND: A Risk-Aware Framework for Ethical-Emotional Alignment in Multi-Turn Dialogue

Conference: ACL 2026
arXiv: 2604.09265
Code: None
Area: Dialogue Systems / AI Safety
Keywords: Ethical-Emotional Alignment, Multi-Turn Dialogue, Risk Awareness, Strategy Planning, Inference-Time Alignment

TL;DR

ETHICMIND is an inference-time, risk-aware alignment framework that, at each turn of a multi-turn dialogue, jointly analyzes ethical risk and the user's emotional state, plans a high-level response strategy, and generates a reply that balances ethical guidance with emotional resonance. It achieves more consistent alignment in high-risk and morally ambiguous scenarios without any additional training.

Background & Motivation

Background: Dialogue systems are increasingly deployed in sensitive scenarios such as mental health, education, and social care. Existing research treats empathetic dialogue (identifying and responding to emotional states) and ethical safety (preventing harmful outputs) as two separate problems—empathetic systems (e.g., EmpatheticDialogues) focus on emotional responses, while safety systems (e.g., RLHF, red teaming) focus on avoiding harmful generation.

Limitations of Prior Work: These two dimensions often create tension in actual conversations. Highly empathetic responses may inadvertently validate harmful beliefs or inappropriate behaviors (e.g., over-empathizing with suicidal users while neglecting intervention); strict safety enforcement may produce emotionally detached, condescending replies (e.g., directly preaching "this is wrong" to morally confused users), damaging trust and engagement.

Key Challenge: Existing dialogue systems lack mechanisms to dynamically adjust ethical and emotional alignment as a conversation evolves; they either always prioritize empathy or always prioritize safety, and cannot flexibly rebalance as the dialogue context changes.

Goal: Formalize ethical-emotional alignment as an explicit turn-by-turn decision problem, jointly considering ethical risks and emotional states at each conversation turn to adaptively adjust response strategies.

Key Insight: Rather than modifying model parameters, introduce a structured analyze-plan-generate three-stage process at inference time, transforming alignment reasoning from implicit (relying on internal model representations) to explicit (externalized as interpretable analysis and strategies).

Core Idea: By explicitly separating reasoning (risk + emotion analysis), strategy planning (selecting communication approaches), and reply generation into three stages, enable dialogue systems to make adaptive alignment decisions at each turn based on ethical risk levels and user emotional states.

Method

Overall Architecture

ETHICMIND executes a three-step process at inference time for each dialogue turn: (1) Joint Risk and Emotion Analyzer \(\mathcal{A}\): infers ethical risk category, user emotional state, and Rules of Thumb; (2) Strategy Planner \(\mathcal{P}\): generates high-level response strategies based on analysis results; (3) Reply Generator \(\mathcal{G}\): produces final replies based on strategies and context. All three components use the same underlying LLM without additional training.
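
The per-turn loop can be pictured as below. This is a minimal sketch, assuming a generic `llm(prompt)` completion callable; the prompt templates, output parsing, and helper names are illustrative, not the paper's actual prompts.

```python
# Minimal sketch of ETHICMIND's per-turn analyze-plan-generate loop.
# `llm` stands in for any chat-completion call (GPT-4o, Llama-3-8B-Instruct, ...);
# prompts and parsing are illustrative stand-ins, not the paper's exact templates.
from dataclasses import dataclass

@dataclass
class Analysis:
    risk_category: str   # c_t: one of the six risk levels
    emotion: str         # e_t: free-text emotion description
    rule_of_thumb: str   # r_t: concise normative prompt

def analyze(llm, history: list[str], user_utterance: str) -> Analysis:
    """Stage 1: joint risk & emotion analysis in a single prompted inference."""
    prompt = (
        "Given the dialogue so far and the latest user message, output one line each:\n"
        "1. risk category (severe illegal / ethical violation / moral dilemma /"
        " socially inappropriate / potentially harmful / benign)\n"
        "2. the user's emotional state, in free text\n"
        "3. one rule of thumb for responding\n"
        f"History: {history}\nUser: {user_utterance}"
    )
    risk, emotion, rot = llm(prompt).strip().split("\n")  # simplified parsing
    return Analysis(risk, emotion, rot)

def plan(llm, turn: int, analysis: Analysis, history: list[str],
         seed_strategies: dict[str, list[str]]) -> str:
    """Stage 2: seed strategy on turn 1, free-form strategy generation afterwards."""
    if turn == 1:
        options = seed_strategies[analysis.risk_category]  # 3 per risk level, 18 total
        return llm(f"Pick the most suitable strategy from {options} given: {analysis}")
    return llm(f"Given history {history} and analysis {analysis}, "
               "write a high-level response strategy in natural language.")

def generate(llm, strategy: str, history: list[str], user_utterance: str) -> str:
    """Stage 3: produce the final reply conditioned on the planned strategy."""
    return llm(f"Strategy: {strategy}\nHistory: {history}\n"
               f"User: {user_utterance}\nReply:")
```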

Key Designs

  1. Joint Risk and Emotion Analyzer:

    • Function: Identifies ethical risk signals and user emotional states at each dialogue turn
    • Mechanism: A single prompted inference outputs the structured tuple \((c_t, e_t, r_t)\). The ethical risk category \(c_t\) is drawn from a six-level taxonomy (severe illegal behavior → ethical violation → moral dilemma → socially inappropriate behavior → potentially harmful behavior → benign dialogue). The emotional state \(e_t\) is a free-text description rather than a fixed label (e.g., "ashamed but defensive"), so composite or ambiguous emotions can be captured. The Rule of Thumb \(r_t\) is a concise normative prompt (e.g., "self-harm behaviors require immediate intervention")
    • Design Motivation: Fixed emotional labels cannot express composite emotions common in morally sensitive dialogues; six-level risk classification provides operational risk signals rather than authoritative moral judgments
  2. Strategy Planner:

    • Function: Selects appropriate communication strategies based on analysis results
    • Mechanism: On the first dialogue turn, selects a seed strategy from a predefined set of risk-alignment strategies (3 per risk level, 18 in total, such as "direct warning," "perspective diversification," and "encourage positive change"). Subsequent turns operate in generation mode, integrating the dialogue history, risk category, emotion, and Rules of Thumb into a strategy prompt that produces a natural-language strategy (see the pipeline sketch above). This hybrid design combines stable initialization with turn-by-turn adaptability
    • Design Motivation: Predefined strategies provide evidence-based starting points, while generation mode allows strategies to flexibly adjust as dialogue evolves
  3. Risk-Stratified Evaluation Protocol:

    • Function: Systematically evaluates alignment behavior under different ethical risk conditions
    • Mechanism: Samples 1,000+ dialogues from the Prosocial Dialogues dataset and re-annotates them into the six ethical categories, keeping roughly 50 dialogues per category (298 in total). Introduces context-aware user simulation: conditional paraphrasing of the original user utterances (preserving intent and risk features while varying the surface form) enables controlled multi-turn evaluation; a sketch of this loop follows the list. Evaluates four dimensions: polite tone, ethical guidance, empathy, and topic engagement
    • Design Motivation: Existing evaluations are mainly single-turn/binary (safe/unsafe), unable to capture dynamic changes in ethical-emotional alignment across multi-turn dialogues
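
The following is a rough sketch of the evaluation loop described above, assuming the same generic `llm(prompt)` completion callable as in the pipeline sketch; the paraphrasing and judging prompts are illustrative stand-ins, not the paper's released templates.

```python
# Rough sketch of the risk-stratified multi-turn evaluation protocol. All
# prompts are hypothetical; only the loop structure follows the description.
DIMENSIONS = ["polite tone", "ethical guidance", "empathy", "topic engagement"]

def simulate_user(llm, scripted_utterance: str, history: list[str]) -> str:
    """Context-aware user simulation: conditionally paraphrase the scripted
    utterance so intent and risk features are preserved while the surface
    form adapts to the ongoing dialogue."""
    return llm(
        f"Paraphrase the next user turn '{scripted_utterance}' so it fits the "
        f"dialogue so far ({history}) while keeping its intent and risk level."
    )

def judge(llm, dialogue: list[str]) -> dict[str, float]:
    """Score a finished dialogue on the four alignment dimensions (10-point scale)."""
    return {
        dim: float(llm(f"Rate this dialogue from 1 to 10 on {dim}: {dialogue}"))
        for dim in DIMENSIONS
    }

def evaluate(system, llm, scripted_turns: list[str]) -> dict[str, float]:
    """Run one controlled multi-turn dialogue and return its dimension scores."""
    history: list[str] = []
    for scripted in scripted_turns:
        user_msg = simulate_user(llm, scripted, history)
        history.append(f"User: {user_msg}")
        history.append(f"System: {system(history, user_msg)}")
    return judge(llm, history)
```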

Loss & Training

ETHICMIND is a pure inference-time method requiring no training. All components share the same underlying LLM (e.g., GPT-4o, Llama-3-8B-Instruct), achieving functional separation through prompts.
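
For concreteness, the prompt-based role separation might look like the following, shown here with the OpenAI Python client; the system prompts are hypothetical paraphrases of the three roles, not the paper's released templates.

```python
# Illustrative only: one shared backbone, three prompt-defined roles, no fine-tuning.
from openai import OpenAI

client = OpenAI()

ROLE_PROMPTS = {
    "analyzer": "Assess the ethical risk category, the user's emotion, and a rule of thumb.",
    "planner": "Propose a high-level response strategy from the given analysis.",
    "generator": "Write the final reply following the given strategy.",
}

def call_role(role: str, content: str) -> str:
    """Invoke the shared backbone under one of the three role prompts."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any instruction-following backbone works unchanged
        messages=[
            {"role": "system", "content": ROLE_PROMPTS[role]},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content
```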

Key Experimental Results

Main Results

GPT-4o Evaluation (10-point scale, four dimensions + overall)

| System | Polite Tone | Ethical Guidance | Empathy | Engagement | Overall | Avg Length |
|---|---|---|---|---|---|---|
| COSMO-3B | 4.55 | 4.37 | 4.01 | 5.24 | 4.54 | 25.08 |
| Llama-3-8B-Instruct | 8.23 | 6.56 | 6.89 | 7.79 | 7.37 | 51.78 |
| ETHICMIND-Llama3-8B | 8.24 ↑ | 6.67 ↑ | 7.31 | 7.92 ↑ | 7.53 | 62.76 |
| GPT-4o | 8.46 | 6.83 | 6.99 | 8.11 | 7.60 | 47.54 |
| ETHICMIND-GPT-4o | 8.58 | 7.31 | 7.35 ↑ | 8.34 | 7.90 | 53.86 |

Human Preference Evaluation

| Backbone | ETHICMIND Win Rate | Baseline Win Rate | Tie |
|---|---|---|---|
| Llama-3-8B-Instruct | 52.68% | 39.93% | 7.38% |
| Llama-3.3-70B | 68.46% | 24.83% | 6.71% |
| GPT-4o | 70.47% | 19.80% | 9.73% |

Ablation Study

Component Ablation on GPT-4o Backbone

| Config | Polite Tone | Ethical Guidance | Empathy | Engagement | Overall |
|---|---|---|---|---|---|
| ETHICMIND | 8.58 | 7.31 | 7.35 | 8.34 | 7.90 |
| w/o Emotion | 8.46 | 6.98 | 6.98 (-0.37) | 8.27 | 7.67 |
| w/o RoT | 8.57 | 6.82 (-0.49) | 7.32 | 8.38 | 7.77 |
| w/o Planner | 8.54 | 6.95 | 7.27 | 8.34 | 7.77 |

Key Findings

  • ETHICMIND simultaneously improves both ethical guidance and empathy across all backbones, demonstrating that the two are not a zero-sum trade-off
  • Removing emotion analysis mainly hurts empathy (-0.37), while removing RoT mainly hurts ethical guidance (-0.49), supporting the modular decomposition
  • Improvements are larger in high-risk scenarios (severe illegal, ethical violations): in severe-illegal scenarios, ETHICMIND-GPT-4o scores 7.85 versus the GPT-4o baseline's 7.71
  • In human evaluation, ETHICMIND achieves a 70.47% win rate over the GPT-4o baseline, indicating a clear advantage for structured alignment reasoning
  • With Claude as an auxiliary evaluator, relative performance trends are consistent with the GPT-4o judge, supporting the robustness of the evaluation

Highlights & Insights

  • Formalizing ethical-emotional alignment as a turn-by-turn decision problem represents an important paradigm shift—from "the model itself should know what to do" to "explicitly telling the model what to do"
  • Pure inference-time method design (no training required) makes it plug-and-play for any LLM, lowering deployment barriers
  • The six-level ethical risk taxonomy plus the 18 communication strategies is a design with practical reference value
  • The risk-stratified evaluation protocol offers a more fine-grained evaluation standard for the field

Limitations & Future Work

  • As an inference-time method, ETHICMIND requires multiple LLM calls per turn (analysis + planning + generation), increasing latency and cost
  • Evaluation data come from the English Prosocial Dialogues dataset; cross-lingual and cross-cultural ethical alignment are not considered
  • Ethical risk classification relies on LLM judgment and may be inaccurate in ambiguous boundary cases
  • User simulation is based on paraphrasing rather than real user interactions and may not fully capture real dialogue dynamics
  • vs COSMO: COSMO is designed specifically for prosocial dialogue but lacks emotional modeling; ETHICMIND unifies ethics and emotion
  • vs RLHF safety alignment: RLHF is effective in single-turn settings but vulnerable to context-confusion attacks over multiple turns; ETHICMIND's turn-by-turn analysis better tracks risk evolution
  • Insights: The strategy of explicitly separating reasoning from generation may also be effective in other tasks requiring dynamic alignment

Rating

  • Novelty: ⭐⭐⭐⭐ Formalization of joint ethical-emotional alignment and inference-time framework design are novel
  • Experimental Thoroughness: ⭐⭐⭐⭐ Multiple backbones, risk stratification, ablations, human evaluation, but relatively small data scale (298 dialogues)
  • Writing Quality: ⭐⭐⭐⭐ Clear problem motivation, detailed framework description
  • Value: ⭐⭐⭐⭐ Provides practical framework-level solution for dialogue system alignment in sensitive scenarios