Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations?
Conference: ACL2026
arXiv: 2605.17694
Code: https://github.com/nvshrao/power-asymmetric-conversations
Area: LLM Agent / Socio-Cognitive Evaluation
Keywords: Power asymmetry, linguistic coordination, pronoun effects, authority bias, harmful compliance
TL;DR
This paper simulates power-asymmetric conversations using professional roles and personas, finding that LLM agents replicate socio-cognitive effects such as pronoun usage patterns, linguistic coordination, authoritative persuasion, and harmful compliance. While some effects enhance conversational realism, others introduce significant safety risks.
Background & Motivation
Background: LLM agents are increasingly deployed in high-stakes conversational scenarios such as healthcare, education, law, and financial consulting. To make agents resemble real interactive partners, researchers often focus on personas, consistency, and general cognitive biases, but systematic studies on how "power relations" alter agent language and decision-making are scarce.
Limitations of Prior Work: Human dialogue is not an exchange of information in an egalitarian vacuum. Power disparities between principals and teachers, doctors and nurses, or judges and lawyers influence pronoun use, style, persuasiveness, and obedience. If LLM agents replicate these human social biases, they may appear more realistic, but they may also become more prone to unsafe compliance under pressure from high-authority roles.
Key Challenge: There is a tension between realism and safety. If a model completely ignores power relations, the dialogue may feel unnatural; however, if it excessively replicates authority biases and submissive behaviors, it may amplify improper influence and unsafe decision-making.
Goal: The authors evaluate whether LLM agents exhibit pronoun effects, linguistic coordination, authoritative persuasion, and harmful compliance across seven research questions. They investigate how these effects evolve over the course of a conversation, whether they can be controlled via prompts, and how model size and training stages influence effect intensity.
Key Insight: The paper turns four categories of phenomena from social psychology into measurable dialogue metrics: first-person singular/plural pronoun usage, degree of linguistic coordination, persuasion success, and harmful compliance.
Core Idea: Generate multi-turn dialogues using high/low power role pairings, then use linguistic statistics and task outcomes to measure whether agents are influenced by power structures in a human-like manner.
Method
This paper does not propose a new agent algorithm; it constructs a socio-cognitive evaluation pipeline. The key is grounding the abstract concept of "power asymmetry" in reproducible personas, role pairs, dialogue tasks, and quantitative metrics.
Overall Architecture
The authors first define 14 pairs of high- and low-power roles, such as Principal-Teacher, Justice-Lawyer, Head Chef-Sous Chef, and Lead Developer-Junior Developer. Realistic personas are then sampled from PersonaHub for each role and validated by human annotators, who perceive a power difference in 96.5% of persona pairs (Fleiss' kappa of 0.73). The authors then simulate dialogues capped at 10 to 15 turns using models such as Llama 3.1, Qwen 2.5, Phi, GPT-4.1, and GPT-5, and measure realism-related and safety-related effects.
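The persona filter described under Role and Persona Construction (role name within the first five words, no status-weakening qualifiers) could be sketched as follows. This is a minimal illustration, not the authors' code: the function `keep_persona`, the `WEAKENING_TERMS` tuple, and the regex tokenisation are assumptions, and the example persona strings are invented rather than drawn from PersonaHub.

```python
import re

# The source names only "former" and "retired" as examples of
# status-weakening qualifiers; a real filter may use a longer list.
WEAKENING_TERMS = ("former", "retired")


def keep_persona(persona: str, role: str, window: int = 5) -> bool:
    """Return True if `role` appears fully within the first `window` words
    of `persona` and no status-weakening qualifier occurs anywhere in it."""
    words = re.findall(r"[a-z']+", persona.lower())
    role_words = re.findall(r"[a-z']+", role.lower())

    # Reject personas whose current status is weakened.
    if any(term in words for term in WEAKENING_TERMS):
        return False

    # Slide over the first `window` words looking for the (possibly
    # multi-word) role name, e.g. "head chef".
    head = words[:window]
    n = len(role_words)
    return any(head[i:i + n] == role_words for i in range(len(head) - n + 1))


if __name__ == "__main__":
    # Hypothetical persona strings, for illustration only.
    print(keep_persona("A principal of a large urban high school", "Principal"))
    print(keep_persona("A retired principal who volunteers locally", "Principal"))
    print(keep_persona("A head chef at a busy bistro", "Head Chef"))
```

Matching on whole word sequences rather than substrings avoids accepting a persona merely because the role name is embedded in a longer word.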
Key Designs
- Role and Persona Construction:
  - Function: Provides interpretable and controllable power-asymmetry conditions for each dialogue.
  - Mechanism: High/low status is defined at the role level, while specific identities and backgrounds are provided at the persona level. The authors retain only samples where the role name appears within the first five words of the persona and filter out descriptions like "former" or "retired" that weaken current status.
  - Design Motivation: Simply instructing "you are a high-power individual" is too abstract; realistic professional roles more naturally induce socio-linguistic patterns in models and facilitate human validation of power disparities.
- Four Categories of Socio-Cognitive Metrics:
  - Function: Turns social psychology concepts into statistical indicators of LLM behavior.
  - Mechanism: Pronoun effects are measured as the share of first-person singular (FPS) and first-person plural (FPP) pronouns among all words; linguistic coordination is measured by the coordination degree \(D_{lc}\) across 8 style markers; authority bias is measured by the difference in persuasion success between attempts initiated by the high-status and the low-status role; harmful compliance is measured by the difference in compliance between unsafe requests initiated by the high-status and the low-status role.
  - Design Motivation: These metrics cover both conversational realism and safety risks, avoiding coarse judgments based solely on "human-likeness" or "safety."
- Progression, Control, and Model Factor Analysis:
  - Function: Determines whether these effects are stable, suppressible by system prompts, or influenced by model size and training stage.
  - Mechanism: The authors compare different dialogue stages (Start/Middle/End). In the control experiments, models are instructed to exhibit a "High," "Low," or "No" effect. In the model analysis, they compare different sizes within the same family and post-training stages such as SFT and DPO.
  - Design Motivation: If effects appear only sporadically in specific prompts or models, their deployment significance is limited. Analysis across stages, models, and controllability provides a better assessment of real agent risk.
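The pronoun metric above (FPS and FPP pronouns as a share of total words) can be illustrated with a short sketch. The pronoun word lists, the tokenisation rule, and the function name `pronoun_rates` are assumptions made here for illustration; the paper may use a different lexicon or tokeniser.

```python
import re

# Assumed pronoun lexicons; the source does not list the exact word sets.
FPS = {"i", "me", "my", "mine", "myself"}
FPP = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(utterances):
    """Return FPS and FPP counts divided by the total number of word tokens
    across all utterances of one speaker."""
    tokens = [
        tok
        for utt in utterances
        for tok in re.findall(r"[a-z]+(?:'[a-z]+)?", utt.lower())
    ]
    if not tokens:
        return {"fps": 0.0, "fpp": 0.0}

    # Strip contraction suffixes so "I'm" counts as "i" and "we're" as "we".
    stems = [tok.split("'")[0] for tok in tokens]
    fps = sum(stem in FPS for stem in stems)
    fpp = sum(stem in FPP for stem in stems)
    return {"fps": fps / len(tokens), "fpp": fpp / len(tokens)}


if __name__ == "__main__":
    # Invented utterances for a low-power and a high-power speaker.
    low = ["I think I can finish my report today.", "I'm not sure we agreed."]
    high = ["We need our numbers by Friday.", "Let the team know we are ready."]
    print("low-power :", pronoun_rates(low))
    print("high-power:", pronoun_rates(high))
```

Comparing the two dictionaries per role gives the kind of low-power versus high-power contrast reported in the main results table.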
Loss & Training
This work does not train new models; the core lies in simulation and evaluation. API-based models are run through the Sotopia framework, while offline models are prompted directly, with persona, task information, and dialogue history placed in the context. Metrics are computed from dialogue statistics or from task-specific judgments, and statistically significant results are shown in bold in the paper's tables. For the harmful compliance and persuasion tasks, the LLM-as-a-judge was validated against human ratings; binary/ternary human evaluation results for compliance and persuasion are reported in Table 9.
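For the offline models, the source says only that persona, task information, and dialogue history are placed into the context. One possible way to assemble such a context is sketched below; the template wording, the `Turn` dataclass, and the `build_context` function are hypothetical, and the paper's actual prompts may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # role name of the speaker, e.g. "Principal"
    text: str     # what the speaker said in this turn


def build_context(persona: str, task: str, history: List[Turn], speaker: str) -> str:
    """Assemble a plain-text prompt from persona, task, and dialogue history,
    ending with a cue for `speaker` to produce the next turn."""
    lines = [
        f"Persona: {persona}",
        f"Task: {task}",
        "Dialogue so far:",
    ]
    if history:
        lines.extend(f"{turn.speaker}: {turn.text}" for turn in history)
    else:
        lines.append("(no turns yet)")
    lines.append(f"{speaker}:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented persona, task, and turn, for illustration only.
    prompt = build_context(
        persona="A teacher at a public middle school",
        task="Discuss next term's timetable",
        history=[Turn("Principal", "We should move the staff meeting to Monday.")],
        speaker="Teacher",
    )
    print(prompt)
```

Keeping the history as structured turns rather than a pre-joined string makes it straightforward to slice the dialogue into Start/Middle/End segments for the progression analysis.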
Key Experimental Results
Main Results
| Effect | Model / Metric | Low Power Condition | High Power Condition | Conclusion |
|---|---|---|---|---|
| Pronoun Effect | GPT-4.1 FPS | 2.32% | 1.66% | High-power individuals use "I" less |
| Pronoun Effect | GPT-4.1 FPP | 2.94% | 3.66% | High-power individuals use "we" more |
| Pronoun Effect | GPT-5 FPS | 1.15% | 0.77% | GPT-5 exhibits the same pattern |
| Pronoun Effect | GPT-5 FPP | 3.15% | 3.71% | High-power plural usage is higher |
| Linguistic Coordination | Llama 3.1 70B \(D_{lc}\) | 7.1 | 6.4 | Low-power individuals coordinate more (delta 0.7) |
| Linguistic Coordination | GPT-5 \(D_{lc}\) | 4.2 | 4.0 | GPT series shows weaker, non-significant coordination |
| Persuasion Success | Qwen 2.5 7B | 25.0% | 30.9% | Persuasion is more successful when high-power led |
| Harmful Compliance | GPT-4.1 | 6.1% | 9.8% | High-power requests lead to higher unsafe compliance |
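The effect sizes implied by the table above can be recomputed as high-power minus low-power differences. The sketch below only restates the table's numbers; the dictionary layout and the helper name `power_deltas` are illustrative choices, not part of the paper.

```python
# (effect, model/metric) -> (low-power value, high-power value),
# copied from the main results table. Pronoun, persuasion, and compliance
# values are percentages; coordination values are D_lc scores.
MAIN_RESULTS = {
    ("pronoun", "GPT-4.1 FPS"): (2.32, 1.66),
    ("pronoun", "GPT-4.1 FPP"): (2.94, 3.66),
    ("pronoun", "GPT-5 FPS"): (1.15, 0.77),
    ("pronoun", "GPT-5 FPP"): (3.15, 3.71),
    ("coordination", "Llama 3.1 70B"): (7.1, 6.4),
    ("coordination", "GPT-5"): (4.2, 4.0),
    ("persuasion", "Qwen 2.5 7B"): (25.0, 30.9),
    ("compliance", "GPT-4.1"): (6.1, 9.8),
}


def power_deltas(results):
    """Return high-power minus low-power for every row, rounded to 2 decimals."""
    return {key: round(high - low, 2) for key, (low, high) in results.items()}


if __name__ == "__main__":
    for (effect, setting), delta in power_deltas(MAIN_RESULTS).items():
        print(f"{effect:12s} {setting:16s} {delta:+.2f}")
```

Negative deltas for FPS and \(D_{lc}\) and positive deltas for FPP, persuasion, and compliance match the direction stated in the Conclusion column; for example, the GPT-4.1 harmful compliance row gives 9.8 - 6.1 = 3.7 percentage points.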
Ablation Study
| Analysis Dimension | Key Metric | Description |
|---|---|---|
| Dialogue Position | Llama 3.1 8B persuasion delta (Start 6.1 to End 5.7) | Persuasion and harmful compliance are stronger early on |
| Dialogue Position | linguistic coordination at Start/Middle/End | Stylistic coordination is more persistent than persuasion/compliance |
| Model Size | Llama 3.1 8B persuasion delta 6.1 vs. 70B delta 1.6 | Larger models in the same family show weaker authority persuasion differences |
| Model Size | Qwen 2.5 7B harmful compliance delta 1.8 vs. 72B delta 0.9 | Larger models may mitigate some safety risk effects |
| Post-training | SFT vs DPO minimal change in most metrics | Preference tuning has limited impact on these socio-cognitive effects |
| Control Prompts | GPT persuasion/compliance near 0 under Low/No control | Safety-related effects in closed-source GPT are easier to explicitly control |
Key Findings
- Most models other than Qwen and Phi exhibit pronoun effects; the effect is particularly strong in the GPT series, suggesting that more capable models more readily replicate the power-related linguistic patterns associated with realism.
- All non-GPT models exhibit linguistic coordination, though often as mutual coordination; the asymmetry (low-power individuals coordinating more) is weaker than theories of human behavior would predict.
- Requests from high-power roles are generally more persuasive and more likely to induce harmful compliance, directly linking "social realism" to safety risks.
Highlights & Insights
- The paper's contribution lies in transforming power differentials from a sociological concept into a controllable variable within agent benchmarks, which is more relevant to real-world deployment than general persona consistency testing.
- The simultaneous measurement of realism and safety is valuable: pronoun effects and linguistic coordination may make agents more natural, while authority bias and harmful compliance may make them more dangerous.
- The results serve as a reminder that agent safety is not just about "refusing obviously harmful requests" but also includes maintaining judgment under the pressure of hierarchy, identity, and authority.
Limitations & Future Work
- The authors acknowledge that all experiments are text-based simulated dialogues, lacking cues of emotion, physical setting, multimodality, and long-term relationships found in real human-agent interaction.
- Power is approximated only through professional roles and personas, which cannot cover complex factors such as cultural context, organizational systems, or intersectional identities.
- While covering six representative models, the study does not represent all modern LLMs; different architectures and safety alignment strategies may alter effect intensity.
- Controlled experiments only used explicit system-level instructions; future research should investigate whether these socio-cognitive effects can be more stably regulated at the level of model representations or training objectives.
Related Work & Insights
- vs. personality alignment: Previous work focused on whether models stably express certain personalities; this paper focuses on relational structures beyond personality, specifically how high and low power jointly shape linguistic behavior.
- vs. cognitive bias benchmarks: General bias evaluations are often static Q&A; this paper places authority bias and harmful compliance into multi-turn agent dialogues, making them closer to deployment risks.
- vs. Sotopia-style social simulation: While Sotopia provides a framework for dialogue simulation, this paper overlays socio-psychological metrics, allowing simulations to answer specific theoretical questions.
- Insights: Safety evaluations for medical, educational, and legal agents should incorporate asymmetric relationships (e.g., doctor-patient, teacher-student, lawyer-client) rather than testing only anonymous user requests.
Rating
- Novelty: ⭐⭐⭐⭐☆ Systematically introduces power asymmetry and socio-cognitive effects into LLM agent evaluation; the problem definition is highly valuable.
- Experimental Thoroughness: ⭐⭐⭐⭐☆ Broad coverage of metrics, models, and roles, though still primarily reliant on simulated text environments.
- Writing Quality: ⭐⭐⭐⭐☆ Research questions are clearly organized; tables correspond directly to RQs; some details of controlled experiments are mainly in the appendix.
- Value: ⭐⭐⭐⭐⭐ Directly instructive for high-risk agent deployment, specifically highlighting the need to include social relational pressure in safety evaluations.