HumanLLM: Benchmarking and Improving LLM Anthropomorphism via Human Cognitive Patterns
- Conference: ACL 2026
- arXiv: 2601.10198
- Code: GitHub
- Area: Role-Playing / Personality Simulation
- Keywords: Anthropomorphism, Cognitive Patterns, Multi-Pattern Dynamics, Role-Playing Agent, Psychological Modeling
TL;DR
HumanLLM models 244 psychological patterns (100 personality traits and 144 social-cognitive patterns) as interacting causal forces rather than isolated labels. It constructs 11,359 multi-pattern interaction scenarios and uses a dual-layer checklist evaluation that reaches \(r=0.90\) alignment with human judgments. HumanLLM-8B surpasses Qwen3-32B on multi-pattern dynamics with a quarter of the parameters.
Method
Key Designs
- Literature-Based Psychological Pattern Construction: Each pattern is backed by ~50 academic papers and structured into a definition, a core mechanism, and real-world manifestations.
- Multi-Pattern Interaction Scenario Generation: Each scenario contains 2-5 interacting patterns, covering enhancement, conflict, and conditional modulation. Dialogues express three dimensions: inner thoughts (in brackets), physical behavior (in parentheses), and verbal expression.
- Dual-Layer Checklist Evaluation: Combines a pattern-level checklist (12-15 universal behavioral indicators per pattern) with a scenario-level checklist (2-6 situation-specific behavioral expectations). The checklist reaches \(r=0.90\) alignment with human judgments, versus \(r=0.43\) for traditional holistic metrics.
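
The dual-layer checklist idea can be illustrated with a minimal Python sketch. The summary does not say how the two layers are aggregated, so the per-layer "fraction of items satisfied" score and the equal weighting below are assumptions, and `ChecklistResult`, `layer_score`, and `dual_layer_score` are hypothetical names. `pearson_r` is the standard Pearson correlation, the kind of statistic behind a figure such as \(r=0.90\).

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class ChecklistResult:
    """Binary judgments for one model response (hypothetical structure)."""
    pattern_checks: List[bool]   # pattern-level indicators (12-15 per pattern in the source)
    scenario_checks: List[bool]  # scenario-level expectations (2-6 per scenario in the source)


def layer_score(checks: Sequence[bool]) -> float:
    """Fraction of checklist items judged as satisfied (assumed scoring rule)."""
    if not checks:
        raise ValueError("checklist layer is empty")
    return sum(checks) / len(checks)


def dual_layer_score(result: ChecklistResult, pattern_weight: float = 0.5) -> float:
    """Weighted mean of the two layer scores; the equal weighting is an assumption."""
    pattern = layer_score(result.pattern_checks)
    scenario = layer_score(result.scenario_checks)
    return pattern_weight * pattern + (1.0 - pattern_weight) * scenario


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between metric scores and human ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Illustrative data only: 9 of 12 pattern items and 3 of 4 scenario items satisfied.
example = ChecklistResult(
    pattern_checks=[True] * 9 + [False] * 3,
    scenario_checks=[True, True, False, True],
)
example_score = dual_layer_score(example)
```

Scoring each layer separately keeps universal pattern behavior and situation-specific expectations visible as distinct signals before they are merged into one number.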
Key Experimental Results
| Model | IPE | MPD |
|---|---|---|
| GPT-5 | 15.5 | 43.4 |
| HumanLLM-8B | 25.7 | 70.3 |
| Qwen3-32B | 26.0 | 65.8 |
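
To make the comparison in the table concrete, the short Python sketch below computes the score gaps between HumanLLM-8B and Qwen3-32B. The scores are copied from the table; the parameter counts are an assumption read off the model names, and the helper `gap` is a hypothetical name.

```python
# Scores copied from the results table.
results = {
    "GPT-5": {"IPE": 15.5, "MPD": 43.4},
    "HumanLLM-8B": {"IPE": 25.7, "MPD": 70.3},
    "Qwen3-32B": {"IPE": 26.0, "MPD": 65.8},
}

# Assumption: parameter counts in billions, inferred from the model names.
params_billion = {"HumanLLM-8B": 8, "Qwen3-32B": 32}


def gap(metric: str, model_a: str, model_b: str) -> float:
    """Score of model_a minus score of model_b, rounded to one decimal."""
    return round(results[model_a][metric] - results[model_b][metric], 1)


mpd_gap = gap("MPD", "HumanLLM-8B", "Qwen3-32B")
ipe_gap = gap("IPE", "HumanLLM-8B", "Qwen3-32B")
size_ratio = params_billion["Qwen3-32B"] / params_billion["HumanLLM-8B"]

print(f"MPD gap: {mpd_gap:+.1f}, IPE gap: {ipe_gap:+.1f}, size ratio: {size_ratio:.0f}x")
```

On these numbers the smaller model leads on MPD while trailing slightly on IPE, so the headline claim applies specifically to the MPD column.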
Highlights & Insights
- Modeling psychological patterns as "interacting causal forces" rather than "isolated labels" is the paper's main conceptual contribution.
- Discovery of normative confusion: LLM judges equate social desirability with simulation accuracy, and the checklist method decouples the two.
Rating
- Novelty: ⭐⭐⭐⭐⭐
- Experimental Thoroughness: ⭐⭐⭐⭐⭐
- Writing Quality: ⭐⭐⭐⭐
- Value: ⭐⭐⭐⭐⭐