HumanLLM: Benchmarking and Improving LLM Anthropomorphism via Human Cognitive Patterns

Conference: ACL 2026
arXiv: 2601.10198
Code: GitHub
Area: Role-Playing / Personality Simulation
Keywords: Anthropomorphism, Cognitive Patterns, Multi-Pattern Dynamics, Role-Playing Agent, Psychological Modeling

TL;DR

HumanLLM models 244 psychological patterns (100 personality traits + 144 social cognitive patterns) as interacting causal forces rather than isolated labels, and constructs 11,359 multi-pattern interaction scenarios. Its dual-layer checklist evaluation reaches \(r=0.90\) alignment with human judgment, and HumanLLM-8B surpasses Qwen3-32B on multi-pattern dynamics with one quarter of the parameters.

Method

Key Designs

Literature-Based Psychological Pattern Construction: Each pattern is backed by ~50 academic papers, structured into definition, core mechanism, and real-world manifestations.
Multi-Pattern Interaction Scenario Generation: Each scenario contains 2-5 interacting patterns, covering enhancement, conflict, and conditional modulation. Dialogue turns are expressed along three dimensions: inner thoughts (in brackets), physical behavior (in parentheses), and verbal expression.
Dual-Layer Checklist Evaluation: A pattern-level layer (12-15 universal behavioral indicators per pattern) is combined with a scenario-level layer (2-6 situation-specific behavioral expectations). The checklist scores reach \(r=0.90\) alignment with human judgment, versus \(r=0.43\) for traditional holistic metrics.
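The two mechanisms described above, the three-dimensional dialogue format and the dual-layer checklist, can be illustrated with a minimal Python sketch. This is not the paper's implementation. The names `split_turn`, `layer_score`, and `dual_layer_score` are hypothetical, the sketch assumes square brackets for thoughts and no nested delimiters, and the aggregation rule (fraction of satisfied checklist items per layer, combined with a weight) is an assumption, since the summary does not say how the two layers are combined.

```python
import re
from dataclasses import dataclass

# Assumed delimiters: [inner thought] and (physical behavior); no nesting.
THOUGHT = re.compile(r"\[([^\]]*)\]")
ACTION = re.compile(r"\(([^)]*)\)")


@dataclass
class Turn:
    thoughts: list[str]  # inner thoughts, taken from [...]
    actions: list[str]   # physical behavior, taken from (...)
    speech: str          # verbal expression: whatever text remains


def split_turn(text: str) -> Turn:
    """Split one dialogue turn into its three expression channels."""
    thoughts = [t.strip() for t in THOUGHT.findall(text)]
    actions = [a.strip() for a in ACTION.findall(text)]
    remainder = ACTION.sub(" ", THOUGHT.sub(" ", text))
    speech = " ".join(remainder.split())  # collapse leftover whitespace
    return Turn(thoughts, actions, speech)


def layer_score(checks: list[bool]) -> float:
    """Fraction of checklist items judged as satisfied (assumed rule)."""
    if not checks:
        raise ValueError("a checklist layer needs at least one item")
    return sum(checks) / len(checks)


def dual_layer_score(
    pattern_checks: list[bool],
    scenario_checks: list[bool],
    pattern_weight: float = 0.5,  # hypothetical equal weighting
) -> float:
    """Combine pattern-level and scenario-level checklist results."""
    return (
        pattern_weight * layer_score(pattern_checks)
        + (1.0 - pattern_weight) * layer_score(scenario_checks)
    )


turn = split_turn("[I should not show fear] (grips the table) I'm fine, really.")
score = dual_layer_score(
    pattern_checks=[True] * 9 + [False] * 3,       # 12 pattern-level indicators
    scenario_checks=[True, True, False, True],     # 4 scenario-level expectations
)
```

Scoring each item as a yes/no check keeps the judge's task concrete, which is in the spirit of the checklist approach, though the real benchmark may grade or weight items differently.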

Key Experimental Results

Model	IPE	MPD
GPT-5	15.5	43.4
HumanLLM-8B	25.7	70.3
Qwen3-32B	26.0	65.8
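As a quick check of the headline comparison, the short Python sketch below recomputes the margins from the table. The scores are copied from the rows above. The parameter counts are an assumption read off the model names (8B and 32B), and the helper `margin` is hypothetical. MPD is presumably the multi-pattern dynamics score mentioned in the TL;DR.

```python
# Scores copied from the results table above.
results = {
    "GPT-5": {"IPE": 15.5, "MPD": 43.4},
    "HumanLLM-8B": {"IPE": 25.7, "MPD": 70.3},
    "Qwen3-32B": {"IPE": 26.0, "MPD": 65.8},
}

# Assumption: parameter counts (in billions) inferred from the model names.
params_b = {"HumanLLM-8B": 8, "Qwen3-32B": 32}


def margin(metric: str, a: str, b: str) -> float:
    """Score of model a minus score of model b, rounded to one decimal."""
    return round(results[a][metric] - results[b][metric], 1)


mpd_margin = margin("MPD", "HumanLLM-8B", "Qwen3-32B")
ipe_margin = margin("IPE", "HumanLLM-8B", "Qwen3-32B")
param_ratio = params_b["Qwen3-32B"] / params_b["HumanLLM-8B"]

print(f"MPD margin: {mpd_margin:+.1f}")    # +4.5
print(f"IPE margin: {ipe_margin:+.1f}")    # -0.3
print(f"Parameter ratio: {param_ratio:.0f}x")  # 4x
```

On these numbers HumanLLM-8B leads Qwen3-32B by 4.5 points on MPD while trailing by 0.3 on IPE, so the advantage reported in the table is specific to the MPD column.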

Highlights & Insights

Modeling psychological patterns as "interacting causal forces" rather than "isolated labels" represents a conceptual breakthrough
Discovery of normative confusion: LLM judges equate social desirability with simulation accuracy; checklist methods effectively decouple the two

Rating

Novelty: ⭐⭐⭐⭐⭐
Experimental Thoroughness: ⭐⭐⭐⭐⭐
Writing Quality: ⭐⭐⭐⭐
Value: ⭐⭐⭐⭐⭐