
HumanLLM: Benchmarking and Improving LLM Anthropomorphism via Human Cognitive Patterns

Conference: ACL 2026
arXiv: 2601.10198
Code: GitHub
Area: Role-Playing / Personality Simulation
Keywords: Anthropomorphism, Cognitive Patterns, Multi-Pattern Dynamics, Role-Playing Agent, Psychological Modeling

TL;DR

HumanLLM models 244 psychological patterns (100 personality traits and 144 social cognitive patterns) as interacting causal forces rather than isolated labels. It constructs 11,359 scenarios in which multiple patterns interact and introduces a dual-layer checklist evaluation whose scores correlate with human judgments at \(r=0.90\). Its HumanLLM-8B model outperforms Qwen3-32B on multi-pattern dynamics despite having 4x fewer parameters.

Method

Key Designs

  1. Literature-Based Psychological Pattern Construction: Each pattern is grounded in roughly 50 academic papers and structured into three parts: definition, core mechanism, and real-world manifestations.

  2. Multi-Pattern Interaction Scenario Generation: Each scenario combines 2-5 interacting patterns and covers three interaction types: enhancement, conflict, and conditional modulation. Dialogue turns express behavior through three channels: inner thoughts (in brackets), physical behavior (in parentheses), and verbal expression.

  3. Dual-Layer Checklist Evaluation: Combines a pattern level (12-15 universal behavioral indicators per pattern) with a scenario level (2-6 situation-specific behavioral expectations per scenario). Its scores correlate with human judgments at \(r=0.90\), compared with \(r=0.43\) for traditional holistic metrics.
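As a rough illustration of the pattern and scenario structure described above, the following Python sketch models patterns, scenarios, and the three expression channels of a dialogue turn. The class names (`Pattern`, `Scenario`, `Interaction`, `Turn`) and the `parse_turn` helper are hypothetical, and the parser assumes "brackets" means square brackets `[...]`; the paper's actual data schema may differ.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Interaction(Enum):
    # The three interaction types named in the summary.
    ENHANCEMENT = "enhancement"
    CONFLICT = "conflict"
    CONDITIONAL_MODULATION = "conditional_modulation"


@dataclass
class Pattern:
    # Hypothetical schema mirroring the definition / mechanism / manifestations split.
    name: str
    definition: str
    core_mechanism: str
    manifestations: list[str] = field(default_factory=list)


@dataclass
class Scenario:
    patterns: list[Pattern]
    interactions: list[Interaction]

    def __post_init__(self) -> None:
        # Each scenario combines 2-5 interacting patterns.
        if not 2 <= len(self.patterns) <= 5:
            raise ValueError(f"expected 2-5 patterns, got {len(self.patterns)}")


@dataclass
class Turn:
    inner_thoughts: list[str]
    physical_behavior: list[str]
    verbal: str


def parse_turn(text: str) -> Turn:
    """Split one dialogue turn into its three expression channels.

    Assumes inner thoughts use square brackets [...] and physical
    behavior uses parentheses (...); the remaining text is speech.
    """
    thoughts = re.findall(r"\[([^\]]*)\]", text)
    behavior = re.findall(r"\(([^)]*)\)", text)
    # Remove both annotated spans, then collapse leftover whitespace.
    verbal = re.sub(r"\[[^\]]*\]|\([^)]*\)", " ", text)
    verbal = " ".join(verbal.split())
    return Turn([t.strip() for t in thoughts], [b.strip() for b in behavior], verbal)


turn = parse_turn("[I hate being put on the spot.] (shifts in chair) Sure, happy to help.")
print(turn.verbal)  # Sure, happy to help.
```

Validating the 2-5 pattern bound in `__post_init__` keeps malformed scenarios from entering a dataset silently; a real pipeline would also need to record which pair of patterns each interaction links.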
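The dual-layer checklist idea can be sketched as follows, assuming each checklist item receives a yes/no verdict, each layer is scored as the fraction of items satisfied, and the two layers are averaged. The equal weighting and the helper names `layer_score`, `checklist_score`, and `pearson_r` are illustrative assumptions, not the paper's formula. Assuming the reported \(r\) values are Pearson correlations, `pearson_r` shows how agreement between metric scores and human ratings would be measured.

```python
from math import sqrt


def layer_score(verdicts: list[bool]) -> float:
    """Fraction of checklist items judged satisfied (assumes yes/no verdicts)."""
    if not verdicts:
        raise ValueError("a checklist layer needs at least one item")
    return sum(verdicts) / len(verdicts)


def checklist_score(pattern_verdicts: list[bool],
                    scenario_verdicts: list[bool],
                    pattern_weight: float = 0.5) -> float:
    """Combine pattern-level and scenario-level layers into one score in [0, 1].

    The 0.5 default weight is an assumption, not taken from the paper.
    """
    return (pattern_weight * layer_score(pattern_verdicts)
            + (1 - pattern_weight) * layer_score(scenario_verdicts))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between metric scores and human ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical example: 12 of 15 pattern indicators and 1 of 2 scenario expectations met.
score = checklist_score([True] * 12 + [False] * 3, [True, False])
print(round(score, 2))  # 0.65

# Hypothetical metric scores vs. human ratings for three responses.
print(pearson_r([0.3, 0.6, 0.8], [2.0, 3.5, 4.5]))
```

Scoring many concrete, verifiable items rather than asking a judge for one holistic rating is what makes it plausible for such a metric to track human judgments more closely.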

Key Experimental Results

| Model | IPE | MPD (multi-pattern dynamics) |
|---|---|---|
| GPT-5 | 15.5 | 43.4 |
| HumanLLM-8B | 25.7 | 70.3 |
| Qwen3-32B | 26.0 | 65.8 |
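A quick calculation on the results table makes the headline comparison precise. This sketch copies the scores from the table and reads nominal parameter counts off the model names (8B and 32B); the `RESULTS`, `PARAMS_B`, and `margin` names are illustrative only. It assumes higher scores are better, as the claim that HumanLLM-8B surpasses Qwen3-32B implies.

```python
# Scores copied from the results table; higher is assumed to be better.
RESULTS = {
    "GPT-5": {"IPE": 15.5, "MPD": 43.4},
    "HumanLLM-8B": {"IPE": 25.7, "MPD": 70.3},
    "Qwen3-32B": {"IPE": 26.0, "MPD": 65.8},
}
# Nominal parameter counts in billions, read off the model names.
PARAMS_B = {"HumanLLM-8B": 8, "Qwen3-32B": 32}


def margin(model_a: str, model_b: str, metric: str) -> float:
    """Score difference model_a - model_b on one metric, rounded to 0.1."""
    return round(RESULTS[model_a][metric] - RESULTS[model_b][metric], 1)


if __name__ == "__main__":
    for metric in ("IPE", "MPD"):
        print(metric, margin("HumanLLM-8B", "Qwen3-32B", metric))
    # prints "IPE -0.3" then "MPD 4.5"
    print("parameter ratio:", PARAMS_B["Qwen3-32B"] / PARAMS_B["HumanLLM-8B"])
    # prints "parameter ratio: 4.0"
```

The output shows that the 8B model's advantage lies in MPD (+4.5 points), while on IPE it is essentially tied with Qwen3-32B (-0.3 points).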

Highlights & Insights

  • Modeling psychological patterns as interacting causal forces rather than isolated labels is the key conceptual shift: it lets the benchmark test how patterns reinforce, conflict with, or conditionally modulate one another.
  • The paper identifies "normative confusion": LLM judges conflate social desirability with simulation accuracy, and the checklist method effectively decouples the two.

Rating

  • Novelty: ⭐⭐⭐⭐⭐
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐
  • Writing Quality: ⭐⭐⭐⭐
  • Value: ⭐⭐⭐⭐⭐