Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs

Conference: ACL 2025
arXiv: 2502.12988
Code: https://github.com/CharacterBot
Area: LLM / Role-Playing
Keywords: Deep Persona Simulation, CharLoRA, Lu Xun, Style Transfer, Multi-Task Fine-Tuning

TL;DR

This paper proposes CharacterBot, which learns Lu Xun's linguistic style and deeper cognitive patterns from his 17 essay collections. It combines four training tasks (Author Perspective Reconstruction pre-training, followed by fine-tuning on multiple-choice QA, generative QA, and style transfer) with the CharLoRA parameter update mechanism, and it matches or outperforms the compared baselines on every reported metric, with particularly large gains in style matching.

Background & Motivation

Background: LLM persona simulation is primarily achieved through memorizing basic personal information or fine-tuning with dialogue data. Prompt engineering methods dynamically inject character descriptions but fail to capture deep traits.

Limitations of Prior Work: Existing methods oversimplify persona representations, limiting them to surface-level conversations or basic attribute descriptions. Genuine persona simulation requires capturing deep-seated identity elements such as worldviews, ethical frameworks, and context-dependent viewpoints.

Key Challenge: "Books are the mirror of the soul": a writer's works embody their beliefs, insights, and response patterns, and so reflect a persona far better than résumé-style descriptions. Extracting these deep traits from literary works, however, remains a major challenge.

Goal: To learn linguistic style and ideological depth from the complete works of an author, constructing deep persona simulations that go beyond surface-level imitation.

Key Insight: Using Lu Xun as a case study, the authors design a multi-task framework (pre-training followed by fine-tuning on three tasks) and pair it with CharLoRA to keep the persona consistent across tasks.

Core Idea: Use Author Perspective Reconstruction (APR) pre-training to learn linguistic style; use multiple-choice questions, question answering, and style transfer fine-tuning to learn ideologies; and use the CharLoRA shared-specific parameter mechanism to maintain persona consistency.

Method

Overall Architecture

Pre-training Stage:
- APR (Author Perspective Reconstruction): Converts Lu Xun's first-person essays into third-person form ("The author believes/criticizes") so that the model learns to attribute each viewpoint to the author.
- Performs next-token prediction on the original texts plus their APR versions.
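
The pre-training mix described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the `Essay` record, the `build_pretraining_corpus` helper, and the sample sentences are all hypothetical placeholders (the sentences are invented, not quotations from Lu Xun), and the APR rewrite is assumed to have been produced beforehand.

```python
from dataclasses import dataclass


@dataclass
class Essay:
    """Hypothetical record: one essay plus its third-person APR rewrite."""
    title: str
    original: str      # first-person source text
    apr_version: str   # third-person rewrite ("The author believes ...")


def build_pretraining_corpus(essays):
    """Return plain-text documents for next-token prediction.

    Each essay contributes two documents: the original text (style) and
    its APR version (viewpoint explicitly attributed to the author).
    """
    corpus = []
    for essay in essays:
        corpus.append(essay.original)
        corpus.append(essay.apr_version)
    return corpus


# Invented placeholder text, used only to show the data flow.
essays = [
    Essay(
        title="Sample essay",
        original="I have always distrusted easy optimism.",
        apr_version="The author believes that easy optimism should be distrusted.",
    )
]
corpus = build_pretraining_corpus(essays)
print(len(corpus))  # one original + one APR version per essay
```

Keeping both versions in the same corpus is what lets a single next-token objective cover style and attribution at once.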

Fine-tuning Stage (3 tasks):
- Multiple-Choice QA: Four-option multiple-choice questions derived from the essays, testing whether the model understands the author's opinions.
- Generative QA: Open-ended question answering in which outputs must reflect Lu Xun's rhetorical and ideological patterns.
- Style Transfer: Rewrites style-neutral text into Lu Xun's style while preserving its meaning.
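
To make the three task formats concrete, the sketch below turns each kind of sample into a prompt/target pair. The helper names, the prompt wording, and the example strings are assumptions made for illustration; the source does not specify the actual templates.

```python
def format_mcq(question, options, answer_index):
    """Four-option multiple-choice sample; the target is the option letter."""
    if len(options) != 4:
        raise ValueError("expected exactly 4 options")
    letters = "ABCD"
    lines = [question] + [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    return {"task": "mcq", "prompt": "\n".join(lines), "target": letters[answer_index]}


def format_gen_qa(question, answer):
    """Open-ended QA sample; the target is a free-form answer in the author's voice."""
    return {"task": "gen_qa", "prompt": question, "target": answer}


def format_style_transfer(plain_text, styled_text):
    """Style transfer sample; the prompt wording here is hypothetical."""
    prompt = f"Rewrite the following text in the author's style:\n{plain_text}"
    return {"task": "style_transfer", "prompt": prompt, "target": styled_text}


# Invented placeholder content, used only to show the structure.
mcq = format_mcq(
    "What is the author's attitude in the passage?",
    ["Approval", "Criticism", "Indifference", "Nostalgia"],
    answer_index=1,
)
print(mcq["target"])
```

Tagging each sample with its task name is convenient later, because the task name is what selects the task-specific adapter during fine-tuning.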

Key Designs

CharLoRA:
- Function: Adapts LoRA to persona simulation. The pre-training stage learns a shared \(\mathbf{A}_{pt}\) that encodes general style, while in fine-tuning each task \(i\) has its own \(\mathbf{B}_i\) that encodes task-specific patterns.
- Forward propagation: \(\mathbf{h}_i = W_0 \mathbf{x} + \mathbf{B}_i \mathbf{A}_{pt} \mathbf{x}\), where \(W_0\) is the frozen base weight and \(\mathbf{x}\) is the layer input.
- During fine-tuning, only the active task's \(\mathbf{B}_i\) and the shared \(\mathbf{A}_{pt}\) are updated, while other task matrices are frozen.
- Design Motivation: The shared matrix ensures persona consistency (Lu Xun's core voice penetrates all tasks), while specialized matrices allow task adaptation.
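
The shared/task-specific split can be sketched in a few lines of dependency-free Python. This is an illustrative toy, not the authors' implementation: the class name `CharLoRALayer`, the helper functions, the dimensions, and the zero initialization of each \(\mathbf{B}_i\) (borrowed from standard LoRA practice) are assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class CharLoRALayer:
    """Toy layer computing h_i = W0 x + B_i A_pt x for a chosen task i."""

    def __init__(self, d_in, d_out, rank, tasks, seed=0):
        rng = random.Random(seed)
        self.W0 = rand_matrix(d_out, d_in, rng)    # frozen base weight
        self.A_pt = rand_matrix(rank, d_in, rng)   # shared across all tasks
        # One B per task; zero init is an assumption (standard LoRA practice).
        self.B = {t: [[0.0] * rank for _ in range(d_out)] for t in tasks}

    def forward(self, x, task):
        base = matvec(self.W0, x)
        low_rank = matvec(self.B[task], matvec(self.A_pt, x))
        return [b + l for b, l in zip(base, low_rank)]

    def trainable(self, task):
        """Names of the matrices updated when fine-tuning on `task`."""
        return ["A_pt", f"B[{task}]"]


layer = CharLoRALayer(d_in=4, d_out=3, rank=2,
                      tasks=["mcq", "gen_qa", "style_transfer"])
x = [1.0, 2.0, 3.0, 4.0]
print(layer.forward(x, "mcq"))
print(layer.trainable("mcq"))
```

Because every task reads the same `A_pt`, any update to it during one task's fine-tuning is visible to the others, while each `B[task]` stays private to its task.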

Data Generation: GPT-4o was used to generate the training data. For each of the 638 essays it produced 3 QA sets and 3 style transfer pairs, which were then manually validated for quality.

Key Experimental Results

Main Results

| Model | MCQ Accuracy | Gen QA Content Score | Gen QA Style Score | Style Transfer BLEU | Style Transfer ROUGE-1 | Style Matching |
|---|---|---|---|---|---|---|
| Llama3.1-8B | 0.614 | 2.370 | 1.354 | 0.113 | 0.264 | 0.267 |
| Qwen2.5-7B | 0.787 | 2.828 | 2.818 | 0.115 | 0.233 | 0.456 |
| GPT-4o | 0.734 | 3.214 | 2.542 | 0.088 | 0.196 | 0.471 |
| Tongyi Xingchen | 0.788 | 3.172 | 2.823 | 0.101 | 0.187 | 0.534 |
| CharacterBot | 0.880 | 3.214 | 2.885 | 0.293 | 0.410 | 0.937 |
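
For readers unfamiliar with the ROUGE-1 column, the sketch below computes a unigram-overlap F1 score over pre-tokenized inputs. It is only an illustration of the general metric: whether the paper reports F1, recall, or precision, and how it tokenizes Chinese text, is not stated in this summary, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between a candidate and a reference token list."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


# Character-level tokens are one common choice for Chinese text (an assumption here).
score = rouge1_f1(list("abcd"), list("abef"))
print(score)  # 2 of 4 unigrams match on each side, so the score is 0.5
```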

Key Findings

CharacterBot achieves a style matching score of 0.937 (vs. 0.534 for the second-best model, Tongyi Xingchen), indicating that training on the author's works captures writing style far better than generic persona models.
Its multiple-choice accuracy of 0.880 is well above GPT-4o's 0.734 and the best baseline's 0.788, which suggests the model captures Lu Xun's perspectives rather than merely imitating surface features.
The style transfer BLEU (0.293) and ROUGE-1 (0.410) scores are also substantially ahead of all baselines, which is consistent with CharLoRA helping to maintain style consistency.
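
The margins behind these findings can be recomputed directly from the Main Results table. The snippet below is a small helper written for this summary (the function name is hypothetical); it treats every metric as higher-is-better and reports CharacterBot's lead over the strongest baseline.

```python
METRICS = ["MCQ Accuracy", "Gen QA Content", "Gen QA Style",
           "ST BLEU", "ST ROUGE-1", "Style Matching"]

# Values copied from the Main Results table.
RESULTS = {
    "Llama3.1-8B":     [0.614, 2.370, 1.354, 0.113, 0.264, 0.267],
    "Qwen2.5-7B":      [0.787, 2.828, 2.818, 0.115, 0.233, 0.456],
    "GPT-4o":          [0.734, 3.214, 2.542, 0.088, 0.196, 0.471],
    "Tongyi Xingchen": [0.788, 3.172, 2.823, 0.101, 0.187, 0.534],
    "CharacterBot":    [0.880, 3.214, 2.885, 0.293, 0.410, 0.937],
}


def margin_over_best_baseline(results, target="CharacterBot"):
    """For each metric, return (best baseline name, target score minus its score)."""
    margins = {}
    for i, metric in enumerate(METRICS):
        baselines = [(name, vals[i]) for name, vals in results.items() if name != target]
        best_name, best_value = max(baselines, key=lambda pair: pair[1])
        margins[metric] = (best_name, round(results[target][i] - best_value, 3))
    return margins


margins = margin_over_best_baseline(RESULTS)
for metric, (name, delta) in margins.items():
    print(f"{metric}: +{delta:.3f} over {name}")
```

Note that the Gen QA content margin is zero: CharacterBot ties GPT-4o at 3.214 on that metric, so the clearest gains are in style matching and the style transfer scores.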

Highlights & Insights

"Works-as-a-Mirror" Persona Simulation Paradigm: Instead of memorizing personal biographies, the model learns thought processes and styles from the author's complete works — yielding deeper and more authentic simulations.
Shared-Specific Separation in CharLoRA: Coupling a general linguistic-style expert with task-specific experts is simple yet effective, and it should generalize readily to other multi-task persona simulation settings.
Ingenuity of APR Pre-training: Converting first-person to third-person with attribution tags helps the model establish relationships of "who said what."

Limitations & Future Work

Only validated on a single author (Lu Xun); generalization to other characters needs to be verified.
Training data was generated by GPT-4o, meaning the upper bound of quality is constrained by GPT-4o's comprehension of Lu Xun.
Only supports Chinese; cross-lingual persona simulation has not been tested.
The evaluation metrics for deep "ideological simulation" remain insufficiently objective — how to quantify "depth of thought" remains an open question.
Ethical risks: High-fidelity persona simulation could be exploited to mislead audiences or to fabricate statements attributed to famous figures.

Comparison with Related Work

vs. CharacterGLM: CharacterGLM is trained on persona-description dialogues and performs poorly here (accuracy of only 0.073); the approach in this work, driven by the author's complete works, is more effective.
vs. LuXun-GPT: LuXun-GPT focuses solely on style transfer, whereas this work covers both style and ideology and reports a much higher style matching score (0.937 vs. 0.387).
vs. Prompt-based Role-Playing: Prompt engineering fails to capture deep-seated traits, whereas CharacterBot reaches a deeper understanding through fine-tuning.

Rating

Novelty: ⭐⭐⭐⭐ The paradigm of learning deep character traits from literary works is highly novel, and the design of CharLoRA is highly creative.
Experimental Thoroughness: ⭐⭐⭐ Only evaluated on a single author, and some evaluation metrics rely on GPT-4o grading.
Writing Quality: ⭐⭐⭐⭐ The motivation is well-argued, and the methodology is clearly described.
Value: ⭐⭐⭐⭐ It opens up a new direction for deep persona simulation, and CharLoRA is reusable.