Classifying Unreliable Narrators with Large Language Models

Metadata	Content
Conference	ACL 2025
arXiv	2506.10231
Code	GitHub
Area	NLP / Text Classification / Narrative Analysis
Keywords	unreliable narrators, narratology, LLM classification, curriculum learning, expert-annotated dataset

TL;DR

Drawing on literary narratology, this work defines three levels of narrator unreliability (intra-narrational, inter-narrational, inter-textual), constructs the expert-annotated dataset TUNa, and systematically evaluates LLMs on the task of classifying unreliable narrators.

Background & Motivation

Problem Definition: First-person narratives (reviews, social media posts, cover letters, etc.) are common in daily life, and judging whether the narrator is reliable matters for trustworthy information exchange. An unreliable narrator is one who unintentionally misleads readers, which is distinct from intentional deception.
Limitations of Prior Work: No prior work had applied automated methods to analyzing narrator unreliability, and no annotated datasets were available.
Narratological Foundation: The authors draw on Hansen's (2007) taxonomic framework of narratology, dividing unreliability into three levels that increase progressively from concrete to abstract:
- (1) Intra-narrational: The narrator exhibits "verbal tics," such as hedging language, selective memory, admission of bias, and other textual cues.
- (2) Inter-narrational: A second voice contradicts the narrator (e.g., refutations in others' dialogue), or the narrator exhibits consistent unreliability over past and present.
- (3) Inter-textual: The narrator conforms to classic unreliable character archetypes—the naïf, the madman, the pícaro, or the clown.
Key Challenge: Unreliability cues are often subtle and context-dependent, scattered throughout the text, and sometimes require deep reasoning regarding the narrator's emotional and psychological state.

Method

Overall Architecture

The paper models unreliable narrator identification as three independent classification tasks:
- Intra-narrational: Binary classification (verbal tics vs. reliable)
- Inter-narrational: Three-class classification (same-unreliable-character-over-time / other-character-contradiction / reliable)
- Inter-textual: Five-class classification (naïf / madman / pícaro / clown / reliable)
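
The three label spaces above can be written down as a small lookup table. This is only a sketch: the identifier strings (`verbal_tics`, `same_character_over_time`, and so on) and the helper `encode_label` are hypothetical names chosen for illustration, not the label strings used in the TUNa release.

```python
# Hypothetical label identifiers for the three independent tasks.
# The ordering of classes inside each list is an assumption.
TASK_LABELS = {
    "intra_narrational": ["reliable", "verbal_tics"],
    "inter_narrational": [
        "reliable",
        "same_character_over_time",
        "other_character_contradiction",
    ],
    "inter_textual": ["reliable", "naif", "madman", "picaro", "clown"],
}


def encode_label(task: str, label: str) -> int:
    """Map a label string to its integer class index for the given task."""
    return TASK_LABELS[task].index(label)


# Each task is classified independently, so one narrative gets three labels.
example = {
    "intra_narrational": "verbal_tics",
    "inter_narrational": "reliable",
    "inter_textual": "madman",
}
encoded = {task: encode_label(task, label) for task, label in example.items()}
```

Keeping the tasks separate means a narrator can show verbal tics without matching any archetype, which a single flat label set could not express.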

Key Designs

TUNa Dataset Construction: Collects first-person narrative texts from four domains: blogs (PersonaBank), Reddit posts (r/AITA), hotel reviews (Deceptive Opinion), and literary works (Project Gutenberg). Each sample is annotated by at least 2 experts majoring in English literature, achieving a Cohen's Kappa inter-annotator agreement of κ=0.71~0.75 (substantial agreement). Disagreements are resolved through discussion.
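
For readers unfamiliar with the agreement statistic quoted above, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance. The function below is a minimal pure-Python sketch; the two annotation lists are invented toy data, not TUNa annotations.

```python
from collections import Counter


def cohens_kappa(annotator_a, annotator_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(annotator_a) != len(annotator_b) or not annotator_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(annotator_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(annotator_a), Counter(annotator_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example (invented): 8 items, the annotators agree on 6 of them.
a = ["tics", "tics", "reliable", "reliable", "tics", "reliable", "tics", "reliable"]
b = ["tics", "tics", "reliable", "tics", "tics", "reliable", "reliable", "reliable"]
kappa = cohens_kappa(a, b)  # observed 0.75, chance 0.5 -> kappa 0.5
```

On this toy data the raw agreement is 75%, but half of that is expected by chance, so the kappa is 0.5; the reported 0.71 to 0.75 therefore reflects agreement well beyond chance.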
Curriculum Learning: Training samples are ranked by "degree of ambiguity"—LLMs first count the number of feature instances for each label type; samples with fewer candidate labels are considered easier. Parameter-efficient fine-tuning via LoRA is performed first on the easy subset, followed by the difficult subset, gradually improving the model's capability.
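
The ambiguity-based ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the per-label cue counts are assumed to have been produced by an LLM upstream, the `Sample` class is hypothetical, and the threshold `max_easy_candidates` separating easy from hard samples is an assumed parameter.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    gold_label: str
    # label -> number of cue instances an LLM found for that label (assumed input)
    feature_counts: dict


def ambiguity(sample: Sample) -> int:
    """Number of candidate labels, i.e. labels with at least one detected cue."""
    return sum(1 for count in sample.feature_counts.values() if count > 0)


def curriculum_stages(samples, max_easy_candidates=1):
    """Split samples into an easy stage and a hard stage, least ambiguous first."""
    ordered = sorted(samples, key=ambiguity)  # stable sort keeps ties in input order
    easy = [s for s in ordered if ambiguity(s) <= max_easy_candidates]
    hard = [s for s in ordered if ambiguity(s) > max_easy_candidates]
    return easy, hard


# Toy samples with invented cue counts.
samples = [
    Sample("a", "naif", {"naif": 2, "madman": 1, "clown": 0}),
    Sample("b", "reliable", {"naif": 0, "madman": 0, "clown": 0}),
    Sample("c", "clown", {"naif": 0, "madman": 0, "clown": 3}),
]
easy, hard = curriculum_stages(samples)
# A training loop would fine-tune on `easy` first, then continue on `hard`.
```

The difficulty signal here depends only on how many labels compete for a sample, so it can be computed before any training, unlike loss-based curricula.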
Transfer from Literature to Reality: Models are trained using training samples from the literary domain (Fiction) and tested out-of-domain on real-world domains such as blogs, Reddit, and reviews, verifying the transferability of unreliability knowledge learned from literature.
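
A minimal sketch of the literature-to-reality protocol described above: train only on the fiction training split, then evaluate per domain. The record fields `domain` and `split` and the domain strings are hypothetical; the actual TUNa file layout may differ.

```python
from collections import defaultdict


def literature_to_reality_split(records, train_domain="fiction"):
    """Return (training records from one domain, test records grouped by domain)."""
    train = [
        r for r in records
        if r["domain"] == train_domain and r["split"] == "train"
    ]
    test_by_domain = defaultdict(list)
    for r in records:
        if r["split"] == "test":
            test_by_domain[r["domain"]].append(r)
    return train, dict(test_by_domain)


# Toy records (invented) covering the four domains named in the text.
records = [
    {"id": 1, "domain": "fiction", "split": "train"},
    {"id": 2, "domain": "fiction", "split": "test"},
    {"id": 3, "domain": "blog", "split": "test"},
    {"id": 4, "domain": "subreddit", "split": "test"},
    {"id": 5, "domain": "review", "split": "test"},
    {"id": 6, "domain": "blog", "split": "train"},  # ignored: not the training domain
]
train, test_by_domain = literature_to_reality_split(records)
```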

Loss & Training

Standard classification cross-entropy loss is used, paired with LoRA adapters (8-bit quantization) for parameter-efficient fine-tuning, training for 3 epochs.
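
As a reminder of what the cross-entropy objective computes for a single example, here is a small pure-Python sketch. It assumes unweighted classes and raw logits as input; whether the paper applies class weights or label smoothing is not stated in this summary.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-probability of the target class under softmax(logits)."""
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_normalizer = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_normalizer - logits[target_index]


# A model with no preference among the five inter-textual classes
# pays ln(5) regardless of which class is correct.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0, 0.0], target_index=2)

# Raising the logit of the correct class lowers the loss.
confident_loss = cross_entropy([0.0, 0.0, 4.0, 0.0, 0.0], target_index=2)
```

The uniform case gives a useful baseline: a five-class loss near ln(5) means the model has learned nothing about the archetypes.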

Key Experimental Results

Main Results (Llama3.1-8B, F1 macro)

Task	Method	Fiction	Blog	Subreddit	Review
Intra-nar	CL	58.51	53.94	50.04	67.17
Intra-nar	Fine-tuned	50.09	50.63	49.00	55.85
Intra-nar	Zero-Shot	45.17	45.56	47.41	58.46
Inter-nar	CL	34.59	35.92	30.91	35.29
Inter-nar	Fine-tuned	34.63	28.73	25.59	36.59
Inter-tex	CL	27.42	19.58	13.49	16.72
Inter-tex	Fine-tuned	28.59	18.99	10.85	17.54
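
The scores in this table are macro-averaged F1, which weights every class equally regardless of frequency. The sketch below shows the computation on invented predictions; treating a class with no true or predicted instances as F1 = 0 is an assumption of this sketch.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores."""
    per_class = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example (invented): the model over-predicts "reliable".
gold = ["reliable", "naif", "naif", "clown"]
pred = ["reliable", "naif", "reliable", "reliable"]
score = macro_f1(gold, pred, labels=["reliable", "naif", "clown"])
# Per-class F1: reliable 0.5, naif 2/3, clown 0.0
```

Because a missed rare class pulls the average down as much as a missed common one, macro F1 is a strict metric for the five-class inter-textual task, where some archetypes are likely rare.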

Ablation Study: Comparison of Different Model Sizes (Average F1 macro across all domains)

Model	Intra-nar (CL)	Inter-nar (CL)	Inter-tex (CL)
Llama3.1-8B	57.42	34.18	19.30
Llama3.3-70B	51.26	33.49	21.04
Mistral-7B	55.76	31.15	29.68
ModernBERT	39.94	27.07	16.98
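
As a consistency check, the Llama3.1-8B row of this table can be reproduced by averaging the four per-domain CL scores from the main results table. The snippet uses `Decimal` so that rounding to two places is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# CL scores for Llama3.1-8B (Fiction, Blog, Subreddit, Review), from the main results table.
cl_scores = {
    "Intra-nar": ["58.51", "53.94", "50.04", "67.17"],
    "Inter-nar": ["34.59", "35.92", "30.91", "35.29"],
    "Inter-tex": ["27.42", "19.58", "13.49", "16.72"],
}

averages = {
    task: (sum(Decimal(s) for s in scores) / len(scores)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    for task, scores in cl_scores.items()
}
# Intra-nar: 229.66 / 4 = 57.415  -> 57.42
# Inter-nar: 136.71 / 4 = 34.1775 -> 34.18
# Inter-tex:  77.21 / 4 = 19.3025 -> 19.30
```

The results match the 57.42, 34.18, and 19.30 reported for Llama3.1-8B, confirming that the averages are unweighted means over the four domains.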

Key Findings

Increasing Task Difficulty: Intra-narrational is the easiest, while Inter-textual is the hardest (requiring more abstract reasoning).
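Effectiveness of Curriculum Learning: On the smaller Llama3.1-8B model, CL clearly outperforms standard fine-tuning on the intra-narrational task (e.g., 58.51 vs. 50.09 F1 macro on Fiction), while results on the inter-narrational and inter-textual tasks are mixed across domains. The few-shot performance of the large 70B model is already comparable to CL.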
Feasibility of Cross-Domain Transfer: Knowledge learned from Fiction can be reasonably transferred to real-world domains such as Blogs, Subreddits, and Reviews.
Gender Bias: Male narrators are correctly predicted at a higher rate than female narrators.
Impact of Narrative Style: Dialogue-heavy styles aid intra-narrational prediction, while descriptive styles aid inter-narrational and inter-textual prediction.

Highlights & Insights

Innovatively combines literary theory (narratology) with NLP tasks, defining a completely new and meaningful task—automated identification of unreliable narrators.
The TUNa dataset covers multiple domains (literature/blogs/Reddit/reviews) with high-quality expert annotations (approx. 5 minutes of annotation time per sample).
Cleverly designed curriculum learning strategy: defines sample difficulty based on the "number of ambiguous candidate labels" rather than traditional loss magnitude.
Discovers that knowledge of unreliability learned from fictional works can transfer to real-world texts, opening a new direction of "learning real-world understanding from fiction."

Limitations & Future Work

Processes only short texts (up to 1050 tokens), without considering entire novels or long documents.
The dataset contains only English, without covering other languages.
The dataset size is relatively small (817 samples), limited by the high cost of expert annotation.
Even with the best methods, the F1 score for the Inter-textual task remains very low (~30%), indicating that the task is highly challenging.

Related Work & Insights

The task is related to, but fundamentally different from, misinformation detection and deception detection: this work focuses on unintentional unreliability rather than intentional deception.
A natural extension of narrative AI directions such as character understanding and protagonist emotion analysis.
The curriculum learning strategy of "defining difficulty based on ambiguity" can be generalized to other NLP tasks containing ambiguous labels.

Rating

Dimension	Score
Novelty	⭐⭐⭐⭐
Technical Depth	⭐⭐⭐
Experimental Thoroughness	⭐⭐⭐⭐
Value	⭐⭐⭐
Overall Recommendation	⭐⭐⭐⭐