Towards Enhanced Immersion and Agency for LLM-based Interactive Drama
Conference: ACL 2025
arXiv: 2502.17878
Code: GitHub
Area: LLM NLP
Keywords: Interactive Drama, Immersion, Agency, Script Generation, Role-Playing Agent
TL;DR
This work proposes an Immersion-Agency paradigm to conceptualize LLM-based interactive drama, and designs two methods, Playwriting-guided Generation and Plot-based Reflection, to enhance story generation quality and player agency, respectively.
Background & Motivation
LLM-based interactive drama is an emerging genre of AI conversational application, where players role-play and interact with LLM agents representing other characters to experience an unfolding narrative. However, existing works suffer from the following limitations:
- Lack of Theoretical Framework: Previous studies focused primarily on general architecture design, without deeply exploring the core dimensions of the interactive experience. This paper introduces two key concepts from classical interactive narrative theory: Immersion (the feeling of being absorbed in the story) and Agency (the player's ability to influence the story world).
- Insufficient Story Generation Quality: Although LLMs are exposed to a vast amount of literary works during pre-training, fine-tuning processes lack emphasis on playwriting techniques. Consequently, the generated stories often lack fundamental dramatic structures and compelling conflicts. Experiments demonstrate that GPT-4o and Qwen2.5-72b rarely employ any narrative techniques without explicit prompting.
- Neglect of Character Agency: Previous character agent architectures rarely considered how player actions could meaningfully influence character reactions and the narrative trajectory.
Method
Overall Architecture
The system consists of two major modules: (1) Script Generation, which utilizes Playwriting-guided Generation to generate high-quality dramatic stories (including plot structures and narrative techniques) from a premise provided by the player; (2) Character Agents, which employ Plot-based Reflection to allow NPCs to dynamically adjust the plot chain based on player actions, thereby enhancing agency.
Key Designs
- Playwriting-guided Generation:
    - Defines 8 classic dramatic situations (such as love, phoenix rebirth, Cinderella, vengeance, etc.), described based on Aristotle’s three-act structure (setup, confrontation, resolution).
    - Summarizes 6 micro-narrative techniques (suspense, twists, non-linear narrative, multiple-narrative, irony, symbolism).
    - Generation workflow: Sample 1 dramatic situation + 3 narrative techniques \(\rightarrow\) Writer LLM generates the story \(\rightarrow\) Critic LLM evaluates and provides improvement feedback \(\rightarrow\) Writer revises \(\rightarrow\) Repeat 3 times to select the best formulation \(\rightarrow\) Progressive refinement of details.
    - Effect: Narrative technique utilization rate increases from \(6\%\)–\(12\%\) in the baseline to \(28\%\)–\(74\%\) (based on GPT-4o).
- Plot-based Reflection:
    - The character agent performs a reflection every \(k=5\) interaction steps to analyze memories of player activities (emotions, intentions) and dynamically adjust the plot chain.
    - Each reflection is constrained to adjusting at most one incomplete plot point or inserting at most one new plot point, preventing incoherent narratives caused by excessive LLM modifications.
    - This enables characters to exhibit meaningful shifts in reactions driven by player behavior, such as leaking secrets, offering companionship, or advancing the plot in specific directions.
- Hybrid Agent Architecture:
    - Director-Actor Architecture: A Director Agent coordinates globally while individual Actor Agents play their respective characters, which is suitable for high-interaction scenarios.
    - One-for-All Architecture: A single global agent plays all characters, yielding higher efficiency, suitable for narrative-centric scenarios.
    - The hybrid approach dynamically switches between these two architectures based on scenario characteristics, balancing performance and efficiency (accelerating inference by \(1.49\times\)).
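
The Plot-based Reflection constraints described above (a reflection every \(k=5\) steps, and at most one plot point adjusted or inserted per reflection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `PlotPoint`, `Edit`, `CharacterAgent`, and the `propose_edits` callable (a stand-in for the LLM call that reads memories) are hypothetical names, and the rule that new points may only be inserted after already completed ones is an added assumption. The character and plot text in the demo are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

REFLECT_EVERY = 5  # k = 5 interaction steps, as stated in the text


@dataclass
class PlotPoint:
    description: str
    completed: bool = False


@dataclass
class Edit:
    kind: str         # "adjust" an existing point or "insert" a new one
    index: int        # position in the plot chain
    description: str


@dataclass
class CharacterAgent:
    plot_chain: List[PlotPoint]
    # Hypothetical stand-in for the LLM: (memory, plot chain) -> proposed edits.
    propose_edits: Callable[[List[str], List[PlotPoint]], List[Edit]]
    memory: List[str] = field(default_factory=list)
    steps: int = 0

    def observe(self, player_action: str) -> Optional[Edit]:
        """Record one interaction step and reflect on every k-th step."""
        self.memory.append(player_action)
        self.steps += 1
        if self.steps % REFLECT_EVERY != 0:
            return None
        return self.reflect()

    def reflect(self) -> Optional[Edit]:
        """Apply the first valid proposal only; later proposals are dropped."""
        for edit in self.propose_edits(self.memory, self.plot_chain):
            if self._apply(edit):
                return edit  # hard cap: one change per reflection
        return None

    def _apply(self, edit: Edit) -> bool:
        chain = self.plot_chain
        if edit.kind == "adjust":
            # Only incomplete plot points may be rewritten.
            if 0 <= edit.index < len(chain) and not chain[edit.index].completed:
                chain[edit.index].description = edit.description
                return True
            return False
        if edit.kind == "insert":
            # Assumption: new points go at or after the first incomplete point.
            first_open = next(
                (i for i, p in enumerate(chain) if not p.completed), len(chain)
            )
            if first_open <= edit.index <= len(chain):
                chain.insert(edit.index, PlotPoint(edit.description))
                return True
            return False
        return False


def fake_llm(memory: List[str], chain: List[PlotPoint]) -> List[Edit]:
    # Deliberately over-eager, to show that the cap is enforced.
    return [
        Edit("adjust", 0, "Rewrite a finished scene"),
        Edit("adjust", 1, "Mara confides her secret"),
        Edit("insert", 2, "An extra chase scene"),
    ]


agent = CharacterAgent(
    plot_chain=[
        PlotPoint("Meet at the station", completed=True),
        PlotPoint("Mara hides her secret"),
    ],
    propose_edits=fake_llm,
)
results = [agent.observe(f"player action {i}") for i in range(1, 6)]
```

In the demo, the first four steps trigger no reflection; on the fifth, the edit to the completed point is rejected, the second proposal is applied, and the third is never considered. Enforcing the cap in code rather than in the prompt keeps the constraint independent of how the LLM behaves.
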
Loss & Training
This work does not involve model training; all agents are built on prompt engineering using GPT-4o. The key strategies include:

- A "Sampling-Critic-Revise" loop to ensure the correct application of playwriting techniques.
- Progressive generation (adding details from coarse to fine).
- A memory system that preserves all dialogue history within the prompt.
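
One possible reading of the "Sampling-Critic-Revise" loop and the progressive refinement step is sketched below. Several things here are assumptions rather than details from the paper: the `generate_script` function, the prompt wording, the numeric score returned by the critic (used here to pick the best of three attempts), and the single refinement pass. The writer and critic are plain callables so that the sketch runs without any LLM API. Only the four dramatic situations named in this summary are listed, although the paper defines eight.

```python
import random
from typing import Callable, List, Optional, Tuple

# Only the four situations named in this summary; the paper defines 8 in total.
SITUATIONS = ["love", "phoenix rebirth", "Cinderella", "vengeance"]
TECHNIQUES = ["suspense", "twists", "non-linear narrative",
              "multiple-narrative", "irony", "symbolism"]

Writer = Callable[[str], str]                           # prompt -> story text
Critic = Callable[[str, List[str]], Tuple[float, str]]  # -> (score, feedback)


def generate_script(premise: str, writer: Writer, critic: Critic,
                    rounds: int = 3,
                    rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    best_score, best_draft = float("-inf"), ""
    for _ in range(rounds):
        # Sample 1 dramatic situation and 3 narrative techniques.
        situation = rng.choice(SITUATIONS)
        techniques = rng.sample(TECHNIQUES, 3)
        prompt = (f"Premise: {premise}\nSituation: {situation}\n"
                  f"Techniques: {', '.join(techniques)}")
        draft = writer(prompt)                      # Writer generates
        _, feedback = critic(draft, techniques)     # Critic gives feedback
        revised = writer(f"{prompt}\nDraft: {draft}\nFeedback: {feedback}")
        score, _ = critic(revised, techniques)      # assumed selection score
        if score > best_score:
            best_score, best_draft = score, revised
    # Progressive refinement, reduced to one coarse-to-fine pass in this sketch.
    return writer(f"Expand with finer detail:\n{best_draft}")


# Stub writer and critic that only count calls, for demonstration.
calls = {"writer": 0, "critic": 0}


def stub_writer(prompt: str) -> str:
    calls["writer"] += 1
    return f"[draft {calls['writer']}] {prompt}"


def stub_critic(draft: str, techniques: List[str]) -> Tuple[float, str]:
    calls["critic"] += 1
    return float(len(draft)), f"use {techniques[0]} earlier"


script = generate_script("A reunion at a railway station",
                         stub_writer, stub_critic, rng=random.Random(0))
```

With three rounds, this sketch makes seven writer calls and six critic calls per script, which gives a rough sense of why a loop of this kind costs more than a single prompt.
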
Key Experimental Results
Main Results
Story generation evaluation (50 premise paragraphs, human annotator win rates):
| Method | Conflict (Best ↑ / Worst ↓) | Suspense | Emotional Tension | Character Arc | Technique Adherence Rate |
|---|---|---|---|---|---|
| Outline-First | 18%/34% | 10%/28% | 10%/50% | 18%/36% | - |
| Playwriting-Guided | 32%/24% | 32%/22% | 48%/16% | 34%/20% | 92% |
| w/o Critic & Revise | 24%/24% | 26%/34% | 18%/28% | 12%/32% | 66% |
| w/o Refinement | 26%/18% | 32%/26% | 24%/6% | 36%/12% | - |
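
To compare methods at a glance, the Best/Worst pairs in the table above can be collapsed into a single "net preference" figure (Best minus Worst, in percentage points). This derived number is not reported in the paper; it is only an illustrative way to read the table, and the variable names are arbitrary.

```python
# (best %, worst %) per dimension, copied from the table above.
win_rates = {
    "Outline-First": {
        "Conflict": (18, 34), "Suspense": (10, 28),
        "Emotional Tension": (10, 50), "Character Arc": (18, 36),
    },
    "Playwriting-Guided": {
        "Conflict": (32, 24), "Suspense": (32, 22),
        "Emotional Tension": (48, 16), "Character Arc": (34, 20),
    },
    "w/o Critic & Revise": {
        "Conflict": (24, 24), "Suspense": (26, 34),
        "Emotional Tension": (18, 28), "Character Arc": (12, 32),
    },
    "w/o Refinement": {
        "Conflict": (26, 18), "Suspense": (32, 26),
        "Emotional Tension": (24, 6), "Character Arc": (36, 12),
    },
}

# Net preference = best-vote rate minus worst-vote rate (percentage points).
net = {
    method: {dim: best - worst for dim, (best, worst) in dims.items()}
    for method, dims in win_rates.items()
}

# Sanity check on the transcription: "best" votes per dimension across methods.
dimensions = list(win_rates["Outline-First"])
best_totals = {
    dim: sum(win_rates[m][dim][0] for m in win_rates) for dim in dimensions
}

for method, scores in net.items():
    print(method, scores)
```
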
Character Agent evaluation (5-point scale, hand-crafted script "Seven at the Station", 10 human players + 10 agent players):
| Architecture | Character Consistency | Attractiveness | Narrative Completeness | Progress | Influence | Intention Following | Speedup Ratio |
|---|---|---|---|---|---|---|---|
| Director-Actor | 3.9 | 4.2 | 3.8 | 3.6 | 4.2 | 3.9 | 1.00x |
| Hybrid Architecture | 4.1 | 3.9 | 4.3 | 4.3 | 4.0 | 4.0 | 1.49x |
| w/o Reflection | 4.0 | 3.5 | 4.2 | 3.9 | 3.5 | 3.3 | 1.90x |
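
The Hybrid Architecture row above corresponds to switching between Director-Actor and One-for-All handling per scenario. A minimal routing sketch is given below, under stated assumptions: the paper summary only says the switch depends on "scenario characteristics", so the boolean `high_interaction` flag, the `Scene` class, the `hybrid_turn` function, and the stub agents are all hypothetical, and the character name and dialogue are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scene:
    name: str
    characters: List[str]
    high_interaction: bool  # hypothetical stand-in for "scenario characteristics"


# Hypothetical LLM-backed callables, injected so the sketch runs offline.
Director = Callable[[Scene, str], str]      # decides which character responds
Actor = Callable[[str], str]                # produces one character's reply
GlobalAgent = Callable[[Scene, str], str]   # plays all characters in one call


def director_actor_turn(scene: Scene, player_input: str,
                        director: Director, actors: Dict[str, Actor]) -> str:
    speaker = director(scene, player_input)
    return f"{speaker}: {actors[speaker](player_input)}"


def hybrid_turn(scene: Scene, player_input: str, director: Director,
                actors: Dict[str, Actor], global_agent: GlobalAgent) -> str:
    # High-interaction scenes use Director-Actor; others use One-for-All.
    if scene.high_interaction:
        return director_actor_turn(scene, player_input, director, actors)
    return global_agent(scene, player_input)


llm_calls: List[str] = []  # records which stub was invoked, in order


def director(scene: Scene, text: str) -> str:
    llm_calls.append("director")
    return scene.characters[0]


def mara(text: str) -> str:
    llm_calls.append("actor")
    return "I was there that night."


def narrator(scene: Scene, text: str) -> str:
    llm_calls.append("global")
    return "Mara: The train is late again."


talk = Scene("interrogation", ["Mara"], high_interaction=True)
travel = Scene("journey", ["Mara"], high_interaction=False)
reply_a = hybrid_turn(talk, "Where were you?", director, {"Mara": mara}, narrator)
reply_b = hybrid_turn(travel, "I board the train.", director, {"Mara": mara}, narrator)
```

In this sketch the Director-Actor path makes two calls per turn and the One-for-All path makes one. That is a property of the sketch only and is not meant to reproduce the reported \(1.49\times\) speedup.
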
Ablation Study
| Configuration | Key Metric | Description |
|---|---|---|
| w/o Critic & Revise | Technique adherence rate \(92\% \rightarrow 66\%\) | Critic LLM is crucial for ensuring the correct application of playwriting techniques. |
| w/o Refinement | Emotional tension \(48\% \rightarrow 24\%\) | Progressive refinement contributes the most to emotional details. |
| w/o Plot-based Reflection | Influence \(4.0 \rightarrow 3.5\), Intention following \(4.0 \rightarrow 3.3\) | The reflection mechanism is the core of agency. |
| Pure Director-Actor | Progress 3.6 vs. Hybrid 4.3 | Multi-agent communication causes information loss, affecting narrative progress. |
Key Findings
- Progressive refinement matters most for emotional tension: removing it halves the best-vote rate (\(48\% \rightarrow 24\%\)), as emotions typically derive from subtle details in the text.
- Plot-based Reflection not only enhances agency but also increases character attractiveness (\(3.5 \rightarrow 3.9\)), presumably because reflection encourages characters to display stronger empathy.
- Aggressive agent players paradoxically scored higher in character attractiveness and influence dimensions, likely because agents interact more actively, and high-quality responses in turn left a deep impression on the annotators.
- The algorithm can automatically match appropriate dramatic situations to different themes (e.g., romance \(\rightarrow\) love, crime \(\rightarrow\) deliverer/savior).
Highlights & Insights
- Theoretical Contribution: First to establish an Immersion-Agency evaluation paradigm for LLM-based interactive drama, providing an analytical framework from the perspectives of narratology and psychology.
- Systematization of Dramatic Techniques: Combines classical dramatic theories (Polti’s 36 dramatic situations, Aristotle’s three-act structure) with modern LLM prompt engineering, an interesting exercise in computational creativity.
- Human-Centric Evaluation: Rejects automated LLM evaluators in favor of annotators trained in the humanities, on the grounds that evaluating literary works requires accuracy and empathy.
- Practicality of Hybrid Architecture: Dynamically selecting the architecture based on scenario characteristics achieves a favorable balance between efficiency and quality.
Limitations & Future Work
- Dependency on GPT-4o: All agents rely on the same closed-source model, which is costly and uncontrollable.
- Efficiency Issues: Playwriting-guided Generation is 10-12 times slower than vanilla prompting.
- Reflection Boundary Control: LLMs tend to over-adjust the narrative. Currently, this is addressed via hard constraints, which may restrict more creative adaptation.
- Dialogue-Only Focus: Scene generation (multimodal elements such as visuals and music) is not explored as a way to enhance immersion.
- Limited Evaluation Scale: Tested with only 10 human players and one hand-crafted script; generalizability remains to be verified.
Related Work & Insights
- Mateas's (2000) theory of Interactive Drama provides the original definitions of Immersion and Agency, which this work operationalizes into evaluable dimensions.
- The memory-based reflection of Park et al. (2023) focuses on memory synthesis, whereas Plot-based Reflection focuses on plot adaptation; the two techniques are complementary.
- Wu et al. (2024) first defined the six elements and plot chain mechanism of LLM interactive drama, upon which this work adds reflection and generation enhancement.
- This paper demonstrates the possibility of integrating classical dramatic theory into AI systems, providing broad insights for NPC AI, game narrative, educational simulation, and other fields.
Rating
| Dimension | Score (1-5) |
|---|---|
| Novelty | 4 |
| Theoretical Depth | 4 |
| Experimental Thoroughness | 3 |
| Practical Value | 4 |
| Writing Quality | 4 |
| Overall Score | 3.8 |