
EvoSpark: Endogenous Interactive Agent Societies for Unified Long-Horizon Narrative Evolution

Conference: ACL 2026
arXiv: 2604.12776
Code: None
Area: LLM/NLP
Keywords: Multi-agent narratives, long-horizon story evolution, social memory metabolism, spatial alignment, emergent characters

TL;DR

EvoSpark proposes a multi-agent framework to support long-horizon narrative evolution. It resolves social memory stacking and narrative-spatial misalignment through a triple design: Recursive Social Basis (RSB for social cognitive metabolism), Generative Mise-en-scène (GMS for character-location-plot alignment), and Emergent Character Grounding Protocol (ECGP to transform LLM hallucinations into persistent characters).

Background & Motivation

Background: LLM multi-agent systems have made progress in narrative generation (e.g., Generative Agents, BookWorld), but face systemic degradation in long-horizon simulations.

Limitations of Prior Work: (1) Social memory stacking: additive memory accumulates contradictory relationship states (e.g., two characters being friends and enemies simultaneously), resulting in behavioral incoherence; (2) Narrative-spatial misalignment: text-based agents lack a mechanism for synchronizing spatial state, so characters often appear in locations that contradict plot logic.

Key Challenge: Long-horizon narratives require a balance between "open emergence" and "logical consistency"—excessive control sacrifices autonomy, while excessive freedom leads to chaos. Existing frameworks are either strictly scripted (sacrificing emergence) or completely open (sacrificing coherence).

Goal: To build a unified framework supporting a full spectrum of control from strict hierarchical planning to complete free emergence while maintaining long-term logical consistency.

Key Insight: Redesigning the memory system and spatial management—memory is not an additive log but "living cognition" (metabolically updated), and space is not a passive container but a "virtual stage manager."

Core Idea: RSB for memory metabolism + GMS for spatial alignment + ECGP for turning hallucinations into creative assets.

Method

Overall Architecture

Four types of agents collaborate: the Genesis Agent (narrative conception and macro-planning), the Architect Agent (world instantiation and character promotion), the Director Agent (simulation execution and spatial alignment), and Role Agents (interaction execution and memory updates). The framework supports three control modes: HDP (Hierarchical Detailed Planning), SNP (Sequential Key Nodes), and Free EN (Full Free Emergence).

Key Designs

  1. Role Social Evolutionary Basis (RSB) and Reflection-Synthesis-Solidification Mechanism:

    • Function: Solves the social memory stacking problem—transforming memory from an additive log into metabolizable living cognition.
    • Mechanism: A four-layer memory architecture—Episodic Evolution Buffer (EEB, short-term cache), Shared World Knowledge Base (SWKB, immutable global truth), Role Episode Base (REB, immutable experience log for traceability), and Role Social Evolutionary Basis (RSB, mutable current state snapshot). A Reflection-Synthesis-Solidification cycle is triggered at the end of an event: Reflection trigger (interaction intensity exceeds threshold) → Synthesis (comparing new EEB data with old RSB status, resolving topological conflicts) → Solidification (overwriting RSB in-place, replacing old relationships with new ones).
    • Design Motivation: The reflection in Generative Agents only synthesizes observations to maintain state without metabolism—old relationships are stacked rather than replaced, inevitably leading to contradictions in the long run.
  2. Generative Mise-en-scène (GMS):

    • Function: Solves narrative-spatial misalignment—ensuring characters appear in locations consistent with plot logic.
    • Mechanism: Operates in two stages—offline planning alignment (Genesis Agent establishes initial constraints across character, location, and plot dimensions) and dynamic spatial alignment (Director Agent synchronizes narrative intent with real-time context via spatial blocking at runtime, including entity resolution steps to correct identity hallucinations generated by the LLM). GMS serves as a "virtual stage manager" to implicitly provide agents with spatial awareness.
    • Design Motivation: Environments in existing frameworks are typically passive containers—BookWorld has discrete geographic tracking but lacks fine-grained character-location-plot alignment.
  3. Emergent Character Grounding Protocol (ECGP):

    • Function: Transforms LLM hallucinations (generating uninitialized character names) into persistent story-world entities.
    • Mechanism: A four-step process—Inspiration Detection (LLM hallucinates new names even under a restricted character list = signal of narrative necessity) → Entity Resolution (Director verifies if it is a truly new entity rather than an alias) → Ontological Promotion (elevating hierarchical status based on plot importance) → Integration and Grounding (Architect instantiates the new character in the story world and RSB).
    • Design Motivation: Transforming hallucinations from errors into creative assets—generative narratives require open-ended world expansion, and the randomness of LLMs provides a mechanism for emergent new characters.
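The four ECGP steps listed above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `World` structure, the `resolve` and `ground_emergent` helpers, the `promote_at` parameter, and the use of mention counts as a stand-in for plot importance are hypothetical, and plain string matching stands in for whatever resolution the Director Agent actually performs.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    # Registered character -> known aliases (hypothetical structure).
    aliases: dict = field(default_factory=dict)
    # Character -> narrative tier, e.g. "background" or "supporting".
    tiers: dict = field(default_factory=dict)
    # Character -> relationship snapshot, standing in for the RSB.
    rsb: dict = field(default_factory=dict)


def resolve(name, world):
    """Entity Resolution: return the registered character a name refers to, or None."""
    key = name.strip().lower()
    for canonical, alts in world.aliases.items():
        if key == canonical.lower() or key in {a.lower() for a in alts}:
            return canonical
    return None


def ground_emergent(mentions, mention_counts, world, promote_at=3):
    """Run the four ECGP steps over the names mentioned in one generated scene."""
    grounded = []
    for name in mentions:
        # Steps 1-2, Inspiration Detection and Entity Resolution:
        # only names that resolve to no registered character are candidates.
        if resolve(name, world) is not None:
            continue
        # Step 3, Ontological Promotion: tier chosen from a toy importance proxy.
        count = mention_counts.get(name, 0)
        tier = "supporting" if count >= promote_at else "background"
        # Step 4, Integration and Grounding: register the character and an empty RSB.
        world.aliases[name] = set()
        world.tiers[name] = tier
        world.rsb[name] = {}
        grounded.append(name)
    return grounded


world = World(aliases={"Elena": {"the captain"}})
new = ground_emergent(["The Captain", "Marcus"], {"Marcus": 4}, world)
print(new, world.tiers)  # ['Marcus'] {'Marcus': 'supporting'}
```

In this toy run, "The Captain" resolves to an alias of an existing character and is ignored, while "Marcus" is treated as a genuinely new entity and promoted.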

Loss & Training

EvoSpark is a pure inference-time framework and does not involve training. Various LLM backbones are used (GPT-4o and open-source models in experiments).

Key Experimental Results

Main Results

EvoSpark significantly outperforms Open-Theatre, BookWorld, and HoLLMwood across three modes (HDP, SNP, Free EN) and multi-language, multi-backbone settings in dimensions such as character performance, narrative coherence, and spatial consistency. It generates 200k-250k words per run in long-horizon settings.

Ablation Study

| Configuration | Key Metrics | Description |
|---|---|---|
| W/o GMS Dynamic Spatial Alignment | Increase in spatial contradictions | Characters "lost in space" |
| W/o RSB Metabolism | Incoherent behavior in long-horizon settings | Social memory stacking leads to contradictions |
| W/o ECGP | Limited world expansion | Loss of ability to emerge new characters |
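To make the GMS ablation row more concrete, the following Python sketch shows one way a dynamic spatial alignment step could keep characters from being "lost in space". It is a toy under stated assumptions: the `align_scene` function and the flat character-to-location dictionary are hypothetical and are not taken from the paper, which describes the Director Agent's behavior only at a high level.

```python
def align_scene(positions, scene_location, participants):
    """Dynamic spatial alignment for one scene (illustrative only).

    positions: dict mapping character -> current location (authoritative state).
    Returns the list of (character, old_location) moves that were needed.
    """
    moves = []
    for name in participants:
        here = positions.get(name)
        if here != scene_location:
            moves.append((name, here))
            # Block the character into the scene so text and state agree.
            positions[name] = scene_location
    return moves


positions = {"Elena": "harbor", "Marcus": "tavern"}
moves = align_scene(positions, "harbor", ["Elena", "Marcus"])
print(moves)      # [('Marcus', 'tavern')]
print(positions)  # {'Elena': 'harbor', 'Marcus': 'harbor'}
```

Without a step like this, the scene text would place Marcus at the harbor while the tracked state still has him in the tavern, which is the kind of contradiction the ablation reports.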

Key Findings

  • The presence or absence of GMS directly affects physical consistency—without GMS, logical contradictions such as "a character staring at A but their body turned towards B" appear.
  • The RSB metabolism mechanism is the core of long-term consistency—additive memory shows severe stacking after 15 events.
  • ECGP proves the possibility of "hallucination as creativity"—approximately 20% of emergent characters contribute significantly to the subsequent narrative.
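The contrast between additive memory and RSB metabolism noted in the findings above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the function names, the `(other, relation, intensity)` tuple format, and the `threshold` default are hypothetical, and the synthesis step is reduced to a plain inequality check rather than the conflict resolution the framework performs.

```python
def additive_update(log, other, relation):
    """Additive memory: every new relation is appended; nothing is retired."""
    log.setdefault(other, []).append(relation)


def metabolize(rsb, eeb, threshold=0.5):
    """Reflection-Synthesis-Solidification over one event's buffered interactions.

    rsb: dict mapping other character -> current relation (mutable snapshot).
    eeb: list of (other, relation, intensity) tuples from the short-term buffer.
    """
    for other, relation, intensity in eeb:
        # Reflection trigger: only interactions whose intensity exceeds the threshold.
        if intensity <= threshold:
            continue
        # Synthesis: compare with the old state; here a conflict is any change.
        if rsb.get(other) != relation:
            # Solidification: overwrite in place so only one state survives.
            rsb[other] = relation
    eeb.clear()  # the buffer is consumed once the event is processed
    return rsb


events = [("Bo", "friend", 0.9), ("Bo", "enemy", 0.8), ("Cy", "rival", 0.2)]
log = {}
for other, relation, _ in events:
    additive_update(log, other, relation)
rsb = metabolize({}, list(events))
print(log["Bo"])  # ['friend', 'enemy']
print(rsb)        # {'Bo': 'enemy'}
```

The additive log ends up holding both "friend" and "enemy" for the same character, while the overwritten snapshot keeps only the latest state. In the framework as described, the full history would still be available in the immutable REB.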

Highlights & Insights

  • The ECGP design of transforming LLM hallucinations into creativity is highly inspiring—in other scenarios, "erroneous" information generated by LLMs might be redefined as "exploration."
  • The concept of memory metabolism is superior to simple memory management—instead of managing storage space, it makes memory "alive" and "metabolic," which is closer to the essence of human memory.
  • The unified framework of three control modes (HDP/SNP/Free EN) demonstrates how to support a full spectrum from strict to free control in a single architecture.

Limitations & Future Work

  • Evaluation is primarily on fictional narratives; applicability to other types of simulations (e.g., social science simulations) remains to be verified.
  • High computational costs—long-horizon simulations require a large number of LLM calls.
  • The trigger threshold for RSB metabolism is a hyperparameter; different story types may require different configurations.
  • Entity resolution in ECGP may fail when the number of characters is very large.

Comparison with Related Work

  • vs Generative Agents: GA uses reflection to synthesize observations without metabolism; EvoSpark uses RSB for in-place updates to solve memory stacking.
  • vs BookWorld: BW has discrete geographic tracking but lacks character-location-plot alignment; EvoSpark uses GMS for fine-grained spatial management.
  • vs HoLLMwood: HW uses a writer-editor workflow to refine narrative quality but lacks spatial alignment and memory metabolism mechanisms.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Memory metabolism, spatial scheduling, and hallucination transformation are all novel concepts.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Three modes, multi-baseline comparison, and long-term consistency analysis.
  • Writing Quality: ⭐⭐⭐⭐ Detailed framework description, though excessive component names lead to higher cognitive load.