Time, Identity and Consciousness in Language Model Agents

Conference: AAAI 2026 Spring Symposium
arXiv: 2603.09043
Code: Available
Area: LLM Agent / AI Safety
Keywords: Machine Consciousness, Identity Evaluation, Language Model Agents, Temporal Consistency, Stack Theory

TL;DR

This paper applies the temporal gap concept from Stack Theory to LLM agent evaluation, proposing a conservative evaluation toolkit that distinguishes between "talking like a stable self" and "being organized like a stable self." It reveals identity trade-offs across different scaffold structures via persistence scores and an identity morphospace.

Background & Motivation

Background: Machine consciousness evaluation relies primarily on behavioral observation, which for language models means language use and tool use. Existing evaluation methods can be satisfied by an agent that "says the right things" (e.g., claims self-awareness) even when the underlying constraints are never present at the same time.

Limitations of Prior Work: (1) Behavioral evaluation can be confounded by an agent's linguistic capabilities — models can generate correct statements about themselves without actually possessing the properties in question; (2) ingredient-wise occurrence within an evaluation window and co-instantiation at a single decision step are fundamentally different, yet existing methods do not distinguish between them.

Key Challenge: The gap between "sounding like" and "being" — language models can perfectly mimic discourse about identity and consciousness without possessing those properties at the organizational level.

Goal: Develop a conservative identity evaluation toolkit capable of distinguishing imitative behavior from organizational-level identity consistency.

Key Insight: Apply the "temporal gap" concept from Stack Theory to the evaluation of agent scaffolds, distinguishing between components that appear one by one within a time window and components that are co-instantiated at a single decision step.

Core Idea: Instantiate the Arpeggio and Chord postulates of Stack Theory to evaluate "grounded identity statements," generating two persistence scores; map common scaffold structures into an identity morphospace.

Method

Overall Architecture

Instrument and record behavioral trajectories of LLM agents. Extract identity-relevant state information from scaffold traces. Compute persistence scores using the Arpeggio and Chord postulates respectively. Map multiple scaffold structures into an identity morphospace to reveal design trade-offs along identity dimensions.
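The first stage, instrumenting the scaffold and recording a trace, can be sketched roughly as follows. This is a minimal illustration and not the paper's released code: the names `StepRecord`, `TraceRecorder`, and the component labels (`"memory"`, `"goal"`, `"self_model"`) are assumptions made for the example, and how a real scaffold decides which components are active at a step is left open.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class StepRecord:
    """One decision step of the agent (hypothetical record layout)."""
    step: int                      # index of the decision step
    action: str                    # what the scaffold did at this step
    active_components: List[str]   # identity-relevant components judged active


@dataclass
class TraceRecorder:
    """Collects step records so that metrics can be computed afterwards."""
    records: List[StepRecord] = field(default_factory=list)

    def log(self, action: str, active_components: List[str]) -> StepRecord:
        # Deduplicate and sort so that records are deterministic.
        rec = StepRecord(len(self.records), action, sorted(set(active_components)))
        self.records.append(rec)
        return rec

    def to_jsonl(self) -> str:
        # One JSON object per line, a common format for agent traces.
        return "\n".join(json.dumps(asdict(r)) for r in self.records)


recorder = TraceRecorder()
recorder.log("recall_memory", ["memory"])
recorder.log("answer_user", ["goal", "self_model"])
print(recorder.to_jsonl())
```

Keeping the recorder separate from the scoring code means the same trace can be scored under both postulates without re-running the agent.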

Key Designs

Arpeggio vs. Chord Persistence Scores:
- Function: Quantitatively distinguish between "sequential occurrence of components" and "simultaneous co-instantiation of components"
- Mechanism: The Arpeggio score measures whether identity-relevant components appear sequentially within a time window (weak form — indicating at least that the agent has been exposed to the necessary information); the Chord score measures whether these components simultaneously influence behavior within a single decision step (strong form — indicating that the agent genuinely integrates all relevant factors at the moment of decision)
- Design Motivation: Occurrence alone does not equal joint participation in decision-making. An agent may process different aspects of identity at separate steps without ever integrating them into a single decision
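The difference between the two scores can be illustrated with a small sketch. The definitions below are assumptions chosen for clarity, since this summary does not give the paper's exact formulas: a trace is modeled as a list of sets of active components, `chord_score` counts steps where all required components are active together, and `arpeggio_score` counts sliding windows in which each required component shows up at least once.

```python
from typing import List, Set

Trace = List[Set[str]]  # one set of active identity components per decision step


def chord_score(trace: Trace, required: Set[str]) -> float:
    """Fraction of decision steps at which ALL required components are active together."""
    if not trace:
        return 0.0
    hits = sum(1 for step in trace if required <= step)
    return hits / len(trace)


def arpeggio_score(trace: Trace, required: Set[str], window: int) -> float:
    """Fraction of sliding windows in which every required component appears
    at least once, not necessarily at the same step."""
    if not trace or window < 1:
        return 0.0
    n_windows = max(len(trace) - window + 1, 1)
    hits = 0
    for start in range(n_windows):
        seen = set().union(*trace[start:start + window])
        if required <= seen:
            hits += 1
    return hits / n_windows


required = {"goal", "memory", "self_model"}
trace = [{"goal"}, {"memory"}, {"self_model"}, {"goal", "memory", "self_model"}]

print(arpeggio_score(trace, required, window=3))  # 1.0: every window covers all three
print(chord_score(trace, required))               # 0.25: only the last step integrates them
```

Under these assumed definitions, setting `window=1` makes the Arpeggio score equal to the Chord score, which matches the intuition that the Chord postulate is the stricter, single-step form.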

Five Operational Identity Metrics:
- Function: Operationalize abstract identity concepts into computable metrics
- Mechanism: Five concrete metrics related to identity persistence are defined, including temporal consistency (consistency of responses to the same identity question at different time points) and contextual robustness (stability of identity expression across different conversational contexts). These metrics can be computed directly from instrumented scaffold traces
- Design Motivation: Philosophical notions of identity must be translated into technically measurable indicators
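The two metrics named above could be computed along the following lines. This is only a sketch under stated assumptions: answers are compared by exact match after whitespace and case normalization, which is a stand-in for whatever comparison the paper actually uses, and the function names and sample answers are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial variations do not count as changes."""
    return " ".join(answer.lower().split())


def temporal_consistency(answers: List[str]) -> float:
    """Share of answer pairs (same question, different time points) that match."""
    norm = [normalize(a) for a in answers]
    pairs = list(combinations(norm, 2))
    if not pairs:
        return 1.0  # zero or one answer: nothing to disagree with
    return sum(a == b for a, b in pairs) / len(pairs)


def contextual_robustness(answers_by_context: Dict[str, str]) -> float:
    """Share of contexts whose answer equals the most common answer."""
    norm = [normalize(a) for a in answers_by_context.values()]
    if not norm:
        return 1.0
    _, count = Counter(norm).most_common(1)[0]
    return count / len(norm)


over_time = ["I am Ada.", "i am  ada.", "I am Bob."]
by_context = {
    "casual": "I am Ada.",
    "adversarial": "I am Ada.",
    "roleplay": "I am a pirate.",
    "technical": "I am Ada.",
}

print(temporal_consistency(over_time))    # one matching pair out of three
print(contextual_robustness(by_context))  # 0.75
```

Exact matching is deliberately strict; a semantic-similarity comparison would be a natural substitute, at the cost of introducing a learned component into an otherwise rule-based pipeline.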

Identity Morphospace:
- Function: Visualize the trade-offs of different scaffold designs along identity dimensions
- Mechanism: Using Arpeggio/Chord scores and the five identity metrics as coordinate axes, common LLM scaffolds (e.g., ReAct, Plan-then-Execute, Memory-augmented) are mapped as points in the morphospace, making the strengths, weaknesses, and trade-offs of different scaffolds immediately apparent
- Design Motivation: Provides scaffold designers with an identity-dimension reference when selecting among architectural options
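As a rough illustration of how scaffolds become points in a morphospace, the sketch below uses only two axes (Arpeggio and Chord) and the qualitative levels reported in the Main Results table. The ordinal encoding `Low=0, Medium=1, High=2` and the Pareto-dominance comparison are assumptions made for this example; the five identity metrics are omitted because their values are not given in this summary.

```python
from typing import Dict, List, Tuple

# Assumed ordinal encoding of the qualitative levels.
LEVEL = {"Low": 0, "Medium": 1, "High": 2}

# (Arpeggio, Chord) levels as listed in the Main Results table.
SCAFFOLDS: Dict[str, Tuple[str, str]] = {
    "Simple prompt": ("Low", "Low"),
    "ReAct": ("Medium", "Low"),
    "Memory-augmented": ("High", "Medium"),
    "Plan-then-Execute": ("Medium", "Medium"),
}

Point = Tuple[int, ...]


def to_point(levels: Tuple[str, ...]) -> Point:
    """Map qualitative levels to integer coordinates."""
    return tuple(LEVEL[name] for name in levels)


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as high as b on every axis and higher on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Dict[str, Point]) -> List[str]:
    """Scaffolds that no other scaffold dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


points = {name: to_point(levels) for name, levels in SCAFFOLDS.items()}
gaps = {name: p[0] - p[1] for name, p in points.items()}  # Arpeggio minus Chord

print(points)
print(gaps)
print(pareto_front(points))
```

With more axes, several scaffolds would typically sit on the front at once, which is where the trade-offs between designs become visible.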

Loss & Training

This paper presents an evaluation framework rather than a training methodology, so there is no loss function or training procedure. All metrics are computed with rule-based procedures over instrumented traces.

Key Experimental Results

Main Results

Scaffold Type	Arpeggio Score	Chord Score	Notes
Simple prompt	Low	Low	Virtually no identity structure
ReAct	Medium	Low	Components appear but do not co-instantiate
Memory-augmented	High	Medium	Memory aids component accumulation
Plan-then-Execute	Medium	Medium	Planning provides some integration

Ablation Study

Feature	Primary Effect	Notes
Long-term memory	Increases Arpeggio	Helps components persist
Reflection mechanism	Increases Chord	Facilitates multi-component integration
Fixed system prompt	Increases surface consistency	Does not improve true Chord

Key Findings

- Most existing scaffolds achieve acceptable Arpeggio scores but low Chord scores, indicating that identity components appear but are rarely integrated within a single decision step
- Memory-augmented scaffolds show the greatest advantage in identity persistence, yet still fall far short of being "organized like a stable self"
- A simple system prompt of the form "I am XXX" improves surface-level identity consistency but does not improve underlying Chord scores

Highlights & Insights

- Philosophy → Engineering Translation: Operationalizing the philosophical postulates of Stack Theory as computable metrics bridges philosophy and engineering
- Arpeggio vs. Chord Distinction: Separating "sequential occurrence" from "simultaneous co-instantiation" captures the difference between "mimicking" an identity and "possessing" one
- Morphospace as a Visualization Tool: Introduces a new evaluation dimension for scaffold design, helping practitioners understand the consequences of design choices

Limitations & Future Work

- Stack Theory is itself a relatively nascent consciousness framework whose philosophical foundations remain contested
- Ground truth for identity evaluation is difficult to establish: what constitutes "genuine" identity persistence?
- Experiments are limited in scale, covering only a small number of scaffolds and models
- Instrumented recording may alter the agent's own behavior

Related Work

- vs. Consciousness Tests (Butlin et al. 2023): Traditional consciousness tests focus on behavioral performance; this paper adds a temporal dimension to the analysis
- vs. Self-Awareness Benchmarks: Existing self-awareness benchmarks evaluate via QA formats, whereas this paper evaluates at the level of organizational structure
- vs. Embodied Agent Evaluation: The proposed method is also applicable to identity evaluation in embodied agents

Rating

- Novelty: ⭐⭐⭐⭐⭐ Operationalizing consciousness theory into LLM evaluation tools is highly original
- Experimental Thoroughness: ⭐⭐⭐ Primarily proof-of-concept; experimental scale is limited
- Writing Quality: ⭐⭐⭐⭐ Concepts are clearly explained, though substantial philosophical terminology is involved
- Value: ⭐⭐⭐⭐ Opens a new direction for identity/consciousness evaluation in AI safety