Writing in Symbiosis: Mapping Human Creative Agency in the AI Era

Conference: NeurIPS 2025
arXiv: 2512.13697
Code: Not yet open-sourced (paper states code available on request)
Area: LLM/NLP
Keywords: human-AI coevolution, creative writing, stylometric analysis, authorial archetypes, LLM influence detection

TL;DR

Through longitudinal corpus analysis of 50,000+ documents, this paper proposes the "Dual-Track Evolution" hypothesis — that in the LLM era, human writing exhibits thematic convergence alongside structural stylistic differentiation — and identifies three authorial adaptation archetypes: Adopters, Resistors, and Pragmatists.

Background & Motivation

State of the Field

The widespread adoption of large language models raises a fundamental question: as "creative writing" increasingly becomes a human–machine collaborative act, is the distinctiveness of human authorship disappearing?

Limitations of Prior Work

Oversimplified homogenization narrative: Existing work primarily focuses on LLM-driven "stylistic homogenization" of internet and academic text, framing it as a unidirectional influence from AI to humans.

Lack of individual-level resolution: Most studies examine only aggregate trends, overlooking differential adaptation strategies at the individual author level.

Conflation of topic and style: Prior research frequently conflates thematic shifts with stylistic changes, lacking effective methods to disentangle the two.

Absence of longitudinal controls: No systematic within-author comparisons across the pre- and post-LLM boundary exist.

Core Hypothesis

Dual-Track Evolution Hypothesis:
- Track 1: Universal thematic convergence toward AI-related topics.
- Track 2: Structural stylistic differentiation, rather than simple homogenization, at the level of writing style.

Method

Overall Architecture

The research design proceeds in three stages: corpus construction → feature engineering → multi-perspective analysis.

Temporal boundary: November 30, 2022 (the release date of ChatGPT) divides the timeline into pre-LLM and post-LLM periods.

Corpus Construction

Scale: 50,000+ documents (derived from 823k+ messages and papers)
Time span: January 2021 – December 2024
Two genres:
- Formal text: arXiv computer science preprints (permissive license)
- Informal text: Discord Unveiled dataset (CC BY 4.0), anonymized communications from public servers
AI reference corpus: ShareGPT-90k (Apache-2.0) + Dolly-15k (CC BY-SA 3.0)
Sampling strategy:
- Discord: topic-controlled stratified sampling, balancing pre/post-LLM quotas across categories
- arXiv: monthly sampling to ensure temporal continuity

Key Designs: Perplexity-Gap Analysis

The core methodological contribution is the use of a Perplexity Gap to quantify stylistic temporal evolution:

\[\Delta_{ppl} = \frac{-\ln(p_{GPT2}(x))}{|x|_{chars}} - \frac{-\ln(p_{Llama}(x))}{|x|_{chars}}\]

Pre-LLM Judge: GPT-2 Medium (355M), trained exclusively on pre-2022 data (847M tokens, ~41.7 hours on A100)
Current Baseline: Llama-3-8B-base (frozen modern model)
Interpretation: Text that is "easy" for the current model but "difficult" for the older model, i.e. text with a large positive \(\Delta_{ppl}\), exhibits linguistic characteristics of the LLM era.
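
The gap can be sketched in a few lines once per-token log-probabilities are available from both models. This is a minimal illustration rather than the paper's code: `nll_per_char`, `perplexity_gap`, and the log-probability values are hypothetical, and natural-log probabilities are assumed so that they match the \(\ln\) in the formula. Dividing by character count rather than token count keeps the two terms comparable even when the two models tokenize the same text differently.

```python
def nll_per_char(token_logprobs, text):
    """Negative log-likelihood of `text`, normalized by its character count."""
    if not text:
        raise ValueError("text must be non-empty")
    return -sum(token_logprobs) / len(text)


def perplexity_gap(old_logprobs, new_logprobs, text):
    """Per-character NLL under the older model minus that under the current one."""
    return nll_per_char(old_logprobs, text) - nll_per_char(new_logprobs, text)


# Hypothetical natural-log token probabilities for the same 11-character text.
# The two lists have different lengths because the tokenizers are assumed to differ.
text = "hello world"
old_lp = [-6.0, -5.0]          # pre-LLM judge (assumed values)
new_lp = [-3.0, -2.5, -1.0]    # current baseline (assumed values)

gap = perplexity_gap(old_lp, new_lp, text)
print(round(gap, 4))
```

A positive result means the older model found the text harder than the current one, which is the direction the interpretation above associates with LLM-era language.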

AI-likeness index (within-author z-score normalized):

\[AI_{likeness} = \frac{\Delta_{ppl} - \mu_{author}}{\sigma_{author}}\]
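
A minimal sketch of the within-author normalization, assuming \(\mu_{author}\) and \(\sigma_{author}\) are the mean and population standard deviation of that author's own \(\Delta_{ppl}\) values (the summary does not say whether the population or sample form is used). The function name and the input values are illustrative only.

```python
from statistics import mean, pstdev


def ai_likeness(author_gaps):
    """Z-score each perplexity gap against the same author's own distribution."""
    mu = mean(author_gaps)
    sigma = pstdev(author_gaps)  # assumption: population standard deviation
    if sigma == 0:
        # An author whose gap never varies carries no within-author signal.
        return [0.0 for _ in author_gaps]
    return [(g - mu) / sigma for g in author_gaps]


# Hypothetical gaps for three documents by one author.
scores = ai_likeness([0.25, 0.5, 0.75])
print([round(s, 3) for s in scores])
```

Because the normalization is per author, a score says how unusual a document is for that writer, not how it compares with other writers.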

Δ-Feature Vector

Each author's stylistic change is represented as a 7-dimensional normalized vector:

Feature	Description
\(\Delta_{ppl}\)	Perplexity gap
\(\Delta_{TTR}\)	Lexical diversity (type-token ratio)
\(\Delta_{FKGL}\)	Readability grade (Flesch-Kincaid)
\(\Delta_{passive\%}\)	Passive voice ratio
\(\Delta_{1p\%}\)	First-person pronoun frequency
\(\Delta_{punct}\)	Punctuation density
\(\Delta_{sent\_len}\)	Mean sentence length

All features are z-score normalized within each author across the temporal boundary.
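
The sketch below illustrates how a few of these features might be computed and differenced. It is an assumption-laden simplification: it covers only four of the seven features (readability and passive-voice ratio need syllable counting and syntactic parsing, which are omitted), it takes each Δ to be the post-LLM value minus the pre-LLM value, and it skips the within-author z-scoring. The tokenization rules, pronoun list, and function names are hypothetical.

```python
import re
import string

# Assumed first-person pronoun list; the paper's exact list is not given.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


def style_features(text):
    """Compute four simple stylometric features for one text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return {
        "ttr": len(set(words)) / n_words,
        "first_person": sum(w in FIRST_PERSON for w in words) / n_words,
        "punct": sum(ch in string.punctuation for ch in text) / max(len(text), 1),
        "sent_len": len(words) / max(len(sentences), 1),
    }


def delta_features(pre_text, post_text):
    """Feature-wise difference: post-LLM value minus pre-LLM value."""
    pre, post = style_features(pre_text), style_features(post_text)
    return {name: post[name] - pre[name] for name in pre}


pre = "I like cats. I like dogs."
post = "Cats are popular pets, and dogs are popular pets."
delta = delta_features(pre, post)
print(delta)
```

In this toy pair the later text has longer sentences and no first-person pronouns, so `sent_len` rises and `first_person` falls.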

Statistical Controls

Fixed-effects model:

\[y_{it} = \beta \cdot PostLLM_t + \gamma \cdot len_{it} + \delta_c + \alpha_i + \epsilon_{it}\]

where \(\alpha_i\) denotes author fixed effects, \(\delta_c\) denotes server-category fixed effects, with HC3 robust standard errors and Holm-Bonferroni correction.
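
For readers unfamiliar with the Holm-Bonferroni step-down procedure, a minimal sketch follows. The p-values are hypothetical and stand in for the per-feature tests of \(\beta\); the summary does not report the actual values or the significance level, so \(\alpha = 0.05\) is an assumption.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for four feature-level tests.
decisions = holm_bonferroni([0.04, 0.001, 0.03, 0.012])
print(decisions)
```

Here the third-smallest p-value (0.03) exceeds its threshold of 0.05 / 2, so it and the larger value 0.04 are not rejected even though both are below 0.05 on their own.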

Clustering Method

HDBSCAN is applied to the Δ-feature vectors (min_cluster_size=15, min_samples=5, metric=euclidean). Rationale for selecting HDBSCAN:
- Identifies clusters of varying shapes and densities
- Handles noise points automatically (without forced assignment)
- Allows genuine authorial archetypes to emerge organically from the data

Key Experimental Results

Main Results: Three Authorial Archetypes

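Clustering analysis of 2,100 social-corpus authors reveals three behavioral patterns. Together they cover 1,678 authors (about 80%); the remaining 422 (about 20%) are presumably HDBSCAN noise points, although this is not stated explicitly: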

Archetype	Count	Share	Core Characteristics
Resistors	442	21%	Low/negative perplexity gap; maintain pre-LLM linguistic complexity
Adopters	370	18%	Highest perplexity gap; writing converges toward LLM style
Pragmatists	866	41%	Moderate stylistic change + high engagement with AI topics

Clustering quality metrics:
- Silhouette: 0.426 (95% CI: 0.419–0.433)
- ARI robustness: 0.891 (95% CI: 0.884–0.898)
- Bootstrap consistency: 89%
- Predictive AUC: 0.813
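
As a reminder of what the silhouette figure measures, here is a small self-contained sketch of the standard silhouette coefficient with Euclidean distance. It is not the paper's evaluation code; the points and labels are made up, and the sketch assumes any noise points have been removed beforehand.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from p to the members of each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            members = [q for j, (q, lab) in enumerate(zip(points, labels))
                       if lab == c and j != i]
            if members:
                mean_dist[c] = sum(dist(p, q) for q in members) / len(members)
        if own not in mean_dist:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        a = mean_dist[own]                                      # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)    # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated toy clusters.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
print(round(score, 3))
```

Values near 1 indicate compact, well-separated clusters and values near 0 indicate overlap, which puts the reported 0.426 in the range of moderate structure.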

Macro-Trend Validation

Finding	Evidence
AI topic convergence	Both genres show significant increase in AI-related content after Nov 2022
Thematic structural breakpoint	Q1 2023 (detected via PELT algorithm)
Stylistic complexity breakpoint	Q2 2023 (lagging behind thematic shift)
Social corpus perplexity gap increase	+23% (early 2023)
Formal corpus perplexity gap increase	+15% (early 2023)
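
The breakpoints above are attributed to the PELT algorithm. The sketch below shows the general technique on a toy series, under assumptions that are not taken from the paper: a least-squares cost that detects shifts in the mean, a hand-picked penalty, and made-up data. The function name is hypothetical.

```python
def pelt_mean_shift(values, penalty):
    """Return changepoint indices (segment start positions) found by PELT."""
    n = len(values)
    # Prefix sums make each segment cost an O(1) lookup.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        # Sum of squared deviations from the mean of values[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of values[:t]
    best[0] = -penalty
    last_cp = [0] * (n + 1)  # last_cp[t]: start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last_cp[t] = min(scores)
        # Pruning: discard start points that can no longer be optimal.
        candidates = [s for score, s in scores if score - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts to recover the changepoints.
    changepoints = []
    t = n
    while last_cp[t] > 0:
        changepoints.append(last_cp[t])
        t = last_cp[t]
    return sorted(changepoints)


# Toy series with a single level shift at index 8.
series = [1.0] * 8 + [5.0] * 8
breaks = pelt_mean_shift(series, penalty=2.0)
print(breaks)
```

The penalty controls how readily new breakpoints are added; a larger value yields fewer, more conservative breaks.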

Ablation Study: Dynamic Arc of Stylistic Adaptation

The most important finding is a two-phase pattern:

Phase	Period	Social Corpus	Formal Corpus
Convergence phase	Early 2023	Perplexity gap ↑23%	↑15%
Avoidance phase	Late 2023–2024	Perplexity gap ↓18% (from peak)	↓12%

This suggests that once LLM-like stylistic features become stigmatized, authors actively avoid them — particularly in formal contexts.
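
A short piece of arithmetic helps show what the two phases imply when combined. It assumes the rise is measured relative to the pre-LLM baseline and the fall relative to the early-2023 peak (the table states "from peak" only for the social corpus, so applying the same reading to the formal corpus is an assumption). The function name is illustrative.

```python
def net_change_pct(rise_pct, fall_from_peak_pct):
    """Net percentage change versus baseline after a rise and a fall from the peak."""
    peak = 1 + rise_pct / 100
    final = peak * (1 - fall_from_peak_pct / 100)
    return (final - 1) * 100


social = net_change_pct(23, 18)   # 1.23 * 0.82 = 1.0086
formal = net_change_pct(15, 12)   # 1.15 * 0.88 = 1.0120
print(round(social, 2), round(formal, 2))
```

Under this reading, both corpora end up within roughly 1% of their pre-LLM baseline, so the avoidance phase would offset nearly all of the initial convergence.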

Key Findings

Cross-validation accuracy: 89.3% (95% CI: 87.1–91.5)
Held-out arXiv data: 89.1% (86.8–91.4)
Null model comparison: silhouette 0.31 (null) vs. 0.43 (observed), p<0.001
Temporal boundary robustness: 84% archetype assignment consistency (91% for extreme cases)
AI-likeness remains significant after controlling for FKGL/TTR/sentence length: partial correlation r=0.34, p<0.001

Highlights & Insights

The Dual-Track Evolution framework is an elegant theoretical contribution — unifying the seemingly contradictory phenomena of "convergence" and "differentiation" within a single model.
The Perplexity-Gap method ingeniously leverages the capability gap between language models from different eras to quantify stylistic change, avoiding circular reasoning.
The two-phase dynamic arc (initial convergence followed by avoidance) reveals how social pressure modulates writing style — awareness of AI detection and academic peer-review pressure both drive stylistic retreat.
Important implications for AI detection: the three archetypes imply that a simple human-vs-machine binary detection framework is insufficient — Adopters' texts are statistically closer to AI output than those of Resistors.
The majority of authors (Resistors + Pragmatists, totaling 62%) maintain non-AI stylistic signatures, suggesting that distinctive human expression remains both valued and actively preserved.

Limitations & Future Work

Observational rather than causal: It cannot be confirmed that AI tool usage directly causes stylistic change; other confounding factors may be present.
English only: Findings may be biased toward English-speaking writing communities; different languages and cultures may exhibit different responses.
Lack of direct human validation: The archetypes are statistically constructed and have not been validated through participant studies (e.g., surveys of authors' actual AI usage behavior).
Social corpus bias: Discord users are not representative of all internet users.
Potential for misuse: The archetype framework could be exploited for author identity surveillance or unjust stylistic discrimination.
Future directions: Extension to multilingual settings, incorporation of causal inference designs, and integration of user surveys.

Relationship to Geng & Trotta (2025): The latter focuses on human–LLM co-evolution in academic writing; this paper extends that scope to social corpora and proposes individual-level analysis.
Tension with AI detection research: Detectors assume a human/machine binary, but this paper's archetype results suggest that this assumption no longer holds in a co-evolution context.
Scaffolded collaboration (Dhillon et al. 2024): Different strategies of scaffolded collaboration align with the archetypes identified in this paper.
Language simplification trends (Di Marco et al. 2024): Broader language simplification trends on social media may confound the assessment of AI influence.

Insight: When designing AI writing tools, the differential needs of each archetype should be considered — Resistors require tools that preserve distinctive voice; Adopters benefit from deep collaboration tools; Pragmatists need tools that support content exploration while protecting stylistic identity.

Rating

Novelty: ⭐⭐⭐⭐ — The "Dual-Track Evolution" hypothesis and individual-level archetype framework are relatively novel contributions to the field; the Perplexity-Gap method is creative.
Experimental Thoroughness: ⭐⭐⭐⭐ — Large-scale corpus, multiple statistical controls, and thorough clustering robustness validation; however, human validation and multilingual experiments are absent.
Writing Quality: ⭐⭐⭐⭐ — Clear structure and coherent narrative logic, progressing systematically from macro to micro levels.
Value: ⭐⭐⭐⭐ — Significant implications for creative writing research in the AI era, AI detection, and human–computer interaction; practical application pathways remain to be explored.