Writing in Symbiosis: Mapping Human Creative Agency in the AI Era
- Conference: NeurIPS 2025
- arXiv: 2512.13697
- Code: Not yet open-sourced (paper states code available on request)
- Area: LLM/NLP
- Keywords: human-AI coevolution, creative writing, stylometric analysis, authorial archetypes, LLM influence detection
TL;DR
Through longitudinal corpus analysis of 50,000+ documents, this paper proposes the "Dual-Track Evolution" hypothesis — that in the LLM era, human writing exhibits thematic convergence alongside structural stylistic differentiation — and identifies three authorial adaptation archetypes: Adopters, Resistors, and Pragmatists.
Background & Motivation
State of the Field
The widespread adoption of large language models raises a fundamental question: as "creative writing" increasingly becomes a human–machine collaborative act, is the distinctiveness of human authorship disappearing?
Limitations of Prior Work
Oversimplified homogenization narrative: Existing work primarily focuses on LLM-driven "stylistic homogenization" of internet and academic text, framing it as a unidirectional influence from AI to humans.
Lack of individual-level resolution: Most studies examine only aggregate trends, overlooking differential adaptation strategies at the individual author level.
Conflation of topic and style: Prior research frequently conflates thematic shifts with stylistic changes, lacking effective methods to disentangle the two.
Absence of longitudinal controls: No systematic within-author comparisons across the pre- and post-LLM boundary exist.
Core Hypothesis
Dual-Track Evolution Hypothesis:
- Track 1: Universal thematic convergence toward AI-related topics.
- Track 2: Structural stylistic differentiation, rather than simple homogenization, at the level of writing style.
Method
Overall Architecture
The research design proceeds in three stages: corpus construction → feature engineering → multi-perspective analysis.
Temporal boundary: November 30, 2022 (the release date of ChatGPT) divides the timeline into pre-LLM and post-LLM periods.
Corpus Construction
- Scale: 50,000+ documents (derived from 823k+ messages and papers)
- Time span: January 2021 – December 2024
- Two genres:
  - Formal text: arXiv computer science preprints (permissive license)
  - Informal text: Discord Unveiled dataset (CC BY 4.0), anonymized communications from public servers
- AI reference corpus: ShareGPT-90k (Apache-2.0) + Dolly-15k (CC BY-SA 3.0)
- Sampling strategy:
  - Discord: topic-controlled stratified sampling, balancing pre/post-LLM quotas across categories
  - arXiv: monthly sampling to ensure temporal continuity
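The Discord sampling step can be illustrated with a small sketch. It assumes each record is a dict with hypothetical `category` and `date` keys, that the boundary date itself counts as post-LLM, and that every (category, period) cell gets the same quota; none of these details are stated in the summary.

```python
import random
from collections import defaultdict
from datetime import date

LLM_BOUNDARY = date(2022, 11, 30)  # ChatGPT release date, per the text


def period_of(day: date) -> str:
    """Label a date as pre- or post-LLM (boundary day counted as post: an assumption)."""
    return "pre" if day < LLM_BOUNDARY else "post"


def stratified_sample(records, per_stratum, seed=0):
    """Draw up to `per_stratum` records from every (category, period) cell.

    `records` is an iterable of dicts with hypothetical keys
    "category" and "date"; both names are assumptions of this sketch.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["category"], period_of(rec["date"]))].append(rec)
    sample = []
    for key in sorted(strata):
        cell = strata[key]
        # Cells smaller than the quota contribute everything they have.
        sample.extend(rng.sample(cell, min(per_stratum, len(cell))))
    return sample
```

Equal quotas per cell keep a topic that grew after November 2022 from dominating the post-LLM half of the sample, which is the point of topic-controlled balancing.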
Key Designs: Perplexity-Gap Analysis
The core methodological contribution is the use of a Perplexity Gap to quantify stylistic temporal evolution:
- Pre-LLM Judge: GPT-2 Medium (355M), trained exclusively on pre-2022 data (847M tokens, ~41.7 hours on A100)
- Current Baseline: Llama-3-8B-base (frozen modern model)
- Interpretation: Text that is "easy" for the current model but "difficult" for the older model exhibits linguistic characteristics of the LLM era.
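A minimal sketch of the gap computation follows. It assumes the gap is the difference in log-perplexity between the two models and that per-token natural-log probabilities have already been obtained from each model; the function names are illustrative, not the paper's.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_gap(old_logprobs, new_logprobs):
    """Log-ratio of old-model to new-model perplexity (assumed definition).

    Positive values mean the text is easier for the current model than
    for the pre-LLM judge.
    """
    return math.log(perplexity(old_logprobs)) - math.log(perplexity(new_logprobs))
```

Because GPT-2 and Llama-3 tokenize text differently, a real comparison would probably normalize per word or per character rather than per token; the sketch ignores that detail.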
AI-likeness index (within-author z-score normalized):
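One plausible form of this index, assuming the gap for a document \(d\) is the difference in log-perplexity between the pre-LLM judge and the current model, standardized against the same author's documents (the symbols \(g_d\), \(\mu_{a}\), and \(\sigma_{a}\) are notation introduced here, not the paper's):

\[
g_d = \log \mathrm{PPL}_{\text{pre}}(d) - \log \mathrm{PPL}_{\text{cur}}(d),
\qquad
\text{AI-likeness}(d) = \frac{g_d - \mu_{a(d)}}{\sigma_{a(d)}}
\]

Here \(a(d)\) is the author of \(d\), and \(\mu_{a}\) and \(\sigma_{a}\) are the mean and standard deviation of \(g\) over that author's documents.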
Δ-Feature Vector
Each author's stylistic change is represented as a 7-dimensional normalized vector:
| Feature | Description |
|---|---|
| \(\Delta_{ppl}\) | Perplexity gap |
| \(\Delta_{TTR}\) | Lexical diversity (type-token ratio) |
| \(\Delta_{FKGL}\) | Readability grade (Flesch-Kincaid) |
| \(\Delta_{passive\%}\) | Passive voice ratio |
| \(\Delta_{1p\%}\) | First-person pronoun frequency |
| \(\Delta_{punct}\) | Punctuation density |
| \(\Delta_{sent\_len}\) | Mean sentence length |
All features are z-score normalized within each author across the temporal boundary.
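As an illustration, four of the seven features can be approximated with the standard library alone. The tokenization regexes, the pronoun list, and the names `style_features` and `within_author_z` are assumptions of this sketch; perplexity, readability, and passive-voice ratio need external models or parsers and are left out.

```python
import re
import statistics

# Hypothetical pronoun list; the paper's exact inventory is not given.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


def style_features(text):
    """Return a few surface-level style features for one document."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    punct = sum(1 for ch in text if ch in ".,;:!?")
    return {
        "ttr": len(set(words)) / len(words),            # type-token ratio
        "first_person": sum(w in FIRST_PERSON for w in words) / len(words),
        "punct": punct / len(words),                    # punctuation marks per word
        "sent_len": len(words) / len(sentences),        # mean words per sentence
    }


def within_author_z(values):
    """Z-score values against the same author's own mean and spread."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

Normalizing within each author means a habitually verbose writer is compared only with their own earlier output, so the Δ values reflect change rather than baseline style.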
Statistical Controls
Fixed-effects model:
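A specification consistent with the terms defined next might look like the following; the outcome \(y_{it}\) for author \(i\) at time \(t\), the post-LLM indicator \(\mathrm{Post}_t\), its coefficient \(\beta\), and the error term \(\varepsilon_{it}\) are assumed notation rather than the paper's own:

\[
y_{it} = \beta\,\mathrm{Post}_t + \alpha_i + \delta_c + \varepsilon_{it}
\]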
where \(\alpha_i\) denotes author fixed effects, \(\delta_c\) denotes server-category fixed effects, with HC3 robust standard errors and Holm-Bonferroni correction.
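The Holm-Bonferroni step mentioned above is a standard step-down procedure and can be sketched in a few lines. The function name and the default `alpha` are illustrative choices, not taken from the paper.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject/keep decision per hypothesis using Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject
```

With seven features tested per comparison, this keeps the family-wise error rate at `alpha` while being less conservative than a plain Bonferroni correction.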
Clustering Method
HDBSCAN is applied to the Δ-feature vectors (min_cluster_size=15, min_samples=5, metric=euclidean). Rationale for selecting HDBSCAN:
- Identifies clusters of varying shapes and densities
- Handles noise points automatically (without forced assignment)
- Allows genuine authorial archetypes to emerge organically from the data
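Key Experimental Results
Main Results: Three Authorial Archetypes
Clustering analysis of 2,100 social-corpus authors reveals three behavioral patterns. Together the three clusters cover 1,678 authors (80%); the remaining 422 (20%) are not assigned to any archetype, presumably because HDBSCAN labels them as noise: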
| Archetype | Count | Share | Core Characteristics |
|---|---|---|---|
| Resistors | 442 | 21% | Low/negative perplexity gap; maintain pre-LLM linguistic complexity |
| Adopters | 370 | 18% | Highest perplexity gap; writing converges toward LLM style |
| Pragmatists | 866 | 41% | Moderate stylistic change + high engagement with AI topics |
Clustering quality metrics:
- Silhouette: 0.426 (95% CI: 0.419–0.433)
- ARI robustness: 0.891 (95% CI: 0.884–0.898)
- Bootstrap consistency: 89%
- Predictive AUC: 0.813
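For readers unfamiliar with the silhouette metric, a pure-Python sketch of the mean silhouette coefficient is given below. It assumes Euclidean distance (matching the clustering metric) and skips points labelled `-1` as noise; whether the paper excludes noise points when scoring is not stated.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over points in non-noise clusters.

    Points labelled -1 are treated as noise and skipped (an assumption
    of this sketch, mirroring how density-based methods mark them).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate tight, well-separated clusters and values near 0 indicate overlap, so the reported 0.426 points to moderate separation between archetypes.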
Macro-Trend Validation
| Finding | Evidence |
|---|---|
| AI topic convergence | Both genres show significant increase in AI-related content after Nov 2022 |
| Thematic structural breakpoint | Q1 2023 (detected via PELT algorithm) |
| Stylistic complexity breakpoint | Q2 2023 (lagging behind thematic shift) |
| Social corpus perplexity gap increase | +23% (early 2023) |
| Formal corpus perplexity gap increase | +15% (early 2023) |
Ablation Study: Dynamic Arc of Stylistic Adaptation
The most important finding is a two-phase pattern:
| Phase | Period | Social Corpus | Formal Corpus |
|---|---|---|---|
| Convergence phase | Early 2023 | Perplexity gap ↑23% | ↑15% |
| Avoidance phase | Late 2023–2024 | Perplexity gap ↓18% (from peak) | ↓12% |
This suggests that once LLM-like stylistic features become stigmatized, authors actively avoid them — particularly in formal contexts.
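To see how far the avoidance phase undoes the convergence phase, the two percentages for each corpus can be chained. The sketch assumes the figures compound multiplicatively, with the fall measured relative to the peak as the table states; `net_change` is an illustrative helper.

```python
def net_change(rise_pct, fall_from_peak_pct):
    """Net percentage change after a rise followed by a fall measured from the peak."""
    peak = 1 + rise_pct / 100
    final = peak * (1 - fall_from_peak_pct / 100)
    return (final - 1) * 100


social = net_change(23, 18)  # social corpus: +23% then -18% from peak
formal = net_change(15, 12)  # formal corpus: +15% then -12% from peak
print(f"social: {social:+.2f}%  formal: {formal:+.2f}%")
```

Under that assumption both corpora end up within roughly one percentage point of their pre-2023 level (about +0.9% for social, +1.2% for formal), so the avoidance phase nearly cancels the initial convergence.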
Key Findings
- Cross-validation accuracy: 89.3% (95% CI: 87.1–91.5)
- Held-out arXiv data: 89.1% (86.8–91.4)
- Null model comparison: silhouette 0.31 vs. 0.43 (p<0.001)
- Temporal boundary robustness: 84% archetype assignment consistency (91% for extreme cases)
- AI-likeness remains significant after controlling for FKGL/TTR/sentence length: partial correlation r=0.34, p<0.001
Highlights & Insights
- The Dual-Track Evolution framework is an elegant theoretical contribution — unifying the seemingly contradictory phenomena of "convergence" and "differentiation" within a single model.
- The Perplexity-Gap method ingeniously leverages the capability gap between language models from different eras to quantify stylistic change, avoiding circular reasoning.
- The two-phase dynamic arc (initial convergence followed by avoidance) reveals how social pressure modulates writing style — awareness of AI detection and academic peer-review pressure both drive stylistic retreat.
- Important implications for AI detection: the three archetypes imply that a simple human-vs-machine binary detection framework is insufficient — Adopters' texts are statistically closer to AI output than those of Resistors.
- The majority of authors (Resistors + Pragmatists, totaling 62%) maintain non-AI stylistic signatures, suggesting that distinctive human expression remains both valued and actively preserved.
Limitations & Future Work
- Observational rather than causal: The study cannot confirm that AI tool usage directly causes stylistic change; other confounding factors may be present.
- English only: Findings may be biased toward English-speaking writing communities; different languages and cultures may exhibit different responses.
- Lack of direct human validation: The archetypes are statistically constructed and have not been validated through participant studies (e.g., surveys of authors' actual AI usage behavior).
- Social corpus bias: Discord users are not representative of all internet users.
- Potential for misuse: The archetype framework could be exploited for author identity surveillance or unjust stylistic discrimination.
- Future directions: Extension to multilingual settings, incorporation of causal inference designs, and integration of user surveys.
Related Work & Insights
- Relationship to Geng & Trotta (2025): That work focuses on human–LLM co-evolution in academic writing; this paper extends the scope to social corpora and adds individual-level analysis.
- Tension with AI detection research: Detectors assume a human/machine binary, but this paper argues that the assumption no longer holds once human and machine styles co-evolve.
- Scaffolded collaboration (Dhillon et al. 2024): Different strategies of scaffolded collaboration align with the archetypes identified in this paper.
- Language simplification trends (Di Marco et al. 2024): Broader language simplification trends on social media may confound the assessment of AI influence.
Insight: When designing AI writing tools, the differential needs of each archetype should be considered — Resistors require tools that preserve distinctive voice; Adopters benefit from deep collaboration tools; Pragmatists need tools that support content exploration while protecting stylistic identity.
Rating
- Novelty: ⭐⭐⭐⭐ — The "Dual-Track Evolution" hypothesis and individual-level archetype framework are relatively novel contributions to the field; the Perplexity-Gap method is creative.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Large-scale corpus, multiple statistical controls, and thorough clustering robustness validation; however, human validation and multilingual experiments are absent.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure and coherent narrative logic, progressing systematically from macro to micro levels.
- Value: ⭐⭐⭐⭐ — Significant implications for creative writing research in the AI era, AI detection, and human–computer interaction; practical application pathways remain to be explored.