✍️ Text Generation

💬 ACL2026 · 17 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (12) · 🧪 ICML2026 (2) · 🤖 AAAI2026 (3) · 📹 ICCV2025 (1) · 🧪 ICML2025 (1) · 💬 ACL2025 (27)

🔥 Top topics: Summarization ×5 · LLM ×3 · Agents ×2

Adaptive Planning for Multi-Attribute Controllable Summarization with Monte Carlo Tree Search: This paper proposes PACO, which reformulates "multi-attribute controllable summarization" as a planning problem to find an "attribute control sequence." Using a customized Monte Carlo Tree Search (where nodes are full summaries and actions are single-attribute adjustments), it identifies the optimal adjustment path during the prompting stage without any attribute-specific training. With Llama-3.2-1B, it achieves controllability comparable to the Llama-3.3-70B baseline, while Llama-3.3-70B + PACO surpasses all existing methods.
Are Emotion and Rhetoric Neurons in LLM? Neuron Recognition and Adaptive Masking for Emotion-Rhetoric Prediction Steering: This paper systematically investigates the representation mechanisms and intrinsic correlations of emotion and rhetorical neurons in LLMs. Using a neuron recognition framework with multi-dimensional screening and an adaptive masking verification method, it steers emotion and rhetoric predictions in a targeted direction and uses rhetorical neurons to assist emotion recognition.
Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style: The authors conducted a pre-registered online study with 81 participants who drafted style-sensitive texts, such as wedding vows and apology letters, with GPT-o4-mini and then manually post-edited them. The findings reveal that while post-editing significantly moves the text toward the user's personal style and away from the LLM's style, the edited texts still systematically retain more "AI-like" traces than independent writing, a residue that participants themselves fail to perceive.
Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety: The authors use 2,580 stories, generated by GPT-4o / Llama-3.3-70B and aligned with the UFLI K–2 English reading curriculum, to compare four SFT designs (baseline, Good Stories, Rewarded SFT, and simulated children's pronunciation errors) on three 8B models (Llama 3 / Granite 3.3 / Apertus). The results show that compact models with an appropriate SFT strategy can outperform zero-shot GPT-4o and Llama-3.3-70B on key K–2 metrics such as Spache readability, syntactic complexity, and toxicity. Of the four designs, Rewarded SFT proved the most stable and was nearly hallucination-free.
ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline: This paper introduces ConlangCrafter, an LLM-based multi-hop pipeline that decomposes constructed language (conlang) design into modular stages of phonology, grammar, and lexicon. It ensures typological diversity through randomness injection and internal consistency via self-refinement loops, while proposing an automated evaluation framework encompassing typological diversity analysis and translation consistency.
Difficulty-Controllable Cloze Question Distractor Generation: This paper proposes DCDG, which enables easy/hard difficulty control for cloze distractor generation via dual-path data augmentation, QA ensemble difficulty clustering, and multi-task seq2seq training, significantly outperforming GPT-4o in both automatic and human evaluations.
EDUMATH: Generating Standards-aligned Educational Math Word Problems: The authors systematize the task of "generating math word problems (MWP) aligned with K-12 math curriculum standards," collecting 11,000+ STEM MWP training data points annotated by real US teachers. Through an SFT + KTO + ModernBERT filtering pipeline, they trained two open-source SOTA generators, EDUMATH-12B/30B. They conducted the first RCT on actual 3rd-5th grade students, finding that while student accuracy was comparable between LLM-generated and human-written problems, students showed an almost unanimous preference for customized LLM problems.
FACTS: Table Summarization via Offline Template Generation with Agentic Workflows: This paper proposes FACTS (Fast, Accurate, and Privacy-Compliant Table Summarization), which automatically generates reusable offline templates (SQL queries + Jinja2 templates) through a three-stage agentic workflow. It achieves rapid, accurate, and privacy-compliant query-focused table summarization, outperforming baselines across the FeTaQA, QTSumm, and QFMTS benchmarks.
Frankentext: Stitching Random Text Fragments into Long-Form Narratives: This paper proposes the Frankentext paradigm, which enables LLMs to stitch random human text fragments into coherent long-form narratives under extreme constraints (90% of text copied verbatim from human writing). This reveals the severe failure of current AI text detectors in mixed-authorship scenarios (72% of Frankentext is misclassified as human writing).
In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis: This paper proposes the "Scientific Impact Summarization" task: first identifying fine-grained intents that truly reveal impact from the citation contexts of a paper, and then generating an impact narrative that evolves over time. This approach better illustrates how a paper is adopted, criticized, and transformed by subsequent work compared to simple citation counts.
Investigating the Representation of Backchannels and Fillers in Fine-tuned Language Models: This paper trains BERT, GPT-2, TurnGPT, LLaMA-3 8B, and Qwen-3 8B on English and Japanese spoken dialogue corpora using three fine-tuning tasks: MASK, NTP, and TTP. It utilizes t-SNE visualization and silhouette clustering to quantify the representation quality of "backchannels" (e.g., uh-huh) and "fillers" (e.g., um). The study finds that fine-tuning significantly distinguishes these "semantically bleached" functional words within the embedding space and enables models to naturally generate diverse backchannels/fillers during NLG, marking a quantifiable step toward "human-like conversational LMs."
Losses that Cook: Topological Optimal Transport for Structured Recipe Generation: This paper proposes a topological loss function based on Sinkhorn divergence that represents ingredient lists as point clouds in embedding space. By minimizing the geometric discrepancy between predicted and ground-truth ingredients, it significantly improves ingredient recall and quantity accuracy in structured recipe generation, being preferred in 62% of human evaluations.
Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation: This paper proposes the PLOTTER framework, presented as the first to move narrative planning from textual representations to graph structures (Event Graph + Character Graph). Through a multi-agent Evaluate-Plan-Revise loop, narrative defects are diagnosed and repaired directly on the graph topology, significantly outperforming existing methods on dimensions such as narrativity, characterization, and dramatic tension.
Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification: This paper proposes the Re-RIGHT framework, which trains a 4B policy model via GRPO with a three-module reward (lexical coverage + semantic preservation + coherence). It simplifies text in English, Japanese, Korean, and Chinese to match learner proficiency levels (CEFR/JLPT/TOPIK/HSK), outperforming large models such as GPT-5.2 and Gemini 2.5.
SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization: This paper proposes SCURank, a ranking framework based on Summary Content Units (SCUs). It ranks candidate summaries by extracting SCUs, estimating the importance of each piece of information through cross-summary clustering, and scoring each summary on informativeness. This replaces unstable direct LLM ranking and coarse-grained ROUGE ranking. In multi-LLM distillation scenarios, combined with BRIO contrastive learning, it significantly improves the summarization performance of distilled models.
ThreadSumm: Summarization of Nested Discourse Threads Using Tree of Thoughts: This paper proposes ThreadSumm, a multi-stage LLM pipeline framework that models nested discourse thread summarization as a hierarchical reasoning problem. It first extracts aspects and Atomic Content Units (ACUs) for content planning, constructs thread-aware sequences through sentence ordering, and finally utilizes Tree of Thoughts (ToT) search to generate and score multiple paragraph candidates. The method outperforms baselines on Reddit and StackExchange datasets.
XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration: This paper presents XtraGPT, described as the first open-source LLM suite (1.5B–14B) built specifically for academic paper revision. By fine-tuning on 7,000 top-tier conference papers and 140,000 standard-guided instruction-revision pairs, it achieves context-aware, controllable revisions at the paragraph level. The 7B version matches GPT-4o-mini, while the 14B version outperforms it. Human evaluations show an average increase of 0.65 points in predicted paper scores after revision.