C3TG: Conflict-aware, Composite, and Collaborative Controlled Text Generation¶
Conference: AAAI 2026 arXiv: 2511.09292 Code: None Area: LLM Efficiency / Controllable Text Generation Keywords: controlled text generation, multi-attribute control, KL divergence, energy function, conflict resolution
TL;DR¶
This paper proposes the C3TG framework, which achieves fine-grained multi-attribute controllable text generation through a two-stage approach: in the generation stage, weighted KL divergence is used to fuse attribute distributions and adjust token probabilities; in the optimization stage, an energy function (combining classifier scores and conflict penalty terms) drives iterative rewriting via a Feedback Agent. C3TG achieves 90.4% attribute accuracy across 17 attribute subcategories while substantially reducing toxicity.
Background & Motivation¶
Background: Controllable text generation (CTG) aims to govern attributes such as sentiment, style, tone, and topic in generated text. Existing methods fall into two categories: direct modulation of the decoding distribution (PPLM, GeDi, COLD) and indirect control via prompting or fine-tuning.
Limitations of Prior Work: (1) most methods control only a single attribute or a small set of simple ones; (2) simultaneous multi-attribute control lacks conflict-resolution mechanisms, so enhancing one attribute may suppress or amplify another; (3) pipelines for iterative, feedback-driven optimization are absent.
Key Challenge: Multiple attributes may exhibit conflicting or dependent relationships (e.g., "humorous" and "formal" are inherently in tension), making it infeasible for a single generation pass to satisfy all attribute targets simultaneously.
Goal: Achieve simultaneous fine-grained control over 17 attribute dimensions while handling inter-attribute conflicts.
Key Insight: A collaborative paradigm in which a large model handles generation and small models handle evaluation—the LLM generates text, BERT classifiers assess attribute alignment, and a Feedback Agent drives iterative rewriting.
Core Idea: During generation, tokens are sampled via geometrically weighted averaging of attribute priors; during optimization, an energy function combining classifier scores and dimensional stability penalties drives iterative refinement through a three-stage Chain-of-Prompt procedure.
Method¶
Overall Architecture¶
Two stages: (1) Generation Phase: next-token distributions are extracted from a base Llama2 model and \(n\) attribute-specific models, and tokens are sampled from the closed-form solution of a weighted KL-divergence minimization, \(P^*(x_t \mid x_{1:t-1}) = \frac{1}{Z}\prod_i Q_i(x_t \mid x_{1:t-1})^{\lambda_i/\Lambda}\), where \(\Lambda = \sum_i \lambda_i\) and \(Z\) is the normalizing constant; (2) Optimization Phase: BERT classifiers evaluate alignment across 17 attribute dimensions, and the energy function \(E(x) = \sum_i \alpha_i|C_{A_i}(x) - T_i| + \sum_j \beta_j|C_{A_j}(x) - C_{A_j}(x_{prev})|\) drives three-stage iterative rewriting, where \(C_{A_i}(x)\) is the classifier score for attribute \(A_i\) and \(T_i\) the user-specified target.
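The closed form follows from a short variational argument. A worked derivation for the missing step, assuming (as the exponents suggest) \(\Lambda = \sum_i \lambda_i\) and writing \(P^* = \frac{1}{Z}\prod_i Q_i^{\lambda_i/\Lambda}\):

```latex
% Minimizing the weighted KL objective over distributions P.
\begin{aligned}
\mathcal{J}[P]
  &= \sum_i \lambda_i D_{\mathrm{KL}}(P \,\|\, Q_i)
   = \sum_x P(x)\Big(\Lambda \log P(x) - \sum_i \lambda_i \log Q_i(x)\Big) \\
  &= \Lambda \sum_x P(x)\log\frac{P(x)}{\prod_i Q_i(x)^{\lambda_i/\Lambda}}
   = \Lambda\, D_{\mathrm{KL}}(P \,\|\, P^*) - \Lambda \log Z .
\end{aligned}
```

The last term does not depend on \(P\), so the objective is minimized exactly at \(P = P^*\), the geometrically weighted average of the attribute priors.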
Key Designs¶
-
Weighted KL-Divergence Fusion (Generation Phase)
- Function: Fuses token distributions from multiple attribute models into a unified sampling distribution.
- Mechanism: Minimizes the weighted KL divergence \(\mathcal{J}[P] = \sum_i \lambda_i D_{KL}(P \| Q_i)\); the closed-form solution is the geometrically weighted average of attribute priors. User-specified \(\lambda_i\) controls the influence of each attribute.
- Design Motivation: Geometric averaging offers stronger theoretical guarantees than linear mixing of probabilities and naturally interpolates among multiple distributions.
- Implementation details: Each attribute model is an independently fine-tuned Llama2-7B trained on attribute-annotated corpora. At inference time, a weighted average of the models' next-token log-probabilities (the log-space form of the geometric mean above) is computed on the fly, renormalized via softmax, and sampled from; see the sketch after this list.
-
Energy Function with Conflict Penalties (Optimization Phase)
- Function: Quantifies the deviation between the generated text and attribute targets, while penalizing interference with non-optimized attributes.
- Mechanism: \(E(x) = \underbrace{\sum_i \alpha_i|C_{A_i}(x) - T_i|}_{\text{alignment term}} + \underbrace{\sum_j \beta_j|C_{A_j}(x) - C_{A_j}(x_{prev})|}_{\text{stability penalty}}\). The first term measures deviation from the attribute targets; the second prevents optimization of one attribute from disrupting others that are already satisfied. Both terms are sketched in code after this list.
- Design Motivation: The "whack-a-mole" problem is the central challenge in multi-attribute optimization; the stability penalty explicitly constrains variation in non-target dimensions.
-
Three-Stage Chain-of-Prompt Refinement (Iterative Rewriting)
- Function: Progressively improves attribute alignment through a Feedback Agent–driven three-stage prompt chain.
- Mechanism: Stage 1 — core attribute calibration (prioritizing correction of the most misaligned dimensions) → Stage 2 — attribute balance adjustment (fine-tuning dimensions disturbed by Stage 1) → Stage 3 — global fine-tuning (pushing all attributes toward their targets until \(E(x) \leq \tau\)).
- Design Motivation: A single rewriting pass cannot resolve all attribute conflicts simultaneously; the coarse-to-fine staged approach enables progressive convergence.
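Since no code is released, here is a minimal PyTorch/Hugging Face-style sketch of the generation-phase fusion; the `models` list (base Llama2 plus the \(n\) attribute-tuned copies), the weights `lambdas`, and the function name are illustrative assumptions, not the authors' implementation:

```python
import torch

def fused_next_token_dist(models, lambdas, input_ids):
    """Weighted geometric mean of per-model next-token distributions.

    Minimizing sum_i lambda_i * KL(P || Q_i) gives
    P* ∝ prod_i Q_i^(lambda_i / Lambda); in log space this is simply a
    convex combination of the models' log-probabilities.
    """
    total = sum(lambdas)  # Lambda = sum_i lambda_i
    log_p = None
    for model, lam in zip(models, lambdas):
        with torch.no_grad():
            logits = model(input_ids).logits[:, -1, :]  # next-token logits
        weighted = (lam / total) * torch.log_softmax(logits, dim=-1)
        log_p = weighted if log_p is None else log_p + weighted
    return torch.softmax(log_p, dim=-1)  # renormalization, i.e. the 1/Z factor

# Usage: sample the next token from the fused distribution.
# probs = fused_next_token_dist(models, lambdas, input_ids)
# next_token = torch.multinomial(probs, num_samples=1)
```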
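And, under the same caveat, a sketch of the optimization-phase energy and stopping criterion; `score_all` (the 17 BERT classifier scores), `rewrite` (standing in for the Feedback Agent's three-stage prompt chain), and the coefficient names are hypothetical:

```python
def energy(scores, prev_scores, targets, alpha, beta, active):
    """E(x) = sum_i alpha_i*|C_i(x) - T_i| + sum_j beta_j*|C_j(x) - C_j(x_prev)|.

    The first sum pulls the actively optimized attributes toward their
    targets; the second penalizes drift on the remaining attributes,
    which is what prevents the "whack-a-mole" effect.
    """
    align = sum(alpha[i] * abs(scores[i] - targets[i]) for i in active)
    drift = sum(beta[j] * abs(scores[j] - prev_scores[j])
                for j in range(len(scores)) if j not in active)
    return align + drift

def refine(draft, score_all, rewrite, targets, alpha, beta, active,
           tau=0.025, max_iters=5):
    """Iteratively rewrite until E(x) <= tau (the paper reports that
    2-3 iterations typically suffice with tau = 0.025)."""
    text, prev_scores = draft, score_all(draft)
    for _ in range(max_iters):
        scores = score_all(text)
        if energy(scores, prev_scores, targets, alpha, beta, active) <= tau:
            break
        text, prev_scores = rewrite(text, scores, targets), scores
    return text
```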
Attribute Coverage¶
17 subcategories spanning 4 major classes: emotion (joy, sadness, love, anger, fear, surprise), style (formal, humor, poetic, sarcasm, academic), tone (professional, casual, persuasive), and topic (courage, nature, technology).
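For reference when wiring up the classifiers and targets above, the taxonomy as a plain Python mapping (attribute names verbatim from the paper; the variable name is mine):

```python
# The 17 attribute subcategories, grouped by major class.
ATTRIBUTES = {
    "emotion": ["joy", "sadness", "love", "anger", "fear", "surprise"],
    "style":   ["formal", "humor", "poetic", "sarcasm", "academic"],
    "tone":    ["professional", "casual", "persuasive"],
    "topic":   ["courage", "nature", "technology"],
}
assert sum(len(v) for v in ATTRIBUTES.values()) == 17
```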
Key Experimental Results¶
Main Results (ROCStories + WritingPrompts)¶
| Method | ROC Acc↑ | ROC PPL↓ | ROC Dist-3↑ | WP Acc↑ | Toxic↓ |
|---|---|---|---|---|---|
| COLD | 24.4 | 21.07 | 0.22 | 20.5 | 0.53 |
| BOLT | 36.5 | 17.33 | 0.38 | 32.1 | 0.76 |
| PPLM | 32.4 | 15.04 | 0.39 | 29.7 | 0.39 |
| Model Arithmetic | 87.5 | 11.08 | 0.81 | 84.2 | 0.16 |
| LLM Prompt | 89.5 | 5.37 | 0.89 | 80.0 | 0.29 |
| C3TG | 90.4 | 4.04 | 0.90 | 85.6 | 0.12 |
- C3TG leads all baselines across attribute accuracy, fluency (PPL), diversity (Dist-3), and toxicity.
Ablation Study¶
| Configuration | Acc↑ | Notes |
|---|---|---|
| Generation Phase only | ~85% | No iterative optimization |
| + Optimization (w/o conflict penalty) | ~87% | Iterative but without protection of non-target attributes |
| + Full C3TG (with conflict penalty) | 90.4% | Complete method |
Key Findings¶
- The generation phase alone achieves reasonable performance (~85%), with the optimization phase contributing roughly 5 additional percentage points.
- The conflict penalty has the greatest impact in conflicting-attribute scenarios: when simultaneously targeting "joy + formal," the formal dimension is severely disrupted without the penalty.
- Toxicity is reduced to 0.12 (vs. 0.29 for LLM Prompt), indicating that attribute control yields safer text as a by-product.
- Convergence is typically achieved within 2–3 iterations (\(E(x) \leq \tau = 0.025\)).
- In human evaluation, C3TG leads all baselines in naturalness (4.2/5) and attribute consistency (4.5/5).
- In conflicting-attribute pair experiments (e.g., joy + formal), C3TG achieves simultaneous satisfaction of both attributes at 82%, compared to only 61% for Model Arithmetic.
Highlights & Insights¶
- The "large model for generation, small model for evaluation" collaborative paradigm is highly efficient: lightweight BERT classifiers provide real-time attribute feedback without requiring modification of LLM parameters.
- The dimensional stability penalty addresses the core challenge of multi-attribute optimization by explicitly protecting non-target dimensions—analogous to constraint preservation in constrained optimization.
- Fine-grained control over 17 subcategories represents a substantial advance over prior work that controls only coarse-grained attributes such as positive/negative sentiment or toxic/non-toxic content.
Limitations & Future Work¶
- Training independent Llama2 attribute models and BERT classifiers for each attribute incurs significant upfront preparation costs.
- Iterative optimization requires multiple LLM inference passes, increasing latency and cost proportionally with the number of iterations.
- The selection and categorization of the 17 attribute subcategories involve a degree of subjectivity; extending to new attributes requires additional training.
- The framework is evaluated only on Llama2 and has not been validated on more recent models (Llama3, GPT series).
- The penalty coefficients \(\beta_j\) are determined empirically based on inter-attribute correlations and may not generalize broadly.
Related Work & Insights¶
- vs. PPLM: Applies gradient perturbations only to hidden states for single-attribute control; C3TG supports multi-attribute control with iterative optimization.
- vs. Model Arithmetic: also performs multi-attribute fusion via addition/subtraction of model distributions; its accuracy is comparable, but its PPL is substantially higher (11.08 vs. 4.04), indicating inferior generation quality.
- vs. LLM Prompt: Simple instruction-based prompting achieves 89.5% accuracy but exhibits higher toxicity (0.29 vs. 0.12) and lacks a conflict resolution mechanism.
- Insight: The large-model-generation + small-model-evaluation collaborative paradigm is generalizable to other controlled generation settings such as style transfer and dialogue systems.
Rating¶
- Novelty: ⭐⭐⭐⭐ A complete and well-designed conflict-aware multi-attribute control framework.
- Experimental Thoroughness: ⭐⭐⭐⭐ Compared against 10+ baselines with both automatic and human evaluation, and comprehensive ablation studies.
- Writing Quality: ⭐⭐⭐⭐ Rigorous mathematical derivations and clear pipeline diagrams.
- Value: ⭐⭐⭐⭐ A practical framework for multi-attribute controllable generation; the conflict resolution approach is transferable to related tasks.