
A Structured Clustering Approach for Inducing Media Narratives

Conference: ACL 2026
arXiv: 2604.10368
Code: Available (mentioned in the paper)
Area: Interpretability
Keywords: media narratives, structured clustering, causal event chains, role analysis, framing theory

TL;DR

This paper proposes a framework for automatically inducing media narrative patterns from large-scale news corpora. By jointly modeling causal event chains and character roles (hero/villain/victim), the framework employs a role-constrained clustering algorithm to organize narrative chains into semantically coherent narrative patterns. The approach generates interpretable narrative patterns consistent with framing theory in two domains: immigration and gun control.

Background & Motivation

Background: Media narratives exert substantial influence on public opinion. NLP work on media analysis falls broadly into two categories: (1) coarse-grained label approaches (e.g., left/right stance, political/economic/security topic frames), which scale well but lose the nuance of narrative structure; and (2) domain-specific taxonomies (e.g., dedicated labels for immigration or economic issues), which capture nuance but do not generalize across domains.

Limitations of Prior Work: Coarse-grained approaches overlook the fine-grained narrative structures emphasized in communication research—namely, how character assignments and causal framing guide readers toward particular conclusions. Domain-specific approaches require extensive manual annotation and cannot scale to new domains. This divide limits consistent narrative analysis across domains.

Key Challenge: A fundamental tension exists between scalability and interpretive depth—one must either sacrifice narrative detail for scale, or sacrifice scale for depth.

Goal: To design a narrative induction framework that preserves narrative structural depth (causal event relations + character functional roles) while scaling to large corpora without relying on domain-specific taxonomies.

Key Insight: Drawing on the Narrative Policy Framework from communication studies, the paper treats character roles (hero/villain/victim) as key structural elements of narrative analysis, using role constraints to distinguish event chains that appear superficially similar but carry different narrative meanings.

Core Idea: Atomic narrative representations are constructed from causal event chains with role annotations; role-constrained clustering (cannot-link constraints) then automatically induces high-level narrative patterns across articles.

Method

Overall Architecture

A multi-stage pipeline: (1) extract events (predicate–object tuples) from news articles → (2) identify causal relations between events → (3) verbalize causal event chains in natural language → (4) extract characters and their narrative functions (hero/villain/victim) → (5) perform structured clustering with role constraints → (6) use an LLM to generate narrative pattern descriptions for each cluster.
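Stage (1) can be sketched in plain Python, assuming a dependency parse is already available. The `Token` record and the hand-written parse below are hypothetical stand-ins for a parser's output, and the rule shown (keep each direct object whose syntactic head is a verb) is one simple reading of the (verb, object) extraction described under Key Designs, not the paper's exact code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One token of a dependency parse (hypothetical minimal format)."""
    i: int      # position in the sentence; tokens[i] must be this token
    text: str
    pos: str    # coarse part-of-speech tag, e.g. "VERB", "NOUN"
    head: int   # index of the syntactic head (the root points to itself)
    dep: str    # dependency label, e.g. "nsubj", "obj"

def extract_events(tokens):
    """Return (verb, object) event tuples: each direct object whose head is a verb."""
    events = []
    for tok in tokens:
        if tok.dep in {"obj", "dobj"}:  # UD label and the older Stanford-style label
            head = tokens[tok.head]
            if head.pos == "VERB":
                events.append((head.text.lower(), tok.text.lower()))
    return events

# Hand-written parse of "The city expanded shelters and the council cut funding."
example = [
    Token(0, "The", "DET", 1, "det"),
    Token(1, "city", "NOUN", 2, "nsubj"),
    Token(2, "expanded", "VERB", 2, "ROOT"),
    Token(3, "shelters", "NOUN", 2, "obj"),
    Token(4, "and", "CCONJ", 7, "cc"),
    Token(5, "the", "DET", 6, "det"),
    Token(6, "council", "NOUN", 7, "nsubj"),
    Token(7, "cut", "VERB", 2, "conj"),
    Token(8, "funding", "NOUN", 7, "obj"),
    Token(9, ".", "PUNCT", 2, "punct"),
]

print(extract_events(example))  # [('expanded', 'shelters'), ('cut', 'funding')]
```

With spaCy, the same fields are available as `token.i`, `token.text`, `token.pos_`, `token.head.i`, and `token.dep_`; its English models label direct objects `dobj`, which is why both labels are accepted.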

Key Designs

  1. Causal Narrative Chain Construction:

    • Function: Extract causally related event sequences from news articles as atomic narrative units.
    • Mechanism: Dependency parsing is first applied to extract (verb, object) event tuples; causal relations between event pairs are then identified using the DAPrompt method. To reduce cost, Llama 3.3 70B generates silver labels for 20K event pairs, which are distilled into a lightweight DAPrompt model. DeepSeek-R1 subsequently verbalizes causal triples into coherent natural language sentences.
    • Design Motivation: Causal chains are more informative than plain event sequences—they reveal the "because X, therefore Y" logic that is central to manipulative media narratives.
  2. Character and Narrative Function Annotation:

    • Function: Identify characters within narrative chains and their functional roles in the story.
    • Mechanism: Llama 3.3 70B extracts character mentions from articles in a 5-shot setting; k-means then clusters mentions into character groups (e.g., "immigrants," "law enforcement"). DeepSeek-R1 subsequently annotates each character with a narrative function (hero/villain/victim) and an overall stance (pro/con) in a zero-shot setting.
    • Design Motivation: The functional assignment of characters within the same event chain determines the ideological direction of a narrative. For example, "immigrants = victim + law enforcement = villain" versus "immigrants = villain + law enforcement = hero" involve the same entities yet convey diametrically opposite messages.
  3. Role-Constrained Structured Clustering:

    • Function: Distinguish event chains that are semantically similar but narratively distinct.
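    • Mechanism: Cannot-link constraints are generated for chain pairs whose role-function configurations conflict (i.e., the same character group is assigned different functions in the two chains). The k-means objective is modified to \(\mathcal{R} = \frac{1}{2}\sum_{i} \|x_i - \mu_{l_i}\|^2 + \sum_{c=(i,j)\in\mathcal{C}} w_c \mathbf{1}[l_i = l_j]\), where \(x_i\) is the vector representation of chain \(i\), \(l_i\) its cluster assignment, \(\mu_{l_i}\) the corresponding centroid, \(\mathcal{C}\) the set of cannot-link pairs, and \(w_c\) the weight of constraint \(c\); the second term adds a penalty to the distance term for every cannot-link pair placed in the same cluster. Initialization is likewise adapted to a constraint-aware k-means++ variant.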
    • Design Motivation: Pure text-similarity clustering would group "immigrants as victims" with "immigrants as threats" (since both concern immigration), whereas role constraints force them into separate clusters.
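The role-conflict rule and the penalized objective can be sketched together in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: every constraint shares one weight `w` (the objective allows a per-constraint \(w_c\)), a deterministic farthest-first seeding replaces the constraint-aware k-means++ initialization, and the assignment step is a greedy sequential update in the style of pairwise-constrained k-means (PCKMeans). The chain vectors are toy 2-D points, and all function names are hypothetical.

```python
from itertools import combinations

def role_conflict(roles_a, roles_b):
    """True if a character group appears in both chains with different functions."""
    shared = roles_a.keys() & roles_b.keys()
    return any(roles_a[g] != roles_b[g] for g in shared)

def cannot_links(role_configs):
    """Every index pair (i, j), i < j, whose role configurations conflict."""
    return [(i, j) for i, j in combinations(range(len(role_configs)), 2)
            if role_conflict(role_configs[i], role_configs[j])]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def objective(X, labels, centroids, links, w):
    """R = 1/2 * sum_i ||x_i - mu_{l_i}||^2 + sum over cannot-links of w * 1[l_i == l_j].
    Assumption: one uniform weight w instead of a per-constraint w_c."""
    fit = 0.5 * sum(sq_dist(x, centroids[l]) for x, l in zip(X, labels))
    return fit + sum(w for i, j in links if labels[i] == labels[j])

def farthest_first(X, k):
    """Deterministic seeding (a stand-in for constraint-aware k-means++):
    start at X[0], then add the point farthest from its nearest seed."""
    seeds = [tuple(X[0])]
    while len(seeds) < k:
        far = max(range(len(X)), key=lambda i: min(sq_dist(X[i], s) for s in seeds))
        seeds.append(tuple(X[far]))
    return seeds

def constrained_kmeans(X, k, links, w, max_iter=100):
    """Greedy penalized k-means; returns (labels, centroids)."""
    partners = {i: [] for i in range(len(X))}
    for i, j in links:
        partners[i].append(j)
        partners[j].append(i)
    centroids = farthest_first(X, k)
    labels = [-1] * len(X)  # -1 means "not assigned yet"
    for _ in range(max_iter):
        previous = labels[:]
        # Sequential assignment: each chain picks the cluster minimizing half its
        # squared distance plus w for every cannot-link partner already there.
        for i, x in enumerate(X):
            def cost(c):
                clashes = sum(1 for j in partners[i] if labels[j] == c)
                return 0.5 * sq_dist(x, centroids[c]) + w * clashes
            labels[i] = min(range(k), key=cost)
        # Centroid update; an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if labels == previous:
            break
    return labels, centroids

# Toy chain vectors: 0 and 1 are close, 2 and 3 are close, but their roles clash.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
roles = [
    {"immigrants": "victim", "law enforcement": "villain"},
    {"immigrants": "villain", "law enforcement": "hero"},
    {"immigrants": "victim"},
    {"immigrants": "villain"},
]
links = cannot_links(roles)  # [(0, 1), (0, 3), (1, 2), (2, 3)]
labels, cents = constrained_kmeans(X, 2, links, w=10.0)
print(labels)  # [0, 1, 0, 1]
```

With `w=10.0` the two "immigrants = victim" chains end up together and the two "immigrants = villain" chains end up together, even though each chain is geometrically closest to one with the opposite role; with `w=0.0` the same code reduces to plain k-means and returns `[0, 0, 1, 1]`, grouping by proximity alone.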

Narrative Pattern Attribution

After clustering, narrative chains and role information are sampled from each cluster, and DeepSeek-R1 generates narrative pattern descriptions encompassing the three elements of Entman's framing theory (issue definition, evaluation, and remedy).
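The attribution step can be sketched as prompt construction: sample chains from one cluster, attach their role annotations, and ask for Entman's three elements. The template wording, the sample size, the seed, and the function name `build_attribution_prompt` are hypothetical; the paper's actual DeepSeek-R1 prompt is not reproduced here, and sending the prompt to a model is left out.

```python
import random

# Entman's three framing elements.
ENTMAN_ELEMENTS = ("issue definition", "evaluation", "remedy")

def build_attribution_prompt(chains, roles, n_samples=5, seed=0):
    """Sample up to n_samples chains from one cluster and format them, with
    their role annotations, into a description request (hypothetical template)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    picked = sorted(rng.sample(range(len(chains)), min(n_samples, len(chains))))
    lines = [
        "The causal narrative chains below belong to one cluster.",
        "Describe their shared narrative pattern in terms of "
        + ", ".join(ENTMAN_ELEMENTS) + ".",
    ]
    for n, i in enumerate(picked, start=1):
        role_text = "; ".join(f"{g} = {r}" for g, r in sorted(roles[i].items()))
        lines.append(f"{n}. {chains[i]} [roles: {role_text}]")
    return "\n".join(lines)

cluster_chains = [
    "Border enforcement expanded, so families were separated.",
    "Asylum backlogs grew, so migrants waited in unsafe camps.",
    "Detention funding rose, so more migrants were held.",
]
cluster_roles = [
    {"immigrants": "victim", "law enforcement": "villain"},
    {"immigrants": "victim"},
    {"immigrants": "victim", "law enforcement": "villain"},
]
print(build_attribution_prompt(cluster_chains, cluster_roles, n_samples=2))
```

Keeping the role annotations next to each sampled chain gives the model the same hero/villain/victim evidence that separated the clusters in the first place.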

Key Experimental Results

Main Results (Structured Clustering vs. Standard k-means)

| Domain | Method | Frame F1 | Exact Match Purity | Avg. Role Purity |
|---|---|---|---|---|
| Immigration | k-means | 41.19 | 26.90 | 80.79 |
| Immigration | Structured Clustering | 42.32 | 32.79 | 81.48 |
| Gun Control | k-means | 37.65 | 29.22 | 81.18 |
| Gun Control | Structured Clustering | 41.68 | 36.66 | 82.90 |
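The purity columns can be illustrated with standard cluster purity. The paper's exact metric definitions are not reproduced here; this sketch assumes that "Exact Match Purity" means purity in which a chain's whole gold label (here a hypothetical set of frame names) must equal the cluster's majority label.

```python
from collections import Counter, defaultdict

def purity(cluster_ids, gold_labels):
    """Standard cluster purity: in each cluster count the chains that carry the
    cluster's most common gold label, sum over clusters, divide by all chains."""
    members = defaultdict(list)
    for c, g in zip(cluster_ids, gold_labels):
        members[c].append(g)
    hits = sum(Counter(gs).most_common(1)[0][1] for gs in members.values())
    return hits / len(gold_labels)

# Toy data: each gold label is a frozenset of frame names, so a chain only
# counts toward the majority if its whole frame set matches exactly.
clusters = [0, 0, 0, 1, 1]
gold = [
    frozenset({"security"}),
    frozenset({"security"}),
    frozenset({"security", "economic"}),
    frozenset({"morality"}),
    frozenset({"morality"}),
]
print(purity(clusters, gold))  # 0.8
```

The third chain shares the "security" frame with its cluster's majority but still counts as a miss, which is what makes an exact-match variant stricter than a partial-overlap one.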

Ablation Study (Top 25% of Chains Nearest to Centroids)

| Domain | Method | Frame F1 | Exact Match Purity |
|---|---|---|---|
| Immigration | k-means | 33.22 | 32.31 |
| Immigration | Structured | 36.96 | 37.83 |
| Gun Control | k-means | 32.86 | 35.16 |
| Gun Control | Structured | 36.45 | 42.71 |

Key Findings

  • Structured clustering consistently outperforms standard k-means on all metrics, with the most pronounced gains in Exact Match Purity (gun control domain: +7.44 pp).
  • Role constraints are critical for distinguishing fine-grained narrative differences: in the gun control domain, "police as heroes protecting the public" versus "police as threats violating rights" are successfully separated.
  • Verbalized causal chains are rated highly: 3.3/4 for immigration and 3.49/4 for gun control.
  • Role annotations are also rated highly: 4.73/5 for gun control and 4.0/5 for immigration.
  • Chains nearest to cluster centroids show higher Exact Match Purity than the full set (e.g., 42.71 vs. 36.66 for structured clustering on gun control), although Frame F1 is lower on this subset; this suggests that the clusters have a coherent core.

Highlights & Insights

  • Using character narrative functions (hero/villain/victim) as clustering constraints constitutes an elegant bridge between communication theory and computational methods, ensuring that clustering results reflect narrative structural coherence rather than mere topical similarity.
  • The pipeline design skillfully balances cost and quality: large models generate silver labels that are distilled into lightweight models for inference, sacrificing only 15% performance on causal relation prediction while substantially reducing cost.
  • Domain agnosticism is a key advantage: only minimal role group annotation at the cluster level is required, enabling extension to new domains without domain-specific taxonomies.

Limitations & Future Work

  • Evaluation is limited to two English policy domains (immigration and gun control); cross-lingual and broader domain generalization remains to be verified.
  • Causal relation prediction achieves an F1 of only 58.46, and errors may cascade to downstream clustering.
  • Role group identification still requires limited human annotation at the cluster level; fully unsupervised role discovery is a natural direction for improvement.
  • Temporal dynamics are not considered—how narrative patterns evolve over time (e.g., shifts before and after elections) remains unexplored.
  • The selection of the number of clusters \(k\) remains an open problem.

Comparison with Related Work

  • vs. Chambers & Jurafsky (2008): Classic narrative schema induction focuses solely on event sequences without considering the evaluative dimension of character roles. The present work incorporates narrative function annotation, more closely aligning with communication studies' understanding of narrative.
  • vs. Frame detection methods (Card et al., 2015): Frame detection relies on coarse-grained labels; this paper obtains finer-grained and more interpretable narrative patterns through structured clustering.
  • vs. LLM-based analysis: Direct LLM-based narrative analysis offers an end-to-end solution but lacks interpretability and structure. Each stage of the proposed pipeline is independently verifiable and interpretable.

Rating

  • Novelty: ⭐⭐⭐⭐ First to use character narrative functions as clustering constraints, with solid theoretical motivation.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Includes human evaluation and multiple metrics, though limited to two domains.
  • Writing Quality: ⭐⭐⭐⭐⭐ Motivation is clear, figures are intuitive, and the methodological pipeline is comprehensively described.
  • Value: ⭐⭐⭐⭐ Of practical value to computational communication studies and media analysis, with transferable methodology.