Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection¶

Conference: ACL 2025
arXiv: 2506.00488
Code: https://github.com/TSCenter/GLPN-LLM
Area: Social Computing
Keywords: Fake News Detection, Label Propagation, LLM Pseudo Labels, Multimodal Learning, Graph Neural Networks

TL;DR¶

This paper proposes GLPN-LLM, a framework that integrates LLM-generated pseudo labels through a mask-based global label propagation mechanism. It targets the limited gains obtained when LLM predictions are combined directly with a detector's outputs, and it outperforms the compared baselines on the Twitter, PHEME, and Weibo datasets.

Background & Motivation¶

Multimodal Fake News Detection¶

The spread of fake news on social media has become a severe societal issue, requiring multimodal detection methods that analyze both textual and visual features.

Dilemma of LLMs in Fake News Detection¶

A key observation: LLMs (such as GPT-4o) alone underperform traditional multimodal detection models in fake news detection. For instance, on the Twitter dataset, the F1-score of the LLM is only 78.20, whereas HMCAN achieves 82.57. Simply combining raw LLM predictions with existing model outputs also yields limited improvements.

Core Problem¶

How can the capabilities of LLMs be effectively integrated into fake news detection systems? Simple direct combination is insufficient, prompting the need for more sophisticated approaches.

Key Insight¶

Label propagation techniques remain effective even when the pseudo-label accuracy is mediocre (Sun et al., 2025), making them particularly suitable for integrating the imperfect pseudo labels generated by LLMs.

Method¶

Overall Architecture¶

GLPN-LLM consists of three core modules:

Multimodal Feature Extraction: CLIP is used to extract textual and visual features.
LLM-based Pseudo Label Generation: GPT-4o generates pseudo labels.
Global Label Propagation with Mask: Mask-based global label propagation is performed.

Key Designs¶

1. Multimodal Feature Extraction¶

The dual encoders of CLIP are used to extract visual features $v_i \in \mathbb{R}^{d_v}$ and textual features $t_i \in \mathbb{R}^{d_t}$ respectively, which are then concatenated to obtain a unified representation: $$x_i = t_i \oplus v_i$$

2. Graph Construction¶

Each news item acts as a node in the graph, with edges constructed based on five similarity metrics:
- Concatenated Feature Similarity: Cosine similarity of the concatenated text-image embeddings.
- Image-Text Cross Similarity (bidirectional, counted as two metrics)
- Image-Image Similarity
- Text-Text Similarity

An edge is established when any of the similarity scores exceeds the threshold $\theta = 0.95$, ensuring that only strongly related news items are connected.
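The edge rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes the text and image embeddings share one dimension (as CLIP's joint embedding space provides) so that the cross-modal similarities are well defined.

```python
import numpy as np

def l2_normalize(m):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return m / np.clip(np.linalg.norm(m, axis=1, keepdims=True), 1e-12, None)

def build_adjacency(text_feats, image_feats, theta=0.95):
    """Connect nodes i and j when any of the five similarities exceeds theta."""
    t = l2_normalize(text_feats)
    v = l2_normalize(image_feats)
    # x_i = t_i (+) v_i, then normalized for cosine similarity
    x = l2_normalize(np.concatenate([text_feats, image_feats], axis=1))
    sims = [
        x @ x.T,  # concatenated-feature similarity
        t @ v.T,  # text-to-image cross similarity
        v @ t.T,  # image-to-text cross similarity
        v @ v.T,  # image-image similarity
        t @ t.T,  # text-text similarity
    ]
    adj = np.zeros((len(t), len(t)), dtype=bool)
    for s in sims:
        adj |= s > theta
    adj |= adj.T                  # keep the graph undirected
    np.fill_diagonal(adj, False)  # self-loops are left to the GNN layer
    return adj

# Toy data: items 0 and 1 share nearly identical text, item 2 is unrelated.
text = np.array([[1.0, 0.0, 0.0, 0.0], [1.0, 0.01, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
image = np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.6, 0.8]])
adj = build_adjacency(text, image)
```

With these toy vectors only the text-text similarity of items 0 and 1 clears the threshold, so the graph contains a single edge between them.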

3. Mixed-Initiative Labeling (LLM Pseudo Label Generation)¶

A structured prompt is constructed and fed into the LLM:
- Input: [cls] <prompt> [SEP] <cleaned Twitter text>
- Output: [detection] ŷ [confidence] c

The LLM outputs two pieces of information:
1. Detection Label ŷ: true or fake
2. Confidence Score c: Prediction confidence

Confidence-based filtering: Only high-confidence pseudo labels are selected for use.
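Parsing the structured reply and applying the confidence filter might look like the sketch below. Several details are assumptions made for illustration: the confidence is taken to be a number in [0, 1], the cutoff `min_conf=0.9` is a hypothetical value (the summary does not state the threshold), and the encoding fake = 1, true = 0 is arbitrary.

```python
import re

# Matches replies of the form "[detection] fake [confidence] 0.97".
PATTERN = re.compile(
    r"\[detection\]\s*(true|fake)\s*\[confidence\]\s*([0-9]*\.?[0-9]+)",
    re.IGNORECASE,
)

def parse_llm_output(raw):
    """Return (label, confidence), or None when the reply does not match."""
    m = PATTERN.search(raw)
    if m is None:
        return None
    label = 1 if m.group(1).lower() == "fake" else 0  # assumed encoding
    return label, float(m.group(2))

def select_pseudo_labels(raw_outputs, min_conf=0.9):
    """Map node index -> pseudo label for replies with confidence >= min_conf."""
    selected = {}
    for idx, raw in enumerate(raw_outputs):
        parsed = parse_llm_output(raw)
        if parsed is not None and parsed[1] >= min_conf:
            selected[idx] = parsed[0]
    return selected

replies = [
    "[detection] fake [confidence] 0.97",
    "[detection] true [confidence] 0.55",
    "unparseable reply",
    "[detection] true [confidence] 0.92",
]
pseudo = select_pseudo_labels(replies)
```

Replies that are malformed or below the cutoff are simply dropped, which leaves those nodes with a zero label vector in the next step.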

4. Label Integration and Global Random Mask (GRM)¶

Label Integration: Each node's label information $y_i'$ (a one-hot ground-truth label, a one-hot high-confidence pseudo label, or a zero vector) is concatenated with its node features:

$$x_i' = x_i \oplus y_i'$$

Three cases are considered:
- Labeled training nodes: Ground-truth labels are used.
- Unlabeled nodes with high confidence: LLM pseudo labels are used.
- Others: Zero vectors are used.
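The three cases can be illustrated with a small NumPy sketch. The function name and the dictionary-based inputs are hypothetical, and letting ground-truth labels take precedence over pseudo labels when both exist is an assumption of this sketch.

```python
import numpy as np

def integrate_labels(x, train_labels, pseudo_labels, num_classes=2):
    """Build x' = x (+) y', where y' is one-hot or all zeros.

    train_labels / pseudo_labels map node index -> class id.
    """
    n = x.shape[0]
    y = np.zeros((n, num_classes), dtype=x.dtype)  # default: zero vector
    for idx, label in pseudo_labels.items():       # high-confidence LLM labels
        y[idx, label] = 1.0
    for idx, label in train_labels.items():        # ground truth overrides
        y[idx] = 0.0
        y[idx, label] = 1.0
    return np.concatenate([x, y], axis=1), y

x = np.ones((4, 3))
x_prime, y_prime = integrate_labels(x, train_labels={0: 1}, pseudo_labels={2: 0})
```

Here node 0 carries a ground-truth label, node 2 a pseudo label, and nodes 1 and 3 receive zero vectors, so `x_prime` has the original three feature columns plus two label columns.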

Global Random Mask (GRM) Mechanism (Core Innovation):

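During training, $\rho \times N$ nodes are randomly selected according to a mask ratio $\rho$ (default 0.3), and their label embeddings are replaced with zero vectors: $$y_i' = \tilde{y}_i \cdot m_i, \quad m_i \in \{0, 1\}$$ Here $\tilde{y}_i$ is the label vector of node $i$ before masking, with $m_i = 0$ for the selected nodes and $m_i = 1$ otherwise.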

Why is GRM necessary? To prevent label leakage. If a node's label is included in its input feature, the model might directly exploit this information for prediction without truly learning the graph structure and content features. GRM ensures that masked nodes can only obtain label information via label propagation from their neighbors.

During training, the loss is computed only on the masked nodes; during inference, all available label information is used.
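A minimal sketch of the masking step and the masked loss follows. The function names are hypothetical, and the sketch assumes that a target label is available for every masked node; in practice the loss would have to be restricted to masked nodes that actually have a label.

```python
import numpy as np

def global_random_mask(y_tilde, rho=0.3, rng=None):
    """Zero the label vectors of round(rho * N) randomly chosen nodes."""
    rng = np.random.default_rng() if rng is None else rng
    n = y_tilde.shape[0]
    masked_idx = rng.choice(n, size=int(round(rho * n)), replace=False)
    m = np.ones(n, dtype=y_tilde.dtype)
    m[masked_idx] = 0                        # m_i = 0 for masked nodes
    return y_tilde * m[:, None], masked_idx  # y'_i = y~_i * m_i

def masked_cross_entropy(probs, targets, masked_idx):
    """Mean negative log-likelihood over the masked nodes only."""
    picked = probs[masked_idx, targets[masked_idx]]
    return float(-np.mean(np.log(np.clip(picked, 1e-12, None))))

y_tilde = np.eye(2)[[0, 1] * 5]              # 10 nodes with one-hot labels
y_masked, masked_idx = global_random_mask(y_tilde, rho=0.3, rng=np.random.default_rng(0))
loss = masked_cross_entropy(np.full((10, 2), 0.5), np.array([0, 1] * 5), masked_idx)
```

Calling `global_random_mask` with a fresh draw in every epoch gives each node a chance to be predicted from its neighbors' labels rather than its own.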

5. GCN Classification¶

The node features containing label information $x_i'$ are fed into a GCN for label propagation and classification, using cross-entropy loss and the Adam optimizer.
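For intuition, a GCN forward pass over the label-augmented features can be written directly in NumPy. The two-layer depth, the ReLU activation, and the weight names `w1` and `w2` are assumptions of this sketch (the summary does not specify the architecture); the propagation matrix is the common symmetric normalization with self-loops.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_hat = adj.astype(float) + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(adj, x_prime, w1, w2):
    """Two-layer GCN that returns class probabilities for every node."""
    a_norm = normalize_adjacency(adj)
    h = np.maximum(a_norm @ x_prime @ w1, 0.0)    # layer 1 + ReLU
    logits = a_norm @ h @ w2                      # layer 2
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=bool)
x_prime = rng.normal(size=(3, 5))                 # features plus label columns
probs = gcn_forward(adj, x_prime, rng.normal(size=(5, 4)), rng.normal(size=(4, 2)))
```

Because each layer multiplies by the normalized adjacency, the label columns of a node's neighbors flow into its representation, which is how a masked node can still benefit from nearby labels.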

Loss & Training¶

Cross-entropy loss is applied, calculated only on masked nodes.
A different subset of masked nodes is randomly selected in each epoch.
Adam optimizer is utilized.
No masking is applied during inference, utilizing full label information.

Key Experimental Results¶

Main Results¶

F1-score results on three benchmark datasets:

Method	Twitter F1	PHEME F1	Weibo F1
LLM (GPT-4o)	78.20	76.87	81.75
HMCAN	82.57	83.49	87.20
FCN-LP (CLIP)	85.24	87.97	89.78
FCN-LP (CLIP) + LLM	85.97	89.21	89.85
GLPN-LLM (CLIP)	89.03	90.66	91.52

GLPN-LLM (HMCAN) results:

Method	Twitter F1	PHEME F1	Weibo F1
FCN-LP (HMCAN)	84.04	84.50	88.11
FCN-LP (HMCAN) + LLM	85.10	84.60	88.75
GLPN-LLM (HMCAN)	86.86	86.87	91.46

Ablation Study (taking CLIP as an example):

Method	Twitter F1	PHEME F1	Weibo F1
FCN-LP	85.24	87.97	89.78
GLPN (w/o LLM)	86.30	86.96	90.76
GLPN-LLM	89.03	90.66	91.52

Key Findings¶

LLMs alone perform significantly worse than traditional methods: GPT-4o's F1 score on Twitter is only 78.20, whereas HMCAN reaches 82.57.
Simple combinations with LLMs yield limited improvements: FCN-LP + LLM gains at most 1.24 F1 points over FCN-LP (PHEME, CLIP features) and as little as 0.07 (Weibo, CLIP features).
GLPN-LLM achieves substantial improvements: Compared to FCN-LP (CLIP) + LLM, it gains 3.06 F1 points on Twitter, 1.45 on PHEME, and 1.67 on Weibo.
The GRM module is key: With CLIP features, GLPN (with GRM but without LLM) already improves over FCN-LP on Twitter (+1.06) and Weibo (+0.98), although it trails FCN-LP on PHEME (86.96 vs. 87.97). The global label propagation mechanism is therefore effective on its own on two of the three datasets.
LLM pseudo labels generate value through propagation: GLPN-LLM improves upon GLPN by 2.73 F1 points on Twitter, 3.70 on PHEME, and 0.76 on Weibo, indicating that LLM pseudo labels are effectively utilized under the propagation mechanism.
Impact of mask rate: Label propagation provides information via neighbor labels rather than the node's own label; a mask rate that is too low leads to label leakage, while one that is too high leads to insufficient information.
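The gaps discussed in these findings can be recomputed from the CLIP rows of the tables above. The snippet only restates reported F1 scores and subtracts them; the helper name `gain` is illustrative.

```python
# F1 scores copied from the result tables (CLIP features).
F1 = {
    "FCN-LP": (85.24, 87.97, 89.78),
    "FCN-LP + LLM": (85.97, 89.21, 89.85),
    "GLPN (w/o LLM)": (86.30, 86.96, 90.76),
    "GLPN-LLM": (89.03, 90.66, 91.52),
}
DATASETS = ("Twitter", "PHEME", "Weibo")

def gain(method, baseline):
    """F1-point difference of method over baseline, per dataset."""
    return {
        d: round(a - b, 2)
        for d, a, b in zip(DATASETS, F1[method], F1[baseline])
    }

for method, baseline in [
    ("FCN-LP + LLM", "FCN-LP"),
    ("GLPN (w/o LLM)", "FCN-LP"),
    ("GLPN-LLM", "GLPN (w/o LLM)"),
    ("GLPN-LLM", "FCN-LP + LLM"),
]:
    print(f"{method} vs {baseline}: {gain(method, baseline)}")
```

Differences are in absolute F1 points, not relative percentages.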

Highlights & Insights¶

Precise Problem Identification: The study clearly reveals the dilemma where LLMs are "helpful but difficult to utilize directly" in fake news detection, and proposes an elegant solution.
Clever Application of Label Propagation: The robustness of LP against noisy labels fits the imperfect nature of LLM pseudo labels well.
Clear Design Intuition of GRM: Analogous to masked language modeling, masking the labels forces the model to learn from the context (neighbors).
Multi-Similarity Fusion in Cross-Modal Graph Construction: Five similarity metrics help capture rich cross-modal relationships.
Experimental Validation of an Important Intuition: The value of LLMs lies not in the accuracy of single-point predictions, but in amplifying their overall judgment capability through the propagation mechanism.

Limitations & Future Work¶

Use of Textual Information Only by the LLM: The current GPT-4o prompt contains only cleaned tweet text and does not utilize image information (this could be adapted to use GPT-4V for multimodal analysis).
High Graph Construction Threshold ($\theta = 0.95$): This may omit some valuable weaker correlations between news items.
Validation Limited to English and Chinese Datasets: Performance in other linguistic contexts remains unexplored.
LLM API Costs: Calling GPT-4o for each sample incurs high costs for large-scale deployment.
Simplistic GCN Layers and Architecture: More advanced GNN architectures (e.g., GAT, GraphSAGE) could be explored.
Unexplored Robustness against Adversarial Examples: The detection capability against elaborately designed adversarial fake news remains unknown.

Related Work & Insights¶

The Revival of Label Propagation: Traditional semi-supervised methods (Zhu & Ghahramani, 2002) are revitalized in the GNN era, opening a new direction when combined with LLM pseudo labels.
FCN-LP (Zhao et al., 2023): As direct prior work, the global mask mechanism proposed in this paper represents a significant improvement over it.
Indirect Utilization Paradigm of LLM Capabilities: Instead of using LLMs directly for prediction, their outputs are integrated as auxiliary signals into traditional models, a paradigm that can be generalized to other tasks.
Comparison with the LACA Approach: While LACA uses LLMs for data generation, GLPN-LLM uses LLMs to generate pseudo labels and exploits them through graph propagation. Both tackle the issue of how to utilize LLMs when their standalone performance is suboptimal.

Rating¶

Dimension	Score (1-10)	Explanation
Novelty	7	The GRM mechanism is novel, but the overall framework is a combination of existing components.
Experimental Thoroughness	8	Three datasets, comprehensive ablation, and parameter sensitivity analyses.
Writing Quality	7	Clearly structured but some formula derivations are relatively cumbersome.
Value	7	Relies heavily on the LLM API, resulting in high deployment costs.
Overall Score	7	A practical framework for effectively leveraging LLM pseudo labels, with the core contribution in the GRM mechanism.