
Probabilistic Concept Graph Reasoning for Multimodal Misinformation Detection

Conference: CVPR 2026 arXiv: 2603.25203 Code: https://github.com/2302Jerry/pcgr Area: Robotics Keywords: Multimodal Misinformation Detection, Concept Graph Reasoning, Probabilistic Inference, Explainable AI, Automatic Concept Growth

TL;DR

This paper reformulates Multimodal Misinformation Detection (MMD) as a structured probabilistic reasoning problem over concept graphs. The proposed PCGR framework employs MLLMs to automatically discover and validate human-interpretable concept nodes, constructs a hierarchical probabilistic concept graph, and achieves interpretable misinformation detection, outperforming 13 baselines across three benchmarks.

Background & Motivation

  1. Background: Multimodal misinformation (image-text fake news/rumors) is increasingly prevalent. Existing detection methods fall into two categories: (1) end-to-end black-box models that fuse visual and textual features for direct classification—performant but opaque; (2) mechanism-driven models based on manipulation types or retrieved evidence—more transparent but relying on fixed concept sets that struggle to adapt to novel manipulation strategies.
  2. Limitations of Prior Work: Black-box models cannot explain their decision process, undermining trustworthiness. Existing interpretable methods either depend on fixed, manually defined concept sets (poor generalization) or produce only post-hoc explanations that are decoupled from the actual inference process.
  3. Key Challenge: Human fact-checkers assess veracity through structured reasoning—decomposing claims, verifying each component, and synthesizing a final judgment—yet existing models lack such an auditable reasoning process.
  4. Goal: (a) How can the concept set expand automatically to accommodate novel manipulation strategies? (b) How can probabilistic reasoning be embedded into the model architecture rather than applied as post-processing? (c) How can the model support both coarse-grained (true/false) and fine-grained (manipulation type) detection simultaneously?
  5. Key Insight: Inspired by the human fact-checking process, MMD is modeled as a pipeline of concept-level assessment → hierarchical reasoning → aggregated verdict, with each concept represented by soft probabilities rather than hard decisions.
  6. Core Idea: Construct an automatically growing hierarchical probabilistic concept graph that embeds reasoning directly into the model architecture, making every intermediate concept state auditable.

Method

Overall Architecture

PCGR follows a build-then-infer paradigm: (1) Concept Growth—MLLMs automatically discover and validate new concepts to form a hierarchical directed acyclic graph (DAG); (2) Probabilistic Encoding—each image-text instance is encoded into the concept space, with an activation probability computed for each concept; (3) Hierarchical Inference—soft reasoning is performed over the concept graph via top-down hierarchical attention, aggregating uncertainty to produce the final verdict.

Key Designs

  1. Automatic Concept Growth (ACG):

    • Function: Continuously discovers and integrates new reasoning concepts to adapt to evolving manipulation strategies.
    • Mechanism: An "error log" of high-loss samples is maintained. Each round, representative seed pairs are selected via k-means clustering and fed to an MLLM (e.g., GPT-5/Qwen3-omni) acting as an "expert fact-checker." The MLLM analyzes why a sample is misleading, distills reusable diagnostic patterns, and generates concise interrogative concepts (e.g., "Does the text exaggerate the event?"). Candidate concepts undergo triple filtering: (1) semantic uniqueness (cosine similarity ≤ 0.8 with existing concepts); (2) statistical independence (Pearson correlation ≤ 0.9); (3) informative activation (expected probability in \([0.05, 0.95]\)). At most 5 new concepts are added per round over at most 6 rounds.
    • Design Motivation: A fixed concept set cannot handle continuously evolving misinformation tactics; the model must autonomously acquire new judgment dimensions.
  2. Probabilistic Concept Graph Construction and Soft Inference:

    • Function: Models inter-concept dependencies as structured probabilistic reasoning.
    • Mechanism: Image-text pairs reside at the base layer \(\mathcal{L}_0\); higher layers grow bottom-up. Edge construction integrates three signals: semantic dependency (cosine similarity), statistical dependency (soft PMI: \(\log \frac{\bar{p}_{ij}}{\bar{p}_i \bar{p}_j}\)), and logical dependency (entailment/contradiction scores from an NLI model), with edge weight \(s_{ij} = \alpha\cos(h_i,h_j) + \beta\,\text{Soft-PMI} + \gamma r_{ij}^{ent} - \delta r_{ij}^{contr}\); edges are added only when \(s_{ij} > \zeta = 0.55\). Inference proceeds top-down, with high-level abstract hypotheses supplying priors for low-level details. Final aggregation mixes a concept's own probability with a multiplicative term over its parents (approximating logical AND): \(\hat{p}_i = \lambda p_i + (1-\lambda) \prod_{j \in \mathrm{Pa}(i)} (\alpha_{ij} p_j)\).
    • Design Motivation: Misinformation verdicts require multiple consistency cues to hold simultaneously (logical AND semantics). Multiplicative aggregation is more faithful to this property than additive or voting schemes, and yields better calibration and robustness.
  3. Concept Probability Estimation and Bipolar Prototype Encoding:

    • Function: Produces soft probability estimates for each concept.
    • Mechanism: For each concept \(c_k\), CLIP extracts visual and textual embeddings \(v_k, t_k\), and Sentence-BERT extracts the concept-description embedding \(d_k\). Each concept is represented by positive/negative bipolar prototypes \(h_k^+, h_k^-\) denoting its activated/deactivated states, combined as \(h_k = \tau_k h_k^+ + (1-\tau_k) h_k^-\). Probabilities are computed via a low-rank bilinear interaction, \(\ell_k = h_k \oplus \big(\mu_k^\top U \,\text{diag}(\phi(e_k))\, V^\top \nu_k\big)\) (where \(\oplus\) denotes concatenation), followed by a linear head: \(p_k = \sigma(w_k^\top \ell_k + b_k)\).
    • Design Motivation: Bipolar prototypes explicitly model the uncertainty that "absence of evidence is not evidence of absence," yielding more reliable probability estimates.
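The triple filter in the concept-growth step (Design 1) can be sketched as a small check. This is a toy version, assuming numpy arrays for embeddings and activation probabilities; the function name and array shapes are illustrative, while the thresholds come from the text:

```python
import numpy as np

def passes_filters(cand_emb, cand_acts, concept_embs, concept_acts,
                   sim_max=0.8, corr_max=0.9, p_lo=0.05, p_hi=0.95):
    """Return True if a candidate concept survives the triple filter.

    cand_emb:      (d,) embedding of the candidate concept description
    cand_acts:     (n,) candidate activation probabilities on n samples
    concept_embs:  (m, d) embeddings of existing concepts
    concept_acts:  (m, n) activations of existing concepts
    """
    # (1) Semantic uniqueness: cosine similarity <= 0.8 with every existing concept.
    sims = concept_embs @ cand_emb / (
        np.linalg.norm(concept_embs, axis=1) * np.linalg.norm(cand_emb) + 1e-8)
    if sims.size and sims.max() > sim_max:
        return False
    # (2) Statistical independence: |Pearson correlation| <= 0.9.
    for acts in concept_acts:
        r = np.corrcoef(cand_acts, acts)[0, 1]
        if abs(r) > corr_max:
            return False
    # (3) Informative activation: expected probability in [0.05, 0.95].
    return p_lo <= cand_acts.mean() <= p_hi
```

The third check discards degenerate concepts that are almost always on or almost always off, since these carry little diagnostic signal.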
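Design 2's edge scoring and aggregation can be rendered as a toy computation, assuming the semantic term enters with a positive sign and reading the final aggregation as a convex mix of a node's own probability with a soft-AND product over its parents; all weight values are placeholders:

```python
import numpy as np

def edge_score(cos_ij, soft_pmi, ent_ij, contr_ij,
               alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    # s_ij combines semantic, statistical (soft PMI), and logical (NLI)
    # signals; an edge is added only when s_ij exceeds zeta = 0.55.
    return alpha * cos_ij + beta * soft_pmi + gamma * ent_ij - delta * contr_ij

def aggregate(p_i, parent_probs, attn, lam=0.5):
    # Soft logical AND over parents: any parent probability near zero
    # pulls the product, and hence the aggregated verdict, toward zero.
    prod = np.prod([a * p for a, p in zip(attn, parent_probs)])
    return lam * p_i + (1 - lam) * prod
```

Because the parent term is a product, a single strongly contradicted hypothesis suppresses the whole branch, which is the logical-AND behavior the paper argues multiplicative aggregation captures better than additive or voting schemes.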
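The bipolar encoding in Design 3 can be made concrete with a toy computation; treating \(\oplus\) as concatenation, \(\phi\) as tanh, and using a sigmoid readout are my assumptions, and all dimensions are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def concept_probability(h_pos, h_neg, tau, mu, nu, e_k, U, V, w, b):
    # h_k interpolates the activated (+) and deactivated (-) prototypes.
    h_k = tau * h_pos + (1.0 - tau) * h_neg
    # Low-rank bilinear interaction mu^T U diag(phi(e_k)) V^T nu (a scalar),
    # gated by the concept-description embedding e_k.
    gate = np.tanh(e_k)
    inter = (mu @ U) @ (gate * (V.T @ nu))
    # Concatenate the prototype state with the interaction, then map
    # through a linear head squashed to a probability.
    ell = np.concatenate([h_k, [inter]])
    return sigmoid(w @ ell + b)
```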

Loss & Training

The total loss is \(L = (1-\eta) L_{veracity} + \eta L_{ortho}\), where \(L_{veracity}\) is the binary cross-entropy detection loss and \(L_{ortho} = \sum_{i \neq j} \frac{q_i^\top q_j}{\|q_i\| \|q_j\|}\) is a concept orthogonality regularizer (the sum of pairwise cosine similarities between concept vectors). Training alternates between the concept generation module and the detection module. When fine-grained labels are available (e.g., text manipulation / visual manipulation / cross-modal inconsistency), they serve as anchor concepts at \(\mathcal{L}_0\) and receive additional supervision.
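The objective can be sketched numerically; using plain cosine similarity for the orthogonality term and treating \(\eta\) and the concept vectors \(q_i\) as placeholders are my assumptions:

```python
import numpy as np

def bce(p, y, eps=1e-8):
    # L_veracity: binary cross-entropy over predicted probabilities p.
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def ortho_loss(Q):
    # L_ortho: sum of pairwise cosine similarities between concept
    # vectors q_i, q_j (i != j); zero when concepts are mutually orthogonal.
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    G = Qn @ Qn.T
    return G.sum() - np.trace(G)

def total_loss(p, y, Q, eta=0.1):
    # L = (1 - eta) * L_veracity + eta * L_ortho
    return (1 - eta) * bce(p, y) + eta * ortho_loss(Q)
```

The regularizer pushes concept representations apart, complementing the semantic-uniqueness filter applied when new concepts are admitted.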

Key Experimental Results

Main Results (Coarse-Grained Detection)

Method                    | MiRAGeNews Acc | MiRAGeNews F1 | MMFakeBench Acc | MMFakeBench F1 | AMG Acc | AMG F1
GPT-5                     | 56.8           | 54.0          | 58.8            | 57.2           | 59.9    | 57.9
MGCA (strongest baseline) | 72.3           | 66.6          | 74.1            | 71.3           | 78.2    | 76.8
PCGR                      | 80.2           | 70.9          | 80.6            | 73.5           | 84.3    | 79.8

Ablation Study (AMG Dataset)

Configuration | Description                                            | Performance Drop
w/o acg       | Remove automatic concept growth                        | Mic-F1 and Mac-F1 drop by ~12.9% and ~12.5% (largest degradation)
w/o dag       | Replace hierarchical DAG with flat structure           | Significant drop
w/o hat       | Replace hierarchical attention with standard attention | Significant drop
w/o ma        | Replace multiplicative aggregation with voting         | Notable drop
w/o alt       | Remove alternating training                            | Notable drop
w/o warm      | Remove warm-up phase                                   | Moderate drop
w/o cf        | Remove concept filtering                               | Moderate drop

Key Findings

  • Surpassing GPT-5: PCGR substantially outperforms GPT-5 across all datasets (e.g., 80.2% vs. 56.8% on MiRAGeNews), demonstrating that a specialized detector with fewer parameters can exceed a general-purpose MLLM through explicit reasoning architecture.
  • OOD Robustness: PCGR remains stable on MiRAGeNews (whose test set contains unseen image generators and publishers), whereas most baselines suffer significant degradation.
  • ACG Contributes Most: Removing ACG causes the largest performance drop (~12.9%), confirming that continuously discovering new concepts is critical for adapting to novel manipulation strategies.
  • Fine-Grained Detection: PCGR achieves the best Mic-F1 in both 4-class (MMFakeBench) and 6-class (AMG) fine-grained detection tasks (68.6% and 75.6%, respectively), showing that the concept graph simultaneously supports coarse- and fine-grained tasks.

Highlights & Insights

  • Reasoning as Architecture: PCGR embeds the reasoning process directly into the model architecture rather than relying on external prompting or post-hoc explanations. This makes the reasoning process auditable and open to intervention—users can inspect the probability of each concept node to understand why the model reaches a particular verdict.
  • Elegant Design of Automatic Concept Growth: The pipeline of MLLM generation → triple filtering → validation enables continuous evolution of the concept set, avoiding the high cost of manual concept annotation while ensuring quality through filtering.
  • Rationale for Multiplicative Aggregation: Using a multiplicative form to approximate "logical AND" for aggregating concept probabilities is semantically well-motivated—misinformation verdicts require multiple independent cues to hold simultaneously, so any strong negative signal should "pull down" the final score.

Limitations & Future Work

  • Concept growth depends on the capabilities of the underlying MLLM (e.g., GPT-5); if the MLLM itself is insensitive to a novel manipulation strategy, it may fail to generate effective concepts.
  • Growth in the number of concepts may increase inference overhead, necessitating periodic pruning of inactive concepts.
  • Validation is limited to image-text pairs; temporal reasoning for video misinformation is not addressed.
  • The paper's classification under the robotics area appears inaccurate; it more appropriately belongs to the multimodal / trustworthy AI domain.
Comparison with Related Work

  • vs. Concept Bottleneck Models (CBMs): CBMs rely on a fixed concept set and a flat concept space, which limits scalability to complex reasoning tasks. PCGR addresses both limitations through automatic concept growth and a hierarchical DAG.
  • vs. Graph-of-Thought (GoT): GoT realizes graph-structured reasoning in LLMs via prompting but relies on external prompts. PCGR directly embeds the probabilistic concept graph into model parameters, requiring no external prompting.
  • vs. HAMMER/MGCA: HAMMER and MGCA are the strongest existing MMD-specific models, yet still rely on end-to-end feature fusion. PCGR introduces an explicit concept layer that provides additional reasoning structure.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Reformulating MMD as probabilistic concept graph reasoning is a highly original framework design; the automatic concept growth mechanism is equally novel.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Three datasets, comparison against 13 baselines, detailed ablation and case studies, though inference efficiency analysis is lacking.
  • Writing Quality: ⭐⭐⭐⭐ The framework is described clearly with high-quality figures, though the method section is notation-heavy.
  • Value: ⭐⭐⭐⭐ Practically valuable for trustworthy AI and misinformation detection; interpretability is a strong selling point.