RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA

Conference: AAAI 2026 arXiv: 2512.15219 Code: N/A Area: Graph Learning Keywords: Knowledge Graph Question Answering, Chain-of-Thought, Relation-Aware, Adaptive Hop-count, Few-Shot Guidance

TL;DR

This paper proposes RFKG-CoT, which enhances LLM reasoning over knowledge graphs via two components: relation-driven adaptive hop-count selection (dynamically adjusting reasoning steps using KG relation activation masks) and few-shot path guidance (in-context examples in a Question-Paths-Answer format). Evaluated on 4 KGQA benchmarks, the method achieves significant improvements — GPT-4 reaches 91.5% (+6.6pp) on WebQSP, and Llama2-7B gains up to +14.7pp.

Background & Motivation

Background: Methods such as KG-CoT integrate knowledge graph paths into LLM reasoning to mitigate hallucination, but bottlenecks remain in both path selection and path utilization.

Limitations of Prior Work:

  • Rigid hop-count selection: Existing methods select hop counts based solely on question features, ignoring KG relational structure. For example, "Who is Justin Bieber's brother?" requires only 1 hop via a direct "brother" relation, but an indirect "father-son" chain requires multiple hops.
  • Insufficient path utilization: KG paths are concatenated directly into LLM prompts without guidance on how to interpret and use them.

Key Challenge: The quality of KG paths depends on hop-count selection, which should be jointly determined by both the question and the KG relational structure rather than fixed uniformly.

Key Insight: Use relation activation masks to capture KG relational semantics for dynamic hop-count selection; use few-shot "Think" templates to teach LLMs how to extract answers from paths.

Core Idea: Relation masks make hop-count selection structure-aware, while few-shot path guidance teaches LLMs how to leverage the retrieved paths.

Method

Overall Architecture

Initialize topic entities → compute relation scores via MLP → dynamically select hop counts using relation activation masks → generate KG paths → submit to LLM reasoning with few-shot guidance (Question-Paths-Think-Answer template).
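
The pipeline above can be sketched end to end on a toy KG. The entity names, triple format, and path serializer below are illustrative assumptions, not the paper's actual data or API; the hop count would come from the relation-driven selector rather than being passed in by hand.

```python
from collections import deque

# Toy KG as (head, relation, tail) triples -- illustrative only.
KG = [
    ("JustinBieber", "brother", "JaxonBieber"),
    ("JustinBieber", "father", "JeremyBieber"),
    ("JeremyBieber", "son", "JaxonBieber"),
]

def neighbors(entity):
    """Outgoing (relation, tail) edges for an entity."""
    return [(r, t) for h, r, t in KG if h == entity]

def extract_paths(topic_entity, max_hops):
    """Breadth-first expansion up to max_hops, returning serialized
    Entity -> Relation -> Entity paths for the LLM prompt."""
    paths = []
    queue = deque([(topic_entity, [topic_entity])])
    for _ in range(max_hops):
        next_queue = deque()
        while queue:
            node, path = queue.popleft()
            for rel, tail in neighbors(node):
                new_path = path + [rel, tail]
                paths.append(" -> ".join(new_path))
                next_queue.append((tail, new_path))
        queue = next_queue
    return paths
```

With 1 hop the direct "brother" path already reaches the answer; with 2 hops the indirect father-son chain also appears, which is exactly the ambiguity the adaptive hop-count selector resolves.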

Key Designs

  1. Relation-Driven Adaptive Hop-count Selector:

     • Function: Dynamically selects the number of reasoning steps based on KG relation activation patterns rather than question features alone.
     • Mechanism: Records which relations are activated at each reasoning step via a relation activation mask, and uses this to determine whether additional hops are needed. Selects 1 hop when a direct relation exists; automatically increases the hop count for indirect chains.
     • Design Motivation: The same question may require different hop counts under different KG topologies; relation masks capture this structural information.

  2. Few-Shot Path Guidance:

     • Function: Uses structured in-context examples to teach LLMs how to interpret and utilize KG paths.
     • Mechanism: Each example contains a query, serialized KG paths (Entity→Relation→Entity), a symbolic "Think" template mapping path elements to answer constraints, and an explicit answer format. The optimal number of examples is \(E=3\).
     • Design Motivation: LLMs receiving raw KG paths lack guidance on how to translate path information into reasoning steps; the "Think" template serves as a bridge.
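
The hop-count selection logic can be sketched as follows. The threshold, the per-hop score dictionaries, and the stopping rule are assumptions for illustration; in the paper the relation scores are learned by an MLP rather than hand-set as here.

```python
# Sketch of relation-driven adaptive hop-count selection.
# relation_scores_per_hop: one dict of {relation: score} per candidate hop.

def select_hop_count(relation_scores_per_hop, threshold=0.5, max_hops=3):
    """Walk hop by hop; at each hop build a binary activation mask over
    relations and stop as soon as some relation is activated."""
    for hop, scores in enumerate(relation_scores_per_hop, start=1):
        mask = {rel: score >= threshold for rel, score in scores.items()}
        if any(mask.values()):   # a relevant relation fires at this hop
            return hop
    return max_hops              # fall back to the maximum hop budget

# A direct "brother" relation activates at hop 1 -> select 1 hop.
direct = [{"brother": 0.9, "father": 0.2}]
# No relation activates at hop 1; the father->son chain fires at hop 2.
indirect = [{"father": 0.4}, {"son": 0.8}]
```

This mirrors the design motivation above: the same question yields different hop counts depending on which relations the KG actually exposes.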

Loss & Training

  • The graph reasoning module learns relation scores via an MLP, optimized on the training set.
  • LLM inference is performed in a zero-shot/few-shot manner without fine-tuning.
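
The few-shot prompt used at inference can be sketched as below. Only the four-field Question-Paths-Think-Answer structure and the \(E=3\) example count follow the paper; the exact wording, helper names, and path serialization are assumptions.

```python
# Sketch of the Question-Paths-Think-Answer few-shot prompt format.

def format_example(question, paths, think, answer):
    """One worked in-context example with all four fields filled in."""
    return (
        f"Question: {question}\n"
        f"Paths: {'; '.join(paths)}\n"
        f"Think: {think}\n"
        f"Answer: {answer}\n"
    )

def build_prompt(examples, question, paths, num_shots=3):
    """Concatenate num_shots worked examples, then the test question with
    its retrieved paths, leaving Think/Answer for the LLM to complete."""
    shots = "".join(format_example(*ex) for ex in examples[:num_shots])
    query = (
        f"Question: {question}\n"
        f"Paths: {'; '.join(paths)}\n"
        f"Think:"
    )
    return shots + query

example = (
    "Who is Justin Bieber's brother?",
    ["JustinBieber -> brother -> JaxonBieber"],
    "The path links the topic entity to the answer via the 'brother' relation.",
    "JaxonBieber",
)
prompt = build_prompt([example] * 3,
                      "Who is Jeremy Bieber's son?",
                      ["JeremyBieber -> son -> JaxonBieber"])
```

The "Think" line in each example is the symbolic template described above: it spells out how the path elements constrain the answer, so the model imitates that mapping on the test question.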

Key Experimental Results

Main Results

Dataset LLM RFKG-CoT KG-CoT Gain
WebQSP GPT-4 91.5% 84.9% +6.6pp
CompWebQ GPT-4 65.1% 62.3% +2.8pp
WebQuestions GPT-4 78.2% 68.0% +10.2pp
WebQSP ChatGPT 89.9% 82.1% +7.8pp
WebQSP Llama2-7B 87.1% 72.4% +14.7pp

Ablation Study

Component WebQSP (ChatGPT) CompWebQ Notes
KG-CoT baseline 82.1% 51.6% No improvement
+ Relation mask 85.5% 59.8% +3.4/+8.2pp
+ Few-shot guidance 87.7% 57.8% +5.6/+6.2pp
Full RFKG-CoT 89.9% 61.4% +7.8/+9.8pp

Key Findings

  • Complementary components: The relation mask improves path quality and selection, while few-shot guidance improves path utilization; combining them yields the largest gains on both datasets, outperforming either component alone.
  • Smaller models benefit more: Llama2-7B gains +14.7pp vs. +6.6pp for GPT-4, as smaller models have less parametric knowledge and rely more heavily on external paths.
  • Non-monotonic effect of few-shot count: \(E=3\) is optimal; \(E=5\) degrades performance, likely because longer prompts dilute the model's attention over the retrieved paths.

Highlights & Insights

  • The relation activation mask is an elegant design — encoding KG topological information as a binary mask to guide hop-count decisions offers greater flexibility than question classifiers.
  • Inverse scaling finding: Smaller models benefit disproportionately more, suggesting that KG path guidance is most valuable when compensating for limited parametric knowledge.

Limitations & Future Work

  • The method has not been evaluated on state-of-the-art reasoning models (e.g., o1, DeepSeek-R1).
  • Learning of relation masks depends on the coverage of relation types in the training data.
  • The selection strategy for few-shot examples could be further optimized.
Comparison with Related Methods

  • vs. KG-CoT: Improvements are made at both critical stages: hop-count selection and path utilization.
  • vs. ToG: ToG performs dynamic navigation over KGs but provides no path guidance; RFKG-CoT offers a more structured reasoning framework.

Rating

  • Novelty: ⭐⭐⭐⭐ The combination of relation masks and few-shot path guidance constitutes effective incremental innovation.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Covers 4 datasets, 3 LLMs, detailed ablations, and hyperparameter analysis.
  • Writing Quality: ⭐⭐⭐⭐ Method motivation and design logic are clearly articulated.
  • Value: ⭐⭐⭐⭐ Offers practical improvements for KGQA; the substantial gains on smaller models have notable application value.