ACL 2025 Graph Learning Knowledge Graph Question Answering Retrieval-Augmented Generation Graph Semantic Distance Subgraph Isomorphism Retrieval Fact Verification

SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation¶

Conference: ACL 2025
arXiv: 2412.15272
Code: https://github.com/YZ-Cai/SimGRAG
Area: Graph Learning
Keywords: Knowledge Graph Question Answering, Retrieval-Augmented Generation, Graph Semantic Distance, Subgraph Isomorphism Retrieval, Fact Verification

TL;DR¶

This paper proposes SimGRAG, an approach that employs a two-stage "query → pattern graph → subgraph" alignment strategy. By leveraging LLMs to convert queries into graph patterns and utilizing Graph Semantic Distance (GSD) to efficiently retrieve the most semantically similar subgraphs in KGs, SimGRAG achieves plug-and-play KG-driven RAG, outperforming all existing methods on question-answering and fact verification tasks.

Background & Motivation¶

Background: KG-driven RAG is an important direction for addressing LLM hallucinations and out-of-date knowledge. The core challenge lies in effectively aligning natural language queries with the structured knowledge of knowledge graphs.

Limitations of Prior Work: - KAPING: Directly retrieves isolated triples using query embeddings; for multi-hop queries, query embeddings capture too much information, leading to inaccurate retrieval. - G-Retriever: First retrieves similar entities/relations and then constructs connected subgraphs, which fails to guarantee the conciseness of the retrieved subgraphs. - KG-GPT: Requires LLMs to determine matching relations for each query, which suffers from poor scalability as the number of candidate relations increases. - KELP: Trains a path selection model, which is limited to 1-2 hop paths and lacks plug-and-play capability.

Key Challenge: Query text is unstructured whereas KG knowledge is structured; the key challenge is to establish both accurate and efficient alignment between the two.

Goal: Design a training-free, plug-and-play RAG method that can scale to tens of millions of nodes in KGs.

Key Insight: A two-step decomposition—first let the LLM transform the query into a small "pattern graph" (a structured intermediate representation), and then use graph isomorphism combined with semantic distance to find the best-matching subgraph in the KG.

Core Idea: Use LLMs to convert queries into pattern graphs, then retrieve isomorphic subgraphs in the KG via Graph Semantic Distance, unifying structural and semantic matching.

Method¶

Overall Architecture¶

SimGRAG operates in three steps: 1. Query-to-Pattern: The LLM generates a pattern graph \(\mathcal{P}\) semantically aligned with the query. 2. Pattern-to-Subgraph: Graph Semantic Distance (GSD) is used to retrieve the top-k subgraphs in the KG that best match \(\mathcal{P}\). 3. Verbalized Subgraph-Augmented Generation: The retrieved subgraphs are verbalized and fed into the LLM to generate the final answer.

Key Designs¶

Query-to-Pattern Alignment:
- A 12-shot prompt guides the LLM (e.g., Llama 3 70B) to convert queries into a set of triples \(\{(h_1, r_1, t_1), (h_2, r_2, t_2), \ldots\}\).
- The LLM first decomposes the query phrases and then generates the triples.
- Supports unknown entities (represented, for instance, as "UNKNOWN director 1"), which are skipped during GSD calculation.
- On MetaQA 3-hop and FactKG, the pattern-generation accuracy of Llama 3 70B reaches 98% and 93%, respectively.
Graph Semantic Distance (GSD):
- Defined over an isomorphic mapping \(f: V_\mathcal{P} \rightarrow V_\mathcal{S}\).
- The Nomic embedding model is used to generate 768-dimensional semantic vectors: \(z_v = EM(v)\), \(z_r = EM(r)\).
- GSD is defined as the sum of L2 distances between node pairs and relation pairs: \(GSD(\mathcal{P}, \mathcal{S}) = \sum_{\text{node } v \in \mathcal{P}} \|z_v - z_{f(v)}\|_2 + \sum_{\text{edge } \langle u,v \rangle \in \mathcal{P}} \|z_{r_{\langle u,v \rangle}} - z_{r_{\langle f(u), f(v) \rangle}}\|_2\)
- For unknown entities/relations, the corresponding terms are excluded from GSD.
Optimized Retrieval Algorithm:
- Based on the filtering-ordering-enumerating paradigm.
- First applies vector search to obtain candidate entity sets \(C^{(n)}[v_\mathcal{P}]\) and relation sets \(C^{(r)}[r_\mathcal{P}]\) for each pattern node and edge.
- DFS traversal is then used to incrementally expand the isomorphic mapping.
- Pruning Optimization: Computes a GSD lower bound \(B = \Delta_{mapped}^{(n)} + \Delta_{mapped}^{(r)} + X + Y\). Pruning is performed when \(B\) exceeds the maximum GSD in the current top-k list.
- Greedy Strategy: Expand candidate nodes with smaller distances first to tighten the lower bound as early as possible.
- The average retrieval time on the DBpedia dataset (with tens of millions of entities) is < 1 second.
Subgraph Verbalized Generation:
- Serializes the top-k subgraphs into a set of triples, which are appended to the prompt.
- A 12-shot in-context learning setup guides the LLM to answer based on the subgraph evidence.

Loss & Training¶

No training required! This is a purely inference-time method, leveraging the plug-and-play capabilities of pre-trained LLMs and embedding models.
Only requires manually constructing a few-shot prompt with a small number of examples (usually 12) for each KG.

Key Experimental Results¶

Main Results¶

Method	MetaQA 1-hop	MetaQA 2-hop	MetaQA 3-hop	PQ 2-hop	PQ 3-hop	WC2014	FactKG
ChatGPT (no KG)	60.0	23.0	38.7	-	-	-	68.5
KAPING	90.8	71.2	43.0	41.0	52.1	88.1	75.5
KG-GPT†	93.6	93.6	88.2	86.1	42.5	71.1	69.5
KELP†	94.7	96.0	-	-	-	-	73.3
G-Retriever†	98.5	87.6	54.9	61.8	46.7	67.5	61.4
SimGRAG	98.0	98.4	97.8	88.7	78.6	98.1	86.8

(† indicates that oracle entities were provided as an upper bound)

Ablation Study¶

Variable	MetaQA 1-hop	MetaQA 3-hop	FactKG
4-shot	98.6	92.8	84.0
8-shot	98.3	98.8	87.9
12-shot	98.0	97.8	86.8
k=1	95.2	97.0	88.1
k=2	98.0	97.6	87.6
k=3	98.0	97.8	86.8
Llama3-70B	98.0	97.8	86.8
Phi4-14B	92.7	90.8	86.1
Qwen2.5-72B	98.6	98.2	83.6

Retrieval Efficiency¶

Dataset	Vector Search (s)	Optimized Retrieval (s)	Total (s)
MetaQA 1-hop	0.02	0.0006	0.02
MetaQA 3-hop	0.02	0.002	0.02
FactKG (Ten-million scale)	0.59	0.15	0.74

Key Findings¶

Without requiring oracle entities, SimGRAG outperforms all methods that require training or oracle entities.
The advantage is even more pronounced for multi-hop queries: MetaQA 3-hop accuracy improves from 43.0 (KAPING) to 97.8, and FactKG accuracy increases from 75.5 to 86.8.
Even when using the smaller 14B Phi-4 model, SimGRAG remains competitive (achieving 86.1 on FactKG vs. the highest baseline performance of 75.5).
The optimized retrieval algorithm achieves sub-second retrieval (< 1s) on a ten-million-scale KG.
Error analysis: The primary errors on MetaQA stem from the query-to-pattern and augmented generation steps, while pattern-to-subgraph also accounts for 24% of the errors on FactKG.

Highlights & Insights¶

Two-stage Alignment Paradigm: The query \(\rightarrow\) pattern \(\rightarrow\) subgraph framework decomposes the complex alignment of unstructured to structured data into two simpler sub-problems, each of which can be optimized independently.
Elegant GSD Metric: Graph isomorphism guarantees structural matching, while semantic distance ensures content matching. Jointly, they effectively eliminate noise; even if a specific entity is ranked very low (e.g., 112th), it can still be successfully retrieved by combining it with the relations/other entities in the structure.
Truly Plug-and-Play: Requires no training or fine-tuning. With just 12 examples, it can be seamlessly deployed on a new KG.
Efficient Pruning Strategy: Utilizing the lower bound of GSD to prune the search tree keeps the retrieval time on a ten-million-scale KG under 1 second.

Limitations & Future Work¶

Strongly relies on the instruction-following capabilities of LLMs; weaker LLMs lead to degraded pattern graph generation quality.
Assumes that the KG schema aligns with human cognition; additional fine-tuning might be required for domain-specific (e.g., industrial, medical) proprietary KGs.
Embedding models may perform poorly for entity linking on domain-specific KGs.
Few-shot examples are sensitive to complex query types (there is a significant difference between 4-shot and 8-shot on MetaQA 3-hop).
The complementary or conflicting relationship with internal LLM knowledge remains unexplored.

KAPING (Baek et al., 2023): Trivia retrieval based on embedding similarity; simple but shows poor multi-hop performance.
G-Retriever (He et al., 2024): Employs GNNs to encode subgraphs for prompt tuning; requires training.
KG-GPT (Kim et al., 2023): Requires LLMs to select relationships, making it unscalable for large-scale KGs.
Combining graph isomorphism search with semantic embedding can be generalized to other graph retrieval scenarios (such as molecular graph search and code graph search).
The concept of a pattern graph can be viewed as a structured chain-of-thought, which can inspire research directions in "structured reasoning intermediate representations".

Rating¶

Novelty: ⭐⭐⭐⭐⭐ (Two-stage alignment + GSD metric + optimized retrieval algorithm; the methodological pipeline is complete and novel)
Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Covering 6 datasets, comparing multiple baselines, and presenting comprehensive ablation studies, efficiency analyses, and error analyses)
Writing Quality: ⭐⭐⭐⭐⭐ (Clear problem definition, intuitive diagrams, and complete algorithm pseudocode)
Value: ⭐⭐⭐⭐⭐ (Highly practical: plug-and-play, scales to ten-million-level KGs, and requires no training)