NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval¶
Conference: NeurIPS 2025 | arXiv: 2511.14096 | Code: GitHub | Area: Video Understanding / RAG | Keywords: RAG, multi-hop QA, knowledge graph, place cells, semantic path tracking
TL;DR¶
Inspired by the place-cell navigation and memory consolidation mechanisms of the hippocampus, this paper proposes NeuroPath, a RAG framework based on semantic path tracking. Through LLM-driven, goal-directed path construction and a post-retrieval completion strategy, it achieves average improvements of 16.3% in recall@2 and 13.5% in recall@5 on multi-hop QA tasks.
Background & Motivation¶
Background: RAG significantly enhances LLM performance on knowledge-intensive tasks. Naive RAG retrieves documents based on vector similarity but cannot capture inter-document associations, making it ill-suited for multi-hop reasoning.
Limitations of Prior Work:
- Naive RAG: Flat knowledge organization with no cross-document association.
- Graph-based RAG (HippoRAG): Uses the Personalized PageRank (PPR) algorithm to propagate node importance but ignores edge semantics, so retrieval favors structural relevance over semantic coherence.
- Graph-based RAG (LightRAG): Builds subgraphs by collecting direct neighbors, which introduces substantial noise.
Key Challenge: The advantage of graph structures lies in explicit semantic reasoning paths, yet existing graph-based methods focus more on topological structure than on path-level semantic coherence, failing to fully exploit this advantage.
Goal: (1) Address the loss of semantic coherence in retrieval results; (2) eliminate irrelevant noise introduced during node matching and subgraph construction.
Key Insight: Drawing an analogy to the navigation mechanism of hippocampal place cells—place cells preplay future path sequences during navigation and replay them during rest to consolidate memory.
Core Idea: Entities in the knowledge graph are treated as place cells and triples as place fields; dynamic retrieval is performed via LLM-driven, goal-directed semantic path tracking.
Method¶
Overall Architecture¶
The framework follows a three-step pipeline: (1) Static Indexing: an LLM extracts a knowledge graph from documents and constructs coreference sets; (2) Dynamic Path Tracking: simulating the place cell preplay mechanism, the LLM performs goal-directed path filtering and expansion from seed nodes; (3) Post-Retrieval Completion: simulating the replay mechanism, a two-stage retrieval is conducted using intermediate reasoning chains and the original query to fill in missing information.
Key Designs¶
- Static Indexing and Pseudo-Coreference Resolution:
- An LLM extracts the entity set \(\mathcal{E}\) and relation triple set \(\mathcal{T}\) from each document \(d_i\) in a single pass.
- A potential coreference set \(\mathcal{R}_i\) is constructed for each entity \(e_i\) from the candidate entities most similar to it:
  \(\text{Sim}(i,j) = \text{CosSim}(\text{Enc}(i), \text{Enc}(j))\)
  \(\mathcal{R}_i = \text{argtopk}_j \, \text{Sim}(i,j), \quad i,j \in \mathcal{E}\)
  By default, the top-5 most similar entities whose cosine similarity exceeds 0.8 are retained as the coreference set.
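As a concrete sketch, the top-k-plus-threshold logic of coreference-set construction can be reproduced with any text encoder. The character-trigram embedding below is purely illustrative (the paper's encoder \(\text{Enc}\) is a neural model, and the entity names here are invented for the demo); only the defaults of 0.8 similarity and top-5 come from the paper.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy character-trigram embedding, a stand-in for the paper's
    neural encoder Enc(.) -- illustrative only."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cos_sim(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def coreference_sets(entities, threshold=0.8, top_k=5):
    """R_i: up to top_k most similar entities whose similarity exceeds
    the threshold (0.8 and top-5 are the paper's defaults)."""
    vecs = {e: embed(e) for e in entities}
    result = {}
    for e in entities:
        scored = sorted(((cos_sim(vecs[e], vecs[o]), o)
                         for o in entities if o != e), reverse=True)
        result[e] = [o for s, o in scored[:top_k] if s > threshold]
    return result

entities = ["Barack Obama", "Barack H. Obama", "Michelle Obama", "Paris"]
sets = coreference_sets(entities)
```

Note that near-duplicate surface forms clear the 0.8 bar while merely related entities ("Michelle Obama") and unrelated ones ("Paris") do not; with a real neural encoder the threshold also captures aliases with less lexical overlap.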
- Dynamic Path Tracking (Simulating Preplay):
- Seed Node Selection: Key entities are extracted from the query and matched to the most similar nodes in the graph; coreference sets are expanded as initial seeds \(\mathcal{S}^0\).
- Path Expansion: Triples \(\mathcal{P}_{sub}^h\) connected to the seed nodes are retrieved and concatenated with the currently expanded paths \(\mathcal{P}_{exp}^h\) to form the candidate paths: \(\mathcal{P}_{cur}^{h+1} = \mathcal{P}_{val}^h + \text{Cat}(\mathcal{P}_{exp}^h, \mathcal{P}_{sub}^h)\)
- LLM Tracking: The LLM filters candidate paths, marks valid paths \(\mathcal{P}_{val}^h\), decides whether further expansion is needed, and generates expansion requirements \(g^h\).
- Pruning Based on Expansion Requirements: The expansion requirements generated by the LLM at the previous hop are used to prune new paths by similarity, preventing exponential growth: \(\mathcal{P}_{cur}^{h'} = \text{argtopk}_p \text{Sim}(g^{h-1}, p), \quad p \in \mathcal{P}_{cur}^h\) Top-30 paths are retained by default.
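The steps above can be sketched as a single hop loop. In this sketch, `llm_track` stands in for the LLM call that marks valid paths, decides whether to continue, and emits the expansion requirement \(g^h\) (the stub below is a keyword heuristic, not the paper's prompt), and `sim` stands in for the embedding similarity used in pruning; the toy triples and the 2-hop question are invented for illustration.

```python
def track_paths(seeds, triples, llm_track, sim, max_hops=2, top_p=30):
    """Goal-directed path tracking (the preplay analogue), simplified:
    expand paths hop by hop, let the LLM keep valid paths, and prune
    new candidates against the previous expansion requirement g^{h-1}."""
    # Start from single-triple paths touching a seed entity.
    paths = [(t,) for t in triples if t[0] in seeds or t[2] in seeds]
    valid, goal = [], None
    for _hop in range(max_hops):
        if goal is not None:
            # Prune: keep the top_p paths most similar to g^{h-1}
            # (top-30 is the paper's default).
            paths = sorted(paths, key=lambda p: sim(goal, p),
                           reverse=True)[:top_p]
        kept, to_expand, goal, done = llm_track(paths)
        valid.extend(kept)
        if done or not to_expand:
            break
        # Cat(P_exp, P_sub): extend each kept path with triples
        # adjacent to its tail entity, avoiding revisits.
        paths = [p + (t,) for p in to_expand
                 for t in triples if t[0] == p[-1][2] and t not in p]
    return valid

# Toy 2-hop example: "Which country was Alice born in?"
triples = [
    ("Alice", "born in", "Paris"),
    ("Paris", "capital of", "France"),
    ("France", "member of", "EU"),
]

def llm_stub(paths):
    # Stand-in for the LLM tracker: a path is valid once it reaches France.
    valid = [p for p in paths if p[-1][2] == "France"]
    rest = [p for p in paths if p not in valid]
    return valid, rest, "find the country Alice was born in", bool(valid)

chains = track_paths({"Alice"}, triples, llm_stub, sim=lambda g, p: 0.0)
```

The returned `chains` contain the coherent 2-hop path Alice → Paris → France; each hop is one LLM round trip, which is where the method's inference cost concentrates.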
- Post-Retrieval Completion (Simulating Replay):
- After finalizing the paths, source documents on the paths are collected as candidates \(\mathcal{D}_p\).
- The reasoning chain from the last hop \(c_{\text{last}}\) and the expansion requirement \(g_{\text{last}}\) are concatenated with the original query \(q\) to perform a second-stage retrieval that fills in missing information.
- The final document set is \(\mathcal{D}_{ret} = \mathcal{D}_p \cup \mathcal{D}_e\), where \(\mathcal{D}_e\) denotes the complementary documents returned by the second-stage retrieval.
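A minimal sketch of this completion step, assuming each triple carries a pointer to its source document (the `source_of` mapping) and that `retrieve` is any dense retriever over the corpus; both names and the toy data are assumptions for illustration, not the paper's interfaces.

```python
def post_retrieval_completion(query, valid_paths, source_of, retrieve,
                              last_chain, last_goal):
    """Replay analogue: D_p collects the source documents of the
    triples on the final paths; a second-stage retrieval with the
    augmented query fetches complementary documents D_e."""
    d_p = {source_of[t] for path in valid_paths for t in path}
    # Augment the query with the last hop's reasoning chain c_last and
    # expansion requirement g_last to target the missing information.
    expanded_query = " ".join([query, last_chain, last_goal])
    d_e = set(retrieve(expanded_query))
    return d_p | d_e  # D_ret = D_p ∪ D_e

# Toy usage with a stub retriever.
paths = [(("Alice", "born in", "Paris"),
          ("Paris", "capital of", "France"))]
source_of = {("Alice", "born in", "Paris"): "doc1",
             ("Paris", "capital of", "France"): "doc2"}
docs = post_retrieval_completion(
    "Which country was Alice born in?", paths, source_of,
    retrieve=lambda q: ["doc3"],
    last_chain="Alice born in Paris; Paris capital of France",
    last_goal="confirm France is a country",
)
```

The union keeps every document already evidenced by the reasoning chain while letting the augmented query recover evidence the path-tracking stage missed, which the ablation (hop=2 without completion dropping from 48.0 to 41.8 R@2 on MuSiQue) shows is substantial.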
Loss & Training¶
- No additional training is required—zero-shot prompting is used throughout.
- Graph indexing uses GPT-4o-mini.
- Path tracking can use either GPT-4o-mini or Qwen-2.5-14B.
- The maximum number of reasoning hops is set to 2 by default.
Key Experimental Results¶
Main Results — Retrieval Performance (Contriever Retriever)¶
| Method | MuSiQue R@2 | MuSiQue R@5 | 2Wiki R@2 | 2Wiki R@5 | HotpotQA R@2 | HotpotQA R@5 | Avg R@2 | Avg R@5 |
|---|---|---|---|---|---|---|---|---|
| BGE-M3 (Naive) | 40.4 | 54.2 | 64.9 | 71.8 | 71.8 | 84.7 | 59.0 | 70.2 |
| HippoRAG 2 (Graph-based) | 41.8 | 55.5 | 62.5 | 74.2 | 65.3 | 83.4 | 56.5 | 71.0 |
| Iter-RetGen (Iterative) | 46.0 | 59.8 | 62.1 | 76.5 | 78.3 | 90.6 | 62.1 | 75.6 |
| NeuroPath | 48.0 | 62.7 | 77.2 | 92.5 | 75.6 | 90.4 | 66.9 | 81.9 |
QA Performance (GPT-4o-mini + Contriever)¶
| Method | MuSiQue EM | 2Wiki EM | HotpotQA EM | Avg EM |
|---|---|---|---|---|
| HippoRAG | 27.8 | 58.6 | 43.3 | 43.2 |
| HippoRAG 2 | 27.4 | 46.0 | 50.7 | 41.4 |
| Iter-RetGen | 29.9 | 51.5 | 48.7 | 43.4 |
| NeuroPath | 31.4 | 63.4 | 50.5 | 48.4 |
Ablation Study¶
| Component | MuSiQue R@2 | 2Wiki R@2 | HotpotQA R@2 | Token Consumption Change |
|---|---|---|---|---|
| Full model (p=30) | 48.0 | 77.2 | 75.9 | Baseline |
| w/o pruning | 48.7 | 76.8 | 75.7 | Tokens increase ~45% |
| p=20 | 47.3 | 76.5 | 74.9 | Tokens decrease ~7% |
| w/o post-retrieval completion (hop=2) | 41.8 | 73.6 | 67.5 | — |
| w/o post-retrieval completion (hop=1) | 35.5 | 61.0 | 61.3 | — |
Key Findings¶
- Compared to state-of-the-art graph-based RAG methods, recall@2 improves by an average of 16.3% and recall@5 by 13.5%.
- Compared to iterative RAG methods, NeuroPath achieves higher accuracy while reducing token consumption by 22.8%.
- The largest gains are observed on the most challenging MuSiQue dataset, which is specifically designed for difficult multi-hop reasoning.
- NeuroPath is robust to the choice of retriever, whereas iterative methods and HippoRAG 2 exhibit high sensitivity (differences up to 20%).
- Robust performance is maintained across 4 smaller LLMs (Llama3.1, GLM4, Mistral0.3, Gemma3).
- The post-retrieval completion (Replay mechanism) contributes approximately 6–8% of the recall improvement.
Highlights & Insights¶
- Novel Neuroscience-Inspired Analogy: The mapping from place cell preplay/replay to path tracking/post-retrieval completion is both conceptually elegant and empirically effective.
- Path-Level Retrieval Outperforms Node/Subgraph-Level Retrieval: Explicit semantic paths ensure coherence in retrieval results, avoiding the noise introduced by subgraph-based methods.
- Active LLM Participation in Retrieval: Rather than passive matching, the LLM actively reasons, filters, and predicts expansion directions at each hop, realizing a form of "thinking-driven retrieval."
- High Token Efficiency: Token consumption is reduced by 22.8% compared to iterative RAG while simultaneously achieving higher accuracy.
Limitations & Future Work¶
- The framework relies on LLMs for path tracking, and inference costs (number of LLM API calls) remain relatively high.
- Knowledge graph quality is constrained by the LLM's extraction capability, and extraction errors propagate to downstream retrieval.
- Coreference resolution relies on a simple vector similarity threshold (0.8), which may miss coreferent entities with significant name variation.
- The maximum hop count is limited to 2; scalability to deeper reasoning chains remains to be validated.
- On tasks with lower knowledge integration demands, such as HotpotQA, the advantage over simpler methods is less pronounced.
Related Work & Insights¶
- HippoRAG: The primary competing method, which uses the PPR algorithm but ignores edge semantics. NeuroPath addresses this through explicit path-level semantic coherence.
- LightRAG: Subgraph construction introduces excessive noise (in the paper's case study it retrieved 60 entities and 169 relations yet still answered incorrectly), demonstrating that "more retrieval" does not equate to "better retrieval."
- PathRAG: Another path-based method, but it applies uniform resource allocation and disregards edge importance and semantics.
- Place Cell Theory (O'Keefe, 1971): Provides an elegant conceptual framework for the method design.
- Insight: The shift in RAG from "retrieve more" to "retrieve more precise paths" may represent a key future direction.
Rating¶
- Novelty: ⭐⭐⭐⭐ The neuroscience analogy is novel, practically grounded, and effective; the path tracking concept exhibits genuine originality.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Three primary datasets plus three additional datasets, with ablations across multiple LLMs and retrievers.
- Writing Quality: ⭐⭐⭐⭐ The structure is clear and case studies are intuitive, though the depth of the neuroscience analogy could be strengthened.
- Value: ⭐⭐⭐⭐ Substantially outperforms state-of-the-art on multi-hop QA and provides important reference value for the RAG community.