NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval

Conference: NeurIPS 2025 · arXiv: 2511.14096 · Code: GitHub · Area: Video Understanding / RAG · Keywords: RAG, multi-hop QA, knowledge graph, place cells, semantic path tracking

TL;DR

Inspired by the hippocampal place-cell navigation and memory-consolidation mechanisms in neurobiology, this paper proposes NeuroPath, a RAG framework built on semantic path tracking. Through LLM-driven, goal-directed path construction and a post-retrieval completion strategy, it improves recall@2 by an average of 16.3% and recall@5 by 13.5% on multi-hop QA tasks.

Background & Motivation

Background: RAG significantly enhances LLM performance on knowledge-intensive tasks. Naive RAG retrieves documents based on vector similarity but cannot capture inter-document associations, making it ill-suited for multi-hop reasoning.

Limitations of Prior Work:

  • Naive RAG: Flat knowledge organization with no cross-document association.
  • Graph-based RAG (HippoRAG): Uses the PPR algorithm to propagate node importance but ignores edge semantics, so retrieved results are structurally relevant rather than semantically coherent.
  • Graph-based RAG (LightRAG): Builds subgraphs by collecting direct neighbors, which introduces substantial noise.

Key Challenge: The advantage of graph structures lies in explicit semantic reasoning paths, yet existing graph-based methods focus more on topological structure than on path-level semantic coherence, failing to fully exploit this advantage.

Goal: (1) address the loss of semantic coherence in retrieval results, and (2) eliminate the irrelevant noise introduced during node matching and subgraph construction.

Key Insight: Drawing an analogy to the navigation mechanism of hippocampal place cells—place cells preplay future path sequences during navigation and replay them during rest to consolidate memory.

Core Idea: Entities in the knowledge graph are treated as place cells and triples as place fields; dynamic retrieval is performed via LLM-driven, goal-directed semantic path tracking.

Method

Overall Architecture

The framework follows a three-step pipeline: (1) Static Indexing: an LLM extracts a knowledge graph from documents and constructs coreference sets; (2) Dynamic Path Tracking: simulating the place cell preplay mechanism, the LLM performs goal-directed path filtering and expansion from seed nodes; (3) Post-Retrieval Completion: simulating the replay mechanism, a two-stage retrieval is conducted using intermediate reasoning chains and the original query to fill in missing information.
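
The hop loop of step 2 can be sketched as runnable toy code. Here a keyword-matching stub stands in for the LLM tracker, and all names (`TRIPLES`, `stub_track`, `track_paths`, etc.) are illustrative, not the authors' actual API:

```python
# Toy sketch of dynamic path tracking: goal-directed expansion from seed
# nodes, with an LLM stand-in that filters paths and decides when to stop.

TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]

def neighbors(node):
    """Triples touching a node: the candidate one-hop expansions."""
    return [t for t in TRIPLES if node in (t[0], t[2])]

def stub_track(query, candidate_paths):
    """Stand-in for the LLM tracker: keep paths whose last triple shares a
    word with the query, and signal 'done' when a kept path ends in the
    relation answering our toy query. A real run prompts the LLM for both."""
    words = query.lower().split()
    valid = [p for p in candidate_paths
             if any(w in " ".join(p[-1]).lower() for w in words)] or candidate_paths
    done = any(p[-1][1] == "born_in" for p in valid)
    return valid, done

def track_paths(query, seeds, max_hops=2):
    """Expand paths hop by hop (max_hops=2 matches the paper's default)."""
    paths = [[t] for s in seeds for t in neighbors(s)]
    for _ in range(max_hops):
        paths, done = stub_track(query, paths)
        if done:
            break
        # extend each surviving path with unvisited triples at its tail entity
        paths = [p + [t] for p in paths
                 for t in neighbors(p[-1][2]) if t not in p]
    return paths
```

On the toy graph, the query "where was the director of Inception born" yields the two-hop chain Inception → Christopher Nolan → London, illustrating how goal direction keeps expansion on a single coherent path rather than a noisy subgraph.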

Key Designs

  1. Static Indexing and Pseudo-Coreference Resolution:

    • An LLM extracts the entity set \(\mathcal{E}\) and relation triple set \(\mathcal{T}\) from each document \(d_i\) in a single pass.
    • For each entity \(e_i\), a potential coreference set \(\mathcal{R}_i\) is constructed from the most similar candidate entities: \(\text{Sim}(i,j) = \text{CosSim}(\text{Enc}(i), \text{Enc}(j))\), \(\mathcal{R}_i = \text{argtopk}_j \, \text{Sim}(i,j), \quad i,j \in \mathcal{E}\). By default, the top-5 most similar entities with cosine similarity above 0.8 are retained as the coreference set.
  2. Dynamic Path Tracking (Simulating Preplay):

    • Seed Node Selection: Key entities are extracted from the query and matched to the most similar nodes in the graph; coreference sets are expanded as initial seeds \(\mathcal{S}^0\).
    • Path Expansion: Triples \(\mathcal{P}_{sub}^h\) connected to the current frontier nodes are retrieved and concatenated with the expanded paths \(\mathcal{P}_{exp}^h\), while already-validated paths \(\mathcal{P}_{val}^h\) are carried over, forming the candidate paths: \(\mathcal{P}_{cur}^{h+1} = \mathcal{P}_{val}^h + \text{Cat}(\mathcal{P}_{exp}^h, \mathcal{P}_{sub}^h)\)
    • LLM Tracking: The LLM filters candidate paths, marks valid paths \(\mathcal{P}_{val}^h\), decides whether further expansion is needed, and generates expansion requirements \(g^h\).
    • Pruning Based on Expansion Requirements: The expansion requirements generated by the LLM at the previous hop are used to prune new paths by similarity, preventing exponential growth: \(\mathcal{P}_{cur}^{h'} = \text{argtopk}_p \text{Sim}(g^{h-1}, p), \quad p \in \mathcal{P}_{cur}^h\) Top-30 paths are retained by default.
  3. Post-Retrieval Completion (Simulating Replay):

    • After finalizing the paths, source documents on the paths are collected as candidates \(\mathcal{D}_p\).
    • The reasoning chain from the last hop \(c_{\text{last}}\) and the expansion requirement \(g_{\text{last}}\) are concatenated with the original query \(q\) to perform a second-stage retrieval that fills in missing information.
    • The final document set is the union of the path documents and the second-stage retrieved documents \(\mathcal{D}_e\): \(\mathcal{D}_{ret} = \mathcal{D}_p \cup \mathcal{D}_e\).
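
The two similarity-based selections above, coreference-set construction (design 1) and expansion-requirement pruning (design 2), can be sketched as follows. The character-count embedding is a toy stand-in for the paper's neural encoder, and the function names are ours:

```python
import math

def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def char_embed(text, dim=32):
    """Toy embedding: character-count histogram (a real encoder goes here)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec

def coreference_set(entity, entities, k=5, threshold=0.8):
    """R_i: top-k most similar entities whose similarity exceeds the
    threshold (k=5 and threshold=0.8 are the paper's defaults)."""
    e = char_embed(entity)
    scored = [(cosine(e, char_embed(other)), other)
              for other in entities if other != entity]
    scored.sort(reverse=True)
    return [name for sim, name in scored[:k] if sim > threshold]

def prune_paths(goal, candidate_paths, top_p=30):
    """Keep the top-p candidate paths most similar to the expansion
    requirement g^{h-1} (p=30 is the paper's default)."""
    g = char_embed(goal)
    return sorted(candidate_paths,
                  key=lambda p: cosine(g, char_embed(p)),
                  reverse=True)[:top_p]
```

For example, `coreference_set("United States", [...])` would keep "United States of America" but not "France", and `prune_paths` keeps only the paths most aligned with the LLM's stated expansion goal, which is what prevents exponential candidate growth across hops.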

Training & Implementation

  • No additional training is required—zero-shot prompting is used throughout.
  • Graph indexing uses GPT-4o-mini.
  • Path tracking can use either GPT-4o-mini or Qwen-2.5-14B.
  • The maximum number of reasoning hops is set to 2 by default.
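
The defaults above can be collected into a single configuration; the dict layout and key names are ours for illustration, while the values come from the paper:

```python
# Reported NeuroPath defaults (key names are ours, values from the paper).
NEUROPATH_DEFAULTS = {
    "indexing_llm": "gpt-4o-mini",       # knowledge-graph extraction
    "tracking_llm": "gpt-4o-mini",       # alternatively "qwen-2.5-14b"
    "coref_similarity_threshold": 0.8,   # cosine cutoff for coreference sets
    "coref_top_k": 5,                    # entities kept per coreference set
    "path_top_p": 30,                    # candidate paths kept after pruning
    "max_hops": 2,                       # maximum reasoning depth
    "training": None,                    # zero-shot prompting; no fine-tuning
}
```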

Key Experimental Results

Main Results — Retrieval Performance (Contriever Retriever)

| Method | MuSiQue R@2 | MuSiQue R@5 | 2Wiki R@2 | 2Wiki R@5 | HotpotQA R@2 | HotpotQA R@5 | Avg R@2 | Avg R@5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BGE-M3 (Naive) | 40.4 | 54.2 | 64.9 | 71.8 | 71.8 | 84.7 | 59.0 | 70.2 |
| HippoRAG 2 (Graph-based) | 41.8 | 55.5 | 62.5 | 74.2 | 65.3 | 83.4 | 56.5 | 71.0 |
| Iter-RetGen (Iterative) | 46.0 | 59.8 | 62.1 | 76.5 | 78.3 | 90.6 | 62.1 | 75.6 |
| NeuroPath | 48.0 | 62.7 | 77.2 | 92.5 | 75.6 | 90.4 | 66.9 | 81.9 |

QA Performance (GPT-4o-mini + Contriever)

| Method | MuSiQue EM | 2Wiki EM | HotpotQA EM | Avg EM |
| --- | --- | --- | --- | --- |
| HippoRAG | 27.8 | 58.6 | 43.3 | 43.2 |
| HippoRAG 2 | 27.4 | 46.0 | 50.7 | 41.4 |
| Iter-RetGen | 29.9 | 51.5 | 48.7 | 43.4 |
| NeuroPath | 31.4 | 63.4 | 50.5 | 48.4 |

Ablation Study

| Component | MuSiQue R@2 | 2Wiki R@2 | HotpotQA R@2 | Token Consumption (vs. full) |
| --- | --- | --- | --- | --- |
| Full model (p=30) | 48.0 | 77.2 | 75.9 | baseline |
| w/o pruning | 48.7 | 76.8 | 75.7 | increase ~45% |
| p=20 | 47.3 | 76.5 | 74.9 | decrease ~7% |
| w/o post-retrieval completion (hop=2) | 41.8 | 73.6 | 67.5 | n/a |
| w/o post-retrieval completion (hop=1) | 35.5 | 61.0 | 61.3 | n/a |

Key Findings

  • Compared to state-of-the-art graph-based RAG methods, recall@2 improves by an average of 16.3% and recall@5 by 13.5%.
  • Compared to iterative RAG methods, NeuroPath achieves higher accuracy while reducing token consumption by 22.8%.
  • The largest gains are observed on the most challenging MuSiQue dataset, which is specifically designed for difficult multi-hop reasoning.
  • NeuroPath is robust to the choice of retriever, whereas iterative methods and HippoRAG 2 exhibit high sensitivity (differences up to 20%).
  • Robust performance is maintained across 4 smaller LLMs (Llama3.1, GLM4, Mistral0.3, Gemma3).
  • The post-retrieval completion (replay mechanism) contributes approximately 6–8 recall points of the overall improvement.

Highlights & Insights

  • Novel Neuroscience-Inspired Analogy: The mapping from place cell preplay/replay to path tracking/post-retrieval completion is both conceptually elegant and empirically effective.
  • Path-Level Retrieval Outperforms Node/Subgraph-Level Retrieval: Explicit semantic paths ensure coherence in retrieval results, avoiding the noise introduced by subgraph-based methods.
  • Active LLM Participation in Retrieval: Rather than passive matching, the LLM actively reasons, filters, and predicts expansion directions at each hop, realizing a form of "thinking-driven retrieval."
  • High Token Efficiency: Token consumption is reduced by 22.8% compared to iterative RAG while simultaneously achieving higher accuracy.

Limitations & Future Work

  • The framework relies on LLMs for path tracking, and inference costs (number of LLM API calls) remain relatively high.
  • Knowledge graph quality is constrained by the LLM's extraction capability, and extraction errors propagate to downstream retrieval.
  • Coreference resolution relies on a simple vector similarity threshold (0.8), which may miss coreferent entities with significant name variation.
  • The maximum hop count is limited to 2; scalability to deeper reasoning chains remains to be validated.
  • On tasks with lower knowledge integration demands, such as HotpotQA, the advantage over simpler methods is less pronounced.

Related Work & Context

  • HippoRAG: The primary competing method, which uses the PPR algorithm but ignores edge semantics; NeuroPath addresses this through explicit path-level semantic coherence.
  • LightRAG: Subgraph construction introduces excessive noise (answering incorrectly even with 60 entities and 169 relations), demonstrating that "more retrieval" does not equate to "better retrieval."
  • PathRAG: Another path-based method, but it applies uniform resource allocation and disregards edge importance and semantics.
  • Place Cell Theory (O'Keefe, 1971): Provides an elegant conceptual framework for the method design.
  • Insight: The shift in RAG from "retrieve more" to "retrieve more precise paths" may represent a key future direction.

Rating

  • Novelty: ⭐⭐⭐⭐ The neuroscience analogy is novel, practically grounded, and effective; the path tracking concept exhibits genuine originality.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Three primary datasets plus three additional datasets, with ablations across multiple LLMs and retrievers.
  • Writing Quality: ⭐⭐⭐⭐ The structure is clear and case studies are intuitive, though the depth of the neuroscience analogy could be strengthened.
  • Value: ⭐⭐⭐⭐ Substantially outperforms state-of-the-art on multi-hop QA and provides important reference value for the RAG community.