Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval

Conference: ICLR 2026
arXiv: 2603.09250
Code: See Reproducibility Statement in the paper
Area: Personalization / Information Retrieval
Keywords: LLM personalization, memory retrieval, dual-process theory, adaptive retrieval, cognitive science

TL;DR

Inspired by the dual-process theory in cognitive science, this paper proposes RF-Mem, a memory retrieval framework that achieves efficient and scalable LLM personalization through adaptive switching between two pathways: Familiarity (fast similarity matching) and Recollection (deep chain-based reconstruction).

Background & Motivation

Personalizing large language models requires incorporating user-specific history, preferences, and context into response generation. Two dominant existing approaches each suffer from critical drawbacks:

Full-context methods: Stuffing all user memory into the prompt is costly and unscalable — as user memory accumulates, prompt length quickly exceeds model context window limits.

One-shot retrieval methods: Reducing retrieval to a single-round similarity search (top-K) only captures surface-level matches and fails to recover memories that are indirectly but critically related to the query.

Cognitive science research shows that human memory recognition operates via a dual process:
- Familiarity: A fast but coarse recognition process that rapidly judges whether something has been encountered before.
- Recollection: A slow but precise reconstruction process that consciously retrieves specific details and associated context.

Existing systems lack both recollection-style retrieval capability and a mechanism to adaptively switch between the two retrieval pathways. This leads to either under-retrieval (missing key memories) or noise introduction (retrieving irrelevant content).

Method

Overall Architecture

RF-Mem (Recollection-Familiarity Memory Retrieval) is a dual-pathway memory retriever guided by familiarity uncertainty. Given a user query, the framework first computes a Familiarity signal to assess retrieval certainty, then selects either the fast Familiarity pathway or the deep Recollection pathway based on that certainty level.

Key Designs

Familiarity Signal Computation:
- Computes the distribution of similarity scores between the query and candidate memories.
- Familiarity level is measured by two metrics:
  - Mean Score: Captures overall matching strength; a higher mean indicates the presence of familiar memories.
  - Entropy: Measures uncertainty in the score distribution; lower entropy indicates more definitive and concentrated retrieval results.
- High mean + low entropy = high Familiarity (confident direct match exists).
- Low mean / high entropy = low Familiarity (deeper retrieval required).
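A minimal sketch of how the two familiarity statistics could be computed. It assumes cosine similarity over dense embeddings, treats the top-k scores as the candidate distribution, converts them to probabilities with a temperature-scaled softmax, and normalizes entropy by log k. The helper name `familiarity_signal`, the temperature, and the normalization are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np


def familiarity_signal(query, memories, k=10, temperature=0.05):
    """Hypothetical helper: (mean score, normalized entropy) of the top-k
    cosine similarities between a query embedding and memory embeddings."""
    q = np.asarray(query, dtype=float)
    m = np.asarray(memories, dtype=float)
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q                         # cosine similarity per memory
    k = min(k, scores.shape[0])
    top = np.sort(scores)[::-1][:k]        # top-k candidate scores, descending
    mean_score = float(top.mean())         # overall matching strength
    # Softmax (with an assumed temperature) turns scores into a distribution.
    logits = top / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    # Divide by log(k) so the entropy lies in [0, 1] for any k.
    norm_entropy = entropy / np.log(k) if k > 1 else 0.0
    return mean_score, norm_entropy
```

With three orthogonal memories (`np.eye(3)`), a query equal to one of them gives near-zero normalized entropy (a single confident match), while a query equidistant from all three gives normalized entropy of 1.0 (maximal ambiguity).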
Familiarity Pathway (Fast Track):
- Activated when the Familiarity signal is strong.
- Performs standard top-K similarity retrieval and directly returns the most relevant memories.
- Efficient: requires only a single retrieval pass and no computation beyond the lightweight familiarity statistics.
- Applicable when: the user query has a clear and direct association with historical memories.
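The fast track amounts to ordinary top-K similarity retrieval. A short NumPy sketch, assuming cosine similarity over dense embeddings; `familiarity_retrieve` is a hypothetical name used only for illustration.

```python
import numpy as np


def familiarity_retrieve(query, memories, k=5):
    """Hypothetical fast path: indices of the k memories most cosine-similar
    to the query, best match first (one retrieval pass, no expansion)."""
    q = np.asarray(query, dtype=float)
    m = np.asarray(memories, dtype=float)
    scores = (m / np.linalg.norm(m, axis=1, keepdims=True)) @ (q / np.linalg.norm(q))
    k = min(k, len(scores))
    # argpartition selects the top-k in linear time; only those k are then sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    return [int(i) for i in top[np.argsort(-scores[top])]]
```

For example, with memories `[[1, 0], [0, 1], [0.9, 0.1], [-1, 0]]` and query `[1, 0]`, `familiarity_retrieve(query, memories, k=2)` returns `[0, 2]`.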
Recollection Pathway (Deep Reconstruction Track):
- Activated when the Familiarity signal is weak, simulating the human process of conscious memory reconstruction.
- Candidate Memory Clustering: Clusters retrieved candidate memories to identify distinct memory themes and contexts.
- Alpha-Mix Query Expansion: Blends the query with cluster centroids of candidate memories via alpha mixing to generate expanded queries in embedding space.
- Iterative Evidence Expansion: Uses the expanded query for re-retrieval, progressively covering memories that are indirectly yet semantically related to the original query.
- Applicable when: the user query involves cross-temporal or cross-topic memory associations that require chain-like reasoning to recall.
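One possible reading of the recollection steps, sketched under explicit assumptions: spherical k-means stands in for the clustering method, alpha mixing is taken to be the convex combination `alpha * q + (1 - alpha) * c` followed by re-normalization, and expansion runs for a fixed number of rounds. All function names, parameters, and defaults are hypothetical; the paper's actual clustering algorithm, mixing rule, and stopping criterion may differ.

```python
import numpy as np


def _normalize(x):
    """Scale vectors to unit length along the last axis (guarding against zero norm)."""
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)


def _top_k(m, q, k):
    """Indices of the k rows of unit-norm matrix m most similar to unit query q."""
    return [int(i) for i in np.argsort(-(m @ q))[:k]]


def spherical_kmeans(x, n_clusters, n_iter=20, seed=0):
    """Assumed clustering step: k-means on unit vectors using cosine similarity."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)]  # copy of rows
    for _ in range(n_iter):
        labels = np.argmax(x @ centroids.T, axis=1)  # most similar centroid
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
        centroids = _normalize(centroids)
    return centroids


def recollect(query, memories, alpha=0.5, n_clusters=2, k=3, n_rounds=2, seed=0):
    """Hypothetical Recollection pathway: retrieve, cluster the evidence,
    alpha-mix the query toward each cluster centroid, re-retrieve, repeat."""
    m = _normalize(memories)
    q = _normalize(query)
    retrieved = _top_k(m, q, k)  # initial one-shot candidates
    for _ in range(n_rounds):
        centroids = spherical_kmeans(m[retrieved], min(n_clusters, len(retrieved)), seed=seed)
        for c in centroids:
            # Alpha-mix: a convex blend of the original query and one memory theme.
            q_mix = _normalize(alpha * q + (1 - alpha) * c)
            for i in _top_k(m, q_mix, k):
                if i not in retrieved:  # accumulate only new evidence
                    retrieved.append(i)
    return retrieved


if __name__ == "__main__":
    memories = [
        [0.95, 0.0, 0.31],  # 0: direct match for the query
        [0.5, 0.866, 0.0],  # 1: related memory that bridges to a second topic
        [0.0, 1.0, 0.0],    # 2: indirectly related (orthogonal to the query)
        [0.0, 0.0, -1.0],   # 3: unrelated
        [-1.0, 0.0, 0.0],   # 4: unrelated
    ]
    print(recollect([1.0, 0.0, 0.0], memories, alpha=0.2, k=2))  # → [0, 1, 2]
```

In the toy data, memory 2 is orthogonal to the query, so plain top-2 retrieval returns only `[0, 1]`; mixing the query toward the cluster formed by memory 1 surfaces memory 2 in the first expansion round. A production version would also cap the total number of returned memories to stay within the retrieval budget.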
Adaptive Pathway Switching:
- Automatically selects the pathway based on a threshold mechanism applied to the Familiarity signal.
- Avoids both extremes: the high cost of full-context methods and the low recall of one-shot retrieval.
- Delivers the best retrieval quality among the compared methods under fixed budget and latency constraints.
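The switching rule can be read as a two-threshold test on the familiarity statistics, following the "high mean + low entropy" criterion described above. The threshold values and the `SwitchConfig` / `choose_pathway` names are hypothetical placeholders, and the sketch assumes entropy has already been normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchConfig:
    """Hypothetical thresholds on the familiarity signal (placeholder values)."""
    min_mean: float = 0.6     # minimum mean top-k similarity to count as "familiar"
    max_entropy: float = 0.5  # maximum normalized entropy to count as "familiar"


def choose_pathway(mean_score: float, norm_entropy: float,
                   cfg: SwitchConfig = SwitchConfig()) -> str:
    """Return "familiarity" only when the mean is high AND the entropy is low;
    any other combination falls through to the deeper Recollection pathway."""
    if mean_score >= cfg.min_mean and norm_entropy <= cfg.max_entropy:
        return "familiarity"
    return "recollection"
```

Requiring both conditions keeps the fast path conservative: a diffuse score distribution (high entropy) triggers recollection even when the mean similarity is high.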

Loss & Training

RF-Mem is a modular framework compatible with different embedding models and LLMs. The key innovation lies at the retrieval strategy level rather than model training — personalization is improved through smarter retrieval decisions.

Key Experimental Results

Main Results

Evaluation is conducted on three personalization benchmarks spanning different corpus scales:

Method	Benchmark 1	Benchmark 2	Benchmark 3	Notes
Full-context inference	Baseline	Baseline	Baseline	Highest cost; often assumed to be a performance upper bound
One-shot retrieval	Below full-context	Below full-context	Below full-context	Simple and fast but lower quality
RF-Mem	Best	Best	Best	Consistently outperforms both baselines under fixed budget

Ablation Study

Configuration	Key Metric	Notes
Familiarity pathway only	Baseline level	Equivalent to standard top-K retrieval
Recollection pathway only	Above Familiarity-only	Wastes computation on simple queries
Dual-pathway + adaptive switching	Best	Balances efficiency and quality
w/o clustering	Performance drop	Clustering helps identify memory topic structure
w/o Alpha-Mix	Performance drop	Query expansion is central to Recollection

Key Findings

Consistent advantage: RF-Mem outperforms both baselines across all three benchmarks and at varying corpus scales.
Budget efficiency: Under fixed retrieval budgets (token count) and latency constraints, RF-Mem matches or exceeds full-context performance while keeping costs close to those of one-shot retrieval.
Scalability: RF-Mem's advantage becomes more pronounced as the user memory store grows. Full-context prompt length, and with it inference cost, grows at least linearly with memory size, whereas RF-Mem's retrieval overhead grows only modestly.
Pathway distribution: Approximately 60–70% of queries are handled by the fast Familiarity pathway, while 30–40% require the deep Recollection pathway.
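A back-of-the-envelope illustration of the scaling argument: a full-context prompt grows with the number of stored memories, while a retrieval budget of k memories stays fixed. The 50 tokens per memory and k = 10 below are arbitrary assumptions rather than figures from the paper, and the sketch ignores the extra retrieval passes of the Recollection pathway.

```python
from typing import Optional


def memory_prompt_tokens(n_memories: int, tokens_per_memory: int,
                         k: Optional[int] = None) -> int:
    """Tokens of user memory placed in the prompt.

    k=None models full-context inference (every memory is included);
    an integer k models a retrieval budget of at most k memories.
    """
    used = n_memories if k is None else min(k, n_memories)
    return used * tokens_per_memory


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        full = memory_prompt_tokens(n, tokens_per_memory=50)
        budget = memory_prompt_tokens(n, tokens_per_memory=50, k=10)
        print(f"{n:>6} memories: full-context {full:>7} tokens, top-10 retrieval {budget} tokens")
```

Under these assumptions the full-context prompt grows from 5,000 to 500,000 memory tokens across the three store sizes, while the top-10 budget stays at 500.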

Highlights & Insights

Elegant transfer from cognitive science: Introducing the Familiarity-Recollection dual-process theory of human memory into LLM retrieval system design is both intellectually elegant and practically effective.
Uncertainty-guided adaptation: Using the mean and entropy of the retrieval score distribution as the switching signal is more robust than thresholding a single raw similarity score.
Memory reconstruction in embedding space: Simulating the chain-like reconstruction of recollection via clustering and Alpha-Mix in embedding space avoids costly multi-round LLM calls.
Pragmatic design philosophy: The modular architecture allows individual components to be replaced and optimized independently, making it well-suited for engineering deployment.

Limitations & Future Work

Familiarity threshold tuning: Adaptive switching relies on threshold parameters that may require dataset-specific tuning, lacking a fully automated solution.
Clustering algorithm selection: The clustering method in the Recollection pathway may underperform on high-dimensional sparse memories.
Long-term memory forgetting and updates: The paper does not explicitly address how to handle outdated or contradictory user memories.
Privacy considerations: Storing and retrieving user history entails privacy risks; the paper does not discuss privacy-preserving mechanisms in depth.
Note on sources: Because the full-text HTML version was unavailable, some experimental details and data points in this summary are inferred from the abstract.

Related Work

Dual-process cognitive theory (Yonelinas, 2002): Familiarity and Recollection are two fundamental processes in human memory recognition; this paper operationalizes the theory into retrieval system design.
RAG (Retrieval-Augmented Generation): RF-Mem can be viewed as an enhanced RAG variant with retrieval strategies specifically optimized for personalization scenarios.
Adaptive Retrieval: Works such as Self-RAG and FLARE study when to retrieve; RF-Mem studies how to retrieve.
Personalized LLMs: Benchmarks such as LaMP and PersonaLLM have advanced the development of personalized LLMs; this paper improves upon the retrieval module in that line of work.

Rating

Novelty: ⭐⭐⭐⭐⭐ — Innovative application of the dual-process theory from cognitive science to LLM retrieval.
Experimental Thoroughness: ⭐⭐⭐⭐ — Three benchmarks, multi-scale evaluation, and ablation analysis.
Writing Quality: ⭐⭐⭐⭐ — Concepts are clearly presented with thorough motivation from an interdisciplinary perspective.
Value: ⭐⭐⭐⭐ — Provides a practical and scalable solution for memory retrieval in personalized LLMs.