
Tool4POI: A Tool-Augmented LLM Framework for Next POI Recommendation

Conference: AAAI 2026 arXiv: 2511.06405 Code: N/A Area: Recommender Systems Keywords: POI Recommendation, Tool-Augmented LLM, Agent, Open-Set Recommendation, Location-Based Services

TL;DR

This paper is the first to introduce the tool-augmented LLM paradigm to the next POI recommendation task. Through three modules—preference extraction, multi-round candidate retrieval, and reranking—the framework enables LLMs to retrieve recommendations from the full POI pool. It achieves over 40% accuracy in Out-of-History (OOH) scenarios (where existing methods yield 0%), with average Acc@5/10 improvements of 20%/30%.

Background & Motivation

Background: Next Point-of-Interest (POI) recommendation is a core task in location-based services. Traditional approaches (RNN/Transformer/GCN) model user trajectory sequences via embedding representations. Recently, the contextual reasoning capability of LLMs has been introduced to this task—e.g., LLM-Mob (in-context learning) and LLM4POI (supervised fine-tuning)—demonstrating potential for understanding spatiotemporal dynamics.

Limitations of Prior Work: LLM-based methods face two fundamental constraints: (1) heavy reliance on historical completeness—they can only recommend POIs the user has previously visited and cannot handle Out-of-History (OOH) scenarios (where users intend to visit places they have never been), which account for over 30% of real-world cases; (2) context window limitations—a city may contain hundreds of thousands of POIs, making it infeasible to encode all candidates into a prompt for open-set recommendation.

Key Challenge: User behavior exhibits both regularity (commuting patterns) and exploratory tendencies (trying new restaurants). Existing LLM methods overfit to visited POIs and cannot support exploratory behavior; fine-tuning approaches (e.g., GNPR-SID) further exacerbate this bias.

Goal: Design a plug-and-play, fine-tuning-free framework that enables LLMs to retrieve recommendations from the full POI pool via external tools, overcoming both the OOH limitation and the large-scale candidate space constraint.

Key Insight: The observation that humans select destinations through progressive filtering (by category, region, and distance) suggests that this narrowing process can be simulated via multi-round tool calls by an LLM agent.

Core Idea: Equip LLMs with external tool-calling capabilities and implement a three-stage pipeline—preference extraction → multi-round tool-based retrieval → reranking—to enable open-set POI recommendation.

Method

Overall Architecture

Tool4POI comprises three modules, all built on Qwen2.5-14B and applicable in a plug-and-play manner without fine-tuning: (1) Preference Extraction Module—extracts user preferences along region, category, and temporal dimensions from long-term check-in history; (2) Tool-Augmented Candidate Retrieval Module—the LLM acts as a retrieval agent, interacting with six external tools across multiple rounds to retrieve relevant candidates from the full POI pool; (3) Reranking Module—reorders candidates based on the user's recent check-in behavior to reflect current intent.

Key Designs

  1. Preference Extraction Module:

    • Function: Extracts a structured representation of long-term user preferences from historical check-in trajectories.
    • Mechanism: A structured prompt is designed to feed the user's temporally ordered check-in sequence (converted to regional codes via Google Maps Plus Codes) into the LLM, which outputs preference keywords along Region, Category, and Time dimensions. Plus Codes aggregate latitude/longitude coordinates into region-level encodings, enabling spatially proximate POIs to share the same code, thereby simplifying geographic feature representation.
    • Design Motivation: Historical data is voluminous yet rich in implicit patterns; the reasoning capability of LLMs is well suited for extracting multi-dimensional preference summaries.
  2. Tool-Augmented Candidate Retrieval Module:

    • Function: Enables the LLM to autonomously retrieve relevant candidates from the full POI pool, overcoming context window limitations.
    • Mechanism: Six external tools are defined: query tools (getPOIinfo for POI metadata retrieval), retrieval tools (filterByCategories/filterByRegions for filtering by category/region), auxiliary tools (findPotential for generating initial candidates via POI-level collaborative filtering; sortByDistance for distance-based ranking), and a control tool (finish to terminate retrieval). The LLM acts as a RetrievalAgent, autonomously determining tool call order and parameters based on extracted preferences. Termination conditions: explicit invocation of finish, candidate set size falling below threshold \(\tau=10\), or reaching the maximum call count \(K=6\).
    • Design Motivation: (1) findPotential introduces collective behavioral priors via a directed co-occurrence graph \(G=(\mathcal{P}, \mathcal{E})\), enabling OOH POIs to be retrieved; (2) multi-round interaction simulates the human decision-making process, progressively narrowing the candidate space to produce a high-quality candidate set.
  3. Reranking Module:

    • Function: Reranks retrieved candidates based on the user's recent behavior to capture short-term intent.
    • Mechanism: The user's recent check-in trajectory \(R_u\), the target timestamp \(t_{i+1}\), and the candidate set \(\mathcal{C}\) are jointly fed into the LLM, which ranks candidate POIs by visit likelihood based on recent behavioral patterns, using natural language reasoning rather than embedding similarity.
    • Design Motivation: Preference extraction captures long-term interests while reranking captures short-term dynamics (seasonal changes, life-stage transitions, etc.); the two are complementary.
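The Plus Code region aggregation used by the preference extraction module (item 1 above) can be sketched with a minimal encoder. This is an illustrative implementation of the standard Open Location Code digit scheme only (base-20 alphabet, 20-degree first-pair resolution); it ignores latitude clipping and the post-`+` refinement digits, which the official open-location-code libraries handle. Truncating a code yields a coarser region, so nearby POIs share the same prefix:

```python
OLC_ALPHABET = "23456789CFGHJMPQRVWX"  # base-20 digit set used by Plus Codes

def plus_code(lat, lng, length=8):
    """Encode a coordinate into the first `length` Plus Code digits (pairs only)."""
    lat, lng = lat + 90.0, lng + 180.0  # shift both axes to non-negative ranges
    code, res = "", 20.0                # the first digit pair spans 20 degrees
    for _ in range(length // 2):
        code += OLC_ALPHABET[int(lat // res) % 20]  # latitude digit first,
        code += OLC_ALPHABET[int(lng // res) % 20]  # then longitude digit
        lat, lng = lat % res, lng % res
        res /= 20.0                     # each pair refines resolution 20x
    return code

def region_code(lat, lng, digits=6):
    """Truncate to a coarse region code so spatially proximate POIs share it."""
    return plus_code(lat, lng, 8)[:digits]
```

For example, two check-ins a block apart in lower Manhattan map to the same 6-digit region code even though their full codes differ, which is exactly the aggregation the module exploits.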

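The multi-round retrieval loop of module 2 can be sketched with stubbed tools over a toy POI pool. The six tool names, the threshold \(\tau=10\), and the call cap \(K=6\) come from the paper; everything else here (the toy data, the `plan_step` stand-in for the LLM's tool-choice policy) is a hypothetical illustration, not the actual implementation:

```python
# Toy check-in sequences and POI pool (category, region code, lat, lng);
# illustrative data only, not the paper's datasets.
ALL_SEQUENCES = [["p1", "p3", "p4"], ["p1", "p4", "p5"], ["p2", "p4"], ["p1", "p4"]]
POIS = {
    "p1": ("cafe",       "87G7PX", 40.713, -74.006),
    "p2": ("cafe",       "87G7PX", 40.714, -74.004),
    "p3": ("museum",     "87G7PX", 40.712, -74.010),
    "p4": ("cafe",       "87G8Q2", 40.758, -73.985),
    "p5": ("restaurant", "87G8Q2", 40.757, -73.986),
}

# Stubs of the paper's six tools over the toy pool.
def getPOIinfo(pid):                return POIS[pid]
def filterByCategories(cand, cats): return [p for p in cand if POIS[p][0] in cats]
def filterByRegions(cand, regs):    return [p for p in cand if POIS[p][1] in regs]
def sortByDistance(cand, lat, lng):
    return sorted(cand, key=lambda p: (POIS[p][2] - lat) ** 2 + (POIS[p][3] - lng) ** 2)

def findPotential(history, k=20):
    """POI-level collaborative filtering over a directed co-occurrence graph:
    edge (a, b) counts how often POI b follows POI a in any user's sequence."""
    edges = {}
    for seq in ALL_SEQUENCES:
        for a, b in zip(seq, seq[1:]):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    scores = {}
    for p in set(history):
        for (a, b), w in edges.items():
            if a == p:
                scores[b] = scores.get(b, 0) + w
    return sorted(scores, key=scores.get, reverse=True)[:k]

def plan_step(preferences, step):
    # Stand-in for the LLM's tool choice: category, then region, then finish.
    plan = [("filterByCategories", preferences["categories"]),
            ("filterByRegions", preferences["regions"]),
            ("finish", None)]
    return plan[min(step, 2)]

def retrieval_agent(preferences, history, tau=10, K=6):
    """Multi-round narrowing loop with the paper's stopping rules."""
    cand = findPotential(history)   # seed candidates from the collective prior
    for step in range(K):           # hard cap of K = 6 tool calls
        tool, args = plan_step(preferences, step)
        if tool == "finish" or len(cand) < tau:  # stop once below tau = 10
            break
        cand = {"filterByCategories": filterByCategories,
                "filterByRegions": filterByRegions}[tool](cand, args)
    return cand
```

With this toy data, `findPotential(["p1"])` seeds candidates that include OOH POIs the user never visited; because the seed set is already below \(\tau\) the loop stops immediately, while a lower `tau` exercises the filtering rounds.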
Loss & Training

Tool4POI is entirely training-free: no module is fine-tuned. At inference time the three modules execute sequentially, and the top-20 candidates produced by the retrieval module are passed to the reranking module.
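This sequential inference flow can be sketched as a straight composition. The `llm` function, the prompt strings, and the helper names below are hypothetical stand-ins (a real deployment would call Qwen2.5-14B); only the module order and the top-20 handoff come from the paper:

```python
def llm(prompt):
    # Stand-in for a Qwen2.5-14B call. This toy version returns canned answers
    # so the pipeline runs end to end; swap in a real model client in practice.
    if prompt.startswith("Given"):
        return {"regions": ["87G7PX"], "categories": ["cafe"], "times": ["morning"]}
    return ["p2", "p1"]  # canned "ranking" for the sketch

def extract_preferences(history):
    # Module 1: structured prompt over the time-ordered, region-encoded history.
    prompt = ("Given the check-in sequence below, summarize the user's preferred "
              f"regions, categories, and active times.\n{history}")
    return llm(prompt)

def rerank(recent_checkins, target_time, candidates):
    # Module 3: natural-language ranking by visit likelihood,
    # rather than embedding similarity.
    prompt = (f"Recent check-ins: {recent_checkins}\nTarget time: {target_time}\n"
              f"Rank these candidates by visit likelihood: {candidates}")
    return llm(prompt)

def recommend(history, recent_checkins, target_time, retrieval_agent, top_k=10):
    prefs = extract_preferences(history)               # long-term preferences
    candidates = retrieval_agent(prefs, history)[:20]  # top-20 passed to reranking
    return rerank(recent_checkins, target_time, candidates)[:top_k]
```

Any retrieval function with the signature `retrieval_agent(preferences, history)` can be plugged in, which reflects the framework's plug-and-play design.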

Key Experimental Results

Main Results

| Method | NYC Acc@5 | NYC Acc@10 | TKY Acc@5 | TKY Acc@10 | CA Acc@5 | CA Acc@10 |
|---|---|---|---|---|---|---|
| Tool4POI | 0.6346 | 0.7623 | best | best | best | best |
| GNPR-SID (FT LLM) | low | low | low | low | low | low |
| GETNext | 0.4815 | 0.5811 | 0.4045 | 0.4961 | 0.3278 | 0.3946 |
| STAN | 0.4582 | 0.5734 | 0.3798 | 0.4464 | 0.2348 | 0.3018 |

("best"/"low" summarize relative performance where exact figures were not transcribed from the paper.)

Ablation Study

| Configuration | All Acc@1 | All Acc@10 | OOH Acc@1 | OOH Acc@10 |
|---|---|---|---|---|
| Tool4POI (Full) | 0.3164 | 0.7623 | 0.0522 | 0.5863 |
| w/o Retrieval Module | 0.2545 | 0.5559 | 0 | 0 |
| w/o Reranking Module | 0.1655 | 0.7145 | 0.0963 | 0.6024 |

Key Findings

  • Existing LLM methods achieve 0% accuracy in OOH scenarios, while Tool4POI reaches 40%+ Acc@10, demonstrating the critical value of tool-augmented retrieval.
  • The retrieval module contributes most to top-k recommendation (by introducing diverse candidates), while the reranking module contributes most to top-1 precision (by capturing current intent).
  • The most significant improvements are observed on the sparse dataset CA (averaging only 10 check-ins per user), with gains as high as 100%, demonstrating robustness in data-sparse settings.
  • Model-scale effects: even with a 3B backbone, Tool4POI outperforms fine-tuned 7B methods, indicating that tool augmentation matters more than raw model scale.

Highlights & Insights

  • This is the first work to bring the agent tool-calling paradigm to next POI recommendation, pioneering the integration of recommender systems with LLM agents. The findPotential tool, which leverages a POI co-occurrence graph as a collective prior, is a particularly elegant design.
  • The training-free, plug-and-play design allows direct deployment across any city or dataset without retraining, making it highly practical.
  • The iterative candidate narrowing via multi-round interaction mirrors the cognitive process humans use when selecting destinations, and this paradigm is transferable to other open-set recommendation scenarios (e.g., product recommendation, content recommendation).

Limitations & Future Work

  • Retrieval quality depends on the LLM's tool-calling accuracy; smaller models may produce errors that break the tool chain.
  • The six-tool design is relatively hand-crafted; additional domain-specific tools (e.g., weather queries, event calendars) may further improve performance.
  • In OOH scenarios, reranking may actually degrade performance due to lack of contextual information; adaptive decisions on whether to perform reranking could be considered.
  • Inference latency is high due to multi-round LLM calls, which requires optimization for real-time recommendation settings.
Comparison with Prior Methods

  • vs. LLM4POI: Supervised fine-tuning of LLMs for QA-style recommendation leads to severe overfitting toward high-frequency POIs. Tool4POI requires no training and retrieves open-set candidates via tools.
  • vs. GNPR-SID: Uses semantic IDs to train LLMs but still cannot recommend OOH POIs. Tool4POI's findPotential tool leverages collective behavior to transcend individual history limitations.
  • vs. Traditional Methods (e.g., GETNext): Fixed embedding representations lack flexibility. Tool4POI leverages LLM reasoning for context-aware recommendation.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First tool-augmented LLM POI recommendation framework; groundbreaking resolution of the OOH problem.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive evaluation on three datasets with in-depth IH/OOH analysis.
  • Writing Quality: ⭐⭐⭐⭐ — Clear method description with complete algorithmic pseudocode.
  • Value: ⭐⭐⭐⭐⭐ — Pioneering contribution to the intersection of recommender systems and LLM agents.