AutoTool: Efficient Tool Selection for Large Language Model Agents¶
Conference: AAAI 2026 · arXiv: 2511.14650 · Code: GitHub · Area: Agent · Keywords: tool selection, LLM agent efficiency, tool usage inertia, graph-based planning, inference cost reduction
TL;DR¶
This paper proposes AutoTool, a graph-based tool selection framework that exploits tool usage inertia to construct a Tool Inertia Graph (TIG). By leveraging statistical structure, AutoTool bypasses redundant LLM inference for tool selection and parameter filling, reducing inference overhead by up to 30% while maintaining task completion rates.
Background & Motivation¶
- Background: LLM agents have become powerful tools for automating complex tasks; frameworks such as ReAct drive multi-step decision-making through think–act–observe cycles.
- Limitations of Prior Work: Existing frameworks rely on LLM inference at every step for tool selection, incurring high computational overhead and latency, particularly in multi-step tasks with numerous LLM calls.
- Key Challenge: Not every decision step requires the full reasoning capacity of an LLM; many tool invocations occur in highly patterned contexts, so current approaches over-rely on LLM inference.
- Key Observation: The authors identify the phenomenon of tool usage inertia: tool invocations follow predictable sequential patterns. For example, in ScienceWorld, `go_to` is followed by `look_around` in 88.7% of cases.
- Theoretical Validation: A \(k\)-th order Markov chain analysis shows that conditional entropy decreases from 3.50 bits (0th order) to 2.52 bits (1st order) to 1.93 bits (2nd order), with likelihood ratio tests yielding \(p < .001\), confirming sequential dependency (see the formula after this list).
- Core Idea: A graph structure captures statistical regularities in tool invocations; tools are selected directly under high-confidence conditions, with fallback to LLM inference when uncertainty is high.
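For reference, the conditional entropy in this analysis is the standard \(k\)-th order quantity (notation mine):

\[
H\bigl(T_t \mid T_{t-1}, \dots, T_{t-k}\bigr) = -\sum_{h}\sum_{t} P(h, t)\,\log_2 P(t \mid h),
\]

where \(h\) ranges over length-\(k\) tool histories. Dropping from 3.50 to 1.93 bits shrinks the effective number of plausible next tools from roughly \(2^{3.50} \approx 11\) to \(2^{1.93} \approx 4\).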
Method¶
Overall Architecture¶
Before each standard LLM invocation, AutoTool first attempts an inertia call comprising two stages: ① Inertia Sensing predicts the next tool; ② Parameter Filling automates parameter assignment. The LLM is bypassed only when both stages succeed; otherwise, the system falls back to standard inference.
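A minimal sketch of this control flow, with illustrative function and attribute names (not the authors' released code):

```python
def agent_step(state, tig, llm, theta_inertial):
    """One decision step: try an inertia call first, fall back to the LLM."""
    # Stage 1: Inertia Sensing -- score the most likely next tool from the TIG.
    tool, score = tig.best_next_tool(state.last_tool, state.rationale)
    if tool is not None and score > theta_inertial:
        # Stage 2: Parameter Filling -- resolve all arguments without the LLM.
        params = fill_parameters(tool, state, tig)
        if params is not None:
            return tool, params  # LLM inference bypassed for this step
    # Fallback: standard LLM-based tool selection and parameter generation.
    return llm.select_tool_and_params(state)
```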
Module 1: Tool Inertia Graph (TIG) Construction¶
- Hierarchical Node Structure: Tool Nodes encode functional descriptions and execution states; each Tool Node embeds a sub-graph of Parameter Nodes.
- Two Types of Directed Edges:
- Tool Sequence Edges: connect Tool Nodes and encode sequential dependencies.
- Parameter Dependency Edges: connect Parameter Nodes and model inter-tool data flow.
- Online Incremental Construction: the graph is learned dynamically from historical execution trajectories; edge weights are updated via positive/negative reinforcement based on success/failure feedback.
- Edge weights are reinforced only from high-confidence LLM-generated sequences, preventing error propagation from inertia calls.
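A minimal sketch of the edge-weight update, assuming a simple additive reward/penalty scheme; the paper's exact update rule and constants may differ:

```python
from collections import defaultdict

class ToolInertiaGraph:
    """Tool Sequence Edges only; Parameter Nodes and their edges are omitted for brevity."""

    def __init__(self, reward=1.0, penalty=0.5):
        # (prev_tool, next_tool) -> accumulated edge weight
        self.edge_weight = defaultdict(float)
        self.reward, self.penalty = reward, penalty

    def update(self, prev_tool, next_tool, succeeded, from_llm):
        # Only high-confidence LLM-generated transitions reinforce the graph,
        # so mistakes made during inertia calls cannot propagate into edge weights.
        if not from_llm:
            return
        delta = self.reward if succeeded else -self.penalty
        key = (prev_tool, next_tool)
        self.edge_weight[key] = max(0.0, self.edge_weight[key] + delta)
```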
Module 2: Inertia Sensing (CIPS Score)¶
The Comprehensive Inertia Potential Score (CIPS) is computed from two signals:
- Frequency Score: historical usage frequency derived from TIG edge weights.
- Contextual Score: semantic similarity between the current agent rationale and candidate tool descriptions, computed via SimCSE.
- The parameter filling stage is entered only when \(\text{CIPS}(v^*) > \theta_{\text{inertial}}\).
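Assuming a simple convex combination of the two signals (the paper's exact weighting may differ), the score for a candidate tool \(v\) takes the form

\[
\text{CIPS}(v) = \lambda \, S_{\text{freq}}(v) + (1-\lambda)\, S_{\text{ctx}}(v), \qquad v^* = \arg\max_{v}\, \text{CIPS}(v),
\]

where \(S_{\text{freq}}\) is derived from normalized TIG edge weights, \(S_{\text{ctx}}\) is the SimCSE similarity between the agent's rationale and the tool description, and \(\lambda\) is a mixing weight introduced here for illustration.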
Module 3: Hierarchical Parameter Filling¶
A strictly prioritized, LLM-free parameter filling strategy:
1. Dependency Backtracking: traverses Parameter Dependency Edges in the TIG to retrieve parameter values from the outputs of preceding tools.
2. Environment State Matching: extracts values from key states maintained by the agent (e.g., current location).
3. Heuristic Filling: infers values from the current state or task goal.

If any parameter cannot be resolved, the inertia call is abandoned and the LLM is invoked.
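A minimal sketch of this strict priority order, expanding the `fill_parameters` helper referenced in the earlier control-flow sketch (helper names such as `heuristic_fill` are illustrative, not the authors' implementation):

```python
def fill_parameters(tool, state, tig):
    """Resolve every parameter without the LLM; return None if any value is missing."""
    values = {}
    for param in tool.parameters:
        # 1. Dependency Backtracking: follow Parameter Dependency Edges in the TIG
        #    to reuse outputs produced by preceding tools.
        value = tig.backtrack_dependency(param)
        # 2. Environment State Matching: read key states the agent maintains
        #    (e.g., current location).
        if value is None:
            value = state.lookup(param)
        # 3. Heuristic Filling: infer the value from the current state or task goal
        #    (illustrative helper).
        if value is None:
            value = heuristic_fill(param, state)
        if value is None:
            return None  # abandon the inertia call; the LLM will be invoked instead
        values[param.name] = value
    return values
```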
Safety Constraints¶
- Inertia calls are capped at 30% of total operations.
- Consecutive inertia calls are prohibited.
- A fault-tolerance mechanism triggers a recovery path upon detection of consecutive tool failures.
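The first two constraints can be expressed as a simple admission check (a sketch; the recovery-path logic for consecutive failures is omitted):

```python
def inertia_call_allowed(inertia_calls, total_calls, last_was_inertia, cap=0.30):
    """Permit an inertia call only under the 30% budget cap and never twice in a row."""
    under_cap = inertia_calls < cap * max(total_calls, 1)
    return under_cap and not last_was_inertia
```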
Key Experimental Results¶
Main Results: Efficiency Gains (ReAct + AutoTool)¶
| Dataset | Progress Rate (AutoTool vs. ReAct baseline) | Token-In Speedup | Token-Out Speedup | LLM Call Speedup |
|---|---|---|---|---|
| AlfWorld | 0.531 (↑ vs 0.394) | 1.60× | 2.87× | 1.18× |
| ScienceWorld | 0.708 (≈ 0.716) | 1.30× | 1.41× | 1.31× |
| ToolQuery-Academic | 0.895 (≈ 0.901) | 1.15× | 0.92× | 1.20× |
Main Results: Efficiency Gains (Reflexion + AutoTool)¶
| Dataset | Progress Rate (AutoTool vs. Reflexion baseline) | Token-In Speedup | Token-Out Speedup | LLM Call Speedup |
|---|---|---|---|---|
| AlfWorld | 0.453 (≈ 0.481) | 1.33× | 1.20× | 1.29× |
| ScienceWorld | 0.712 (≈ 0.730) | 0.93× | 1.20× | 1.28× |
| ToolQuery-Academic | 0.923 (↑ vs 0.917) | 1.33× | 1.19× | 1.26× |
Overhead Analysis¶
| Module | Time Overhead |
|---|---|
| Semantic computation (SimCSE embedding + similarity) | 2.7% ± 1.5% of total task time |
| Non-semantic modules (graph construction/search/parsing) | On the order of seconds; negligible |
Highlights & Insights¶
- Discovery and Quantification of Tool Usage Inertia: This work is the first to systematically validate sequential dependencies in tool invocations using information theory (conditional entropy) and statistical hypothesis testing, providing a rigorous theoretical foundation for the proposed method.
- Novel LLM Offloading Perspective: Rather than optimizing the LLM itself, the paper identifies which decisions do not require LLM reasoning, embodying the key insight that not every step demands full inference.
- Parameter-Level Data Flow Modeling: Beyond modeling tool sequences, the framework tracks parameter flow across tools to enable automatic parameter filling.
- Plug-and-Play Compatibility: AutoTool can directly augment existing frameworks such as ReAct and Reflexion without any fine-tuning.
Limitations & Future Work¶
- Cold Start: Online graph construction requires accumulated trajectories; inertia predictions are less reliable in early stages.
- The inertia window is fixed at 2; dynamic adjustment is not explored.
- Evaluation is limited to 3 datasets and does not cover more complex real-world API calling scenarios.
- The semantic similarity module (SimCSE) introduces an additional model dependency.
- The 30% inertia call cap is conservative and may limit further efficiency gains.
Related Work & Insights¶
- Compared with tool selection methods such as ToolNet and AnyTool, AutoTool is the only approach that simultaneously achieves LLM offloading, inertia sensing, and parameter flow modeling.
- The approach shares conceptual similarity with Agent Workflow Memory (learning from historical interactions), but AutoTool focuses on statistical patterns at the tool level.
- The inertia graph concept is generalizable to other agent scenarios requiring repetitive decision-making, such as code generation and data analysis pipelines.
Rating¶
- Novelty: ⭐⭐⭐⭐ Tool usage inertia is a novel and well-supported empirical finding; the graph-based design is well-motivated.
- Experimental Thoroughness: ⭐⭐⭐⭐ Three datasets with multi-model validation and detailed overhead/sensitivity analyses, though dataset scale is limited.
- Writing Quality: ⭐⭐⭐⭐ Motivation is clearly articulated, theoretical analysis is rigorous, and figures are informative.
- Value: ⭐⭐⭐⭐ Offers a new direction for agent efficiency optimization with strong practical applicability.