
Dynamic Tool Selection: AI Agents That Choose Tools Based on Context

Learn how AI agents select the right tool from a large toolset. Covers tool routing strategies, writing descriptions that guide selection, handling the too-many-tools problem, and building intelligent tool dispatchers.

The Tool Selection Problem

When an agent has 3 tools, the LLM picks the right one almost every time. At 10 tools, accuracy starts to decline. At 50+ tools, the model frequently picks the wrong tool, hallucinates parameters, or calls tools that are irrelevant to the task. This is the too-many-tools problem, and solving it is essential for building agents that work with large toolsets.

The fundamental insight is that tool selection is a search problem. The LLM needs enough information to discriminate between tools, but not so much that it is overwhelmed.

How LLMs Select Tools

When you provide tools to an LLM, the model uses three signals to decide which tool to call:

  1. The tool name — semantic meaning extracted from the function name
  2. The tool description — the primary source of selection guidance
  3. The parameter schema — structural hints about what data the tool expects

The description is by far the most important. A good description acts as a routing instruction.
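To make the three signals concrete, here is what a single tool definition might look like in the OpenAI Chat Completions tool format. The order-status tool itself is a hypothetical example:

```python
# One tool definition in the OpenAI Chat Completions tool format.
# All three selection signals are visible: the name, the description,
# and the parameter schema.
get_order_status_tool = {
    "type": "function",
    "function": {
        "name": "get_order_status",  # signal 1: semantic name
        "description": (             # signal 2: routing guidance
            "Look up the shipping status of a single order by its ID. "
            "Use when the user asks where an order is. "
            "Do NOT use for refunds or order changes."
        ),
        "parameters": {              # signal 3: structural hints
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "Order identifier, e.g. 'ORD-12345'",
                },
            },
            "required": ["order_id"],
        },
    },
}
```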

Writing Descriptions That Discriminate

Each tool description should answer three questions: what the tool does, when it should be used, and when a different tool should be used instead.

# Bad: overlapping, ambiguous descriptions
tools_bad = [
    {"name": "search", "description": "Search for information"},
    {"name": "lookup", "description": "Look up data"},
    {"name": "find", "description": "Find results"},
]

# Good: clear boundaries between tools
tools_good = [
    {
        "name": "search_web",
        "description": "Search the public internet for current information. Use for recent events, general knowledge, or topics not in our internal database. Do NOT use for internal company data."
    },
    {
        "name": "search_knowledge_base",
        "description": "Search the internal company knowledge base for policies, procedures, and documentation. Use for company-specific questions. Do NOT use for general internet searches."
    },
    {
        "name": "search_customer_db",
        "description": "Look up a specific customer by name, email, or ID in the customer database. Use when the user asks about a specific customer's account, orders, or status. Requires at least one identifier."
    },
]

The "Do NOT use for" clause is surprisingly effective. It gives the LLM a negative signal that prevents common misrouting.
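One way to catch overlapping descriptions before they cause misrouting is a lint pass over the toolset. The helper below is a rough heuristic sketch, not a standard API — it flags pairs of descriptions that share most of their words:

```python
def find_overlapping_descriptions(
    tools: list[dict], threshold: float = 0.5
) -> list[tuple[str, str]]:
    """Flag tool pairs whose descriptions share more than
    `threshold` of their words (Jaccard similarity)."""
    flagged = []
    for i, a in enumerate(tools):
        for b in tools[i + 1:]:
            words_a = set(a["description"].lower().split())
            words_b = set(b["description"].lower().split())
            overlap = len(words_a & words_b) / len(words_a | words_b)
            if overlap > threshold:
                flagged.append((a["name"], b["name"]))
    return flagged

# Two near-identical descriptions get flagged; the distinct one does not.
pairs = find_overlapping_descriptions([
    {"name": "search_web", "description": "Search the web for information"},
    {"name": "search_net", "description": "Search the internet for information"},
    {"name": "send_email", "description": "Send an email to a recipient"},
])
# pairs == [("search_web", "search_net")]
```

Word overlap is a crude proxy — the embedding-based approach described later catches semantic overlap too — but it is cheap enough to run in CI on every toolset change.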

Strategy 1: Tool Categories with Pre-Routing

For large toolsets, pre-filter tools based on the conversation context before passing them to the LLM:


from dataclasses import dataclass

@dataclass
class ToolCategory:
    name: str
    description: str
    keywords: list[str]
    tools: list[dict]

class ToolRouter:
    def __init__(self):
        self.categories: list[ToolCategory] = []

    def add_category(self, category: ToolCategory):
        self.categories.append(category)

    def select_tools(self, user_message: str, max_tools: int = 10) -> list[dict]:
        message_lower = user_message.lower()
        scored_categories = []

        for category in self.categories:
            score = sum(
                1 for kw in category.keywords
                if kw.lower() in message_lower
            )
            if score > 0:
                scored_categories.append((score, category))

        scored_categories.sort(key=lambda x: x[0], reverse=True)

        selected_tools = []
        for _, category in scored_categories:
            for tool in category.tools:
                if len(selected_tools) < max_tools:
                    selected_tools.append(tool)

        # Fallback: if no category matched, default to the first category's tools
        if not selected_tools:
            return self.categories[0].tools[:max_tools]

        return selected_tools

Usage:

router = ToolRouter()

router.add_category(ToolCategory(
    name="data_analysis",
    description="Tools for querying and analyzing data",
    keywords=["data", "query", "sql", "analyze", "statistics", "count", "average"],
    tools=[query_db_tool, chart_tool, export_csv_tool],
))

router.add_category(ToolCategory(
    name="communication",
    description="Tools for sending messages and notifications",
    keywords=["send", "email", "message", "notify", "slack", "alert"],
    tools=[send_email_tool, slack_tool, sms_tool],
))

# At runtime, only pass relevant tools to the LLM
relevant_tools = router.select_tools(user_message, max_tools=8)
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=relevant_tools,
)

Strategy 2: Two-Stage Tool Selection

For very large toolsets (50+ tools), use a two-stage approach where the first LLM call selects the tool category, and the second call uses only tools from that category:

async def two_stage_tool_selection(user_message: str, all_categories: list[ToolCategory]):
    # Stage 1: Ask LLM to pick the right category
    category_descriptions = "\n".join(
        f"- {cat.name}: {cat.description}"
        for cat in all_categories
    )

    stage1_response = await client.chat.completions.create(
        model="gpt-4o-mini",  # Cheaper model for routing
        messages=[
            {"role": "system", "content": f"Select the tool category most relevant to the user's request. Available categories:\n{category_descriptions}\n\nRespond with only the category name."},
            {"role": "user", "content": user_message},
        ],
    )

    selected_name = stage1_response.choices[0].message.content.strip()

    # Stage 2: Run agent with only tools from selected category
    selected_category = next(
        (cat for cat in all_categories if cat.name == selected_name),
        all_categories[0]
    )

    return await run_agent(
        user_message,
        tools=selected_category.tools,
        system_prompt="You are a helpful assistant.",
    )

Using a cheaper model (GPT-4o-mini) for routing keeps costs low while ensuring the main agent only sees relevant tools.

Strategy 3: Embedding-Based Tool Selection

For the most sophisticated approach, use embeddings to match user intent to tool descriptions:

import numpy as np

class EmbeddingToolSelector:
    def __init__(self, tools: list[dict]):
        self.tools = tools
        self.embeddings = None

    async def build_index(self):
        descriptions = [
            f"{t['function']['name']}: {t['function']['description']}"
            for t in self.tools
        ]
        response = await client.embeddings.create(
            model="text-embedding-3-small",
            input=descriptions,
        )
        self.embeddings = np.array([e.embedding for e in response.data])

    async def select(self, query: str, top_k: int = 5) -> list[dict]:
        response = await client.embeddings.create(
            model="text-embedding-3-small",
            input=[query],
        )
        query_embedding = np.array(response.data[0].embedding)

        # text-embedding-3-small returns unit-normalized vectors,
        # so a dot product is equivalent to cosine similarity
        similarities = np.dot(self.embeddings, query_embedding)
        top_indices = np.argsort(similarities)[-top_k:][::-1]

        return [self.tools[i] for i in top_indices]

This approach scales to hundreds of tools and handles semantic matching — "show me revenue numbers" correctly routes to the database query tool even without the word "query" appearing.
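The ranking step itself is just a dot product followed by a top-k sort, and can be checked in isolation with toy vectors (the 3-dimensional values below are hypothetical stand-ins for real embeddings):

```python
import numpy as np

# Toy 3-dimensional "embeddings" for three tools. Real embeddings from
# text-embedding-3-small have 1536 dimensions and are unit-normalized,
# so a dot product equals cosine similarity.
tool_embeddings = np.array([
    [1.0, 0.0, 0.0],   # tool 0: e.g. database queries
    [0.0, 1.0, 0.0],   # tool 1: e.g. email sending
    [0.6, 0.8, 0.0],   # tool 2: a mix of both
])
query_embedding = np.array([0.9, 0.1, 0.0])  # query mostly about data

similarities = tool_embeddings @ query_embedding
top_indices = np.argsort(similarities)[-2:][::-1]  # top_k = 2
# similarities ≈ [0.9, 0.1, 0.62], so top_indices == [0, 2]
```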

FAQ

What is the maximum number of tools I should give an LLM at once?

Empirically, most models handle 10-15 tools well. Beyond 20, selection accuracy degrades noticeably. If you have more than 20 tools, use one of the pre-routing strategies described above to narrow the active toolset per conversation turn.

How do I debug tool selection mistakes?

Log the tool calls the LLM makes alongside the user message. Look for patterns: does the model confuse two specific tools? Add "Do NOT use for" clauses to their descriptions. Does it pick the right tool but with wrong parameters? The parameter descriptions need improvement. Track selection accuracy as a metric over time.

Should I fine-tune a model for tool selection?

Only as a last resort. For most applications, better tool descriptions, pre-routing, and the two-stage approach solve selection problems without fine-tuning. Fine-tuning makes sense when you have a very large, domain-specific toolset and can generate training data from production logs.


#ToolSelection #AgentArchitecture #FunctionCalling #AIAgents #AgenticAI #LearnAI #AIEngineering
