Tool Use in AI Agents: Extending LLM Capabilities with External Functions

Why Tools Are the Bridge Between Thinking and Doing

An LLM without tools is a brain without hands. It can reason, analyze, and generate text — but it cannot check the weather, query a database, send an email, or read a file. Tools are what turn a language model from a conversationalist into an agent that can affect the real world.

Tool use (also called function calling) is the mechanism by which an LLM requests the execution of an external function. The model does not run the function itself — it generates a structured request (function name + arguments), your code executes it, and the result is fed back into the model's context.

The Tool Execution Flow

Understanding the exact flow of a tool call is essential for debugging and designing reliable agents.

flowchart TD
    USER(["User message"])
    LLM["LLM call<br/>with tools schema"]
    DECIDE{"Model wants<br/>to call a tool?"}
    EXEC["Execute tool<br/>sandboxed runtime"]
    RESULT["Append tool_result<br/>to messages"]
    GUARD{"Output passes<br/>guardrails?"}
    DONE(["Final reply"])
    BLOCK(["Refuse and log"])
    USER --> LLM --> DECIDE
    DECIDE -->|Yes| EXEC --> RESULT --> LLM
    DECIDE -->|No| GUARD
    GUARD -->|Yes| DONE
    GUARD -->|No| BLOCK
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style EXEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style DONE fill:#059669,stroke:#047857,color:#fff
    style BLOCK fill:#dc2626,stroke:#b91c1c,color:#fff

1. LLM receives messages + tool definitions
2. LLM decides to call a tool (instead of responding with text)
3. LLM outputs: {"tool": "search_db", "args": {"query": "overdue invoices"}}
4. Your code intercepts this, executes search_db(query="overdue invoices")
5. Your code appends the result as a tool message
6. LLM receives the result and decides what to do next
7. Repeat until LLM responds with text (no tool call)

The critical insight is that the LLM never executes anything. It only generates the intent to use a tool. Your application code is the executor, which means you have full control over permissions, validation, and error handling.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

Designing Effective Tools

Tool quality directly determines agent quality. A poorly designed tool confuses the LLM and leads to wrong arguments, unnecessary calls, or missed opportunities to use the right tool.

Good Tool Design Principles

# GOOD: Clear name, specific description, well-typed parameters
{
    "type": "function",
    "function": {
        "name": "search_invoices",
        "description": (
            "Search for invoices by status, client name, or date range. "
            "Returns up to 20 matching invoices with amount, status, and due date."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "status": {
                    "type": "string",
                    "enum": ["paid", "overdue", "pending", "cancelled"],
                    "description": "Filter by invoice status",
                },
                "client_name": {
                    "type": "string",
                    "description": "Partial or full client name to search for",
                },
                "due_before": {
                    "type": "string",
                    "description": "ISO date string. Return invoices due before this date.",
                },
            },
        },
    },
}

# BAD: Vague name, no description, untyped parameters
{
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for stuff",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {"type": "string"},
            },
        },
    },
}

The description is the most important field. The LLM reads it to decide when and how to use the tool. Write descriptions as if you were explaining the tool to a new team member — be specific about what it does, what it returns, and any limitations.

Building a Tool Registry

In production, you need a systematic way to register, discover, and execute tools. Here is a clean pattern:

from typing import Callable, Any
import json
import inspect

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}
        self._executors: dict[str, Callable] = {}

    def register(self, func: Callable, description: str, parameters: dict):
        name = func.__name__
        self._tools[name] = {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
        }
        self._executors[name] = func

    def get_tool_definitions(self) -> list[dict]:
        return list(self._tools.values())

    def execute(self, name: str, arguments: dict) -> Any:
        if name not in self._executors:
            return {"error": f"Unknown tool: {name}"}
        try:
            return self._executors[name](**arguments)
        except Exception as e:
            return {"error": f"Tool execution failed: {str(e)}"}

# Usage
registry = ToolRegistry()

def get_weather(city: str, units: str = "celsius") -> dict:
    # In production, call a real weather API
    return {"city": city, "temperature": 22, "units": units, "condition": "sunny"}

registry.register(
    func=get_weather,
    description="Get current weather for a city. Returns temperature and conditions.",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units (default: celsius)",
            },
        },
        "required": ["city"],
    },
)

Error Handling in Tool Execution

Tools fail. APIs time out, databases go down, users pass invalid arguments. How you handle tool errors determines whether your agent recovers gracefully or spirals into confusion.

def safe_execute_tool(registry: ToolRegistry, name: str, raw_args: str) -> str:
    """Execute a tool with comprehensive error handling."""
    # Parse arguments
    try:
        arguments = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return json.dumps({
            "error": "Invalid arguments format",
            "details": str(e),
            "suggestion": "Please provide valid JSON arguments",
        })

    # Execute with timeout protection
    try:
        result = registry.execute(name, arguments)
        return json.dumps(result, default=str)
    except TimeoutError:
        return json.dumps({
            "error": f"Tool '{name}' timed out",
            "suggestion": "Try again with a simpler query or different parameters",
        })
    except Exception as e:
        return json.dumps({
            "error": f"Tool '{name}' failed: {str(e)}",
            "suggestion": "Check the arguments and try again",
        })

The key insight is to always return structured error messages to the LLM, not raw exceptions. Include a suggestion field — it guides the LLM toward recovery instead of just repeating the same failing call.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Tool Permissions and Safety

Not all tools should be available to all agents. A customer-facing agent should not have access to delete_database. Implement tool-level permissions:

class PermissionedToolRegistry(ToolRegistry):
    def __init__(self):
        super().__init__()
        self._permissions: dict[str, str] = {}  # tool_name -> permission level

    def register(self, func, description, parameters, permission="read"):
        super().register(func, description, parameters)
        self._permissions[func.__name__] = permission

    def get_tools_for_level(self, level: str) -> list[dict]:
        levels = {"read": 0, "write": 1, "admin": 2}
        max_level = levels.get(level, 0)
        return [
            self._tools[name]
            for name, perm in self._permissions.items()
            if levels.get(perm, 0) <= max_level
        ]

FAQ

How many tools should an agent have access to?

Keep it under 20 for most agents. Research shows that LLM tool selection accuracy degrades as the number of available tools increases. If you need more, use a router pattern — a first LLM call selects the relevant tool category, then a second call picks the specific tool from a smaller set.

Should tool descriptions include examples?

Yes, especially for tools with complex parameters. Including a brief example in the description (like "Example: search_invoices(status='overdue', client_name='Acme')") significantly improves the LLM's ability to construct correct arguments.

How do I test tools independently from the agent?

Write unit tests for each tool function that verify correct outputs for valid inputs and proper error handling for invalid inputs. Then write integration tests that run the full agent loop with mock tool responses to verify the agent calls tools correctly. Test tools in isolation before testing them within the agent.

#ToolUse #FunctionCalling #AIAgents #Python #APIDesign #AgenticAI #LearnAI #AIEngineering

Tool Use in AI Agents: Extending LLM Capabilities with External Functions

Why Tools Are the Bridge Between Thinking and Doing

The Tool Execution Flow

Designing Effective Tools

Good Tool Design Principles

Building a Tool Registry

Error Handling in Tool Execution

Tool Permissions and Safety

FAQ

How many tools should an agent have access to?

Should tool descriptions include examples?

How do I test tools independently from the agent?

Try CallSphere AI Voice Agents

Related Articles You May Like

Personal AI Assistant: How to Pick One for Business in 2026

Free AI Agents in 2026: When Free Wins and When It Costs You

Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)

Chatbot App vs ChatGPT: What's the Difference, and Which Do I Need?

Building an HVAC After-Hours Emergency Escalation System: A Complete Engineering Guide

GPT-Realtime-2 Tool Use and Reasoning: GPT-5-Class Voice Agents