Skip to content
Tool Use in AI Agents: Extending LLM Capabilities with External Functions
Learn Agentic AI11 min read18 views

Tool Use in AI Agents: Extending LLM Capabilities with External Functions

Master the design and implementation of tools for AI agents — why tools matter, how to write effective tool descriptions, execution flow, error handling, and best practices for production tool systems.

Why Tools Are the Bridge Between Thinking and Doing

An LLM without tools is a brain without hands. It can reason, analyze, and generate text — but it cannot check the weather, query a database, send an email, or read a file. Tools are what turn a language model from a conversationalist into an agent that can affect the real world.

Tool use (also called function calling) is the mechanism by which an LLM requests the execution of an external function. The model does not run the function itself — it generates a structured request (function name + arguments), your code executes it, and the result is fed back into the model's context.

The Tool Execution Flow

Understanding the exact flow of a tool call is essential for debugging and designing reliable agents.

flowchart TD
    USER(["User message"])
    LLM["LLM call<br/>with tools schema"]
    DECIDE{"Model wants<br/>to call a tool?"}
    EXEC["Execute tool<br/>sandboxed runtime"]
    RESULT["Append tool_result<br/>to messages"]
    GUARD{"Output passes<br/>guardrails?"}
    DONE(["Final reply"])
    BLOCK(["Refuse and log"])
    USER --> LLM --> DECIDE
    DECIDE -->|Yes| EXEC --> RESULT --> LLM
    DECIDE -->|No| GUARD
    GUARD -->|Yes| DONE
    GUARD -->|No| BLOCK
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style EXEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style DONE fill:#059669,stroke:#047857,color:#fff
    style BLOCK fill:#dc2626,stroke:#b91c1c,color:#fff
1. LLM receives messages + tool definitions
2. LLM decides to call a tool (instead of responding with text)
3. LLM outputs: {"tool": "search_db", "args": {"query": "overdue invoices"}}
4. Your code intercepts this, executes search_db(query="overdue invoices")
5. Your code appends the result as a tool message
6. LLM receives the result and decides what to do next
7. Repeat until LLM responds with text (no tool call)

The critical insight is that the LLM never executes anything. It only generates the intent to use a tool. Your application code is the executor, which means you have full control over permissions, validation, and error handling.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Designing Effective Tools

Tool quality directly determines agent quality. A poorly designed tool confuses the LLM and leads to wrong arguments, unnecessary calls, or missed opportunities to use the right tool.

Good Tool Design Principles

# GOOD: Clear name, specific description, well-typed parameters
{
    "type": "function",
    "function": {
        "name": "search_invoices",
        "description": (
            "Search for invoices by status, client name, or date range. "
            "Returns up to 20 matching invoices with amount, status, and due date."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "status": {
                    "type": "string",
                    "enum": ["paid", "overdue", "pending", "cancelled"],
                    "description": "Filter by invoice status",
                },
                "client_name": {
                    "type": "string",
                    "description": "Partial or full client name to search for",
                },
                "due_before": {
                    "type": "string",
                    "description": "ISO date string. Return invoices due before this date.",
                },
            },
        },
    },
}

# BAD: Vague name, no description, untyped parameters
{
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for stuff",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {"type": "string"},
            },
        },
    },
}

The description is the most important field. The LLM reads it to decide when and how to use the tool. Write descriptions as if you were explaining the tool to a new team member — be specific about what it does, what it returns, and any limitations.

Building a Tool Registry

In production, you need a systematic way to register, discover, and execute tools. Here is a clean pattern:

from typing import Callable, Any
import json
import inspect

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}
        self._executors: dict[str, Callable] = {}

    def register(self, func: Callable, description: str, parameters: dict):
        name = func.__name__
        self._tools[name] = {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
        }
        self._executors[name] = func

    def get_tool_definitions(self) -> list[dict]:
        return list(self._tools.values())

    def execute(self, name: str, arguments: dict) -> Any:
        if name not in self._executors:
            return {"error": f"Unknown tool: {name}"}
        try:
            return self._executors[name](**arguments)
        except Exception as e:
            return {"error": f"Tool execution failed: {str(e)}"}

# Usage
registry = ToolRegistry()

def get_weather(city: str, units: str = "celsius") -> dict:
    # In production, call a real weather API
    return {"city": city, "temperature": 22, "units": units, "condition": "sunny"}

registry.register(
    func=get_weather,
    description="Get current weather for a city. Returns temperature and conditions.",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units (default: celsius)",
            },
        },
        "required": ["city"],
    },
)

Error Handling in Tool Execution

Tools fail. APIs time out, databases go down, users pass invalid arguments. How you handle tool errors determines whether your agent recovers gracefully or spirals into confusion.

def safe_execute_tool(registry: ToolRegistry, name: str, raw_args: str) -> str:
    """Execute a tool with comprehensive error handling."""
    # Parse arguments
    try:
        arguments = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return json.dumps({
            "error": "Invalid arguments format",
            "details": str(e),
            "suggestion": "Please provide valid JSON arguments",
        })

    # Execute with timeout protection
    try:
        result = registry.execute(name, arguments)
        return json.dumps(result, default=str)
    except TimeoutError:
        return json.dumps({
            "error": f"Tool '{name}' timed out",
            "suggestion": "Try again with a simpler query or different parameters",
        })
    except Exception as e:
        return json.dumps({
            "error": f"Tool '{name}' failed: {str(e)}",
            "suggestion": "Check the arguments and try again",
        })

The key insight is to always return structured error messages to the LLM, not raw exceptions. Include a suggestion field — it guides the LLM toward recovery instead of just repeating the same failing call.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Tool Permissions and Safety

Not all tools should be available to all agents. A customer-facing agent should not have access to delete_database. Implement tool-level permissions:

class PermissionedToolRegistry(ToolRegistry):
    def __init__(self):
        super().__init__()
        self._permissions: dict[str, str] = {}  # tool_name -> permission level

    def register(self, func, description, parameters, permission="read"):
        super().register(func, description, parameters)
        self._permissions[func.__name__] = permission

    def get_tools_for_level(self, level: str) -> list[dict]:
        levels = {"read": 0, "write": 1, "admin": 2}
        max_level = levels.get(level, 0)
        return [
            self._tools[name]
            for name, perm in self._permissions.items()
            if levels.get(perm, 0) <= max_level
        ]

FAQ

How many tools should an agent have access to?

Keep it under 20 for most agents. Research shows that LLM tool selection accuracy degrades as the number of available tools increases. If you need more, use a router pattern — a first LLM call selects the relevant tool category, then a second call picks the specific tool from a smaller set.

Should tool descriptions include examples?

Yes, especially for tools with complex parameters. Including a brief example in the description (like "Example: search_invoices(status='overdue', client_name='Acme')") significantly improves the LLM's ability to construct correct arguments.

How do I test tools independently from the agent?

Write unit tests for each tool function that verify correct outputs for valid inputs and proper error handling for invalid inputs. Then write integration tests that run the full agent loop with mock tool responses to verify the agent calls tools correctly. Test tools in isolation before testing them within the agent.


#ToolUse #FunctionCalling #AIAgents #Python #APIDesign #AgenticAI #LearnAI #AIEngineering

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

AI Agents

Personal AI Assistant: How to Pick One for Business in 2026

A founder's guide to the personal AI assistant market: best AI assistant apps, business-grade options, and how CallSphere's voice agent fits in.

AI Agents

Free AI Agents in 2026: When Free Wins and When It Costs You

A founder's guide to free AI agents, low-code AI agent builders, and how to know when you should pay for a real platform like CallSphere.

Agentic AI

Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)

Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs.

AI Agents

Chatbot App vs ChatGPT: What's the Difference, and Which Do I Need?

Chatbot app vs ChatGPT in 2026: a founder's clear take on the difference, when to use which, and how a real AI chatbot app development works.

HVAC

Building an HVAC After-Hours Emergency Escalation System: A Complete Engineering Guide

How we built a fault-tolerant HVAC emergency triage and tech-dispatch platform on Kubernetes — three-tier CQRS, 11 micro-agents on the OpenAI Agents SDK + LangGraph, NATS JetStream, DTMF/SMS/WebSocket acceptance, circuit breakers, and an evaluation pipeline that catches regressions before they wake a tech at 3 AM.

AI Engineering

GPT-Realtime-2 Tool Use and Reasoning: GPT-5-Class Voice Agents

GPT-Realtime-2 brings GPT-5-class reasoning into voice. What that means for tool-call reliability, structured output, and production agent design.