Skip to content
Tool Use in LLMs: How Function Calling Actually Works Under the Hood
Agentic AI & LLMs5 min read29 views

Tool Use in LLMs: How Function Calling Actually Works Under the Hood

By Sagar Shankaran, Founder of CallSphere

Quick answer

A deep technical walkthrough of how large language models invoke external tools via function calling, covering token-level mechanics, schema injection, and reliability patterns.

Key takeaways

From Text Completion to Tool Invocation

Large language models were originally designed to predict the next token in a sequence. Yet in 2025-2026, tool use has become a first-class capability across GPT-4o, Claude, Gemini, and open-source models like Llama 3.3. Understanding how function calling works beneath the surface is critical for anyone building AI-powered applications.

How Tool Definitions Reach the Model

When you define tools in an API call, the provider serializes your function schemas into the model's context. For example, with OpenAI's API:

{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" },
          "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
        },
        "required": ["location"]
      }
    }
  }]
}

This JSON schema gets converted into a structured prompt segment that the model sees as part of its system context. The model has been fine-tuned (via RLHF and supervised fine-tuning on tool-use datasets) to recognize when a user query requires tool invocation and to emit a structured JSON response matching the schema.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

The Token-Level Mechanics

Under the hood, function calling works through constrained decoding:

flowchart TD
    HUB(("From Text Completion to<br/>Tool Invocation"))
    HUB --> L0["How Tool Definitions Reach<br/>the Model"]
    style L0 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L1["The Token-Level Mechanics"]
    style L1 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L2["Parallel and Sequential Tool<br/>Calls"]
    style L2 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L3["Reliability Challenges"]
    style L3 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L4["Production Hardening<br/>Patterns"]
    style L4 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L5["The Bigger Picture"]
    style L5 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    style HUB fill:#4f46e5,stroke:#4338ca,color:#fff
  1. Intent recognition: The model determines that the user's request maps to one of the available tools rather than a direct text answer
  2. Schema-guided generation: The model generates a JSON object with the function name and arguments, constrained by the provided schema
  3. Stop sequence: The model emits a special stop reason (e.g., tool_use or function_call) instead of the normal end-of-turn token
  4. Execution loop: The calling application executes the function and injects the result back into the conversation for the model to synthesize

Parallel and Sequential Tool Calls

Modern LLMs support parallel tool calling, where the model requests multiple function invocations in a single turn:

# Claude's tool_use response may contain multiple tool blocks
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result
        })

Sequential tool calls happen when the model needs the output of one tool to determine the input of the next. The model handles this by making a single tool call, receiving the result, then deciding whether to call another tool or respond to the user.

Reliability Challenges

Tool use introduces several failure modes:

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

  • Schema hallucination: The model invents parameters not in the schema or passes invalid types
  • Tool selection errors: The model picks the wrong tool for the task
  • Argument extraction failures: Ambiguous user input leads to incorrect parameter values
  • Infinite loops: The model repeatedly calls the same tool without making progress

Production Hardening Patterns

Teams shipping tool-use systems in production adopt several patterns:

  • Strict mode: OpenAI and Anthropic both support strict schema validation that guarantees the output conforms to the JSON schema
  • Retry with feedback: When a tool call fails, inject the error message back into the conversation so the model can self-correct
  • Tool call limits: Cap the number of tool calls per turn to prevent runaway loops
  • Fallback responses: If tool execution fails after retries, have the model respond gracefully without the tool result

The Bigger Picture

Tool use transforms LLMs from knowledge retrieval systems into action-taking agents. As tool ecosystems mature through standards like Anthropic's Model Context Protocol (MCP), the boundary between "chatbot" and "software agent" continues to blur.

Sources: Anthropic Tool Use Documentation | OpenAI Function Calling Guide | Gorilla LLM Research

flowchart LR
    IN(["Input prompt"])
    subgraph PRE["Pre processing"]
        TOK["Tokenize"]
        EMB["Embed"]
    end
    subgraph CORE["Model Core"]
        ATTN["Self attention layers"]
        MLP["Feed forward layers"]
    end
    subgraph POST["Post processing"]
        SAMP["Sampling"]
        DETOK["Detokenize"]
    end
    OUT(["Generated text"])
    IN --> TOK --> EMB --> ATTN --> MLP --> SAMP --> DETOK --> OUT
    style IN fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style CORE fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
flowchart TD
    HUB(("From Text Completion to<br/>Tool Invocation"))
    HUB --> L0["How Tool Definitions Reach<br/>the Model"]
    style L0 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L1["The Token-Level Mechanics"]
    style L1 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L2["Parallel and Sequential Tool<br/>Calls"]
    style L2 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L3["Reliability Challenges"]
    style L3 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L4["Production Hardening<br/>Patterns"]
    style L4 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    HUB --> L5["The Bigger Picture"]
    style L5 fill:#e0e7ff,stroke:#6366f1,color:#1e293b
    style HUB fill:#4f46e5,stroke:#4338ca,color:#fff
Share
S

Written by

Sagar Shankaran· Founder, CallSphere

Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.