Where Tool-Calling Reliability Comes From

Most production teams blame the model when tool calls fail. The truth in 2026: most failures are schema-design failures. Bad parameter names, vague descriptions, ambiguous types, and schemas that overlap each other produce far more failed tool calls than model limitations.

This piece walks through the schema-design patterns that hold up in production.

What a Tool Schema Has to Communicate

flowchart TB
    Schema[Tool Schema] --> Name[Name: clear, distinct]
    Schema --> Desc[Description: when to call]
    Schema --> Params[Parameters: well-typed]
    Schema --> Ret[Return: expected shape]
    Schema --> Side[Side effects: noted]

Five things the model must understand to call your tool correctly.

Naming

Function names matter. The 2026 best practices:

Verb-first: book_appointment not appointment_booking
Unambiguous: get_patient_by_phone not lookup
Distinct from siblings: avoid two tools named similarly
Lowercase snake_case unless the platform forces another convention

A common mistake: shipping search, find, get, and lookup as four different tools. With nothing in the names to tell them apart, the model's choice among them is close to arbitrary.
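The naming rules above lend themselves to an automated check. The following is a minimal sketch, not a standard tool: the verb allow-list, the generic-name list, and the 0.8 similarity threshold are all assumptions chosen for illustration, and string similarity is only a rough proxy for "the model may confuse these".

```python
import re
from difflib import SequenceMatcher

# Hypothetical allow-list of leading verbs; adjust to your own domain.
ALLOWED_VERBS = {"get", "list", "search", "book", "cancel",
                 "reschedule", "create", "update", "delete"}
# Names the article calls out as too generic to stand alone.
GENERIC_NAMES = {"search", "find", "get", "lookup"}
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_tool_names(names, similarity_threshold=0.8):
    """Return a list of human-readable problems found in a set of tool names."""
    problems = []
    for name in names:
        if not SNAKE_CASE.match(name):
            problems.append(f"{name}: not lowercase snake_case")
        if name in GENERIC_NAMES:
            problems.append(f"{name}: too generic, say what is being fetched")
        elif name.split("_")[0] not in ALLOWED_VERBS:
            problems.append(f"{name}: does not start with a known verb")
    # Flag pairs of names that are nearly identical strings.
    ordered = sorted(names)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            ratio = SequenceMatcher(None, first, second).ratio()
            if ratio >= similarity_threshold:
                problems.append(f"{first} / {second}: names are very similar")
    return problems
```

Running this in CI against the full tool list catches a near-duplicate name before the model ever sees it.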

Descriptions

The description is read by the model on every call. It should describe:

What the tool does
When to call it (the trigger condition)
When NOT to call it (the negative criterion)
Constraints on inputs

Example of a weak description: "Books an appointment."

Example of a strong description: "Book an appointment for an existing patient. Use this only after verifying the patient exists via lookup_patient_by_phone or lookup_patient_by_id. Do not use this to reschedule existing appointments — use reschedule_appointment for that. Returns the booking reference and confirmation details."
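One way to keep every description complete is to assemble it from the four parts listed above. This is a small sketch; `build_description` is a hypothetical helper, not part of any SDK, and the constraint sentence is borrowed from the start_time rule used elsewhere in this article.

```python
def build_description(what, when, when_not, constraints=""):
    """Join the four parts of a tool description, skipping any empty part."""
    parts = [what, when, when_not, constraints]
    return " ".join(p.strip() for p in parts if p and p.strip())


BOOK_APPOINTMENT_DESCRIPTION = build_description(
    what="Book an appointment for an existing patient.",
    when=("Use this only after verifying the patient exists via "
          "lookup_patient_by_phone or lookup_patient_by_id."),
    when_not=("Do not use this to reschedule existing appointments; "
              "use reschedule_appointment for that."),
    constraints="start_time must be a slot returned by get_available_slots.",
)
```

Making `when` and `when_not` positional arguments means a tool cannot be registered without someone at least thinking about the trigger and the negative criterion.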

Parameter Types

Types that models handle reliably in 2026:

string for free-form text
integer and number for numerical values
boolean for true/false flags
enum (a constraint on a typed field) for fixed-vocabulary fields
array of typed elements
object with explicit nested properties

Avoid:

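Untyped parameters (any, or a property with no declared type): the model can fill these with anything
Stringified JSON ("send a JSON object as a string"): produces brittle outputs; use a nested object instead
Ambiguity about whether a parameter is optional or required: list required parameters explicitly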
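To make the stringified-JSON problem concrete, here is a sketch of the same hypothetical `filters` parameter written both ways. The property names inside `filters` are illustrative assumptions; only the appointment_type values come from this article.

```python
import json

# Brittle: the model must emit JSON inside a string, and nothing in the
# schema constrains what that inner JSON looks like.
BRITTLE_PARAM = {
    "filters": {
        "type": "string",
        "description": "A JSON object, sent as a string, with the search filters.",
    }
}

# Typed: the same information as a nested object with explicit properties.
TYPED_PARAM = {
    "filters": {
        "type": "object",
        "properties": {
            "provider_id": {"type": "string", "description": "Provider UUID."},
            "appointment_type": {
                "type": "string",
                "enum": ["new_patient", "follow_up", "emergency", "consultation"],
            },
            "max_results": {"type": "integer", "minimum": 1, "maximum": 50},
            "include_cancelled": {"type": "boolean"},
        },
        "additionalProperties": False,
    }
}


def parse_brittle(value: str) -> dict:
    """With the brittle form the server must parse a second layer of JSON,
    which can fail on malformed text before any schema check runs."""
    return json.loads(value)  # raises json.JSONDecodeError on malformed input
```

A model that emits single quotes or a trailing comma inside the string breaks `parse_brittle`, whereas the typed form lets the platform and your validator see every field directly.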

Required vs Optional

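Mark only what is truly required as required. When a parameter is marked required but its value is not always available at call time, the model tends to invent a value to satisfy the schema. It is better to mark such a parameter as optional and validate it on the server.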
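A sketch of the "optional, then validate" approach, under stated assumptions: the in-memory `PATIENTS` list, the `date_of_birth` parameter, and the error shapes are all hypothetical. Only the tool name lookup_patient_by_phone comes from this article.

```python
from datetime import date

# Hypothetical in-memory records standing in for a real patient store.
PATIENTS = [
    {"patient_id": "p-001", "phone": "+15550100", "date_of_birth": "1980-04-12"},
    {"patient_id": "p-002", "phone": "+15550100", "date_of_birth": "2011-09-30"},
    {"patient_id": "p-003", "phone": "+15550199", "date_of_birth": "1975-01-05"},
]


def lookup_patient_by_phone(args: dict) -> dict:
    """phone is required; date_of_birth is optional and checked only if present."""
    phone = args.get("phone")
    if not phone:
        return {"ok": False, "error": "missing_parameter", "parameter": "phone"}

    matches = [p for p in PATIENTS if p["phone"] == phone]

    dob = args.get("date_of_birth")
    if dob is not None:
        try:
            date.fromisoformat(dob)
        except (TypeError, ValueError):
            return {"ok": False, "error": "invalid_format",
                    "parameter": "date_of_birth", "expected": "YYYY-MM-DD"}
        matches = [p for p in matches if p["date_of_birth"] == dob]

    if len(matches) > 1:
        # Ask for the optional value only when it is actually needed.
        return {"ok": False, "error": "ambiguous_match",
                "hint": "Ask the caller for date_of_birth and call again."}
    if not matches:
        return {"ok": False, "error": "not_found"}
    return {"ok": True, "patient_id": matches[0]["patient_id"]}
```

Had date_of_birth been required, the model would have had to supply one on every call, including calls where the caller never mentioned it. Here the server asks for it only when the phone number alone is ambiguous.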

Examples in Schema

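The examples keyword from JSON Schema (also available in OpenAPI) travels with the rest of the schema, so a model that receives the full schema can read it. Where your platform accepts the keyword, use it: a few realistic examples per parameter tend to improve call quality, particularly for formatted strings such as dates and IDs.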
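As an illustration, here are two parameter definitions carrying the examples keyword, plus a rough self-check that each example obeys the parameter's own constraints. The example values and the regular expression are assumptions for this sketch. Not every provider accepts every JSON Schema keyword, so confirm support in your platform's documentation before relying on it.

```python
import re

START_TIME_PARAM = {
    "type": "string",
    "format": "date-time",
    "description": "ISO 8601 with timezone. Must come from get_available_slots output.",
    "examples": ["2026-03-14T09:30:00-05:00", "2026-03-14T14:00:00Z"],
}

APPOINTMENT_TYPE_PARAM = {
    "type": "string",
    "enum": ["new_patient", "follow_up", "emergency", "consultation"],
    "examples": ["follow_up"],
}

# Rough pattern for "ISO 8601 with timezone"; a sketch, not a full parser.
ISO_WITH_TZ = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})$")


def examples_are_consistent() -> bool:
    """Check that every example would itself pass the parameter's constraints."""
    times_ok = all(ISO_WITH_TZ.match(e) for e in START_TIME_PARAM["examples"])
    enums_ok = all(e in APPOINTMENT_TYPE_PARAM["enum"]
                   for e in APPOINTMENT_TYPE_PARAM["examples"])
    return times_ok and enums_ok
```

An example that violates its own schema teaches the model the wrong format, so a check like this is worth running whenever the schema changes.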

Avoiding Tool Overlap

flowchart LR
    Q[Query: 'find John's appointment'] --> Conf{Tools available}
    Conf --> A1[get_appointment_by_id]
    Conf --> A2[search_appointments]
    Conf --> A3[lookup_patient_appointments]

Three overlapping tools confuse the model. The fix: collapse to one tool with a richer parameter set, or differentiate clearly with non-overlapping descriptions.
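A sketch of the "collapse to one tool" option, assuming the three tools above can share one entry point. The parameter names date_from and date_to and the `plan_query` helper are hypothetical. The server, not the model, decides which backend query to run.

```python
SEARCH_APPOINTMENTS_TOOL = {
    "name": "search_appointments",
    "description": (
        "Find appointments. Pass appointment_id to fetch one appointment, "
        "or patient_id (optionally with date_from and date_to) to list a "
        "patient's appointments. Do not use this to book or reschedule."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "appointment_id": {"type": "string",
                               "description": "Appointment UUID, if already known."},
            "patient_id": {"type": "string",
                           "description": "Patient UUID from a lookup_patient_* tool."},
            "date_from": {"type": "string", "format": "date",
                          "description": "Earliest date, YYYY-MM-DD."},
            "date_to": {"type": "string", "format": "date",
                        "description": "Latest date, YYYY-MM-DD."},
        },
        "required": [],
    },
}


def plan_query(args: dict) -> dict:
    """Map the single tool's arguments onto one of the former three lookups."""
    if args.get("appointment_id"):
        return {"query": "by_id", "appointment_id": args["appointment_id"]}
    if args.get("patient_id"):
        return {"query": "by_patient", "patient_id": args["patient_id"],
                "date_from": args.get("date_from"), "date_to": args.get("date_to")}
    return {"error": "missing_parameter",
            "hint": "Provide appointment_id or patient_id."}
```

The trade-off is a longer description and more optional parameters in exchange for removing the tool-selection decision entirely.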

Versioning

Tools change. Versioning patterns that work:

book_appointment_v2 if the new version has incompatible behavior
Same name with new optional parameters if backward compatible
Deprecate and remove old versions cleanly; do not leave them in the schema list
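The last rule can be enforced in code rather than by convention. This sketch assumes a hypothetical registry in which each tool carries a team-maintained deprecated flag. The flag is internal bookkeeping and is stripped before the list is sent to the model.

```python
# Hypothetical registry; real entries would also carry description and parameters.
TOOL_REGISTRY = [
    {"name": "book_appointment", "deprecated": True},
    {"name": "book_appointment_v2", "deprecated": False},
    {"name": "reschedule_appointment", "deprecated": False},
]


def active_tools(registry):
    """Return only the tool definitions that should be sent to the model."""
    return [
        {key: value for key, value in tool.items() if key != "deprecated"}
        for tool in registry
        if not tool.get("deprecated", False)
    ]
```

Filtering at the point where the tool list is built means a deprecated version can stay in the codebase for in-flight requests without ever being offered to the model again.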

A Production Schema Example

{
  "name": "book_appointment",
  "description": "Book a new appointment for an existing patient. Returns booking reference. Use only after verifying patient exists.",
  "parameters": {
    "type": "object",
    "properties": {
      "patient_id": {
        "type": "string",
        "description": "Patient UUID. Must be obtained from lookup_patient_*. Do not invent."
      },
      "provider_id": {
        "type": "string",
        "description": "Provider UUID."
      },
      "start_time": {
        "type": "string",
        "format": "date-time",
        "description": "ISO 8601 with timezone. Must come from get_available_slots output."
      },
      "appointment_type": {
        "type": "string",
        "enum": ["new_patient", "follow_up", "emergency", "consultation"]
      },
      "notes": {
        "type": "string",
        "description": "Optional free-text notes from the call."
      }
    },
    "required": ["patient_id", "provider_id", "start_time", "appointment_type"]
  }
}

Note: the descriptions tell the model not just what each parameter is, but where to get it.

Validating Tool Calls

Schema validation belongs on the server, not only on the LLM side. Treat the LLM's tool call as untrusted input:

Validate the JSON against the schema
Reject and return a structured error if invalid
Don't crash; let the LLM see the error and retry

flowchart LR
    LLM[LLM emits tool call] --> Val{Validate}
    Val -->|valid| Run[Execute]
    Val -->|invalid| Err[Return structured error]
    Err --> LLM

This loop is short and, with specific error messages, usually converges quickly: the LLM sees exactly which parameter was wrong and corrects it. Vague errors are what cause the model to repeat the same bad call.
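The validate-then-return-error step can be sketched with the standard library alone. This is a deliberately minimal checker covering required fields, basic types, and enum, written for illustration. A full JSON Schema validator, such as the jsonschema package, would be the usual production choice. The error shape is an assumption.

```python
import json

BOOK_APPOINTMENT_PARAMS = {
    "type": "object",
    "properties": {
        "patient_id": {"type": "string"},
        "provider_id": {"type": "string"},
        "start_time": {"type": "string"},
        "appointment_type": {
            "type": "string",
            "enum": ["new_patient", "follow_up", "emergency", "consultation"],
        },
        "notes": {"type": "string"},
    },
    "required": ["patient_id", "provider_id", "start_time", "appointment_type"],
}

PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(schema, raw_arguments):
    """Return (args, None) when valid, or (None, structured_error) when not."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, {"error": "invalid_json", "detail": str(exc)}
    if not isinstance(args, dict):
        return None, {"error": "invalid_type",
                      "detail": "arguments must be a JSON object"}

    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append({"parameter": name,
                             "problem": "missing required parameter"})
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append({"parameter": name, "problem": "unknown parameter"})
            continue
        expected = PY_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (not isinstance(value, expected) or is_stray_bool):
            problems.append({"parameter": name,
                             "problem": f"expected type {spec['type']}"})
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append({"parameter": name,
                             "problem": f"must be one of {spec['enum']}"})

    if problems:
        return None, {"error": "invalid_arguments", "problems": problems}
    return args, None
```

The error names each offending parameter and, for enums, the allowed values. That specificity is what lets the model repair the call on the next turn instead of guessing.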

Common Failure Modes

Missing required parameter: usually a description weakness; add explicit guidance
Hallucinated IDs: usually a missing constraint; add "must come from lookup tool" to descriptions
Wrong type: usually a schema weakness; tighten types
Wrong tool selected: usually overlap; consolidate or differentiate
Repeated identical calls: a sign that the loop is broken; add caching, and return an observation such as "you already called this"
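For the repeated-call case, a per-conversation tracker is one possible shape for the fix. This is a sketch under assumptions: `CallTracker` and the wording of the note are hypothetical, and the tracker treats two calls as identical only when the tool name and all arguments match exactly.

```python
import json


class CallTracker:
    """Remember tool calls made in one conversation and flag exact repeats."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def _key(name, args):
        # sort_keys makes the key independent of argument order.
        return name + ":" + json.dumps(args, sort_keys=True)

    def run(self, name, args, execute):
        """Execute the tool once per distinct (name, args) pair.

        `execute` is any callable taking (name, args) and returning a result.
        """
        key = self._key(name, args)
        if key in self._results:
            return {
                "repeated": True,
                "note": ("You already called this tool with the same "
                         "arguments. Reuse the earlier result."),
                "result": self._results[key],
            }
        result = execute(name, args)
        self._results[key] = result
        return {"repeated": False, "result": result}
```

For read-only tools this is plain caching. For tools with side effects it also acts as a guard against running the same booking twice, though a cached result can go stale, so the tracker should live only as long as the conversation.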


Tool-Calling Schemas That Don't Break: Robust Function Definitions
