---
title: "Tool Description Templates LLMs Reliably Call (2026)"
description: "A bad tool description costs you 30%+ in argument accuracy. We share the contract-style template — purpose, when-to-use, when-NOT-to-use, parameter constraints, two examples — that lifts CallSphere's Healthcare 14-tool stack to 96% function-arg accuracy across GPT-4o, Claude, and Gemini."
canonical: https://callsphere.ai/blog/vw9g-tool-description-templates-llm-function-calling-2026
category: "AI Engineering"
tags: ["Prompt Engineering", "Tool Calling", "Function Calling", "LLM", "Agents"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T03:13:49.608Z
---

# Tool Description Templates LLMs Reliably Call (2026)

> A bad tool description costs you 30%+ in argument accuracy. We share the contract-style template — purpose, when-to-use, when-NOT-to-use, parameter constraints, two examples — that lifts CallSphere's Healthcare 14-tool stack to 96% function-arg accuracy across GPT-4o, Claude, and Gemini.

> **TL;DR** — Research from Gorilla and ToolAlpaca shows precise tool descriptions improve parameter accuracy 30%+ versus terse ones. The fix: write each description as a contract — purpose, *when to use*, *when NOT to use*, parameter formats with examples, expected return shape — and never reuse a description across two tools.

## The technique

A reliable tool description has six parts:

1. **One-line purpose** — verb-first, end-state oriented. "Books a clinical appointment for an existing patient."
2. **When to use** — explicit triggers. "Call when the user confirms a date AND a provider."
3. **When NOT to use** — kills cross-tool confusion. "Do NOT call for new-patient registration; use create_patient first."
4. **Parameter contract** — type, format, regex, example. "phone: E.164 string, e.g. +14155551234".
5. **Return shape** — short JSON sketch.
6. **Two examples** — one happy path, one disambiguation case.

Best-in-class tool descriptions read like API docs written for a junior engineer who must pick the right call from 14 candidates inside 200ms.
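The six parts above can be assembled mechanically. A minimal sketch — the helper name and field layout are ours, not a CallSphere API — that renders the contract as one description string:

```python
# Sketch: build a six-part contract description for a tool.
# The function and its argument names are illustrative, not a real library API.
def contract_description(purpose, when, when_not, params, returns, examples):
    lines = [purpose, "", "WHEN TO USE: " + when, "", "WHEN NOT TO USE:"]
    lines += [f"- {rule}" for rule in when_not]          # hard negatives
    lines += ["", "PARAMETERS:"]
    lines += [f"- {name}: {contract}" for name, contract in params.items()]
    lines += ["", "RETURNS: " + returns, "", "EXAMPLES:"]
    lines += [f"{i}) {ex}" for i, ex in enumerate(examples, 1)]
    return "\n".join(lines)

desc = contract_description(
    purpose="Books a clinical appointment for an EXISTING patient.",
    when="caller has confirmed a provider, a start_time, and a patient_id.",
    when_not=["Caller not verified — call lookup_patient first.",
              "Provider unknown — call list_providers first."],
    params={"patient_id": "string, ^p_[a-z0-9]{2,}$, from lookup_patient",
            "start_time": "ISO-8601 string with TZ offset, never naive"},
    returns='{"appointment_id": string, "status": "confirmed"}',
    examples=["Happy: book_appointment(patient_id='p_91', ...)",
              "Reschedule: cancel_appointment then book_appointment."],
)
```

Generating descriptions from one template also makes the nightly diff trivial: any tool whose description drifts from the six-part shape fails review before it ships.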

## Why it works

An LLM picks a tool by, in effect, scoring each tool's description against the user turn. When two descriptions both contain "creates an appointment" and diverge only in fine print, those scores barely separate, the distribution over tools flattens, and you get the wrong tool 10–20% of the time. Adding a *when NOT to use* clause forces the descriptions apart and gives the model a hard negative signal for the near-miss tool.
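The flattening effect is easy to see numerically. In this toy sketch (the scores are invented for illustration), two near-duplicate descriptions leave the pick close to a coin flip, while a description with a hard negative clause pushes the wrong tool's score down:

```python
import math

def softmax(scores):
    """Turn raw tool-match scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two near-duplicate descriptions: scores barely diverge, so the
# model picks the wrong tool almost half the time.
vague = softmax([2.0, 1.9])   # ≈ [0.52, 0.48]

# A "when NOT to use" clause separates the scores.
tight = softmax([2.0, 0.5])   # ≈ [0.82, 0.18]
```

The 18% wrong-tool rate in the tight case matches the vague-description failure rate in the diagram below only by coincidence of the toy numbers; the qualitative point is that separation in scores compounds into separation in pick rates.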

Modern LLMs in 2026 (GPT-4o, Claude 4.x, Gemini 2.5) are explicitly trained to parse description fields with structured headings. They reward you for treating the description like a contract and they punish ambiguity.

```mermaid
flowchart TD
  USER[User turn] --> EMB[Embed turn]
  EMB --> SIM{Similarity to each tool desc}
  SIM -->|tight desc| PICK[Correct tool 96%]
  SIM -->|vague desc| WRONG[Wrong tool 18%]
  PICK --> ARGS[Fill args from contract]
  ARGS --> CALL[Tool call]
```

## CallSphere implementation

CallSphere's **Healthcare 14-tool system prompt** uses this exact template per tool. The OneRoof real-estate vertical's **Triage Aria** routes across **10 specialist agents** by reading their `routing_hint` field — the same description pattern. Salon's narrow 6-tool stack uses two-shot examples per description because the call space is small enough to demonstrate every common path. We measure tool accuracy nightly across all **37 agents**, **90+ tools**, **115+ DB tables**, **6 verticals**.
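A nightly tool-accuracy check needs little more than labeled turns and the model's pick. This sketch assumes a `pick_tool` callable wrapping whatever model call the harness uses — the name and shape are hypothetical, not CallSphere's internal harness:

```python
# Sketch: nightly tool-selection accuracy over labeled test turns.
# pick_tool is a hypothetical stand-in for the model call under test.
def tool_accuracy(cases, pick_tool):
    """cases: list of (user_turn, expected_tool_name) pairs."""
    correct = sum(1 for turn, expected in cases if pick_tool(turn) == expected)
    return correct / len(cases)

cases = [
    ("Book me with Dr. Patel Thursday at 3", "book_appointment"),
    ("I'm a new patient, can I register?", "create_patient"),
    ("What providers do you have?", "list_providers"),
]
```

Tracking this number per tool pair (not just in aggregate) is what surfaces the cross-tool confusions that a *when NOT to use* clause then fixes.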

Pricing: **Starter $149**, **Growth $499**, **Scale $1,499**. **14-day trial** + **22% affiliate**. Build your own tool stack via the [Build Your Agent](https://callsphere.ai/build-your-agent) flow.

## Build steps with prompt code

```json
{
  "name": "book_appointment",
  "description": "Books a clinical appointment for an EXISTING patient.\n\nWHEN TO USE: caller has confirmed (1) a provider name from list_providers, (2) an ISO-8601 start_time, (3) an existing patient_id from lookup_patient.\n\nWHEN NOT TO USE:\n- Caller has not been verified — call lookup_patient first.\n- Provider unknown — call list_providers first.\n- Time is vague (\"sometime next week\") — clarify with caller first.\n\nEXAMPLES:\n1) Happy: book_appointment(patient_id='p_91', provider='dr_patel', start_time='2026-05-08T15:00-04:00')\n2) Reschedule: cancel_appointment then book_appointment.",
  "parameters": {
    "type": "object",
    "additionalProperties": false,
    "required": ["patient_id","provider","start_time"],
    "properties": {
      "patient_id": {"type":"string","pattern":"^p_[a-z0-9]{2,}$","description":"From lookup_patient"},
      "provider":   {"type":"string","description":"Slug from list_providers, e.g. dr_patel"},
      "start_time": {"type":"string","format":"date-time","description":"ISO-8601 with TZ offset, never naive"}
    }
  }
}
```
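Even with a schema in place, a cheap client-side check before executing the call catches contract violations the model slips through. A minimal validator mirroring the `book_appointment` parameter contract above — the validator itself is illustrative, not part of any SDK:

```python
import re
from datetime import datetime

def validate_args(args: dict) -> list[str]:
    """Return a list of contract violations for book_appointment args."""
    errors = []
    required = {"patient_id", "provider", "start_time"}
    missing = required - args.keys()
    if missing:
        errors.append(f"missing: {sorted(missing)}")
    # patient_id must come from lookup_patient and match the slug pattern.
    if "patient_id" in args and not re.fullmatch(r"p_[a-z0-9]{2,}", args["patient_id"]):
        errors.append("patient_id must match ^p_[a-z0-9]{2,}$")
    # start_time must be ISO-8601 and carry a TZ offset, never naive.
    if "start_time" in args:
        try:
            if datetime.fromisoformat(args["start_time"]).tzinfo is None:
                errors.append("start_time must carry a TZ offset")
        except ValueError:
            errors.append("start_time is not ISO-8601")
    return errors
```

Feeding these error strings back to the model as a tool result ("409-style" contract errors) usually gets a corrected call on the next turn.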

## FAQ

**Q: How long should a description be?**
80–250 tokens. Below 50 you lose disambiguation; above 300 you crowd context.

**Q: Should I include error cases?**
Yes — one line. "Returns 409 if the slot is taken; agent should call list_open_slots."

**Q: Does this work for Claude and Gemini too?**
Yes. The contract style is provider-agnostic. Claude additionally benefits from wrapping the *when-NOT* section in XML-style tags (see post 7).

**Q: What about strict-mode JSON schemas?**
Use them — see post 3. Strict mode enforces the contract at the API layer.

## Sources

- [LLM Function Calling Guide — Premai](https://blog.premai.io/llm-function-calling-complete-implementation-guide-2026/)
- [Tool Calling Optimization — Statsig](https://www.statsig.com/perspectives/tool-calling-optimization)
- [9 Best LLMs for Function Calling 2026 — VisionVix](https://visionvix.com/best-llm-for-function-calling/)
- [Future Insights — Function Calling Patterns](https://www.futureinsights.com/function-calling-tool-use-patterns-llm/)
