Skip to content
Agentic AI
Agentic AI8 min read0 views

MCP Design Patterns: Structuring Tools, Prompts & Context

Reusable MCP patterns for Claude agents — intent-shaped tools, tight schemas, lean outputs, idempotent writes, and curated context that scale reliably.

A working MCP server and a good MCP server are different things. The first answers a demo; the second survives a hundred agents calling it in production without the model misfiring, flooding context, or picking the wrong tool. The gap between them is patterns — the reusable, code-level decisions about how you name tools, shape their schemas, write their descriptions, and decide what reaches the model at all. This post collects the patterns that have repeatedly earned their keep when building MCP integrations for Claude.

These are not abstract principles. Each one changes specific lines of code, and each one fixes a specific failure mode you will otherwise hit. Treat them as a checklist you run your server against before you trust it with real traffic.

Design tools around intents, not endpoints

The instinct when wrapping an existing API is to mirror it: one MCP tool per REST endpoint. Resist it. A model does not think in endpoints; it thinks in intents. If completing a common task requires the model to call four tools in sequence and thread IDs between them, you have pushed orchestration complexity onto the model, and it will sometimes get the sequence wrong. The better pattern is a coarser tool that expresses an intent — schedule_appointment rather than separate check_availability, reserve_slot, and confirm_booking calls — with your server doing the orchestration internally.

Model Context Protocol lets a server expose tools, resources, and prompts, and the art is choosing the right granularity for each. Too fine, and the model drowns in choices and chains them badly. Too coarse, and a single tool tries to do everything and its schema becomes an unreadable mess of optional fields. Aim for tools that map to the verbs a user would actually say, and let the server hide the multi-step machinery behind each verb.

Write schemas that constrain, and descriptions that teach

Every tool carries two pieces of metadata the model reads: a JSON Schema and a description. They do different jobs and you should write them differently. The schema's job is to make invalid calls impossible — required fields marked required, enums instead of free strings for fixed choices, formats and patterns where input shape matters. A tight schema means the host rejects malformed arguments before your handler runs, turning a class of runtime errors into impossible states.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

The description's job is to teach the model when and how to use the tool. Write it for a competent colleague who has never seen your system. State what the tool does, what it returns, and crucially when not to use it — the negative guidance prevents the model from reaching for a tool in situations it half-fits. Put concrete detail in field descriptions too: "order_id, the alphanumeric ID printed on the confirmation email, e.g. A-1043" tells the model far more than "the order id." Descriptions are prompt engineering, and you should iterate on them with the same rigor.

flowchart TD
  A["Incoming tool request"] --> B{"Args valid vs schema?"}
  B -->|No| C["Host rejects before handler"]
  B -->|Yes| D["Handler runs intent logic"]
  D --> E{"Result large?"}
  E -->|Yes| F["Summarize / paginate output"]
  E -->|No| G["Return structured content"]
  F --> G
  G --> H["Model reasons with result"]

Shape outputs for a context window, not a log file

A pattern teams discover the hard way: tool outputs go into context, and context is finite and expensive. A tool that returns a 5,000-row query result does not help the model — it buries the answer and crowds out the rest of the conversation. The pattern is to shape outputs for consumption. Return the fields that matter, summarize the rest, and paginate large results with an explicit cursor the model can follow if it needs more.

This is where a token budget becomes a design constraint. Before returning, ask what the model actually needs to act. A status-check tool should return the status, not the entire order record with every historical event. If a tool genuinely produces large data, return a compact summary plus a handle the model can use to drill in — a follow-up tool call to fetch detail on demand. This keeps each turn lean and lets the model pull depth only when the task requires it.

Make side effects explicit and idempotent

Read tools are forgiving; write tools are not. Agents retry, and a model may call create_invoice twice if the first response is slow or it loses track of state. The pattern that saves you is idempotency: accept a client-supplied idempotency key on mutating tools, and have your server deduplicate so a retried call returns the original result instead of creating a second invoice. Bake this into the schema as an optional key the model can pass, and into the handler as a check before the write.

Equally important is making side effects legible. Name mutating tools with action verbs that signal consequence, and consider routing them through the host's approval mechanism so a human can confirm before money moves or records change. The architecture gives you that control point precisely because the model's judgment about when to act is good but not perfect. Use it for anything irreversible.

Use prompts as reusable, parameterized playbooks

The third MCP primitive — prompts — is underused. A prompt is a server-defined template the host can surface to the user or the model, parameterized with arguments. The pattern is to encode your team's standard procedures as prompts so they are invoked consistently rather than re-typed each time. A support server might expose a triage_ticket prompt that lays out the exact steps to classify and route an issue, with the ticket ID as a parameter.

This matters because consistency is reliability. When the same task is expressed slightly differently each time it runs, the agent's behavior drifts. A parameterized prompt fixes the procedure in one place, versioned and reviewable, and turns "how we do this" into something the server ships rather than something a user remembers to say. Pair prompts with skills — which teach Claude how to use a server's tools — and you get a server that is not just callable but genuinely teachable.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Keep the connected surface small and curated

The final pattern is restraint. Every connected server adds its tool descriptions to the model's context, and every tool is one more option the model weighs when choosing what to call. Connect thirty servers and the model spends reasoning on selection, sometimes badly, and your context fills with descriptions before the conversation even starts. The pattern is curation: connect the servers a task needs and no more, and within a server, expose the tools that earn their place. A focused toolset produces sharper tool selection than an exhaustive one, every time.

Frequently asked questions

How fine-grained should MCP tools be?

Coarse enough to map to a user intent, fine enough that the schema stays readable. Mirroring REST endpoints one-to-one usually creates too many tools and forces the model to orchestrate multi-step flows it will sometimes get wrong. Prefer intent-level tools that hide the orchestration inside the server.

Where does the model's tool-selection behavior come from?

Primarily from tool descriptions and schemas delivered at runtime. The description tells the model when a tool applies and the schema constrains how it is called. If the model picks the wrong tool or calls one at the wrong time, sharpen the descriptions — including explicit guidance on when not to use each tool.

How do I stop tool outputs from blowing up the context window?

Shape outputs for consumption rather than completeness. Return the fields that matter, summarize or paginate large results, and provide a follow-up tool the model can call to fetch detail on demand. Treat the token cost of every returned payload as a design constraint, not an afterthought.

Do I need idempotency on every tool?

Only on tools with side effects. Reads are naturally safe to repeat. For writes, accept an idempotency key and deduplicate, because agents retry and a model can issue the same mutating call twice. Reversible reads need none of this; irreversible writes need all of it.

Bringing agentic AI to your phone lines

These same patterns — intent-shaped tools, lean outputs, idempotent writes — are how CallSphere keeps its voice and chat agents reliable while they use tools mid-call and book work around the clock. See the patterns at work at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.