Wiring MCP Servers: Auth, Schemas & Error Handling
Production MCP wiring for Claude — OAuth and API keys, schema validation, structured error handling, idempotency keys, and safe retries that hold up in real use.
The demo version of an MCP server runs on stdio with no auth, returns happy-path data, and never sees a retry. The production version crosses a network, authenticates a real user, validates hostile input, fails gracefully, and survives an agent that calls it twice. Everything that makes the second version hard lives in four areas: authentication, schemas, error handling, and idempotency. Get these right and your server is something you can put in front of customers; get them wrong and you have a liability the model will eventually trip over.
This post is about wiring — the connective tissue between Claude and your systems — and it assumes you already know how to register a tool. The focus is on the concerns that only appear when the server stops being a toy.
Authentication: where the trust boundary lives
On stdio, there is no auth because the host launched the server as its own subprocess, so the trust boundary is the machine. The moment you move to a remote server over streamable HTTP, that changes: now an arbitrary client could reach your server across a network, and you must establish who is calling and what they may do. The Model Context Protocol is an open standard, and for remote servers it leans on standard web authentication rather than inventing its own: OAuth flows for user-delegated access and API keys or bearer tokens for service access.
The pattern that matters most is delegated authorization. When a user connects a remote server, you want the server to act as that specific user against the downstream system, not with a god-mode service account. OAuth gives you this: the user authorizes once, the server holds a scoped token, and every tool call inherits the user's permissions. This means a CRM server cannot read records the connected user could not read themselves — the authorization boundary is enforced downstream, not reimplemented in your tool logic. Reaching for a single shared API key is the shortcut that turns one compromised session into a full-database breach.
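A minimal sketch of a per-user, per-tool authorization check, assuming a hypothetical in-memory token table in place of real OAuth token validation. The names `Principal`, `TOKENS`, `REQUIRED_SCOPE`, `authorize`, and the scope strings are illustrative and not part of any MCP SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """The user a token was issued to, plus the scopes they granted."""
    user_id: str
    scopes: frozenset


class AuthError(Exception):
    """Raised when a token is missing, unknown, or lacks a required scope."""


# Hypothetical token table. A real server would validate the token with its
# authorization server (signature, expiry, audience) instead of a dict lookup.
TOKENS = {
    "tok-alice": Principal("alice", frozenset({"crm:read"})),
    "tok-bob": Principal("bob", frozenset({"crm:read", "crm:write"})),
}

# Scope each tool requires (tool names are illustrative).
REQUIRED_SCOPE = {"get_contact": "crm:read", "update_contact": "crm:write"}


def authorize(bearer_token, tool_name):
    """Resolve the caller and confirm their token covers this tool."""
    principal = TOKENS.get(bearer_token or "")
    if principal is None:
        raise AuthError("unknown or missing token")
    needed = REQUIRED_SCOPE.get(tool_name)
    if needed is None:
        raise AuthError(f"unknown tool {tool_name!r}")
    if needed not in principal.scopes:
        raise AuthError(f"token lacks scope {needed!r}")
    return principal
```

The returned `Principal` is what the handler would pass downstream, so the downstream system sees the connected user rather than a shared service identity.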
Schemas as your first line of defense
Every tool's JSON Schema is not just documentation for the model; it is a validation gate that should be enforced before your handler runs. Do not assume the host does this for you: validate arguments against the schema on the server side, either through your SDK or explicitly. Use it aggressively. Mark required fields required so the model cannot omit them. Use enums for any field with a fixed value set so the model cannot invent options. Add patterns and formats where input shape is predictable, like an email or an ID format. Each constraint you encode is an entire class of bad input that never reaches your business logic.
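As an illustration, here is what such a schema might look like for a hypothetical `refund_order` tool, together with a deliberately tiny stand-in checker. The checker handles only `required`, `additionalProperties: false`, `enum`, and `pattern`; it is an assumption-laden sketch, and a real server would rely on its SDK or a full JSON Schema validator.

```python
import re

# Hypothetical input schema for an illustrative "refund_order" tool.
REFUND_ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "pattern": r"^A-\d{4}$"},
        "reason": {"type": "string", "enum": ["damaged", "late", "other"]},
        "email": {"type": "string", "format": "email"},
    },
    "required": ["order_id", "reason"],
    "additionalProperties": False,
}


def check_args(schema, args):
    """Return a list of violations; an empty list means the args passed.

    Covers only required, additionalProperties: false, enum, and pattern.
    """
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {name}")
            continue
        rule = props[name]
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name} must be one of {rule['enum']}")
        if "pattern" in rule:
            # JSON Schema patterns are unanchored searches, hence re.search.
            if not (isinstance(value, str) and re.search(rule["pattern"], value)):
                errors.append(f"{name} does not match {rule['pattern']}")
    return errors
```

Returning the full list of violations, rather than stopping at the first, lets the model correct every field in a single follow-up call.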
But schema validation is necessary, not sufficient. The schema knows that order_id is a string matching a pattern; it does not know that this particular order belongs to this particular user. Layer a second validation inside the handler for semantic and authorization checks the schema cannot express. The division of labor is clean: the schema rejects malformed input, the handler rejects unauthorized or nonsensical input, and only well-formed, permitted requests reach the system of record.
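The handler-side gate might look like the sketch below. The `ORDERS` table, the `get_order_status` tool, and the result field names (`ok`, `kind`, `message`) are all hypothetical; the point is only that ownership is checked against the authenticated user, which no schema can do.

```python
# Hypothetical system of record: order ID -> order row.
ORDERS = {
    "A-1043": {"owner": "alice", "status": "shipped"},
    "A-2001": {"owner": "bob", "status": "pending"},
}


def get_order_status(user_id, order_id):
    """Semantic and authorization checks on arguments that are already well-formed."""
    order = ORDERS.get(order_id)
    if order is None:
        # Well-formed ID, but no such order exists.
        return {
            "ok": False,
            "kind": "not_found",
            "message": f"order {order_id} not found; verify the ID from the confirmation email",
        }
    if order["owner"] != user_id:
        # The order exists but belongs to someone else.
        return {
            "ok": False,
            "kind": "authorization",
            "message": f"order {order_id} is not accessible to the connected user",
        }
    return {"ok": True, "status": order["status"]}
```

Some teams prefer to return the not-found response in both failure cases so the tool does not reveal which order IDs exist; that is a policy choice this sketch does not make.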
flowchart TD
A["Claude emits tool call"] --> B{"Authenticated?"}
B -->|No| C["Return auth error"]
B -->|Yes| D{"Args valid vs schema?"}
D -->|No| E["Return validation error"]
D -->|Yes| F{"Idempotency key seen?"}
F -->|Yes| G["Return cached result"]
F -->|No| H["Execute & record key"]
H --> I["Return structured result"]
Error handling the model can reason about
When a tool fails, you have a choice that most people get wrong on the first pass: throw, or return a structured error. Throwing an unhandled exception surfaces to the host as an opaque failure, and the model sees a dead end with nothing to do. Returning a structured error — "order A-1043 not found; verify the ID from the confirmation email" — gives the model something to act on. It can ask the user to re-check the ID, try a different approach, or explain the problem clearly. Errors are part of your tool's interface, and you should design their messages for a model reader.
Distinguish the categories. A validation error means the input was wrong and the model should fix it. An authorization error means the user lacks permission and the model should say so rather than retry. A transient error — a timeout, a downstream 503 — means the call might succeed if tried again. Signal these differently so the model's response is appropriate: retrying a permission error is pointless and retrying a not-found is worse, but retrying a timeout is exactly right. Encoding that distinction in your error responses is what separates a server that degrades gracefully from one that confuses the agent.
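One way to encode these categories is sketched below. The `ErrorKind` enum, the `tool_error` payload shape, and the exception-to-category mapping in `classify` are assumptions made for illustration; they are not a format defined by MCP.

```python
from enum import Enum


class ErrorKind(str, Enum):
    VALIDATION = "validation"        # input was wrong; the model should fix it
    AUTHORIZATION = "authorization"  # user lacks permission; do not retry
    NOT_FOUND = "not_found"          # the referenced thing does not exist
    TRANSIENT = "transient"          # timeout or downstream 503; retry may help
    INTERNAL = "internal"            # unexpected failure; not safe to assume retryable


# Only transient failures are worth retrying unchanged.
RETRYABLE = {ErrorKind.TRANSIENT}


def tool_error(kind, message, hint=""):
    """Build a structured error payload written for a model reader."""
    return {
        "ok": False,
        "kind": kind.value,
        "retryable": kind in RETRYABLE,
        "message": message,
        "hint": hint,
    }


def classify(exc):
    """Map a caught exception to a structured error (illustrative mapping)."""
    if isinstance(exc, TimeoutError):
        return tool_error(ErrorKind.TRANSIENT, str(exc), "retry after a short delay")
    if isinstance(exc, PermissionError):
        return tool_error(ErrorKind.AUTHORIZATION, str(exc), "tell the user they lack access")
    if isinstance(exc, LookupError):
        return tool_error(ErrorKind.NOT_FOUND, str(exc), "ask the user to re-check the identifier")
    if isinstance(exc, ValueError):
        return tool_error(ErrorKind.VALIDATION, str(exc), "correct the arguments and call again")
    return tool_error(ErrorKind.INTERNAL, "unexpected server error", "report the failure; do not retry")
```

A handler would wrap its body in `try`/`except Exception as exc` and return `classify(exc)`, so no failure reaches the host as an unexplained exception.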
Idempotency and safe retries
Agents retry. A slow response, a dropped connection, or the model losing track of whether a call completed can all produce a duplicate invocation, and on a mutating tool that means a second charge, a second ticket, a second email. The defense is idempotency. Accept a client-supplied idempotency key on every tool with side effects, record it when you execute, and on a repeat key return the original result instead of executing again. The model can call send_confirmation three times and the customer gets one email.
Make this concrete in the wiring. Add an optional idempotency key to the schema of mutating tools. In the handler, check the key against a store before the write; if seen, short-circuit to the recorded outcome. Choose a key with the right scope — tied to the logical operation, not the wall clock — so a legitimate retry collides but a genuinely new request does not. For read-only tools, skip all of this; idempotency is purely a write-path concern, and adding it to reads is wasted complexity.
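A minimal sketch of that check-then-write flow, assuming an in-memory store and a list standing in for the email side effect. `IdempotencyStore`, `STORE`, `SENT`, and the handler signature are hypothetical; a production server would use a durable store with expiry.

```python
import threading


class IdempotencyStore:
    """In-memory record of completed operations, keyed by idempotency key."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run_once(self, key, operation):
        """Run operation at most once per key; replay the stored result afterwards.

        Holding the lock during the operation serializes all writes, which is
        a simplification. If the operation raises, nothing is recorded, so a
        retry with the same key will execute again.
        """
        with self._lock:
            if key in self._results:
                return self._results[key]
            result = operation()
            self._results[key] = result
            return result


SENT = []  # stands in for a real email side effect
STORE = IdempotencyStore()


def send_confirmation(order_id, idempotency_key=None):
    """Mutating tool handler: the key is optional, matching the schema advice above."""

    def do_send():
        SENT.append(order_id)
        return {"ok": True, "status": "sent", "order_id": order_id}

    if idempotency_key is None:
        return do_send()  # no key supplied, so no duplicate protection
    # Namespace the key by tool so keys from different tools cannot collide.
    return STORE.run_once(f"send_confirmation:{idempotency_key}", do_send)
```

Calling `send_confirmation("A-1043", "k1")` three times appends to `SENT` once and returns the same result each time, while a call with a new key executes again.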
Timeouts, rate limits, and the host as a circuit breaker
A tool that hangs is worse than one that fails, because a hung call stalls the entire agent turn. Set timeouts on every downstream call inside your handler and convert a timeout into a clean, retryable error rather than letting it block indefinitely. Pair this with rate limiting: a model in a loop can hammer a tool, and your server should protect the downstream system with limits that return a clear "slow down" error the model can respect.
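Both protections can be sketched with the standard library alone. Below, `call_with_timeout` converts a hung downstream call into a retryable error, and `TokenBucket` is one common rate-limiting approach; the function names, the error payload fields, and the two stand-in downstream coroutines are assumptions for illustration.

```python
import asyncio
import time


async def call_with_timeout(coro_fn, timeout_s):
    """Await a downstream call, turning a timeout into a clean retryable error."""
    try:
        result = await asyncio.wait_for(coro_fn(), timeout=timeout_s)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        return {
            "ok": False,
            "kind": "transient",
            "retryable": True,
            "message": f"downstream call exceeded {timeout_s}s",
        }


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_s` tokens per second."""

    def __init__(self, capacity, refill_per_s, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self):
        """Take one token if available; False means the caller should slow down."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_s)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# Hypothetical downstream calls used only to exercise the wrapper.
async def fast_downstream():
    return "ok"


async def slow_downstream():
    await asyncio.sleep(1)
    return "too late"
```

When `try_acquire()` returns `False`, the handler would return its own structured "slow down" error instead of calling downstream at all.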
Remember that the host is also a control point you can lean on. Because every tool call funnels through the host, it can require human approval for sensitive operations, enforce its own rate limits, and isolate a misbehaving server's session without affecting others. Design your server assuming the host may interpose — return clean, structured responses at every step so that whether a human or the model is reading the result, the next action is obvious.
Observability is not optional
When an agent does something surprising in production, logs are your only reconstruction of events. Log every tool call with its arguments, the authenticated identity, the outcome, and the latency. Redact secrets, but capture enough to answer "why did the agent do that?" after the fact. Without this, an autonomous system that takes real actions is a black box, and the first incident becomes an unsolvable mystery. With it, you can trace any action back through the host to the exact call and arguments that caused it.
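A sketch of per-call audit logging with redaction, using only the standard library. The logger name, the `SECRET_FIELDS` list, and the record fields are assumptions; redaction here is shallow (top-level keys only), so nested secrets would need a recursive pass.

```python
import json
import logging
import time

logger = logging.getLogger("mcp.tools")  # hypothetical logger name

# Argument names treated as secrets (illustrative list, compared case-insensitively).
SECRET_FIELDS = {"password", "token", "api_key", "authorization"}


def redact(args):
    """Replace the values of secret-looking top-level fields."""
    return {k: "[REDACTED]" if k.lower() in SECRET_FIELDS else v for k, v in args.items()}


def audit_record(tool, user_id, args, outcome, started_at, clock=time.monotonic):
    """Build one log record: tool, identity, redacted arguments, outcome, latency."""
    return {
        "tool": tool,
        "user": user_id,
        "args": redact(args),
        "outcome": outcome,
        "latency_ms": round((clock() - started_at) * 1000, 1),
    }


def logged_call(tool, user_id, args, handler):
    """Run a handler and emit one JSON log line whether it succeeds or raises."""
    started = time.monotonic()
    outcome = "error"
    try:
        result = handler(**args)
        outcome = "ok"
        return result
    finally:
        record = audit_record(tool, user_id, args, outcome, started)
        # default=str keeps logging from failing on non-JSON-serializable arguments.
        logger.info(json.dumps(record, default=str))
```

Emitting one JSON object per call keeps the log machine-searchable, so a question like "which calls did this user's session make, and with what arguments?" becomes a filter rather than a text search.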
Frequently asked questions
How do remote MCP servers handle authentication?
Through standard web auth — typically OAuth for user-delegated access and API keys or bearer tokens for service access. The strong pattern is delegated authorization, where the server holds a scoped token and acts as the specific connected user, so tool calls inherit that user's permissions rather than running with a shared, over-privileged account.
Should a failing tool throw or return an error?
Return a structured error. Throwing surfaces as an opaque failure the model cannot act on, whereas a clear error message — and a signal of whether it is a validation, authorization, or transient problem — lets the model fix input, explain the issue, or retry appropriately. Error messages are part of your tool's interface.
Where do I add idempotency keys?
Only on tools with side effects. Add an optional key to the mutating tool's schema, record it on execution, and return the original result on a repeat. This protects against the duplicate calls agents inevitably make on retries. Read-only tools are naturally safe to repeat and need no key.
Can schema validation replace handler validation?
No. The schema rejects malformed input — wrong types, missing required fields, invalid enums — before your handler runs, but it cannot express semantic or authorization rules like whether this record belongs to this user. You need both: the schema as the first gate and handler checks as the second.
Bringing agentic AI to your phone lines
Auth, validation, idempotency, and graceful failure are exactly what let CallSphere's voice and chat agents call real systems mid-conversation and book work safely, 24/7. See production-grade tool use in action at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.