Wiring tools and MCP servers into Claude batch jobs
Use tools and MCP servers in Claude Message Batches safely: strict schemas, auth, error handling, and custom_id-based idempotency across many requests.
Tools turn a batch from a text generator into a workhorse: each of your hundred thousand requests can call a function, hit an API, or run code on Anthropic's side. But the batch model — fire, wait hours, pull results — changes the rules for how tools behave. There is no human watching a tool call resolve in real time, no easy mid-flight retry, and a tool with side effects can do real damage when it runs fifty thousand times unattended. This post is about wiring tools and MCP into batch requests so they stay safe, recoverable, and idempotent.
Key takeaways
- Inside a batch, prefer server-side tools (code execution, web search) — they resolve entirely on Anthropic's infrastructure with no client-side loop to babysit across an async window.
- For client-side tools, a batched request is a single asynchronous model call: it stops with a tool_use block the moment a client tool result is needed, and you can only supply that result in a follow-up request after the batch ends. Plan your tool surface so common cases resolve without a round trip you cannot service mid-batch.
- Write tool schemas with strict typing and enums so the inputs Claude produces are valid across thousands of varied prompts, not just your happy-path test.
- Make every side-effecting tool idempotent: derive an idempotency key from the custom_id so a resubmitted request cannot double-charge or double-write.
- Handle tool failures as is_error results the model can react to, and reconcile tool-call outcomes separately from message-level batch outcomes.
Server-side tools: the natural fit for batches
The cleanest tools to use in a batch are the ones you never have to execute yourself. Server-side tools (code execution, web search, and web fetch) run on Anthropic's infrastructure. You declare them in the tools array, and Claude runs the tool loop internally; the result that lands in your batch output already incorporates whatever the tool produced. There is no pause, no client callback, no partial state to manage across the hours your batch is in flight.
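```python
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

# row_id and csv_blob come from your own data; build one Request per batch item.
request = Request(
    custom_id=f"analyze-{row_id}",
    params=MessageCreateParamsNonStreaming(
        model="claude-opus-4-8",
        max_tokens=4096,
        messages=[{"role": "user",
                   "content": f"Compute summary statistics for: {csv_blob}"}],
        # Server-side tool: declared here, executed on Anthropic's infrastructure.
        tools=[{"type": "code_execution_20260120", "name": "code_execution"}],
    ),
)
```

For batch workloads that need computation or fresh information per item, such as running a calculation or checking a current fact, server-side tools are almost always the right call. They keep the entire interaction inside the single async request, which is exactly what the batch model wants.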
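To scale the single request above to a whole dataset, one option is to build the request list as plain dictionaries with deterministic custom_id values. This is a minimal sketch, assuming a hypothetical list of (row_id, csv_blob) pairs; `build_requests` is an illustrative helper rather than part of any SDK, and the model ID and tool version are simply copied from the example above.

```python
def build_requests(rows):
    """Build one batch request dict per (row_id, csv_blob) pair."""
    requests = []
    seen_ids = set()
    for row_id, csv_blob in rows:
        custom_id = f"analyze-{row_id}"
        # custom_id is how results are matched back to inputs, so it must be
        # unique within the batch; fail fast before anything is submitted.
        if custom_id in seen_ids:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        seen_ids.add(custom_id)
        requests.append({
            "custom_id": custom_id,
            "params": {
                "model": "claude-opus-4-8",
                "max_tokens": 4096,
                "messages": [{
                    "role": "user",
                    "content": f"Compute summary statistics for: {csv_blob}",
                }],
                "tools": [{"type": "code_execution_20260120",
                           "name": "code_execution"}],
            },
        })
    return requests


# Hypothetical sample data.
rows = [("r1", "a,b\n1,2"), ("r2", "a,b\n3,4")]
requests = build_requests(rows)
print([r["custom_id"] for r in requests])
```

The resulting list could then be handed to the SDK's batch creation call, for example `client.messages.batches.create(requests=requests)`. Because each custom_id is derived from the row ID rather than generated randomly, rebuilding the list for a resubmission produces the same IDs.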
Client-side and MCP tools: design for the async boundary
Client-side tools and MCP servers introduce a tool the platform cannot execute for you — your code, or your MCP endpoint, has to run it. In an interactive session that is a simple loop. In a batch, the boundary is harder: the request is processed once, asynchronously, and there is no live channel for you to return a tool result mid-flight. The Claude API does support an mcp_servers parameter that lets Claude connect directly to remote MCP servers, which keeps the resolution server-side — that is the pattern to reach for when you need MCP capabilities inside a batch.
The Model Context Protocol is an open standard that connects Claude to external tools and data through MCP servers, letting a single tool definition expose a whole capability surface. When you wire a remote MCP server into a batched request, Claude calls it over the connection during processing, and the resolved result is folded into the output you eventually pull — no client round trip required.
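As a rough illustration, the sketch below builds one batch entry that declares a remote MCP server. The field names inside mcp_servers (type, url, name, authorization_token) are an assumption based on the MCP connector beta and should be checked against the current API reference, which may also require a beta header or additional tool configuration. The server URL, server name, environment variable, and `mcp_batch_request` helper are all hypothetical placeholders.

```python
import os


def mcp_batch_request(ticket_id, ticket_text, token):
    """Return one batch request dict that lets Claude call a remote MCP server."""
    return {
        "custom_id": f"ticket-{ticket_id}",
        "params": {
            "model": "claude-opus-4-8",  # model ID as used elsewhere in this post
            "max_tokens": 2048,
            "messages": [{
                "role": "user",
                "content": f"Triage this ticket: {ticket_text}",
            }],
            # Assumed shape of an mcp_servers entry; verify against current docs.
            "mcp_servers": [{
                "type": "url",
                "url": "https://mcp.example.com/sse",  # hypothetical endpoint
                "name": "crm",                          # hypothetical server name
                "authorization_token": token,
            }],
        },
    }


# Read the token from the environment instead of hard-coding it.
token = os.environ.get("CRM_MCP_TOKEN", "placeholder-token")
request = mcp_batch_request("1001", "Customer cannot log in.", token)
print(request["params"]["mcp_servers"][0]["name"])
```

Since the same credential is attached to every request in the batch, a narrowly scoped, read-only token is a sensible default wherever the MCP server supports one.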
```mermaid
flowchart TD
    A["Batched request\nwith mcp_servers"] --> B["Claude processes async"]
    B --> C{"Tool needed?"}
    C -->|No| D["Compose answer"]
    C -->|Yes, server-side| E["Claude calls remote\nMCP server over connection"]
    E --> F["MCP server authenticates\n+ returns structured data"]
    F --> G{"Tool error?"}
    G -->|is_error| H["Model adapts\nor records failure"]
    G -->|ok| D
    H --> D
    D --> I["Result written\nto batch store by custom_id"]
```

Schema design that survives 100,000 varied prompts
A tool schema that works on your five test prompts can break on the ten-thousandth real one. The fix is to constrain the input space tightly. Use enum for any field with a fixed value set, mark strict: true to guarantee the parameters validate against your schema, and set additionalProperties: false so the model cannot invent fields your handler does not expect.
```python
tool = {
    "name": "lookup_account",
    "description": "Fetch an account record. Call this when the message "
                   "references a specific account by ID or email.",
    "strict": True,
    "input_schema": {
        "type": "object",
        "properties": {
            "identifier": {"type": "string"},
            "id_type": {"type": "string", "enum": ["account_id", "email"]},
        },
        "required": ["identifier", "id_type"],
        "additionalProperties": False,
    },
}
```

Note the description: it states when to call the tool, not just what it does. Recent Claude models can be conservative about reaching for tools, and a prescriptive trigger condition in the description helps the model call the tool when it should across a diverse batch.
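Even with a strict schema, a handler can cheaply re-check its inputs before acting. The following is a minimal sketch of such a defensive check, written by hand so that it needs no third-party library. `validate_input` is a hypothetical helper that covers only the three constraints discussed here (required fields, enum values, no extra properties), not full JSON Schema.

```python
def validate_input(tool_input, schema):
    """Return a list of problems; an empty list means the input passed."""
    problems = []
    properties = schema.get("properties", {})

    # Every required field must be present.
    for field in schema.get("required", []):
        if field not in tool_input:
            problems.append(f"missing required field: {field}")

    # With additionalProperties: False, unknown fields are rejected.
    if schema.get("additionalProperties") is False:
        for field in tool_input:
            if field not in properties:
                problems.append(f"unexpected field: {field}")

    # Fields that declare an enum must use one of its values.
    for field, rules in properties.items():
        if field in tool_input and "enum" in rules:
            if tool_input[field] not in rules["enum"]:
                problems.append(
                    f"invalid value for {field}: {tool_input[field]!r}")
    return problems


# Same input schema as the lookup_account tool.
schema = {
    "type": "object",
    "properties": {
        "identifier": {"type": "string"},
        "id_type": {"type": "string", "enum": ["account_id", "email"]},
    },
    "required": ["identifier", "id_type"],
    "additionalProperties": False,
}

good = validate_input({"identifier": "a@b.com", "id_type": "email"}, schema)
bad = validate_input(
    {"identifier": "42", "id_type": "phone", "note": "x"}, schema)
print(good)
print(bad)
```

A handler that receives a non-empty problem list can return it as an error result instead of acting on malformed input.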
Idempotency: the rule that makes resubmission safe
Batches fail partially, and the remedy is resubmission. That means any tool with a side effect — charging a card, sending a message, writing a row — can run more than once for the same logical item. The defense is idempotency, and the batch hands you a perfect idempotency key for free: the custom_id. Derive a deterministic key from it and make your tool handler a no-op on repeats.
```python
# `store` and `payments` stand in for your own persistence layer and payment client.
def handle_charge(tool_input, custom_id):
    idem_key = f"charge:{custom_id}"  # stable across resubmits
    if store.seen(idem_key):
        return store.prior_result(idem_key)  # no double charge
    result = payments.charge(**tool_input, idempotency_key=idem_key)
    store.record(idem_key, result)
    return result
```

With this in place, resubmitting an expired or errored request is safe by construction. The first execution does the work; every replay returns the recorded result. This is the single most important property to get right when tools have side effects in a batch.
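To see the replay behaviour end to end, here is a self-contained sketch of the same pattern with in-memory stand-ins. `InMemoryStore` and `FakePayments` are hypothetical test doubles, not real libraries; a production store would need to be durable and to make the check-and-record step atomic.

```python
class InMemoryStore:
    """Hypothetical stand-in for a durable idempotency store."""

    def __init__(self):
        self._results = {}

    def seen(self, key):
        return key in self._results

    def prior_result(self, key):
        return self._results[key]

    def record(self, key, result):
        self._results[key] = result


class FakePayments:
    """Hypothetical payment client that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def charge(self, amount_cents, idempotency_key):
        self.calls += 1
        return {"status": "charged", "amount_cents": amount_cents,
                "key": idempotency_key}


store = InMemoryStore()
payments = FakePayments()


def handle_charge(tool_input, custom_id):
    idem_key = f"charge:{custom_id}"  # stable across resubmits
    if store.seen(idem_key):
        return store.prior_result(idem_key)  # replay: no second charge
    result = payments.charge(**tool_input, idempotency_key=idem_key)
    store.record(idem_key, result)
    return result


first = handle_charge({"amount_cents": 500}, "invoice-17")
replay = handle_charge({"amount_cents": 500}, "invoice-17")
print(payments.calls, first == replay)
```

One design caveat: if a single request can legitimately trigger more than one charge, custom_id alone is not unique per call. In that case a key that also includes something call-specific, such as a hash of the tool input, would be one way to keep distinct charges distinct.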
Error handling across two layers
Tool errors and batch errors are different things, and you reconcile them separately. A tool that fails should return a result with is_error: true and an informative message, so the model can adapt or record the failure inside its response. That is a layer below the batch: the request itself may still succeed at the message level even though a tool call inside it failed.
```python
tool_result = {
    "type": "tool_result",
    "tool_use_id": block.id,  # id of the tool_use block being answered
    "content": "Error: account 'xyz' not found.",
    "is_error": True,
}
```

So your post-batch reconciliation has two passes. First, the message-level pass on request_counts and result.type. Second, an application-level pass over the succeeded messages to detect tool failures the model surfaced in its output. A batch can read "100% succeeded" while 3% of items hit a tool error; only the second pass catches that.
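The two passes could be sketched as follows. Results are modeled as plain dictionaries so the example runs on its own; the nesting (result.type, then the message content blocks) follows the batch result shape referred to above. The rule for spotting a tool failure, an is_error flag on a block or a nested content type ending in "_error", is an assumption to adapt to the result blocks your tools actually produce, and both function names are hypothetical.

```python
def block_has_tool_error(block):
    """Heuristic check for a failed tool call inside a content block."""
    if block.get("is_error") is True:
        return True
    inner = block.get("content")
    # Assumption: some tool result blocks nest an error object instead.
    return isinstance(inner, dict) and str(inner.get("type", "")).endswith("_error")


def reconcile(results):
    """Split results into not-succeeded, tool-failure, and clean custom_ids."""
    not_succeeded, tool_failures, clean = [], [], []
    for item in results:
        result = item["result"]
        # Pass 1: message-level outcome (succeeded, errored, canceled, expired).
        if result["type"] != "succeeded":
            not_succeeded.append(item["custom_id"])
            continue
        # Pass 2: look inside succeeded messages for tool-level errors.
        blocks = result["message"]["content"]
        if any(block_has_tool_error(b) for b in blocks):
            tool_failures.append(item["custom_id"])
        else:
            clean.append(item["custom_id"])
    return not_succeeded, tool_failures, clean


# Hypothetical results: one clean, one with a tool error, one expired.
results = [
    {"custom_id": "a", "result": {"type": "succeeded", "message": {
        "content": [{"type": "text", "text": "All good."}]}}},
    {"custom_id": "b", "result": {"type": "succeeded", "message": {
        "content": [
            {"type": "mcp_tool_result", "tool_use_id": "t1", "is_error": True,
             "content": [{"type": "text", "text": "account not found"}]},
            {"type": "text", "text": "I could not find that account."},
        ]}}},
    {"custom_id": "c", "result": {"type": "expired"}},
]

not_succeeded, tool_failures, clean = reconcile(results)
print(not_succeeded, tool_failures, clean)
```

Keeping the three lists separate matters because they call for different follow-ups: items in the first list are candidates for resubmission, while items in the second succeeded as messages and need an application-level decision.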
Common pitfalls
- Expecting to service a client tool mid-batch. There is no live channel during async processing. Use server-side tools or a remote MCP server so resolution stays on Anthropic's side.
- Non-idempotent side effects. Without an idempotency key derived from custom_id, a resubmission double-acts. This is the highest-severity batch tooling bug.
- Loose schemas. Omitting enum, strict, or additionalProperties: false lets the model emit inputs your handler chokes on at scale.
- Conflating tool errors with batch errors. A message-level success can hide a tool-level failure. Reconcile both layers.
- Vague tool descriptions. Without an explicit trigger condition, conservative models under-call the tool across varied prompts.
Wire a batch tool safely in 5 steps
- Prefer a server-side tool or remote MCP server so the tool resolves without a client round trip you cannot service.
- Write a strict schema: enum on fixed fields, strict: true, additionalProperties: false.
- Put a prescriptive "call this when…" trigger in the tool description.
- Derive an idempotency key from custom_id in every side-effecting handler.
- Reconcile twice after the batch ends: message-level counts, then tool-level is_error in the succeeded outputs.
Tool options for batch work
| Tool type | Who executes | Fits a batch? |
|---|---|---|
| Code execution (server-side) | Anthropic | Excellent — fully self-contained |
| Web search / fetch (server-side) | Anthropic | Excellent — fresh data, no client loop |
| Remote MCP server (mcp_servers) | Your endpoint, called server-side | Good — resolution stays off the client |
| Local client-side tool | Your code, post-hoc | Awkward — no live channel mid-batch |
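
The "post-hoc" row deserves a concrete picture. One way to service a client-side tool is a second batch: after the first batch ends, find results that stopped on a tool call, run the tool locally, and build a follow-up request that replays the conversation with the tool result appended. This is a minimal sketch, assuming messages are plain dictionaries; `build_followup` and `run_tool` are hypothetical helpers, and the "-followup" custom_id suffix is only an illustrative naming choice.

```python
def build_followup(custom_id, original_params, message, run_tool):
    """Return a follow-up batch request, or None if no tool call is pending."""
    if message.get("stop_reason") != "tool_use":
        return None
    tool_results = []
    for block in message["content"]:
        if block["type"] != "tool_use":
            continue
        try:
            output = run_tool(block["name"], block["input"])
            tool_results.append({"type": "tool_result",
                                 "tool_use_id": block["id"],
                                 "content": str(output)})
        except Exception as exc:
            # Surface the failure to the model instead of dropping the item.
            tool_results.append({"type": "tool_result",
                                 "tool_use_id": block["id"],
                                 "content": f"Error: {exc}",
                                 "is_error": True})
    params = dict(original_params)  # shallow copy; messages list is rebuilt
    params["messages"] = original_params["messages"] + [
        {"role": "assistant", "content": message["content"]},
        {"role": "user", "content": tool_results},
    ]
    return {"custom_id": f"{custom_id}-followup", "params": params}


def run_tool(name, tool_input):
    """Hypothetical dispatcher for locally executed tools."""
    if name == "lookup_account":
        return {"account_id": tool_input["identifier"], "status": "active"}
    raise KeyError(f"unknown tool: {name}")


original_params = {
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Check account 42."}],
}
message = {
    "stop_reason": "tool_use",
    "content": [{"type": "tool_use", "id": "toolu_01",
                 "name": "lookup_account",
                 "input": {"identifier": "42", "id_type": "account_id"}}],
}
followup = build_followup("acct-42", original_params, message, run_tool)
print(followup["custom_id"], len(followup["params"]["messages"]))
```

Each tool round trip costs a full batch cycle under this pattern, which is why the table rates local client-side tools as awkward rather than impossible.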
Frequently asked questions
What is the Model Context Protocol in one sentence?
The Model Context Protocol is an open standard, introduced in November 2024, that connects Claude to external tools and data through MCP servers, so one connection can expose an entire capability surface to the model.
Can Claude execute my client-side tool during batch processing?
Not on a live channel — batch requests are processed asynchronously with no callback to your client mid-flight. Use a server-side tool or wire a remote MCP server via the mcp_servers parameter so the tool resolves on Anthropic's side and the result lands in your batch output.
Why is idempotency more critical in batches than in interactive use?
Because batches fail partially and you fix them by resubmission, any side-effecting tool can run twice for the same item. An idempotency key derived from custom_id makes the replay a safe no-op.
How do I know if a tool failed when the batch says succeeded?
Message-level success and tool-level failure are independent. Do a second reconciliation pass over the succeeded results to find tool calls the model surfaced as is_error in its response.
Bringing agentic AI to your phone lines
Safe tool wiring, strict schemas, and idempotent handlers are the backbone of any production agent — batch or real-time. CallSphere brings the same Claude tooling discipline to voice and chat: assistants that answer every call, call tools mid-conversation, and book work 24/7. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.