Reusable code patterns for Claude batch processing jobs

Your first Claude batch job is a script. Your tenth is a system. Somewhere between those two, the ad-hoc create() call you copy-pasted starts to creak: the prompt assembly is duplicated across files, the cache never hits, and reassembling results has become a fragile web of string parsing. This post is a set of reusable patterns — request factories, context layering, structured output, and result joins — that turn batch processing from a one-off script into a component you can trust at scale.

Key takeaways

Wrap request construction in a factory function so the stable parts (model, system, tools) live in one place and only the per-item content varies.
Layer your prompt as frozen prefix then volatile suffix — a cacheable shared block followed by the per-request question — to make prompt caching actually hit inside a batch.
Use structured outputs (output_config.format) so every result is machine-parseable JSON, eliminating brittle text scraping during reassembly.
Encode reassembly metadata in the custom_id itself (a delimited key) so the result join needs no side table.
Build a resubmission helper that takes errored and expired custom_ids and rebuilds exactly those requests from your source data.

Pattern 1: the request factory

The enemy of a maintainable batch job is duplication in request construction. Every request shares most of its body (same model, same system prompt, same tool list) and differs only in the user content. Centralize the constant part in a factory. This single move makes the shared prefix byte-identical across requests, which is also the precondition for caching to work.

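import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# EXTRACTION_GUIDE is your long, stable instruction text, loaded once at startup.
SHARED_SYSTEM = [
    {"type": "text", "text": "You are a precise data extraction engine."},
    {"type": "text", "text": EXTRACTION_GUIDE,
     "cache_control": {"type": "ephemeral"}},   # frozen, cacheable
]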

def make_request(key: str, document: str) -> Request:
    return Request(
        custom_id=key,
        params=MessageCreateParamsNonStreaming(
            model="claude-opus-4-8",
            max_tokens=2048,
            system=SHARED_SYSTEM,            # identical bytes every call
            messages=[{"role": "user", "content": document}],
        ),
    )

# corpus is an iterable of (key, document) pairs from your own data store
requests = [make_request(k, doc) for k, doc in corpus]

Because SHARED_SYSTEM is constructed once and reused, every request renders the same prefix. Reorder the keys in a dict or interpolate a timestamp into that block and you would silently shatter the cache — keep the frozen prefix truly frozen.
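One way to catch accidental drift before submitting is to fingerprint the frozen block of every request and confirm there is exactly one distinct value. The sketch below is illustrative only: prefix_fingerprint and assert_shared_prefix are hypothetical helper names, and the hash is taken over a local JSON rendering of the blocks, which approximates but is not guaranteed to equal the bytes the SDK sends.

```python
import hashlib
import json


def prefix_fingerprint(system_blocks: list[dict]) -> str:
    """Hash a JSON rendering of the frozen system blocks.

    Keys are deliberately NOT sorted: json.dumps preserves insertion order,
    so a reordered dict produces a different fingerprint.
    """
    rendered = json.dumps(system_blocks, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def assert_shared_prefix(system_blocks_per_request: list[list[dict]]) -> str:
    """Return the single shared fingerprint, or raise if the prefixes differ."""
    fingerprints = {prefix_fingerprint(blocks) for blocks in system_blocks_per_request}
    if len(fingerprints) != 1:
        raise ValueError(
            f"expected 1 distinct system prefix, found {len(fingerprints)}"
        )
    return fingerprints.pop()
```

Running a check like this once per job, just before submission, turns a silent cache miss into an immediate failure.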

Pattern 2: layer context, frozen before volatile

Caching is a prefix match: any byte change invalidates everything after it. The design rule that follows is mechanical. Put everything stable — persona, reference documents, few-shot examples — at the front, marked with cache_control. Put the one thing that changes per request — the actual question or document — after the breakpoint, unmarked.

flowchart TD
  A["Per-request body"] --> B["Frozen prefix:\npersona + guide + examples"]
  B --> C["cache_control breakpoint"]
  C --> D["Volatile suffix:\nthis item's content"]
  B --> E{"Prefix byte-identical\nacross requests?"}
  E -->|Yes| F["cache_read on later items\n~0.1x input price"]
  E -->|No| G["cache miss: full price\nevery request"]
  D --> H["Claude processes\nsuffix + cached prefix"]
  F --> H

The practical test is to inspect usage.cache_read_input_tokens on a few results. If it is consistently zero across requests that should share a prefix, a silent invalidator has crept in — most often a non-deterministic JSON serialization or a per-item value that leaked into the frozen block.

One timing nuance is worth internalizing so you do not misread the metrics. A cache entry becomes readable only after the request that writes it begins processing, and in a batch the requests sharing a prefix do not all start at once. So the very first items to run will show a cache write rather than a read, and the read rate climbs as the batch drains. If you sample only the earliest results, you may conclude caching is broken when it is simply warming up. Sample across the run, and look at the aggregate write-versus-read ratio rather than any single request.
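To look at the aggregate rather than a single request, a small tally over the usage records is enough. This is a minimal sketch that assumes each usage record is available as a plain mapping with the cache_read_input_tokens and cache_creation_input_tokens fields; CacheStats and summarize_cache_usage are hypothetical names, and how you convert an SDK usage object to a mapping is left to your code.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass
class CacheStats:
    requests: int = 0
    write_tokens: int = 0
    read_tokens: int = 0
    zero_read_requests: int = 0

    @property
    def read_share(self) -> float:
        """Fraction of cached-prefix tokens that were reads rather than writes."""
        total = self.write_tokens + self.read_tokens
        return self.read_tokens / total if total else 0.0


def summarize_cache_usage(usages: Iterable[Mapping[str, int]]) -> CacheStats:
    """Aggregate cache writes and reads across all results of a batch."""
    stats = CacheStats()
    for usage in usages:
        # Missing or None fields are treated as zero tokens.
        read = usage.get("cache_read_input_tokens") or 0
        write = usage.get("cache_creation_input_tokens") or 0
        stats.requests += 1
        stats.read_tokens += read
        stats.write_tokens += write
        if read == 0:
            stats.zero_read_requests += 1
    return stats
```

A read_share near zero across the whole run points to a broken prefix, while a low value on only the first few results is the expected warm-up.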

Pattern 3: structured outputs for clean reassembly

Text scraping is a common source of batch reassembly bugs, alongside positional joins. If you ask for "the category and confidence" in prose, you will spend the afternoon writing regexes for the model's three favorite phrasings. Constrain the output to a schema instead and parse JSON deterministically.

SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string",
                     "enum": ["billing", "bug", "feature", "other"]},
        "confidence": {"type": "string",
                       "enum": ["low", "medium", "high"]},
    },
    "required": ["category", "confidence"],
    "additionalProperties": False,
}

def make_request(key: str, text: str) -> Request:
    return Request(
        custom_id=key,
        params=MessageCreateParamsNonStreaming(
            model="claude-haiku-4-5",
            max_tokens=128,
            output_config={"format": {"type": "json_schema", "schema": SCHEMA}},
            messages=[{"role": "user", "content": text}],
        ),
    )

On the result side, the first text block is valid JSON matching your schema whenever the response completes normally, so reassembly is json.loads(text) with no regex scraping. The exceptions are responses cut short by max_tokens or refused by the model, both of which stop_reason will flag. Structured outputs compose cleanly with batches: the constraint applies per request exactly as it would synchronously.
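A response that is truncated by max_tokens or refused may not satisfy the schema, so a thin guard before json.loads is cheap insurance. The sketch below assumes the message has been converted to a plain dict with stop_reason and content keys; parse_classification and ALLOWED are hypothetical names that mirror the schema above.

```python
import json

# Mirrors the enum values in SCHEMA; kept local so the sketch is self-contained.
ALLOWED = {
    "category": {"billing", "bug", "feature", "other"},
    "confidence": {"low", "medium", "high"},
}


def parse_classification(message: dict) -> dict:
    """Parse one structured result, failing loudly on incomplete responses."""
    stop_reason = message.get("stop_reason")
    if stop_reason != "end_turn":
        # e.g. "max_tokens" (truncated JSON) or "refusal"
        raise ValueError(f"incomplete response, stop_reason={stop_reason!r}")

    # Take the first text block in the response content.
    text = next(
        (block["text"] for block in message.get("content", [])
         if block.get("type") == "text"),
        None,
    )
    if text is None:
        raise ValueError("response contains no text block")

    data = json.loads(text)
    for field, allowed in ALLOWED.items():
        if data.get(field) not in allowed:
            raise ValueError(f"unexpected value for {field!r}: {data.get(field)!r}")
    return data
```

Requests that raise here are natural candidates for the resubmission path, usually with a larger max_tokens.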

Pattern 4: self-describing custom_ids

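You can push reassembly metadata directly into the custom_id and avoid a side table entirely. A delimited composite key (entity type, primary key, and a version or shard tag) survives the round trip and tells you everything you need to route the result.

# Assumption: at the time of writing, custom_id is understood to accept only
# letters, digits, "_" and "-", up to 64 characters. A pipe would be rejected,
# so a double underscore serves as the delimiter here.
DELIM = "__"

def encode_id(entity: str, pk: int, shard: str) -> str:
    cid = DELIM.join([entity, str(pk), shard])
    # Round-trip check: fails loudly if a field contains or abuts the delimiter.
    if len(cid) > 64 or decode_id(cid) != (entity, pk, shard):
        raise ValueError(f"cannot encode custom_id from {entity!r}, {pk!r}, {shard!r}")
    return cid

def decode_id(cid: str) -> tuple[str, int, str]:
    entity, pk, shard = cid.split(DELIM)   # ValueError on a malformed id
    return entity, int(pk), shard

# On the way out:
for r in client.messages.batches.results(batch.id):
    if r.result.type == "succeeded":
        entity, pk, shard = decode_id(r.custom_id)
        route_result(entity, pk, shard, r.result.message)

Keep the delimiter out of your raw data values and keep the whole string within the length limit, and this pattern eliminates an entire class of "which row was this again?" bugs. Pick a delimiter that cannot appear in any field you encode and that the API accepts: at the time of writing, custom_id is understood to be limited to letters, digits, underscores, and hyphens, up to 64 characters, which rules out a pipe or a double colon, so check the current API reference before settling on a scheme. Validate on decode so a malformed id fails loudly rather than silently routing a result to the wrong place. When the keys themselves might contain arbitrary text, reach for a structured encoding such as a short URL-safe base64 blob instead of raw concatenation, so the round trip is lossless regardless of content.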
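For keys that may contain arbitrary text, the structured-encoding option can be sketched as below. This is a minimal illustration, not a prescribed scheme: encode_key and decode_key are hypothetical names, and CUSTOM_ID_PATTERN encodes an assumption about the characters and length the API accepts (letters, digits, underscore, hyphen, at most 64 characters) that should be checked against the current API reference.

```python
import base64
import json
import re

# Assumed custom_id constraint; verify against the current API reference.
CUSTOM_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def encode_key(fields: list) -> str:
    """Pack JSON-serializable fields into a URL-safe, unpadded base64 id."""
    raw = json.dumps(fields, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    cid = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    if not CUSTOM_ID_PATTERN.fullmatch(cid):
        raise ValueError(f"encoded id has {len(cid)} characters; fields are too long")
    return cid


def decode_key(cid: str) -> list:
    """Reverse encode_key, rejecting ids that do not match the expected shape."""
    if not CUSTOM_ID_PATTERN.fullmatch(cid):
        raise ValueError(f"malformed custom_id: {cid!r}")
    padded = cid + "=" * (-len(cid) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Base64 grows the payload by about a third, so this leaves room for roughly 48 bytes of JSON under a 64-character limit. Anything larger belongs in a side table keyed by a short id.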

Pattern 5: a resubmission helper

Errored and expired requests are normal at scale, not exceptional. Build the retry path as a first-class function from day one. It takes the list of custom_ids that need rework, rebuilds exactly those requests from your source data using the same factory, and submits a fresh, smaller batch.

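def resubmit(failed_ids: list[str], source: dict[str, str]) -> str:
    """Rebuild the failed requests from source data and submit a new, smaller batch."""
    # source maps each custom_id to its original document; a missing id raises KeyError
    retry_requests = [
        make_request(cid, source[cid]) for cid in failed_ids
    ]
    new_batch = client.messages.batches.create(requests=retry_requests)
    return new_batch.id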

Because the factory is the single source of truth for request shape, retries are guaranteed to match the original job's configuration. No drift, no special-casing.
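The list of ids to retry has to come from somewhere, and the natural source is the result stream itself. The sketch below assumes the results have been reduced to (custom_id, result_type) pairs, for example by reading r.custom_id and r.result.type from each result as in Pattern 4; partition_results and ids_to_resubmit are hypothetical names.

```python
from collections import defaultdict
from typing import Iterable

# Result types that this post treats as needing rework.
RETRYABLE = ("errored", "expired")


def partition_results(results: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group custom_ids by result type (succeeded, errored, canceled, expired)."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for custom_id, result_type in results:
        buckets[result_type].append(custom_id)
    return dict(buckets)


def ids_to_resubmit(results: Iterable[tuple[str, str]]) -> list[str]:
    """Return the sorted custom_ids whose requests errored or expired."""
    buckets = partition_results(results)
    return sorted(cid for kind in RETRYABLE for cid in buckets.get(kind, []))
```

An errored request with a malformed body will fail again if resent unchanged, so it is worth inspecting the error before feeding those ids back into resubmit().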

Common pitfalls

Rebuilding the system block per request. Constructing SHARED_SYSTEM inside the loop risks subtle byte differences and kills caching. Build it once, outside the comprehension.
Marking the volatile suffix with cache_control. If you put the breakpoint after the per-item content, every request writes a unique cache entry and nothing is ever read. Mark the end of the shared prefix only.
Prose outputs for structured data. Free-text answers force brittle parsing. Use output_config.format whenever the downstream consumer is code.
Unbounded custom_id length. Composite keys are great until they overflow length limits. Keep them compact.
No retry path. A batch job without a resubmission helper means hand-editing failures at 2am. Write it up front.

Adopt these patterns in 5 steps

Extract request construction into a single make_request() factory.
Split your prompt into a frozen, cache_control-marked prefix and an unmarked volatile suffix.
Replace prose instructions with an output_config.format schema wherever code consumes the result.
Adopt a delimited, self-describing custom_id scheme that encodes your join keys.
Write a resubmit() helper that rebuilds failed requests from the same factory.

Pattern tradeoffs at a glance

Pattern	Buys you	Costs you
Request factory	One source of truth, cache-friendly prefix	A little upfront structure
Frozen/volatile layering	Cache reads at ~0.1x input price	Discipline about what is frozen
Structured outputs	Deterministic, parse-free reassembly	Schema maintenance
Self-describing custom_id	No side table for the join	Length and delimiter care

Frequently asked questions

Does prompt caching really help inside a batch?

Yes, when many requests share a large identical prefix. The savings accrue as the batch drains rather than all at once, because cache entries become readable only after the first writing request begins, but you still pay roughly a tenth of the input price for the cached prefix on later requests — stacked on top of the 50% batch discount.
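As a rough worked example of how the two discounts combine, the sketch below expresses input cost in units of "tokens at the base input price". The 0.1 read multiplier and 0.5 batch multiplier come from the figures above; the 1.25 write multiplier and the simplification that every cache miss pays the write price are assumptions to check against current pricing. The function names are hypothetical.

```python
def batch_input_cost(
    prefix_tokens: int,
    suffix_tokens: int,
    n_requests: int,
    hit_rate: float,
    *,
    read_mult: float = 0.1,    # cache read price relative to base input
    write_mult: float = 1.25,  # assumed cache write price relative to base input
    batch_mult: float = 0.5,   # batch discount
) -> float:
    """Estimated input cost of a cached batch, in base-price token units."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hits = n_requests * hit_rate
    misses = n_requests - hits
    prefix_cost = prefix_tokens * (hits * read_mult + misses * write_mult)
    suffix_cost = suffix_tokens * n_requests
    return batch_mult * (prefix_cost + suffix_cost)


def uncached_sync_cost(prefix_tokens: int, suffix_tokens: int, n_requests: int) -> float:
    """Baseline: every request pays full price for prefix and suffix."""
    return float((prefix_tokens + suffix_tokens) * n_requests)
```

With an 8,000-token prefix, a 500-token suffix, 1,000 requests, and a 90% hit rate, the estimate comes to about 1.11 million units against a baseline of 8.5 million, or roughly 13% of the uncached synchronous input cost.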

Can I mix models in one batch using a factory?

Yes. Each request carries its own model, so a factory can branch — Haiku for simple classification, Opus for the reasoning-heavy items — within a single submitted batch.
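A branching factory can be as small as a lookup in front of the request body. In this sketch the task labels, the routing rule, and the max_tokens values are all hypothetical; the model ids are simply the ones used earlier in this post, and the function returns a plain params dict rather than an SDK type so that it stays self-contained.

```python
SIMPLE_MODEL = "claude-haiku-4-5"  # model ids as used earlier in this post
HEAVY_MODEL = "claude-opus-4-8"

# Hypothetical task labels that should go to the larger model.
REASONING_TASKS = {"root_cause", "contract_review"}


def pick_model(task: str) -> str:
    """Route reasoning-heavy tasks to the larger model, everything else to the small one."""
    return HEAVY_MODEL if task in REASONING_TASKS else SIMPLE_MODEL


def make_params(task: str, text: str) -> dict:
    """Build the per-request params; only the model and token budget vary by task."""
    model = pick_model(task)
    return {
        "model": model,
        "max_tokens": 2048 if model == HEAVY_MODEL else 128,
        "messages": [{"role": "user", "content": text}],
    }
```

Requests that target different models do not share a cache entry, so keep each model's frozen prefix consistent within its own group.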

Why prefer structured outputs over a tool definition for extraction?

When you only need a typed JSON object back and nothing is executed, output_config.format is the lighter path: it constrains the response shape directly without the overhead of a tool-use round trip.

What goes in the custom_id versus a side table?

Put the minimal join keys you need to route the result — entity type and primary key — in the custom_id, and keep bulky context in your own store. The id is a routing label, not a payload.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
