---
title: "Reusable code patterns for Claude batch processing jobs"
description: "Code-level patterns for Claude batch processing: request factories, cache-friendly context layering, structured outputs, and self-describing custom_ids."
canonical: https://callsphere.ai/blog/reusable-code-patterns-for-claude-batch-processing-jobs
category: "Agentic AI"
tags: ["agentic ai", "claude", "message batches api", "prompt engineering", "structured outputs", "prompt caching"]
author: "CallSphere Team"
published: 2026-02-14T08:46:22.000Z
updated: 2026-06-07T01:28:23.768Z
---

# Reusable code patterns for Claude batch processing jobs

> Code-level patterns for Claude batch processing: request factories, cache-friendly context layering, structured outputs, and self-describing custom_ids.

Your first Claude batch job is a script. Your tenth is a system. Somewhere between those two, the ad-hoc `create()` call you copy-pasted starts to creak: the prompt assembly is duplicated across files, the cache never hits, and reassembling results has become a fragile web of string parsing. This post is a set of reusable patterns — request factories, context layering, structured output, and result joins — that turn batch processing from a one-off script into a component you can trust at scale.

## Key takeaways

- Wrap request construction in a **factory function** so the stable parts (model, system, tools) live in one place and only the per-item content varies.
- Layer your prompt as **frozen prefix then volatile suffix** — a cacheable shared block followed by the per-request question — to make prompt caching actually hit inside a batch.
- Use **structured outputs** (`output_config.format`) so every result is machine-parseable JSON, eliminating brittle text scraping during reassembly.
- Encode reassembly metadata in the **`custom_id` itself** (a delimited key) so the result join needs no side table.
- Build a **resubmission helper** that takes errored and expired `custom_id`s and rebuilds exactly those requests from your source data.

## Pattern 1: the request factory

The enemy of a maintainable batch job is duplication in request construction. Every request shares most of its body (same model, same system prompt, same tool list) and differs only in the user content. Centralize the constant part in a factory. This single move makes the shared prefix byte-identical across requests, which is also the precondition for caching to work.

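```python
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

# Placeholder: substitute your real, stable reference text.
EXTRACTION_GUIDE = "<long extraction guide, identical for every request>"

SHARED_SYSTEM = [
    {"type": "text", "text": "You are a precise data extraction engine."},
    {"type": "text", "text": EXTRACTION_GUIDE,
     "cache_control": {"type": "ephemeral"}},   # frozen, cacheable
]

def make_request(key: str, document: str) -> Request:
    """Build one batch request; only `key` and `document` vary per item."""
    return Request(
        custom_id=key,
        params=MessageCreateParamsNonStreaming(
            model="claude-opus-4-8",
            max_tokens=2048,
            system=SHARED_SYSTEM,            # identical bytes every call
            messages=[{"role": "user", "content": document}],
        ),
    )

# `corpus` is any iterable of (custom_id, document_text) pairs from your own data.
corpus = [("doc-1", "First document text."), ("doc-2", "Second document text.")]
requests = [make_request(k, doc) for k, doc in corpus]
```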

Because `SHARED_SYSTEM` is constructed once and reused, every request renders the same prefix. Reorder the keys in a dict or interpolate a timestamp into that block and you would silently shatter the cache — keep the frozen prefix truly frozen.
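
One way to catch that kind of drift early is to fingerprint the frozen block and compare it against a known-good value before submitting. The sketch below is a minimal illustration: `prefix_fingerprint` is a hypothetical helper, not part of the SDK, and it hashes a JSON serialization of the blocks, which is not necessarily the exact byte stream the API caches on. Treat it as a drift detector rather than a guarantee of a cache hit.

```python
import hashlib
import json

def prefix_fingerprint(system_blocks: list[dict]) -> str:
    """Hash the frozen prefix exactly as ordered (hypothetical helper).

    Key order is preserved on purpose: a reordered dict serializes to
    different bytes, which is the kind of drift we want to detect.
    """
    payload = json.dumps(system_blocks, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

frozen = [
    {"type": "text", "text": "You are a precise data extraction engine."},
    {"type": "text", "text": "<extraction guide>",
     "cache_control": {"type": "ephemeral"}},
]
# Same content, but the first block's keys are in a different order.
reordered = [
    {"text": "You are a precise data extraction engine.", "type": "text"},
    frozen[1],
]

baseline = prefix_fingerprint(frozen)
same = prefix_fingerprint([dict(b) for b in frozen])  # copies keep key order
drifted = prefix_fingerprint(reordered)
```

Here `same` matches `baseline` while `drifted` does not, so an assertion on the fingerprint in your submission script would flag the reordering before any tokens are billed.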

## Pattern 2: layer context, frozen before volatile

Caching is a prefix match: any byte change invalidates everything after it. The design rule that follows is mechanical. Put everything stable — persona, reference documents, few-shot examples — at the front, marked with `cache_control`. Put the one thing that changes per request — the actual question or document — after the breakpoint, unmarked.

```mermaid
flowchart TD
  A["Per-request body"] --> B["Frozen prefix:\npersona + guide + examples"]
  B --> C["cache_control breakpoint"]
  C --> D["Volatile suffix:\nthis item's content"]
  B --> E{"Prefix byte-identical\nacross requests?"}
  E -->|Yes| F["cache_read on later items\n~0.1x input price"]
  E -->|No| G["cache miss: full price\nevery request"]
  D --> H["Claude processes\nsuffix + cached prefix"]
  F --> H
```
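
The layering in the diagram can be sketched as a small builder that accepts the frozen blocks and the per-item text separately, so the breakpoint always lands at the end of the shared prefix. This is an illustrative sketch using plain dicts; `layered_params` is a hypothetical name, and the frozen strings stand in for your own persona, guide, and examples.

```python
def layered_params(frozen_blocks: list[str], item_text: str,
                   model: str, max_tokens: int = 1024) -> dict:
    """Build request params with one cache breakpoint after the frozen prefix."""
    if not frozen_blocks:
        raise ValueError("need at least one frozen block")
    system = [{"type": "text", "text": t} for t in frozen_blocks]
    # Mark only the last frozen block: everything up to here is the cached prefix.
    system[-1]["cache_control"] = {"type": "ephemeral"}
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        # Volatile suffix: unmarked, placed after the breakpoint.
        "messages": [{"role": "user", "content": item_text}],
    }

FROZEN = ["<persona>", "<reference guide>", "<few-shot examples>"]
p1 = layered_params(FROZEN, "first item", model="claude-haiku-4-5")
p2 = layered_params(FROZEN, "second item", model="claude-haiku-4-5")
```

The two results share an identical `system` list and differ only in `messages`, which is the shape the prefix match needs.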

The practical test is to inspect `usage.cache_read_input_tokens` on a few results. If it is consistently zero across requests that should share a prefix, a silent invalidator has crept in. The usual suspects are a non-deterministic JSON serialization, a per-item value that leaked into the frozen block, or a prefix shorter than the model's minimum cacheable length.

One timing nuance is worth internalizing so you do not misread the metrics. A cache entry becomes readable only after the request that writes it begins processing, and in a batch the requests sharing a prefix do not all start at once. So the very first items to run will show a cache write rather than a read, and the read rate climbs as the batch drains. If you sample only the earliest results, you may conclude caching is broken when it is simply warming up. Sample across the run, and look at the aggregate write-versus-read ratio rather than any single request.
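
To look at the aggregate rather than a single request, you can fold the usage counters from every result into one summary. The sketch below works on plain dicts so it runs without the API; `CacheStats` and `summarize_cache` are hypothetical names. With the SDK you would likely feed it something like `r.result.message.usage.model_dump()` for each succeeded result, which is an assumption about how you collect the data.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class CacheStats:
    requests: int = 0
    read_tokens: int = 0
    write_tokens: int = 0

    @property
    def read_share(self) -> float:
        """Fraction of cache-related tokens that were reads, not writes."""
        total = self.read_tokens + self.write_tokens
        return self.read_tokens / total if total else 0.0

def summarize_cache(usages: Iterable[dict]) -> CacheStats:
    """Sum cache read and write counters across many results."""
    stats = CacheStats()
    for u in usages:
        stats.requests += 1
        # Counters may be missing or None, so fall back to zero.
        stats.read_tokens += u.get("cache_read_input_tokens") or 0
        stats.write_tokens += u.get("cache_creation_input_tokens") or 0
    return stats

sample_usages = [
    {"cache_creation_input_tokens": 1000, "cache_read_input_tokens": 0},
    {"cache_creation_input_tokens": 0, "cache_read_input_tokens": 1000},
    {"cache_creation_input_tokens": None, "cache_read_input_tokens": 1000},
]
stats = summarize_cache(sample_usages)
```

A read share that stays near zero over the whole run points to an invalidator; one that starts low and climbs is the warm-up effect described above.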

## Pattern 3: structured outputs for clean reassembly

Text scraping is one of the most common sources of batch reassembly bugs, alongside positional joins that assume results come back in request order (they may not). If you ask for "the category and confidence" in prose, you will spend the afternoon writing regexes for the model's three favorite phrasings. Constrain the output to a schema instead and parse JSON deterministically.

```python
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

# JSON Schema the response must satisfy.
SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string",
                     "enum": ["billing", "bug", "feature", "other"]},
        "confidence": {"type": "string",
                       "enum": ["low", "medium", "high"]},
    },
    "required": ["category", "confidence"],
    "additionalProperties": False,
}

def make_request(key: str, text: str) -> Request:
    """Build one classification request whose output is bound to SCHEMA."""
    return Request(
        custom_id=key,
        params=MessageCreateParamsNonStreaming(
            model="claude-haiku-4-5",
            max_tokens=128,
            output_config={"format": {"type": "json_schema", "schema": SCHEMA}},
            messages=[{"role": "user", "content": text}],
        ),
    )
```

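On the result side, a response that completes normally (`stop_reason` of `end_turn`) carries a first text block containing valid JSON that matches your schema, so reassembly is `json.loads(text)` with no regex scraping. The exceptions are a refusal or a response truncated by `max_tokens`, so check `stop_reason` before parsing. Structured outputs compose cleanly with batches: the constraint applies per request exactly as it would synchronously.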
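
Even with a schema constraint, it is worth confirming that the response finished normally before parsing, since a response cut off by `max_tokens` can contain incomplete JSON. A minimal sketch of that check follows. `parse_structured` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for SDK message objects so the example runs offline.

```python
import json
from types import SimpleNamespace

def parse_structured(message) -> dict:
    """Parse the first text block as JSON, refusing incomplete responses."""
    if message.stop_reason != "end_turn":
        raise ValueError(f"incomplete response: stop_reason={message.stop_reason}")
    text = next((b.text for b in message.content if b.type == "text"), None)
    if text is None:
        raise ValueError("no text block in response")
    return json.loads(text)

# Stand-ins shaped like the message attributes used above.
ok_message = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(type="text",
                             text='{"category": "bug", "confidence": "high"}')],
)
truncated_message = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(type="text", text='{"category": "bu')],
)

parsed = parse_structured(ok_message)
```

Raising on anything other than a clean finish lets truncated items flow into the same retry path as errored ones instead of surfacing later as a JSON decode error.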

## Pattern 4: self-describing custom_ids

You can push reassembly metadata directly into the `custom_id` and avoid a side table entirely. A delimited composite key — entity type, primary key, and a version or shard tag — survives the round trip and tells you everything you need to route the result.

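```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
DELIM = "__"  # built only from characters the API accepts in a custom_id

def encode_id(entity: str, pk: int, shard: str) -> str:
    if DELIM in entity or DELIM in shard:
        raise ValueError("delimiter must not appear inside an encoded field")
    return f"{entity}{DELIM}{pk}{DELIM}{shard}"

def decode_id(cid: str) -> tuple[str, int, str]:
    parts = cid.split(DELIM)
    if len(parts) != 3:  # fail loudly on a malformed id
        raise ValueError(f"malformed custom_id: {cid!r}")
    entity, pk, shard = parts
    return entity, int(pk), shard

# On the way out. `batch` is the batch you submitted earlier and
# `route_result` is your own handler.
for r in client.messages.batches.results(batch.id):
    if r.result.type == "succeeded":
        entity, pk, shard = decode_id(r.custom_id)
        route_result(entity, pk, shard, r.result.message)
```

Keep the delimiter out of your raw data values and keep the whole string within the API's limits. At the time of writing, the Message Batches API accepts only letters, digits, hyphens, and underscores in a `custom_id`, up to 64 characters, so a pipe or a double colon would be rejected at submission; check the current API reference before settling on a scheme. Within those limits, this pattern eliminates an entire class of "which row was this again?" bugs. Pick a delimiter that cannot appear in any field you encode (a double underscore works well for numeric keys and simple tags), and validate on decode so a malformed id fails loudly rather than silently routing a result to the wrong place. When the keys themselves might contain arbitrary text, reach for a structured encoding such as a short URL-safe base64 blob instead of raw concatenation, so the round trip is lossless regardless of content.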
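
For keys that may contain arbitrary text, the structured encoding can be sketched as compact JSON wrapped in URL-safe base64 with the padding stripped. The helper names and the 64-character `MAX_ID_LEN` are assumptions for illustration; confirm the real limit and allowed characters against the API reference before relying on them.

```python
import base64
import json

MAX_ID_LEN = 64  # assumed limit; confirm against the API reference

def encode_id_b64(fields: dict) -> str:
    """Pack join keys into a URL-safe base64 string (hypothetical helper)."""
    raw = json.dumps(fields, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # URL-safe alphabet uses only letters, digits, '-' and '_'; drop '=' padding.
    cid = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    if len(cid) > MAX_ID_LEN:
        raise ValueError(f"encoded id is {len(cid)} chars, limit is {MAX_ID_LEN}")
    return cid

def decode_id_b64(cid: str) -> dict:
    """Restore the padding and unpack the original fields."""
    padded = cid + "=" * (-len(cid) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

fields = {"e": "ticket", "pk": 42, "s": "a|b"}  # value contains a delimiter-like char
cid = encode_id_b64(fields)
restored = decode_id_b64(cid)
```

Base64 inflates the payload by about a third, so short field names matter here. Anything that does not fit belongs in your own store, keyed by a compact id.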

## Pattern 5: a resubmission helper

Errored and expired requests are normal at scale, not exceptional. Build the retry path as a first-class function from day one. It takes the list of `custom_id`s that need rework, rebuilds exactly those requests from your source data using the same factory, and submits a fresh, smaller batch.

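```python
def resubmit(failed_ids: list[str], source: dict[str, str]) -> str:
    """Rebuild only the failed requests and submit them as a new batch.

    `source` maps each custom_id to its original input text; `client` is
    an `anthropic.Anthropic()` instance created elsewhere.
    """
    retry_requests = [make_request(cid, source[cid]) for cid in failed_ids]
    new_batch = client.messages.batches.create(requests=retry_requests)
    return new_batch.id
```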

Because the factory is the single source of truth for request shape, retries are guaranteed to match the original job's configuration. No drift, no special-casing.
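
The list of ids to retry can be gathered in one pass over the results. The sketch below uses stand-in objects shaped like batch result entries so it runs offline; `collect_failed_ids` is a hypothetical helper. Note the assumption baked into `RETRYABLE`: an errored request caused by an invalid request body will fail again unchanged, so in practice you may want to inspect the error before resubmitting.

```python
from types import SimpleNamespace
from typing import Iterable

# Result types worth retrying; "canceled" is left out on purpose.
RETRYABLE = {"errored", "expired"}

def collect_failed_ids(results: Iterable) -> list[str]:
    """Return the custom_ids whose result type calls for a retry."""
    return [r.custom_id for r in results if r.result.type in RETRYABLE]

def _stub(cid: str, kind: str) -> SimpleNamespace:
    """Stand-in for one entry yielded by the batch results iterator."""
    return SimpleNamespace(custom_id=cid, result=SimpleNamespace(type=kind))

sample_results = [
    _stub("a", "succeeded"),
    _stub("b", "errored"),
    _stub("c", "expired"),
    _stub("d", "canceled"),
]
failed = collect_failed_ids(sample_results)
```

The returned list plugs directly into `resubmit(failed, source)`.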

## Common pitfalls

- **Rebuilding the system block per request.** Constructing `SHARED_SYSTEM` inside the loop risks subtle byte differences and kills caching. Build it once, outside the comprehension.
- **Marking the volatile suffix with cache_control.** If you put the breakpoint after the per-item content, every request writes a unique cache entry and nothing is ever read. Mark the end of the shared prefix only.
- **Prose outputs for structured data.** Free-text answers force brittle parsing. Use `output_config.format` whenever the downstream consumer is code.
- **Unbounded custom_id length.** Composite keys are great until they overflow the API's length limit or use a character it rejects. Keep them compact and validate them before submission.
- **No retry path.** A batch job without a resubmission helper means hand-editing failures at 2am. Write it up front.

## Adopt these patterns in 5 steps

1. Extract request construction into a single `make_request()` factory.
2. Split your prompt into a frozen, `cache_control`-marked prefix and an unmarked volatile suffix.
3. Replace prose instructions with an `output_config.format` schema wherever code consumes the result.
4. Adopt a delimited, self-describing `custom_id` scheme that encodes your join keys.
5. Write a `resubmit()` helper that rebuilds failed requests from the same factory.

## Pattern tradeoffs at a glance

| Pattern | Buys you | Costs you |
| --- | --- | --- |
| Request factory | One source of truth, cache-friendly prefix | A little upfront structure |
| Frozen/volatile layering | Cache reads at ~0.1x input price | Discipline about what is frozen |
| Structured outputs | Deterministic, parse-free reassembly | Schema maintenance |
| Self-describing custom_id | No side table for the join | Length and delimiter care |

## Frequently asked questions

### Does prompt caching really help inside a batch?

Yes, when many requests share a large identical prefix. The savings accrue as the batch drains rather than all at once, because cache entries become readable only after the first writing request begins, but you still pay roughly a tenth of the input price for the cached prefix on later requests — stacked on top of the 50% batch discount.
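
The stacking can be made concrete with a rough estimate of input cost relative to uncached synchronous calls. This is a back-of-the-envelope sketch: the 0.1x read multiplier and 50% batch discount come from the text above, while the 1.25x cache-write multiplier and the number of requests that end up writing rather than reading are assumptions you should replace with current pricing and your own measurements.

```python
def relative_input_cost(prefix_tokens: int, suffix_tokens: int,
                        n_requests: int, n_writes: int = 1,
                        read_mult: float = 0.1,
                        write_mult: float = 1.25,   # assumed write premium
                        batch_mult: float = 0.5) -> float:
    """Input cost of a cached batch as a fraction of uncached synchronous cost."""
    baseline = n_requests * (prefix_tokens + suffix_tokens)
    n_reads = n_requests - n_writes
    cached = (
        n_writes * prefix_tokens * write_mult   # requests that wrote the cache
        + n_reads * prefix_tokens * read_mult   # requests that read it
        + n_requests * suffix_tokens            # volatile suffix, always full price
    )
    return batch_mult * cached / baseline

# 10,000-token shared prefix, 500-token items, 1,000 requests, 10 cache writes.
ratio = relative_input_cost(10_000, 500, 1_000, n_writes=10)
# No effective caching: every request pays full price for the prefix.
no_cache = relative_input_cost(10_000, 500, 1_000, n_writes=1_000, write_mult=1.0)
```

Under these assumptions `ratio` comes out near 0.077, against 0.5 for the batch discount alone, and the gap narrows as the per-item suffix grows relative to the shared prefix.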

### Can I mix models in one batch using a factory?

Yes. Each request carries its own `model`, so a factory can branch — Haiku for simple classification, Opus for the reasoning-heavy items — within a single submitted batch.
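
A branching factory might look like the sketch below. The `needs_reasoning` flag and both helper names are hypothetical, and the params are plain dicts so the example runs without the SDK. One caveat to keep in mind: caches are not shared across models, so each model in a mixed batch warms its own copy of the prefix.

```python
def pick_model(item: dict) -> str:
    """Route by a hypothetical `needs_reasoning` flag on the item."""
    return "claude-opus-4-8" if item.get("needs_reasoning") else "claude-haiku-4-5"

def make_params(item: dict) -> dict:
    """Build per-request params; only the model and content vary."""
    return {
        "model": pick_model(item),
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": item["text"]}],
    }

items = [
    {"text": "Classify this ticket.", "needs_reasoning": False},
    {"text": "Explain the root cause of this outage.", "needs_reasoning": True},
]
params = [make_params(i) for i in items]
```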

### Why prefer structured outputs over a tool definition for extraction?

When you only need a typed JSON object back and nothing is executed, `output_config.format` is the lighter path: it constrains the response shape directly without the overhead of a tool-use round trip.

### What goes in the custom_id versus a side table?

Put the minimal join keys you need to route the result — entity type and primary key — in the `custom_id`, and keep bulky context in your own store. The id is a routing label, not a payload.

## Bringing agentic AI to your phone lines

These structuring patterns — factories, layered context, schema-bound outputs — are exactly what makes a Claude agent reliable in production. CallSphere applies them to **voice and chat**: agents that answer every call, use tools mid-conversation, and book work around the clock. See it live at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

---

Source: https://callsphere.ai/blog/reusable-code-patterns-for-claude-batch-processing-jobs
