
Claude Message Batches: Processing Thousands of Agent Tasks with 50% Cost Savings

Learn how to use the Claude Message Batches API to process thousands of agent tasks asynchronously with 50% cost reduction, including job monitoring, result processing, and error handling.

Why Batch Processing Matters for Agents

Many agent workloads are not real-time. Nightly data classification, bulk document summarization, mass email personalization, and dataset labeling can all tolerate minutes to hours of latency. The Claude Message Batches API is designed for exactly these scenarios — it processes up to 100,000 requests per batch at 50% of the standard API cost with a 24-hour processing window.

For agent systems, this means you can run thousands of independent agent tasks in parallel without managing rate limits, connection pools, or retry logic yourself. Anthropic handles the queuing and execution; you just submit the batch and poll for results.

How the Batches API Works

The flow is straightforward: create a batch of message requests, submit them, poll for completion, and retrieve results. Each request in the batch is a complete Messages API call — it can include tools, system prompts, multi-turn conversations, and all other features.

import anthropic
import json
import time

client = anthropic.Anthropic()

# Step 1: Define individual requests
requests = []
documents = load_documents()  # Your list of documents to process

for i, doc in enumerate(documents):
    requests.append({
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [
                {
                    "role": "user",
                    "content": f"Classify this document and extract key entities:\n\n{doc['text']}"
                }
            ],
        }
    })

Each request needs a custom_id that you use to match results back to inputs. The params field mirrors the standard Messages API parameters exactly.
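Because results can come back in any order and carry only the custom_id, it helps to build an index from custom_id back to the original request at submission time. A minimal sketch, where index_requests is a hypothetical helper (not part of the SDK):

```python
def index_requests(documents):
    """Build batch requests plus a custom_id -> original request index."""
    requests, by_id = [], {}
    for i, doc in enumerate(documents):
        custom_id = f"doc-{i}"
        req = {
            "custom_id": custom_id,
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": doc["text"]}],
            },
        }
        requests.append(req)
        by_id[custom_id] = req  # look up the input when its result arrives
    return requests, by_id
```

The same index doubles as the original_requests mapping when you resubmit failed requests later.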

Submitting a Batch

# Step 2: Create the batch
batch = client.messages.batches.create(requests=requests)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
print(f"Total requests: {batch.request_counts.processing}")

The batch is now queued for processing. Processing happens within a 24-hour window; any requests that have not finished by then are marked expired. In practice most batches finish much faster, and small batches (under 1,000 requests) typically complete in minutes.

Monitoring Batch Progress

Poll the batch status to track progress:

def wait_for_batch(batch_id: str, poll_interval: int = 30):
    """Poll batch status until processing ends, then return the batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)

        counts = batch.request_counts
        done = counts.succeeded + counts.errored + counts.canceled + counts.expired
        total = counts.processing + done

        print(f"Progress: {done}/{total} "
              f"(succeeded: {counts.succeeded}, errored: {counts.errored})")

        if batch.processing_status == "ended":
            return batch

        time.sleep(poll_interval)

completed_batch = wait_for_batch(batch.id)

For production systems, replace polling with webhooks or a task queue like Celery that checks batch status on a schedule.


Retrieving and Processing Results

Once the batch completes, stream the results:

# Step 3: Retrieve results
results = {}
for result in client.messages.batches.results(completed_batch.id):
    custom_id = result.custom_id

    if result.result.type == "succeeded":
        message = result.result.message
        text = message.content[0].text
        results[custom_id] = {"status": "success", "output": text}

    elif result.result.type == "errored":
        error = result.result.error
        results[custom_id] = {"status": "error", "error": str(error)}

    elif result.result.type == "canceled":
        results[custom_id] = {"status": "canceled"}

    elif result.result.type == "expired":
        results[custom_id] = {"status": "expired"}

print(f"Processed {len(results)} results")
print(f"Succeeded: {sum(1 for r in results.values() if r['status'] == 'success')}")
print(f"Failed: {sum(1 for r in results.values() if r['status'] != 'success')}")

Results stream back as an iterator, so you can process them without loading everything into memory at once.

Batch Requests with Tool Use

Batch requests support the full tool use API. This means you can run agent-like workflows in batch mode, though each batch request gets a single turn — no iterative agent loop:

classification_tool = {
    "name": "classify_document",
    "description": "Classify a document into categories",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {
                "type": "string",
                "enum": ["legal", "financial", "technical", "marketing", "other"]
            },
            "confidence": {"type": "number"},
            "entities": {
                "type": "array",
                "items": {"type": "string"}
            }
        },
        "required": ["category", "confidence", "entities"]
    }
}

# Force structured output via tool_choice
batch_requests = []
for i, doc in enumerate(documents):
    batch_requests.append({
        "custom_id": f"classify-{i}",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 512,
            "tools": [classification_tool],
            "tool_choice": {"type": "tool", "name": "classify_document"},
            "messages": [
                {"role": "user", "content": f"Classify this document:\n\n{doc['text']}"}
            ],
        }
    })

By forcing tool use with tool_choice, every response will contain a structured tool_use block that you can parse directly — no text extraction needed.

Error Handling and Retries

Build resilience into your batch pipeline:

def submit_with_retry(requests: list, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            batch = client.messages.batches.create(requests=requests)
            return batch.id
        except anthropic.APIError as e:
            if attempt == max_retries - 1:
                raise
            print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
            time.sleep(2 ** attempt)

def resubmit_failures(batch_id: str, original_requests: dict) -> str | None:
    """Collect failed requests and resubmit them as a new batch."""
    failed_requests = []
    for result in client.messages.batches.results(batch_id):
        if result.result.type != "succeeded":
            # Find the original request by custom_id
            original = original_requests[result.custom_id]
            failed_requests.append(original)

    if not failed_requests:
        return None

    print(f"Resubmitting {len(failed_requests)} failed requests")
    return submit_with_retry(failed_requests)

FAQ

What is the maximum batch size?

Each batch can contain up to 100,000 requests. If you have more than that, split them into multiple batches and submit them concurrently. Each request can use up to the model's full context window and max output tokens.
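The split itself is simple; chunk_requests here is a hypothetical helper, with the default cap taken from the 100,000-request limit above:

```python
def chunk_requests(requests, max_per_batch: int = 100_000):
    """Split a request list into submission-sized chunks."""
    return [requests[i:i + max_per_batch]
            for i in range(0, len(requests), max_per_batch)]
```

Each chunk can then be passed to client.messages.batches.create as its own batch.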

Can I cancel a running batch?

Yes, call client.messages.batches.cancel(batch_id) to cancel a batch in progress. Requests that have already completed will still be available in the results. Requests that were not yet processed will be marked as canceled.

How much does batch processing actually save?

Batch processing costs exactly 50% of the standard API pricing for both input and output tokens. For a workflow processing 10,000 documents at an average of 2,000 input tokens and 500 output tokens each, the savings are substantial — potentially hundreds of dollars per run compared to real-time API calls.
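To make "substantial" concrete, here is the arithmetic for that example workload. The per-million-token prices below are illustrative placeholders, not current list prices; substitute the real rates for your model:

```python
def batch_savings(n_requests, in_tokens, out_tokens,
                  in_price_per_mtok, out_price_per_mtok):
    """Dollars saved by the 50% batch discount versus standard pricing."""
    standard = (n_requests * in_tokens / 1e6 * in_price_per_mtok +
                n_requests * out_tokens / 1e6 * out_price_per_mtok)
    return standard * 0.5  # the batch discount halves the bill

# 10,000 docs x (2,000 input + 500 output) tokens at an assumed
# $3 / $15 per million tokens: standard cost is $135, so the batch
# discount saves $67.50 per run; pricier models scale this proportionally.
```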


#Claude #BatchProcessing #CostOptimization #Async #Python #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

