
Anthropic Claude Fine-Tuning Patterns on Amazon Bedrock (2026)

Anthropic still does not expose fine-tuning through its public API in 2026 — Claude Haiku SFT lives exclusively on Amazon Bedrock (us-west-2). We document the JSONL format, system-message rules, the 4-tier constitution priorities Claude inherits, and when Bedrock SFT beats prompt caching.

TL;DR — Anthropic does not let you fine-tune Claude via its public API. The only supported path in 2026 is Claude 3 Haiku SFT on Amazon Bedrock in us-west-2. Use it for narrow, latency-sensitive verticals where Haiku's $0.25/$1.25 per 1M tokens beats Sonnet/Opus and prompt caching alone is not enough.

What it does

Bedrock SFT teaches Claude 3 Haiku domain-specific style, classification labels, and tool-call shapes. Anthropic's January 2026 constitution refresh hardcodes a 4-tier priority hierarchy (safety → ethics → compliance → helpfulness) that fine-tuning cannot override — your training data is layered on top of that prior, not under it.

For Sonnet 4.x, Opus 4.7, and any model post-Haiku 3, fine-tuning is not available. Anthropic's official position: lean on prompt caching (90% discount on cached system prompts), extended thinking, and memory tools instead.
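For the prompt-caching path, the trick is a `cache_control` marker on the system block of a Messages API request. A minimal sketch of the request body (the model ID and prompt contents are illustrative placeholders, not values from this article):

```python
import json

# Stand-in for a long (~12k-token) system prompt that you want cached.
LONG_SYSTEM_PROMPT = "You are a claims router. " * 500

# Messages API request body with prompt caching: the `cache_control` marker on
# the system block asks the API to cache the prefix, so repeat calls bill those
# tokens at the cached-token discount instead of full input price.
request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model ID
    "max_tokens": 256,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # marks prefix as cacheable
        }
    ],
    "messages": [
        {"role": "user", "content": "Patient stage IV, denied prior auth"}
    ],
}

print(json.dumps(request_body["system"][0]["cache_control"]))
```

The response's usage object then reports cache writes and cache reads separately, which is how you verify the discount is actually being applied.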

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

How it works

  1. Format JSONL: each line has an optional system message plus an alternating user/assistant array (must start with user, end with assistant, ≥ 2 messages).
  2. Upload to S3 in us-west-2.
  3. Create a Bedrock customization job referencing anthropic.claude-3-haiku-20240307-v1:0.
  4. Provision throughput on the resulting custom model (Bedrock requires Provisioned Throughput for fine-tuned Claudes — no on-demand inference).
  5. Hit the new model ARN exactly like any other Bedrock invoke.
The pipeline as a Mermaid flowchart:

flowchart LR
  S3[(S3 train.jsonl)] --> JOB[Bedrock SFT job]
  JOB --> CKPT[Custom Haiku checkpoint]
  CKPT --> PT[Provisioned Throughput unit]
  PT --> APP[Agent runtime]
  APP -->|invoke_model| PT
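Step 1's format rules are the most common reason jobs fail, so it's worth validating every line before the S3 upload. A minimal pre-upload check (our own sketch, not an official Bedrock tool) might look like:

```python
import json

def validate_sft_line(line: str) -> None:
    """Enforce Bedrock's Claude SFT JSONL rules: optional system string, plus an
    alternating user/assistant array that starts with user, ends with
    assistant, and contains at least 2 messages."""
    record = json.loads(line)
    if "system" in record and not isinstance(record["system"], str):
        raise ValueError("system must be a string")
    msgs = record["messages"]
    if len(msgs) < 2:
        raise ValueError("need at least 2 messages")
    if msgs[0]["role"] != "user" or msgs[-1]["role"] != "assistant":
        raise ValueError("must start with user and end with assistant")
    for i, msg in enumerate(msgs):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(f"message {i} must have role {expected}")

good = ('{"system":"You are a claims router.","messages":['
        '{"role":"user","content":"hi"},'
        '{"role":"assistant","content":"ROUTE: triage"}]}')
validate_sft_line(good)  # passes silently
```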

CallSphere implementation

We mostly don't fine-tune Claude. CallSphere ships 37 agents across 6 verticals powered by Claude Sonnet 4.6 + GPT-4o + Gemini 2.5 — orchestrated through our own router. The Healthcare post-call analytics path uses gpt-4o-mini (cheaper, fine-tunable). For deep reasoning we lean on prompt caching (Anthropic's 90% cached-token discount on a 12k system prompt saves us ~$3,800/mo at Scale-tier volume) rather than custom Haiku, because cache hits beat custom-throughput costs at our QPS.

When a buyer needs Claude SFT (e.g., regulated insurance routing), we provision on Bedrock and bill it through the Scale plan ($1,499/mo) as a co-managed customization. The 14-day trial and 22% affiliate program still apply.

Build steps with code

import boto3

br = boto3.client("bedrock", region_name="us-west-2")

br.create_model_customization_job(
    customizationType="FINE_TUNING",
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
    jobName="callsphere-claim-router-v3",
    customModelName="claude-haiku-claim-router",
    # Required: IAM role Bedrock assumes to read/write your S3 buckets
    # (placeholder ARN -- substitute your own account and role name).
    roleArn="arn:aws:iam::123456789012:role/BedrockSftRole",
    trainingDataConfig={"s3Uri": "s3://cs-sft/claims/train.jsonl"},
    validationDataConfig={"validators": [{"s3Uri": "s3://cs-sft/claims/val.jsonl"}]},
    hyperParameters={
        "epochCount": "2",
        "batchSize": "32",
        "learningRate": "0.00001",
        "learningRateWarmupSteps": "50",
    },
    outputDataConfig={"s3Uri": "s3://cs-sft/claims/out/"},
)
A single line of train.jsonl (one training example, pretty-printed here for readability):

{"system": "You are a claims router.",
 "messages": [
   {"role": "user", "content": "Patient stage IV, denied prior auth, plan United"},
   {"role": "assistant", "content": "ROUTE: appeals_specialist\nRATIONALE: oncology + denied PA"}
 ]}
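Once the job completes, steps 4 and 5 are a Provisioned Throughput purchase plus an ordinary invoke. A sketch (the provisioned-model name and ARNs are illustrative placeholders; the invoke body follows the standard Anthropic-on-Bedrock messages schema, which is identical to the base model's, so existing call sites only swap the model ARN):

```python
import json

def provision_and_get_arn(bedrock_client, custom_model_arn: str) -> str:
    """Buy one Provisioned Throughput model unit for the fine-tuned Haiku.
    Required step: Bedrock offers no on-demand inference for custom Claudes."""
    resp = bedrock_client.create_provisioned_model_throughput(
        modelUnits=1,
        provisionedModelName="claim-router-pt",  # illustrative name
        modelId=custom_model_arn,
    )
    return resp["provisionedModelArn"]

# Invoke body uses the same schema as base Claude on Bedrock.
invoke_body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 128,
    "system": "You are a claims router.",
    "messages": [
        {"role": "user", "content": "Patient stage IV, denied prior auth, plan United"}
    ],
})

# runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
# out = runtime.invoke_model(modelId=provisioned_model_arn, body=invoke_body)
```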

Pitfalls

  • Region lock-in — only us-west-2. If your data has residency rules, this is a blocker.
  • Provisioned Throughput is expensive — minimum ~$50/hr for one model unit; under ~80 RPS sustained, prompt caching wins on cost.
  • No Sonnet/Opus SFT — don't promise customers Sonnet customization; it doesn't exist.
  • Constitutional priors win — even with SFT, Claude will refuse outputs that violate its 4-tier constitution.
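The cost crossover in the second pitfall is easy to sanity-check yourself. A back-of-envelope sketch using the rough figures above (all inputs are assumptions; it ignores output tokens, cache-write premiums, and headroom, so the exact crossover for your workload will differ from both this number and the ~80 RPS rule of thumb):

```python
# Assumed inputs (not quotes): one PT model unit at ~$50/hr, Haiku on-demand
# input at $0.25/1M tokens, cached tokens billed at 10% of input price.
PT_COST_PER_HOUR = 50.0
HAIKU_INPUT_PER_MTOK = 0.25
CACHE_DISCOUNT = 0.90
SYSTEM_TOKENS = 12_000          # the cached system prompt
USER_TOKENS = 300               # assumed average per-turn user input

# Per-request on-demand input cost with the system prompt fully cached.
cached_cost = (
    (SYSTEM_TOKENS * (1 - CACHE_DISCOUNT) + USER_TOKENS) / 1e6 * HAIKU_INPUT_PER_MTOK
)

# Requests/second at which an hour of on-demand spend equals one PT unit-hour.
breakeven_rps = PT_COST_PER_HOUR / (cached_cost * 3600)
print(f"cached cost/request: ${cached_cost:.6f}, breakeven: {breakeven_rps:.0f} RPS")
```

Below the breakeven, keep prompt caching; above it (and only if you can keep the unit busy), Provisioned Throughput starts to pay for itself.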

FAQ

Q: Can I fine-tune Claude through claude.ai or the public API? No. Only via Amazon Bedrock as of May 2026.

Q: How much data do I need? Anthropic's docs suggest 50–10,000 examples; in our experience, narrow classification works at 200–500.


Q: Is prompt caching always better? At low QPS, yes. Above roughly 80 sustained RPS, where a Provisioned Throughput unit is fully utilized, custom Haiku catches up.

Q: What about distilling Sonnet → Haiku? You can generate Sonnet outputs, store them, and use those as your Haiku SFT corpus. Anthropic's TOS allows it for your own internal models.
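The distillation workflow reduces to storing (prompt, Sonnet completion) pairs and re-serializing them in the Bedrock SFT format. A sketch of the corpus-building step (our own helper, with illustrative content; the actual Sonnet generation call is whatever you already run in production):

```python
import json

def to_sft_line(system: str, prompt: str, sonnet_output: str) -> str:
    """Wrap one stored (prompt, Sonnet completion) pair as a Bedrock Haiku SFT
    JSONL line: Sonnet labels the data, Haiku learns it."""
    return json.dumps({
        "system": system,
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": sonnet_output},
        ],
    })

line = to_sft_line(
    "You are a claims router.",
    "Patient stage IV, denied prior auth, plan United",
    "ROUTE: appeals_specialist\nRATIONALE: oncology + denied PA",
)
# Append each line to train.jsonl, then upload the file to S3 in us-west-2.
```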

Q: Does fine-tuning weaken Claude's safety? The 2026 constitution refresh is enforced at runtime — SFT cannot remove safety refusals.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.