Anthropic Claude Fine-Tuning Patterns on Amazon Bedrock (2026)
Anthropic still does not expose fine-tuning through its public API in 2026 — Claude Haiku SFT lives exclusively on Amazon Bedrock (us-west-2). We document the JSONL format, system-message rules, the 4-tier constitution priorities Claude inherits, and when Bedrock SFT beats prompt caching.
TL;DR — Anthropic does not let you fine-tune Claude via its public API. The only supported path in 2026 is Claude 3 Haiku SFT on Amazon Bedrock in us-west-2. Use it for narrow, latency-sensitive verticals where Haiku's $0.25/$1.25 per 1M tokens beats Sonnet/Opus and prompt caching alone is not enough.
What it does
Bedrock SFT teaches Claude 3 Haiku domain-specific style, classification labels, and tool-call shapes. Anthropic's January 2026 constitution refresh hardcodes a 4-tier priority hierarchy (safety → ethics → compliance → helpfulness) that fine-tuning cannot override — your training data is layered on top of that prior, not under it.
For Sonnet 4.x, Opus 4.7, and any model newer than Claude 3 Haiku, fine-tuning is not available. Anthropic's official position: lean on prompt caching (90% discount on cached system-prompt reads), extended thinking, and memory tools instead.
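For context, the caching path looks like this with the Anthropic Python SDK: you mark a long, stable system prompt as cacheable, and later reads of that block are billed at roughly 10% of the base input price. A minimal sketch; the model name and prompt file are illustrative, not from this article:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder: a long, stable domain prompt (caching needs ~1k+ tokens to kick in).
DOMAIN_PROMPT = open("system_prompt.txt").read()

resp = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model name
    max_tokens=512,
    system=[{
        "type": "text",
        "text": DOMAIN_PROMPT,
        # Marks the block as cacheable; subsequent reads are billed at ~10% of input price.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Route this claim: stage IV, denied prior auth."}],
)
print(resp.content[0].text)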
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
How it works
- Format the JSONL: each line carries an optional system message plus an alternating user → assistant messages array (must start with user, end with assistant, ≥ 2 messages); a validator sketch follows this list.
- Upload the file to S3 in us-west-2.
- Create a Bedrock customization job referencing anthropic.claude-3-haiku-20240307-v1:0.
- Provision throughput on the resulting custom model (Bedrock requires Provisioned Throughput for fine-tuned Claudes — no on-demand inference).
- Hit the new model ARN exactly like any other Bedrock invoke.
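Those format rules are easy to break when exporting from a labeling tool, so a pre-upload check is worth the five lines. A minimal sketch (hypothetical helper, not part of any AWS SDK):

import json

def check_record(line: str) -> None:
    rec = json.loads(line)
    msgs = rec["messages"]
    assert len(msgs) >= 2, "need at least 2 messages"
    assert msgs[0]["role"] == "user", "must start with user"
    assert msgs[-1]["role"] == "assistant", "must end with assistant"
    for a, b in zip(msgs, msgs[1:]):
        assert {a["role"], b["role"]} == {"user", "assistant"}, "roles must alternate"

with open("train.jsonl") as f:
    for n, line in enumerate(f, 1):
        try:
            check_record(line)
        except (AssertionError, KeyError, json.JSONDecodeError) as err:
            print(f"train.jsonl:{n}: {err}")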
flowchart LR
S3[(S3 train.jsonl)] --> JOB[Bedrock SFT job]
JOB --> CKPT[Custom Haiku checkpoint]
CKPT --> PT[Provisioned Throughput unit]
PT --> APP[Agent runtime]
APP -->|invoke_model| PT
CallSphere implementation
We mostly don't fine-tune Claude. CallSphere ships 37 agents across 6 verticals powered by Claude Sonnet 4.6 + GPT-4o + Gemini 2.5 — orchestrated through our own router. The Healthcare post-call analytics path uses gpt-4o-mini (cheaper, fine-tunable). For deep reasoning we lean on prompt caching (Anthropic's 90% cached-token discount on a 12k system prompt saves us ~$3,800/mo at Scale-tier volume) rather than custom Haiku, because cache hits beat custom-throughput costs at our QPS.
When a buyer needs Claude SFT (e.g., regulated insurance routing), we provision on Bedrock and bill it through the Scale plan ($1,499/mo) with co-managed customization. The 14-day trial and 22% affiliate program still apply.
Build steps with code
import boto3

# Control-plane client; Claude Haiku SFT is only available in us-west-2.
br = boto3.client("bedrock", region_name="us-west-2")

br.create_model_customization_job(
    customizationType="FINE_TUNING",
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
    jobName="callsphere-claim-router-v3",
    customModelName="claude-haiku-claim-router",
    # Required: IAM role Bedrock assumes to read/write the S3 URIs below
    # (ARN is a placeholder).
    roleArn="arn:aws:iam::111122223333:role/BedrockSftRole",
    trainingDataConfig={"s3Uri": "s3://cs-sft/claims/train.jsonl"},
    validationDataConfig={"validators": [{"s3Uri": "s3://cs-sft/claims/val.jsonl"}]},
    hyperParameters={"epochCount": "2", "batchSize": "32",
                     "learningRate": "0.00001", "learningRateWarmupSteps": "50"},
    outputDataConfig={"s3Uri": "s3://cs-sft/claims/out/"},
)
{"system":"You are a claims router.",
"messages":[
{"role":"user","content":"Patient stage IV, denied prior auth, plan United"},
{"role":"assistant","content":"ROUTE: appeals_specialist\nRATIONALE: oncology + denied PA"}
]}
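Once the job completes, attach a Provisioned Throughput unit to the custom model and invoke it through the runtime client. A sketch with placeholder ARNs; wait for the unit to reach InService before invoking:

import json
import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")
runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

# Attach one throughput unit to the finished custom model (ARN is a placeholder).
pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="claim-router-pt",
    modelUnits=1,
    modelId="arn:aws:bedrock:us-west-2:111122223333:custom-model/claude-haiku-claim-router",
)

# Invoke like any other Bedrock Claude model, via the provisioned ARN.
resp = runtime.invoke_model(
    modelId=pt["provisionedModelArn"],
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "system": "You are a claims router.",
        "messages": [{"role": "user",
                      "content": "Patient stage IV, denied prior auth, plan United"}],
    }),
)
print(json.loads(resp["body"].read())["content"][0]["text"])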
Pitfalls
- Region lock-in — only us-west-2. If your data has residency rules, this is a blocker.
- Provisioned Throughput is expensive — minimum ~$50/hr for one model unit; under ~80 RPS sustained, prompt caching wins on cost (break-even sketch after this list).
- No Sonnet/Opus SFT — don't promise customers Sonnet customization; it doesn't exist.
- Constitutional priors win — even with SFT, Claude will refuse outputs that violate its 4-tier constitution.
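The back-of-envelope math behind that ~80 RPS threshold, with the per-request cost as an explicit assumption you should replace with your own traffic profile:

# Break-even between a fixed-price Provisioned Throughput unit and pay-per-token.
PT_COST_PER_HOUR = 50.0       # minimum model unit, from the pitfall above
COST_PER_REQUEST = 0.000175   # blended cached-prompt cost per call (assumption)

break_even_rps = PT_COST_PER_HOUR / (3600 * COST_PER_REQUEST)
print(f"Provisioned Throughput wins above ~{break_even_rps:.0f} sustained RPS")
# ~79 RPS with these inputs; below that, on-demand with prompt caching is cheaper.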
FAQ
Q: Can I fine-tune Claude through claude.ai or the public API? No. Only via Amazon Bedrock as of May 2026.
Q: How much data do I need? Anthropic's docs suggest 50–10,000 examples; in our experience, narrow classification works at 200–500.
Q: Is prompt caching always better? At low request rates, yes. Above ~80 sustained RPS, where a Provisioned Throughput unit is fully utilized, custom Haiku catches up.
Q: What about distilling Sonnet → Haiku? You can generate Sonnet outputs, store them, and use those as your Haiku SFT corpus. Anthropic's TOS allows it for your own internal models.
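A sketch of that distillation loop, assuming the Anthropic Python SDK, an illustrative teacher model name, and a file of your own unlabeled prompts:

import json
import anthropic

client = anthropic.Anthropic()
SYSTEM = "You are a claims router."

with open("unlabeled_prompts.txt") as src, open("train.jsonl", "w") as out:
    for prompt in src:
        prompt = prompt.strip()
        if not prompt:
            continue
        # Ask the stronger teacher model for the target behavior.
        resp = client.messages.create(
            model="claude-sonnet-4-5",  # illustrative teacher model name
            max_tokens=256,
            system=SYSTEM,
            messages=[{"role": "user", "content": prompt}],
        )
        # Each teacher answer becomes one Haiku SFT record.
        out.write(json.dumps({
            "system": SYSTEM,
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": resp.content[0].text},
            ],
        }) + "\n")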
Q: Does fine-tuning weaken Claude's safety? The 2026 constitution refresh is enforced at runtime — SFT cannot remove safety refusals.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available, no signup required.