Anthropic Claude Fine-Tuning Patterns on Amazon Bedrock (2026)
By Sagar Shankaran, Founder of CallSphere
Anthropic still does not expose fine-tuning through its public API in 2026: Claude 3 Haiku SFT lives exclusively on Amazon Bedrock (us-west-2). We document the JSONL format, system-message rules, the 4-tier constitution priorities Claude inherits, and when Bedrock SFT beats prompt caching.
Key takeaways
TL;DR: Anthropic does not let you fine-tune Claude via its public API. The only supported path in 2026 is Claude 3 Haiku SFT on Amazon Bedrock in us-west-2. Use it for narrow, latency-sensitive verticals where Haiku's $0.25/$1.25 per 1M input/output tokens beats Sonnet/Opus and prompt caching alone is not enough.
What it does
Bedrock SFT teaches Claude 3 Haiku domain-specific style, classification labels, and tool-call shapes. Anthropic's January 2026 constitution refresh hardcodes a 4-tier priority hierarchy (safety → ethics → compliance → helpfulness) that fine-tuning cannot override — your training data is layered on top of that prior, not under it.
For Sonnet 4.x, Opus 4.7, and any model post-Haiku 3, fine-tuning is not available. Anthropic's official position: lean on prompt caching (90% discount on cached system prompts), extended thinking, and memory tools instead.
How it works
- Format JSONL: each line has an optional system message plus an alternating user → assistant messages array (must start with user, end with assistant, ≥ 2 messages).
- Upload to S3 in us-west-2.
- Create a Bedrock customization job referencing anthropic.claude-3-haiku-20240307-v1:0.
- Provision throughput on the resulting custom model (Bedrock requires Provisioned Throughput for fine-tuned Claudes; there is no on-demand inference).
- Hit the new model ARN exactly like any other Bedrock invoke.
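The format rules in the first step are easy to check before uploading. The following is a minimal sketch of such a check, assuming only the rules stated above (optional string `system`, a `messages` list that alternates user and assistant, starts with user, ends with assistant, and has at least 2 entries). The function name `validate_record` is illustrative, not part of any Bedrock tooling.

```python
import json


def validate_record(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record must be a JSON object"]

    problems = []
    # The system message is optional, but must be a string when present.
    if "system" in record and not isinstance(record["system"], str):
        problems.append("system must be a string when present")

    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) < 2:
        return problems + ["messages must be a list with at least 2 entries"]

    # Even positions must be user turns, odd positions assistant turns.
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if not isinstance(msg, dict) or msg.get("role") != expected:
            problems.append(f"message {i} should have role {expected!r}")

    # With strict alternation from a user turn, an odd count ends on a user turn.
    if len(messages) % 2 != 0:
        problems.append("conversation must end with an assistant message")
    return problems
```

Running this over every line of train.jsonl and val.jsonl before the S3 upload catches malformed records locally rather than after a job has been submitted.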
flowchart LR
S3[(S3 train.jsonl)] --> JOB[Bedrock SFT job]
JOB --> CKPT[Custom Haiku checkpoint]
CKPT --> PT[Provisioned Throughput unit]
PT --> APP[Agent runtime]
APP -->|invoke_model| PT
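The last hop in the flow, calling the provisioned custom model, might look like the sketch below. It assumes the Anthropic Messages request shape that Bedrock uses for Claude 3 models and a `bedrock-runtime` client created elsewhere (for example `boto3.client("bedrock-runtime", region_name="us-west-2")`). The helper names `build_invoke_body` and `route_claim` and the `provisioned_model_arn` argument are hypothetical.

```python
import json


def build_invoke_body(user_text, system=None, max_tokens=256):
    """Build the JSON request body for a single-turn Claude call on Bedrock."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    # Send the same system message used in training, if there was one.
    if system is not None:
        body["system"] = system
    return json.dumps(body)


def route_claim(runtime_client, provisioned_model_arn, user_text, system=None):
    """Invoke the custom model and return the text of the first content block."""
    response = runtime_client.invoke_model(
        modelId=provisioned_model_arn,  # ARN of the Provisioned Throughput, not the base model ID
        body=build_invoke_body(user_text, system),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]
```

Passing the client in as an argument keeps the helper easy to exercise with a stub in unit tests.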
CallSphere implementation
We mostly don't fine-tune Claude. CallSphere ships 37 agents across 6 verticals powered by Claude Sonnet 4.6 + GPT-4o + Gemini 2.5 — orchestrated through our own router. The Healthcare post-call analytics path uses gpt-4o-mini (cheaper, fine-tunable). For deep reasoning we lean on prompt caching (Anthropic's 90% cached-token discount on a 12k system prompt saves us ~$3,800/mo at Scale-tier volume) rather than custom Haiku, because cache hits beat custom-throughput costs at our QPS.
When a buyer needs Claude SFT (for example, regulated insurance routing), we provision on Bedrock and bill it through the Scale plan ($1,499/mo) with a co-managed customization. The 14-day trial and 22% affiliate commission still apply.
Build steps with code
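import boto3

# Control-plane client; fine-tuning is only offered in us-west-2.
br = boto3.client("bedrock", region_name="us-west-2")

br.create_model_customization_job(
    customizationType="FINE_TUNING",
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
    jobName="callsphere-claim-router-v3",
    customModelName="claude-haiku-claim-router",
    # roleArn is required: an IAM role Bedrock can assume to read and write the buckets.
    # Placeholder ARN; substitute your own.
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    trainingDataConfig={"s3Uri": "s3://cs-sft/claims/train.jsonl"},
    validationDataConfig={"validators": [{"s3Uri": "s3://cs-sft/claims/val.jsonl"}]},
    # Values are passed as strings. Claude 3 Haiku jobs take learningRateMultiplier
    # rather than learningRate / learningRateWarmupSteps; verify names and ranges
    # against the current Bedrock documentation.
    hyperParameters={"epochCount": "2", "batchSize": "32",
                     "learningRateMultiplier": "1.0"},
    outputDataConfig={"s3Uri": "s3://cs-sft/claims/out/"},
)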
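Customization jobs run asynchronously, so after submitting one you would typically poll until it reaches a terminal state. A minimal sketch, assuming a `bedrock` client like the one above and the job ARN returned by the create call; `wait_for_job` and its parameters are illustrative names.

```python
import time

# Statuses after which the job will not change again.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_job(bedrock_client, job_identifier, poll_seconds=60,
                 max_polls=1000, sleep=time.sleep):
    """Poll a model customization job until it finishes, then return its status."""
    for _ in range(max_polls):
        job = bedrock_client.get_model_customization_job(
            jobIdentifier=job_identifier
        )
        status = job["status"]
        if status in TERMINAL_STATES:
            return status
        # Still "InProgress" or "Stopping": wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_identifier} did not finish after {max_polls} polls")
```

The `sleep` parameter is injectable so the loop can be tested without real delays.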
{"system":"You are a claims router.",
"messages":[
{"role":"user","content":"Patient stage IV, denied prior auth, plan United"},
{"role":"assistant","content":"ROUTE: appeals_specialist\nRATIONALE: oncology + denied PA"}
]}
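The record above is spread over several lines for readability, but a JSONL file needs each record on exactly one physical line. A small sketch of serializing records that way with the standard library; `to_jsonl` is a hypothetical helper name.

```python
import json


def to_jsonl(records):
    """Serialize an iterable of dict records to JSONL text, one record per line."""
    # json.dumps escapes newlines inside strings as \n, so every record
    # stays on a single physical line.
    return "".join(
        json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n"
        for record in records
    )


record = {
    "system": "You are a claims router.",
    "messages": [
        {"role": "user",
         "content": "Patient stage IV, denied prior auth, plan United"},
        {"role": "assistant",
         "content": "ROUTE: appeals_specialist\nRATIONALE: oncology + denied PA"},
    ],
}

jsonl_text = to_jsonl([record])
```

Writing `jsonl_text` to train.jsonl gives a file in which the assistant's two-line answer is stored with an escaped newline rather than a literal one.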
Pitfalls
- Region lock-in — only us-west-2. If your data has residency rules, this is a blocker.
- Provisioned Throughput is expensive — minimum ~$50/hr for one model unit; under ~80 RPS sustained, prompt caching wins on cost.
- No Sonnet/Opus SFT — don't promise customers Sonnet customization; it doesn't exist.
- Constitutional priors win — even with SFT, Claude will refuse outputs that violate its 4-tier constitution.
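The Provisioned Throughput pitfall comes down to arithmetic: a fixed hourly charge versus a per-request charge that prompt caching shrinks. The sketch below is a rough model only. It ignores cache-write surcharges, treats the 90% cached-token discount as a default parameter, and expects you to supply your own token counts and prices; all function names are illustrative.

```python
def provisioned_monthly_cost(hourly_rate_usd, hours_per_month=730):
    """Fixed monthly cost of keeping one model unit provisioned around the clock."""
    return hourly_rate_usd * hours_per_month


def cached_request_cost(cached_tokens, fresh_input_tokens, output_tokens,
                        input_price_per_m, output_price_per_m,
                        cache_discount=0.90):
    """Approximate on-demand cost (USD) of one request that hits the prompt cache."""
    cached = cached_tokens * input_price_per_m * (1 - cache_discount)
    fresh = fresh_input_tokens * input_price_per_m
    output = output_tokens * output_price_per_m
    # Prices are quoted per 1M tokens.
    return (cached + fresh + output) / 1_000_000


def break_even_rps(hourly_rate_usd, cost_per_request_usd):
    """Sustained requests per second at which the fixed and per-request costs match."""
    return hourly_rate_usd / 3600 / cost_per_request_usd
```

Feeding in the roughly $50/hr figure from the list above together with your own per-request cost gives a break-even rate to compare against your sustained traffic; below it, per-request billing with caching is the cheaper option.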
FAQ
Q: Can I fine-tune Claude through claude.ai or the public API? No. Only via Amazon Bedrock as of May 2026.
Q: How much data do I need? Anthropic's docs suggest 50–10,000 examples; in our experience, narrow classification works at 200–500.
Q: Is prompt caching always better? At low QPS yes. Above ~80 sustained RPS where Provisioned Throughput is fully utilized, custom Haiku catches up.
Q: What about distilling Sonnet → Haiku? You can generate Sonnet outputs, store them, and use those as your Haiku SFT corpus. Anthropic's TOS allows it for your own internal models.
Q: Does fine-tuning weaken Claude's safety? The 2026 constitution refresh is enforced at runtime — SFT cannot remove safety refusals.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.