By Sagar Shankaran, Founder of CallSphere
Anthropic still does not expose fine-tuning through its public API in 2026: Claude 3 Haiku SFT lives exclusively on Amazon Bedrock (us-west-2). We document the JSONL format, the system-message rules, the 4-tier constitution priorities Claude inherits, and when Bedrock SFT beats prompt caching.
Key takeaways
TL;DR: Anthropic does not let you fine-tune Claude via its public API. The only supported path in 2026 is Claude 3 Haiku SFT on Amazon Bedrock in us-west-2. Use it for narrow, latency-sensitive verticals where Haiku's pricing ($0.25 input / $1.25 output per 1M tokens) beats Sonnet/Opus and prompt caching alone is not enough.
Bedrock SFT teaches Claude 3 Haiku domain-specific style, classification labels, and tool-call shapes. Anthropic's January 2026 constitution refresh hardcodes a 4-tier priority hierarchy (safety → ethics → compliance → helpfulness) that fine-tuning cannot override — your training data is layered on top of that prior, not under it.
For Claude Sonnet 4.x, Opus 4.7, and every model newer than Claude 3 Haiku, fine-tuning is not available. Anthropic's official position is to lean on prompt caching (cache reads are billed at 10% of the base input price, a 90% discount on cached system prompts), extended thinking, and memory tools instead.
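A minimal sketch of what "lean on prompt caching" looks like with the Anthropic Messages API. The long, static system prompt is sent as a text block marked with `cache_control: {"type": "ephemeral"}`, so later requests that share that exact prefix are billed at the cache-read rate. The helper only builds the request payload. The model ID and prompt text are placeholders, and actually sending the request assumes the `anthropic` Python SDK.

```python
def build_cached_request(model: str, system_prompt: str, user_text: str,
                         max_tokens: int = 512) -> dict:
    """Build a Messages API payload whose system prompt is marked cacheable."""
    return {
        "model": model,  # placeholder: pass the model ID you actually use
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Ends the cacheable prefix; identical prefixes hit the cache.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


if __name__ == "__main__":
    payload = build_cached_request("<model-id>", "You are a claims router. ...",
                                   "Route this claim.")
    # With the SDK installed and ANTHROPIC_API_KEY set:
    #   import anthropic
    #   resp = anthropic.Anthropic().messages.create(**payload)
    #   print(resp.usage.cache_read_input_tokens)
    print(payload["system"][0]["cache_control"])
```

Only the system block carries `cache_control` here. The per-call user turn changes on every request, so it sits after the cached prefix.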
Training records: one JSON object per line, holding a system message plus an alternating user → assistant array (must start with user, end with assistant, ≥ 2 messages). Base model: anthropic.claude-3-haiku-20240307-v1:0.

```mermaid
flowchart LR
  S3[(S3 train.jsonl)] --> JOB[Bedrock SFT job]
  JOB --> CKPT[Custom Haiku checkpoint]
  CKPT --> PT[Provisioned Throughput unit]
  PT --> APP[Agent runtime]
  APP -->|invoke_model| PT
```
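The last hop in the diagram is `invoke_model` against the Provisioned Throughput unit, which can be sketched as below. A custom model is addressed by passing its provisioned model ARN as `modelId`. The body uses the Anthropic Messages format that Bedrock expects for Claude (`anthropic_version: "bedrock-2023-05-31"`). The ARN, prompts, and the `route_claim` helper are hypothetical, and the runtime client is passed in so the request builder can be exercised without AWS credentials.

```python
import json


def build_invoke_kwargs(provisioned_model_arn: str, system: str, user_text: str,
                        max_tokens: int = 256) -> dict:
    """Keyword arguments for bedrock-runtime invoke_model on a custom Haiku."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system,
        "messages": [{"role": "user", "content": user_text}],
    }
    return {
        "modelId": provisioned_model_arn,  # the PT ARN, not the base model ID
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(body),
    }


def route_claim(runtime_client, provisioned_model_arn: str, text: str) -> str:
    """Hypothetical helper: call the custom model, return its first text block."""
    resp = runtime_client.invoke_model(
        **build_invoke_kwargs(provisioned_model_arn, "You are a claims router.", text)
    )
    payload = json.loads(resp["body"].read())  # body is a streaming object
    return payload["content"][0]["text"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   rt = boto3.client("bedrock-runtime", region_name="us-west-2")
#   print(route_claim(rt, "<provisioned-model-arn>", "Patient stage IV, ..."))
```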
We mostly don't fine-tune Claude. CallSphere ships 37 agents across 6 verticals powered by Claude Sonnet 4.6 + GPT-4o + Gemini 2.5 — orchestrated through our own router. The Healthcare post-call analytics path uses gpt-4o-mini (cheaper, fine-tunable). For deep reasoning we lean on prompt caching (Anthropic's 90% cached-token discount on a 12k system prompt saves us ~$3,800/mo at Scale-tier volume) rather than custom Haiku, because cache hits beat custom-throughput costs at our QPS.
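The caching math behind a figure like that is plain arithmetic. The sketch below compares two options for a static system prompt: sending it uncached on every request, or writing it to the cache once per refresh and reading it on every other request. Cache reads at 0.1× the base input price correspond to the 90% discount above. The 1.25× write multiplier is Anthropic's published rate for the default 5-minute cache. Every input (price, request volume, refresh count) is a hypothetical placeholder, not CallSphere's actual traffic.

```python
def monthly_cache_savings(prompt_tokens: int, requests_per_month: int,
                          cache_writes_per_month: int, input_price_per_mtok: float,
                          write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Dollars saved per month by caching a static system prompt.

    Each cache write is billed at write_mult x the base input price; every
    other request reads the prefix at read_mult x base. Output tokens and
    the uncached user turn cost the same either way, so they are omitted.
    """
    per_token = input_price_per_mtok / 1_000_000
    uncached = requests_per_month * prompt_tokens * per_token
    reads = max(requests_per_month - cache_writes_per_month, 0)
    cached = (cache_writes_per_month * write_mult + reads * read_mult) * prompt_tokens * per_token
    return uncached - cached
```

For example, a 12k-token prompt at a hypothetical $3 per 1M input tokens, across 100,000 requests with negligible refreshes, saves about $3,240 a month. If nearly every request forces a fresh cache write, the result goes negative.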
When a buyer needs Claude SFT (e.g., regulated insurance routing), we provision it on Bedrock and bill it through the Scale plan ($1,499/mo) as a co-managed customization. The 14-day trial and 22% affiliate terms still apply.
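```python
import boto3

# Claude 3 Haiku fine-tuning is only offered in us-west-2.
br = boto3.client("bedrock", region_name="us-west-2")

job = br.create_model_customization_job(
    customizationType="FINE_TUNING",
    # If this ID is rejected, list fine-tunable variants with
    # br.list_foundation_models(byCustomizationType="FINE_TUNING").
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
    jobName="callsphere-claim-router-v3",
    customModelName="claude-haiku-claim-router",
    # Required: IAM role Bedrock assumes to read/write S3 (placeholder ARN).
    roleArn="arn:aws:iam::123456789012:role/BedrockSFTRole",
    trainingDataConfig={"s3Uri": "s3://cs-sft/claims/train.jsonl"},
    validationDataConfig={"validators": [{"s3Uri": "s3://cs-sft/claims/val.jsonl"}]},
    # Claude 3 Haiku takes a learning-rate *multiplier*, not an absolute rate
    # or warmup steps; all hyperparameter values are passed as strings.
    hyperParameters={"epochCount": "2", "batchSize": "32",
                     "learningRateMultiplier": "1.0"},
    outputDataConfig={"s3Uri": "s3://cs-sft/claims/out/"},
)
print(job["jobArn"])
```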
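After the job is submitted, two steps in the diagram remain: wait for the job to finish, then buy Provisioned Throughput for the resulting model. Below is a minimal sketch that assumes a boto3 `bedrock` client is passed in. `get_model_customization_job` and `create_provisioned_model_throughput` are real Bedrock control-plane calls. The `wait_then_provision` helper, its polling interval, and its timeout are illustrative choices.

```python
import time


def wait_then_provision(bedrock, job_arn: str, provisioned_name: str,
                        model_units: int = 1, poll_seconds: float = 60.0,
                        timeout_seconds: float = 24 * 3600) -> str:
    """Block until an SFT job completes, then create a PT unit; return its ARN."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = bedrock.get_model_customization_job(jobIdentifier=job_arn)
        status = job["status"]  # InProgress | Completed | Failed | Stopping | Stopped
        if status == "Completed":
            break
        if status in ("Failed", "Stopped"):
            raise RuntimeError(f"customization job ended with status {status}")
        if time.monotonic() > deadline:
            raise TimeoutError("customization job did not finish in time")
        time.sleep(poll_seconds)

    # The custom Haiku checkpoint is served through Provisioned Throughput.
    # Omitting commitmentDuration requests a no-commitment (hourly) purchase.
    pt = bedrock.create_provisioned_model_throughput(
        modelUnits=model_units,
        provisionedModelName=provisioned_name,
        modelId=job["outputModelArn"],
    )
    return pt["provisionedModelArn"]
```

Because the client is injected, the same function works with `boto3.client("bedrock", region_name="us-west-2")` in production and with a stub in tests.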
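One training record, pretty-printed for readability (in train.jsonl each record must sit on a single line):

```json
{"system": "You are a claims router.",
 "messages": [
   {"role": "user", "content": "Patient stage IV, denied prior auth, plan United"},
   {"role": "assistant", "content": "ROUTE: appeals_specialist\nRATIONALE: oncology + denied PA"}
 ]}
```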
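A single malformed line can fail the whole job, so it is worth checking records locally before uploading to S3. Below is a minimal validator for the structural rules stated earlier: a string system prompt, at least two messages, and strict user → assistant alternation that starts with user and ends with assistant. It is paired with a writer that emits one compact JSON object per line. It deliberately checks only these rules, not Bedrock's token-length or record-count quotas.

```python
import json
from typing import Iterable


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems with one SFT record (empty list means OK)."""
    errors = []
    if not isinstance(rec.get("system", ""), str):
        errors.append("system must be a string")
    msgs = rec.get("messages")
    if not isinstance(msgs, list) or len(msgs) < 2:
        return errors + ["messages must be a list of at least 2 turns"]
    for i, m in enumerate(msgs):
        expected = "user" if i % 2 == 0 else "assistant"  # strict alternation
        if m.get("role") != expected:
            errors.append(f"turn {i}: expected role {expected!r}, got {m.get('role')!r}")
        if not isinstance(m.get("content"), str) or not m["content"].strip():
            errors.append(f"turn {i}: content must be a non-empty string")
    if msgs[-1].get("role") != "assistant":
        errors.append("last turn must be from the assistant")
    return errors


def write_jsonl(records: Iterable[dict], path: str) -> int:
    """Validate every record first, then write one JSON object per line."""
    records = list(records)
    for n, rec in enumerate(records):
        problems = validate_record(rec)
        if problems:
            raise ValueError(f"record {n}: {'; '.join(problems)}")
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)
```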
Q: Can I fine-tune Claude through claude.ai or the public API? No. Only via Amazon Bedrock as of May 2026.
Q: How much data do I need? Anthropic's docs suggest 50–10,000 examples; in our experience, narrow classification works at 200–500.
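A deterministic train/validation split fits the 50–10,000 range above and produces the train.jsonl / val.jsonl pair the job config references. One possible version is below. The 10% validation fraction and the fixed seed are arbitrary illustrative choices. The size check mirrors the range quoted above rather than any official quota.

```python
import random


def split_records(records: list, val_fraction: float = 0.1, seed: int = 7):
    """Shuffle deterministically and split into (train, val) lists."""
    if not 50 <= len(records) <= 10_000:
        raise ValueError(f"{len(records)} records is outside the 50-10,000 range")
    shuffled = records[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```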
Q: Is prompt caching always better? Not always. At low QPS it wins; above ~80 sustained RPS, where Provisioned Throughput is fully utilized, custom Haiku catches up.
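The crossover is a utilization question. Provisioned Throughput is billed per hour whether or not it is busy, while cached on-demand calls are billed per request. The sketch below solves for the sustained request rate at which the two costs meet. The hourly PT price and the per-request cost are hypothetical inputs to replace with your own quotes. The sketch also ignores the throughput ceiling of a single model unit.

```python
def breakeven_rps(pt_dollars_per_hour: float, model_units: int,
                  cached_cost_per_request: float) -> float:
    """Sustained requests/second at which PT and per-request billing cost the same.

    Below this rate, per-request (cached) billing is cheaper; above it, the
    flat hourly PT charge is amortized over enough calls to win.
    """
    pt_per_second = pt_dollars_per_hour * model_units / 3600.0
    return pt_per_second / cached_cost_per_request
```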
Q: What about distilling Sonnet → Haiku? You can generate Sonnet outputs, store them, and use those as your Haiku SFT corpus. Anthropic's TOS allows it for your own internal models.
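Here is a minimal sketch of the last step of that distillation loop: turning stored (prompt, Sonnet answer) pairs into Haiku SFT records in the format shown earlier. How the Sonnet outputs are generated and stored is out of scope. The `pairs` input shape and the optional `keep` quality filter are hypothetical.

```python
import json
from typing import Callable, Iterable, Optional


def distill_records(pairs: Iterable[tuple[str, str]], system: str,
                    keep: Optional[Callable[[str], bool]] = None) -> list[str]:
    """Convert (prompt, teacher_output) pairs into JSONL lines for Haiku SFT."""
    lines = []
    for prompt, teacher_output in pairs:
        if keep is not None and not keep(teacher_output):
            continue  # drop teacher answers that fail your own quality filter
        rec = {
            "system": system,
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": teacher_output},
            ],
        }
        lines.append(json.dumps(rec, ensure_ascii=False))  # one record per line
    return lines
```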
Q: Does fine-tuning weaken Claude's safety? The 2026 constitution refresh is enforced at runtime — SFT cannot remove safety refusals.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.