TL;DR — Anthropic does not let you fine-tune Claude via its public API. The only supported path in 2026 is Claude 3 Haiku SFT on Amazon Bedrock in us-west-2. Use it for narrow, latency-sensitive verticals where Haiku's $0.25/$1.25 per 1M tokens beats Sonnet/Opus and prompt caching alone is not enough.

What it does

Bedrock SFT teaches Claude 3 Haiku domain-specific style, classification labels, and tool-call shapes. Anthropic's January 2026 constitution refresh hardcodes a 4-tier priority hierarchy (safety → ethics → compliance → helpfulness) that fine-tuning cannot override — your training data is layered on top of that prior, not under it.

For Sonnet 4.x, Opus 4.7, and any model post-Haiku 3, fine-tuning is not available. Anthropic's official position: lean on prompt caching (90% discount on cached system prompts), extended thinking, and memory tools instead.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

How it works

Format JSONL: each line has an optional system message + an alternating user → assistant array (must start with user, end with assistant, ≥ 2 messages).
Upload to S3 in us-west-2.
Create a Bedrock customization job referencing anthropic.claude-3-haiku-20240307-v1:0.
Provision throughput on the resulting custom model (Bedrock requires Provisioned Throughput for fine-tuned Claudes — no on-demand inference).
Hit the new model ARN exactly like any other Bedrock invoke.

flowchart LR
  S3[(S3 train.jsonl)] --> JOB[Bedrock SFT job]
  JOB --> CKPT[Custom Haiku checkpoint]
  CKPT --> PT[Provisioned Throughput unit]
  PT --> APP[Agent runtime]
  APP -->|invoke_model| PT

CallSphere implementation

We mostly don't fine-tune Claude. CallSphere ships 37 agents across 6 verticals powered by Claude Sonnet 4.6 + GPT-4o + Gemini 2.5 — orchestrated through our own router. The Healthcare post-call analytics path uses gpt-4o-mini (cheaper, fine-tunable). For deep reasoning we lean on prompt caching (Anthropic's 90% cached-token discount on a 12k system prompt saves us ~$3,800/mo at Scale-tier volume) rather than custom Haiku, because cache hits beat custom-throughput costs at our QPS.

When a buyer needs Claude SFT (regulated insurance routing, e.g.), we provision on Bedrock and bill it through the Scale plan ($1,499/mo) with a co-managed customization. 14-day trial + 22% affiliate still apply.

Build steps with code

import boto3
br = boto3.client("bedrock", region_name="us-west-2")

br.create_model_customization_job(
    customizationType="FINE_TUNING",
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
    jobName="callsphere-claim-router-v3",
    customModelName="claude-haiku-claim-router",
    trainingDataConfig={"s3Uri":"s3://cs-sft/claims/train.jsonl"},
    validationDataConfig={"validators":[{"s3Uri":"s3://cs-sft/claims/val.jsonl"}]},
    hyperParameters={"epochCount":"2","batchSize":"32",
                     "learningRate":"0.00001","learningRateWarmupSteps":"50"},
    outputDataConfig={"s3Uri":"s3://cs-sft/claims/out/"},
)

{"system":"You are a claims router.",
 "messages":[
   {"role":"user","content":"Patient stage IV, denied prior auth, plan United"},
   {"role":"assistant","content":"ROUTE: appeals_specialist\nRATIONALE: oncology + denied PA"}
 ]}

Pitfalls

Region lock-in — only us-west-2. If your data has residency rules, this is a blocker.
Provisioned Throughput is expensive — minimum ~$50/hr for one model unit; under ~80 RPS sustained, prompt caching wins on cost.
No Sonnet/Opus SFT — don't promise customers Sonnet customization; it doesn't exist.
Constitutional priors win — even with SFT, Claude will refuse outputs that violate its 4-tier constitution.

FAQ

Q: Can I fine-tune Claude through claude.ai or the public API? No. Only via Amazon Bedrock as of May 2026.

Q: How much data do I need? Anthropic's docs suggest 50–10,000 examples; in our experience, narrow classification works at 200–500.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Q: Is prompt caching always better? At low QPS yes. Above ~80 sustained RPS where Provisioned Throughput is fully utilized, custom Haiku catches up.

Q: What about distilling Sonnet → Haiku? You can generate Sonnet outputs, store them, and use those as your Haiku SFT corpus. Anthropic's TOS allows it for your own internal models.

Q: Does fine-tuning weaken Claude's safety? The 2026 constitution refresh is enforced at runtime — SFT cannot remove safety refusals.

Anthropic Claude Fine-Tuning Patterns on Amazon Bedrock (2026)

What it does

How it works

CallSphere implementation

Build steps with code

Pitfalls

FAQ

Sources

Try CallSphere AI Voice Agents

Related Articles You May Like

AWS Multi-Agent Orchestrator: Supervisor Routing Patterns Guide

AWS HealthScribe 2026: The Open Medical Scribe API Layer

Raleigh Startups Building on the Claude Agent SDK

Claude-Powered Voice Agents for Salon and Spa Bookings

Building Customer Support Pipelines on Claude Sonnet 4.6

Building an Organization Skill Registry for Claude Agents