AI Infrastructure

AWS Bedrock + Transcribe + Polly Stitched vs Realtime: Real Cost

Bedrock Claude + Transcribe streaming + Polly Neural runs $0.06–$0.10 per minute on paper. The honest math reveals where the AWS-native stack beats and where it loses to OpenAI Realtime.

The cost problem

```mermaid
flowchart TD
  Client[Client] --> Edge[Cloudflare Worker]
  Edge -->|WS upgrade| DO[Durable Object]
  DO --> AI[(OpenAI Realtime WS)]
  AI --> DO
  DO --> Client
  DO -.hibernation.-> Storage[(Persisted state)]
```
CallSphere reference architecture

Enterprises with AWS commits often default to building voice agents on the AWS-native stack: Transcribe for STT, Bedrock for LLM, and Polly for TTS. The pitch is "use your committed spend, stay in VPC, single billing." The trap is that the AWS stack is a stitched cascade — three services with three latency penalties — and the per-minute cost looks great only until you add Bedrock token cost honestly.

We modeled it against gpt-realtime to find the real break-even.

How AWS prices it

Amazon Transcribe (streaming):

  • Tier 1 (first 250k minutes/month): $0.024/min
  • Tier 2: $0.015/min (38% discount)
  • Tier 3: $0.0102/min (58% discount)
  • Speaker ID adds 20–40% extra
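
The tier mechanics above can be sketched as a blended-rate calculator. The 250k-minute Tier 1 boundary comes from the list; the Tier 2 width (next 750k minutes) is an assumption for illustration, so check your own AWS pricing page for exact breakpoints:

```python
def transcribe_streaming_cost(minutes: float) -> float:
    """Blended Amazon Transcribe streaming cost ($) using the
    per-minute rates listed above. Tier 2's width is assumed."""
    tiers = [
        (250_000, 0.024),        # Tier 1: first 250k minutes/month
        (750_000, 0.015),        # Tier 2: next 750k minutes (assumed width)
        (float("inf"), 0.0102),  # Tier 3: everything above
    ]
    cost, remaining = 0.0, minutes
    for width, rate in tiers:
        used = min(remaining, width)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost

print(round(transcribe_streaming_cost(5), 3))        # 0.12 — one 5-min Tier 1 call
print(round(transcribe_streaming_cost(300_000), 0))  # 6750.0 — a fleet spilling into Tier 2
```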

Amazon Polly:

  • Standard voices: $4.00 per 1M characters
  • Neural voices: $16.00 per 1M characters
  • Long-Form voices: $100.00 per 1M characters
  • Generative voices (newer): higher than Long-Form
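
Polly bills by character, so TTS cost is driven by how much the agent speaks. A minimal sketch, assuming ~150 words per minute and ~5 characters per word (the same assumptions used in the profiles below):

```python
WPM = 150            # assumed speaking rate
CHARS_PER_WORD = 5   # assumed average, including spaces

POLLY_PER_M_CHARS = {  # $ per 1M characters, from the list above
    "standard": 4.00,
    "neural": 16.00,
    "long_form": 100.00,
}

def polly_cost(speech_minutes: float, voice: str = "neural") -> float:
    chars = speech_minutes * WPM * CHARS_PER_WORD
    return chars / 1_000_000 * POLLY_PER_M_CHARS[voice]

print(round(polly_cost(2, "neural"), 4))     # 0.024 — 2 min of agent speech
print(round(polly_cost(2, "long_form"), 3))  # 0.15 — same speech on Long-Form
```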

Amazon Bedrock (May 2026):

  • Claude 3.5 Haiku: $0.80/M input · $4.00/M output
  • Claude 3.5 Sonnet: $3.00/M input · $15.00/M output
  • Bedrock prompt caching: 90% discount on cached input where supported
  • Provisioned Throughput: from $21.18/hour per model unit
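
Token cost with caching can be modeled with a simple split between fresh and cached input. This is a simplified sketch: it applies the full 90% cache-read discount and ignores any cache-write premium, so it will slightly underestimate real bills:

```python
BEDROCK = {  # $ per 1M tokens, from the list above
    "haiku":  {"input": 0.80, "output": 4.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}
CACHE_READ_DISCOUNT = 0.90  # up to 90% off cached input where supported

def bedrock_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Per-call Bedrock cost ($), splitting input into fresh vs cached."""
    p = BEDROCK[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    in_cost = (fresh * p["input"]
               + cached * p["input"] * (1 - CACHE_READ_DISCOUNT)) / 1e6
    out_cost = output_tokens * p["output"] / 1e6
    return in_cost + out_cost

# 12k input (90% cached) + 2k output on Haiku:
print(round(bedrock_cost("haiku", 12_000, 2_000, cached_fraction=0.9), 4))  # 0.0098
```

Cache writes and multi-turn re-reads push the real figure toward the ~$0.018 used in Profile A below.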

Honest math

Profile A — 5-minute support call, Claude 3.5 Haiku, Polly Neural, Tier 1 Transcribe:

  • Transcribe: 5 × $0.024 = $0.12
  • Polly Neural (2 min × ~150 wpm × ~5 chars/word ÷ 1M × $16): $0.024
  • Bedrock Haiku (12k input cached + 2k output): ~$0.018
  • Total: ~$0.162/call → $0.032/min

But that uses Tier 1 Transcribe at $0.024/min. A production fleet that reaches Tier 2 ($0.015/min) drops the per-call total to $0.117 → $0.023/min.
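
Profile A's arithmetic can be reproduced with one small function (the ~$0.018 Haiku figure is taken from above; speaking-rate assumptions are 150 wpm × 5 chars/word):

```python
def per_call(call_min: float, tts_min: float, transcribe_rate: float,
             llm_cost: float, polly_rate: float = 16.00,
             wpm: int = 150, cpw: int = 5) -> float:
    """Total stitched-stack cost ($) for one call."""
    stt = call_min * transcribe_rate                 # Transcribe streaming
    tts = tts_min * wpm * cpw / 1e6 * polly_rate     # Polly Neural characters
    return stt + tts + llm_cost                      # plus Bedrock tokens

# Profile A: 5-min call, 2 min of agent speech, Tier 1, ~$0.018 Haiku
a = per_call(5, 2, 0.024, 0.018)
print(round(a, 3), round(a / 5, 3))   # 0.162 0.032

# Same call at the Tier 2 Transcribe rate:
b = per_call(5, 2, 0.015, 0.018)
print(round(b, 3), round(b / 5, 3))   # 0.117 0.023
```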

Profile B — 12-minute healthcare intake, Claude Sonnet, Polly Neural, 22k prompt:

  • Transcribe: 12 × $0.024 = $0.288
  • Polly Neural (5 min × 150 wpm × 5 chars ÷ 1M × $16): $0.060
  • Bedrock Sonnet (22k cached input over 18 turns + 8k output): ~$0.21
  • Total: ~$0.558 → $0.047/min

Profile C — Same as B but on gpt-realtime cached:

  • ~$0.96 → $0.080/min

So AWS stitched is ~40% cheaper than OpenAI Realtime cached on long, complex calls. The savings come from cheap Transcribe tier-2 + Bedrock prompt caching + Polly Neural.

The downside: latency. The cascaded AWS stack runs 700–900ms voice-to-voice on best-tuned configurations. gpt-realtime sits at ~430ms.
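
The 700–900ms cascade number is the sum of stage latencies that a single realtime model avoids. The per-stage budgets below are illustrative assumptions, not measurements, but they show how quickly a stitched pipeline eats the budget:

```python
# Illustrative stage budgets (ms) for the stitched AWS path — assumed values
cascade = {
    "transcribe_final_partial": 200,  # STT endpointing + final partial
    "bedrock_first_token": 350,       # LLM time-to-first-token
    "polly_first_audio_byte": 150,    # TTS synthesis start
    "network_and_buffering": 100,
}
total_ms = sum(cascade.values())
print(total_ms)  # 800 — inside the 700–900ms band

# Trade-off vs gpt-realtime, using the per-minute figures above:
print(f"{1 - 0.047 / 0.080:.0%} cheaper, but ~{total_ms - 430}ms slower")
# 41% cheaper, but ~370ms slower
```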

When AWS wins, when it loses

AWS wins when:

  • You have a Transcribe commit pulling you to Tier 2 or 3
  • Your prompt is huge (Bedrock cache rate is competitive)
  • Latency tolerance is 600ms+ (not premium support flows)
  • Compliance requires AWS VPC + KMS + CloudTrail end-to-end
  • You already pay for Bedrock provisioned throughput

AWS loses when:

  • Sub-500ms voice-to-voice is required
  • You are below 250k Transcribe minutes/month (Tier 1 pricing is uncompetitive)
  • You want the latest expressive voices (Polly is solid but trails the newest TTS models)
  • Your team is not deep in AWS — operational complexity is real

How CallSphere optimizes

CallSphere does not run pure AWS-stitched in production today, but we do use AWS for non-voice paths where it makes sense — AWS SES for cold outreach mail, S3 for call recording archives, and Bedrock as a fallback LLM for one Healthcare post-call analytics pipeline that needs the data residency story.

For voice itself we land on OpenAI Realtime + ElevenLabs for premium and Deepgram + GPT-4o-mini + Aura-2 for cost-sensitive — see our other posts in this batch for the math. Across 6 verticals — 37 agents, 90+ tools, 115+ DB tables — AWS is part of the back-of-house but not the realtime hot path.

If you are running on AWS already and considering a switch, the ROI calculator on our site lets you plug in your current AWS unit cost and compare to our pricing tiers ($149 / $499 / $1499). The 14-day no-card trial lets you A/B against your AWS-stitched baseline.

Optimization checklist

  1. Compute your real Transcribe tier — Tier 1 is rough; Tier 2/3 unlocks AWS savings.
  2. Use Polly Neural unless you need Long-Form quality (6.25× the price for marginal gains).
  3. Use Bedrock prompt caching aggressively — same 90% discount as Anthropic direct.
  4. Choose Claude Haiku for short flows, Sonnet for complex.
  5. Watch out for Bedrock Provisioned Throughput — only worth it at very high concurrency.
  6. Consider Polly's Generative Voices for brand voice — but benchmark vs ElevenLabs.
  7. Stay in one region to avoid cross-region egress charges.
  8. Use Speaker Diarization only if you need it — adds 20–40%.
  9. Pre-warm Bedrock with a small inference at start-of-shift to dodge cold-start.
  10. Monitor latency p95 with X-Ray; add Lambda Provisioned Concurrency if cold starts hurt.
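
On item 3: Bedrock's prompt caching is driven by where you place a cache marker in the request. A sketch of building a Converse API request with the large, stable system prompt marked cacheable — the `cachePoint` payload shape follows Bedrock's convention, but verify it and the example model ID against current AWS docs before relying on it:

```python
def build_converse_request(model_id: str, system_prompt: str,
                           user_text: str) -> dict:
    """Build kwargs for bedrock_runtime.converse() with the system
    prompt marked as cacheable via a cachePoint block."""
    return {
        "modelId": model_id,
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # cache everything above this
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
    }

req = build_converse_request(
    "anthropic.claude-3-5-haiku-20241022-v1:0",   # example model ID
    "You are a support voice agent ...",          # the 12k-token prompt in practice
    "Hi, I need to reschedule my appointment.",
)
# boto3.client("bedrock-runtime").converse(**req)  # requires AWS credentials
```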

FAQ

Is AWS Transcribe cheaper than Deepgram? On Tier 1, no — Deepgram Nova-3 ($0.0048/min) beats Transcribe Tier 1 ($0.024/min) 5×. On Tier 3, Transcribe ($0.0102) gets close.

Can I use Bedrock with prompt caching? Yes — Bedrock supports prompt caching for Claude models with up to 90% discount on cached input.

Should I use Polly Long-Form voices? Only for brand voice or audiobook use cases. The 6.25× price multiplier over Neural is hard to justify for live agents.

What about AWS Lex for the orchestration? Lex bundles intents and slot filling, but its LLM is dated. Most teams skip Lex and orchestrate directly.

Can I bring HIPAA workloads here? Yes — Transcribe, Polly, and Bedrock are all HIPAA-eligible with a BAA in place. Same as our Healthcare Voice Agent stack.

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available, no signup required.