---
title: "AWS Bedrock + Transcribe + Polly Stitched vs Realtime: Real Cost"
description: "Bedrock Claude + Transcribe streaming + Polly Neural runs $0.06–$0.10 per minute on paper. The honest math reveals where the AWS-native stack beats and where it loses to OpenAI Realtime."
canonical: https://callsphere.ai/blog/vw2c-aws-bedrock-transcribe-polly-stitched-vs-realtime-cost
category: "AI Infrastructure"
tags: ["AWS", "Bedrock", "Transcribe", "Polly", "Cost"]
author: "CallSphere Team"
published: 2026-04-29T00:00:00.000Z
updated: 2026-05-07T09:32:11.114Z
---

# AWS Bedrock + Transcribe + Polly Stitched vs Realtime: Real Cost

> Bedrock Claude + Transcribe streaming + Polly Neural runs $0.06–$0.10 per minute on paper. The honest math reveals where the AWS-native stack beats and where it loses to OpenAI Realtime.

## The cost problem

```mermaid
flowchart TD
  Client[Client] --> Edge[Cloudflare Worker]
  Edge -->|WS upgrade| DO[Durable Object]
  DO --> AI[(OpenAI Realtime WS)]
  AI --> DO
  DO --> Client
  DO -.hibernation.-> Storage[(Persisted state)]
```

*CallSphere reference architecture*

Enterprises with AWS commits often default to building voice agents on the AWS-native stack: Transcribe for STT, Bedrock for the LLM, and Polly for TTS. The pitch is "use your committed spend, stay in-VPC, single bill." The trap is that the AWS stack is a stitched cascade of three services with three latency penalties, and the per-minute cost only looks great until you price Bedrock tokens honestly.

We modeled it against gpt-realtime to find the real break-even.

## How AWS prices it

**Amazon Transcribe (streaming):**

- Tier 1 (first 250k minutes/month): $0.024/min
- Tier 2: $0.015/min (38% discount)
- Tier 3: $0.0102/min (58% discount)
- Speaker ID adds 20–40% extra

**Amazon Polly:**

- Standard voices: $4.00 per 1M characters
- Neural voices: $16.00 per 1M characters
- Long-Form voices: $100.00 per 1M characters
- Generative voices (newer): $30.00 per 1M characters (between Neural and Long-Form)

**Amazon Bedrock (May 2026):**

- Claude 3.5 Haiku: $0.80/M input · $4.00/M output
- Claude 3.5 Sonnet: $3.00/M input · $15.00/M output
- Bedrock prompt caching: 90% discount on cached input where supported
- Provisioned Throughput: from $21.18/hour per model unit

## Honest math

**Profile A — 5-minute support call, Claude 3.5 Haiku, Polly Neural, Tier 1 Transcribe:**

- Transcribe: 5 × $0.024 = $0.12
- Polly Neural (2 min × ~150 wpm × ~5 chars/word ÷ 1M × $16): $0.024
- Bedrock Haiku (12k input + 2k output, uncached): ~$0.018
- **Total: ~$0.162/call → $0.032/min**

But that assumes **Tier 1 Transcribe** at $0.024/min. A fleet that reaches Tier 2 ($0.015/min) drops the per-call total to ~$0.117, or ~$0.023/min.
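
As a sanity check, Profile A reproduces with a small cost model. This is a sketch built from the list prices above; the token counts, words-per-minute, and chars-per-word figures are this article's illustrative assumptions, and the Haiku line is priced at uncached rates:

```python
# Stitched per-call cost model using the list prices above.
# Token counts, wpm, and chars/word are illustrative assumptions.
TRANSCRIBE_PER_MIN = {"tier1": 0.024, "tier2": 0.015, "tier3": 0.0102}
POLLY_NEURAL_PER_CHAR = 16.00 / 1_000_000
HAIKU_IN, HAIKU_OUT = 0.80 / 1_000_000, 4.00 / 1_000_000  # $/token

def stitched_call_cost(call_min, tts_min, in_tokens, out_tokens,
                       tier="tier1", wpm=150, chars_per_word=5):
    stt = call_min * TRANSCRIBE_PER_MIN[tier]
    tts = tts_min * wpm * chars_per_word * POLLY_NEURAL_PER_CHAR
    llm = in_tokens * HAIKU_IN + out_tokens * HAIKU_OUT  # uncached rates
    return stt + tts + llm

# Profile A: 5-min call, ~2 min of agent speech, 12k in + 2k out tokens
total = stitched_call_cost(5, 2, 12_000, 2_000, tier="tier1")
print(round(total, 3), round(total / 5, 3))  # 0.162 0.032
```

Swapping in `tier="tier2"` drops the total to ~$0.117, matching the paragraph above.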

**Profile B — 12-minute healthcare intake, Claude Sonnet, Polly Neural, 22k prompt:**

- Transcribe: 12 × $0.024 = $0.288
- Polly Neural (5 min × 150 wpm × 5 chars ÷ 1M × $16): $0.060
- Bedrock Sonnet (22k cached input over 18 turns + 8k output): ~$0.21
- **Total: ~$0.558 → $0.047/min**

**Profile C — Same as B but on gpt-realtime cached:**

- ~$0.96 → $0.080/min

So **AWS stitched is roughly 40% cheaper than OpenAI Realtime cached on long, complex calls.** The savings come from Tier 2 Transcribe, Bedrock prompt caching, and Polly Neural pricing.

The downside is latency. The cascaded AWS stack runs 700–900ms voice-to-voice even on well-tuned configurations; gpt-realtime sits at ~430ms.
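
That 700–900ms figure decomposes into a per-stage budget. The stage timings below are illustrative assumptions for a well-tuned deployment, not measurements:

```python
# Rough voice-to-voice latency budget for the stitched cascade (illustrative).
cascade_ms = {
    "transcribe_endpoint_final": 250,  # streaming STT waits for end-of-speech
    "bedrock_first_token": 300,        # Claude time-to-first-token
    "polly_first_audio_byte": 150,     # Neural TTS time-to-first-byte
    "network_and_buffering": 100,      # hops between three managed services
}
print(sum(cascade_ms.values()))  # 800
```

A single-websocket model like gpt-realtime collapses the three service hops into one, which accounts for part of its ~430ms advantage.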

## When AWS wins, when it loses

**AWS wins when:**

- You have a Transcribe commit pulling you to Tier 2 or 3
- Your prompt is huge (Bedrock cache rate is competitive)
- Latency tolerance is 600ms+ (not premium support flows)
- Compliance requires AWS VPC + KMS + CloudTrail end-to-end
- You already pay for Bedrock provisioned throughput

**AWS loses when:**

- Sub-500ms voice-to-voice is required
- You are below 250k Transcribe minutes/month (Tier 1 is 5× Deepgram's rate)
- You want the latest expressive voices (Polly is solid but trails ElevenLabs' newest models)
- Your team is not deep in AWS — operational complexity is real

## How CallSphere optimizes

CallSphere does not run pure AWS-stitched in production today, but we do use AWS for non-voice paths where it makes sense — AWS SES for cold outreach mail, S3 for call recording archives, and Bedrock as a fallback LLM for one Healthcare post-call analytics pipeline that needs the data residency story.

For voice itself we land on OpenAI Realtime + ElevenLabs for premium flows and Deepgram + GPT-4o-mini + Aura-2 for cost-sensitive ones; see our other cost breakdowns for that math. Across 6 verticals (37 agents, 90+ tools, 115+ DB tables), AWS is part of the back-of-house but not the realtime hot path.

If you are running on AWS already and considering a switch, the [ROI calculator](/tools/roi-calculator) on our site lets you plug in your current AWS unit cost and compare to our [pricing tiers](/pricing) ($149 / $499 / $1499). The [14-day no-card trial](/trial) lets you A/B against your AWS-stitched baseline.

## Optimization checklist

1. Compute your real Transcribe tier — Tier 1 is rough; Tier 2/3 unlocks AWS savings.
2. Use Polly Neural unless you need Long-Form quality (4× price for marginal gains).
3. Use Bedrock prompt caching aggressively — same 90% discount as Anthropic direct.
4. Choose Claude Haiku for short flows, Sonnet for complex.
5. Watch out for Bedrock Provisioned Throughput — only worth it at very high concurrency.
6. Consider Polly's Generative Voices for brand voice — but benchmark vs ElevenLabs.
7. Stay in one region to avoid cross-region egress charges.
8. Use Speaker Diarization only if you need it — adds 20–40%.
9. Pre-warm Bedrock with a small inference at start-of-shift to dodge cold-start.
10. Monitor latency p95 with X-Ray; add Lambda Provisioned Concurrency if cold starts hurt.
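
For item 5, the Provisioned Throughput decision reduces to a spend comparison: provisioned only wins when sustained on-demand spend exceeds the hourly model-unit price. A sketch with hypothetical hourly token volumes:

```python
# Break-even check for Bedrock Provisioned Throughput (checklist item 5).
# Hourly token volumes are hypothetical; rates are the Haiku prices above.
PT_PER_HOUR = 21.18               # $/hour per model unit
HAIKU_IN, HAIKU_OUT = 0.80, 4.00  # $/M tokens, on-demand

def on_demand_per_hour(in_m_tokens, out_m_tokens):
    return in_m_tokens * HAIKU_IN + out_m_tokens * HAIKU_OUT

spend = on_demand_per_hour(10, 2)  # 10M in + 2M out tokens per hour
print(spend, spend > PT_PER_HOUR)  # 16.0 False -> stay on-demand
```

A model unit also caps throughput, so check capacity as well as cost before committing.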

## FAQ

**Is AWS Transcribe cheaper than Deepgram?**
On Tier 1, no: Deepgram Nova-3 ($0.0048/min) undercuts Transcribe Tier 1 ($0.024/min) by 5×. At Tier 3, Transcribe ($0.0102/min) closes most of the gap.

**Can I use Bedrock with prompt caching?**
Yes — Bedrock supports prompt caching for Claude models with up to 90% discount on cached input.
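
The discount compounds on multi-turn calls where the same large prompt is resent each turn. A sketch using the Sonnet rates above (the turn count and prompt size mirror Profile B; cache-write surcharges are ignored for simplicity):

```python
# Cached vs. uncached input cost for a repeated system prompt on Bedrock Claude.
SONNET_IN = 3.00 / 1_000_000     # $/token, on-demand input
CACHED_IN = SONNET_IN * 0.10     # 90% discount on cache hits

def input_cost(prompt_tokens, turns, cached):
    rate = CACHED_IN if cached else SONNET_IN
    return prompt_tokens * turns * rate

# 22k-token prompt resent over 18 turns (Profile B's shape):
print(round(input_cost(22_000, 18, cached=False), 2))  # 1.19
print(round(input_cost(22_000, 18, cached=True), 3))   # 0.119
```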

**Should I use Polly Long-Form voices?**
Only for brand voice or audiobook use cases. The 4× price multiplier is hard to justify for live agents.

**What about AWS Lex for the orchestration?**
Lex bundles intents and slot filling, but its LLM is dated. Most teams skip Lex and orchestrate directly.

**Can I bring HIPAA workloads here?**
Yes — Transcribe, Polly, and Bedrock are all HIPAA-eligible with a BAA in place. Same as our Healthcare Voice Agent stack.

## Sources

- Amazon Transcribe Pricing — [https://aws.amazon.com/transcribe/pricing/](https://aws.amazon.com/transcribe/pricing/)
- Amazon Polly Pricing — [https://aws.amazon.com/polly/pricing/](https://aws.amazon.com/polly/pricing/)
- Amazon Bedrock Pricing — [https://aws.amazon.com/bedrock/pricing/](https://aws.amazon.com/bedrock/pricing/)
- CostGoat AWS Transcribe Calculator — [https://costgoat.com/pricing/amazon-transcribe](https://costgoat.com/pricing/amazon-transcribe)

