By Sagar Shankaran, Founder of CallSphere
Bedrock Claude + Transcribe streaming + Polly Neural runs $0.06–$0.10 per minute on paper. The honest math shows where the AWS-native stack beats OpenAI Realtime and where it loses.
flowchart TD
Client[Client] --> Edge[Cloudflare Worker]
Edge -->|WS upgrade| DO[Durable Object]
DO --> AI[(OpenAI Realtime WS)]
AI --> DO
DO --> Client
DO -.hibernation.-> Storage[(Persisted state)]
Enterprises with AWS commits often default to building voice agents on the AWS-native stack: Transcribe for STT, Bedrock for the LLM, and Polly for TTS. The pitch is "use your committed spend, stay in VPC, single billing." The trap is that the AWS stack is a stitched cascade of three services, each adding its own latency penalty, and the per-minute cost looks great only until you count Bedrock token cost honestly.
We modeled it against gpt-realtime to find the real break-even.
Amazon Transcribe (streaming):
Amazon Polly:
Amazon Bedrock (May 2026):
Profile A — 5-minute support call, Claude 3.5 Haiku, Polly Neural, Tier 1 Transcribe:
That figure assumes Tier 1 Transcribe at $0.024/min. Most production fleets reach Tier 2 ($0.015/min), which drops the per-call total to $0.117, or about $0.023/min.
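The Tier 1 to Tier 2 adjustment for Profile A is plain arithmetic, sketched here in Python. The two Transcribe rates, the 5-minute duration, and the $0.117 Tier 2 total come from this post. The function and variable names are illustrative, and the Tier 1 call total is back-derived from those figures rather than quoted from a price sheet.

```python
# Figures quoted in this post for Profile A (5-minute support call).
CALL_MINUTES = 5
TRANSCRIBE_TIER1_PER_MIN = 0.024  # USD per minute
TRANSCRIBE_TIER2_PER_MIN = 0.015  # USD per minute
TIER2_CALL_TOTAL = 0.117          # USD per call, all three services combined


def tier_switch_saving(minutes: float, old_rate: float, new_rate: float) -> float:
    """Per-call saving from moving STT to a cheaper per-minute rate."""
    return minutes * (old_rate - new_rate)


saving = tier_switch_saving(
    CALL_MINUTES, TRANSCRIBE_TIER1_PER_MIN, TRANSCRIBE_TIER2_PER_MIN
)
tier2_per_min = TIER2_CALL_TOTAL / CALL_MINUTES

# Assumption: back-derived, not quoted in the post. This is what the same
# call would cost if only the Transcribe tier changed.
implied_tier1_total = TIER2_CALL_TOTAL + saving

print(f"Transcribe saving per call: ${saving:.3f}")
print(f"Tier 2 cost per minute:     ${tier2_per_min:.4f}")
print(f"Implied Tier 1 call total:  ${implied_tier1_total:.3f}")
```

The tier change saves $0.045 of Transcribe cost on a 5-minute call, and $0.117 spread over 5 minutes is $0.0234/min, which the post rounds to $0.023/min.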
Profile B — 12-minute healthcare intake, Claude Sonnet, Polly Neural, 22k prompt:
Profile C — Same as B but on gpt-realtime cached:
So the AWS-stitched stack is ~40% cheaper than cached OpenAI Realtime on long, complex calls. The savings come from cheap Tier 2 Transcribe, Bedrock prompt caching, and Polly Neural.
The downside is latency. Even on the best-tuned configurations, the cascaded AWS stack runs 700–900ms voice-to-voice, while gpt-realtime sits at ~430ms. That is a gap of roughly 270–470ms.
AWS wins when:
AWS loses when:
CallSphere does not run pure AWS-stitched in production today, but we do use AWS for non-voice paths where it makes sense — AWS SES for cold outreach mail, S3 for call recording archives, and Bedrock as a fallback LLM for one Healthcare post-call analytics pipeline that needs the data residency story.
For voice itself we land on OpenAI Realtime + ElevenLabs for the premium tier and Deepgram + GPT-4o-mini + Aura-2 for cost-sensitive deployments; see our other posts in this batch for the math. Across 6 verticals (37 agents, 90+ tools, 115+ DB tables), AWS is part of the back-of-house but not the realtime hot path.
If you are running on AWS already and considering a switch, the ROI calculator on our site lets you plug in your current AWS unit cost and compare to our pricing tiers ($149 / $499 / $1499). The 14-day no-card trial lets you A/B against your AWS-stitched baseline.
Is AWS Transcribe cheaper than Deepgram? On Tier 1, no. Deepgram Nova-3 ($0.0048/min) is 5× cheaper than Transcribe Tier 1 ($0.024/min). At Tier 3, Transcribe ($0.0102/min) narrows the gap to roughly 2×.
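The Transcribe versus Deepgram comparison in the answer above reduces to a set of price ratios. This short sketch uses only the per-minute prices quoted in this post (the Tier 2 rate comes from the Profile A discussion); the names are illustrative.

```python
# Per-minute STT prices as quoted in this post (USD).
DEEPGRAM_NOVA3 = 0.0048
TRANSCRIBE_BY_TIER = {"Tier 1": 0.024, "Tier 2": 0.015, "Tier 3": 0.0102}


def price_ratio(price: float, baseline: float) -> float:
    """How many times more expensive `price` is than `baseline`."""
    return price / baseline


ratios = {
    tier: price_ratio(price, DEEPGRAM_NOVA3)
    for tier, price in TRANSCRIBE_BY_TIER.items()
}
for tier, ratio in ratios.items():
    print(f"Transcribe {tier}: {ratio:.1f}x Deepgram Nova-3")
```

On these figures Transcribe costs roughly 5×, 3.1×, and 2.1× the Deepgram rate at Tiers 1, 2, and 3, so the gap narrows with volume but does not close.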
Can I use Bedrock with prompt caching? Yes — Bedrock supports prompt caching for Claude models with up to 90% discount on cached input.
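To see what a cached-input discount does to a long system prompt, here is a rough sketch. The 90% discount and the 22k-token prompt (Profile B) come from this post. The per-million-token price is a placeholder assumption, not a quoted Bedrock rate, and the sketch deliberately ignores any cache-write surcharge or cache expiry.

```python
def input_cost(
    prompt_tokens: int,
    price_per_mtok: float,
    cached_fraction: float = 0.0,
    cache_discount: float = 0.90,
) -> float:
    """Input cost in USD for one LLM turn.

    cached_fraction: share of prompt tokens served from the prompt cache.
    cache_discount:  discount applied to cached tokens (0.90 = 90% off).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = prompt_tokens * cached_fraction
    uncached = prompt_tokens - cached
    billable_tokens = uncached + cached * (1.0 - cache_discount)
    return billable_tokens * price_per_mtok / 1_000_000


PROMPT_TOKENS = 22_000     # Profile B prompt size, from the post
HYPOTHETICAL_PRICE = 3.00  # USD per million input tokens; placeholder only

cold = input_cost(PROMPT_TOKENS, HYPOTHETICAL_PRICE)
warm = input_cost(PROMPT_TOKENS, HYPOTHETICAL_PRICE, cached_fraction=1.0)
print(f"uncached turn: ${cold:.4f}, fully cached turn: ${warm:.4f}")
```

Whatever the real price is, a fully cached prompt at a 90% discount costs one tenth of the uncached input. That is why long-prompt calls such as Profile B benefit most from caching.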
Should I use Polly Long-Form voices? Only for brand voice or audiobook use cases. The 4× price multiplier is hard to justify for live agents.
What about AWS Lex for the orchestration? Lex bundles intents and slot filling, but its LLM is dated. Most teams skip Lex and orchestrate directly.
Can I bring HIPAA workloads here? Yes — Transcribe, Polly, and Bedrock are all HIPAA-eligible with a BAA in place. Same as our Healthcare Voice Agent stack.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.