By Sagar Shankaran, Founder of CallSphere
We modeled 10,000 concurrent voice agent WebSockets on Cloudflare. With hibernation and the 20:1 message ratio, the bill lands surprisingly low. Here is the line-by-line math.
flowchart LR
Twilio["Twilio Media Streams"] -- "WS · μlaw 8kHz" --> Bridge["FastAPI Bridge :8084"]
Bridge -- "PCM16 24kHz" --> OAI["OpenAI Realtime"]
OAI --> Bridge
Bridge --> Twilio
Bridge --> Logs[(structured logs · OTel)]
If you are building a chat or voice agent platform that needs to hold persistent WebSocket connections (for control messages, transcript streaming, or session state), the cheapest place to do that in 2026 is almost always Cloudflare Workers + Durable Objects.
But the pricing has three knobs (requests, GB-seconds, and the WebSocket message ratio), and people who confuse an "incoming WebSocket message" with a "request" end up with billing surprises. Let us walk through it.
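To make the message-versus-request distinction concrete, here is a minimal sketch of how billable requests could be counted under the 20:1 ratio. The included allowance and per-million rate below are assumptions for illustration (check the current Cloudflare pricing page), the traffic volumes are hypothetical, and rounding of partial requests is ignored.

```python
MESSAGES_PER_REQUEST = 20           # 20 incoming WebSocket messages bill as 1 request
INCLUDED_REQUESTS = 1_000_000       # assumed monthly allowance
PRICE_PER_MILLION_REQUESTS = 0.15   # assumed USD rate per extra million requests


def billable_requests(new_connections: int, incoming_messages: int) -> float:
    """Each new connection is 1 request; incoming messages count at 20:1."""
    return new_connections + incoming_messages / MESSAGES_PER_REQUEST


def request_cost(requests: float) -> float:
    """USD cost for the requests above the included allowance."""
    overage = max(0.0, requests - INCLUDED_REQUESTS)
    return overage / 1_000_000 * PRICE_PER_MILLION_REQUESTS


# Hypothetical month: 3M new connections and 600M incoming messages.
correct = billable_requests(3_000_000, 600_000_000)
naive = 3_000_000 + 600_000_000     # the mistake: treating every message as a request

print(f"billable requests: {correct:,.0f}  cost: ${request_cost(correct):,.2f}")
print(f"naive estimate:    {naive:,.0f}  cost: ${request_cost(naive):,.2f}")
```

Under these assumed rates, the naive estimate overstates the request line item by more than an order of magnitude, which is why requests end up as a small part of the bill.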
Workers Paid plan ($5/month minimum) includes:
Durable Objects pricing on top of Workers Paid:
Storage (SQLite-backed DO, billed January 2026 onward):
Hibernation API:
Assume a typical voice agent control plane:
Connection count math:
Connection cost (each new = 1 request):
Incoming WebSocket message cost:
GB-seconds (assume 32MB per DO instance, hibernated 50% of the time):
That is the big line item: GB-seconds. Hibernation matters enormously here — if you hibernate 80% of the time instead of 50%, GB-seconds drop to ~$2,070.
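The hibernation sensitivity can be reproduced with a short calculation. This is a sketch under stated assumptions: a 30-day month, 32MB treated as 0.032 GB, an included allowance of 400,000 GB-seconds, and a rate of $12.50 per additional million GB-seconds. The allowance and rate are assumptions chosen because they reproduce the article's figures; verify them against current Cloudflare pricing.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600      # 30-day month
INCLUDED_GB_SECONDS = 400_000           # assumed monthly allowance
PRICE_PER_MILLION_GB_SECONDS = 12.50    # assumed USD rate


def duration_cost(objects: int, memory_gb: float, hibernated_fraction: float) -> float:
    """Monthly GB-seconds cost for a fleet of Durable Objects.

    Only the non-hibernated share of the month accrues duration charges.
    """
    active_seconds = SECONDS_PER_MONTH * (1.0 - hibernated_fraction)
    gb_seconds = objects * memory_gb * active_seconds
    billable = max(0.0, gb_seconds - INCLUDED_GB_SECONDS)
    return billable / 1_000_000 * PRICE_PER_MILLION_GB_SECONDS


for fraction in (0.50, 0.80):
    cost = duration_cost(objects=10_000, memory_gb=0.032, hibernated_fraction=fraction)
    print(f"hibernated {fraction:.0%}: ${cost:,.0f}/month")
```

The results land close to the ~$5,200 and ~$2,070 figures quoted here, and the function makes it easy to test other hibernation rates.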
Storage:
Storage reads (assume 5x per write):
Egress / Workers requests:
Total at 10k concurrent: ~$5,200 on GB-seconds + ~$4,270 storage writes + $18 requests = **$9,488/month**.
That is roughly $0.95 per concurrent voice session per month (about $950 per 1,000 sessions), which is extraordinary if you are coming from Pusher, Ably, or self-hosted Erlang.
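The total and the unit cost can be re-derived from the three line items quoted in the total. The dictionary keys and variable names are just labels for this sketch.

```python
# Monthly line items in USD, as quoted in the total.
line_items = {
    "gb_seconds": 5_200,
    "storage_writes": 4_270,
    "requests": 18,
}
concurrent_sessions = 10_000

total = sum(line_items.values())
per_session = total / concurrent_sessions

print(f"total:              ${total:,}/month")
print(f"per session:        ${per_session:.2f}/month")
print(f"per 1,000 sessions: ${per_session * 1_000:,.2f}/month")
```

Dividing $9,488 by 10,000 concurrent sessions gives about $0.95 per session per month, or about $949 per 1,000 sessions.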
CallSphere uses Cloudflare Workers + Durable Objects for the chat agent control plane on three of the 6 verticals (Sales, Salon GlamBook, OneRoof Real Estate) — voice audio itself flows over OpenAI Realtime or LiveKit, but the session state, transcript streaming, and per-tenant routing live on Cloudflare.
We hit ~85% hibernation rate on idle DOs, batch row writes to 8 per call, and use a single Worker route for all 6 verticals (multi-tenant) with the tenant ID hashed into the DO ID. Net cost across 6 verticals — 37 agents, 90+ tools, 115+ DB tables — is well under $400/mo on Cloudflare for the realtime control plane.
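One way to picture "tenant ID hashed into the DO ID" is a deterministic name built from a tenant hash plus a session identifier. The sketch below is an illustration only, not CallSphere's actual scheme: the function name, the SHA-256 truncation to 16 hex characters, and the `tenant:session` layout are all assumptions. In a Worker, a name like this would typically be passed to the Durable Object namespace's `idFromName` to get a stable object ID.

```python
import hashlib


def durable_object_name(tenant_id: str, session_id: str) -> str:
    """Build a deterministic, tenant-scoped name for a session's Durable Object.

    Hypothetical scheme: a truncated SHA-256 of the tenant ID, then the session ID.
    """
    tenant_hash = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:16]
    return f"{tenant_hash}:{session_id}"


# The same tenant and session always map to the same name,
# while different tenants never share a prefix by accident of naming.
a = durable_object_name("salon-glambook", "session-123")
b = durable_object_name("salon-glambook", "session-123")
c = durable_object_name("oneroof-real-estate", "session-123")
print(a == b, a == c)
```

Because the name is a pure function of its inputs, a single Worker route can resolve any tenant's session to the right object without a lookup table.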
That savings is part of why our pricing tiers ($149 / $499 / $1499) work for SMB margins and the affiliate program is sustainable. Try the 14-day no-card trial to see the snappy chat product cards on /demo — that is the Cloudflare-DO pipeline in action.
What is the 20:1 WebSocket ratio? Cloudflare counts 20 incoming WebSocket messages as 1 billable request — making chatty real-time apps cheaper.
Does hibernation work mid-call? Yes — if no JavaScript handler is actively running, the DO can hibernate and the WebSocket stays open. Costs only resume when a handler runs.
Can I run STT and LLM in a Worker? You can call out to OpenAI/Deepgram from Workers, but you should not run inference inside a Worker — use Workers AI or external GPU.
Is this cheaper than self-hosted Erlang/Phoenix? At under 50k concurrent, Cloudflare wins by 5–10× on TCO. Above 250k, self-hosted starts to compete.
What about R2 for audio storage? $0.015/GB-month with zero egress is the cheapest place to keep call recordings. Pair with DO for control plane.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.