Cloudflare Workers + Durable Objects at 10k Concurrent: Real Cost
By Sagar Shankaran, Founder of CallSphere
We modeled 10,000 concurrent voice agent WebSockets on Cloudflare. With hibernation and the 20:1 message ratio, the bill lands surprisingly low. Here is the line-by-line math.
The cost problem
```mermaid
flowchart LR
Twilio["Twilio Media Streams"] -- "WS · μlaw 8kHz" --> Bridge["FastAPI Bridge :8084"]
Bridge -- "PCM16 24kHz" --> OAI["OpenAI Realtime"]
OAI --> Bridge
Bridge --> Twilio
Bridge --> Logs[(structured logs · OTel)]
```

If you are building a chat or voice agent platform that needs to hold persistent WebSocket connections (for control messages, transcript streaming, or session state), the cheapest place to do that in 2026 is almost always Cloudflare Workers + Durable Objects.
But the pricing has several knobs (requests, GB-seconds, the WebSocket message ratio, and storage), and people who confuse an "incoming WebSocket message" with a "request" end up with billing surprises. Let us walk through it.
How Cloudflare prices it
Workers Paid plan ($5/month minimum) includes:
- 10M Workers requests/month
- 30M CPU-ms/month
Durable Objects pricing on top of Workers Paid:
- 1M DO requests/month included; $0.15 per million after
- 400k GB-seconds/month included; $12.50 per million GB-s after
- WebSocket incoming messages: 20:1 billing ratio (20 messages = 1 billable request)
- Outgoing messages and protocol pings: free
- Each new WebSocket connection counts as 1 request
Storage (SQLite-backed DO, billed January 2026 onward):
- 25B row reads/month free, then $0.001/M
- 50M row writes/month free, then $1.00/M
- 5 GB-month included, then $0.20/GB-month
Hibernation API:
- Clients stay connected while the DO is hibernated
- GB-second charges do NOT accrue during hibernation
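Every metered line in the price list above follows the same included-then-overage pattern. A minimal sketch, with the list prices copied from this section as a snapshot (re-check them against Cloudflare's pricing page before relying on them); `overage_cost` is a hypothetical helper, and storage GB-months are left out because they are not priced per million:

```python
# Snapshot of the Durable Objects list prices quoted above (Workers Paid plan).
# These are assumptions to re-check against Cloudflare's pricing page, not live data.
INCLUDED_PER_MONTH = {
    "do_requests": 1_000_000,
    "gb_seconds": 400_000,
    "row_reads": 25_000_000_000,
    "row_writes": 50_000_000,
}
USD_PER_MILLION = {
    "do_requests": 0.15,
    "gb_seconds": 12.50,
    "row_reads": 0.001,
    "row_writes": 1.00,
}


def overage_cost(meter: str, monthly_usage: float) -> float:
    """Return the USD charge for one meter after subtracting the monthly included amount."""
    billable = max(0.0, monthly_usage - INCLUDED_PER_MONTH[meter])
    return billable / 1_000_000 * USD_PER_MILLION[meter]


print(overage_cost("row_writes", 4.32e9))  # 4270.0 (4.32B writes)
print(overage_cost("row_reads", 21.6e9))   # 0.0 (under the 25B included reads)
```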
Honest math: 10,000 concurrent WebSockets
Assume a typical voice agent control plane:
- 10,000 concurrent connections held for an average of 8 minutes each
- 5 control messages per second per connection (transcript chunks, tool events)
- Each connection makes 80 storage row writes (turn-by-turn log)
Connection count math:
- 10,000 concurrent × (60 / 8) connections per hour per slot = 75,000 new connections/hour
- 75k × 24 × 30 = 54M connections/month
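The connection count above is Little's law (average concurrency = arrival rate × average session length) solved for the arrival rate. A minimal sketch, assuming, as the text does, a flat around-the-clock load and a 30-day month:

```python
def new_connections_per_month(concurrent: int, avg_session_minutes: float, days: int = 30) -> float:
    """Little's law: arrival_rate = concurrency / mean session length."""
    per_hour = concurrent * (60 / avg_session_minutes)  # each slot turns over 60/8 = 7.5x per hour
    return per_hour * 24 * days


print(new_connections_per_month(10_000, 8))  # 54000000.0
```

If 10,000 is a peak rather than an around-the-clock average, the real monthly connection count, and every line item that scales with it, will be lower.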
Connection cost (each new = 1 request):
- 54M × $0.15 / 1M = $8.10
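Incoming WebSocket message cost:
- 54M conns × 8 min × 60 s × 5 msgs/s = 129.6B incoming messages (cross-check: 10,000 concurrent × 5 msgs/s × 2.592M seconds in a 30-day month gives the same figure)
- 129.6B × (1 / 20) ratio = 6.48B billable requests
- 6.48B × $0.15 / 1M = $972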
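Computing the incoming-message line directly from the stated assumptions is a cheap guard against slipping a few orders of magnitude. A minimal sketch; like the hand math, it ignores the 1M included requests, and the constant names are illustrative:

```python
WS_MESSAGES_PER_BILLED_REQUEST = 20   # the 20:1 incoming-message ratio
USD_PER_MILLION_REQUESTS = 0.15


def incoming_message_cost(connections: float, session_seconds: float, msgs_per_second: float):
    """Return (messages, billable_requests, usd) for incoming WebSocket traffic."""
    messages = connections * session_seconds * msgs_per_second
    billable_requests = messages / WS_MESSAGES_PER_BILLED_REQUEST
    usd = billable_requests / 1_000_000 * USD_PER_MILLION_REQUESTS
    return messages, billable_requests, usd


msgs, billed, usd = incoming_message_cost(54e6, 8 * 60, 5)
print(f"{msgs:.4g} messages -> {billed:.4g} billable requests -> ${usd:,.2f}")
# 1.296e+11 messages -> 6.48e+09 billable requests -> $972.00
```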
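GB-seconds (assume 32 MB per DO instance, hibernated 50% of the time; Cloudflare's pricing page says duration is billed on the 128 MB allocated to each Durable Object regardless of actual usage, so treat 32 MB as a best case that understates this line by about 4×):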
- Active DO-seconds: 54M × 8 min × 60s × 0.5 = 12.96B DO-seconds active
- Active DO GB-seconds: 12.96B × 0.032 = 415M GB-s
- Cost: (415M − 0.4M free) × $12.50 / 1M = $5,180
That is the big line item: GB-seconds. Hibernation matters enormously here — if you hibernate 80% of the time instead of 50%, GB-seconds drop to ~$2,070.
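Duration is the line worth sweeping, because it rests on two soft assumptions: the hibernated fraction and the memory figure. A minimal sketch with `memory_gb` left as a parameter, since 32 MB is an assumption (if duration is billed on the full 128 MB allocation, 0.128 is the conservative input):

```python
GB_S_INCLUDED = 400_000
USD_PER_MILLION_GB_S = 12.50


def duration_cost(connections: float, session_seconds: float,
                  hibernated_fraction: float, memory_gb: float) -> float:
    """USD for Durable Object duration; hibernated time accrues no GB-seconds."""
    active_seconds = connections * session_seconds * (1 - hibernated_fraction)
    gb_seconds = active_seconds * memory_gb
    return max(0.0, gb_seconds - GB_S_INCLUDED) / 1_000_000 * USD_PER_MILLION_GB_S


for hibernated in (0.5, 0.8):
    for mem in (0.032, 0.128):
        usd = duration_cost(54e6, 8 * 60, hibernated, mem)
        print(f"hibernated={hibernated:.0%} memory={mem * 1000:.0f}MB -> ${usd:,.0f}")
```

At 32 MB this reproduces the ~$5,180 and ~$2,070 figures above; at 128 MB both roughly quadruple, to about $20,730 and $8,290.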
Storage:
- 54M conns × 80 writes = 4.32B row writes/month
- (4.32B − 50M free) × $1 / 1M = $4,270
Storage reads (assume 5 reads per write):
- 21.6B reads/month, under the 25B included → ~$0
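Both storage lines use the same included-then-overage shape, with row reads priced at a thousandth of row writes. A minimal sketch using the assumptions above (80 writes per connection, five reads per write):

```python
ROW_WRITES_INCLUDED = 50_000_000
ROW_READS_INCLUDED = 25_000_000_000


def storage_row_cost(connections: float, writes_per_conn: float, reads_per_write: float):
    """Return (write_usd, read_usd) for SQLite-backed Durable Object rows."""
    writes = connections * writes_per_conn
    reads = writes * reads_per_write
    write_usd = max(0.0, writes - ROW_WRITES_INCLUDED) / 1_000_000 * 1.00   # $1.00 per M rows
    read_usd = max(0.0, reads - ROW_READS_INCLUDED) / 1_000_000 * 0.001     # $0.001 per M rows
    return write_usd, read_usd


print(storage_row_cost(54e6, 80, 5))  # (4270.0, 0.0)
print(storage_row_cost(54e6, 12, 5))  # (598.0, 0.0) with batched writes
```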
Egress / Workers requests:
- The 54M connection-opening requests are already counted in the DO request line above, and Cloudflare does not bill Workers for egress bandwidth.
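Total at 10k concurrent: ~$5,180 on GB-seconds + ~$4,270 on storage writes + ~$980 on requests ($8 for connections + $972 for 129.6B incoming messages) = **~$10,430/month**.
That is roughly $1.04 per concurrent connection slot per month, or about $0.19 per 1,000 eight-minute sessions, which is extraordinary if you are coming from Pusher, Ably, or self-hosted Erlang.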
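The total and the per-unit numbers are easiest to keep consistent when they come out of one function. A minimal end-to-end sketch of this section's model; the `Assumptions` fields mirror the assumption list at the top of the section, and, unlike the hand math, it subtracts the 1M included requests (a $0.15 difference):

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    concurrent: int = 10_000
    session_minutes: float = 8
    msgs_per_second: float = 5
    hibernated_fraction: float = 0.5
    memory_gb: float = 0.032        # the 32 MB assumption used above
    writes_per_conn: int = 80
    reads_per_write: int = 5
    days: int = 30


def monthly_bill(a: Assumptions) -> dict:
    session_s = a.session_minutes * 60
    connections = a.concurrent * (60 / a.session_minutes) * 24 * a.days
    # One request per new connection plus one per 20 incoming messages.
    requests = connections + connections * session_s * a.msgs_per_second / 20
    gb_s = connections * session_s * (1 - a.hibernated_fraction) * a.memory_gb
    writes = connections * a.writes_per_conn
    reads = writes * a.reads_per_write

    def over(used: float, included: float, usd_per_million: float) -> float:
        return max(0.0, used - included) / 1e6 * usd_per_million

    items = {
        "requests": over(requests, 1e6, 0.15),
        "duration": over(gb_s, 4e5, 12.50),
        "row_writes": over(writes, 50e6, 1.00),
        "row_reads": over(reads, 25e9, 0.001),
    }
    total = sum(items.values())
    items["total"] = total
    items["per_concurrent_slot"] = total / a.concurrent
    items["per_1k_sessions"] = total / connections * 1_000
    return items


for name, usd in monthly_bill(Assumptions()).items():
    print(f"{name:>20}: ${usd:,.2f}")
```

With the defaults the total is about $10,430, roughly $1.04 per concurrent slot and $0.19 per 1,000 sessions. `Assumptions(hibernated_fraction=0.8)` or `Assumptions(writes_per_conn=12)` reproduces the optimization scenarios, and `memory_gb=0.128` shows the effect of billing the full 128 MB allocation.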
Optimization wins
- Aggressive hibernation. Moving from 50% to 80% hibernated cuts the GB-seconds line by 60% (~$5,180 to ~$2,070), roughly 30% of the total bill.
- Batch row writes. Going from 80 to 12 writes per call cuts storage writes from ~$4,270 to ~$600 (648M writes minus the 50M included, at $1/M).
- Use Workers WebSockets directly without a DO when you do not need state — that path bills at flat Workers rates and avoids the 20:1 ratio entirely. Best for fanout-only patterns.
How CallSphere optimizes
CallSphere uses Cloudflare Workers + Durable Objects for the chat agent control plane on three of our six verticals (Sales, Salon GlamBook, OneRoof Real Estate). Voice audio itself flows over OpenAI Realtime or LiveKit, but the session state, transcript streaming, and per-tenant routing live on Cloudflare.
We hit an ~85% hibernation rate on idle DOs, batch row writes down to 8 per call, and use a single Worker route for all six verticals (multi-tenant), with the tenant ID hashed into the DO ID. Net cost across the six verticals (37 agents, 90+ tools, 115+ DB tables) is well under $400/month on Cloudflare for the realtime control plane.
Those savings are part of why our pricing tiers ($149 / $499 / $1499) work for SMB margins and why the affiliate program is sustainable. Try the 14-day no-card trial to see the snappy chat product cards on /demo: that is the Cloudflare-DO pipeline in action.
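The multi-tenant routing described above hashes the tenant ID into the Durable Object ID. A hypothetical sketch of just the naming step, in Python like the other examples here; the Worker itself would be JavaScript or TypeScript and would pass the resulting string to the namespace's `idFromName()`. The SHA-256 hash, 16-character truncation, and `tenant:session` layout are illustrative assumptions, not CallSphere's actual scheme:

```python
import hashlib
from typing import Optional


def durable_object_name(tenant_id: str, session_id: Optional[str] = None) -> str:
    """Hypothetical: build a deterministic Durable Object name from a tenant ID.

    The tenant ID is hashed so raw identifiers never appear in object names,
    while the same tenant (and session) always maps to the same string.
    """
    tenant_hash = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:16]
    return tenant_hash if session_id is None else f"{tenant_hash}:{session_id}"


print(durable_object_name("salon-glambook"))             # one DO per tenant
print(durable_object_name("salon-glambook", "call-42"))  # one DO per call
```

Because `idFromName()` maps the same string to the same object, omitting `session_id` pins one DO per tenant; including it gives one DO per call.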
Optimization checklist
- Use Hibernation API everywhere idle WebSocket connections sit.
- Batch row writes to once per turn instead of per message.
- Compact transcript snapshots — store deltas, not full state.
- Use Workers WebSockets without a DO if you do not need state (for fanout).
- Avoid storing audio in DO storage — push it to R2 or upstream.
- Pin DO instances per tenant — better cache locality, lower CPU-ms.
- Use SQLite-backed DO over KV-backed (cheaper, better included tier).
- Watch the 20:1 ratio: chatty clients eat your request budget.
- Keep application-level heartbeats infrequent: each one wakes a hibernated DO unless you answer it with the Hibernation API's `setWebSocketAutoResponse`.
- Re-test cost monthly — Cloudflare added storage billing in January 2026.
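Two items on this checklist, batching row writes per turn and storing deltas, amount to buffering per-message events in memory and writing one compact row per turn. A hypothetical sketch of that logic, using Python's standard-library `sqlite3` as a stand-in for a Durable Object's SQLite storage; `TurnBuffer` and the `turns` table are illustrative names:

```python
import json
import sqlite3


class TurnBuffer:
    """Hypothetical: buffer per-message events and write one row per turn."""

    def __init__(self, conn: sqlite3.Connection, session_id: str):
        self.conn = conn
        self.session_id = session_id
        self.turn = 0
        self.pending: list = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            " session_id TEXT, turn INTEGER, events TEXT,"
            " PRIMARY KEY (session_id, turn))"
        )

    def add(self, event: dict) -> None:
        # Transcript chunks and tool events accumulate in memory: zero row writes.
        self.pending.append(event)

    def end_turn(self) -> None:
        # One row write per turn, holding only this turn's delta.
        if not self.pending:
            return
        self.conn.execute(
            "INSERT INTO turns (session_id, turn, events) VALUES (?, ?, ?)",
            (self.session_id, self.turn, json.dumps(self.pending)),
        )
        self.conn.commit()
        self.turn += 1
        self.pending = []


conn = sqlite3.connect(":memory:")
buf = TurnBuffer(conn, "call-42")
for i in range(30):                      # 30 events in one turn...
    buf.add({"type": "transcript", "seq": i})
buf.end_turn()                           # ...become a single row
print(conn.execute("SELECT COUNT(*) FROM turns").fetchone()[0])  # 1
```

In a real Durable Object, anything still buffered in memory is lost if the object hibernates or is evicted, so flush at the end of each turn rather than on a long timer.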
FAQ
What is the 20:1 WebSocket ratio? Cloudflare counts 20 incoming WebSocket messages as 1 billable request — making chatty real-time apps cheaper.
Does hibernation work mid-call? Yes — if no JavaScript handler is actively running, the DO can hibernate and the WebSocket stays open. Costs only resume when a handler runs.
Can I run STT and LLM in a Worker? You can call out to OpenAI/Deepgram from Workers, but you should not run inference inside a Worker — use Workers AI or external GPU.
Is this cheaper than self-hosted Erlang/Phoenix? At under 50k concurrent, Cloudflare wins by 5–10× on TCO. Above 250k, self-hosted starts to compete.
What about R2 for audio storage? $0.015/GB-month with zero egress is the cheapest place to keep call recordings. Pair with DO for control plane.
Sources
- Cloudflare Durable Objects Pricing — https://developers.cloudflare.com/durable-objects/platform/pricing/
- Cloudflare Workers Pricing — https://developers.cloudflare.com/workers/platform/pricing/
- Cloudflare Hibernation API — https://developers.cloudflare.com/durable-objects/best-practices/websockets/
- New Workers Pricing announcement — https://blog.cloudflare.com/workers-pricing-scale-to-zero/
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.