Cloudflare Workers + Durable Objects at 10k Concurrent: Real Cost
We modeled 10,000 concurrent voice agent WebSockets on Cloudflare. With hibernation and the 20:1 message ratio, the bill lands surprisingly low. Here is the line-by-line math.
The cost problem
flowchart LR
Twilio["Twilio Media Streams"] -- "WS · μlaw 8kHz" --> Bridge["FastAPI Bridge :8084"]
Bridge -- "PCM16 24kHz" --> OAI["OpenAI Realtime"]
OAI --> Bridge
Bridge --> Twilio
Bridge --> Logs[(structured logs · OTel)]

If you are building a chat or voice agent platform that needs to hold persistent WebSocket connections — for control messages, transcript streaming, or session state — the cheapest place to do that in 2026 is almost always Cloudflare Workers + Durable Objects.
But the pricing has three knobs (requests, GB-seconds, WebSocket message ratios), and people confuse "incoming WebSocket message" with "request" and end up with billing surprises. Let's walk through it.
How Cloudflare prices it
Workers Paid plan ($5/month minimum) includes:
- 10M Workers requests/month
- 30M CPU-ms/month
Durable Objects pricing on top of Workers Paid:
- 1M DO requests/month included; $0.15 per million after
- 400k GB-seconds/month included; $12.50 per million GB-s after
- WebSocket incoming messages: 20:1 billing ratio (20 messages = 1 billable request)
- Outgoing messages and protocol pings: free
- Each new WebSocket connection counts as 1 request
Storage (SQLite-backed DO, billed January 2026 onward):
- 25B row reads/month free, then $0.001/M
- 50M row writes/month free, then $1.00/M
- 5 GB-month included, then $0.20/GB-month
Hibernation API:
- Clients stay connected while the DO is hibernated
- GB-second charges do NOT accrue during hibernation
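The request-side rules above reduce to a one-line formula: every new connection is one request, and incoming messages are divided by 20. A minimal Python sketch (an illustrative helper, not a Cloudflare API):

```python
# DO request billing per the list above: each new WebSocket connection = 1
# request; incoming WebSocket messages bill at 20:1; outgoing/pings are free.
PRICE_PER_MILLION_REQUESTS = 0.15  # USD, beyond the 1M/month included

def billable_do_requests(new_connections: int, incoming_messages: int) -> float:
    return new_connections + incoming_messages / 20

def request_cost(new_connections: int, incoming_messages: int) -> float:
    return billable_do_requests(new_connections, incoming_messages) / 1e6 * PRICE_PER_MILLION_REQUESTS

# 1,000 new connections + 100,000 incoming messages -> 6,000 billable requests
example = billable_do_requests(1_000, 100_000)
```

The same helper scales directly to the 10k-concurrent scenario worked through below.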
Honest math: 10,000 concurrent WebSockets
Assume a typical voice agent control plane:
- 10,000 concurrent connections held for an average of 8 minutes each
- 5 control messages per second per connection (transcript chunks, tool events)
- Each connection makes 80 storage row writes (turn-by-turn log)
Connection count math:
- 10,000 concurrent slots × (60 min / 8 min) = 7.5 session turnovers per slot per hour → 75,000 new connections/hour
- 75k × 24 × 30 = 54M connections/month
Connection cost (each new = 1 request):
- 54M × $0.15 / 1M = $8.10
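The connection-count arithmetic can be checked in four lines of Python:

```python
# Session turnover: each concurrent slot cycles 60/8 = 7.5 sessions per hour.
turnover_per_hour = 60 / 8                        # 7.5 sessions/slot/hour
new_conns_per_hour = 10_000 * turnover_per_hour   # 75,000 new connections/hour
conns_per_month = new_conns_per_hour * 24 * 30    # 54,000,000 connections/month
connection_cost = conns_per_month / 1e6 * 0.15    # each new connection = 1 request
```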
Incoming WebSocket message cost:
- 54M conns × 8 min × 60 s × 5 msgs/s = 129.6B incoming messages
- 129.6B × (1 / 20) ratio = 6.48B billable requests
- 6.48B × $0.15 / 1M = $972
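The message volume is the line most people underestimate; it works out to:

```python
# 2,400 messages per 8-minute session at 5 msg/s, across 54M sessions/month.
incoming = 54e6 * 8 * 60 * 5   # 129.6B incoming messages/month
billable = incoming / 20       # 6.48B billable requests at the 20:1 ratio
message_cost = billable / 1e6 * 0.15
```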
GB-seconds (Cloudflare bills active DO duration at a fixed 128 MB per instance; assume ~4 connections share each DO, i.e. an effective 32 MB per connection, hibernated 50% of the time):
- Active DO-seconds: 54M × 8 min × 60s × 0.5 = 12.96B DO-seconds active
- Active DO GB-seconds: 12.96B × 0.032 = 415M GB-s
- Cost: (415M − 0.4M free) × $12.50 / 1M = $5,180
That is the big line item: GB-seconds. Hibernation matters enormously here — if you hibernate 80% of the time instead of 50%, GB-seconds drop to ~$2,070.
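The hibernation sensitivity is easy to parameterize. A sketch using the same inputs (32 MB effective memory per connection is an assumption carried over from above):

```python
def gb_seconds_cost(conns_per_month: float, session_min: float, active_frac: float,
                    gb_per_conn: float = 0.032, free_gb_s: float = 400_000,
                    price_per_m: float = 12.50) -> float:
    """Monthly DO duration cost; active_frac is the non-hibernated share."""
    do_seconds = conns_per_month * session_min * 60 * active_frac
    gb_s = do_seconds * gb_per_conn
    return max(gb_s - free_gb_s, 0) / 1e6 * price_per_m

baseline = gb_seconds_cost(54e6, 8, 0.5)    # ~ $5,179 at 50% hibernation
aggressive = gb_seconds_cost(54e6, 8, 0.2)  # ~ $2,069 at 80% hibernation
```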
Storage:
- 54M conns × 80 writes = 4.32B row writes/month
- (4.32B − 50M free) × $1 / 1M = $4,270
Storage reads (assume 5x per write):
- 21.6B reads, free at 25B/month included for typical pricing → ~$0
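The storage lines, with the free tiers subtracted first:

```python
writes = 54e6 * 80                                # 4.32B row writes/month
write_cost = max(writes - 50e6, 0) / 1e6 * 1.00   # $1.00/M beyond 50M free
reads = writes * 5                                # 21.6B reads, assumed 5x writes
read_cost = max(reads - 25e9, 0) / 1e6 * 0.001    # $0: inside the 25B free tier
```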
Egress / Workers requests:
- Workers egress is free; the 54M new-connection requests (at $0.15/M) are already counted in the DO request cost above, so there is no extra line item.
Total at 10k concurrent: ~$5,180 GB-seconds + ~$4,270 storage writes + ~$972 message requests + ~$8 connection requests = **~$10,430/month**.
That is roughly $0.19 per 1,000 eight-minute sessions, or about $1 per concurrent slot per month, which is extraordinary if you are coming from Pusher, Ably, or self-hosted Erlang.
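Recomputing the line items from the inputs above and normalizing:

```python
# conns + messages + GB-seconds + storage writes (reads are $0 in free tier)
total = 8.10 + 972.00 + 5179.00 + 4270.00   # ~ $10,429/month
per_1000_sessions = total / (54e6 / 1_000)  # ~ $0.19 per 1,000 sessions
per_concurrent_slot = total / 10_000        # ~ $1.04 per slot-month
```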
Optimization wins
- Aggressive hibernation. Moving from 50% to 80% hibernated cuts GB-second spend by 60% and the total bill by roughly 30%.
- Batch row writes. Dropping from 80 writes per call to 12 cuts storage from $4,270 to ~$600.
- Use Workers WebSockets directly without a DO when you do not need state — that path bills at flat Workers rates and avoids the 20:1 ratio entirely. Best for fanout-only patterns.
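The first two wins compound. A small sensitivity model over the same assumptions (all rates and free tiers as listed earlier):

```python
def monthly_cost(active_frac: float = 0.5, writes_per_call: float = 80,
                 conns: float = 54e6, session_min: float = 8,
                 msg_per_s: float = 5) -> float:
    """Monthly total: connection requests + message requests + duration + writes."""
    conn_req = conns / 1e6 * 0.15
    msg_req = conns * session_min * 60 * msg_per_s / 20 / 1e6 * 0.15
    gb_s = conns * session_min * 60 * active_frac * 0.032
    duration = max(gb_s - 400_000, 0) / 1e6 * 12.50
    storage = max(conns * writes_per_call - 50e6, 0) / 1e6 * 1.00
    return conn_req + msg_req + duration + storage

baseline = monthly_cost()                                    # ~ $10,430
optimized = monthly_cost(active_frac=0.2, writes_per_call=12)  # ~ $3,650
```

Combining 80% hibernation with batched writes takes the bill down to roughly a third of baseline.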
How CallSphere optimizes
CallSphere uses Cloudflare Workers + Durable Objects for the chat agent control plane on three of the 6 verticals (Sales, Salon GlamBook, OneRoof Real Estate) — voice audio itself flows over OpenAI Realtime or LiveKit, but the session state, transcript streaming, and per-tenant routing live on Cloudflare.
We hit ~85% hibernation rate on idle DOs, batch row writes to 8 per call, and use a single Worker route for all 6 verticals (multi-tenant) with the tenant ID hashed into the DO ID. Net cost across 6 verticals — 37 agents, 90+ tools, 115+ DB tables — is well under $400/mo on Cloudflare for the realtime control plane.
That savings is part of why our pricing tiers ($149 / $499 / $1499) work for SMB margins and the affiliate program is sustainable. Try the 14-day no-card trial to see the snappy chat product cards on /demo — that is the Cloudflare-DO pipeline in action.
Optimization checklist
- Use Hibernation API everywhere idle WebSocket connections sit.
- Batch row writes to once per turn instead of per message.
- Compact transcript snapshots — store deltas, not full state.
- Use Workers WebSockets without a DO if you do not need state (for fanout).
- Avoid storing audio in DO storage — push it to R2 or upstream.
- Pin DO instances per tenant — better cache locality, lower CPU-ms.
- Use SQLite-backed DO over KV-backed (cheaper, better included tier).
- Watch the 20:1 ratio: chatty clients eat your request budget.
- Use heartbeats only on idle paths — frequent pings still wake the DO.
- Re-test cost monthly — Cloudflare added storage billing in January 2026.
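The "batch row writes to once per turn" item is the simplest to implement. A language-agnostic sketch in Python, where `flush` is a hypothetical callable standing in for a single DO row write:

```python
class TurnBatcher:
    """Buffer per-message transcript events; emit one storage write per turn."""

    def __init__(self, flush):
        self.flush = flush   # hypothetical: performs one row write for a batch
        self.pending = []

    def on_message(self, event) -> None:
        self.pending.append(event)   # no storage write on the hot path

    def on_turn_end(self) -> None:
        if self.pending:
            self.flush(self.pending)  # one row write for the whole turn
            self.pending = []
```

With 10 messages per turn this turns 10 row writes into 1, which is where the $4,270 → ~$600 storage saving above comes from.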
FAQ
What is the 20:1 WebSocket ratio? Cloudflare counts 20 incoming WebSocket messages as 1 billable request — making chatty real-time apps cheaper.
Does hibernation work mid-call? Yes — if no JavaScript handler is actively running, the DO can hibernate and the WebSocket stays open. Costs only resume when a handler runs.
Can I run STT and LLM in a Worker? You can call out to OpenAI/Deepgram from Workers, but you should not run inference inside a Worker — use Workers AI or external GPU.
Is this cheaper than self-hosted Erlang/Phoenix? At under 50k concurrent, Cloudflare wins by 5–10× on TCO. Above 250k, self-hosted starts to compete.
What about R2 for audio storage? $0.015/GB-month with zero egress is the cheapest place to keep call recordings. Pair with DO for control plane.
Sources
- Cloudflare Durable Objects Pricing — https://developers.cloudflare.com/durable-objects/platform/pricing/
- Cloudflare Workers Pricing — https://developers.cloudflare.com/workers/platform/pricing/
- Cloudflare Hibernation API — https://developers.cloudflare.com/durable-objects/best-practices/websockets/
- New Workers Pricing announcement — https://blog.cloudflare.com/workers-pricing-scale-to-zero/
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.