We modeled 10,000 concurrent voice agent WebSockets on Cloudflare. With hibernation and the 20:1 message ratio, the bill lands surprisingly low. Here is the line-by-line math.

The cost problem

flowchart LR
  Twilio["Twilio Media Streams"] -- "WS · μlaw 8kHz" --> Bridge["FastAPI Bridge :8084"]
  Bridge -- "PCM16 24kHz" --> OAI["OpenAI Realtime"]
  OAI --> Bridge
  Bridge --> Twilio
  Bridge --> Logs[(structured logs · OTel)]

CallSphere reference architecture

If you are building a chat or voice agent platform that needs to hold persistent WebSocket connections — for control messages, transcript streaming, or session state — the cheapest place to do that in 2026 is almost always Cloudflare Workers + Durable Objects.

But the pricing has three knobs (requests, GB-seconds, and the WebSocket message ratio), and people who confuse "incoming WebSocket message" with "request" end up with billing surprises. Let us walk through it.

How Cloudflare prices it

Workers Paid plan ($5/month minimum) includes:

10M Workers requests/month
30M CPU-ms/month

Durable Objects pricing on top of Workers Paid:

1M DO requests/month included; $0.15 per million after
400k GB-seconds/month included; $12.50 per million GB-s after
WebSocket incoming messages: 20:1 billing ratio (20 messages = 1 billable request)
Outgoing messages and protocol pings: free
Each new WebSocket connection counts as 1 request

Storage (SQLite-backed DO, billed January 2026 onward):

25B row reads/month free, then $0.001/M
50M row writes/month free, then $1.00/M
5 GB-month included, then $0.20/GB-month
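Every metered line above follows the same rule: subtract the monthly allowance, then charge per million units on the remainder. Here is a minimal Python sketch of that rule using the rates listed above. The helper name `overage_cost` and the constant names are illustrative assumptions, not part of any Cloudflare SDK.

```python
def overage_cost(usage: float, included: float, rate_per_million: float) -> float:
    """Cost of monthly usage beyond an included allowance, billed per million units."""
    billable = max(0.0, usage - included)
    return billable / 1_000_000 * rate_per_million


# Rates copied from the pricing lists above (all per month).
DO_REQUESTS = dict(included=1_000_000, rate_per_million=0.15)
DO_GB_SECONDS = dict(included=400_000, rate_per_million=12.50)
ROW_READS = dict(included=25_000_000_000, rate_per_million=0.001)
ROW_WRITES = dict(included=50_000_000, rate_per_million=1.00)

# 3M DO requests: 1M included, 2M billable.
print(overage_cost(3_000_000, **DO_REQUESTS))
# 200k GB-seconds: still inside the 400k allowance, so nothing is owed.
print(overage_cost(200_000, **DO_GB_SECONDS))
```

The same helper works for row reads and row writes by passing `ROW_READS` or `ROW_WRITES`.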

Hibernation API:

Clients stay connected while the DO is hibernated
GB-second charges do NOT accrue during hibernation

Honest math: 10,000 concurrent WebSockets

Assume a typical voice agent control plane:

10,000 concurrent connections held for an average of 8 minutes each
5 control messages per second per connection (transcript chunks, tool events)
Each connection makes 80 storage row writes (turn-by-turn log)

Connection count math:

10,000 concurrent × (60 / 8) connections per hour per slot = 75,000 new connections/hour
75k × 24 × 30 = 54M connections/month
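The connection count above is an application of Little's law: arrival rate equals concurrency divided by average hold time. A short sketch, assuming the same 30-day month; the function name `connections_per_month` is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, as in the math above


def connections_per_month(concurrent: int, avg_hold_seconds: float) -> float:
    """Little's law: arrivals per second = concurrency / average hold time."""
    arrivals_per_second = concurrent / avg_hold_seconds
    return arrivals_per_second * SECONDS_PER_MONTH


monthly = connections_per_month(concurrent=10_000, avg_hold_seconds=8 * 60)
print(f"{monthly:,.0f} new connections/month")  # 54,000,000 new connections/month
```

Halving the average hold time doubles the number of new connections, and with it every per-connection cost below.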

Connection cost (each new = 1 request):

54M × $0.15 / 1M = $8.10

Incoming WebSocket message cost:

54M conns × 8 min × 60s × 5 msgs/s = 129.6B incoming messages
129.6B × (1 / 20) ratio = 6.48B billable requests
6.48B × $0.15 / 1M = $972
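The message arithmetic can be recomputed directly from the stated assumptions (54M connections, 8 minutes each, 5 messages per second, the 20:1 ratio, $0.15 per million requests). This sketch ignores the 1M included requests, as the connection cost above does. The constant names are illustrative.

```python
CONNECTIONS_PER_MONTH = 54_000_000
SECONDS_PER_CONNECTION = 8 * 60
MESSAGES_PER_SECOND = 5
MESSAGES_PER_BILLABLE_REQUEST = 20  # the 20:1 incoming-message ratio
PRICE_PER_MILLION_REQUESTS = 0.15

incoming_messages = CONNECTIONS_PER_MONTH * SECONDS_PER_CONNECTION * MESSAGES_PER_SECOND
billable_requests = incoming_messages / MESSAGES_PER_BILLABLE_REQUEST
message_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS

print(f"{incoming_messages / 1e9:.1f}B incoming messages")  # 129.6B incoming messages
print(f"{billable_requests / 1e9:.2f}B billable requests")  # 6.48B billable requests
print(f"${message_cost:,.2f}")                              # $972.00
```

A cross-check from the other direction gives the same volume: 10,000 connections × 5 msgs/s × 2,592,000 seconds in a 30-day month = 129.6B messages.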

GB-seconds (assume one DO instance per connection at 32MB each, hibernated 50% of the time):

Active DO-seconds: 54M × 8 min × 60s × 0.5 = 12.96B DO-seconds active
Active DO GB-seconds: 12.96B × 0.032 = 415M GB-s
Cost: (415M − 0.4M free) × $12.50 / 1M = $5,180

That is the big line item: GB-seconds. Hibernation matters enormously here — if you hibernate 80% of the time instead of 50%, GB-seconds drop to ~$2,070.
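To see how sensitive the GB-second line is to hibernation, this sketch sweeps the hibernated fraction. It assumes one DO per connection and takes the 32MB memory figure as given; check the pricing page for how memory is actually metered. The function name `duration_cost` is hypothetical.

```python
CONNECTION_SECONDS = 54_000_000 * 8 * 60  # total connection-seconds per month
MEMORY_GB = 0.032                         # the 32MB-per-DO assumption above
INCLUDED_GB_SECONDS = 400_000
PRICE_PER_MILLION_GB_SECONDS = 12.50


def duration_cost(hibernated_fraction: float) -> float:
    """Monthly GB-second charge when a DO is awake for (1 - fraction) of each connection."""
    active_seconds = CONNECTION_SECONDS * (1.0 - hibernated_fraction)
    gb_seconds = active_seconds * MEMORY_GB
    billable = max(0.0, gb_seconds - INCLUDED_GB_SECONDS)
    return billable / 1_000_000 * PRICE_PER_MILLION_GB_SECONDS


for fraction in (0.0, 0.5, 0.8):
    print(f"{fraction:.0%} hibernated: ${duration_cost(fraction):,.0f}/month")
# 0% hibernated: $10,363/month
# 50% hibernated: $5,179/month
# 80% hibernated: $2,069/month
```

The 50% and 80% rows match the ~$5,180 and ~$2,070 figures above after rounding. The 0% row shows the cost if no DO ever hibernates.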

Storage:

54M conns × 80 writes = 4.32B row writes/month
(4.32B − 50M free) × $1 / 1M = $4,270

Storage reads (assume 5x per write):

4.32B writes × 5 = 21.6B row reads/month, which stays under the 25B/month included allowance → ~$0
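The two storage lines can be checked together. This is a sketch under the stated assumptions (80 writes per connection, 5 reads per write); `tier_cost` is a hypothetical helper.

```python
CONNECTIONS_PER_MONTH = 54_000_000
WRITES_PER_CONNECTION = 80
READS_PER_WRITE = 5


def tier_cost(usage: float, included: float, price_per_million: float) -> float:
    """Charge only for usage above the monthly included allowance."""
    return max(0.0, usage - included) / 1_000_000 * price_per_million


row_writes = CONNECTIONS_PER_MONTH * WRITES_PER_CONNECTION
row_reads = row_writes * READS_PER_WRITE

write_cost = tier_cost(row_writes, included=50_000_000, price_per_million=1.00)
read_cost = tier_cost(row_reads, included=25_000_000_000, price_per_million=0.001)

print(f"writes: {row_writes / 1e9:.2f}B rows -> ${write_cost:,.0f}")  # writes: 4.32B rows -> $4,270
print(f"reads:  {row_reads / 1e9:.1f}B rows -> ${read_cost:,.0f}")    # reads:  21.6B rows -> $0
```

Reads only start costing money above 25B rows/month, which in this model means more than about 5.8 reads per write.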

Egress / Workers requests:

54M connections × 1 request each, already counted at $0.15/M in the DO connection cost above.

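Total at 10k concurrent: ~$5,180 on GB-seconds + ~$4,270 storage writes + ~$980 requests ($8.10 for new connections plus ~$972 for 129.6B incoming messages billed at 20:1) = **~$10,430/month**.

That is roughly $1.04 per concurrent connection per month, or about $0.19 per 1,000 eight-minute sessions, which is extraordinary if you are coming from Pusher, Ably, or self-hosted Erlang.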

Optimization wins

Aggressive hibernation. The 80% hibernated case cuts the GB-second line from ~$5,180 to ~$2,070, a 60% cut on that line item and about $3,110/month off the bill.
Batch row writes. Going from 80 per call to 12 per call cuts storage writes from $4,270 to ~$600 (648M writes, 50M of them included).
Use Workers WebSockets directly without a DO when you do not need state. That path bills at flat Workers rates, so the 20:1 message accounting never applies. Best for fanout-only patterns.
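The first two optimizations can be compared against the baseline in one small model. This is a sketch under the article's assumptions, not a billing calculator. The `Scenario` class and its field names are hypothetical. Row reads are left out because they stay under the 25B allowance in every scenario here, and the 1M included requests are ignored as in the math above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    hibernated_fraction: float = 0.5
    writes_per_connection: int = 80
    connections: int = 54_000_000
    seconds_per_connection: int = 8 * 60
    messages_per_second: int = 5
    memory_gb: float = 0.032  # article's 32MB-per-DO assumption

    def monthly_cost(self) -> float:
        # Requests: 1 per new connection plus incoming messages at 20:1.
        messages = self.seconds_per_connection * self.messages_per_second
        requests = self.connections * (1 + messages / 20)
        request_cost = requests / 1e6 * 0.15

        # Duration: only non-hibernated time accrues GB-seconds.
        active_seconds = self.connections * self.seconds_per_connection * (1 - self.hibernated_fraction)
        gb_seconds = active_seconds * self.memory_gb
        duration_cost = max(0.0, gb_seconds - 400_000) / 1e6 * 12.50

        # Storage: row writes beyond the 50M allowance at $1.00 per million.
        writes = self.connections * self.writes_per_connection
        write_cost = max(0.0, writes - 50_000_000) / 1e6 * 1.00

        return request_cost + duration_cost + write_cost


scenarios = [
    Scenario("baseline"),
    Scenario("80% hibernation", hibernated_fraction=0.8),
    Scenario("12 writes per call", writes_per_connection=12),
    Scenario("both", hibernated_fraction=0.8, writes_per_connection=12),
]
for s in scenarios:
    print(f"{s.name:<20} ${s.monthly_cost():>10,.2f}")
```

Under these assumptions the baseline comes to about $10,429 and the combined case to about $3,647. Most of the savings come from the two largest line items rather than from request volume.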

How CallSphere optimizes

CallSphere uses Cloudflare Workers + Durable Objects for the chat agent control plane on three of the 6 verticals (Sales, Salon GlamBook, OneRoof Real Estate) — voice audio itself flows over OpenAI Realtime or LiveKit, but the session state, transcript streaming, and per-tenant routing live on Cloudflare.

We hit ~85% hibernation rate on idle DOs, batch row writes to 8 per call, and use a single Worker route for all 6 verticals (multi-tenant) with the tenant ID hashed into the DO ID. Net cost across 6 verticals — 37 agents, 90+ tools, 115+ DB tables — is well under $400/mo on Cloudflare for the realtime control plane.

Those savings are part of why our pricing tiers ($149 / $499 / $1499) work for SMB margins and the affiliate program is sustainable. Try the 14-day no-card trial to see the snappy chat product cards on /demo: that is the Cloudflare-DO pipeline in action.

Optimization checklist

Use Hibernation API everywhere idle WebSocket connections sit.
Batch row writes to once per turn instead of per message.
Compact transcript snapshots — store deltas, not full state.
Use Workers WebSockets without a DO if you do not need state (for fanout).
Avoid storing audio in DO storage — push it to R2 or upstream.
Pin DO instances per tenant — better cache locality, lower CPU-ms.
Use SQLite-backed DO over KV-backed (cheaper, better included tier).
Watch the 20:1 ratio: chatty clients eat your request budget.
Send application-level heartbeats only on idle paths, and sparingly: each one is an incoming message that still wakes the DO.
Re-test cost monthly — Cloudflare added storage billing in January 2026.

FAQ

What is the 20:1 WebSocket ratio? Cloudflare counts 20 incoming WebSocket messages as 1 billable request — making chatty real-time apps cheaper.

Does hibernation work mid-call? Yes — if no JavaScript handler is actively running, the DO can hibernate and the WebSocket stays open. Costs only resume when a handler runs.

Can I run STT and LLM in a Worker? You can call out to OpenAI/Deepgram from Workers, but you should not run inference inside a Worker — use Workers AI or external GPU.

Is this cheaper than self-hosted Erlang/Phoenix? At under 50k concurrent, Cloudflare wins by 5–10× on TCO. Above 250k, self-hosted starts to compete.

What about R2 for audio storage? $0.015/GB-month with zero egress is the cheapest place to keep call recordings. Pair with DO for control plane.

Sources

Cloudflare Durable Objects Pricing — https://developers.cloudflare.com/durable-objects/platform/pricing/
Cloudflare Workers Pricing — https://developers.cloudflare.com/workers/platform/pricing/
Cloudflare Hibernation API — https://developers.cloudflare.com/durable-objects/best-practices/websockets/
New Workers Pricing announcement — https://blog.cloudflare.com/workers-pricing-scale-to-zero/
