Skip to content
AI Infrastructure
AI Infrastructure11 min read0 views

Cloudflare Workers + Durable Objects at 10k Concurrent: Real Cost

We modeled 10,000 concurrent voice agent WebSockets on Cloudflare. With hibernation and the 20:1 message ratio, the bill lands surprisingly low. Here is the line-by-line math.

We modeled 10,000 concurrent voice agent WebSockets on Cloudflare. With hibernation and the 20:1 message ratio, the bill lands surprisingly low. Here is the line-by-line math.

The cost problem

flowchart LR
  Twilio["Twilio Media Streams"] -- "WS · μlaw 8kHz" --> Bridge["FastAPI Bridge :8084"]
  Bridge -- "PCM16 24kHz" --> OAI["OpenAI Realtime"]
  OAI --> Bridge
  Bridge --> Twilio
  Bridge --> Logs[(structured logs · OTel)]
CallSphere reference architecture

If you are building a chat or voice agent platform that needs to hold persistent WebSocket connections — for control messages, transcript streaming, or session state — the cheapest place to do that in 2026 is almost always Cloudflare Workers + Durable Objects.

But the pricing has three knobs (requests, GB-seconds, WebSocket message ratios) and people confuse "incoming WebSocket message" with "request" and end up with billing surprises. Let us walk it.

How Cloudflare prices it

Workers Paid plan ($5/month minimum) includes:

  • 10M Workers requests/month
  • 30M CPU-ms/month

Durable Objects pricing on top of Workers Paid:

  • 1M DO requests/month included; $0.15 per million after
  • 400k GB-seconds/month included; $12.50 per million GB-s after
  • WebSocket incoming messages: 20:1 billing ratio (20 messages = 1 billable request)
  • Outgoing messages and protocol pings: free
  • Each new WebSocket connection counts as 1 request

Storage (SQLite-backed DO, billed January 2026 onward):

  • 25B row reads/month free, then $0.001/M
  • 50M row writes/month free, then $1.00/M
  • 5 GB-month included, then $0.20/GB-month

Hibernation API:

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
  • Clients stay connected while the DO is hibernated
  • GB-second charges do NOT accrue during hibernation

Honest math: 10,000 concurrent WebSockets

Pretend a typical voice agent control plane:

  • 10,000 concurrent connections held for an average of 8 minutes each
  • 5 control messages per second per connection (transcript chunks, tool events)
  • Each connection makes 80 storage row writes (turn-by-turn log)

Connection count math:

  • 10,000 concurrent × (60 / 8) connections per hour per slot = 75,000 new connections/hour
  • 75k × 24 × 30 = 54M connections/month

Connection cost (each new = 1 request):

  • 54M × $0.15 / 1M = $8.10

Incoming WebSocket message cost:

  • 54M conns × 8 min × 60s × 5 msgs/s = 1.296B incoming messages
  • 1.296B × (1 / 20) ratio = 64.8M billable requests
  • 64.8M × $0.15 / 1M = $9.72

GB-seconds (assume 32MB per DO instance, hibernated 50% of the time):

  • Active DO-seconds: 54M × 8 min × 60s × 0.5 = 12.96B DO-seconds active
  • Active DO GB-seconds: 12.96B × 0.032 = 415M GB-s
  • Cost: (415M − 0.4M free) × $12.50 / 1M = $5,180

That is the big line item: GB-seconds. Hibernation matters enormously here — if you hibernate 80% of the time instead of 50%, GB-seconds drop to ~$2,070.

Storage:

  • 54M conns × 80 writes = 4.32B row writes/month
  • (4.32B − 50M free) × $1 / 1M = $4,270

Storage reads (assume 5x per write):

  • 21.6B reads, free at 25B/month included for typical pricing → ~$0

Egress / Workers requests:

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

  • 54M × 1 = $0.15/M handled in DO request cost above

Total at 10k concurrent: ~$5,200 on GB-seconds + ~$4,270 storage writes + $18 requests = **$9,488/month**.

That is roughly $0.95 per 1,000 concurrent voice sessions — extraordinary if you are coming from Pusher, Ably, or self-hosted Erlang.

Optimization wins

  1. Aggressive hibernation. The 80% hibernated case cuts the bill by 40%.
  2. Batch row writes. 80 per call to 12 per call cuts storage from $4,270 to ~$640.
  3. Use Workers WebSockets directly without a DO when you do not need state — that path bills at flat Workers rates and avoids the 20:1 ratio entirely. Best for fanout-only patterns.

How CallSphere optimizes

CallSphere uses Cloudflare Workers + Durable Objects for the chat agent control plane on three of the 6 verticals (Sales, Salon GlamBook, OneRoof Real Estate) — voice audio itself flows over OpenAI Realtime or LiveKit, but the session state, transcript streaming, and per-tenant routing live on Cloudflare.

We hit ~85% hibernation rate on idle DOs, batch row writes to 8 per call, and use a single Worker route for all 6 verticals (multi-tenant) with the tenant ID hashed into the DO ID. Net cost across 6 verticals — 37 agents, 90+ tools, 115+ DB tables — is well under $400/mo on Cloudflare for the realtime control plane.

That savings is part of why our pricing tiers ($149 / $499 / $1499) work for SMB margins and the affiliate program is sustainable. Try the 14-day no-card trial to see the snappy chat product cards on /demo — that is the Cloudflare-DO pipeline in action.

Optimization checklist

  1. Use Hibernation API everywhere idle WebSocket connections sit.
  2. Batch row writes to once per turn instead of per message.
  3. Compact transcript snapshots — store deltas, not full state.
  4. Use Workers WebSockets without a DO if you do not need state (for fanout).
  5. Avoid storing audio in DO storage — push it to R2 or upstream.
  6. Pin DO instances per tenant — better cache locality, lower CPU-ms.
  7. Use SQLite-backed DO over KV-backed (cheaper, better included tier).
  8. Watch the 20:1 ratio: chatty clients eat your request budget.
  9. Use heartbeats only on idle paths — frequent pings still wake the DO.
  10. Re-test cost monthly — Cloudflare added storage billing in January 2026.

FAQ

What is the 20:1 WebSocket ratio? Cloudflare counts 20 incoming WebSocket messages as 1 billable request — making chatty real-time apps cheaper.

Does hibernation work mid-call? Yes — if no JavaScript handler is actively running, the DO can hibernate and the WebSocket stays open. Costs only resume when a handler runs.

Can I run STT and LLM in a Worker? You can call out to OpenAI/Deepgram from Workers, but you should not run inference inside a Worker — use Workers AI or external GPU.

Is this cheaper than self-hosted Erlang/Phoenix? At under 50k concurrent, Cloudflare wins by 5–10× on TCO. Above 250k, self-hosted starts to compete.

What about R2 for audio storage? $0.015/GB-month with zero egress is the cheapest place to keep call recordings. Pair with DO for control plane.

Sources

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

AI Engineering

Latency vs Cost: A Decision Matrix for Voice AI Spend in 2026

Every 100ms of latency costs you. So does every cent per minute. Here is the decision matrix we use across 6 verticals to pick where to spend and where to save on voice AI infrastructure.

AI Infrastructure

Monitoring WebSocket Health: Heartbeats and Prometheus in 2026

How to actually observe a WebSocket fleet: ping/pong heartbeats, Prometheus metrics that matter, dead-man switches, and the alerts that fire before customers notice.

AI Infrastructure

OpenAI's May 2026 WebRTC Rearchitecture: How Voice Latency Got Real

On May 4 2026 OpenAI published its Realtime stack rebuild — split-relay plus transceiver edge. Here is what changed and what it means for production voice agents.

AI Infrastructure

Cloudflare Agents SDK 2026: Durable Objects, MCP, and Code Mode at the Edge

Each Cloudflare agent runs on a Durable Object with its own SQLite, WebSockets, and scheduling. Agents Week 2026 shipped MCP, Code Mode, and 10GB SQLite per agent.

AI Infrastructure

AWS Bedrock + Transcribe + Polly Stitched vs Realtime: Real Cost

Bedrock Claude + Transcribe streaming + Polly Neural runs $0.06–$0.10 per minute on paper. The honest math reveals where the AWS-native stack beats and where it loses to OpenAI Realtime.

AI Strategy

Agent Memory Cost Modeling in 2026: An Honest Numbers Walkthrough

Embeddings, vector storage, graph nodes, and recall API calls all add up faster than expected. The cost model for serving 100k users with agent memory at scale.