By Sagar Shankaran, Founder of CallSphere
Durable Objects became the cheapest way to fan out WebSocket events at planet scale in 2026. How hibernation works, when to use it, and the limits to plan around.
Key takeaways
Cloudflare's WebSocket Hibernation API turned an idle connection from "hold a process" to "hold a row in a database." That changes the math on stateful realtime fan-out.
flowchart LR
Browser["Browser / Phone"] -- "WebSocket /ws" --> LB["Load Balancer<br/>sticky session"]
LB --> Pod1["Node A · Socket.IO"]
LB --> Pod2["Node B · Socket.IO"]
Pod1 -- "pub/sub" --> Redis[("Redis cluster")]
Pod2 -- "pub/sub" --> Redis
Pod1 --> AI["AI Worker · OpenAI Realtime"]
Pod2 --> AI
Durable Objects solve the "stateful WebSocket room without a server" problem. In a traditional architecture, every chat room or call session needs at least one process holding open WebSockets and routing messages between participants. Idle rooms still cost CPU and RAM. Durable Objects flip that: each room is a single-instance object on Cloudflare's edge, every WebSocket can hibernate while idle, and the platform charges you only when something actually happens.
The result is a fan-out primitive where one Durable Object can hold thousands of clients, you can spawn millions of objects, and the cost graph tracks active conversations instead of provisioned capacity.
A Durable Object opens WebSockets via state.acceptWebSocket() instead of the standard server.accept(). After accept, the object can return to dormancy. When a client sends a message, Cloudflare's runtime resurrects the object, calls webSocketMessage(ws, msg), and lets it go back to sleep when done.
Three things change because of this:
The web_socket_auto_reply_to_close compatibility flag makes connection teardowns transition cleanly, so you stop seeing zombie sockets in the CLOSING state.
For AI agents, this is gold for use cases like idle sessions waiting for the next utterance, multi-participant rooms with long pauses, and dashboard subscriptions where the user is logged in but not actively interacting.
CallSphere uses Durable Objects for two specific surfaces:
The core Sales Calling and Healthcare paths still use Socket.IO and OpenAI Realtime over WebSocket because they need server-side audio access and on-prem control. Durable Objects own the lighter surfaces where edge proximity and cost shape matter more than custom audio handling.
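export class CallRoom {
  constructor(private state: DurableObjectState) {}

  async fetch(req: Request): Promise<Response> {
    // Only WebSocket upgrade requests make sense for this object.
    if (req.headers.get("Upgrade") !== "websocket") {
      return new Response("Expected WebSocket upgrade", { status: 426 });
    }
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);
    this.state.acceptWebSocket(server); // hibernation-aware
    return new Response(null, { status: 101, webSocket: client });
  }

  // Called by the runtime when a client sends a message, waking the object if it was hibernating.
  webSocketMessage(ws: WebSocket, msg: string | ArrayBuffer) {
    // Relay to every other socket attached to this object.
    for (const sock of this.state.getWebSockets()) {
      if (sock !== ws) sock.send(msg);
    }
  }

  // Called when a client closes its side of the connection.
  webSocketClose(ws: WebSocket) { ws.close(); }
}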
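The CallRoom class above keeps nothing in instance fields, which matters because a hibernated object is rebuilt when the next event arrives and in-memory values do not survive. If a room does need to remember something, a minimal sketch might look like the following. MessageCounter and MemoryStorage are hypothetical names, and StorageLike is an assumed interface that mirrors only the get and put calls of state.storage so the example runs outside Cloudflare.

```typescript
// StorageLike mirrors only the get/put subset of a Durable Object's
// state.storage; it is declared here so the sketch runs on its own.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

// Hypothetical helper: counts messages for a room across hibernation cycles.
class MessageCounter {
  private readonly storage: StorageLike;

  constructor(storage: StorageLike) {
    this.storage = storage;
  }

  // Reads the persisted count, increments it, and writes it back.
  async record(): Promise<number> {
    const previous = (await this.storage.get<number>("messageCount")) ?? 0;
    const next = previous + 1;
    await this.storage.put("messageCount", next);
    return next;
  }
}

// In-memory stand-in for state.storage, for local experimentation only.
class MemoryStorage implements StorageLike {
  private readonly data = new Map<string, unknown>();

  async get<T>(key: string): Promise<T | undefined> {
    return this.data.get(key) as T | undefined;
  }

  async put<T>(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }
}
```

Inside a real Durable Object, the counter would be constructed with this.state.storage, and a fresh MessageCounter created after a wake-up would continue from the stored value.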
Set compatibility_date = "2026-04-07" or later in wrangler.toml to enable the auto-close handshake.
Use state.acceptWebSocket(ws), not ws.accept(), to opt into hibernation.
Persist state in state.storage because the object can hibernate between events.
Use getWebSockets() for fan-out instead of holding your own Set.
How many WebSockets per object? Cloudflare advises planning for low thousands per DO, then sharding by chat room or session ID across many DOs.
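Sharding by room or session ID usually happens in the Worker that sits in front of the Durable Objects. The sketch below is one way to do it, not CallSphere's actual routing: the CALL_ROOM binding name and the /ws/<roomId> URL scheme are assumptions, and the RoomNamespace and RoomStub interfaces model only the idFromName, get, and fetch calls used here.

```typescript
// Structural stand-ins for a Durable Object namespace binding and stub.
interface RoomStub {
  fetch(req: Request): Promise<Response>;
}
interface RoomNamespace {
  idFromName(name: string): unknown;
  get(id: unknown): RoomStub;
}
interface Env {
  CALL_ROOM: RoomNamespace; // hypothetical binding name
}

// Extracts the room ID from a path like /ws/room-42 (assumed URL scheme).
function roomNameFromUrl(url: string): string | null {
  const match = new URL(url).pathname.match(/^\/ws\/([A-Za-z0-9_-]{1,64})$/);
  return match?.[1] ?? null;
}

// Worker entry point: every request for the same room ID reaches the same object.
const worker = {
  async fetch(req: Request, env: Env): Promise<Response> {
    const room = roomNameFromUrl(req.url);
    if (room === null) {
      return new Response("Unknown room", { status: 404 });
    }
    const id = env.CALL_ROOM.idFromName(room);
    return env.CALL_ROOM.get(id).fetch(req);
  },
};
```

In a deployed Worker, the worker object would be the module's default export. Because idFromName maps a given name to the same object every time, each room's sockets land on one object while different rooms spread across many.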
What happens during deploys? Durable Objects relocate cleanly; clients see a brief reconnect. Implement reconnection with exponential backoff and you will not notice.
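The reconnection advice can be sketched roughly as follows. The helper names, the 500 ms base delay, and the 30 s cap are illustrative assumptions, and SocketLike models only the two event hooks the sketch needs so it does not depend on a particular runtime's WebSocket typings.

```typescript
// Computes a jittered, capped exponential delay for a reconnect attempt.
// baseMs and maxMs defaults are illustrative, not recommended values.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  // Double the ceiling on every attempt, but never exceed maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick a delay in [0, ceiling) so clients do not reconnect in lockstep.
  return Math.floor(random() * ceiling);
}

// Minimal socket shape: just the hooks this sketch assigns.
interface SocketLike {
  onopen: ((event: unknown) => void) | null;
  onclose: ((event: unknown) => void) | null;
}

// Opens a socket and reopens it with growing delays whenever it closes.
function connectWithBackoff(
  open: () => SocketLike,
  schedule: (fn: () => void, ms: number) => unknown = setTimeout,
): void {
  let attempt = 0;
  const connect = (): void => {
    const ws = open();
    // A successful open resets the backoff sequence.
    ws.onopen = () => { attempt = 0; };
    // Every close schedules another attempt with a longer ceiling.
    ws.onclose = () => { schedule(connect, backoffDelayMs(attempt++)); };
  };
  connect();
}
```

In a browser, connectWithBackoff(() => new WebSocket(url)) would wire this to a real socket, where url is whatever route the Durable Object is served on. The jitter matters after a deploy, when many clients lose their connection at the same moment.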
Can I run AI inference inside a DO? You can call out to Workers AI, OpenAI, or any HTTP endpoint. Long-running inference inside the DO event handler should be avoided — use Queues to push to a worker.
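One possible shape for that hand-off is sketched below, under stated assumptions: the InferenceJob fields and both function names are hypothetical, and the QueueLike, QueueMessage, and QueueBatch interfaces model only the send, messages, body, and ack members of Cloudflare Queues that the sketch touches.

```typescript
// Structural stand-ins for a Queues producer binding and a consumer batch.
interface QueueLike<T> {
  send(body: T): Promise<void>;
}
interface QueueMessage<T> {
  body: T;
  ack(): void;
}
interface QueueBatch<T> {
  messages: readonly QueueMessage<T>[];
}

// Hypothetical job shape for one utterance that needs a model response.
interface InferenceJob {
  roomId: string;
  utterance: string;
}

// Producer side: called from the Durable Object's message handler so the
// handler returns quickly instead of waiting on the model.
async function enqueueInference(
  queue: QueueLike<InferenceJob>,
  job: InferenceJob,
): Promise<void> {
  await queue.send(job);
}

// Consumer side: runs in a separate Worker and does the slow work there.
async function handleBatch(
  batch: QueueBatch<InferenceJob>,
  infer: (job: InferenceJob) => Promise<string>,
): Promise<string[]> {
  const replies: string[] = [];
  for (const message of batch.messages) {
    replies.push(await infer(message.body));
    // Acknowledge only after the work succeeded so failures can be retried.
    message.ack();
  }
  return replies;
}
```

Delivering each reply back to the room, for example through a request to the Durable Object, is left out of the sketch.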
How does pricing compare to Socket.IO on EC2? Below 5k peak concurrent, DO is dramatically cheaper. Above 100k peak concurrent and constant traffic, a self-managed cluster still wins on per-message cost.
Is the API stable? As of 2026 yes — the hibernation API is GA and the auto-reply-to-close flag is the default.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.