By Sagar Shankaran, Founder of CallSphere
Per-call state with Durable Objects, voice transport with Cloudflare Realtime, and tools via the Agents SDK. Real Workers code that scales globally.
Key takeaways
TL;DR: Cloudflare's Agents SDK gives you per-call Agent instances backed by Durable Objects, with WebSocket voice transport and SQLite-backed conversation history. ~30 lines of server code.
The build is a Cloudflare Worker exposing a /voice endpoint. Each connecting client gets a dedicated Durable Object (one per call) running the Agents SDK's withVoice mixin. STT comes from Workers AI Whisper Flux, TTS from Aura, and the LLM from @cf/meta/llama-3.3-70b-instruct.
npm create cloudflare@latest -- --template cloudflare/agents-starter.
wrangler 4+ and AI binding enabled.
wss://your-worker/voice.

flowchart LR
B[Browser] -- ws --> W[Worker]
W -- routeAgentRequest --> DO[(Durable Object: VoiceAgent)]
DO -- Workers AI --> ST[Whisper Flux]
DO -- Workers AI --> LL[Llama 3.3 70B]
DO -- Workers AI --> TT[Aura TTS]
wrangler.jsonc

```jsonc
{
  "name": "callsphere-voice",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  "ai": { "binding": "AI" },
  "durable_objects": {
    "bindings": [{ "name": "VoiceAgent", "class_name": "VoiceAgent" }]
  },
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["VoiceAgent"] }
  ]
}
```
```typescript
import { Agent, routeAgentRequest } from "agents";
import { withVoice, WorkersAIFluxSTT, WorkersAITTS } from "agents/voice";

type Env = { AI: Ai; VoiceAgent: DurableObjectNamespace };

export class VoiceAgent extends withVoice(Agent
```
```typescript
export default {
  async fetch(req: Request, env: Env): Promise
```
routeAgentRequest automatically routes /agents/voice-agent/<id>/voice to the Durable Object.
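The path convention above can be sketched with a few plain string helpers. This is an illustration only, not the SDK's routing code: `toKebab`, `parseAgentPath`, and `buildVoiceUrl` are hypothetical names, and the kebab-case conversion is an assumption about how the `VoiceAgent` class name maps to `voice-agent` in the URL.

```typescript
// Hypothetical helpers that mirror the /agents/<agent>/<id>/... path shape.
// They do not call into the Agents SDK.

interface AgentRoute {
  agent: string; // kebab-case agent name, e.g. "voice-agent"
  id: string;    // per-call instance name
  rest: string;  // remaining path, e.g. "/voice"
}

// Assumed mapping from a class or binding name to its URL segment.
function toKebab(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Split a pathname into agent, id, and the trailing path; null if it does not match.
function parseAgentPath(pathname: string): AgentRoute | null {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  const agent = parts[1];
  const id = parts[2];
  if (parts[0] !== "agents" || agent === undefined || id === undefined) {
    return null;
  }
  const tail = parts.slice(3);
  return { agent, id, rest: tail.length > 0 ? "/" + tail.join("/") : "" };
}

// Build the WebSocket URL a client would dial for one call.
function buildVoiceUrl(host: string, callId: string): string {
  return `wss://${host}/agents/${toKebab("VoiceAgent")}/${encodeURIComponent(callId)}/voice`;
}

console.log(buildVoiceUrl("your-worker", "call-123"));
console.log(parseAgentPath("/agents/voice-agent/call-123/voice"));
```

Using a fresh `callId` per connection is what gives each call its own Durable Object instance.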
```html
```
The Agents SDK exposes this.callable:
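```typescript
// Tool method on the VoiceAgent class: fetch a customer's next appointment from the CRM.
// Assumes CRM_TOKEN is declared on Env and configured as a Worker secret.
async getNextAppointment(params: { customerId: string }) {
  const r = await fetch(
    `https://crm.callsphere.ai/appt/${encodeURIComponent(params.customerId)}`,
    { headers: { Authorization: `Bearer ${this.env.CRM_TOKEN}` } }
  );
  if (!r.ok) throw new Error(`CRM lookup failed: ${r.status}`);
  return r.json();
}
```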
Reference it in the system prompt; onChatMessage will route the call.
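To make the "reference it in the system prompt, then route the call" idea concrete, here is a minimal sketch of a name-to-handler registry. It is not the Agents SDK's mechanism; `ToolRegistry`, `describe`, and `dispatch` are hypothetical names, and the registered handler is a stub rather than the real CRM call.

```typescript
// Hypothetical tool registry: maps tool names to async handlers.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

class ToolRegistry {
  private tools = new Map<string, ToolHandler>();

  register(name: string, handler: ToolHandler): void {
    this.tools.set(name, handler);
  }

  // One line per tool, suitable for pasting into a system prompt.
  describe(): string {
    return Array.from(this.tools.keys())
      .map((name) => `- ${name}`)
      .join("\n");
  }

  // Route a model-requested tool call to its handler; rejects on unknown names.
  async dispatch(name: string, args: Record<string, unknown>): Promise<unknown> {
    const handler = this.tools.get(name);
    if (!handler) {
      throw new Error(`Unknown tool: ${name}`);
    }
    return handler(args);
  }
}

const registry = new ToolRegistry();
// Stub handler standing in for the CRM lookup.
registry.register("getNextAppointment", async (args) => ({
  customerId: args["customerId"],
  slot: "stub",
}));

const systemPrompt = `You may call these tools:\n${registry.describe()}`;
console.log(systemPrompt);
```

Rejecting unknown tool names keeps a hallucinated tool call from silently returning nothing to the caller.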
```bash
wrangler deploy
```
Cloudflare instantiates one Durable Object per session ID, runs it on the closest colo, and persists conversation history in SQLite-backed DO storage.
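The one-instance-per-session behavior can be modeled locally with an in-memory map. This is only a sketch of the addressing semantics: on Cloudflare the runtime creates and places the Durable Object and history lives in SQLite, whereas `CallSession` and `SessionDirectory` here are hypothetical classes that keep everything in process memory.

```typescript
// In-memory model of "same session ID, same instance". Not Cloudflare code.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

class CallSession {
  readonly sessionId: string;
  readonly history: Turn[] = [];

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  append(role: Turn["role"], text: string): void {
    this.history.push({ role, text });
  }
}

class SessionDirectory {
  private sessions = new Map<string, CallSession>();

  // Return the existing session for this ID, or create it on first use.
  get(sessionId: string): CallSession {
    let session = this.sessions.get(sessionId);
    if (!session) {
      session = new CallSession(sessionId);
      this.sessions.set(sessionId, session);
    }
    return session;
  }
}

const directory = new SessionDirectory();
directory.get("call-1").append("user", "When is my next appointment?");
directory.get("call-1").append("assistant", "Let me check.");
console.log(directory.get("call-1").history.length); // 2
console.log(directory.get("call-2").history.length); // 0
```

Each call sees only its own history, which is the isolation property the per-call Durable Object provides.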
new_sqlite_classes migration: without it, this.sql is unavailable.
Use hibernatable WebSockets, or your DO will time out.

CallSphere uses Cloudflare for edge cache + image resize, but our voice plane is Pion Go for Real Estate and FastAPI :8084 for Healthcare, both feeding the same 115-table Postgres. CF Workers is a great fit for low-volume verticals; we use it for our affiliate referral tracking.
Cold start? ~10ms — DOs hibernate but resume nearly instantly.
SQLite limits? 10GB per DO, 1k writes/sec.
Can I bring my own LLM? Yes — proxy from onChatMessage to OpenAI or Anthropic.
Pricing for 1k calls/day? ~$8/mo CF + LLM tokens.
Voice + WebRTC? Use Cloudflare Realtime SFU; it converts Opus to PCM for your DO.
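For the bring-your-own-LLM answer above, the proxy step might look roughly like this: build a request for OpenAI's Chat Completions endpoint and hand it to `fetch`. `buildChatCompletionRequest` is a hypothetical helper, the model ID is a placeholder you would supply, and where exactly this is called from inside the agent is an assumption.

```typescript
// Hypothetical request builder for proxying a turn to OpenAI's Chat Completions API.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface UpstreamRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

function buildChatCompletionRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[]
): UpstreamRequest {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Placeholder key and model ID; supply real values from Worker secrets.
const upstream = buildChatCompletionRequest("sk-placeholder", "your-model-id", [
  { role: "system", content: "You are a concise voice assistant." },
  { role: "user", content: "When is my next appointment?" },
]);
console.log(upstream.url);
```

Inside the agent you would then call `fetch(upstream.url, upstream.init)` and pass the reply text on to TTS.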
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.