Build a Cloudflare Workers + Durable Objects Voice Agent
Per-call state with Durable Objects, voice transport with Cloudflare Realtime, and tools via the Agents SDK. Real Workers code that scales globally.
TL;DR: Cloudflare's Agents SDK gives you per-call Agent instances backed by Durable Objects, with WebSocket voice transport and SQLite-backed conversation history. ~30 lines of server code.
What you'll build
A Cloudflare Worker exposing a /voice endpoint. Each connecting client gets a dedicated Durable Object (one per call) running the Agents SDK's withVoice mixin. STT comes from Workers AI Whisper Flux, TTS from Aura, and the LLM from @cf/meta/llama-3.3-70b-instruct-fp8-fast.
Prerequisites
- Cloudflare account with Workers paid plan ($5/mo) for DO compute.
- A project scaffolded with npm create cloudflare@latest -- --template cloudflare/agents-starter.
- wrangler 4+ and the AI binding enabled.
- A simple HTML client that opens wss://your-worker/voice.
- Familiarity with Durable Objects.
Architecture
flowchart LR
B[Browser] -- ws --> W[Worker]
W -- routeAgentRequest --> DO[(Durable Object: VoiceAgent)]
DO -- Workers AI --> ST[Whisper Flux]
DO -- Workers AI --> LL[Llama 3.3 70B]
DO -- Workers AI --> TT[Aura TTS]
Step 1 — wrangler.jsonc
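```jsonc
{
  "name": "callsphere-voice",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  // Workers AI binding used for STT, LLM, and TTS
  "ai": { "binding": "AI" },
  // One Durable Object class backs every call
  "durable_objects": {
    "bindings": [{ "name": "VoiceAgent", "class_name": "VoiceAgent" }]
  },
  // Registers VoiceAgent as SQLite-backed so this.sql is available
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["VoiceAgent"] }
  ]
}
```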
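The config above pairs the Durable Object binding with a new_sqlite_classes migration entry. A small pre-deploy check can confirm that pairing. This is a sketch only: it assumes the config has already been parsed into a plain object, and missingSqliteMigrations is a hypothetical helper, not part of wrangler or the Agents SDK.

```typescript
// Hypothetical pre-deploy check, not part of wrangler or the Agents SDK.
// Assumes the wrangler config has already been parsed into a plain object.
type WranglerConfig = {
  durable_objects?: { bindings: { name: string; class_name: string }[] };
  migrations?: { tag: string; new_sqlite_classes?: string[] }[];
};

// Returns the Durable Object classes that no migration registers as SQLite-backed.
function missingSqliteMigrations(config: WranglerConfig): string[] {
  const migrated = new Set(
    (config.migrations ?? []).flatMap((m) => m.new_sqlite_classes ?? [])
  );
  return (config.durable_objects?.bindings ?? [])
    .map((b) => b.class_name)
    .filter((cls) => !migrated.has(cls));
}

const config: WranglerConfig = {
  durable_objects: { bindings: [{ name: "VoiceAgent", class_name: "VoiceAgent" }] },
  migrations: [{ tag: "v1", new_sqlite_classes: ["VoiceAgent"] }],
};

console.log(missingSqliteMigrations(config)); // []
```

An empty result means every bound class has a SQLite migration; any class name returned would need adding to new_sqlite_classes.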
Step 2 — The Agent class
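```typescript
import { Agent, routeAgentRequest } from "agents";
import { withVoice, WorkersAIFluxSTT, WorkersAITTS } from "agents/voice";

// Bindings declared in wrangler.jsonc: Workers AI and the per-call Durable Object.
type Env = { AI: Ai; VoiceAgent: DurableObjectNamespace };

export class VoiceAgent extends withVoice(Agent
// … the rest of the class body is not included here
```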
Step 3 — Worker entry
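```typescript
export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    // Reconstructed body (assumption): hand the request to the SDK router, 404 otherwise.
    return (await routeAgentRequest(req, env)) ?? new Response("Not found", { status: 404 });
  },
};
```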
routeAgentRequest automatically routes /agents/voice-agent/<id>/voice to the Durable Object.
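To make the route shape concrete, here is a minimal sketch of how a client might build that URL for a given call. The /agents/voice-agent/<id>/voice layout is taken from the sentence above; toKebabCase and buildVoiceUrl are hypothetical helpers, not SDK exports, and the host is a placeholder.

```typescript
// Sketch only: toKebabCase and buildVoiceUrl are hypothetical helpers, not SDK exports.
// Converts a class name such as "VoiceAgent" into the "voice-agent" path segment.
function toKebabCase(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Builds the WebSocket URL for one call; callId selects the Durable Object instance.
function buildVoiceUrl(host: string, className: string, callId: string): string {
  const agent = toKebabCase(className);
  return `wss://${host}/agents/${agent}/${encodeURIComponent(callId)}/voice`;
}

console.log(buildVoiceUrl("your-worker.example.com", "VoiceAgent", "call-123"));
// wss://your-worker.example.com/agents/voice-agent/call-123/voice
```

Reusing the same callId reconnects to the same Durable Object, so the ID should be unique per call.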
Step 4 — Browser client
```html
```
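As a rough idea of what the client's message handling could look like, the sketch below dispatches incoming WebSocket frames. It assumes binary frames carry audio and text frames carry JSON events; that framing is an assumption made for illustration, not the SDK's documented wire format. handleFrame and playChunk are hypothetical names.

```typescript
// Assumption: binary frames carry audio and text frames carry JSON events.
// The real wire format of the voice mixin may differ.
type VoiceHandlers = {
  onAudio: (chunk: ArrayBuffer) => void;
  onEvent: (event: unknown) => void;
};

// Routes one incoming frame to the matching handler; other payload types are ignored.
function handleFrame(data: unknown, handlers: VoiceHandlers): void {
  if (data instanceof ArrayBuffer) {
    handlers.onAudio(data);
  } else if (typeof data === "string") {
    handlers.onEvent(JSON.parse(data));
  }
}

// In the browser (not executed here; playChunk is a hypothetical playback function):
// const ws = new WebSocket("wss://your-worker/voice");
// ws.binaryType = "arraybuffer"; // deliver audio as ArrayBuffer rather than Blob
// ws.onmessage = (ev) => handleFrame(ev.data, { onAudio: playChunk, onEvent: console.log });
```

Keeping the dispatch logic separate from the socket makes it easy to exercise without a live connection.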
Step 5 — Add a tool that hits your CRM
The Agents SDK exposes this.callable:
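```typescript
// Method on the VoiceAgent class: looks up a customer's next appointment in the CRM.
// CRM_TOKEN must also be declared on Env and set as a secret.
async getNextAppointment(params: { customerId: string }) {
  const r = await fetch(`https://crm.callsphere.ai/appt/${params.customerId}`, {
    headers: { Authorization: `Bearer ${this.env.CRM_TOKEN}` },
  });
  return r.json();
}
```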
Reference it in the system prompt; onChatMessage will route the call.
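A slightly more defensive variant of the lookup could separate request construction from the network call, which also makes it testable without hitting the CRM. This is a sketch under stated assumptions: the URL and Bearer scheme come from the example above, while buildAppointmentRequest and fetchNextAppointment are hypothetical names.

```typescript
// Hypothetical helpers; only the URL and Bearer scheme come from the example above.
type CrmRequest = { url: string; headers: Record<string, string> };

// Builds the CRM request without performing any I/O.
function buildAppointmentRequest(customerId: string, token: string): CrmRequest {
  return {
    // Encode the ID so characters such as "/" cannot alter the request path.
    url: `https://crm.callsphere.ai/appt/${encodeURIComponent(customerId)}`,
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Performs the lookup and surfaces HTTP errors instead of parsing an error body as data.
async function fetchNextAppointment(customerId: string, token: string): Promise<unknown> {
  const { url, headers } = buildAppointmentRequest(customerId, token);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`CRM lookup failed with status ${res.status}`);
  return res.json();
}
```

Inside the agent, the tool method would call fetchNextAppointment(params.customerId, this.env.CRM_TOKEN).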
Step 6 — Deploy
```bash
wrangler deploy
```
Cloudflare instantiates one Durable Object per session ID, runs it on the closest colo, and persists conversation history in SQLite-backed DO storage.
Common pitfalls
- Forgetting the new_sqlite_classes migration: without it, this.sql is unavailable.
- High DO request bills: DOs charge per request; batch updates if you can.
- Aura TTS sample rate: defaults to 24kHz; resample on the client if needed.
- WebSocket hibernation: DOs hibernate; use hibernatable WebSockets or your DO will time out.
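For the sample-rate pitfall, client-side resampling can be as simple as linear interpolation. The sketch below is one possible approach, not something the SDK provides: resampleLinear is a hypothetical helper operating on mono Float32 samples, and production code would normally low-pass filter before downsampling to avoid aliasing.

```typescript
// Linear-interpolation resampler for mono PCM samples (hypothetical helper).
// A sketch: real pipelines usually low-pass filter before downsampling.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (input.length === 0 || fromRate === toRate) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - Math.floor(pos);
    // Blend the two neighbouring input samples; the last sample is held at the edge.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// 24 kHz (the Aura default mentioned above) up to 48 kHz doubles the sample count.
const doubled = resampleLinear(new Float32Array([0, 1, 2, 3]), 24000, 48000);
console.log(Array.from(doubled)); // [0, 0.5, 1, 1.5, 2, 2.5, 3, 3]
```

In a browser, the Web Audio API can also resample on playback if the AudioBuffer is created with the source rate, which avoids manual resampling entirely.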
How CallSphere does this in production
CallSphere uses Cloudflare for edge cache + image resize, but our voice plane is Pion Go for Real Estate and FastAPI :8084 for Healthcare — both feeding the same 115-table Postgres. CF Workers is a great fit for low-volume verticals; we use it for our affiliate referral tracking.
FAQ
Cold start? ~10ms — DOs hibernate but resume nearly instantly.
SQLite limits? 10GB per DO, 1k writes/sec.
Can I bring my own LLM? Yes — proxy from onChatMessage to OpenAI or Anthropic.
Pricing for 1k calls/day? ~$8/mo CF + LLM tokens.
Voice + WebRTC? Use Cloudflare Realtime SFU; it converts Opus to PCM for your DO.
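For the bring-your-own-LLM answer above, the proxy call to OpenAI might be assembled as follows. This is a minimal sketch: the endpoint is OpenAI's Chat Completions URL, while buildChatCompletionRequest is a hypothetical helper and the model name in the usage comment is only an example.

```typescript
// Hypothetical helper for proxying a conversation to OpenAI's Chat Completions endpoint.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Builds the fetch arguments without sending anything.
function buildChatCompletionRequest(apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage inside the agent (not executed here; OPENAI_API_KEY is an assumed secret name):
// const { url, init } = buildChatCompletionRequest(this.env.OPENAI_API_KEY, "gpt-4o-mini", history);
// const reply = await fetch(url, init).then((r) => r.json());
```

The API key belongs in a Worker secret rather than in wrangler.jsonc.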