---
title: "Build a Cloudflare Workers + Durable Objects Voice Agent"
description: "Per-call state with Durable Objects, voice transport with Cloudflare Realtime, and tools via the Agents SDK. Real Workers code that scales globally."
canonical: https://callsphere.ai/blog/vw2h-build-cloudflare-workers-durable-objects-voice-agent
category: "AI Infrastructure"
tags: ["Tutorial", "Build", "Cloudflare Workers", "Durable Objects", "Agents SDK"]
author: "CallSphere Team"
published: 2026-04-09T00:00:00.000Z
updated: 2026-05-07T09:27:40.812Z
---

# Build a Cloudflare Workers + Durable Objects Voice Agent

> Per-call state with Durable Objects, voice transport with Cloudflare Realtime, and tools via the Agents SDK. Real Workers code that scales globally.

> **TL;DR** — Cloudflare's Agents SDK gives you per-call `Agent` instances backed by Durable Objects, with WebSocket voice transport and SQLite-backed conversation history. ~30 lines of server code.

## What you'll build

A Cloudflare Worker exposing a `/voice` endpoint. Each connecting client gets a dedicated Durable Object (one per call) running the Agents SDK's `withVoice` mixin. STT comes from Workers AI Whisper Flux, TTS from Aura, and the LLM from `@cf/meta/llama-3.3-70b-instruct`.

## Prerequisites

1. Cloudflare account with Workers paid plan ($5/mo) for DO compute.
2. `npm create cloudflare@latest -- --template cloudflare/agents-starter`.
3. `wrangler 4+` and `AI` binding enabled.
4. A simple HTML client that opens `wss://your-worker/voice`.
5. Familiarity with Durable Objects.

## Architecture

```mermaid
flowchart LR
  B[Browser] -- ws --> W[Worker]
  W -- routeAgentRequest --> DO[(Durable Object: VoiceAgent)]
  DO -- Workers AI --> ST[Whisper Flux]
  DO -- Workers AI --> LL[Llama 3.3 70B]
  DO -- Workers AI --> TT[Aura TTS]
```

## Step 1 — `wrangler.jsonc`

```jsonc
{
  "name": "callsphere-voice",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  "ai": { "binding": "AI" },
  "durable_objects": {
    "bindings": [{ "name": "VoiceAgent", "class_name": "VoiceAgent" }]
  },
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["VoiceAgent"] }
  ]
}
```

## Step 2 — The Agent class

```typescript
import { Agent, routeAgentRequest } from "agents";
import { withVoice, WorkersAIFluxSTT, WorkersAITTS } from "agents/voice";

type Env = { AI: Ai; VoiceAgent: DurableObjectNamespace };

export class VoiceAgent extends withVoice(Agent, {
  stt: new WorkersAIFluxSTT({ model: "@cf/openai/whisper-large-v3-turbo" }),
  tts: new WorkersAITTS({ model: "@cf/deepgram/aura-1" }),
}) {
  async onChatMessage(messages: { role: string; content: string }[]) {
    const res = await this.env.AI.run("@cf/meta/llama-3.3-70b-instruct", {
      messages: [
        { role: "system",
          content: "You are CallSphere's CF agent. Be brief." },
        ...messages,
      ],
    });
    return res.response as string;
  }
}
```
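Because each call keeps its history in the Durable Object, the prompt sent to the model grows with every turn. One way to bound it is to trim the history before calling `AI.run`. The sketch below is an assumption rather than part of the Agents SDK: `buildMessages` and `maxTurns` are hypothetical names, and the cutoff of 20 is arbitrary.

```typescript
// Hypothetical helper, not part of the Agents SDK.
type ChatMessage = { role: string; content: string };

const SYSTEM_PROMPT = "You are CallSphere's CF agent. Be brief.";

// Keep the system prompt plus only the most recent `maxTurns` messages,
// so long calls do not grow the prompt without bound.
function buildMessages(history: ChatMessage[], maxTurns = 20): ChatMessage[] {
  // slice(-0) would return the whole array, so handle 0 explicitly.
  const recent = maxTurns > 0 ? history.slice(-maxTurns) : [];
  return [{ role: "system", content: SYSTEM_PROMPT }, ...recent];
}
```

Inside `onChatMessage`, the `messages` array passed to `AI.run` could then be `buildMessages(messages)` instead of the hand-built array.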

## Step 3 — Worker entry

```typescript
export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    return (await routeAgentRequest(req, env)) ??
           new Response("not found", { status: 404 });
  },
};
```

`routeAgentRequest` automatically routes `/agents/voice-agent/<session-id>/voice` to the Durable Object for that session.
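As far as the SDK documents it, `routeAgentRequest` matches paths shaped like `/agents/<agent>/<name>`, where the agent segment is the kebab-case form of the class name and the name segment selects the instance. The sketch below only illustrates that convention; `toKebabCase` and `parseAgentPath` are hypothetical helpers, not the SDK's implementation.

```typescript
// Illustration of the URL convention only; not the SDK's implementation.

// "VoiceAgent" -> "voice-agent"
function toKebabCase(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

type AgentRoute = { agent: string; name: string; rest: string };

// Parse "/agents/<agent>/<name>/<rest...>" into its parts, or null if no match.
function parseAgentPath(pathname: string): AgentRoute | null {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  if (parts.length < 3 || parts[0] !== "agents") return null;
  const [, agent, name, ...rest] = parts;
  return { agent, name, rest: rest.join("/") };
}
```

Under this convention, two clients that connect with the same name segment reach the same Durable Object, so the name is effectively the call ID.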

## Step 4 — Browser client

```html

```
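The client markup did not survive in this copy of the post. As a stand-in, here is a minimal sketch of two helpers a browser client would likely need: one to build the per-call WebSocket URL and one to convert Web Audio samples to 16-bit PCM. The path shape, the `voiceUrl` and `floatTo16BitPCM` names, and the assumption that the server accepts raw 16-bit PCM frames are all illustrative; check the SDK's voice docs for the actual wire format.

```typescript
// Hypothetical client helpers; the path and wire format are assumptions.

// Build the WebSocket URL for one call. `sessionId` picks the Durable Object.
function voiceUrl(host: string, sessionId: string): string {
  return `wss://${host}/agents/voice-agent/${encodeURIComponent(sessionId)}/voice`;
}

// Convert Web Audio float samples (-1..1) to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// In the browser (not executed here):
//   const ws = new WebSocket(voiceUrl("your-worker.example.com", crypto.randomUUID()));
//   ws.binaryType = "arraybuffer";
//   ws.onopen = () => ws.send(floatTo16BitPCM(micSamples).buffer);
```

Generating a fresh session ID per call keeps each conversation in its own Durable Object.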

## Step 5 — Add a tool that hits your CRM

The Agents SDK exposes `this.callable`:

```typescript
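// Method on the VoiceAgent class. CRM_TOKEN is a Worker secret and must be added to Env.
async getNextAppointment(params: { customerId: string }) {
  const r = await fetch(`https://crm.callsphere.ai/appt/${params.customerId}`, {
    headers: { Authorization: `Bearer ${this.env.CRM_TOKEN}` },
  });
  // Surface CRM failures instead of parsing an error body as appointment data.
  if (!r.ok) throw new Error(`CRM request failed: ${r.status}`);
  return r.json();
}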
```

Describe the tool in the system prompt, then have `onChatMessage` dispatch to it when the model asks for it.
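How the call gets routed is left open above. One simple approach, sketched here under stated assumptions, is to instruct the model to reply with a JSON object such as `{"tool": "...", "args": {...}}` when it wants a tool, then parse and dispatch that reply. The JSON convention and the `parseToolCall` and `dispatchTool` helpers are hypothetical, not an Agents SDK mechanism.

```typescript
// Hypothetical tool-dispatch helpers; the JSON reply convention is an assumption.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;
type ToolCall = { tool: string; args: Record<string, unknown> };

// Try to read a tool call from the model's reply; return null for plain text.
function parseToolCall(reply: string): ToolCall | null {
  try {
    const parsed: unknown = JSON.parse(reply.trim());
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as ToolCall).tool === "string"
    ) {
      const { tool, args } = parsed as ToolCall;
      return { tool, args: args ?? {} };
    }
  } catch {
    // Not JSON: treat it as a normal spoken reply.
  }
  return null;
}

// Run the named tool if it is registered; unknown names are rejected.
async function dispatchTool(
  call: ToolCall,
  tools: Record<string, ToolHandler>,
): Promise<unknown> {
  const known = Object.prototype.hasOwnProperty.call(tools, call.tool);
  if (!known) throw new Error(`unknown tool: ${call.tool}`);
  return tools[call.tool](call.args);
}
```

In `onChatMessage`, a non-null `parseToolCall(res.response)` would trigger `dispatchTool`, and the tool result would be fed back to the model for a spoken answer. Restricting dispatch to an explicit registry keeps the model from invoking arbitrary methods.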

## Step 6 — Deploy

```bash
wrangler deploy
```

Cloudflare instantiates one Durable Object per session ID, runs it on the closest colo, and persists conversation history in SQLite-backed DO storage.

## Common pitfalls

- **Forgetting `new_sqlite_classes` migration** — without it, `this.sql` is unavailable.
- **High DO request bills** — DOs charge per request; batch updates if you can.
- **Aura TTS sample rate** — defaults to 24kHz; resample on the client if needed.
- **WebSocket hibernation** — DOs hibernate; use hibernatable WebSockets or your DO will time out.
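For the sample-rate pitfall above, a client whose audio pipeline runs at a different rate needs to resample the 24kHz output. Below is a minimal sketch using linear interpolation; `resampleLinear` is a hypothetical helper, and it applies no low-pass filter, so downsampling with it can alias. It is meant as a starting point for speech, not a production resampler.

```typescript
// Linear-interpolation resampler (hypothetical helper).
// No anti-aliasing filter is applied, so quality is only adequate for speech.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Clamp both neighbours so the tail repeats the last input sample.
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

For example, `resampleLinear(chunk, 24000, 48000)` doubles the sample count for a 48kHz `AudioContext`.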

## How CallSphere does this in production

CallSphere uses Cloudflare for edge cache + image resize, but our voice plane is Pion Go for [Real Estate](/industries/real-estate) and FastAPI :8084 for [Healthcare](/lp/healthcare) — both feeding the same 115-table Postgres. CF Workers is a great fit for low-volume verticals; we use it for our [affiliate](/affiliate) referral tracking.

## FAQ

**Cold start?** ~10ms — DOs hibernate but resume nearly instantly.

**SQLite limits?** 10GB per DO, 1k writes/sec.

**Can I bring my own LLM?** Yes — proxy from `onChatMessage` to OpenAI or Anthropic.

**Pricing for 1k calls/day?** ~$8/mo CF + LLM tokens.

**Voice + WebRTC?** Use Cloudflare Realtime SFU; it converts Opus to PCM for your DO.
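For the bring-your-own-LLM answer above, proxying amounts to replacing the `AI.run` call with an HTTPS request. Here is a minimal sketch against OpenAI's Chat Completions endpoint; the `buildOpenAIRequest` and `extractReply` helpers, the `gpt-4o-mini` default, and the idea of storing the key as a Worker secret are assumptions for illustration.

```typescript
// Hypothetical helpers for proxying onChatMessage to OpenAI.
type ChatMessage = { role: string; content: string };

type FetchArgs = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Build the fetch arguments for the Chat Completions endpoint.
function buildOpenAIRequest(
  messages: ChatMessage[],
  apiKey: string,
  model = "gpt-4o-mini", // example model name; swap in whichever you use
): FetchArgs {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Pull the assistant text out of a Chat Completions response body.
function extractReply(body: {
  choices?: { message?: { content?: string | null } }[];
}): string {
  return body.choices?.[0]?.message?.content ?? "";
}
```

Inside `onChatMessage`, this would look like `const { url, init } = buildOpenAIRequest(messages, apiKey)`, followed by `extractReply(await (await fetch(url, init)).json())`.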

## Sources

- [Cloudflare Agents docs — Voice agents](https://developers.cloudflare.com/agents/api-reference/voice/)
- [Cloudflare blog — Add voice to your agent](https://blog.cloudflare.com/voice-agents/)
- [cloudflare/agents repo](https://github.com/cloudflare/agents)
- [Build a voice agent guide](https://developers.cloudflare.com/agents/guides/build-a-voice-agent/)

