---
title: "Build a Voice Agent on Cloudflare Workers AI (No External LLM)"
description: "Run STT, LLM, and TTS entirely on Cloudflare's edge — no OpenAI, no ElevenLabs. Real working code with Whisper, Llama 3.3 70B, and Deepgram Aura."
canonical: https://callsphere.ai/blog/vw2h-build-voice-agent-cloudflare-workers-ai-no-external-llm
category: "AI Engineering"
tags: ["Tutorial", "Build", "Cloudflare Workers AI", "Llama", "Whisper", "Aura"]
author: "CallSphere Team"
published: 2026-05-07T00:00:00.000Z
updated: 2026-05-07T09:27:42.650Z
---

# Build a Voice Agent on Cloudflare Workers AI (No External LLM)

> Run STT, LLM, and TTS entirely on Cloudflare's edge — no OpenAI, no ElevenLabs. Real working code with Whisper, Llama 3.3 70B, and Deepgram Aura.

> **TL;DR** — Cloudflare Workers AI ships Whisper, Llama 3.3 70B, and Deepgram Aura behind one `AI` binding. Build a voice agent with zero external API keys, zero per-token surprise bills, and global edge co-location for free.

## What you'll build

A Worker that takes a WebSocket of PCM16 audio frames, transcribes via `@cf/openai/whisper-large-v3-turbo`, generates a reply via `@cf/meta/llama-3.3-70b-instruct`, synthesizes via `@cf/deepgram/aura-1`, and streams audio back. End-to-end on the Cloudflare edge.

## Prerequisites

1. Cloudflare account with Workers Paid ($5/mo) and Workers AI access.
2. `wrangler 4+`.
3. `npm i agents` (the Cloudflare Agents SDK).
4. A static client that records 16kHz PCM via `AudioWorklet`.
5. Familiarity with TypeScript.

## Architecture

```mermaid
flowchart LR
  B[Browser PCM16] -- ws --> W[Worker]
  W -- AI binding --> ST[@cf Whisper]
  W -- AI binding --> LL[@cf Llama 3.3 70B]
  W -- AI binding --> TT[@cf Aura]
  W -- ws --> B
```
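The rest of the post assumes a small wire protocol over that WebSocket: the client sends binary PCM16 frames plus the control string `flush`, and the Worker replies with JSON transcript events on text frames and raw TTS audio on binary frames. A quick sketch of the shapes the later snippets use (these names are conventions of this tutorial, not a Cloudflare API):

```typescript
// Client to Worker: binary frames of PCM16 audio, plus the literal string "flush"
// meaning "the utterance is done, transcribe and reply".
type ClientControl = "flush";

// Worker to Client: JSON transcript events on text frames (Steps 3-4)
// and raw linear16 TTS audio on binary frames (Step 5).
interface TranscriptEvent {
  type: "transcript";
  role: "user" | "assistant";
  text: string;
}
```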

## Step 1 — `wrangler.jsonc`

```jsonc
{
  "name": "callsphere-cf-only",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  "compatibility_flags": ["nodejs_compat"],
  "ai": { "binding": "AI" }
}
```

## Step 2 — Worker that upgrades to WebSocket

```typescript
type Env = { AI: Ai };

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const url = new URL(req.url);
    if (url.pathname !== "/voice") return new Response("nf", { status: 404 });
    const upgrade = req.headers.get("Upgrade");
    if (upgrade !== "websocket") return new Response("ws only", { status: 400 });

    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair) as [WebSocket, WebSocket];
    server.accept();
    handle(server, env);
    return new Response(null, { status: 101, webSocket: client });
  },
};
```
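Before any audio is involved, a quick smoke test against `wrangler dev` (default port 8787) confirms the upgrade path works. A minimal sketch using the global `WebSocket` available in Node 22+ (run it with `npx tsx smoke-test.ts`, or compile it first); only the `/voice` path comes from the Worker above, the rest is throwaway:

```typescript
// smoke-test.ts: verify that /voice upgrades to a WebSocket.
const ws = new WebSocket("ws://localhost:8787/voice");

ws.addEventListener("open", () => {
  console.log("upgrade OK, ready to stream audio");
  ws.close();
});

ws.addEventListener("message", (e) => {
  // Text frames are JSON transcript events; binary frames are TTS audio.
  console.log(typeof e.data === "string" ? JSON.parse(e.data) : "binary frame");
});

ws.addEventListener("error", () => console.error("upgrade failed; check the /voice path"));
```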

## Step 3 — Receive audio + run Whisper

Workers AI Whisper accepts an `audio` field: a plain JavaScript array of unsigned 8-bit values holding the audio bytes (WAV/Opus/raw):

```typescript
async function handle(ws: WebSocket, env: Env) {
  const buffer: number[] = [];
  let history: { role: string; content: string }[] = [
    { role: "system", content: "You are CallSphere on Cloudflare. Reply in 1-2 sentences." },
  ];

  ws.addEventListener("message", async (e) => {
    if (typeof e.data === "string") {
      if (e.data === "flush") await transcribeAndReply(ws, env, buffer, history);
      return;
    }
    const u8 = new Uint8Array(e.data as ArrayBuffer);
    for (const b of u8) buffer.push(b);
  });
}
```

```typescript
async function transcribeAndReply(
  ws: WebSocket, env: Env, buffer: number[],
  history: { role: string; content: string }[]
) {
  const audio = Array.from(buffer);
  buffer.length = 0;
  const stt = await env.AI.run("@cf/openai/whisper-large-v3-turbo", { audio });
  const text = (stt as any).text as string;
  if (!text || text.length < 2) return;

  history.push({ role: "user", content: text });
  ws.send(JSON.stringify({ type: "transcript", role: "user", text }));
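  // ...transcribeAndReply continues in Step 4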
```
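Whisper is happiest with the bytes of a real audio file. If your client ships raw PCM16 with no container, it is safer to prepend a 44-byte WAV header before calling the model. A minimal sketch assuming 16 kHz mono PCM16; the `pcm16ToWav` helper is this post's own, not part of the Workers AI API:

```typescript
// Wrap raw 16-bit mono PCM in a minimal 44-byte WAV header so Whisper sees a valid file.
function pcm16ToWav(pcm: Uint8Array, sampleRate = 16000): Uint8Array {
  const header = new ArrayBuffer(44);
  const v = new DataView(header);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  v.setUint32(4, 36 + pcm.length, true);   // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  v.setUint32(16, 16, true);               // fmt chunk size
  v.setUint16(20, 1, true);                // format: PCM
  v.setUint16(22, 1, true);                // channels: mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true);   // byte rate for 16-bit mono
  v.setUint16(32, 2, true);                // block align
  v.setUint16(34, 16, true);               // bits per sample
  writeStr(36, "data");
  v.setUint32(40, pcm.length, true);
  const out = new Uint8Array(44 + pcm.length);
  out.set(new Uint8Array(header), 0);
  out.set(pcm, 44);
  return out;
}

// Inside transcribeAndReply, this would replace the bare Array.from(buffer):
//   const wav = pcm16ToWav(new Uint8Array(buffer));
//   const stt = await env.AI.run("@cf/openai/whisper-large-v3-turbo", { audio: Array.from(wav) });
```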

## Step 4 — LLM with Llama 3.3 70B

```typescript
  const llm = await env.AI.run("@cf/meta/llama-3.3-70b-instruct", {
    messages: history,
    max_tokens: 200,
  });
  const reply = (llm as any).response as string;
  history.push({ role: "assistant", content: reply });
  ws.send(JSON.stringify({ type: "transcript", role: "assistant", text: reply }));
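  // ...continues in Step 5 (TTS)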
```
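One guard the snippet above leaves out: `history` grows by two messages per turn, which slowly inflates latency and neuron usage. A minimal sketch of capping it to the system prompt plus the most recent messages (the cutoff of six is an arbitrary choice, not a model limit); call it just before the `env.AI.run` call in this step:

```typescript
// Keep the system prompt plus only the most recent messages; older turns are dropped in place.
function trimHistory(
  history: { role: string; content: string }[],
  maxMessages = 6, // user + assistant messages kept after the system prompt
) {
  const [system, ...rest] = history;
  if (rest.length <= maxMessages) return;
  const recent = rest.slice(-maxMessages);
  history.length = 0;
  history.push(system, ...recent);
}
```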

## Step 5 — TTS with Aura, stream chunks back

```typescript
  const tts = await env.AI.run("@cf/deepgram/aura-1", {
    text: reply,
    speaker: "asteria-en",
    encoding: "linear16",
    sample_rate: 16000,
  });
  // tts is a ReadableStream
  const reader = (tts as ReadableStream).getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    ws.send(value);
  }
}
```
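On the browser side those binary frames are raw linear16 at 16 kHz. A minimal sketch of turning them into audible output with Web Audio, assuming the page's `AudioContext` runs at 16 kHz and each frame holds whole 16-bit samples; scheduling one `AudioBufferSourceNode` per chunk is the simplest approach, not the smoothest:

```typescript
// Play raw linear16 (Int16) TTS chunks as they arrive from the WebSocket.
const audioCtx = new AudioContext({ sampleRate: 16000 });
let playhead = 0; // when the next chunk should start, so chunks play back to back

function playPcm16Chunk(chunk: ArrayBuffer) {
  const int16 = new Int16Array(chunk); // assumes an even byte length
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) float32[i] = int16[i] / 32768;

  const buf = audioCtx.createBuffer(1, float32.length, 16000);
  buf.copyToChannel(float32, 0);

  const src = audioCtx.createBufferSource();
  src.buffer = buf;
  src.connect(audioCtx.destination);

  playhead = Math.max(playhead, audioCtx.currentTime);
  src.start(playhead);
  playhead += buf.duration;
}
```

Set `ws.binaryType = "arraybuffer"` on the client socket so these frames arrive as `ArrayBuffer` rather than `Blob`.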

## Step 6 — Browser client (16kHz mic, AudioWorklet)

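A minimal sketch of the capture side. The worklet module name (`recorder-worklet.js`) and processor name (`pcm-recorder`) are placeholders for your own AudioWorklet that posts `Float32Array` frames to the main thread; the rest opens the `/voice` socket, converts mic audio to PCM16, streams binary frames, hands incoming audio to the `playPcm16Chunk` helper sketched above, and sends `flush` when the user stops talking:

```typescript
// client.ts: capture 16 kHz mic audio and stream PCM16 frames to the Worker.
const ws = new WebSocket(`wss://${location.host}/voice`);
ws.binaryType = "arraybuffer";

ws.addEventListener("message", (e) => {
  if (typeof e.data === "string") {
    console.log(JSON.parse(e.data));           // transcript events (Steps 3-4)
  } else {
    playPcm16Chunk(e.data as ArrayBuffer);     // TTS audio (Step 5)
  }
});

async function startMic() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 });

  // recorder-worklet.js registers a "pcm-recorder" processor that posts
  // Float32Array frames back to the main thread; both names are placeholders.
  await ctx.audioWorklet.addModule("recorder-worklet.js");
  const source = ctx.createMediaStreamSource(stream);
  const recorder = new AudioWorkletNode(ctx, "pcm-recorder");
  source.connect(recorder);

  recorder.port.onmessage = (e: MessageEvent<Float32Array>) => {
    // Convert Float32 samples in [-1, 1] to little-endian PCM16 and send the raw bytes.
    const float32 = e.data;
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      const s = Math.max(-1, Math.min(1, float32[i]));
      int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    if (ws.readyState === WebSocket.OPEN) ws.send(int16.buffer);
  };
}

document.querySelector("#start")?.addEventListener("click", () => startMic());
document.querySelector("#stop")?.addEventListener("click", () => ws.send("flush"));
```

In production you would trigger `flush` from a silence detector rather than a stop button, but a button keeps the sketch short.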

## Common pitfalls

- **Whisper expects a plain array, not a typed array** — pass `Array.from(bytes)` (a `number[]`), not the `Uint8Array` itself.
- **Aura sample-rate mismatch** — the `sample_rate` you request from Aura must match the rate of the client `AudioContext` that plays it back (16 kHz here; if you pick another rate, change both sides).
- **Worker CPU cap** — time spent awaiting `AI.run` is I/O wait, not CPU time, so long model calls don't eat into the Worker CPU limit.
- **Audio buffer leaks across turns** — clear the buffer on every `flush`; `transcribeAndReply` does this with `buffer.length = 0`.

## How CallSphere does this in production

We use Cloudflare Workers AI for `/llms-full.txt` rendering and lightweight FAQ agents on landing pages — see [/lp/healthcare](/lp/healthcare) and [/lp/salon](/lp/salon). For full call routing, our 24/7 voice plane stays on dedicated GPUs (37 agents, 6 verticals, 90+ tools, HIPAA + SOC 2). Pricing on [/pricing](/pricing); 14-day [trial](/trial); 22% [affiliate](/affiliate).

## FAQ

**Cost?** Workers AI is per-neuron; ~$0.003 per voice round-trip (Whisper + Llama + Aura).

**Quality vs OpenAI?** Llama 3.3 70B holds its own for short replies; long agentic chains favor GPT-4o.

**Latency?** ~700–900ms end-to-end on the same colo.

**Can I add my own model?** Yes — `@cf/custom/...` via Workers AI Custom Models.

**Persistence?** Pair with Durable Objects (see post #8) for chat history.

## Sources

- [Cloudflare Workers AI models](https://developers.cloudflare.com/workers-ai/models/)
- [Cloudflare blog — voice agents](https://blog.cloudflare.com/voice-agents/)
- [Cloudflare blog — best place for realtime voice](https://blog.cloudflare.com/cloudflare-realtime-voice-ai/)
- [Whisper-large-v3-turbo on Workers AI](https://developers.cloudflare.com/workers-ai/models/whisper-large-v3-turbo/)

