Build a Voice Agent on Cloudflare Workers AI (No External LLM)
Run STT, LLM, and TTS entirely on Cloudflare's edge — no OpenAI, no ElevenLabs. Real working code with Whisper, Llama 3.3 70B, and Deepgram Aura.
TL;DR — Cloudflare Workers AI ships Whisper, Llama 3.3 70B, and Deepgram Aura behind one AI binding. Build a voice agent with zero external API keys, zero per-token surprise bills, and global edge co-location for free.
What you'll build
A Worker that takes a WebSocket of PCM16 audio frames, transcribes via @cf/openai/whisper-large-v3-turbo, generates a reply via @cf/meta/llama-3.3-70b-instruct, synthesizes via @cf/deepgram/aura-1, and streams audio back. End-to-end on the Cloudflare edge.
Prerequisites
- Cloudflare account with Workers Paid ($5/mo) and Workers AI access.
- wrangler 4+.
- npm i agents (the Cloudflare Agents SDK).
- A static client that records 16kHz PCM via AudioWorklet.
- Familiarity with TypeScript.
Architecture
```mermaid
flowchart LR
  B[Browser PCM16] -- ws --> W[Worker]
  W -- AI binding --> ST[@cf Whisper]
  W -- AI binding --> LL[@cf Llama 3.3 70B]
  W -- AI binding --> TT[@cf Aura]
  W -- ws --> B
```
Step 1 — wrangler.jsonc
```jsonc
{
  "name": "callsphere-cf-only",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  "compatibility_flags": ["nodejs_compat"],
  "ai": { "binding": "AI" }
}
```
Step 2 — Worker that upgrades to WebSocket
```typescript
type Env = { AI: Ai };

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair) as [WebSocket, WebSocket];
    server.accept();
    handle(server, env);
    return new Response(null, { status: 101, webSocket: client });
  },
};
```
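The handler above assumes every request is a WebSocket upgrade. A small guard (my addition, not part of the original snippet) rejects plain HTTP requests before the upgrade is attempted:

```typescript
// Hypothetical guard: return 426 Upgrade Required for non-WebSocket requests.
function checkUpgrade(headers: Headers): Response | null {
  const upgrade = headers.get("Upgrade");
  if (upgrade?.toLowerCase() !== "websocket") {
    return new Response("Expected Upgrade: websocket", { status: 426 });
  }
  return null; // null means: proceed with the upgrade
}
```

Call it first thing in fetch — `const err = checkUpgrade(req.headers); if (err) return err;` — so browsers hitting the URL directly get a clean error instead of a failed WebSocketPair.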
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Receive audio + run Whisper
Workers AI Whisper takes its audio input as a plain array of byte values (the bytes of a WAV, Opus, or raw audio file):
```typescript
async function handle(ws: WebSocket, env: Env) {
  const buffer: number[] = [];
  let history: { role: string; content: string }[] = [
    { role: "system", content: "You are CallSphere on Cloudflare. Reply in 1-2 sentences." },
  ];

  ws.addEventListener("message", async (e) => {
    if (typeof e.data === "string") {
      if (e.data === "flush") await transcribeAndReply(ws, env, buffer, history);
      return;
    }
    const u8 = new Uint8Array(e.data as ArrayBuffer);
    for (const b of u8) buffer.push(b);
  });
}
```
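Pushing bytes one at a time into a number[] works, but it allocates heavily on long utterances. An alternative sketch (my assumption, not from the original code): keep each incoming frame as a Uint8Array and concatenate once at flush time:

```typescript
// Concatenate buffered audio frames into one contiguous byte array.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
// Whisper still wants a plain array: Array.from(concatChunks(chunks))
```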
```typescript
async function transcribeAndReply(
  ws: WebSocket,
  env: Env,
  buffer: number[],
  history: { role: string; content: string }[]
) {
  const audio = Array.from(buffer);
  buffer.length = 0;

  const stt = await env.AI.run("@cf/openai/whisper-large-v3-turbo", { audio });
  const text = (stt as any).text as string;
  if (!text || text.length < 2) return;

  history.push({ role: "user", content: text });
  ws.send(JSON.stringify({ type: "transcript", role: "user", text }));
```
Step 4 — LLM with Llama 3.3 70B
```typescript
  const llm = await env.AI.run("@cf/meta/llama-3.3-70b-instruct", {
    messages: history,
    max_tokens: 200,
  });
  const reply = (llm as any).response as string;
  history.push({ role: "assistant", content: reply });
  ws.send(JSON.stringify({ type: "transcript", role: "assistant", text: reply }));
```
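Note that history grows every turn, so a long call will eventually spill past the model's context window. A trimming sketch (a hypothetical helper, not in the original code) that keeps the system prompt plus the most recent exchanges:

```typescript
type Msg = { role: string; content: string };

// Keep the system prompt plus the last maxTurns user/assistant exchanges.
function trimHistory(history: Msg[], maxTurns = 8): Msg[] {
  const system = history.filter((m) => m.role === "system");
  const turns = history.filter((m) => m.role !== "system");
  return [...system, ...turns.slice(-maxTurns * 2)];
}
```

Swap it in at the call site — `messages: trimHistory(history)` — and the full transcript can still live in memory for logging.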
Step 5 — TTS with Aura, stream chunks back
```typescript
  const tts = await env.AI.run("@cf/deepgram/aura-1", {
    text: reply,
    speaker: "asteria-en",
    encoding: "linear16",
    sample_rate: 16000,
  });

  // tts is a ReadableStream of raw PCM16 — forward each chunk as it arrives
  const reader = (tts as ReadableStream<Uint8Array>).getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    ws.send(value);
  }
  // Hypothetical end-of-audio marker so the client knows playback can finish
  ws.send(JSON.stringify({ type: "audio_end" }));
}
```
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 6 — Browser client (16kHz mic, AudioWorklet)
The original client code is omitted here; a minimal sketch (assuming a pcm-worklet.js AudioWorklet module that posts Float32Array frames to its port) looks like:

```html
<button id="talk">Hold to talk</button>
<script type="module">
  const ws = new WebSocket(`wss://${location.host}/`);
  ws.binaryType = "arraybuffer";

  const ctx = new AudioContext({ sampleRate: 16000 });
  await ctx.audioWorklet.addModule("pcm-worklet.js");
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  const node = new AudioWorkletNode(ctx, "pcm-worklet");
  ctx.createMediaStreamSource(mic).connect(node);

  // Convert each Float32 frame to little-endian PCM16 and ship it
  node.port.onmessage = ({ data }) => {
    const f32 = data;
    const i16 = new Int16Array(f32.length);
    for (let i = 0; i < f32.length; i++) {
      const s = Math.max(-1, Math.min(1, f32[i]));
      i16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    if (ws.readyState === WebSocket.OPEN) ws.send(i16.buffer);
  };

  const talk = document.getElementById("talk");
  talk.onmousedown = () => ctx.resume();      // audio needs a user gesture
  talk.onmouseup = () => ws.send("flush");    // end of utterance
</script>
```
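Coming back the other way, the browser receives raw PCM16 chunks from Aura. A decoding sketch (assuming linear16, little-endian, 16 kHz, matching the TTS call above) that converts a chunk into Float32 samples for Web Audio playback:

```typescript
// Convert little-endian PCM16 bytes to Float32 samples in [-1, 1).
function pcm16ToFloat(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // true = little-endian
  }
  return out;
}
// Playback idea: copy the result into ctx.createBuffer(1, out.length, 16000)
// and schedule it with an AudioBufferSourceNode, queueing chunks back to back.
```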
Common pitfalls
- Whisper expects a plain array, not a typed array — use Array.from.
- Aura sample-rate mismatch — match the client AudioContext rate (16k or 24k).
- Worker CPU cap — large LLM calls run as async AI.run; CPU time is fine.
- Audio buffer leaks across sessions — reset the buffer on each flush.
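The flush message above has to come from somewhere. A crude way for the client to send it automatically (a sketch assuming Float32 frames from the worklet; production systems would use a real VAD):

```typescript
// RMS energy check: true when a frame is quieter than the threshold.
function isSilent(samples: Float32Array, threshold = 0.01): boolean {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length) < threshold;
}
// Client idea: count consecutive silent frames, and after ~600ms of
// continuous silence send ws.send("flush"), then reset the counter.
```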
How CallSphere does this in production
We use Cloudflare Workers AI for /llms-full.txt rendering and lightweight FAQ agents on landing pages — see /lp/healthcare and /lp/salon. For full call routing, our 24/7 voice plane stays on dedicated GPUs (37 agents, 6 verticals, 90+ tools, HIPAA + SOC 2). Pricing on /pricing; 14-day trial; 22% affiliate.
FAQ
Cost? Workers AI is per-neuron; ~$0.003 per voice round-trip (Whisper + Llama + Aura).
Quality vs OpenAI? Llama 3.3 70B holds its own for short replies; long agentic chains favor GPT-4o.
Latency? ~700–900ms end-to-end on the same colo.
Can I add my own model? Yes — @cf/custom/... via Workers AI Custom Models.
Persistence? Pair with Durable Objects (see post #8) for chat history.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.