By Sagar Shankaran, Founder of CallSphere
Build a voice agent that runs entirely on Cloudflare's edge: Calls SFU for WebRTC, withVoice mixin for STT/TTS, Workers AI for inference. No external infra, sub-300ms hops.
Key takeaways
TL;DR: Cloudflare's Agents SDK now ships withVoice, a mixin that adds STT (Deepgram Nova/Flux), sentence chunking, TTS (Aura), and conversation persistence to a regular Agent class. Combined with Cloudflare Calls (their WebRTC SFU) and Workers AI, you get an end-to-end voice agent on the same edge network, with no external API keys for the happy path.
This guide builds a @cloudflare/voice-powered Agent deployed to Workers, with audio transported over WebSocket from a browser client. The agent uses Workers AI's Llama 3.3 70B for reasoning, Deepgram Flux for streaming STT, and Aura for TTS, all bound natively. Optionally, hand the WebSocket to Cloudflare Calls for browser-to-browser group voice rooms with the AI as a participant.
wrangler@4 CLI authenticated.

flowchart LR
B[Browser React] -->|wss| AGT[Worker Agent withVoice]
AGT -->|STT| DG[Deepgram Flux Workers AI]
AGT -->|LLM| WA[Workers AI Llama 3.3 70B]
AGT -->|TTS| AU[Deepgram Aura]
AGT --> B
B <-->|WebRTC SFU| CALLS[Cloudflare Calls]
```bash
npm create cloudflare@latest voice-agent -- --type=workers-ai
cd voice-agent
npm install @cloudflare/voice agents
```
```ts
// src/agent.ts
import { Agent } from "agents";
import { withVoice } from "@cloudflare/voice";

export class Receptionist extends withVoice(Agent) {
  async onChatMessage(message: string) {
    const reply = await this.env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
      messages: [
        { role: "system", content: "You are a friendly receptionist. Keep replies short." },
        { role: "user", content: message }
      ]
    });
    return reply.response;
  }
}
```
The withVoice mixin handles audio in/out automatically; you only implement onChatMessage(text).
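To make the sentence-chunking step concrete: streamed LLM tokens are usually buffered until a sentence boundary appears, so TTS can start speaking before the full reply has arrived. The sketch below illustrates that idea under simple assumptions and is not the withVoice implementation; `createSentenceChunker` and its callback are hypothetical names.

```typescript
// Hypothetical helper: buffers streamed tokens and emits whole sentences.
// Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
// Abbreviations such as "Dr. Smith" would be split too early by this rule.
function createSentenceChunker(onSentence: (sentence: string) => void) {
  let buffer = "";
  const boundary = /^([\s\S]*?[.!?])\s+/;

  return {
    // Append one streamed token and emit every sentence that is now complete.
    push(token: string): void {
      buffer += token;
      let match = boundary.exec(buffer);
      while (match !== null) {
        onSentence(match[1].trim());
        buffer = buffer.slice(match[0].length);
        match = boundary.exec(buffer);
      }
    },
    // Emit whatever is left once the model has finished generating.
    flush(): void {
      const rest = buffer.trim();
      buffer = "";
      if (rest.length > 0) onSentence(rest);
    },
  };
}
```

A sentence is only emitted once the token after its terminator arrives, which costs one token of delay but avoids cutting off "..." or a closing quote.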
```toml
name = "voice-agent"
main = "src/index.ts"
compatibility_date = "2026-05-01"

[ai]
binding = "AI"

[[durable_objects.bindings]]
name = "RECEPTIONIST"
class_name = "Receptionist"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["Receptionist"]
```
```ts
import { Receptionist } from "./agent";
export { Receptionist };

export default {
  async fetch(req: Request, env: Env) {
    const url = new URL(req.url);
    if (url.pathname === "/voice") {
      const id = env.RECEPTIONIST.idFromName(
        url.searchParams.get("session") ?? crypto.randomUUID()
      );
      return env.RECEPTIONIST.get(id).fetch(req);
    }
    return new Response("ok");
  }
};
```
```tsx
import { VoiceClient } from "@cloudflare/voice/client";

const client = new VoiceClient({
  url: `wss://voice-agent.you.workers.dev/voice?session=${crypto.randomUUID()}`,
  inputSampleRate: 16000,
  outputSampleRate: 24000,
});
await client.start();
```
The client handles getUserMedia, AudioWorklet capture, WS framing, and Web Audio playback.
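For readers capturing audio themselves, the conversion the client performs can be approximated as follows. This is a minimal sketch, assuming the microphone delivers Float32 samples at a higher rate (for example 48 kHz) and the agent wants 16 kHz PCM16 mono; `downsample` and `floatToPcm16` are hypothetical helpers, not part of @cloudflare/voice.

```typescript
// Linear-interpolation downsampler. There is no low-pass filter here, so
// content above the new Nyquist frequency will alias; acceptable for a sketch.
function downsample(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (toRate > fromRate) throw new Error("downsample only reduces the sample rate");
  if (toRate === fromRate) return input;
  const ratio = fromRate / toRate;
  const out = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Convert Web Audio floats in [-1, 1] to signed 16-bit PCM, clamping overshoot.
function floatToPcm16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

In a browser, the two calls would typically run inside an AudioWorklet, and the resulting `Int16Array` buffer would be sent as a binary WebSocket frame.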
In Receptionist.constructor:
```ts
super(state, env, {
  voice: {
    stt: { provider: "deepgram-flux", model: "flux", language: "en" },
    tts: { provider: "deepgram-aura", voice: "aura-asteria-en" },
    vad: { silenceThresholdMs: 500 }
  }
});
```
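To show what a setting like `silenceThresholdMs: 500` controls, here is a minimal energy-based end-of-turn detector. It is an illustrative sketch only; the mixin's actual VAD is not described in this article, and `createEndOfTurnDetector`, `frameMs`, and `energyThreshold` are assumed names and parameters.

```typescript
interface VadOptions {
  silenceThresholdMs: number; // how much trailing silence ends the turn
  frameMs: number;            // duration of each audio frame pushed in
  energyThreshold: number;    // RMS level above which a frame counts as speech
}

// Root-mean-square energy of one PCM16 frame.
function frameRms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Returns a function that takes frames and reports true when the turn has ended.
function createEndOfTurnDetector(opts: VadOptions): (frame: Int16Array) => boolean {
  let silentMs = 0;
  let heardSpeech = false;
  return (frame: Int16Array): boolean => {
    if (frameRms(frame) >= opts.energyThreshold) {
      heardSpeech = true;
      silentMs = 0;
      return false;
    }
    if (!heardSpeech) return false; // ignore silence before the user speaks
    silentMs += opts.frameMs;
    if (silentMs >= opts.silenceThresholdMs) {
      heardSpeech = false; // reset for the next turn
      silentMs = 0;
      return true;
    }
    return false;
  };
}
```

A lower threshold makes the agent answer sooner but risks cutting in during a natural pause; a higher one feels slower but interrupts less.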
If you want the AI to join a multi-party WebRTC room (caller + AI + supervisor), use Cloudflare Calls' SFU API to publish/subscribe tracks. The agent's withVoice audio output becomes a track published into the SFU room; humans subscribe via standard WebRTC.
`` fetch("https://rtc.live.cloudflare.com/v1/apps/{appId}/sessions/new", { method: "POST", headers: { Authorization: `Bearer ${env.CALLS_TOKEN}` } }) ``
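The session-creation call above can be wrapped in a small helper that builds the request without sending it, which keeps the URL and header logic easy to check. This sketch uses only the endpoint and Authorization header shown above; `buildNewSessionRequest` is a hypothetical name, and the response shape and the follow-up track publish/subscribe calls are not covered here.

```typescript
// Hypothetical helper: builds the POST request that creates a new Calls session.
// appId and token are supplied by the caller (for example from env bindings).
function buildNewSessionRequest(appId: string, token: string): Request {
  const url = `https://rtc.live.cloudflare.com/v1/apps/${encodeURIComponent(appId)}/sessions/new`;
  return new Request(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
}
```

Inside the Worker this would be used as `await fetch(buildNewSessionRequest(appId, env.CALLS_TOKEN))`, where `appId` is your Calls application ID.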
The withVoice mixin runs inside a Durable Object so this isn't a problem, but plain Workers wouldn't work.
Use Promise.race against a smaller model (8B) for first-token TTS.
Prefer Flux over Nova for STT; Nova is batch and adds 200ms.
withVoice expects PCM16 mono; resample on the client if your mic capture differs.

CallSphere doesn't deploy the voice path on Cloudflare Workers because our HIPAA Healthcare vertical needs Postgres-resident audit logs that fit our 115+ table schema. We do use Cloudflare in front of FastAPI :8084 as a CDN + DDoS layer, and we've experimented with Workers AI for the OneRoof multi-family vertical's chat fallback. 37 voice agents, 90+ tools, 6 verticals, $149/$499/$1499, 14-day trial, 22% affiliate.
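Returning to the Promise.race tip among the pitfalls above, one possible reading is: give the large model a short latency budget, and if it misses it, speak whichever model answers first. The sketch below assumes that interpretation; `replyWithinBudget`, the `Generate` type, and the budget value are hypothetical, and the model calls are injected so the function does not depend on any particular API.

```typescript
type Generate = (prompt: string) => Promise<string>;
interface Reply { source: "large" | "small"; text: string; }

// Prefer the large model if it answers within budgetMs; otherwise take
// whichever model finishes first. Both requests start immediately.
async function replyWithinBudget(
  prompt: string,
  large: Generate,
  small: Generate,
  budgetMs: number
): Promise<Reply> {
  const largeP = large(prompt).then((text): Reply => ({ source: "large", text }));
  const smallP = small(prompt).then((text): Reply => ({ source: "small", text }));
  // Mark a small-model failure as handled in case the large model wins early.
  smallP.catch(() => undefined);

  let timer: ReturnType<typeof setTimeout> | undefined;
  const budget = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), budgetMs);
  });

  try {
    const early = await Promise.race([largeP, budget]);
    if (early !== null) return early;
    // Budget exceeded: fall back to the first model that settles.
    return await Promise.race([largeP, smallP]);
  } finally {
    clearTimeout(timer);
  }
}
```

Running both models on every turn costs extra inference, so this trade only makes sense where first-response latency matters more than token spend.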
Q: How do I add my own STT/TTS provider?
withVoice accepts stt: { provider: "custom", run: async (pcm) => string }. Plug in OpenAI Whisper, Azure, anything.
Q: Can I use Workers AI alone without Deepgram?
Yes. Workers AI ships @cf/openai/whisper-large-v3-turbo for STT and @cf/myshell-ai/melotts for TTS. Quality is lower than Deepgram's Flux and Aura, but both are free under the binding.
Q: Latency target?
~500-700ms voice-to-voice on Cloudflare's network because everything is colocated. The win vs other clouds is no inter-service hops.
Q: PSTN?
Use Twilio Media Streams as a SIP frontend; the Worker handles the Media Stream WS and translates frames to withVoice's expected format.
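The frame translation mentioned above mostly comes down to decoding Twilio's audio: Media Streams delivers JSON messages whose `media.payload` is base64-encoded 8 kHz G.711 mu-law. A minimal decoding sketch follows; `mulawToPcm16` and `decodeTwilioMedia` are hypothetical helper names, and upsampling from 8 kHz to the agent's input rate is left out.

```typescript
interface TwilioMediaMessage {
  event: string;
  media?: { payload: string };
}

// Standard G.711 mu-law expansion of one byte to a signed 16-bit sample.
function mulawToPcm16(byte: number): number {
  const u = ~byte & 0xff;
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Returns PCM16 samples for "media" events and null for every other event
// (for example "start" or "stop") or for media events without a payload.
function decodeTwilioMedia(rawMessage: string): Int16Array | null {
  const msg = JSON.parse(rawMessage) as TwilioMediaMessage;
  if (msg.event !== "media" || !msg.media) return null;
  const binary = atob(msg.media.payload);
  const out = new Int16Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    out[i] = mulawToPcm16(binary.charCodeAt(i));
  }
  return out;
}
```

The decoded samples are still 8 kHz mono, so they need resampling before being passed to an agent configured for a 16 kHz input.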
Q: Cost?
Workers AI Llama 3.3 70B is $0.40 per 1M input tokens and $0.60 per 1M output tokens. Deepgram Flux STT is included in the @cloudflare/voice binding pricing. Call it $0.005/min STT + $0.012/min TTS + $0.02/min LLM = $0.037/min, or roughly $0.04/min all-in.
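The per-minute estimate is easy to turn into a monthly figure. The sketch below simply adds the three per-minute rates quoted in this answer and multiplies by call volume; the rates are the article's estimates rather than published prices, and the function names are illustrative.

```typescript
interface PerMinuteRates {
  stt: number; // USD per minute of speech-to-text
  tts: number; // USD per minute of text-to-speech
  llm: number; // USD per minute of LLM inference
}

// Rates quoted in the answer above (estimates, not official pricing).
const articleRates: PerMinuteRates = { stt: 0.005, tts: 0.012, llm: 0.02 };

// Total cost of one minute of conversation.
function costPerMinute(rates: PerMinuteRates): number {
  return rates.stt + rates.tts + rates.llm;
}

// Total cost for a given number of call minutes, rounded to whole cents.
function monthlyCost(minutes: number, rates: PerMinuteRates): number {
  return Math.round(minutes * costPerMinute(rates) * 100) / 100;
}
```

At these rates, 10,000 call minutes in a month comes to about $370.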
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.