By Sagar Shankaran, Founder of CallSphere
Stream microphone audio from a browser to FastAPI, fan out to OpenAI Realtime over WebSocket, and play model audio back — full Python tutorial with PCM16 24kHz.
Key takeaways
TL;DR: FastAPI's WebSocket route fits naturally between a browser microphone and OpenAI Realtime. Use PCM16 at 24kHz, run two async tasks per session, and you get a clean speech-to-speech loop in ~120 lines of Python.
A FastAPI server that accepts a browser WebSocket carrying PCM16 24kHz audio chunks, forwards them to OpenAI Realtime, and streams model audio deltas back. A simple HTML page captures the microphone, downsamples to 24kHz Int16, and plays the response through the Web Audio API. End-to-end latency: 700–1100ms.
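To make the audio format concrete, here is a minimal sketch of the arithmetic behind PCM16 at 24kHz mono. The 20 ms chunk size and the helper names `chunk_bytes` and `float_to_pcm16` are assumptions for illustration; the article does not fix a chunk size.

```python
import base64
import struct

SAMPLE_RATE = 24_000    # Hz, the rate used throughout this tutorial
BYTES_PER_SAMPLE = 2    # PCM16 = signed 16-bit samples, mono


def chunk_bytes(ms: int) -> int:
    """Raw PCM16 byte count for `ms` milliseconds of mono audio."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * ms // 1000


def float_to_pcm16(samples: list[float]) -> bytes:
    """Clamp floats to [-1, 1] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


one_second = chunk_bytes(1000)   # 48000 bytes
twenty_ms = chunk_bytes(20)      # 960 bytes (hypothetical chunk size)
# base64 turns every 3 bytes into 4 characters, so the JSON payload is larger.
b64_len = len(base64.b64encode(bytes(twenty_ms)))  # 1280 characters
```

The base64 overhead only applies on the server-to-OpenAI leg; the browser leg in this design carries raw binary frames.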
- pip install fastapi uvicorn websockets
- OPENAI_API_KEY exported in your shell
- pip install python-dotenv for env loading

flowchart LR
Mic[Browser Mic Float32] --> DS[Downsample 24kHz Int16]
DS -- WS --> FA[FastAPI /ws]
FA -- WS --> OA[OpenAI Realtime]
OA -- audio.delta --> FA
FA -- WS --> AP[AudioPlayer Web Audio]
```python
import asyncio
import base64
import json
import os

import websockets
from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse

app = FastAPI()
OPENAI_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}
```
```python
SESSION = {
    "type": "session.update",
    "session": {
        "voice": "alloy",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": {"model": "whisper-1"},
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.55,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
        },
        "instructions": "You are a concise voice assistant. Reply in 1-2 short sentences.",
    },
}
```
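If you expect to tune the VAD settings per deployment, one option is to build the `session.update` message from a base dict. This is a sketch only: `BASE_SESSION` and `session_update` are hypothetical names, and the field names are copied from the payload above rather than from a full schema.

```python
import copy
import json

# Field names copied from the article's SESSION payload (trimmed for brevity).
BASE_SESSION = {
    "type": "session.update",
    "session": {
        "voice": "alloy",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.55,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
        },
    },
}


def session_update(**vad_overrides) -> str:
    """Return a session.update JSON string with turn_detection fields overridden."""
    msg = copy.deepcopy(BASE_SESSION)  # never mutate the shared default
    vad = msg["session"]["turn_detection"]
    unknown = set(vad_overrides) - set(vad)
    if unknown:
        # Reject typos early instead of sending a field the server ignores.
        raise KeyError(f"unknown turn_detection field(s): {sorted(unknown)}")
    vad.update(vad_overrides)
    return json.dumps(msg)
```

For example, `session_update(silence_duration_ms=800)` waits longer before treating a pause as the end of a turn.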
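Use asyncio.gather so each direction runs independently. Don't await one before pumping the other: if the uplink loop waits on the downlink loop (or the reverse), microphone audio stalls while the model is speaking and playback turns choppy.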
```python
@app.websocket("/ws")
async def ws(client: WebSocket):
    await client.accept()
    async with websockets.connect(OPENAI_URL, additional_headers=HEADERS) as oai:
        await oai.send(json.dumps(SESSION))

        async def client_to_oai():
            # Browser -> OpenAI: wrap each raw PCM16 chunk in an append event.
            try:
                while True:
                    chunk = await client.receive_bytes()  # raw int16 PCM
                    await oai.send(json.dumps({
                        "type": "input_audio_buffer.append",
                        "audio": base64.b64encode(chunk).decode(),
                    }))
            except Exception:
                pass  # browser disconnected or the upstream socket closed
            finally:
                # Closing upstream ends the `async for` in oai_to_client,
                # so gather() returns instead of waiting on a dead session.
                await oai.close()

        async def oai_to_client():
            # OpenAI -> browser: audio as binary frames, transcripts as JSON text.
            async for raw in oai:
                ev = json.loads(raw)
                if ev["type"] == "response.audio.delta":
                    pcm = base64.b64decode(ev["delta"])
                    await client.send_bytes(pcm)
                elif ev["type"] == "response.audio_transcript.done":
                    await client.send_text(json.dumps({"role": "assistant",
                                                       "text": ev["transcript"]}))

        await asyncio.gather(client_to_oai(), oai_to_client())
```
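The dispatch inside `oai_to_client` can also be pulled out into a pure function, which makes it testable without a live socket. A minimal sketch, assuming only the two event types handled above; `route_event` and `Frame` are hypothetical names.

```python
import base64
import json
from typing import Optional, Union

Frame = Union[bytes, str]  # bytes -> binary WS frame, str -> text WS frame


def route_event(raw: str) -> Optional[Frame]:
    """Map one Realtime server event to the frame the browser should receive.

    Returns None for event types this relay does not forward.
    """
    ev = json.loads(raw)
    kind = ev.get("type")
    if kind == "response.audio.delta":
        # Audio arrives base64-encoded; the browser expects raw PCM16 bytes.
        return base64.b64decode(ev["delta"])
    if kind == "response.audio_transcript.done":
        return json.dumps({"role": "assistant", "text": ev["transcript"]})
    return None
```

Inside the loop you would then send `bytes` results with `client.send_bytes` and `str` results with `client.send_text`.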
```html
```
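```js
// Assumes ws.binaryType = "arraybuffer" and ctx = new AudioContext({ sampleRate: 24000 }).
let playhead = 0; // AudioContext time at which the next chunk should start

ws.onmessage = (e) => {
  if (typeof e.data === "string") return; // transcript JSON, not audio
  const i16 = new Int16Array(e.data);
  const f32 = new Float32Array(i16.length);
  for (let i = 0; i < i16.length; i++) f32[i] = i16[i] / 0x7fff; // int16 -> [-1, 1]
  const buf = ctx.createBuffer(1, f32.length, 24000);
  buf.copyToChannel(f32, 0);
  const s = ctx.createBufferSource();
  s.buffer = buf;
  s.connect(ctx.destination);
  // Queue chunks back to back; starting each one on arrival makes them overlap.
  playhead = Math.max(playhead, ctx.currentTime);
  s.start(playhead);
  playhead += buf.duration;
};
```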
```bash
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```
Open http://localhost:8000 (serve the HTML separately or mount it on FastAPI), grant mic access, and start talking. The model should reply within ~1 second.
- Create the audio context with new AudioContext({ sampleRate: 24000 }) so playback matches the model's output rate.
- Convert microphone samples to an Int16Array, then send its .buffer.
- Run both directions concurrently with asyncio.gather.
- additional_headers vs extra_headers: depends on your websockets lib version. The legacy client takes extra_headers; the newer asyncio client, which websockets.connect resolves to from version 14, takes additional_headers.

CallSphere's Healthcare line uses this exact PCM16 24kHz pattern with server VAD at 0.55, chosen because clinicians often pause mid-sentence and a stricter threshold cuts them off. After each call we run a post-call analytics job that scores sentiment (–1.0 to 1.0) and lead intent (0–100) from the transcript. The Salon vertical adds 4 specialist agents and ElevenLabs voices with GB-YYYYMMDD-### booking refs.
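For the additional_headers versus extra_headers pitfall, one way to avoid hard-coding a version check is to inspect the `connect` callable you actually imported. This is a sketch; `header_kwarg` is a hypothetical helper, and it assumes the callable exposes the header parameter by name in its signature.

```python
import inspect
from typing import Callable


def header_kwarg(connect: Callable) -> str:
    """Pick the keyword that `connect` uses for custom HTTP headers."""
    params = inspect.signature(connect).parameters
    return "additional_headers" if "additional_headers" in params else "extra_headers"


# Usage sketch (not executed here, since it needs the websockets package):
#   kw = header_kwarg(websockets.connect)
#   async with websockets.connect(OPENAI_URL, **{kw: HEADERS}) as oai:
#       ...
```

The fallback to `extra_headers` is a guess for anything that does not advertise `additional_headers`, so treat a connection error on the first attempt as a prompt to check your installed version.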
Why PCM16 24kHz instead of mu-law? Browsers can't encode mu-law cheaply, but PCM16 is one downsample step away from getUserMedia output. Mu-law is for telephony.
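Can I use asyncio.create_task? Yes. Note that gather does not cancel the other coroutine when one raises; it only propagates the first exception to the caller. Cleanup still happens here because the exception leaves the async with block, which closes the OpenAI socket and ends the other loop. On Python 3.11+, asyncio.TaskGroup cancels the remaining tasks for you.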
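If you prefer shutdown to be explicit, one pattern is to wait for whichever direction finishes first and cancel the other. A minimal sketch using only the standard library; `run_until_first_done` and the `demo` coroutines are hypothetical stand-ins for the two pump loops.

```python
import asyncio


async def run_until_first_done(*coros) -> None:
    """Run coroutines concurrently; when one finishes, cancel the rest."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    # Let cancelled tasks unwind before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    for t in done:
        t.result()  # re-raise if the finished task failed


async def demo() -> list:
    cancelled = []

    async def short():
        await asyncio.sleep(0.01)      # e.g. the browser disconnects

    async def forever():
        try:
            await asyncio.sleep(3600)  # e.g. still waiting on upstream events
        except asyncio.CancelledError:
            cancelled.append("forever")
            raise

    await run_until_first_done(short(), forever())
    return cancelled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the route handler this would replace the final `asyncio.gather(...)` call, at the cost of a few more lines.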
How do I add streaming text output? Subscribe to response.audio_transcript.delta and forward strings — useful for live captions.
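A small accumulator is one way to turn transcript deltas into a running caption. This sketch assumes each delta event carries `delta` and `item_id` fields; `CaptionBuffer` is a hypothetical helper, not part of any SDK.

```python
import json
from collections import defaultdict
from typing import Optional


class CaptionBuffer:
    """Accumulate response.audio_transcript.delta events into caption text."""

    def __init__(self) -> None:
        self._parts = defaultdict(list)  # item_id -> list of text fragments

    def feed(self, raw: str) -> Optional[str]:
        """Return the caption so far for this item, or None for other events."""
        ev = json.loads(raw)
        if ev.get("type") != "response.audio_transcript.delta":
            return None
        key = ev.get("item_id", "")  # assumed field; falls back to one shared caption
        self._parts[key].append(ev["delta"])
        return "".join(self._parts[key])
```

Each non-None return value can be forwarded to the browser with `client.send_text` and rendered as a live caption.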
Production hosting? Deploy to Fly.io or k3s. Keep one process per region; FastAPI scales horizontally just fine.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.