By Sagar Shankaran, Founder of CallSphere
How to run FastAPI WebSockets in production for AI voice agents: connection managers, the broadcast bug everyone hits, JWT auth on upgrade, and Uvicorn worker tuning.
Key takeaways
One synchronous database call blocks every WebSocket connection on the worker. That is the bug that takes FastAPI WebSocket deployments down at 2 a.m. on day 17.
flowchart TD
Client[Client] --> Edge[Cloudflare Worker]
Edge -->|WS upgrade| DO[Durable Object]
DO --> AI[(OpenAI Realtime WS)]
AI --> DO
DO --> Client
DO -.hibernation.-> Storage[(Persisted state)]

Blocking is this dangerous because FastAPI's superpower, async request handling, is also its sharpest knife. A single worker can serve thousands of concurrent WebSocket connections through cooperative multitasking. The moment any handler blocks the event loop, every other client on that worker freezes. So the production pattern is not "make WebSockets work" but "audit every line in a WebSocket handler for accidental blocking."
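A minimal sketch of that failure mode, using only the standard library. `slow_sync_query` is a hypothetical stand-in for a synchronous database driver call, and the ticker plays the role of every other connection on the same worker. This is an illustration under those assumptions, not code from the CallSphere service.

```python
import asyncio
import time


def slow_sync_query() -> str:
    """Hypothetical stand-in for a synchronous DB call (blocks for 200 ms)."""
    time.sleep(0.2)
    return "row"


async def blocking_handler() -> None:
    # Runs the sync call directly on the event loop: nothing else can run.
    slow_sync_query()


async def offloaded_handler() -> None:
    # Runs the sync call in a worker thread: the event loop stays free.
    await asyncio.to_thread(slow_sync_query)


async def count_ticks(handler) -> int:
    """Count how often a 10 ms ticker gets to run while `handler` executes."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # let the ticker start before the handler runs
    await handler()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks


async def main() -> tuple:
    blocked = await count_ticks(blocking_handler)
    offloaded = await count_ticks(offloaded_handler)
    return blocked, offloaded
```

Running `asyncio.run(main())` should show the ticker starved while the blocking handler runs and ticking normally once the call is offloaded. `asyncio.to_thread` needs Python 3.9 or later; an async driver such as asyncpg avoids the thread hop entirely.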
The good news: when done right, FastAPI WebSockets are lighter and more debuggable than Node.js equivalents, with full access to Python's audio/ML stack and first-class async DB drivers like asyncpg and motor.
A correct FastAPI WebSocket service has six pieces:
A connection manager that maps user_id → WebSocket in memory plus Redis.
A worker count of min(2 × CPU, 4), because each worker holds independent connection state.

Skip any one of these and you ship a service that works in dev and corrupts in prod.
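The worker-count rule can be written down as a small helper. This is a sketch, and `worker_count` is a hypothetical name, not part of Uvicorn or FastAPI.

```python
import os
from typing import Optional


def worker_count(cpus: Optional[int] = None) -> int:
    """Apply the min(2 x CPU, 4) rule; assume 1 CPU if it cannot be detected."""
    detected = cpus if cpus is not None else (os.cpu_count() or 1)
    return min(2 * detected, 4)
```

The result can be passed to Uvicorn's `--workers` option. The cap matters because each worker keeps its own in-memory connection map, so more workers means more cross-worker fan-out through Redis.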
The CallSphere Healthcare voice agent runs on FastAPI port 8084, accepting OpenAI Realtime API connections over WebSocket. The handler is roughly 280 lines. We use:
Authentication before websocket.accept().
A connection manager keyed by session_id, plus Redis for cross-worker fan-out.
asyncpg for the audit log, so the event loop never blocks on Postgres.

That FastAPI service is one part of a platform that coordinates 115+ database tables and 90+ tools, with HIPAA controls applied to every span.
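The upgrade should only be accepted once the caller's token has been validated. The sketch below shows just the token check, written against the standard library so it runs anywhere. It handles HS256 only, and `sign_hs256`, `verify_hs256`, and the `sub`/`exp` claims are illustrative assumptions rather than CallSphere's code. In a real service a maintained JWT library is likely the better choice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT (used here only to produce demo tokens)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    sig = _signature(f"{header}.{body}", secret)
    return f"{header}.{body}.{_b64url_encode(sig)}"


def verify_hs256(
    token: str, secret: bytes, now: Optional[float] = None
) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = _signature(f"{header_b64}.{body_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(body_b64))
    except (ValueError, TypeError):
        return None  # malformed token: wrong segment count, bad base64 or JSON
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        return None  # missing or expired "exp" claim
    return claims
```

In a handler, a `None` result would lead to `await websocket.close(code=1008)` without ever calling `websocket.accept()`, so an unauthenticated client is rejected on the upgrade itself.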
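from fastapi import WebSocket
import asyncio


class ConnectionManager:
    """Tracks the live WebSocket connections of one worker, keyed by session ID."""

    def __init__(self) -> None:
        self.active: dict[str, WebSocket] = {}

    async def connect(self, sid: str, ws: WebSocket) -> None:
        # Complete the WebSocket handshake, then register the connection.
        await ws.accept()
        self.active[sid] = ws

    async def broadcast(self, payload: dict) -> None:
        dead = []
        # Iterate over a snapshot so connects and disconnects that happen
        # during an await cannot mutate the dict mid-loop.
        for sid, ws in list(self.active.items()):
            try:
                await ws.send_json(payload)
            except Exception:
                # A failed send means the socket is gone; mark it for removal.
                dead.append(sid)
        for sid in dead:
            self.active.pop(sid, None)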
Use async def for every WebSocket handler. Sync handlers block the loop and there is no warning.
Authenticate with Depends before websocket.accept(). Reject on the upgrade, not after.
Run Uvicorn with --workers 2 --loop uvloop --http httptools for ~3× throughput.
Use asyncio.create_task for heartbeats; pong timeouts go to the same broadcast cleanup path as send failures.
Put a bounded asyncio.Queue (one with maxsize set) between your audio receive task and your transcription task to apply backpressure. An unbounded queue never pushes back on the producer.
Test with starlette.testclient.TestClient, which supports WebSockets without a running server.

Why does broadcast hang for 30 seconds? A client went silent without disconnecting and your send_json is waiting on the TCP buffer. Use a per-send timeout (asyncio.wait_for) and treat the timeout as a dead socket.
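A sketch of the per-send timeout, assuming only that each connection object exposes an async `send_json` method as Starlette's `WebSocket` does. `FakeSocket`, `broadcast_with_timeout`, and the 0.1 s default are hypothetical names and values chosen for the demo.

```python
import asyncio
from typing import Any, Dict, List

SEND_TIMEOUT_S = 0.1  # assumed demo value; tune against real traffic


class FakeSocket:
    """Hypothetical stand-in for a WebSocket; `delay` simulates a stalled peer."""

    def __init__(self, delay: float = 0.0) -> None:
        self.delay = delay
        self.sent: List[dict] = []

    async def send_json(self, payload: dict) -> None:
        await asyncio.sleep(self.delay)
        self.sent.append(payload)


async def broadcast_with_timeout(
    active: Dict[str, Any], payload: dict, timeout: float = SEND_TIMEOUT_S
) -> List[str]:
    """Send to every socket; evict any that fail or exceed the timeout."""
    dead: List[str] = []
    for sid, ws in list(active.items()):
        try:
            await asyncio.wait_for(ws.send_json(payload), timeout=timeout)
        except Exception:
            # Covers both send errors and the timeout raised by wait_for.
            dead.append(sid)
    for sid in dead:
        active.pop(sid, None)
    return dead
```

Sends here are still sequential, so one stalled client costs the broadcast up to one timeout. Wrapping the sends in `asyncio.gather` would let them run concurrently, at the price of slightly more involved error handling.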
Can I share a single asyncpg pool? Yes — that is the correct pattern. Just never pass connections across tasks; acquire and release per message.
How do I scale across pods? Add Redis pub/sub. Each pod publishes outbound events; every pod subscribes and local-fans-out to its connections.
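The shape of that fan-out can be sketched without a Redis server. In the example below `InMemoryBus` is a hypothetical stand-in for a Redis pub/sub channel, and `Pod` and `FakeSocket` are illustrative names; a real deployment would replace the bus with a Redis client and run the fan-out in a long-lived task.

```python
import asyncio
from typing import Dict, List


class InMemoryBus:
    """Hypothetical stand-in for a Redis pub/sub channel."""

    def __init__(self) -> None:
        self._subscribers: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        inbox: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(inbox)
        return inbox

    async def publish(self, event: dict) -> None:
        # Every subscriber, including the publisher's own pod, gets a copy.
        for inbox in self._subscribers:
            await inbox.put(event)


class FakeSocket:
    """Hypothetical stand-in for a WebSocket connection."""

    def __init__(self) -> None:
        self.sent: List[dict] = []

    async def send_json(self, payload: dict) -> None:
        self.sent.append(payload)


class Pod:
    """One process: publishes outbound events, fans inbound ones out locally."""

    def __init__(self, bus: InMemoryBus) -> None:
        self.bus = bus
        self.inbox = bus.subscribe()
        self.local: Dict[str, FakeSocket] = {}

    async def emit(self, event: dict) -> None:
        # Publish only. Local sockets are served by the subscriber path,
        # so they do not receive the event twice.
        await self.bus.publish(event)

    async def fan_out_once(self) -> None:
        event = await self.inbox.get()
        for ws in list(self.local.values()):
            await ws.send_json(event)


async def demo() -> tuple:
    bus = InMemoryBus()
    pod_a, pod_b = Pod(bus), Pod(bus)
    pod_a.local["s1"] = FakeSocket()
    pod_b.local["s2"] = FakeSocket()
    await pod_a.emit({"type": "transcript", "text": "hello"})
    await asyncio.gather(pod_a.fan_out_once(), pod_b.fan_out_once())
    return pod_a.local["s1"].sent, pod_b.local["s2"].sent
```

An event emitted on one pod reaches connections on both pods, which is why no pod needs to know where a given session is connected.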
Should I use the websockets library or Starlette's? Use Starlette's (FastAPI's built-in). It integrates with Depends and ASGI middleware. Note that WebSocket routes do not appear in the generated OpenAPI schema, so document them separately.
Do I need sticky sessions? No, if you use Redis pub/sub. Yes, only if you keep per-session state purely in worker memory.
CallSphere ships 37 agents on FastAPI + Node services. Try the 14-day free trial, or join the affiliate program to refer accounts at $149/$499/$1499.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.