By Sagar Shankaran, Founder of CallSphere
Stand up a production voice bridge in Rust: axum for routing, tokio-tungstenite for the OpenAI Realtime socket, and broadcast channels for fan-out. Real working code.
Key takeaways
TL;DR — Rust handles 2.5x more concurrent voice sessions than Go on the same hardware. axum + tokio-tungstenite is the cleanest path to a production voice bridge that fronts OpenAI Realtime from your own backend.
What you'll build: a Rust HTTP server that accepts a browser WebSocket, opens a paired WebSocket to OpenAI Realtime, and pumps frames between them with backpressure-aware tokio channels. You'll add a simple keep-alive ping, base64 audio framing, and clean shutdown when either side drops.
cargo add axum tokio tokio-tungstenite futures-util serde serde_json.
OPENAI_API_KEY in env, with Realtime access.
async/await and tokio's select!.
sequenceDiagram
participant B as Browser
participant R as Rust axum
participant O as OpenAI Realtime
B->>R: WS /voice
R->>O: WS wss://api.openai.com/v1/realtime
B->>R: input_audio_buffer.append
R->>O: forward
O-->>R: response.audio.delta
R-->>B: forward
```toml
[dependencies]
axum = { version = "0.7", features = ["ws"] }
tokio = { version = "1", features = ["full"] }
tokio-tungstenite = { version = "0.23", features = ["native-tls"] }
futures-util = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
```
```rust
use axum::{
    extract::ws::{WebSocket, WebSocketUpgrade},
    response::IntoResponse,
    routing::get,
    Router,
};

#[tokio::main]
async fn main() {
    // One route: the browser connects to /voice and is upgraded to a WebSocket.
    let app = Router::new().route("/voice", get(voice_ws));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

async fn voice_ws(ws: WebSocketUpgrade) -> impl IntoResponse {
    ws.on_upgrade(handle_socket)
}
```
tokio-tungstenite lets you set custom headers on the handshake request, which Realtime requires for the Authorization and OpenAI-Beta headers.
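```rust
use tokio::net::TcpStream;
use tokio_tungstenite::{
    connect_async,
    tungstenite::{client::IntoClientRequest, http::HeaderValue},
    MaybeTlsStream, WebSocketStream,
};

async fn open_openai() -> WebSocketStream<MaybeTlsStream<TcpStream>> {
    let key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");

    // into_client_request() fills in Host, Upgrade, and the Sec-WebSocket-* headers.
    // A Request built by hand with Request::builder() lacks them and fails the handshake.
    let mut req = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"
        .into_client_request()
        .expect("valid realtime url");

    let headers = req.headers_mut();
    headers.insert(
        "Authorization",
        HeaderValue::from_str(&format!("Bearer {key}")).expect("valid auth header"),
    );
    headers.insert("OpenAI-Beta", HeaderValue::from_static("realtime=v1"));

    let (ws, _) = connect_async(req).await.expect("openai connect");
    ws
}
```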
This is the core of the bridge. Two streams, two sinks, one select!. Whichever side speaks first wins the loop iteration; the other side waits.
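```rust
use axum::extract::ws::Message as AxMsg;
use futures_util::{SinkExt, StreamExt};
use tokio_tungstenite::tungstenite::Message as OaMsg;

async fn handle_socket(socket: WebSocket) {
    let (mut bx_tx, mut bx_rx) = socket.split();
    let oai = open_openai().await;
    let (mut oai_tx, mut oai_rx) = oai.split();

    loop {
        tokio::select! {
            // Browser -> OpenAI
            msg = bx_rx.next() => match msg {
                Some(Ok(AxMsg::Text(t))) => {
                    if oai_tx.send(OaMsg::Text(t)).await.is_err() { break; }
                }
                // Close frame, read error, or end of stream: stop the bridge.
                Some(Ok(AxMsg::Close(_))) | Some(Err(_)) | None => break,
                // Binary, ping, and pong frames are not forwarded.
                Some(Ok(_)) => {}
            },
            // OpenAI -> browser
            msg = oai_rx.next() => match msg {
                Some(Ok(OaMsg::Text(t))) => {
                    if bx_tx.send(AxMsg::Text(t)).await.is_err() { break; }
                }
                Some(Ok(OaMsg::Close(_))) | Some(Err(_)) | None => break,
                Some(Ok(_)) => {}
            },
        }
    }

    // Best-effort close on both sides so neither socket is left dangling.
    let _ = oai_tx.close().await;
    let _ = bx_tx.close().await;
}
```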
```rust
let session_update = serde_json::json!({
    "type": "session.update",
    "session": {
        "instructions": "You are CallSphere's Rust-backed voice agent.",
        "voice": "alloy",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "turn_detection": { "type": "server_vad", "threshold": 0.5 }
    }
});
oai_tx.send(OaMsg::Text(session_update.to_string())).await.unwrap();
```
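With both audio formats set to pcm16, each audio chunk travels as base64 text inside a JSON event. The sketch below shows one way that framing could look for the `input_audio_buffer.append` event named in the sequence diagram. It assumes little-endian 16-bit samples and an `audio` field holding the base64 payload. The helper names `base64_encode` and `append_event` are hypothetical, and the encoder is hand-rolled only to keep the example free of extra crates.

```rust
/// Standard base64 alphabet (RFC 4648); output is padded with '='.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as padded base64 using only the standard library.
/// (Hypothetical helper; a real bridge would more likely use a base64 crate.)
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        // Missing input bytes become '=' padding.
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Frame PCM samples as an `input_audio_buffer.append` event.
/// Assumption: samples are serialized as little-endian 16-bit integers.
fn append_event(samples: &[i16]) -> String {
    let bytes: Vec<u8> = samples.iter().flat_map(|s| s.to_le_bytes()).collect();
    // Base64 output never needs JSON escaping, so plain formatting is safe here.
    format!(
        r#"{{"type":"input_audio_buffer.append","audio":"{}"}}"#,
        base64_encode(&bytes)
    )
}
```

In the bridge above the browser already sends these events as text, so the Rust side only forwards them. A helper like this matters if the server itself produces audio, for example when replaying a recorded prompt.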
Without pings, idle proxies (CloudFront, Cloudflare) can reap the connection after about 60s of silence.
```rust
let mut tick = tokio::time::interval(std::time::Duration::from_secs(20));
loop {
    tokio::select! {
        _ = tick.tick() => {
            let _ = bx_tx.send(AxMsg::Ping(vec![])).await;
        }
        // ...other arms
    }
}
```
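Sending pings keeps proxies from closing the socket, but it does not tell the server when a client has silently vanished. One possible complement is to record when the last pong arrived and drop the session once that is too old. The sketch below is a standard-library-only illustration of that bookkeeping. The `Liveness` type and its timeout are assumptions, not part of axum or tokio.

```rust
use std::time::{Duration, Instant};

/// Tracks when the peer last answered a ping.
/// (Hypothetical helper for illustration; not an axum or tokio API.)
struct Liveness {
    last_pong: Instant,
    timeout: Duration,
}

impl Liveness {
    /// Start the clock at `now`, treating the connection as freshly alive.
    fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_pong: now, timeout }
    }

    /// Call this whenever a Pong frame arrives from the peer.
    fn record_pong(&mut self, now: Instant) {
        self.last_pong = now;
    }

    /// True when no pong has been seen for longer than `timeout`.
    fn is_stale(&self, now: Instant) -> bool {
        now.duration_since(self.last_pong) > self.timeout
    }
}
```

In the select! loop, the tick arm could check `is_stale(Instant::now())` before sending the next ping and break out of the loop if it returns true, while the browser arm calls `record_pong` on each `AxMsg::Pong`. With a 20-second ping interval, a timeout of two or three intervals is a reasonable starting point to tune from.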
Missing OpenAI-Beta header: the connection 401s with no helpful error.
Use a bounded mpsc with buffer(64) so a slow client doesn't OOM the server.
tokio::spawn(async move { let _ = ... }) and log errors.
CallSphere's healthcare agent runs a Rust admission-router that sits in front of FastAPI :8084 voice workers. The router authenticates HIPAA-scoped JWTs, applies tenant rate limits, and dispatches to one of 37 specialised agents across 6 verticals, all backed by 115+ Postgres tables. See pricing: $149/$499/$1499 with a 14-day trial.
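Returning to the bounded-buffer advice above: the point of a fixed capacity is that a slow reader makes sends fail or wait instead of letting memory grow. The sketch below illustrates the "fail fast when full" variant using `std::sync::mpsc::sync_channel`, chosen only so the example runs without extra crates. In the bridge itself the closer analogue would be `tokio::sync::mpsc::channel(64)` with `try_send`. The `Offer` enum and the `offer` and `client_queue` helpers are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Outcome of offering one frame to a client's outbound queue.
#[derive(Debug, PartialEq)]
enum Offer {
    Queued,
    DroppedFull,
    ClientGone,
}

/// Create a bounded per-client queue. The capacity caps memory per client.
fn client_queue(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Try to enqueue a frame without blocking.
/// A full queue means the client is reading too slowly, so the frame is dropped.
fn offer(tx: &SyncSender<String>, frame: String) -> Offer {
    match tx.try_send(frame) {
        Ok(()) => Offer::Queued,
        Err(TrySendError::Full(_)) => Offer::DroppedFull,
        Err(TrySendError::Disconnected(_)) => Offer::ClientGone,
    }
}
```

Dropping frames is a policy choice, and dropped audio deltas will be audible as gaps. The alternative is to await the send so the producer slows down, which trades glitches for added latency. Either way the buffer stays bounded.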
Why not just use Node.js? Rust holds ~10MB RSS per session vs ~80MB on Node, and it doesn't GC-stutter mid-call.
Can I terminate WebRTC in Rust? Yes, with webrtc-rs, but it's heavier; for a Realtime bridge, WebSocket is enough.
What about TLS? Terminate it in-process with rustls, or terminate at nginx or Cloudflare in front of the Rust server.
Does axum 0.7 still ship? Yes, alongside 0.8. Pinning 0.7 is fine for prod. Note that 0.8 changes the WebSocket Message payload types, so the code here targets 0.7 as written.
How many sessions per core? ~2k idle, ~600 with active audio at 50kbps each.
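As a quick sanity check on those figures, the aggregate audio bandwidth per core follows directly from sessions times bitrate. The small function below works that arithmetic using the numbers quoted in the answer above. The name `aggregate_mbps` is hypothetical, and the result ignores WebSocket, JSON, and base64 overhead.

```rust
/// Aggregate audio bandwidth in Mbps for `sessions` concurrent streams,
/// each carrying `kbps_each` kilobits per second (1 Mbps = 1000 kbps).
fn aggregate_mbps(sessions: u32, kbps_each: u32) -> f64 {
    (sessions as f64 * kbps_each as f64) / 1000.0
}
```

For the quoted case, `aggregate_mbps(600, 50)` gives 30 Mbps per core, which is useful for sizing the network link alongside CPU.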
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.