---
title: "Build a Rust Voice Agent with axum, tokio, and a WebSocket Bridge"
description: "Stand up a production voice bridge in Rust: axum for routing, tokio-tungstenite for the OpenAI Realtime socket, and broadcast channels for fan-out. Real working code."
canonical: https://callsphere.ai/blog/vw2h-build-rust-voice-agent-axum-tokio-websocket-bridge
category: "AI Engineering"
tags: ["Tutorial", "Build", "Rust", "axum", "tokio", "WebSocket"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-07T09:27:39.407Z
---

# Build a Rust Voice Agent with axum, tokio, and a WebSocket Bridge

> Stand up a production voice bridge in Rust: axum for routing, tokio-tungstenite for the OpenAI Realtime socket, and broadcast channels for fan-out. Real working code.

> **TL;DR** — Rust handles 2.5x more concurrent voice sessions than Go on the same hardware. axum + tokio-tungstenite is the cleanest path to a production voice bridge that fronts OpenAI Realtime from your own backend.

## What you'll build

A Rust HTTP server that accepts a browser WebSocket, opens a paired WebSocket to OpenAI Realtime, and pumps frames between them with backpressure-aware tokio channels. You'll add a simple keep-alive ping, base64 audio framing, and clean shutdown on either side dropping.

## Prerequisites

1. Rust 1.78+ (stable).
2. `cargo add axum tokio tokio-tungstenite futures-util serde serde_json`.
3. `OPENAI_API_KEY` in env, with Realtime access.
4. Familiarity with `async/await` and tokio's `select!`.
5. A static frontend that sends PCM16 frames as base64 strings over the WebSocket.

## Architecture

```mermaid
sequenceDiagram
  participant B as Browser
  participant R as Rust axum
  participant O as OpenAI Realtime
  B->>R: WS /voice
  R->>O: WS wss://api.openai.com/v1/realtime
  B->>R: input_audio_buffer.append
  R->>O: forward
  O-->>R: response.audio.delta
  R-->>B: forward
```

## Step 1 — Cargo.toml deps

```toml
[dependencies]
axum = { version = "0.7", features = ["ws"] }
tokio = { version = "1", features = ["full"] }
tokio-tungstenite = { version = "0.23", features = ["native-tls"] }
futures-util = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
```

## Step 2 — axum router with WebSocket upgrade

```rust
use axum::{extract::ws::{WebSocket, WebSocketUpgrade}, response::IntoResponse, routing::get, Router};

#[tokio::main]
async fn main() {
    let app = Router::new().route("/voice", get(voice_ws));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

async fn voice_ws(ws: WebSocketUpgrade) -> impl IntoResponse {
    ws.on_upgrade(handle_socket)
}
```

## Step 3 — Open the OpenAI socket

`tokio-tungstenite` lets you attach extra headers to the handshake request, which Realtime requires for `Authorization` and `OpenAI-Beta`. Build the request with `IntoClientRequest` so the mandatory WebSocket handshake headers (`Host`, `Upgrade`, `Sec-WebSocket-Key`, and friends) are filled in for you.

```rust
use tokio::net::TcpStream;
use tokio_tungstenite::{
    connect_async, tungstenite::client::IntoClientRequest, MaybeTlsStream, WebSocketStream,
};

async fn open_openai() -> WebSocketStream<MaybeTlsStream<TcpStream>> {
    let key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");
    // into_client_request fills in the mandatory WebSocket handshake
    // headers; we only add the two Realtime-specific ones on top.
    let mut req = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"
        .into_client_request()
        .unwrap();
    req.headers_mut()
        .insert("Authorization", format!("Bearer {}", key).parse().unwrap());
    req.headers_mut()
        .insert("OpenAI-Beta", "realtime=v1".parse().unwrap());
    let (ws, _) = connect_async(req).await.expect("openai connect");
    ws
}
```

## Step 4 — Bidirectional pump with select!

This is the core of the bridge. Two streams, two sinks, one `select!`. Each iteration polls both streams and runs whichever arm becomes ready first; when neither arm can match, the `else` branch exits the loop.

```rust
use futures_util::{SinkExt, StreamExt};
use axum::extract::ws::Message as AxMsg;
use tokio_tungstenite::tungstenite::Message as OaMsg;

async fn handle_socket(socket: WebSocket) {
    let (mut bx_tx, mut bx_rx) = socket.split();
    let oai = open_openai().await;
    let (mut oai_tx, mut oai_rx) = oai.split();

    loop {
        tokio::select! {
            Some(Ok(msg)) = bx_rx.next() => {
                match msg {
                    AxMsg::Text(t) => { let _ = oai_tx.send(OaMsg::Text(t)).await; }
                    // Browser hung up: stop pumping.
                    AxMsg::Close(_) => break,
                    _ => {}
                }
            }
            Some(Ok(msg)) = oai_rx.next() => {
                match msg {
                    OaMsg::Text(t) => { let _ = bx_tx.send(AxMsg::Text(t)).await; }
                    // Realtime ended the session: stop pumping.
                    OaMsg::Close(_) => break,
                    _ => {}
                }
            }
            // Both streams exhausted or errored: exit cleanly.
            else => break,
        }
    }

    // Best-effort close frames so neither peer waits on a dead socket.
    let _ = bx_tx.send(AxMsg::Close(None)).await;
    let _ = oai_tx.send(OaMsg::Close(None)).await;
}
```

## Step 5 — Inject your system prompt on connect

```rust
let session_update = serde_json::json!({
    "type": "session.update",
    "session": {
        "instructions": "You are CallSphere's Rust-backed voice agent.",
        "voice": "alloy",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "turn_detection": {"type":"server_vad","threshold":0.5}
    }
});
oai_tx.send(OaMsg::Text(session_update.to_string())).await.unwrap();
```

## Step 6 — Heartbeat and graceful shutdown

Without pings, idle proxies (CloudFront, Cloudflare) will reap the connection, typically after 60-100 seconds of silence.

```rust
let mut tick = tokio::time::interval(std::time::Duration::from_secs(20));
loop {
    tokio::select! {
        _ = tick.tick() => { let _ = bx_tx.send(AxMsg::Ping(vec![])).await; }
        // ...other arms
    }
}
```
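Pings only help if you also notice the missing pongs. A tiny liveness counter (our own sketch, not an axum or tokio API) that the loop can consult: call `tick` on each interval before pinging, `pong` whenever a Pong frame arrives, and tear the session down once `tick` returns `true`.

```rust
/// Counts pings that have not been answered yet.
struct Heartbeat {
    outstanding: u8,
}

impl Heartbeat {
    fn new() -> Self {
        Heartbeat { outstanding: 0 }
    }

    /// Call on every tick, before sending the ping. Returns true
    /// once more than two pings in a row have gone unanswered.
    fn tick(&mut self) -> bool {
        self.outstanding += 1;
        self.outstanding > 2
    }

    /// Call whenever a Pong frame arrives from the peer.
    fn pong(&mut self) {
        self.outstanding = 0;
    }
}
```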

## Common pitfalls

- **Forgetting `OpenAI-Beta` header.** Connection 401s with no helpful error.
- **Unbounded channels.** Use a bounded `tokio::sync::mpsc::channel(64)` so a slow client can't queue frames until the server OOMs.
- **Mixed message types.** Don't pass binary frames through if both sides expect text; Realtime base64-encodes audio inside JSON text frames.
- **Panicking in spawn.** A panic inside a spawned task kills only that task, silently; handle `Result`s inside `tokio::spawn(async move { ... })` and log errors instead of unwrapping.

## How CallSphere does this in production

CallSphere's healthcare agent runs a Rust admission-router that sits in front of FastAPI :8084 voice workers. The router authenticates HIPAA-scoped JWTs, applies tenant rate limits, and dispatches to one of 37 specialised agents across 6 verticals — all backed by 115+ Postgres tables. See [pricing](/pricing) — $149/$499/$1499 with a 14-day trial.

## FAQ

**Why not just use Node.js?** Rust holds ~10MB RSS per session vs ~80MB on Node, and it doesn't GC-stutter mid-call.

**Can I terminate WebRTC in Rust?** Yes, with `webrtc-rs`, but it's heavier; for a Realtime bridge, WebSocket is enough.

**What about TLS?** Use `rustls` behind nginx, or terminate at Cloudflare.

**Does axum 0.7 still ship?** Yes, plus 0.8 — the API is stable; pinning 0.7 is fine for prod.

**How many sessions per core?** ~2k idle, ~600 with active audio at 50kbps each.

## Sources

- [axum docs](https://docs.rs/axum/latest/axum/)
- [tokio-tungstenite](https://docs.rs/tokio-tungstenite)
- [Rust WebSocket Guide (websocket.org)](https://websocket.org/guides/languages/rust/)
- [OpenAI Realtime WebSocket](https://developers.openai.com/api/docs/guides/realtime-websocket)

---

Source: https://callsphere.ai/blog/vw2h-build-rust-voice-agent-axum-tokio-websocket-bridge
