Build an AI Voice Agent on Hono + OpenAI Realtime in TypeScript (2026)
Wire Hono's WebSocket helpers, the OpenAI Realtime API, and Bun runtime into a sub-700ms voice agent. Real TypeScript code, deploy targets, and pitfalls.
TL;DR — Hono ships a one-file WebSocket relay between a browser and the OpenAI Realtime API. With `gpt-realtime` ($32/M audio-in, $64/M audio-out as of late 2025) you can hit ~600-800ms voice-to-voice on a single Bun process. Hono's edge-friendly routing means the same code runs on Cloudflare Workers, Vercel Edge, Deno Deploy, or Node 22.
What you'll build
A TypeScript backend that serves a static HTML mic page and exposes a /realtime WebSocket. Browser audio (PCM16 24kHz) is forwarded to OpenAI Realtime; model audio + transcripts are streamed back. Tool calls (e.g. book_appointment) are handled server-side and the result is fed back into the same session.
Prerequisites
- Bun 1.3+ or Node 22+, `hono@^4.6`, `@hono/node-ws` (Node) or built-in Bun WS.
- OpenAI key with Realtime access (`gpt-realtime` GA since Aug 2025).
- Browser that supports `getUserMedia` (Chrome 120+, Safari 17+).
Architecture
```mermaid
flowchart LR
  BR[Browser mic] -- WS PCM16 --> H[Hono /realtime]
  H -- WS gpt-realtime --> OA[OpenAI Realtime API]
  OA -- audio.delta --> H --> BR
  OA -- response.function_call --> H
  H -- tool result --> OA
```
Step 1 — Hono server scaffold
```ts
import { Hono } from "hono";
import { createBunWebSocket } from "hono/bun";

// hono/bun exposes the upgrade helper plus the `websocket` handler Bun.serve needs
const { upgradeWebSocket, websocket } = createBunWebSocket();

const app = new Hono();
const OPENAI_WS = "wss://api.openai.com/v1/realtime?model=gpt-realtime";

app.get("/", (c) => c.html(`<script type="module" src="/client.js"></script>`));

app.get(
  "/realtime",
  upgradeWebSocket(() => {
    let oa: WebSocket | undefined;
    return {
      onOpen: (_e, ws) => {
        // Bun's WebSocket client accepts custom headers (non-standard option, hence the cast)
        oa = new WebSocket(OPENAI_WS, {
          headers: {
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
            "OpenAI-Beta": "realtime=v1",
          },
        } as any);
        oa.onmessage = (m) => ws.send(m.data as string);
      },
      onMessage: (e, _ws) => oa?.send(e.data as string),
      onClose: () => oa?.close(),
    };
  }),
);

export default { port: 8787, fetch: app.fetch, websocket };
```
Step 2 — Configure session with VAD + tools
```ts
// Attach inside onOpen, right after creating `oa`
oa.onopen = () =>
  oa.send(JSON.stringify({
    type: "session.update",
    session: {
      voice: "alloy",
      input_audio_transcription: { model: "gpt-4o-mini-transcribe" },
      turn_detection: { type: "server_vad", threshold: 0.55 },
      tools: [{
        type: "function",
        name: "book_appointment",
        description: "Book a slot",
        parameters: {
          type: "object",
          properties: { iso: { type: "string" } },
        },
      }],
    },
  }));
```
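While tuning the `server_vad` threshold, it helps to watch the VAD lifecycle events the API emits. A small filter you might hang off `oa.onmessage` — the event names follow the Realtime event stream; the helper itself is ours, not part of any SDK:

```typescript
// Debug helper (hypothetical): logs and flags server-VAD lifecycle events
// so you can see exactly where the API thinks a turn starts and stops.
const VAD_EVENTS = new Set([
  "input_audio_buffer.speech_started",
  "input_audio_buffer.speech_stopped",
  "input_audio_buffer.committed",
]);

function isVadEvent(evt: { type: string }): boolean {
  if (VAD_EVENTS.has(evt.type)) {
    console.log("[vad]", evt.type);
    return true;
  }
  return false;
}
```

If turns cut off mid-sentence, lower `threshold`; if the agent talks over callers, raise it.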
Step 3 — Browser PCM capture
```ts
const ctx = new AudioContext({ sampleRate: 24000 });
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const src = ctx.createMediaStreamSource(stream);
await ctx.audioWorklet.addModule("/pcm-worklet.js");
const node = new AudioWorkletNode(ctx, "pcm");
src.connect(node);
const ws = new WebSocket(`ws://${location.host}/realtime`);
node.port.onmessage = (e) => ws.readyState === 1 && ws.send(JSON.stringify({
type: "input_audio_buffer.append",
audio: btoa(String.fromCharCode(...new Uint8Array(e.data)))
}));
```
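The code above loads `/pcm-worklet.js` but the article never shows it. A minimal sketch — the Float32-to-PCM16 conversion is the part that matters, and the processor name `pcm` matches the `AudioWorkletNode` above; the registration block only runs inside a real `AudioWorkletGlobalScope`:

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM —
// the format the Realtime API expects for input audio.
function floatTo16BitPCM(input: Float32Array): ArrayBuffer {
  const view = new DataView(new ArrayBuffer(input.length * 2));
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // little-endian
  }
  return view.buffer;
}

// Browser-only registration; skipped anywhere registerProcessor doesn't exist.
declare const registerProcessor: any, AudioWorkletProcessor: any;
if (typeof registerProcessor !== "undefined") {
  registerProcessor("pcm", class extends AudioWorkletProcessor {
    process(inputs: Float32Array[][]) {
      const channel = inputs[0]?.[0]; // mono: first channel of first input
      if (channel) this.port.postMessage(floatTo16BitPCM(channel)); // to main thread
      return true; // keep the processor alive
    }
  });
}
```

Worklet frames are 128 samples, so the `btoa(String.fromCharCode(...))` spread in the client code stays well under stack limits.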
Step 4 — Handle function calls server-side
```ts
// Replaces the simple forwarder from Step 1 — still inside onOpen, so `ws` is in scope
oa.onmessage = async (m) => {
  const evt = JSON.parse(m.data.toString());
  if (evt.type === "response.function_call_arguments.done") {
    const args = JSON.parse(evt.arguments);
    const result = await db.book(args.iso); // your booking logic
    oa.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: evt.call_id,
        output: JSON.stringify(result),
      },
    }));
    // Ask the model to speak a response that incorporates the tool result
    oa.send(JSON.stringify({ type: "response.create" }));
  }
  ws.send(m.data); // forward all events to the browser
};
```
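On the browser side, the forwarded `response.audio.delta` events still need to be played back. A sketch assuming 24kHz mono PCM16 base64 payloads (the session default above); the gapless-scheduling approach is ours, and `ctx` is typed `any` only so the sketch compiles without DOM lib types:

```typescript
// Decode a base64 PCM16 chunk into Float32 samples for Web Audio
function pcm16Base64ToFloat32(b64: string): Float32Array {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  const view = new DataView(bytes.buffer);
  const out = new Float32Array(bytes.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // little-endian, normalize to [-1, 1)
  }
  return out;
}

// Schedule chunks back-to-back on one AudioContext so playback is gapless
let playHead = 0;
function playChunk(ctx: any /* AudioContext */, samples: Float32Array) {
  const buf = ctx.createBuffer(1, samples.length, 24000); // mono @ 24kHz
  buf.copyToChannel(samples, 0);
  const src = ctx.createBufferSource();
  src.buffer = buf;
  src.connect(ctx.destination);
  playHead = Math.max(playHead, ctx.currentTime);
  src.start(playHead); // butt each chunk against the previous one
  playHead += buf.duration;
}
```

Usage: on each forwarded event with `evt.type === "response.audio.delta"`, call `playChunk(ctx, pcm16Base64ToFloat32(evt.delta))`.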
Step 5 — Deploy
Run `bun build --target=bun src/index.ts`, then `fly deploy` or `wrangler deploy` (Hono's WebSocket adapter ships for both). Add `fly scale memory 512` and set `OPENAI_API_KEY` as a secret.
Pitfalls
- 24kHz vs 16kHz: Realtime expects PCM16 @ 24kHz — resampling at 16kHz produces robotic audio.
- No `commit` on server VAD: don't send `input_audio_buffer.commit` when `turn_detection: server_vad` is set; the model commits automatically.
- Cloudflare Worker connect timeouts: the WS to OpenAI sometimes idles past 30s — send a 25s keepalive ping.
- Auth in browser: Never expose your OpenAI key client-side; the relay is the auth boundary.
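The keepalive from the Cloudflare pitfall can be sketched as below. Browser-standard WebSockets expose no ping frame, so this sends an app-level no-op instead — an empty `session.update` is our assumption for an inert event; swap in whatever your upstream tolerates:

```typescript
// Keepalive sketch: ping the upstream socket on an interval while it's open.
// The interval is a parameter so short values can be used in tests.
type WsLike = { readyState: number; send: (s: string) => void };

function startKeepalive(sock: WsLike, ms = 25_000): () => void {
  const id = setInterval(() => {
    if (sock.readyState === 1 /* OPEN */) {
      // Assumed-safe no-op event to keep the connection warm
      sock.send(JSON.stringify({ type: "session.update", session: {} }));
    } else {
      clearInterval(id); // socket gone, stop pinging
    }
  }, ms);
  return () => clearInterval(id); // call from onClose
}
```

In the Step 1 relay you would call `startKeepalive(oa)` in `onOpen` and invoke the returned stopper in `onClose`.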
How CallSphere does this in production
CallSphere runs 37 production agents across 6 verticals with 90+ tools and 115+ Postgres tables. The Healthcare stack (FastAPI), OneRoof real estate (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), and Sales (Node.js 20 + React 18 + Vite) all share a Hono-based realtime relay that handles 1.2M voice minutes/month at ~720ms p95 voice-to-voice. Pricing is $149/$499/$1,499 with a 14-day no-card trial and a 22% recurring affiliate commission.
FAQ
Why Hono over Express? Hono is ~14kb, runs on every JS runtime, and has first-class WebSocket helpers for Bun, Node, Workers, and Deno without code changes.
Can I use Node instead of Bun? Yes — swap `hono/bun`'s helper for `createNodeWebSocket` from `@hono/node-ws`. Bun is ~2x faster on cold start.
What's the cost per minute? `gpt-realtime` is ~$0.06/min audio in + $0.24/min audio out — call it ~$0.20/min for typical voice agent traffic.
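The ~$0.20/min figure follows because input audio bills for the whole call while output only bills while the agent is speaking. A back-of-envelope model using the article's rates (the 50% talk-fraction default is our assumption):

```typescript
// Per-minute rates from the article ($/min of audio)
const IN_PER_MIN = 0.06;
const OUT_PER_MIN = 0.24;

// agentTalkFraction: share of the call during which the model is speaking
function estimateCostUSD(callMinutes: number, agentTalkFraction = 0.5): number {
  // Input audio streams for the full call; output only while the agent talks
  return callMinutes * (IN_PER_MIN + OUT_PER_MIN * agentTalkFraction);
}
```

With the defaults, `estimateCostUSD(1)` gives $0.18/min, in line with the ~$0.20/min rule of thumb.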
Does WebRTC work too? Yes. For browser-direct WebRTC, mint an ephemeral key via /v1/realtime/sessions and skip the relay entirely.
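For the WebRTC path, the server's only remaining job is minting that ephemeral key. A sketch — `mintEphemeralKey` is our name, and the response shape (`client_secret.value`) follows the Realtime sessions endpoint docs:

```typescript
// POST /v1/realtime/sessions returns a short-lived client_secret the browser
// can use directly over WebRTC — the long-lived API key never leaves the server.
async function mintEphemeralKey(apiKey: string): Promise<string> {
  const r = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-realtime", voice: "alloy" }),
  });
  if (!r.ok) throw new Error(`sessions endpoint: ${r.status}`);
  const body: any = await r.json();
  return body.client_secret.value; // hand this to the browser
}
```

Expose it behind a route with your own auth (session cookie, rate limit) — the ephemeral key replaces the relay as the security boundary.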
Sources
- OpenAI - Realtime API guide (gpt-realtime, WebRTC + WebSocket) - https://developers.openai.com/api/docs/guides/realtime
- OpenAI Agents SDK (TypeScript) - https://openai.github.io/openai-agents-js/
- Hono - WebSocket helpers - https://hono.dev/helpers/websocket
- ForaSoft - Production Voice Agents 2026 - https://www.forasoft.com/blog/article/openai-realtime-api-voice-agent-production-guide-2026
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.