---
title: "Build an AI Voice Agent on Hono + OpenAI Realtime in TypeScript (2026)"
description: "Wire Hono's WebSocket helpers, the OpenAI Realtime API, and Bun runtime into a sub-700ms voice agent. Real TypeScript code, deploy targets, and pitfalls."
canonical: https://callsphere.ai/blog/vw8h-build-ai-voice-agent-hono-openai-realtime-typescript-2026
category: "AI Voice Agents"
tags: ["Hono", "OpenAI Realtime", "Voice Agent", "TypeScript", "Bun"]
author: "CallSphere Team"
published: 2026-03-15T00:00:00.000Z
updated: 2026-05-07T22:23:14.100Z
---

# Build an AI Voice Agent on Hono + OpenAI Realtime in TypeScript (2026)

> Wire Hono's WebSocket helpers, the OpenAI Realtime API, and Bun runtime into a sub-700ms voice agent. Real TypeScript code, deploy targets, and pitfalls.

> **TL;DR** — A single Hono file can relay WebSocket audio between a browser and the OpenAI Realtime API. With `gpt-realtime` ($32/M audio-input tokens, $64/M audio-output tokens as of late 2025) you can hit ~600-800ms voice-to-voice on a single Bun process. Hono's edge-friendly routing means the same code runs on Cloudflare Workers, Vercel Edge, Deno Deploy, or Node 22.

## What you'll build

A TypeScript backend that serves a static HTML mic page and exposes a `/realtime` WebSocket. Browser audio (PCM16 24kHz) is forwarded to OpenAI Realtime; model audio + transcripts are streamed back. Tool calls (e.g. `book_appointment`) are handled server-side and the result is fed back into the same session.

## Prerequisites

1. Bun 1.3+ or Node 22+, `hono@^4.6`, `@hono/node-ws` (Node) or built-in Bun WS.
2. OpenAI key with Realtime access (`gpt-realtime` GA from Aug 2025).
3. Browser that supports `getUserMedia` (Chrome 120+, Safari 17+).

## Architecture

```mermaid
flowchart LR
  BR[Browser mic] -- WS PCM16 --> H[Hono /realtime]
  H -- WS gpt-realtime --> OA[OpenAI Realtime API]
  OA -- audio.delta --> H --> BR
  OA -- response.function_call --> H
  H -- tool result --> OA
```

## Step 1 — Hono server scaffold

```ts
import { Hono } from "hono";
import { createBunWebSocket } from "hono/bun";

const app = new Hono();
const { upgradeWebSocket, websocket } = createBunWebSocket();
const OPENAI_WS = "wss://api.openai.com/v1/realtime?model=gpt-realtime";

app.get("/", (c) => c.html(``)); // serve your mic page here

app.get(
  "/realtime",
  upgradeWebSocket(() => {
    // One upstream socket per browser connection, held in the closure.
    let oa: WebSocket | undefined;
    return {
      onOpen: (_e, ws) => {
        // Bun's WebSocket client accepts custom headers (Node needs `ws`).
        oa = new WebSocket(OPENAI_WS, {
          headers: {
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
            "OpenAI-Beta": "realtime=v1",
          },
        } as any);
        oa.onmessage = (m) => ws.send(m.data);
      },
      onMessage: (e) => oa?.send(e.data),
      onClose: () => oa?.close(),
    };
  }),
);

export default { port: 8787, fetch: app.fetch, websocket };
```

## Step 2 — Configure session with VAD + tools

```ts
oa.onopen = () => oa.send(JSON.stringify({
  type: "session.update",
  session: {
    voice: "alloy",
    input_audio_transcription: { model: "gpt-4o-mini-transcribe" },
    turn_detection: { type: "server_vad", threshold: 0.55 },
    tools: [{
      type: "function", name: "book_appointment",
      description: "Book a slot",
      parameters: { type: "object", properties: { iso: { type: "string" } } }
    }],
  }
}));
```
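With `input_audio_transcription` enabled, transcript text comes back as separate events rather than inside the audio stream. As a sketch (the event names below are taken from the beta Realtime event reference and are assumptions — verify against the API version you target), a small router can pull caption text out of the stream:

```typescript
// Minimal transcript router. Event names are assumptions from the beta
// Realtime event reference; check them against the version you deploy on.
type RealtimeEvent = { type: string; delta?: string; transcript?: string };

function extractTranscript(evt: RealtimeEvent): string | null {
  // Incremental transcript of the assistant's spoken reply
  if (evt.type === "response.audio_transcript.delta") return evt.delta ?? null;
  // Final transcript of what the caller said
  if (evt.type === "conversation.item.input_audio_transcription.completed")
    return evt.transcript ?? null;
  return null;
}
```

Call it from the upstream `onmessage` handler and forward non-null results to the browser for live captions.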

## Step 3 — Browser PCM capture

```ts
const ctx = new AudioContext({ sampleRate: 24000 });
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const src = ctx.createMediaStreamSource(stream);
await ctx.audioWorklet.addModule("/pcm-worklet.js");
const node = new AudioWorkletNode(ctx, "pcm");
src.connect(node);
const ws = new WebSocket(`ws://${location.host}/realtime`);
node.port.onmessage = (e) => {
  if (ws.readyState !== WebSocket.OPEN) return;
  // Spread-based base64 is fine for 128-sample worklet frames; chunk the
  // conversion if you batch larger buffers, or the spread can blow the stack.
  ws.send(JSON.stringify({
    type: "input_audio_buffer.append",
    audio: btoa(String.fromCharCode(...new Uint8Array(e.data))),
  }));
};
```
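Step 3 loads `/pcm-worklet.js` but doesn't show it. A minimal sketch of that worklet (the processor name `pcm` must match the `AudioWorkletNode` constructor above; the Float32→PCM16 clamp is the standard conversion, everything else is an assumption about how you batch frames — author it as TypeScript and serve it compiled, or strip the annotations):

```typescript
// pcm-worklet.js — posts PCM16 bytes for each 128-sample mic frame.

// Convert one Float32 frame (values in [-1, 1]) to 16-bit signed PCM.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Register only inside an AudioWorkletGlobalScope; the guard keeps the
// conversion helper unit-testable outside the browser.
declare const registerProcessor: any, AudioWorkletProcessor: any;
if (typeof registerProcessor === "function") {
  class PCMProcessor extends AudioWorkletProcessor {
    process(inputs: Float32Array[][]): boolean {
      const channel = inputs[0]?.[0]; // first channel of first input
      if (channel) this.port.postMessage(floatTo16BitPCM(channel).buffer);
      return true; // keep the processor alive
    }
  }
  registerProcessor("pcm", PCMProcessor);
}
```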

## Step 4 — Handle function calls server-side

```ts
// Replaces the bare forwarder from Step 1: inspect events, then forward.
oa.onmessage = async (m) => {
  const evt = JSON.parse(m.data.toString());
  if (evt.type === "response.function_call_arguments.done") {
    const args = JSON.parse(evt.arguments);
    const result = await db.book(args.iso); // db = your persistence layer
    oa.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: evt.call_id,
        output: JSON.stringify(result),
      },
    }));
    // Ask the model to respond with the tool result now in context.
    oa.send(JSON.stringify({ type: "response.create" }));
  }
  ws.send(m.data); // forward all events to the browser
};
```
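One `if` branch per tool stops scaling past a handful of tools. A hypothetical registry pattern (the `book_appointment` stub and names here are illustrative, not a real implementation) keeps Step 4's handler flat:

```typescript
// Hypothetical tool registry: adding a tool is one map entry, not a new branch.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const tools: Record<string, ToolHandler> = {
  // Stub handler; swap in your real booking logic.
  book_appointment: async (args) => ({ booked: true, iso: args.iso }),
};

async function dispatchTool(name: string, rawArgs: string): Promise<string> {
  const handler = tools[name];
  if (!handler) return JSON.stringify({ error: `unknown tool: ${name}` });
  try {
    return JSON.stringify(await handler(JSON.parse(rawArgs)));
  } catch (err) {
    // Feed failures back to the model so it can apologize or retry.
    return JSON.stringify({ error: String(err) });
  }
}
```

In Step 4, the inline `db.book` call becomes `await dispatchTool(evt.name, evt.arguments)` (the event's `name` field is per the beta docs — verify it on your API version).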

## Step 5 — Deploy

`bun build --target=bun src/index.ts` then `fly deploy` or `wrangler deploy` (Hono's WebSocket adapter ships for both). Add `fly scale memory 512` and set `OPENAI_API_KEY` as a secret.

## Pitfalls

- **24kHz vs 16kHz**: Realtime expects PCM16 @ 24kHz — resampling at 16kHz produces robotic audio.
- **No `commit` on server VAD**: Don't send `input_audio_buffer.commit` when `turn_detection: server_vad` is set; the model commits automatically.
- **Cloudflare Worker idle timeouts**: idle WebSocket connections to OpenAI can be dropped after ~30s of silence — send a keepalive ping every ~25s.
- **Auth in browser**: Never expose your OpenAI key client-side; the relay is the auth boundary.

## How CallSphere does this in production

CallSphere runs **37 production agents** across **6 verticals** with **90+ tools** and **115+ Postgres tables**. The Healthcare stack (FastAPI), OneRoof real-estate (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), and Sales (Node.js 20 + React 18 + Vite) all share a Hono-based realtime relay that handles 1.2M concurrent voice minutes/month with ~720ms p95 voice-to-voice. Pricing is **$149/$499/$1,499** with a **14-day no-card trial** and a **22% recurring affiliate**.

## FAQ

**Why Hono over Express?** Hono is ~14kb, runs on every JS runtime, and has first-class WebSocket helpers for Bun, Node, Workers, and Deno without code changes.

**Can I use Node instead of Bun?** Yes — swap `hono/bun` for `@hono/node-ws`. Bun is ~2x faster on cold start.

**What's the cost per minute?** `gpt-realtime` is ~$0.06/min audio in + $0.24/min audio out — call it ~$0.20/min for typical voice agent traffic.
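The back-of-envelope math behind that figure, using this article's per-minute rates (treating input audio as billed for the whole minute and output only while the model speaks — a simplification; `talkRatio` is the fraction of each minute the model talks, which you should tune to your traffic):

```typescript
// Rough per-minute cost from the article's rates:
// $0.06/min audio in + talkRatio * $0.24/min audio out.
function costPerMinute(talkRatio: number, inRate = 0.06, outRate = 0.24): number {
  return inRate + talkRatio * outRate;
}
```

`costPerMinute(0.5)` lands around $0.18/min, in line with the ~$0.20/min rule of thumb above.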

**Does WebRTC work too?** Yes. For browser-direct WebRTC, mint an ephemeral key via `/v1/realtime/sessions` and skip the relay entirely.
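For that flow the relay shrinks to a token-minting route. A sketch of the request it would send (endpoint and body shape per OpenAI's Realtime session docs at the time of writing — verify before relying on it):

```typescript
// Build the POST that mints an ephemeral client secret for browser WebRTC.
// The response's client_secret.value is what you hand to the browser;
// the long-lived API key never leaves the server.
function buildSessionRequest(apiKey: string): Request {
  return new Request("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-realtime", voice: "alloy" }),
  });
}
```

Expose it from a Hono route (e.g. a `/session` handler that `fetch`es this request and returns the JSON), then have the browser hit that route before opening its `RTCPeerConnection`.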

## Sources

- OpenAI - Realtime API guide (gpt-realtime, WebRTC + WebSocket) - [https://developers.openai.com/api/docs/guides/realtime](https://developers.openai.com/api/docs/guides/realtime)
- OpenAI Agents SDK (TypeScript) - [https://openai.github.io/openai-agents-js/](https://openai.github.io/openai-agents-js/)
- Hono - WebSocket helpers - [https://hono.dev/helpers/websocket](https://hono.dev/helpers/websocket)
- ForaSoft - Production Voice Agents 2026 - [https://www.forasoft.com/blog/article/openai-realtime-api-voice-agent-production-guide-2026](https://www.forasoft.com/blog/article/openai-realtime-api-voice-agent-production-guide-2026)

