---
title: "Build a Bun + Hono + OpenAI Realtime Voice Agent on the Edge (2026)"
description: "Bun 1.3 + Hono is 2x faster than Node + Express for WebSocket relays. Wire it to gpt-realtime-2 and deploy to Fly.io edge for sub-500ms voice-to-voice in 6 regions."
canonical: https://callsphere.ai/blog/vw8h-build-ai-agent-bun-hono-openai-realtime-edge-2026
category: "AI Voice Agents"
tags: ["Bun", "Hono", "OpenAI Realtime", "Voice Agent", "Edge"]
author: "CallSphere Team"
published: 2026-04-10T00:00:00.000Z
updated: 2026-05-08T17:25:15.724Z
---

# Build a Bun + Hono + OpenAI Realtime Voice Agent on the Edge (2026)

> **TL;DR** — Bun 1.3 starts in ~25ms cold, Hono is ~14kb, and OpenAI's gpt-realtime-2 (introduced 2026) gives you GPT-5-class reasoning over voice. Combined: a single `bun run server.ts` ships a 6-region edge voice agent.

## What you'll build

A WebSocket relay between browser PCM and OpenAI Realtime, deployed to 6 Fly.io regions with anycast. Browsers get routed to the nearest edge, p95 voice-to-voice ~480ms.

## Prerequisites

1. Bun 1.3+, `hono@^4.6`.
2. Fly.io CLI (`brew install flyctl`) and an OpenAI key.
3. Domain on Cloudflare for TLS pass-through.

## Architecture

```mermaid
flowchart LR
  BR[Browser] --> CF[Cloudflare DNS]
  CF --> FY[Fly anycast edge nearest of 6]
  FY -- WS --> H[Hono relay on Bun]
  H -- WS --> OA[OpenAI Realtime gpt-realtime-2]
```

## Step 1 — `server.ts`

```ts
import { Hono } from "hono";
import { createBunWebSocket } from "hono/bun";

// Hono's Bun adapter hands back both the upgrade helper and the
// `websocket` handler Bun.serve expects.
const { upgradeWebSocket, websocket } = createBunWebSocket();

const app = new Hono();
const OPENAI_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2";

app.get("/ws", upgradeWebSocket(() => {
  let oa: WebSocket;
  return {
    onOpen(_e, ws) {
      // Bun's WebSocket client accepts a non-standard `headers` option,
      // hence the cast.
      oa = new WebSocket(OPENAI_URL, {
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "OpenAI-Beta": "realtime=v1",
        },
      } as any);
      oa.onopen = () => oa.send(JSON.stringify({
        type: "session.update",
        session: {
          voice: "verse",
          turn_detection: { type: "semantic_vad" },
        },
      }));
      // Relay OpenAI frames straight back to the browser.
      oa.onmessage = (m) => ws.send(m.data);
    },
    onMessage(e) {
      // Forward browser audio only while the upstream socket is open.
      if (oa?.readyState === WebSocket.OPEN) oa.send(e.data);
    },
    onClose() {
      oa?.close();
    },
  };
}));

export default { port: 8787, fetch: app.fetch, websocket };
```

## Step 2 — `Dockerfile`

```dockerfile
FROM oven/bun:1.3-alpine
WORKDIR /app
# bun.lock (text, Bun 1.2+) or legacy bun.lockb — copy whichever exists
COPY package.json bun.lock* ./
RUN bun install --frozen-lockfile
COPY . .
EXPOSE 8787
CMD ["bun", "run", "server.ts"]
```

## Step 3 — `fly.toml`

```toml
app = "voice-edge"
primary_region = "iad"

[build]

[http_service]
  internal_port = 8787
  force_https = true
  # pins one warm machine in the primary region; see Pitfalls for the rest
  min_machines_running = 1

[deploy]
  strategy = "rolling"
```

Run `fly deploy`, then scale into the remaining regions (e.g. `fly scale count 6 --region iad,lhr,nrt,syd,fra,gru`) for 6 edges.

## Step 4 — Browser PCM

Use a 24kHz `AudioWorklet` to capture audio, convert each ~20ms Float32 chunk to PCM16, and forward it as a base64-wrapped `input_audio_buffer.append` event.
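The worklet hands you Float32 samples, so the client needs a small conversion step. A minimal sketch, assuming 24kHz mono chunks (`floatToPcm16Base64` is a hypothetical helper name):

```typescript
// Hypothetical client-side helper: convert one Float32 worklet chunk into
// the base64 PCM16 payload that `input_audio_buffer.append` expects.
function floatToPcm16Base64(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  // btoa needs a binary string, so build one byte by byte.
  const bytes = new Uint8Array(pcm.buffer);
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}

// Per ~20ms chunk, forward to the relay from Step 1:
// ws.send(JSON.stringify({
//   type: "input_audio_buffer.append",
//   audio: floatToPcm16Base64(chunk),
// }));
```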

## Step 5 — gpt-realtime-2 reasoning

The 2026 `gpt-realtime-2` model handles complex multi-tool calls in one turn. Set `session.instructions` with up to 8K tokens of policy without hurting latency.
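As a sketch, a `session.update` that front-loads policy alongside one function tool — the `create_ticket` tool and its schema are illustrative, not a real CallSphere surface:

```typescript
// Illustrative session.update payload: long-form policy instructions plus
// one function tool. Tool name and schema are examples only.
const sessionUpdate = {
  type: "session.update",
  session: {
    voice: "verse",
    instructions: [
      "You are CallSphere's after-hours receptionist.",
      "Collect name, callback number, and reason before any transfer.",
      "Never quote prices; route pricing questions to a human.",
    ].join("\n"),
    tools: [
      {
        type: "function",
        name: "create_ticket", // hypothetical backend function
        description: "File a callback ticket for the on-call team.",
        parameters: {
          type: "object",
          properties: {
            name: { type: "string" },
            phone: { type: "string" },
            reason: { type: "string" },
            urgency: { type: "string", enum: ["low", "normal", "urgent"] },
          },
          required: ["name", "phone", "reason"],
        },
      },
    ],
  },
};

// Sent once the upstream socket opens, alongside turn_detection from Step 1:
// oa.send(JSON.stringify(sessionUpdate));
```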

## Step 6 — Anycast tip

Point the Cloudflare record for `*.callsphere.ai` at `voice-edge.fly.dev` with `proxied = false` (DNS-only), so WebSocket round-trips hit Fly's anycast directly instead of detouring through Cloudflare's proxy.
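In zone-file terms the record looks like this (the `voice` hostname and TTL are illustrative):

```txt
; Cloudflare DNS record, proxy off (grey cloud / DNS-only)
voice.callsphere.ai.   300   IN   CNAME   voice-edge.fly.dev.
```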

## Pitfalls

- **Bun's WS spec drift**: Some `ws.send` overloads differ from Node — test on Bun, not just locally.
- **Fly cold start**: `min_machines_running` only pins machines in the primary region; disable `auto_stop_machines` (or pin one machine per region) to keep voice cold-starts <50ms everywhere.
- **gpt-realtime-2 cost**: Audio in/out roughly $0.07/$0.27 per minute (estimate; check current pricing).
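Back-of-envelope math at the estimated rates above (placeholder numbers, not authoritative pricing):

```typescript
// Rough per-call cost at the estimated audio rates above ($/minute).
// These are the post's estimates, not published pricing.
const RATE_IN = 0.07;  // caller audio into the model
const RATE_OUT = 0.27; // model audio back to the caller

function estimateCallCost(inputMinutes: number, outputMinutes: number): number {
  return inputMinutes * RATE_IN + outputMinutes * RATE_OUT;
}

// A 3-minute call split roughly 50/50 between caller and agent:
// estimateCallCost(1.5, 1.5) ≈ $0.51, i.e. ~$0.17/min blended
```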

## How CallSphere does this in production

CallSphere's edge voice fleet handles **1.2M+ minutes/month** across **6 verticals**, with **37 agents** and **90+ tools**. The vertical stacks vary (Healthcare on FastAPI, OneRoof on Next.js 16 + React 19, Salon on NestJS 10 + Prisma, Sales on Node.js 20 + React 18 + Vite), but every voice flow routes through the same Bun + Hono relay. Plans run **$149/$499/$1,499** with a **14-day trial** and a **22% affiliate** program.

## FAQ

**Why not Node?** Bun's native WebSocket implementation benchmarks roughly 2x faster on raw relay throughput, and its cold start is a fraction of Node's.

**Cloudflare Workers?** Long-lived WebSockets on Workers route through Durable Objects, which adds hibernation and state-coordination semantics to reason about — Fly + Bun keeps one plain long-lived process, which is simpler for an audio relay.

**TURN servers?** WebSocket relays don't need them; only WebRTC direct does.

**Cost?** Fly: ~$5/region/month for 256MB shared CPU. OpenAI: ~$0.20-$0.30/min.

## Sources

- Hono on Bun WebSockets - [https://hono.dev/helpers/websocket](https://hono.dev/helpers/websocket)
- QuotyAI - Bun + Hono backend - [https://quotyai.com/blog/why-i-picked-bun-and-hono/](https://quotyai.com/blog/why-i-picked-bun-and-hono/)
- OpenAI gpt-realtime - [https://openai.com/index/introducing-gpt-realtime/](https://openai.com/index/introducing-gpt-realtime/)
- StartupHub - GPT-Realtime-2 2026 - [https://www.startuphub.ai/ai-news/artificial-intelligence/2026/openai-s-new-voice-api-models](https://www.startuphub.ai/ai-news/artificial-intelligence/2026/openai-s-new-voice-api-models)

## How this plays out in production

To make the framing in *Build a Bun + Hono + OpenAI Realtime Voice Agent on the Edge (2026)* operational, the decision you cannot defer is channel routing between voice and chat — a missed call should not die, it should warm up the SMS or web-chat lane within seconds. Treat this as a voice-first system from the first prompt: the agent's persona, its tool surface, and its escalation rules all flow from that single decision. Teams that ship fast instrument the loop end-to-end before tuning any single component, because the bottleneck is rarely where intuition puts it.

## Voice agent architecture, end to end

A production-grade voice stack at CallSphere stitches Twilio Programmable Voice (PSTN ingress, TwiML, bidirectional Media Streams) to a realtime reasoning layer — typically OpenAI Realtime or ElevenLabs Conversational AI — with sub-second response as a hard SLO. Anything north of one second of perceived silence and callers either repeat themselves or hang up; that single number drives the whole architecture.

Server-side VAD with proper barge-in support is non-negotiable, otherwise the agent talks over the caller and the conversation collapses. Streaming TTS with phoneme-aligned interruption keeps the cadence natural even when the user changes their mind mid-sentence.

Post-call, every transcript runs through a structured pipeline: sentiment, intent classification, lead score, escalation flag, and a normalized slot extraction (name, callback number, reason, urgency). For healthcare workloads, the BAA-covered storage path, audit logs, encryption at rest, and PHI-safe transcript redaction are wired in from day one, not bolted on at compliance review. The end state is a system where every call produces a row of structured data, not just a recording.
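To make the "row of structured data" concrete, one hypothetical shape for that record — field names are illustrative, not CallSphere's actual schema:

```typescript
// Illustrative shape for the post-call extraction row described above.
interface PostCallRecord {
  callId: string;
  sentiment: "positive" | "neutral" | "negative";
  intent: string;    // classifier output, e.g. "book_appointment"
  leadScore: number; // 0-100
  escalate: boolean;
  slots: {
    name?: string;
    callbackNumber?: string;
    reason?: string;
    urgency?: "low" | "normal" | "urgent";
  };
}

// One plausible downstream consumer: route hot calls to a human.
function needsHumanFollowUp(r: PostCallRecord): boolean {
  // Explicit escalation flag wins; otherwise lead score and urgency decide.
  return r.escalate || r.leadScore >= 80 || r.slots.urgency === "urgent";
}
```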

## Production FAQ

**What changes when you move a voice agent the way *Build a Bun + Hono + OpenAI Realtime Voice Agent on the Edge (2026)* describes?**

Treat the architecture in this post as a starting point and instrument it before you tune it. The metrics that matter most early on are end-to-end latency (target < 1s for voice, < 3s for chat), barge-in correctness, tool-call success rate, and post-conversation lead score distribution. Optimize whatever the data flags as the bottleneck, not whatever feels slowest in your head.

**Where does this break down for voice agent deployments at scale?**

The two failure modes that bite hardest are silent context loss across multi-turn handoffs and tool calls that succeed in dev but get rate-limited in production. Both are solvable with a proper agent backplane that pins state to a session ID, retries with backoff, and writes every tool invocation to an audit log you can replay.

**How does the After-Hours Escalation product make sure no urgent call is dropped?**

It runs 7 agents on a Primary → Secondary → 6-fallback ladder with a 120-second ACK timeout per leg. If the primary on-call does not acknowledge inside the window, the next contact is paged automatically — voice, SMS, and push — until somebody owns the incident.
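The ladder reads naturally as a loop over contacts with a per-leg ACK window. A sketch, where `Contact`, `page`, and `waitForAck` are hypothetical stand-ins for the real paging hooks:

```typescript
// Sketch of a Primary → Secondary → fallback paging ladder with a
// 120s ACK window per leg. `page` and `waitForAck` are hypothetical hooks.
type Contact = { name: string; page: () => Promise<void> };

async function escalate(
  contacts: Contact[],
  waitForAck: (c: Contact, ms: number) => Promise<boolean>,
  ackWindowMs = 120_000,
): Promise<Contact | null> {
  for (const contact of contacts) {
    await contact.page();                        // voice + SMS + push fan-out
    if (await waitForAck(contact, ackWindowMs)) // somebody owns the incident
      return contact;
  }
  return null; // ladder exhausted — surface to an operator dashboard
}
```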

## See it live

Book a 30-minute working session at [calendly.com/sagar-callsphere/new-meeting](https://calendly.com/sagar-callsphere/new-meeting) and bring a real call flow — we will walk it through the live after-hours escalation product at [escalation.callsphere.tech](https://escalation.callsphere.tech) and show you exactly where the production wiring sits.

