By Sagar Shankaran, Founder of CallSphere
Bun 1.3 + Hono is 2x faster than Node + Express for WebSocket relays. Wire it to gpt-realtime-2 and deploy to Fly.io's edge for sub-500ms voice-to-voice latency in 6 regions.
Key takeaways
TL;DR: Bun 1.3 cold-starts in ~25ms, Hono is ~14kB, and OpenAI's gpt-realtime-2 (introduced in 2026) gives you GPT-5-class reasoning over voice. Combined, a single `bun run server.ts` ships a 6-region edge voice agent.
The result is a WebSocket relay between browser PCM audio and OpenAI Realtime, deployed to 6 Fly.io regions with anycast routing. Browsers reach the nearest edge, and p95 voice-to-voice latency is ~480ms.
You need Bun 1.3, hono@^4.6, the Fly.io CLI (brew install flyctl) and an OpenAI key.

Architecture:

```mermaid
flowchart LR
  BR[Browser] --> CF[Cloudflare anycast]
  CF --> FY[Fly edge nearest of 6]
  FY -- WS --> H[Hono relay on Bun]
  H -- WS --> OA[OpenAI Realtime gpt-realtime-2]
```
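server.ts

```ts
import { Hono } from "hono";
import { createBunWebSocket } from "hono/bun";

// Hono 4.6 on Bun: createBunWebSocket() returns the route helper plus the
// websocket handler object that Bun's server must receive.
const { upgradeWebSocket, websocket } = createBunWebSocket();

const app = new Hono();
// Named REALTIME_URL so it does not shadow the global URL class.
const REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2";

app.get("/ws", upgradeWebSocket(() => {
  let oa: WebSocket | undefined;
  return {
    onOpen: (_evt, ws) => {
      // Bun-specific: its WebSocket client accepts custom headers, which the
      // browser WebSocket API does not (hence the cast for the DOM typings).
      const upstream = new WebSocket(REALTIME_URL, {
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "OpenAI-Beta": "realtime=v1",
        },
      } as any);
      oa = upstream;
      upstream.onopen = () => upstream.send(JSON.stringify({
        type: "session.update",
        session: { voice: "verse", turn_detection: { type: "semantic_vad" } },
      }));
      // Relay every OpenAI server event (audio deltas, transcripts, tool calls) to the browser.
      upstream.onmessage = (m) => ws.send(m.data);
      upstream.onclose = () => ws.close();
    },
    // Forward browser events upstream; frames sent before the upstream socket opens are dropped.
    onMessage: (evt) => {
      if (oa?.readyState === WebSocket.OPEN) oa.send(evt.data);
    },
    onClose: () => oa?.close(),
  };
}));

// Bun serves the default export; `websocket` must be Hono's handler, not an empty object.
export default { port: 8787, fetch: app.fetch, websocket };
```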
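Because the relay forwards OpenAI's server events to the browser untouched, the client has to decode audio itself. The sketch below assumes the beta event shape that matches the `realtime=v1` header in server.ts: a `response.audio.delta` event whose `delta` field is base64 little-endian PCM16 at 24kHz. `decodeAudioDelta` is a hypothetical helper, not part of any SDK.

```typescript
// Assumed beta server-event shape; only the fields used here are typed.
type ServerEvent = { type: string; delta?: string };

// Decode standard base64 into raw bytes (atob is available in browsers, Bun and Node 16+).
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Hypothetical helper: turn one relayed message into Float32 samples in [-1, 1)
// for Web Audio playback, or null if the message carries no audio.
function decodeAudioDelta(raw: string): Float32Array | null {
  const event = JSON.parse(raw) as ServerEvent;
  if (event.type !== "response.audio.delta" || !event.delta) return null;
  const bytes = base64ToBytes(event.delta);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < samples.length; i++) {
    // PCM16 is little-endian; divide by 32768 to map onto the Web Audio float range.
    samples[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return samples;
}
```

In the browser you would call it from the socket's `onmessage` handler and pass non-null results to whatever playback path you use, such as an AudioWorklet ring buffer; that playback code is not shown.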
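Dockerfile

```dockerfile
FROM oven/bun:1.3-alpine
WORKDIR /app
# Bun 1.2+ writes a text lockfile (bun.lock) by default; copy bun.lockb
# instead only if the project still uses the older binary lockfile.
COPY package.json bun.lock ./
RUN bun install --frozen-lockfile
COPY . .
EXPOSE 8787
CMD ["bun", "run", "server.ts"]
```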
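fly.toml

```toml
app = "voice-edge"
primary_region = "iad"

[build]
# Empty: flyctl builds from the Dockerfile in the project root.

[http_service]
  internal_port = 8787
  force_https = true

[deploy]
  strategy = "rolling"

# Regions are not declared in fly.toml ([[regions]] is not a valid section);
# add them from the CLI after the first deploy.
```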
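Run fly deploy, then fly regions add lhr nrt syd fra gru for 6 edges. On current Machines-based Fly apps, fly regions is a legacy (V1) command; place a Machine in each region with fly scale count instead (for example, fly scale count 1 --region lhr).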
Use a 24kHz AudioWorklet to capture PCM16 chunks every 20ms and forward as base64-wrapped input_audio_buffer.append events.
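The capture path can be sketched as two pure pieces: a chunker that regroups the AudioWorklet's 128-frame render quanta into 20ms chunks (24,000 Hz × 0.02 s = 480 samples), and an encoder that turns each chunk into a little-endian PCM16 `input_audio_buffer.append` event. This is a minimal sketch, assuming the AudioContext is created with `new AudioContext({ sampleRate: 24000 })` so no resampling is needed. `Pcm16Chunker` and `toAppendEvent` are hypothetical names; inside a real AudioWorkletProcessor you would call `push()` from `process()` and post each event to the main thread, which sends it over the `/ws` socket.

```typescript
// 24 kHz mono capture, 20 ms per chunk -> 480 samples (960 bytes of PCM16).
const SAMPLE_RATE = 24_000;
const CHUNK_MS = 20;
const SAMPLES_PER_CHUNK = (SAMPLE_RATE * CHUNK_MS) / 1000;

// Regroups arbitrary-size Float32 frames (AudioWorklet delivers 128 at a time)
// into fixed 480-sample chunks; leftover samples wait for the next push.
class Pcm16Chunker {
  private buf = new Float32Array(SAMPLES_PER_CHUNK);
  private filled = 0;

  push(frame: Float32Array): Float32Array[] {
    const out: Float32Array[] = [];
    for (let i = 0; i < frame.length; i++) {
      this.buf[this.filled++] = frame[i];
      if (this.filled === SAMPLES_PER_CHUNK) {
        out.push(this.buf);
        this.buf = new Float32Array(SAMPLES_PER_CHUNK);
        this.filled = 0;
      }
    }
    return out;
  }
}

// Clamp to [-1, 1] and write signed 16-bit little-endian samples.
function floatToPcm16Bytes(input: Float32Array): Uint8Array {
  const out = new Uint8Array(input.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out;
}

function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Wrap one chunk as the JSON client event the relay forwards to OpenAI.
function toAppendEvent(chunk: Float32Array): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: bytesToBase64(floatToPcm16Bytes(chunk)),
  });
}
```

Writing through a DataView with an explicit little-endian flag avoids depending on the platform byte order of an Int16Array.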
The 2026 gpt-realtime-2 model handles complex multi-tool calls in one turn. Set session.instructions with up to 8K tokens of policy without hurting latency.
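One way to put policy text and tools into the same `session.update`, and to answer a tool call once the model has finished emitting it, is sketched below. It assumes the beta Realtime event shapes that match the `realtime=v1` header in server.ts: a `response.output_item.done` event carrying a `function_call` item, answered with a `conversation.item.create` of type `function_call_output` followed by `response.create`. The `check_availability` tool and its canned result are hypothetical placeholders.

```typescript
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical tool registry: one handler per function name.
const tools: Record<string, ToolHandler> = {
  check_availability: (args) => ({ date: args.date, slots: ["09:00", "14:30"] }),
};

// session.update carrying the policy text plus the tool schema (beta shape).
function sessionUpdate(instructions: string): string {
  return JSON.stringify({
    type: "session.update",
    session: {
      instructions,
      tool_choice: "auto",
      tools: [{
        type: "function",
        name: "check_availability",
        description: "List open appointment slots for a date (YYYY-MM-DD).",
        parameters: {
          type: "object",
          properties: { date: { type: "string" } },
          required: ["date"],
        },
      }],
    },
  });
}

// Turn a completed function_call item into the client events that return its result.
function answerToolCall(raw: string): string[] {
  const evt = JSON.parse(raw);
  if (evt.type !== "response.output_item.done" || evt.item?.type !== "function_call") return [];
  const { name, call_id, arguments: argText } = evt.item;
  const handler = tools[name];
  const output = handler ? handler(JSON.parse(argText)) : { error: `unknown tool ${name}` };
  return [
    JSON.stringify({
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id, output: JSON.stringify(output) },
    }),
    JSON.stringify({ type: "response.create" }),
  ];
}
```

This sketch asks for a new response after each call; when a turn contains several tool calls, you would send every `function_call_output` first and a single `response.create` afterwards.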
Point Cloudflare's *.callsphere.ai record at voice-edge.fly.dev with proxied = false (DNS-only) so WebSocket round-trips bypass Cloudflare's proxy.
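ws.send overloads differ from Node's: Bun's ServerWebSocket.send returns a status number (bytes sent, -1 under backpressure, 0 if dropped) instead of taking a callback, so test the relay on Bun itself, not just locally.

Set min_machines_running = 1 to keep voice cold starts under 50ms. Fly applies this setting only in primary_region, so the other regions need auto_stop_machines = "off" if they must stay warm.

CallSphere's edge voice fleet handles 1.2M+ minutes/month across 6 verticals with 37 agents and 90+ tools: Healthcare (FastAPI), OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma) and Sales (Node.js 20 + React 18 + Vite). All voice flows route through a Bun + Hono relay. Pricing is $149/$499/$1,499, with a 14-day trial and a 22% affiliate program.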
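Because Bun's `ServerWebSocket.send` reports backpressure through its return value (bytes sent, -1 when the message was queued under backpressure, 0 when it was dropped), a relay can track congestion explicitly instead of assuming every send succeeded. A minimal sketch, assuming you can reach the underlying Bun socket (Hono's Bun adapter exposes it as `ws.raw`) and can call `drain()` from a Bun `drain` handler; that wiring, the `BackpressureSender` name and the drop-oldest policy are assumptions for illustration.

```typescript
// Minimal shape of Bun's ServerWebSocket.send as used here:
// >0 bytes sent, -1 queued under backpressure, 0 dropped.
interface BunLikeSocket {
  send(data: string | Uint8Array): number;
}

class BackpressureSender {
  private readonly sock: BunLikeSocket;
  private readonly maxPending: number;
  private pending: (string | Uint8Array)[] = [];
  private congested = false;
  dropped = 0;

  constructor(sock: BunLikeSocket, maxPending = 256) {
    this.sock = sock;
    this.maxPending = maxPending;
  }

  send(data: string | Uint8Array): void {
    if (this.congested) {
      // Hold frames locally while Bun drains; shed the oldest once over the cap,
      // since stale audio is worth less than fresh audio.
      this.pending.push(data);
      if (this.pending.length > this.maxPending) {
        this.pending.shift();
        this.dropped++;
      }
      return;
    }
    this.write(data);
  }

  // Call from Bun's websocket `drain` handler once the socket can take more data.
  drain(): void {
    this.congested = false;
    while (!this.congested && this.pending.length > 0) {
      this.write(this.pending.shift()!);
    }
  }

  private write(data: string | Uint8Array): void {
    const status = this.sock.send(data);
    if (status === -1) this.congested = true; // Bun queued it; wait for drain
    else if (status === 0) this.dropped++; // connection problem, frame lost
  }
}
```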
Why not Node? Bun's WebSocket implementation is ~2x faster on raw throughput.
Cloudflare Workers? Workers cap WS connections at 6 hours and have no persistent state — Fly + Bun is simpler.
TURN servers? WebSocket relays don't need them; only WebRTC direct does.
Cost? Fly: ~$5/region/month for 256MB shared CPU. OpenAI: ~$0.20-$0.30/min.
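Putting those per-unit figures into a back-of-envelope model shows that model minutes, not edge hosting, dominate the bill. The 10,000 minutes per month used in the check below is a hypothetical volume: 6 regions × $5 = $30 of Fly hosting against $2,000 to $3,000 of OpenAI usage.

```typescript
// Back-of-envelope monthly cost using the figures quoted above (estimates,
// not price quotes): ~$5 per Fly region, $0.20-$0.30 per Realtime minute.
interface CostRange {
  low: number;
  high: number;
}

function monthlyCostUsd(
  regions: number,
  minutes: number,
  flyPerRegion = 5,
  perMinuteLow = 0.2,
  perMinuteHigh = 0.3,
): CostRange {
  const hosting = regions * flyPerRegion; // fixed edge cost
  return {
    low: hosting + minutes * perMinuteLow, // usage cost scales with call minutes
    high: hosting + minutes * perMinuteHigh,
  };
}
```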
To make this architecture operational, the trade-off you cannot defer is channel routing between voice and chat: a missed call should not die; it should warm up the SMS or web-chat lane within seconds. Treat this as a voice-first system from the first prompt: the agent's persona, its tool surface, and its escalation rules all flow from that single decision. Teams that ship fast tend to instrument the loop end-to-end before they tune any single component, because the bottleneck is rarely where intuition puts it.
A production-grade voice stack at CallSphere stitches Twilio Programmable Voice (PSTN ingress, TwiML, bidirectional Media Streams) to a realtime reasoning layer — typically OpenAI Realtime or ElevenLabs Conversational AI — with sub-second response as a hard SLO. Anything north of one second of perceived silence and callers either repeat themselves or hang up; that single number drives the whole architecture. Server-side VAD with proper barge-in support is non-negotiable, otherwise the agent talks over the caller and the conversation collapses. Streaming TTS with phoneme-aligned interruption keeps the cadence natural even when the user changes their mind mid-sentence. Post-call, every transcript is run through a structured pipeline: sentiment, intent classification, lead score, escalation flag, and a normalized slot extraction (name, callback number, reason, urgency). For healthcare workloads, the BAA-covered storage path, audit logs, encryption-at-rest, and PHI-safe transcript redaction are wired in from day one, not bolted on at compliance review. The end state is a system where every call produces a row of structured data, not just a recording.
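The post-call row can be pinned down as a typed record plus a validator that rejects malformed extraction output before it reaches storage. This is a sketch under assumptions: an upstream LLM pass is presumed to return snake_case JSON, and every field name here is hypothetical rather than CallSphere's actual schema. PHI redaction, which healthcare workloads apply, would run before this step and is not shown.

```typescript
// Hypothetical schema for the structured row every call produces.
type Sentiment = "positive" | "neutral" | "negative";
type Urgency = "low" | "medium" | "high";

interface PostCallRecord {
  callId: string;
  sentiment: Sentiment;
  intent: string;
  leadScore: number; // integer 0-100
  escalate: boolean;
  slots: {
    name: string | null;
    callbackNumber: string | null; // digits only, leading + kept
    reason: string | null;
    urgency: Urgency;
  };
}

const SENTIMENTS: readonly string[] = ["positive", "neutral", "negative"];
const URGENCIES: readonly string[] = ["low", "medium", "high"];

// Trim strings; treat empty or non-string values as "not captured".
const text = (v: unknown): string | null =>
  typeof v === "string" && v.trim() !== "" ? v.trim() : null;

// Validate and normalize extraction output (assumed snake_case JSON).
function parsePostCall(callId: string, raw: string): PostCallRecord {
  const d = JSON.parse(raw);
  if (!SENTIMENTS.includes(d.sentiment)) throw new Error(`bad sentiment: ${d.sentiment}`);
  const intent = text(d.intent);
  if (intent === null) throw new Error("missing intent");
  const score = Number(d.lead_score);
  if (!Number.isFinite(score) || score < 0 || score > 100) throw new Error("lead_score out of range");
  const s = d.slots ?? {};
  if (!URGENCIES.includes(s.urgency)) throw new Error(`bad urgency: ${s.urgency}`);
  const phone = text(s.callback_number);
  return {
    callId,
    sentiment: d.sentiment as Sentiment,
    intent,
    leadScore: Math.round(score),
    escalate: d.escalate === true,
    slots: {
      name: text(s.name),
      callbackNumber: phone === null ? null : phone.replace(/[^\d+]/g, ""),
      reason: text(s.reason),
      urgency: s.urgency as Urgency,
    },
  };
}
```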
What changes when you move a voice agent to the edge as described in this post?
Treat the architecture in this post as a starting point and instrument it before you tune it. The metrics that matter most early on are end-to-end latency (target < 1s for voice, < 3s for chat), barge-in correctness, tool-call success rate, and post-conversation lead score distribution. Optimize whatever the data flags as the bottleneck, not whatever feels slowest in your head.
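Instrumenting first mostly means recording per-turn latency and reading percentiles rather than averages, since a few slow turns are what callers notice. A minimal sketch using the nearest-rank percentile method; the targets come from the paragraph above, while measuring voice latency as the gap from the end of caller speech to the first agent audio frame is an assumption the post does not spell out.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(0, rank - 1)];
}

// Targets from the text: under 1s for voice, under 3s for chat.
const SLO_MS = { voice: 1000, chat: 3000 } as const;

// True when the observed p95 is strictly under the channel's target.
function meetsSlo(channel: keyof typeof SLO_MS, samples: number[]): boolean {
  return percentile(samples, 95) < SLO_MS[channel];
}
```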
Where does this break down for voice agent deployments at scale?
The two failure modes that bite hardest are silent context loss across multi-turn handoffs and tool calls that succeed in dev but get rate-limited in production. Both are solvable with a proper agent backplane that pins state to a session ID, retries with backoff, and writes every tool invocation to an audit log you can replay.
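A minimal sketch of the retry-and-audit half of that backplane, assuming tool calls are async functions that throw on failure (a rate-limit 429 in the check below). Session pinning is reduced to tagging each audit entry with a session ID, the audit log is an in-memory array standing in for durable storage, and jitter, which you would usually add in production, is omitted. `callToolWithRetry` and `backoffMs` are hypothetical names.

```typescript
interface AuditEntry {
  sessionId: string;
  tool: string;
  attempt: number;
  ok: boolean;
  error?: string;
  at: number; // epoch ms
}

// Exponential backoff with a cap: base, 2*base, 4*base, ... up to capMs.
function backoffMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call a tool, retrying on failure, and record every attempt so the
// session can be replayed from the audit log later.
async function callToolWithRetry<T>(
  sessionId: string,
  tool: string,
  fn: () => Promise<T>,
  audit: AuditEntry[],
  opts: { retries?: number; baseMs?: number; capMs?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<T> {
  const { retries = 3, baseMs = 200, capMs = 5_000, sleep = realSleep } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      const result = await fn();
      audit.push({ sessionId, tool, attempt, ok: true, at: Date.now() });
      return result;
    } catch (err) {
      audit.push({ sessionId, tool, attempt, ok: false, error: String(err), at: Date.now() });
      if (attempt >= retries) throw err; // give up after the last retry
      await sleep(backoffMs(attempt, baseMs, capMs));
    }
  }
}
```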
How does the After-Hours Escalation product make sure no urgent call is dropped?
It runs 7 agents on a Primary → Secondary → 6-fallback ladder with a 120-second ACK timeout per leg. If the primary on-call does not acknowledge inside the window, the next contact is paged automatically — voice, SMS, and push — until somebody owns the incident.
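The ladder timing reduces to integer division of elapsed time by the ACK window. A sketch assuming legs are paged strictly in order and that an acknowledgement stops the walk before this function is consulted again; contact names are placeholders, and what happens after the last leg times out (null here) is not specified in the post.

```typescript
type Channel = "voice" | "sms" | "push";

// Hypothetical model of one rung on the paging ladder.
interface Leg {
  contact: string;
  channels: Channel[];
}

// Which leg should be paging `elapsedSec` after the incident opened, assuming
// nobody has acknowledged yet. Returns null once every leg has timed out.
function activeLeg(ladder: Leg[], elapsedSec: number, ackTimeoutSec = 120): Leg | null {
  if (elapsedSec < 0) throw new Error("elapsedSec must be >= 0");
  const index = Math.floor(elapsedSec / ackTimeoutSec);
  return index < ladder.length ? ladder[index] : null;
}
```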
Book a 30-minute working session at calendly.com/sagar-callsphere/new-meeting and bring a real call flow — we will walk it through the live after-hours escalation product at escalation.callsphere.tech and show you exactly where the production wiring sits.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.