By Sagar Shankaran, Founder of CallSphere
Mint an ephemeral OpenAI key from a Next.js Route Handler, connect via WebRTC from the browser, and ship a working voice demo to Vercel in one afternoon.
Key takeaways
TL;DR: Don't ship your API key to the browser. Use a Next.js Route Handler to mint a 60-second ephemeral_key, then let the browser open a WebRTC peer connection straight to OpenAI. Audio capture, playback, and barge-in come for free with WebRTC.
A Next.js 14 (App Router) page with a single "Talk" button. Click it, grant microphone permission, and speak — the OpenAI Realtime model replies through WebRTC with sub-500ms latency on a good connection. Deploy to Vercel and the same code becomes a public voice demo.
npm install (no extra deps required for the core).
sequenceDiagram
participant B as Browser
participant N as Next.js (Route Handler)
participant O as OpenAI Realtime
B->>N: GET /api/realtime/session
N->>O: POST /v1/realtime/sessions (Bearer key)
O-->>N: { client_secret.value }
N-->>B: ephemeral_key
B->>O: SDP offer + Bearer ephemeral_key
O-->>B: SDP answer
B<-->O: Audio (RTP) + DataChannel events
```ts
// app/api/realtime/session/route.ts

// Opt out of Next.js GET caching so every request mints a fresh key.
export const dynamic = "force-dynamic";

export async function GET() {
  const r = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY!}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2025-06-03",
      voice: "alloy",
      modalities: ["audio", "text"],
      instructions: "You are a CallSphere demo assistant. Be concise and warm.",
    }),
  });
  // Forward the upstream status so a failed mint is not reported as 200.
  return Response.json(await r.json(), { status: r.status });
}
```
This returns { client_secret: { value, expires_at } } — the value is the short-lived bearer your browser will use.
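Because the key is short-lived, it can help to check `expires_at` before starting the SDP exchange and mint a new key if the old one has gone stale. The helper below is a minimal sketch, not part of the original code: the name `isEphemeralKeyStale` and the 5-second safety margin are illustrative, and it assumes `expires_at` is a Unix timestamp in seconds.

```typescript
// Relevant part of the session response, as described above.
interface ClientSecret {
  value: string;
  expires_at: number; // assumption: Unix time in seconds
}

// Returns true when the key has expired or will expire within `skewSeconds`.
// `nowMs` is injectable so the check can be tested without a real clock.
function isEphemeralKeyStale(
  secret: ClientSecret,
  nowMs: number = Date.now(),
  skewSeconds: number = 5,
): boolean {
  const nowSeconds = Math.floor(nowMs / 1000);
  return secret.expires_at - skewSeconds <= nowSeconds;
}
```

In the browser you might call `isEphemeralKeyStale(client_secret)` right before the `fetch` to `/v1/realtime` and request a new session when it returns true.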
```tsx
// app/page.tsx
"use client";
import { useState, useRef } from "react";

export default function Page() {
  const [active, setActive] = useState(false);
  const pcRef = useRef<RTCPeerConnection | null>(null);
  const audioRef = useRef<HTMLAudioElement | null>(null);

  async function start() {
    // Fetch a fresh ephemeral key from our own Route Handler
    const { client_secret } = await fetch("/api/realtime/session").then((r) => r.json());
    const ephemeral = client_secret.value;

    const pc = new RTCPeerConnection();
    pcRef.current = pc;

    // Receive remote audio
    pc.ontrack = (e) => {
      if (audioRef.current) audioRef.current.srcObject = e.streams[0];
    };

    // Send mic
    const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
    pc.addTrack(ms.getAudioTracks()[0]);

    // Data channel for events
    const dc = pc.createDataChannel("oai-events");
    dc.onmessage = (e) => console.log("event:", JSON.parse(e.data));

    // SDP offer/answer exchange, authenticated with the ephemeral key
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    const sdpRes = await fetch(
      "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
      {
        method: "POST",
        body: offer.sdp,
        headers: {
          Authorization: `Bearer ${ephemeral}`,
          "Content-Type": "application/sdp",
        },
      }
    );
    await pc.setRemoteDescription({ type: "answer", sdp: await sdpRes.text() });
    setActive(true);
  }

  // Minimal markup (assumed): a "Talk" button and the <audio> element for playback.
  return (
    <main>
      <button onClick={start} disabled={active}>Talk</button>
      <audio ref={audioRef} autoPlay />
    </main>
  );
}
```
The default session config is fine, but you usually want to override the system prompt:
```ts
dc.onopen = () =>
  dc.send(
    JSON.stringify({
      type: "session.update",
      session: {
        instructions: "You are CallSphere. Always end with: would you like a demo?",
        turn_detection: { type: "server_vad", threshold: 0.5 },
      },
    })
  );
```
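If you send more than one kind of `session.update`, for example a different prompt per demo, a small builder keeps the event shape in one place. This is a sketch under stated assumptions: `buildSessionUpdate` is a hypothetical helper, and the range check assumes the `server_vad` threshold is meant to lie between 0 and 1.

```typescript
interface SessionUpdateEvent {
  type: "session.update";
  session: {
    instructions: string;
    turn_detection: { type: "server_vad"; threshold: number };
  };
}

// Builds the same event shape as the snippet above, with basic validation.
function buildSessionUpdate(
  instructions: string,
  threshold: number = 0.5,
): SessionUpdateEvent {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0 and 1");
  }
  return {
    type: "session.update",
    session: {
      instructions,
      turn_detection: { type: "server_vad", threshold },
    },
  };
}
```

Usage inside `start()` would then look like `dc.onopen = () => dc.send(JSON.stringify(buildSessionUpdate("You are CallSphere.")))`.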
```tsx
function stop() {
  // Release the microphone, then tear down the peer connection
  pcRef.current?.getSenders().forEach((s) => s.track?.stop());
  pcRef.current?.close();
  pcRef.current = null;
  setActive(false);
}
```
```bash
vercel --prod
```
Set OPENAI_API_KEY in your Vercel project's environment variables. The Route Handler works on either the Edge or the Node.js runtime. The public URL Vercel gives you is your demo.
<audio autoPlay> works only after a user gesture; your "Talk" button satisfies this.
/v1/realtime POST: OpenAI returns a Content-Type: application/sdp body; don't res.json() it.
The public demo at /demo uses this exact pattern with per-industry prompts (Healthcare, Real Estate, Salon, Forex, Hospitality, Behavioral Health). The Real Estate "OneRoof" demo additionally connects to a Go gateway over NATS for tool calls, but the WebRTC handshake is the same Next.js code. See it live at /demo or start a 14-day trial.
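The SDP gotcha above is easy to trip over when the key has expired, because the error body is then not SDP at all. A small guard can turn that into a readable error before `setRemoteDescription` sees it. The sketch below is illustrative: `assertSdpAnswer` is a hypothetical helper, and it relies only on the fact that a session description starts with a `v=0` line.

```typescript
// Returns the SDP answer text unchanged, or throws with a readable message.
function assertSdpAnswer(
  status: number,
  contentType: string | null,
  body: string,
): string {
  if (status < 200 || status >= 300) {
    throw new Error(`SDP exchange failed with HTTP ${status}: ${body.slice(0, 200)}`);
  }
  if (!body.startsWith("v=0")) {
    throw new Error(`Expected an SDP body but got content type ${contentType ?? "unknown"}`);
  }
  return body;
}
```

In `start()` this could wrap the response as `assertSdpAnswer(sdpRes.status, sdpRes.headers.get("content-type"), await sdpRes.text())`, with the result passed as the `sdp` of the answer.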
Is WebRTC faster than WebSocket? Yes, by 100–300ms typically — RTC handles the audio path natively without your code re-encoding chunks.
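Can I record the conversation? Yes. pc.getSenders()[0].track gives you the local mic track; wrap it in a MediaStream (new MediaStream([track])) and pass that to a MediaRecorder, which takes a stream rather than a bare track. Remote audio is the stream delivered to ontrack.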
Does WebRTC work behind corporate firewalls? Mostly — you may need a TURN server. OpenAI's endpoint typically traverses NAT cleanly.
How do I add tools? Send a session.update with a tools array via DataChannel; handle response.function_call_arguments.done events.
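The tools answer above can be sketched as two pure functions: one that builds the `session.update` carrying a `tools` array, and one that turns a `response.function_call_arguments.done` event into the events you send back. Treat this as a sketch under assumptions: the tool `get_opening_hours` and both function names are hypothetical, and the event is assumed to carry `name`, `call_id`, and a JSON-encoded `arguments` string, which is worth verifying against the current event reference.

```typescript
interface FunctionCallDoneEvent {
  type: "response.function_call_arguments.done";
  call_id: string;
  name: string; // assumption: the event includes the tool name
  arguments: string; // JSON-encoded arguments
}

type ClientEvent =
  | { type: "conversation.item.create"; item: { type: "function_call_output"; call_id: string; output: string } }
  | { type: "response.create" };

type ToolHandler = (args: Record<string, unknown>) => unknown;

// session.update that registers one hypothetical tool.
function buildToolsUpdate() {
  return {
    type: "session.update" as const,
    session: {
      tools: [
        {
          type: "function" as const,
          name: "get_opening_hours",
          description: "Look up opening hours for a given weekday.",
          parameters: {
            type: "object",
            properties: { day: { type: "string" } },
            required: ["day"],
          },
        },
      ],
    },
  };
}

// Runs the matching handler and returns the events to send over the DataChannel:
// the tool output, then a request for the model to continue.
function handleFunctionCall(
  event: FunctionCallDoneEvent,
  handlers: Record<string, ToolHandler>,
): ClientEvent[] {
  const handler = handlers[event.name];
  const args: Record<string, unknown> = JSON.parse(event.arguments);
  const result = handler ? handler(args) : { error: `unknown tool: ${event.name}` };
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: event.call_id, output: JSON.stringify(result) },
    },
    { type: "response.create" },
  ];
}
```

In the page component, `dc.onmessage` would check `event.type`, call `handleFunctionCall`, and `dc.send(JSON.stringify(e))` for each returned event.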
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.