How to Build a Next.js 14 Voice Demo with OpenAI Realtime + WebRTC
Mint an ephemeral OpenAI key from a Next.js Route Handler, connect via WebRTC from the browser, and ship a working voice demo to Vercel in one afternoon.
TL;DR: Don't ship your API key to the browser. Use a Next.js Route Handler to mint a 60-second `ephemeral_key`, then let the browser open a WebRTC peer connection straight to OpenAI. Audio capture, playback, and barge-in come for free with WebRTC.
What you'll build
A Next.js 14 (App Router) page with a single "Talk" button. Click it, grant microphone permission, and speak — the OpenAI Realtime model replies through WebRTC with sub-500ms latency on a good connection. Deploy to Vercel and the same code becomes a public voice demo.
Prerequisites
- Next.js 14+ (App Router), React 18.
- OpenAI API key with Realtime access.
- Node 20+ and `npm install` (no extra deps required for the core).
- Familiarity with React Server Components and Route Handlers.
- A browser supporting WebRTC + `getUserMedia` (everything modern).
Architecture
```mermaid
sequenceDiagram
    participant B as Browser
    participant N as Next.js (Route Handler)
    participant O as OpenAI Realtime
    B->>N: GET /api/realtime/session
    N->>O: POST /v1/realtime/sessions (Bearer key)
    O-->>N: { client_secret.value }
    N-->>B: ephemeral_key
    B->>O: SDP offer + Bearer ephemeral_key
    O-->>B: SDP answer
    B<<->>O: Audio (RTP) + DataChannel events
```
Step 1 — Route Handler that mints an ephemeral key
```ts
// app/api/realtime/session/route.ts
export async function GET() {
  // Mint a short-lived ephemeral key server-side; the real API key never leaves the server.
  const r = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY!}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2025-06-03",
      voice: "alloy",
      modalities: ["audio", "text"],
      instructions: "You are a CallSphere demo assistant. Be concise and warm.",
    }),
  });
  return Response.json(await r.json());
}
```
This returns `{ client_secret: { value, expires_at } }`; the `value` is the short-lived bearer token your browser will use.
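One Next.js 14 caveat: a GET Route Handler that never reads the incoming request can be statically cached, so every visitor could receive the same stale, already-expired key. Opting the route out of caching avoids this; a one-line addition to the same `route.ts`:

```ts
// app/api/realtime/session/route.ts
// Opt out of static caching so every request mints a fresh ephemeral key.
export const dynamic = "force-dynamic";
```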
Step 2 — Client component with the talk button
```tsx
// app/page.tsx
"use client";
import { useRef, useState } from "react";
export default function Page() {
  const [active, setActive] = useState(false);
  const pcRef = useRef<RTCPeerConnection | null>(null);
  const audioRef = useRef<HTMLAudioElement | null>(null);

  async function start() {
    // 1. Get a short-lived ephemeral key from our Route Handler
    const { client_secret } = await fetch("/api/realtime/session").then((r) => r.json());
    const ephemeral = client_secret.value;

    const pc = new RTCPeerConnection();
    pcRef.current = pc;

    // 2. Receive remote audio
    pc.ontrack = (e) => {
      audioRef.current!.srcObject = e.streams[0];
    };

    // 3. Send mic
    const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
    pc.addTrack(ms.getAudioTracks()[0]);

    // 4. Data channel for events
    const dc = pc.createDataChannel("oai-events");
    dc.onmessage = (e) => console.log("event:", JSON.parse(e.data));

    // 5. Classic SDP offer/answer, with the ephemeral key as the bearer
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    const sdpRes = await fetch(
      "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
      {
        method: "POST",
        body: offer.sdp,
        headers: {
          Authorization: `Bearer ${ephemeral}`,
          "Content-Type": "application/sdp",
        },
      }
    );
    await pc.setRemoteDescription({ type: "answer", sdp: await sdpRes.text() });
    setActive(true);
  }

  return (
    <main>
      <button onClick={start} disabled={active}>
        {active ? "Listening…" : "Talk"}
      </button>
      {/* autoPlay works because playback starts from a user gesture */}
      <audio ref={audioRef} autoPlay />
    </main>
  );
}
```
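Barge-in comes from the server-side VAD, but a client-side mute toggle is still a common addition. A minimal sketch, assuming the `pcRef` from the component above; `setMuted` is a hypothetical helper, and toggling `track.enabled` keeps the connection alive instead of ending it:

```ts
// Hypothetical helper: silence the outgoing mic track without closing the connection.
function setMuted(muted: boolean) {
  pcRef.current?.getSenders().forEach((s) => {
    if (s.track?.kind === "audio") s.track.enabled = !muted;
  });
}
```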
Step 3 — Send a session.update via DataChannel
The default session config is fine, but you usually want to override the system prompt:
```ts
dc.onopen = () =>
  dc.send(
    JSON.stringify({
      type: "session.update",
      session: {
        instructions: "You are CallSphere. Always end with: would you like a demo?",
        turn_detection: { type: "server_vad", threshold: 0.5 },
      },
    })
  );
```
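The same channel carries events back from the model, so the `onmessage` logger from Step 2 can become a real handler. A sketch; the event type names below match the Realtime API's published event schema, but verify against the current docs:

```ts
dc.onmessage = (e) => {
  const event = JSON.parse(e.data);
  switch (event.type) {
    case "session.updated":
      // Server acknowledged our session.update
      console.log("session config applied");
      break;
    case "response.audio_transcript.delta":
      // Streaming transcript of the model's spoken reply
      console.log("assistant:", event.delta);
      break;
    case "response.done":
      console.log("turn complete");
      break;
  }
};
```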
Step 4 — Add a "hang up" button
```tsx
function stop() {
  // Release the mic, then tear down the peer connection (this closes the DataChannel too)
  pcRef.current?.getSenders().forEach((s) => s.track?.stop());
  pcRef.current?.close();
  pcRef.current = null;
  setActive(false);
}
```
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 5 — Deploy to Vercel
```bash vercel --prod ```
Set `OPENAI_API_KEY` in your Vercel project settings. The Route Handler works on either the Edge or Node.js runtime. The public URL is your demo.
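If you'd rather pin the runtime explicitly than rely on the default, Next.js route segment config makes that a one-liner in the same `route.ts`:

```ts
// Pin this Route Handler to the Edge runtime ("nodejs" also works).
export const runtime = "edge";
```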
Common pitfalls
- Sending the API key to the client: never. Always go through the Route Handler.
- Ephemeral key expired: it lasts ~60s. Mint a fresh one per session (see the sketch after this list).
- Autoplay blocked: `<audio autoPlay>` works only after a user gesture; your "Talk" button satisfies this.
- Errors on the `/v1/realtime` POST: OpenAI returns a `Content-Type: application/sdp` body; read it with `res.text()`, not `res.json()`.
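For the expiry pitfall, a cheap guard is to check `expires_at` before connecting and re-mint if the key is stale. A minimal sketch; `mintEphemeralKey` is a hypothetical wrapper around the Step 2 fetch, and it assumes `expires_at` is a unix timestamp in seconds:

```ts
// Hypothetical wrapper: mint an ephemeral key, retrying once if it's already near expiry.
async function mintEphemeralKey(): Promise<string> {
  for (let attempt = 0; attempt < 2; attempt++) {
    const { client_secret } = await fetch("/api/realtime/session").then((r) => r.json());
    // Keep a 5-second safety margin before the key expires.
    if (client_secret.expires_at * 1000 > Date.now() + 5_000) {
      return client_secret.value;
    }
  }
  throw new Error("could not mint a fresh ephemeral key");
}
```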
How CallSphere does this in production
The public demo at /demo uses this exact pattern with per-industry prompts (Healthcare, Real Estate, Salon, Forex, Hospitality, Behavioral Health). The Real Estate "OneRoof" demo additionally connects to a Go gateway over NATS for tool calls — but the WebRTC handshake is the same Next.js code. See it live at /demo or start a 14-day trial.
FAQ
Is WebRTC faster than WebSocket? Yes, by 100–300ms typically — RTC handles the audio path natively without your code re-encoding chunks.
Can I record the conversation? Yes: `pc.getSenders()[0].track` is the local mic track; wrap it in a `MediaStream` and feed it to a `MediaRecorder`. Remote audio is the `ontrack` stream.
Does WebRTC work behind corporate firewalls? Mostly — you may need a TURN server. OpenAI's endpoint typically traverses NAT cleanly.
How do I add tools? Send a `session.update` with a `tools` array via the DataChannel and handle `response.function_call_arguments.done` events; see the sketch below.
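To make the tools answer concrete, here is a sketch of the full round trip: register one function tool via `session.update`, then answer the model's call with a `function_call_output` item. The event and payload shapes follow the Realtime API's function-calling flow as of the preview; `bookDemo` is a made-up tool name:

```ts
// Register a hypothetical "bookDemo" tool on the live session.
dc.send(
  JSON.stringify({
    type: "session.update",
    session: {
      tools: [
        {
          type: "function",
          name: "bookDemo",
          description: "Book a CallSphere demo for the caller",
          parameters: {
            type: "object",
            properties: { email: { type: "string" } },
            required: ["email"],
          },
        },
      ],
    },
  })
);

dc.onmessage = (e) => {
  const event = JSON.parse(e.data);
  if (event.type === "response.function_call_arguments.done") {
    const args = JSON.parse(event.arguments);
    // Run the tool, return its result, then ask the model to speak a follow-up.
    dc.send(
      JSON.stringify({
        type: "conversation.item.create",
        item: {
          type: "function_call_output",
          call_id: event.call_id,
          output: JSON.stringify({ booked: true, email: args.email }),
        },
      })
    );
    dc.send(JSON.stringify({ type: "response.create" }));
  }
};
```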
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available; no signup required.