
How to Build a Next.js 14 Voice Demo with OpenAI Realtime + WebRTC

Mint an ephemeral OpenAI key from a Next.js Route Handler, connect via WebRTC from the browser, and ship a working voice demo to Vercel in one afternoon.

TL;DR — Don't ship your API key to the browser. Use a Next.js Route Handler to mint a 60-second ephemeral_key, then let the browser open a WebRTC peer connection straight to OpenAI. Audio capture, playback, and barge-in come for free with WebRTC.

What you'll build

A Next.js 14 (App Router) page with a single "Talk" button. Click it, grant microphone permission, and speak — the OpenAI Realtime model replies through WebRTC with sub-500ms latency on a good connection. Deploy to Vercel and the same code becomes a public voice demo.

Prerequisites

  1. Next.js 14+ (App Router), React 18.
  2. OpenAI API key with Realtime access.
  3. Node 20+ (no extra npm dependencies required for the core).
  4. Familiarity with React Server Components and Route Handlers.
  5. Browser supporting WebRTC + getUserMedia (everything modern).

Architecture

```mermaid
sequenceDiagram
  participant B as Browser
  participant N as Next.js (Route Handler)
  participant O as OpenAI Realtime
  B->>N: GET /api/realtime/session
  N->>O: POST /v1/realtime/sessions (Bearer key)
  O-->>N: { client_secret.value }
  N-->>B: ephemeral_key
  B->>O: SDP offer + Bearer ephemeral_key
  O-->>B: SDP answer
  B<<->>O: Audio (RTP) + DataChannel events
```

Step 1 — Route Handler that mints an ephemeral key

```ts
// app/api/realtime/session/route.ts
export async function GET() {
  const r = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY!}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2025-06-03",
      voice: "alloy",
      modalities: ["audio", "text"],
      instructions: "You are a CallSphere demo assistant. Be concise and warm.",
    }),
  });
  return Response.json(await r.json());
}
```

This returns { client_secret: { value, expires_at } } — the value is the short-lived bearer your browser will use.
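Before dialing WebRTC, it's worth validating that shape on the client. A minimal sketch (the extractEphemeralKey helper is our own, not part of any SDK):

```typescript
// Shape of the sessions response as described above; fields are optional here
// so a malformed response fails loudly instead of throwing deep inside WebRTC setup.
type SessionResponse = {
  client_secret?: { value?: string; expires_at?: number };
};

// Pull out the short-lived bearer, or throw with a useful message.
function extractEphemeralKey(json: SessionResponse): string {
  const value = json.client_secret?.value;
  if (!value) throw new Error("no client_secret.value in session response");
  return value;
}
```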

Step 2 — Client component with the talk button

```tsx
// app/page.tsx
"use client";
import { useRef, useState } from "react";

export default function Page() {
  const [active, setActive] = useState(false);
  const pcRef = useRef<RTCPeerConnection | null>(null);
  const audioRef = useRef<HTMLAudioElement | null>(null);

  async function start() {
    // 1. Get a short-lived key from our own Route Handler
    const { client_secret } = await fetch("/api/realtime/session").then((r) => r.json());
    const ephemeral = client_secret.value;

    const pc = new RTCPeerConnection();
    pcRef.current = pc;

    // 2. Play remote audio when the model's track arrives
    pc.ontrack = (e) => {
      audioRef.current!.srcObject = e.streams[0];
    };

    // 3. Send the mic track
    const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
    pc.addTrack(ms.getAudioTracks()[0]);

    // 4. Data channel for JSON events (transcripts, tool calls, etc.)
    const dc = pc.createDataChannel("oai-events");
    dc.onmessage = (e) => console.log("event:", JSON.parse(e.data));

    // 5. SDP offer/answer handshake with the Realtime endpoint
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);

    const sdpRes = await fetch(
      "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
      {
        method: "POST",
        body: offer.sdp,
        headers: {
          Authorization: `Bearer ${ephemeral}`,
          "Content-Type": "application/sdp",
        },
      }
    );
    await pc.setRemoteDescription({ type: "answer", sdp: await sdpRes.text() });
    setActive(true);
  }

  // Minimal UI: one Talk button plus the element that plays remote audio
  return (
    <main>
      <button onClick={start} disabled={active}>
        Talk
      </button>
      <audio ref={audioRef} autoPlay />
    </main>
  );
}
```

Step 3 — Send a session.update via DataChannel

The default session config is fine, but you usually want to override the system prompt:

```ts
dc.onopen = () =>
  dc.send(
    JSON.stringify({
      type: "session.update",
      session: {
        instructions: "You are CallSphere. Always end with: would you like a demo?",
        turn_detection: { type: "server_vad", threshold: 0.5 },
      },
    })
  );
```
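Incoming DataChannel messages are JSON objects keyed by a type field, so a tiny dispatcher keeps the onmessage handler tidy. A sketch: the routeEvent helper and the event names used below are our own illustration; check the Realtime event reference for the full list.

```typescript
// Generic shape of a Realtime event: always a "type" field, plus type-specific data.
type RealtimeEvent = { type: string; [key: string]: unknown };

// Parse one raw DataChannel message and hand it to a matching handler, if any.
// Unknown event types are ignored rather than crashing the call; the parsed
// type is returned so callers can log it.
function routeEvent(
  raw: string,
  handlers: Record<string, (e: RealtimeEvent) => void>
): string {
  const evt = JSON.parse(raw) as RealtimeEvent;
  const handler = handlers[evt.type];
  if (handler) handler(evt);
  return evt.type;
}
```

Wired up in the component, this replaces the console.log: `dc.onmessage = (e) => routeEvent(e.data, myHandlers);`.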

Step 4 — Add a "hang up" button

```tsx
function stop() {
  pcRef.current?.getSenders().forEach((s) => s.track?.stop());
  pcRef.current?.close();
  pcRef.current = null;
  setActive(false);
}
```

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Step 5 — Deploy to Vercel

```bash
vercel --prod
```

Set OPENAI_API_KEY in your Vercel project settings. The Route Handler runs on either the Edge or Node.js runtime — both work. The deployment's public URL is your demo.
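If you want to pin the runtime rather than rely on the default, Next.js route segment config supports an exported runtime constant in the Route Handler file (shown here for Node.js):

```typescript
// In app/api/realtime/session/route.ts: Next.js reads this export to pick the
// runtime. "edge" also works for this handler, since it only calls fetch.
export const runtime = "nodejs";
```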

Common pitfalls

  • Sending API key to client: never. Always go through the Route Handler.
  • Ephemeral key expired: it lasts ~60s. Mint a fresh one per session.
  • Autoplay blocked: the <audio autoPlay> works only after a user gesture — your "Talk" button satisfies this.
  • Parsing errors on the /v1/realtime POST response: OpenAI returns the SDP answer with Content-Type: application/sdp (plain text, not JSON), so read it with res.text(), never res.json().
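For the expired-key pitfall, a small guard can run before dialing. A sketch, assuming expires_at is a Unix timestamp in seconds (matching the response shape from Step 1); mint a fresh key whenever it returns true:

```typescript
// True when the ephemeral key's expiry has passed. nowMs is injectable so the
// check is testable with a fixed clock; it defaults to the real clock.
function isKeyExpired(expiresAt: number, nowMs: number = Date.now()): boolean {
  return nowMs / 1000 >= expiresAt;
}
```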

How CallSphere does this in production

The public demo at /demo uses this exact pattern with per-industry prompts (Healthcare, Real Estate, Salon, Forex, Hospitality, Behavioral Health). The Real Estate "OneRoof" demo additionally connects to a Go gateway over NATS for tool calls — but the WebRTC handshake is the same Next.js code. See it live at /demo or start a 14-day trial.

FAQ

Is WebRTC faster than WebSocket? Yes, by 100–300ms typically — RTC handles the audio path natively without your code re-encoding chunks.

Can I record the conversation? Yes — pc.getSenders()[0].track gives you the local mic; pipe it to a MediaRecorder. Remote audio is the ontrack stream.

Does WebRTC work behind corporate firewalls? Mostly — you may need a TURN server. OpenAI's endpoint typically traverses NAT cleanly.

How do I add tools? Send a session.update with a tools array via DataChannel; handle response.function_call_arguments.done events.
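As a sketch of that first half, here is a session.update payload registering a single function tool. The envelope and tool shape (type, name, description, JSON Schema parameters) follow the Realtime API's function-tool format; the bookDemo tool itself is invented for illustration.

```typescript
// Build a session.update that registers one function tool. Send it over the
// DataChannel once it opens; the model can then emit function-call events.
function toolsSessionUpdate() {
  return {
    type: "session.update",
    session: {
      tools: [
        {
          type: "function",
          name: "bookDemo", // hypothetical example tool
          description: "Book a product demo for the caller",
          parameters: {
            type: "object",
            properties: { email: { type: "string", description: "Caller's email" } },
            required: ["email"],
          },
        },
      ],
    },
  };
}
// On the wire: dc.send(JSON.stringify(toolsSessionUpdate()));
```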


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available, no signup required.