Building a 1-Click Browser Voice AI Demo with WebRTC in Under 50 Lines
The CallSphere /demo page hands a user a working voice AI in one click. Here is the WebRTC plumbing that gets a Realtime session live before the page settles.
A voice AI demo that needs a phone number is dead on arrival. In 2026 the bar is one click, in the browser, with sub-500 ms first-audio. WebRTC + OpenAI Realtime makes it boring.
What it is and why now
```mermaid
flowchart LR
  Browser["Browser · WebRTC"] --> ICE["ICE / STUN / TURN"]
  ICE --> SFU["SFU · Pion Go gateway 1.23"]
  SFU --> NATS["NATS bus"]
  NATS --> AI["AI Worker · OpenAI Realtime"]
  AI --> NATS
  NATS --> SFU
  SFU --> Browser
```

Embedding a working voice agent on a marketing page used to be impossible because every audio path ran through telephony. With WebRTC and ephemeral Realtime tokens, the browser itself becomes the calling client, with no SIP in the loop. Your landing page hands the user a microphone, a "Talk" button, and a 380 ms first-audio experience.
In 2026 this is table stakes for any AI voice company. CallSphere's /demo page does it. So do most of our competitors. The differentiator is no longer "does it work?" but "how fast is the first turn?"
How WebRTC fits AI voice (architecture)
A 1-click demo has three moving parts:
- Token endpoint: your Next.js / Express route mints an ephemeral key with the OpenAI Realtime sessions API.
- Browser peer: a single `RTCPeerConnection` with a mic track, an `ontrack` handler that attaches the remote audio, and a data channel for events.
- SDP exchange: POST your offer SDP to the Realtime endpoint with the ephemeral token and receive an SDP answer.
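The third part is the only one that talks to the Realtime endpoint directly, and it is small enough to isolate. Below is a minimal sketch, assuming the same URL and headers the full demo snippet in this post uses; `exchangeSdp` and `FetchLike` are hypothetical names, and `fetch` is injected so the helper can be exercised without a network or a browser.

```typescript
// Minimal fetch shape this helper needs (hypothetical; the global fetch satisfies it).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// POST the local offer SDP as application/sdp, authenticated with the
// ephemeral key, and return the SDP answer text.
async function exchangeSdp(
  offerSdp: string,
  ephemeralKey: string,
  model: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const url = `https://api.openai.com/v1/realtime?model=${encodeURIComponent(model)}`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ephemeralKey}`,
      "Content-Type": "application/sdp",
    },
    body: offerSdp,
  });
  if (!res.ok) {
    // Surface auth or quota failures instead of passing an error body to setRemoteDescription.
    throw new Error(`SDP exchange failed with HTTP ${res.status}`);
  }
  // The response body is the SDP answer as plain text.
  return res.text();
}
```

In the browser you would call it as `exchangeSdp(pc.localDescription!.sdp, key, model, fetch)` and feed the result to `pc.setRemoteDescription({ type: "answer", sdp })`.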
That is it. No SFU, no signaling server, no SIP gateway. Mic-on to first-audio under half a second.
CallSphere implementation
The /demo page on callsphere.ai uses exactly the snippet below, with two differences:
- We mint the ephemeral key from a Next.js route handler at `/api/realtime/token` that injects per-vertical instructions (Real Estate OneRoof, Healthcare, Behavioral Health, Salon, Restaurant, Retail).
- The data channel surfaces the same JSON events our production gateway emits, so visitors can see the audit trail in real time.
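The per-vertical token route can be sketched as a small server-side module. This is a minimal sketch, not CallSphere's handler: the vertical slugs, the prompt strings, `buildSessionBody`, `mintEphemeralKey`, and `JsonFetch` are hypothetical, and the response shape (`client_secret: { value, expires_at }`) is assumed from the `POST /v1/realtime/sessions` API. `fetch` is injected so the logic runs without a network.

```typescript
// Hypothetical vertical slugs and prompts; the real CallSphere instructions are not published.
const VERTICAL_INSTRUCTIONS = new Map<string, string>([
  ["real-estate", "You are a real estate demo agent for OneRoof."],
  ["healthcare", "You are a healthcare front-desk demo agent."],
  ["behavioral-health", "You are a behavioral health intake demo agent."],
  ["salon", "You are a salon booking demo agent."],
  ["restaurant", "You are a restaurant reservations demo agent."],
  ["retail", "You are a retail support demo agent."],
]);
const DEFAULT_INSTRUCTIONS = "You are CallSphere's demo agent.";

// Assumed subset of the sessions API response that the browser needs.
interface MintedSession {
  model: string;
  client_secret: { value: string; expires_at: number };
}

// Injected fetch shape (the global fetch satisfies it).
type JsonFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// JSON body for POST /v1/realtime/sessions. A Map lookup avoids matching
// prototype keys like "constructor"; unknown verticals get the default prompt.
function buildSessionBody(vertical: string, model: string): { model: string; instructions: string } {
  return { model, instructions: VERTICAL_INSTRUCTIONS.get(vertical) ?? DEFAULT_INSTRUCTIONS };
}

// Server-side only: authenticates with the real API key and forwards just the
// ephemeral secret and model to the browser.
async function mintEphemeralKey(
  vertical: string,
  model: string,
  apiKey: string,
  fetchImpl: JsonFetch,
): Promise<MintedSession> {
  const res = await fetchImpl("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildSessionBody(vertical, model)),
  });
  if (!res.ok) throw new Error(`Session mint failed with HTTP ${res.status}`);
  const session = (await res.json()) as MintedSession;
  return { model: session.model, client_secret: session.client_secret };
}
```

In a Next.js route handler at `/api/realtime/token`, a `GET` could read the vertical from the query string, call `mintEphemeralKey(vertical, model, process.env.OPENAI_API_KEY!, fetch)`, and return the result with `Response.json(...)`. The real API key never leaves the server; only the short-lived secret does.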
The demo intentionally skips telephony, the 6-container pod, and the Go gateway — those exist on the production path for paying customers. The /demo path proves the user-facing latency story before they sign up. That is the marketing job: a 30-second click-to-talk experience converts at 8–9% on our page.
Code snippet (TypeScript, full 1-click demo)
```ts
async function oneClickVoice() {
  // 1. Ephemeral key from our own server route (the real API key never reaches the browser).
  //    Assumes the route forwards the sessions API shape, where client_secret is { value, expires_at }.
  const { client_secret, model } = await fetch("/api/realtime/token").then((r) => r.json());

  // 2. One peer connection; play whatever audio track the agent sends back.
  const pc = new RTCPeerConnection();
  const audio = document.getElementById("agent-audio") as HTMLAudioElement;
  audio.autoplay = true;
  pc.ontrack = (e) => {
    audio.srcObject = e.streams[0];
  };

  // 3. Send the microphone upstream.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(stream.getAudioTracks()[0], stream);

  // 4. Data channel for Realtime JSON events; configure the session once it opens.
  const dc = pc.createDataChannel("oai");
  dc.onopen = () =>
    dc.send(JSON.stringify({
      type: "session.update",
      session: { instructions: "You are CallSphere's demo agent." },
    }));

  // 5. SDP offer/answer with the Realtime endpoint, authenticated by the ephemeral key.
  await pc.setLocalDescription(await pc.createOffer());
  const ans = await fetch(`https://api.openai.com/v1/realtime?model=${model}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${client_secret.value}`,
      "Content-Type": "application/sdp",
    },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await ans.text() });

  // Handles for hanging up (pc.close()) and listening for events on dc.
  return { pc, dc };
}
```
Build / migration steps
- Add a server route (Next.js / Express) that calls `POST /v1/realtime/sessions` with your secret key and returns the ephemeral `client_secret`.
- Drop the snippet above into the page. Wire it to a single `<button>` click handler.
- Add an `<audio id="agent-audio">` element to play the agent's voice.
- Listen for `response.done` on the data channel to display transcripts.
- Add a `getStats` poll to surface RTT and packet loss in the UI.
- Throttle the token endpoint (CallSphere uses 1 mint per IP per 30 s) — ephemeral keys are still a paid resource.
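For the transcript and `getStats` steps, the parsing can live in two pure helpers the UI calls. This is a minimal sketch: `summarizeStats` and `transcriptsFromEvent` are hypothetical names, the stats fields are assumed from the W3C webrtc-stats identifiers (`candidate-pair.currentRoundTripTime` in seconds, `inbound-rtp.packetsLost` and `packetsReceived`), and the event shape assumes `response.done` carries output items whose content parts may hold a `transcript` string.

```typescript
// Only the stats fields read below are typed; RTCStatsReport entries carry many more.
interface StatLike {
  type: string;
  kind?: string;
  state?: string;
  nominated?: boolean;
  currentRoundTripTime?: number; // seconds
  packetsLost?: number;
  packetsReceived?: number;
}

interface LinkSummary {
  rttMs: number | null;
  lossPct: number | null; // cumulative since the session started
}

// Summarize an RTCStatsReport; in the browser pass (await pc.getStats()).values().
function summarizeStats(stats: Iterable<StatLike>): LinkSummary {
  let rttMs: number | null = null;
  let lost = 0;
  let received = 0;
  let sawInboundAudio = false;
  for (const s of stats) {
    // RTT comes from the selected (nominated, succeeded) ICE candidate pair.
    if (s.type === "candidate-pair" && s.state === "succeeded" && s.nominated && typeof s.currentRoundTripTime === "number") {
      rttMs = s.currentRoundTripTime * 1000;
    }
    // Loss is counted on the incoming agent audio only.
    if (s.type === "inbound-rtp" && s.kind === "audio") {
      sawInboundAudio = true;
      lost += s.packetsLost ?? 0;
      received += s.packetsReceived ?? 0;
    }
  }
  const total = lost + received;
  const lossPct = sawInboundAudio && total > 0 ? (lost / total) * 100 : null;
  return { rttMs, lossPct };
}

// Pull assistant transcripts out of a data-channel message; other event types yield [].
function transcriptsFromEvent(raw: string): string[] {
  const event = JSON.parse(raw) as {
    type?: string;
    response?: { output?: Array<{ content?: Array<{ transcript?: unknown }> }> };
  };
  if (event.type !== "response.done") return [];
  const out: string[] = [];
  for (const item of event.response?.output ?? []) {
    for (const part of item.content ?? []) {
      if (typeof part.transcript === "string") out.push(part.transcript);
    }
  }
  return out;
}
```

In the page, `dc.onmessage = (e) => transcriptsFromEvent(e.data).forEach(render)` and a one-second `setInterval` calling `summarizeStats((await pc.getStats()).values())` would drive the UI, where `pc` and `dc` are the demo's peer connection and data channel and `render` is your own function. Because loss is cumulative, diff successive samples if you want a per-interval rate.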
FAQ
Do I need TURN for this? For a marketing demo, it is optional. For a real product, yes.

Will Safari work? Safari 11+ on desktop and iOS works, with some autoplay quirks: call `audio.play()` from a user gesture.

How long does an ephemeral key last? Only briefly; check `client_secret.expires_at` in the session response. The key only has to be valid when the SDP offer is posted, so mint a fresh one per click rather than refreshing it on a timer.

Can I prerecord/cache the welcome line? The snippet does not cache audio, but you can have the agent greet the user immediately by sending a `response.create` event on `dc.onopen`.

How do I prevent abuse? Rate-limit by IP and fingerprint, and require a Turnstile or hCaptcha pass first.
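The IP rate limit from the build steps (one mint per IP per 30 s) can be sketched as an in-memory fixed window. Assumptions: `MintThrottle` is a hypothetical name, state lives in a single process's memory (it resets on restart and is not shared across instances, so a multi-instance deployment would need a shared store), and deriving the client IP, for example from a proxy header, is left to your deployment.

```typescript
// Fixed-window throttle: at most one ephemeral-key mint per key (e.g. client IP)
// per window. In-memory and per-process only.
class MintThrottle {
  private readonly windowMs: number;
  private readonly lastMint = new Map<string, number>();

  constructor(windowMs = 30_000) {
    this.windowMs = windowMs;
  }

  // Returns true (and records the mint) if the key may mint now, false otherwise.
  tryMint(key: string, nowMs: number = Date.now()): boolean {
    const last = this.lastMint.get(key);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastMint.set(key, nowMs);
    return true;
  }

  // Milliseconds until the key may mint again; 0 if it may mint now.
  // Convert to seconds for a Retry-After header on a 429 response.
  retryAfterMs(key: string, nowMs: number = Date.now()): number {
    const last = this.lastMint.get(key);
    if (last === undefined) return 0;
    return Math.max(0, this.windowMs - (nowMs - last));
  }

  // Forget keys whose window has expired so the map does not grow without bound.
  prune(nowMs: number = Date.now()): void {
    this.lastMint.forEach((ts, key) => {
      if (nowMs - ts >= this.windowMs) this.lastMint.delete(key);
    });
  }

  get size(): number {
    return this.lastMint.size;
  }
}
```

The token route would call `tryMint(ip)` before minting and answer 429 when it returns false; a captcha check would sit in front of that, since IP limits alone are easy to rotate around.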
Sources
- https://developers.openai.com/api/docs/guides/realtime-webrtc
- https://platform.openai.com/docs/guides/realtime
- https://medium.com/@viplav.fauzdar/%EF%B8%8F-building-a-gpt-realtime-voice-assistant-with-webrtc-fe6dd4c8f488
- https://dev.to/aws-builders/switching-my-ai-voice-agent-from-websocket-to-webrtc-what-broke-and-what-i-learned-3dkn
Try it now on /demo. When you are ready to ship, /trial is 14 days free.