
OpenAI Realtime API: WebRTC vs WebSocket — When to Pick Which in 2026

OpenAI's Realtime API speaks both WebRTC and WebSocket. Here is the production playbook CallSphere uses across 37 agents to decide which transport fits which call path.

OpenAI documents two transports for the Realtime API: WebRTC and WebSocket. The right answer is not "pick one." It is "pick per hop." CallSphere uses WebRTC at the browser edge and WebSocket on every server-to-server hop.

What it is and why now

```mermaid
flowchart LR
  Mobile[iOS / Android SDK] --> WHIP[WHIP ingest]
  WHIP --> Mux[Mux / LiveKit]
  Mux --> Brain[AI brain]
  Brain --> WHEP[WHEP egress]
  WHEP --> Web[Web viewer]
```
CallSphere reference architecture

The Realtime API exposes `gpt-realtime` (and `gpt-4o-realtime`) over two wire formats. WebRTC is the recommended path for browsers and mobile clients; WebSocket is the recommended path for middle-tier servers running inside controlled networks. The difference is not academic. WebRTC runs over UDP/SRTP with a built-in jitter buffer, AEC, AGC, and noise suppression. WebSocket runs over TCP — every dropped packet stalls the stream while it retransmits, which is fine for tokens but devastating for live audio.

In 2026 the question matters more than ever because most teams have started embedding voice in marketing pages and product UIs, not just in phone systems. A live page-embed running over WebSocket on flaky Wi-Fi sounds like a 1990s VoIP call. The same path over WebRTC sounds like FaceTime.

How WebRTC fits AI voice (architecture)

The peer connection lifecycle for a Realtime call:

  1. Browser asks your server for an ephemeral session token (so the long-lived API key never leaves the backend).
  2. Browser creates an `RTCPeerConnection`, captures the mic into a track, and creates a data channel for events.
  3. ICE gathers candidates (host, server-reflexive via STUN, relay via TURN).
  4. SDP offer is sent to OpenAI's Realtime endpoint with the ephemeral token; OpenAI returns the SDP answer.
  5. SRTP carries Opus audio both directions; the data channel carries JSON events (`response.create`, `input_audio_buffer.commit`, function-call deltas).

Where WebSocket wins: server agents that need to mediate tool calls, redact PII, write audit logs, or talk to phone networks. There the server keeps a WebSocket open to OpenAI and bridges audio to whichever transport the user uses.
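Here is what that server leg looks like in miniature. This is a sketch assuming a Node runtime with the `ws` package; the event names follow OpenAI's published Realtime event schema, but header and session-field requirements shift between API versions, so verify against current docs.

```ts
import WebSocket from "ws";

// Server-side Realtime leg: the long-lived API key never leaves this process.
const ws = new WebSocket("wss://api.openai.com/v1/realtime?model=gpt-realtime", {
  headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
});

ws.on("open", () => {
  // Configure the session before bridging any audio.
  ws.send(JSON.stringify({
    type: "session.update",
    session: { modalities: ["audio", "text"] },
  }));
});

ws.on("message", (raw) => {
  const event = JSON.parse(raw.toString());
  // Tool calls arrive as JSON events; mediate them here:
  // redact PII, write the audit log, then execute and reply.
  if (event.type === "response.function_call_arguments.done") {
    // handleToolCall(event) is a hypothetical dispatcher, not shown here.
  }
});
```

Everything the browser does over the data channel, the server does over this socket; the difference is that audio rides inside JSON events as base64 frames rather than as SRTP packets.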

CallSphere implementation

CallSphere runs both transports in production:

  • Browser /demo — WebRTC peer connection straight to OpenAI Realtime with an ephemeral key minted by our Next.js API route (see the sketch after this list). No backend audio relay. End-to-end median first-audio is 380 ms.
  • Real Estate (OneRoof) — Browser dials in over WebRTC; our Go 1.23 gateway keeps a WebSocket to Realtime so it can fan out tool calls to NATS and the 6-container pod (CRM writer, calendar, MLS lookup, SMS, audit, transcript).
  • Healthcare — Same pattern, HIPAA-isolated. The WebSocket leg lives entirely inside the VPC so audit and PHI redaction happen before anything leaves us.
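The token mint in the first bullet is small enough to show. Here is a minimal sketch of a Next.js route handler, assuming the App Router and OpenAI's ephemeral-session endpoint; the endpoint path and field names match the published API but are worth re-checking against current docs.

```ts
// app/api/realtime/token/route.ts
export async function GET() {
  // Exchange the long-lived key (server-side only) for a short-lived
  // client secret the browser can use to open the peer connection.
  const res = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-realtime" }),
  });
  const session = await res.json();
  // Return only the ephemeral secret, never the API key.
  return Response.json({ client_secret: session.client_secret?.value });
}
```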

Across 37 agents and 90+ tools we touch 115+ database tables. The ability to keep WebRTC at the user edge while WebSocket carries the controlled-network legs is what lets us claim sub-second perceived latency without giving up SOC 2 controls.

Code snippet (TypeScript, browser side)

```ts
async function startRealtime() {
  // 1. Fetch a short-lived session token; the real API key stays server-side.
  const tokenRes = await fetch("/api/realtime/token");
  const { client_secret } = await tokenRes.json();

  // 2. Peer connection: play whatever audio track OpenAI sends back.
  const pc = new RTCPeerConnection();
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = (e) => { audioEl.srcObject = e.streams[0]; };

  // 3. Capture the mic and send it upstream.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(mic.getAudioTracks()[0]);

  // 4. The data channel carries the JSON event protocol alongside the audio.
  const dc = pc.createDataChannel("oai-events");
  dc.onmessage = (e) => console.log("event", JSON.parse(e.data));

  // 5. SDP offer/answer exchange with the Realtime endpoint.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const res = await fetch("https://api.openai.com/v1/realtime?model=gpt-realtime", {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${client_secret}`,
      "Content-Type": "application/sdp",
    },
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
}
```
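One gotcha worth a sentence: kick off `startRealtime()` from a user gesture such as a click handler, since most browsers gate audio autoplay (and the mic permission prompt) behind user activation.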

Build / migration steps

  1. Mint short-lived ephemeral tokens server-side; never ship long-lived keys to the browser.
  2. Stand up an `RTCPeerConnection` in the client; attach the mic; create a data channel for events.
  3. Generate the SDP offer, POST it to the Realtime SDP endpoint, set the answer.
  4. For controlled-network hops (telephony bridges, agent workers), open a WebSocket from your server to `wss://api.openai.com/v1/realtime`.
  5. Wire tool calls and audit logging on the WebSocket leg only; keep the browser leg pure transport.
  6. Add a `getStats` poller for the peer connection to track packet loss and jitter (we sample every 2 s; a sketch follows this list).
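A minimal poller for step 6, assuming the `pc` from the browser snippet above:

```ts
// Sample inbound audio stats every 2 s and surface loss/jitter.
function pollStats(pc: RTCPeerConnection): void {
  setInterval(async () => {
    const stats = await pc.getStats();
    stats.forEach((report) => {
      if (report.type === "inbound-rtp" && report.kind === "audio") {
        console.log({
          packetsLost: report.packetsLost,
          jitter: report.jitter, // seconds, per the WebRTC stats spec
        });
      }
    });
  }, 2000);
}
```

A common VoIP rule of thumb treats sustained jitter above roughly 30 ms as audible, so that is a reasonable alerting threshold to start from.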

FAQ

Can I run both at once? Yes. CallSphere uses WebRTC client-side and WebSocket server-side; they connect to the same model.

Does WebRTC work on iOS Safari? Yes, since iOS 11, and Safari 26.4 (March 2026) ships first-party WebTransport too.

What about telephony? Phone calls hit our SIP gateway, which bridges into a WebSocket Realtime session. Browser callers stay on WebRTC.

Do I still need TURN? Yes — about 8–10% of users live behind symmetric NATs that fail STUN.

How long is a session? OpenAI caps Realtime sessions at 30 minutes; renew before that.


Try the WebRTC path live on our /demo, see the bundle in /pricing, or start a /trial.
