---
title: "OpenAI Realtime API: WebRTC vs WebSocket — When to Pick Which in 2026"
description: "OpenAI's Realtime API speaks both WebRTC and WebSocket. Here is the production playbook CallSphere uses across 37 agents to decide which transport fits which call path."
canonical: https://callsphere.ai/blog/vw1e-openai-realtime-webrtc-vs-websocket
category: "AI Voice Agents"
tags: ["WebRTC", "OpenAI Realtime", "Voice AI", "Latency", "WebSocket"]
author: "CallSphere Team"
published: 2026-03-15T00:00:00.000Z
updated: 2026-05-07T09:32:10.930Z
---

# OpenAI Realtime API: WebRTC vs WebSocket — When to Pick Which in 2026

> OpenAI documents two transports for the Realtime API: WebRTC and WebSocket. The right answer is not "pick one." It is "pick per hop." CallSphere uses WebRTC at the browser edge and WebSocket on every server-to-server hop.

## What it is and why now

```mermaid
flowchart LR
  Browser[Browser / mobile client] -- WebRTC --> OAI[OpenAI Realtime]
  Phone[PSTN caller] --> SIP[SIP gateway]
  SIP --> GW[Server gateway]
  GW -- WebSocket --> OAI
  GW --> Tools[Tool calls / audit / CRM]
```

CallSphere reference architecture: WebRTC at the user edge, WebSocket on every controlled-network hop

The Realtime API exposes `gpt-realtime` (and `gpt-4o-realtime`) over two wire formats. WebRTC is the recommended path for browsers and mobile clients; WebSocket is the recommended path for middle-tier servers running inside controlled networks. The difference is not academic. WebRTC runs over UDP/SRTP with a built-in jitter buffer, acoustic echo cancellation (AEC), automatic gain control (AGC), and noise suppression. WebSocket runs over TCP, so it inherits head-of-line blocking: every dropped packet stalls the stream while TCP retransmits, which is fine for tokens but devastating for live audio.

In 2026 the question matters more than ever because most teams have started embedding voice in marketing pages and product UIs, not just in phone systems. A live page-embed running over WebSocket on flaky Wi-Fi sounds like a 1990s VoIP call. The same path over WebRTC sounds like FaceTime.

## How WebRTC fits AI voice (architecture)

The peer connection lifecycle for a Realtime call:

1. Browser asks your server for an ephemeral session token (so the long-lived API key never leaves the backend).
2. Browser creates an `RTCPeerConnection`, captures the mic into a track, and creates a data channel for events.
3. ICE gathers candidates (host, server-reflexive via STUN, relay via TURN).
4. SDP offer is sent to OpenAI's Realtime endpoint with the ephemeral token; OpenAI returns the SDP answer.
5. SRTP carries Opus audio both directions; the data channel carries JSON events (`response.create`, `input_audio_buffer.commit`, function-call deltas).
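
The data-channel events in step 5 are plain JSON. Here is a minimal sketch of the two most common client-to-server events; the event names come from the Realtime event schema, but the builder functions and `sendEvent` are hypothetical helpers for illustration, not part of any SDK, and the shapes shown are not exhaustive:

```typescript
// Illustrative shapes for Realtime data-channel events.
type RealtimeEvent = { type: string; [key: string]: unknown };

// Ask the model to start generating a spoken response.
function buildResponseCreate(instructions?: string): RealtimeEvent {
  return {
    type: "response.create",
    ...(instructions ? { response: { instructions } } : {}),
  };
}

// Signal that the client is done appending mic audio for this turn.
function buildInputAudioCommit(): RealtimeEvent {
  return { type: "input_audio_buffer.commit" };
}

// Works with anything exposing a string send() — an RTCDataChannel in the
// browser; typed structurally here so the sketch stays environment-neutral.
function sendEvent(channel: { send: (data: string) => void }, ev: RealtimeEvent) {
  channel.send(JSON.stringify(ev));
}
```

In production you would call `sendEvent(dc, buildResponseCreate(...))` on the `"oai-events"` channel opened in step 2.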

Where WebSocket wins: server-side agents that need to mediate tool calls, redact PII, write audit logs, or talk to phone networks. There, the server keeps a WebSocket open to OpenAI and bridges audio to whatever transport the user is on.
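
The server-to-server leg can be sketched as a small connection config. The endpoint URL matches the build steps below; the `OpenAI-Beta: realtime=v1` header and default model name are assumptions drawn from the Realtime docs at time of writing, and `buildRealtimeWsConfig` is a hypothetical helper, not an SDK call:

```typescript
// Build the URL and headers for the server-side WebSocket leg.
// Verify header and query-parameter names against current OpenAI docs.
function buildRealtimeWsConfig(apiKey: string, model = "gpt-realtime") {
  return {
    url: `wss://api.openai.com/v1/realtime?model=${encodeURIComponent(model)}`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "OpenAI-Beta": "realtime=v1",
    },
  };
}

// Usage with the `ws` package (not bundled here):
//   const { url, headers } = buildRealtimeWsConfig(process.env.OPENAI_API_KEY!);
//   const socket = new WebSocket(url, { headers });
//   socket.on("open", () => { /* session.update, audio append, tool fan-out */ });
```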

## CallSphere implementation

CallSphere runs both transports in production:

- **Browser /demo** — WebRTC peer connection straight to OpenAI Realtime with an ephemeral key minted by our Next.js API route. No backend audio relay. End-to-end median first-audio is 380 ms.
- **Real Estate (OneRoof)** — Browser dials in over WebRTC; our Go 1.23 gateway keeps a WebSocket to Realtime so it can fan out tool calls to NATS and the 6-container pod (CRM writer, calendar, MLS lookup, SMS, audit, transcript).
- **Healthcare** — Same pattern, HIPAA-isolated. The WebSocket leg lives entirely inside the VPC so audit and PHI redaction happen before anything leaves us.

Across 37 agents and 90+ tools we touch 115+ database tables. The ability to keep WebRTC at the user edge while WebSocket carries the controlled-network legs is what lets us claim sub-second perceived latency without giving up SOC 2 controls.

## Code snippet (TypeScript, browser side)

```ts
async function startRealtime() {
  // 1. Fetch a short-lived client secret from our backend.
  const tokenRes = await fetch("/api/realtime/token");
  const { client_secret } = await tokenRes.json();

  // 2. Peer connection; play whatever track OpenAI sends back.
  const pc = new RTCPeerConnection();
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = (e) => { audioEl.srcObject = e.streams[0]; };

  // 3. Capture the mic and send it as an audio track.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(mic.getAudioTracks()[0]);

  // 4. Data channel for JSON events.
  const dc = pc.createDataChannel("oai-events");
  dc.onmessage = (e) => console.log("event", JSON.parse(e.data));

  // 5. SDP offer/answer exchange with the Realtime endpoint.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const res = await fetch("https://api.openai.com/v1/realtime?model=gpt-realtime", {
    method: "POST",
    body: offer.sdp,
    headers: { Authorization: `Bearer ${client_secret}`, "Content-Type": "application/sdp" },
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
}
```
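
The `/api/realtime/token` route the snippet calls can be sketched roughly as follows. This is not our production route: the `POST /v1/realtime/sessions` endpoint and the `client_secret.value` field follow the Realtime docs, but the session body fields (e.g. `voice`) are illustrative and should be checked against current docs:

```typescript
// Build the session-mint request; the long-lived key never leaves the server.
function buildSessionRequest(apiKey: string, model = "gpt-realtime") {
  return {
    url: "https://api.openai.com/v1/realtime/sessions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // "voice" is an illustrative field, not a required one.
      body: JSON.stringify({ model, voice: "alloy" }),
    },
  };
}

// Framework-agnostic handler body; wire into a Next.js API route or any server.
async function mintEphemeralToken(apiKey: string): Promise<{ client_secret: string }> {
  const { url, init } = buildSessionRequest(apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`session mint failed: ${res.status}`);
  const session = await res.json();
  // Hand only the short-lived client_secret.value to the browser.
  return { client_secret: session.client_secret.value };
}
```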

## Build / migration steps

1. Mint short-lived ephemeral tokens server-side; never ship long-lived keys to the browser.
2. Stand up an `RTCPeerConnection` in the client; attach the mic; create a data channel for events.
3. Generate the SDP offer, POST it to the Realtime SDP endpoint, set the answer.
4. For controlled-network hops (telephony bridges, agent workers), open a WebSocket from your server to `wss://api.openai.com/v1/realtime`.
5. Wire tool calls and audit logging on the WebSocket leg only; keep the browser leg pure transport.
6. Add a `getStats` poller for the peer connection to track packet loss and jitter (we sample every 2 s).
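
Step 6 boils down to plain RTP accounting over cumulative counters. A sketch of the interval-loss math, with the poller itself shown as a usage comment (assumes a live `RTCPeerConnection` named `pc`; the 2 s cadence matches our sampling interval above):

```typescript
interface InboundSample { packetsReceived: number; packetsLost: number; }

// Packet-loss percentage between two cumulative getStats samples.
// Returns 0 when no packets flowed in the interval.
function intervalLossPct(prev: InboundSample, curr: InboundSample): number {
  const recv = curr.packetsReceived - prev.packetsReceived;
  const lost = curr.packetsLost - prev.packetsLost;
  const total = recv + lost;
  return total > 0 ? (100 * lost) / total : 0;
}

// Browser-side poller:
//   let prev: InboundSample = { packetsReceived: 0, packetsLost: 0 };
//   setInterval(async () => {
//     const stats = await pc.getStats();
//     stats.forEach((r) => {
//       if (r.type === "inbound-rtp" && r.kind === "audio") {
//         const curr = { packetsReceived: r.packetsReceived, packetsLost: r.packetsLost };
//         console.log("loss %", intervalLossPct(prev, curr).toFixed(1), "jitter s", r.jitter);
//         prev = curr;
//       }
//     });
//   }, 2000);
```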

## FAQ

**Can I run both at once?** Yes. CallSphere uses WebRTC client-side and WebSocket server-side; they connect to the same model.

**Does WebRTC work on iOS Safari?** Yes, since iOS 11, and Safari 26.4 (March 2026) ships first-party WebTransport too.

**What about telephony?** Phone calls hit our SIP gateway, which bridges into a WebSocket Realtime session. Browser callers stay on WebRTC.

**Do I still need TURN?** Yes — about 8–10% of users live behind symmetric NATs that fail STUN.

**How long is a session?** OpenAI caps Realtime sessions at 30 minutes; renew before that.
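
Renewing before the cap is easiest with a proactive timer. A sketch of the scheduling arithmetic, where the 30-minute cap comes from the FAQ above and the 2-minute safety margin is an assumption you should tune:

```typescript
const SESSION_CAP_MS = 30 * 60 * 1000;   // cap stated in the FAQ above
const RENEW_MARGIN_MS = 2 * 60 * 1000;   // assumed safety margin; tune to taste

// Milliseconds until the session should be renewed (never negative).
function msUntilRenewal(sessionStartedAtMs: number, nowMs: number): number {
  const deadline = sessionStartedAtMs + SESSION_CAP_MS - RENEW_MARGIN_MS;
  return Math.max(0, deadline - nowMs);
}

// Usage: setTimeout(renewSession, msUntilRenewal(sessionStart, Date.now()));
```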

## Sources

- [https://platform.openai.com/docs/guides/realtime](https://platform.openai.com/docs/guides/realtime)
- [https://developers.openai.com/api/docs/guides/realtime-webrtc](https://developers.openai.com/api/docs/guides/realtime-webrtc)
- [https://livekit.com/blog/why-webrtc-beats-websockets-for-voice-ai-agents](https://livekit.com/blog/why-webrtc-beats-websockets-for-voice-ai-agents)
- [https://community.openai.com/t/realtimeapi-webrtc-client-websocket-server-possible/1125810](https://community.openai.com/t/realtimeapi-webrtc-client-websocket-server-possible/1125810)

Try the WebRTC path live on our [/demo](/demo), see the bundle in [/pricing](/pricing), or start a [/trial](/trial).

