By Sagar Shankaran, Founder of CallSphere
WebRTC dominates browser AI voice for a reason - UDP under the hood. But SIP over WebSocket still wins for click-to-call inside a SaaS app where you control both ends. Here is the 2026 picture.
Key takeaways
WebRTC won browser voice. But SIP over WebSocket (sipws) is still the right answer for embedding a phone inside a SaaS app where you do not need NAT traversal and you do need a familiar SIP trunk. The choice in 2026 is not WebRTC vs sipws; it is WebRTC alone vs WebRTC media with a sipws control plane.
flowchart TD
Out[Outbound campaign] --> Twilio[Twilio Voice API]
Twilio --> STIR[STIR/SHAKEN attestation]
STIR --> Carrier[Originating carrier]
Carrier --> Term[Terminating carrier]
Term --> Recipient[Recipient phone]
Recipient --> Webhook[/voice webhook/]
Webhook --> Agent[AI sales agent]
SIP over WebSocket (RFC 7118) was specified in 2014 to let SIP user agents run inside browsers. The browser opens a WSS connection to a SIP-over-WebSocket-aware SIP server (Kamailio, OpenSIPS, Asterisk PJSIP), and SIP messages flow over that WebSocket as plain SIP text. Media still uses WebRTC (DTLS-SRTP over UDP) - sipws is a signaling transport only.
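Because each SIP message arrives as plain text in a WebSocket frame, the receiving side can split it with ordinary string handling. The sketch below is a minimal illustration, not a full SIP parser: `parseSipFrame` and `SipMessage` are hypothetical names, and it ignores header line folding and compact header forms.

```typescript
// Hypothetical helper: split one SIP message, as received in a single
// WebSocket text frame, into start line, headers, and body.
interface SipMessage {
  startLine: string;
  headers: Map<string, string[]>; // header names are lower-cased
  body: string;
}

function parseSipFrame(frame: string): SipMessage {
  // Headers end at the first empty line (CRLF CRLF); the rest is the body.
  const separator = frame.indexOf("\r\n\r\n");
  const head = separator === -1 ? frame : frame.slice(0, separator);
  const body = separator === -1 ? "" : frame.slice(separator + 4);

  const lines = head.split("\r\n").filter((line) => line.length > 0);
  const startLine = lines[0];
  if (startLine === undefined) {
    throw new Error("empty SIP frame");
  }

  const headers = new Map<string, string[]>();
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed lines in this sketch
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    headers.set(name, [...(headers.get(name) ?? []), value]);
  }
  return { startLine, headers, body };
}
```

In a browser this would typically be called from a WebSocket `message` handler; a production client would rely on a library such as JsSIP or SIP.js rather than hand-rolled parsing.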
For AI voice in 2026 the canonical browser flow is WebRTC + Realtime API direct: the browser establishes a peer connection to OpenAI's edge, audio flows over DTLS-SRTP, no SIP involved. But for products that need PSTN dial-out, transfer to a human, or an existing SIP-trunk billing relationship, sipws is still useful. JsSIP, SIP.js, and Twilio Voice SDK all use sipws (or its WebRTC-bridge equivalent) for signaling.
A sipws REGISTER from a browser:
REGISTER sip:callsphere.ai SIP/2.0
Via: SIP/2.0/WSS df7jal23ls0d.invalid;branch=z9hG4bK-5
From: <sip:alice@callsphere.ai>;tag=abc123
To: <sip:alice@callsphere.ai>
Call-ID: 1234@browser
CSeq: 1 REGISTER
Contact: <sip:alice@df7jal23ls0d.invalid;transport=ws>
Expires: 600
The "df7jal23ls0d.invalid" hostname is a placeholder. Browsers do not expose their own IP address or port to JavaScript, so sipws clients put a randomly generated hostname under the reserved .invalid top-level domain in the Via and Contact headers, as RFC 7118 describes. The server does not resolve that name; it tracks the WebSocket connection internally and routes over it.
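As a rough sketch of how a client might assemble the REGISTER shown above, the following builds the message text around a random `.invalid` host. The names `randomInvalidHost`, `RegisterParams`, and `buildRegister` are hypothetical, not taken from any SIP library. The sketch also adds `Max-Forwards`, which RFC 3261 requires in requests, and a `Content-Length` header; the abbreviated example above omits both.

```typescript
// Random host under the reserved ".invalid" TLD, mirroring the example above.
// Math.random() is enough for a sketch; it is not cryptographically strong.
function randomInvalidHost(length = 12): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let token = "";
  for (let i = 0; i < length; i++) {
    token += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return `${token}.invalid`;
}

// All field names here are assumptions made for illustration.
interface RegisterParams {
  domain: string;  // registrar domain, e.g. "callsphere.ai"
  user: string;    // user part of the address of record
  host: string;    // placeholder host from randomInvalidHost()
  callId: string;
  fromTag: string;
  branch: string;  // must start with the magic cookie "z9hG4bK"
  cseq: number;
  expires: number; // registration lifetime in seconds
}

function buildRegister(p: RegisterParams): string {
  const aor = `sip:${p.user}@${p.domain}`;
  return [
    `REGISTER sip:${p.domain} SIP/2.0`,
    `Via: SIP/2.0/WSS ${p.host};branch=${p.branch}`,
    "Max-Forwards: 70",
    `From: <${aor}>;tag=${p.fromTag}`,
    `To: <${aor}>`,
    `Call-ID: ${p.callId}`,
    `CSeq: ${p.cseq} REGISTER`,
    `Contact: <sip:${p.user}@${p.host};transport=ws>`,
    `Expires: ${p.expires}`,
    "Content-Length: 0",
    "", // the two empty entries produce the blank line that ends the headers
    "",
  ].join("\r\n");
}
```

The resulting string would be sent as one WebSocket text frame; in practice JsSIP or SIP.js generate these fields, including the `.invalid` host, on the application's behalf.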
For AI voice agent dial-out from the browser:
INVITE sip:+18001234567@callsphere.ai SIP/2.0
Via: SIP/2.0/WSS df7jal23ls0d.invalid;branch=z9hG4bK-7
From: <sip:alice@callsphere.ai>;tag=abc123
To: <sip:+18001234567@callsphere.ai>
Contact: <sip:alice@df7jal23ls0d.invalid;transport=ws>
Content-Type: application/sdp

v=0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtcp:9 IN IP4 0.0.0.0
a=fingerprint:sha-256 D2:9A:...:38
a=setup:actpass
a=ice-ufrag:F7gI
Note the SDP advertises DTLS-SRTP for media even though the SIP signaling is over WSS. The SBC on the server side bridges sipws-to-SIP and WebRTC-to-SRTP-on-trunk.
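One way to check this from code is to look at the transport profile on the audio `m=` line and at the `a=fingerprint` and `a=setup` attributes. The sketch below is a simplified illustration with hypothetical names (`inspectAudioSecurity`, `MediaSecurity`); it reads only the first audio section and does not distinguish session-level from media-level attributes.

```typescript
interface MediaSecurity {
  profile: string | null;  // e.g. "UDP/TLS/RTP/SAVPF"
  usesDtlsSrtp: boolean;   // true when the profile runs RTP over DTLS
  hasFingerprint: boolean; // a=fingerprint carries the DTLS certificate hash
  setup: string | null;    // DTLS role: actpass, active, or passive
}

function inspectAudioSecurity(sdp: string): MediaSecurity {
  const lines = sdp
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // m=audio <port> <proto> <fmt list>
  const audio = lines.find((line) => line.startsWith("m=audio "));
  const profile = audio ? (audio.split(/\s+/)[2] ?? null) : null;

  const setupLine = lines.find((line) => line.startsWith("a=setup:"));

  return {
    profile,
    usesDtlsSrtp: profile !== null && profile.startsWith("UDP/TLS/RTP/"),
    hasFingerprint: lines.some((line) => line.startsWith("a=fingerprint:")),
    setup: setupLine ? setupLine.slice("a=setup:".length) : null,
  };
}
```

Run against the offer above, this reports the `UDP/TLS/RTP/SAVPF` profile with a fingerprint present, whereas a plain trunk-side offer using `RTP/AVP` would report `usesDtlsSrtp: false`. That difference is the media gap the SBC has to bridge.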
CallSphere uses Twilio Programmable Voice across all six verticals. For browser-initiated calls (admin dashboard click-to-call, demo widget) we use Twilio Voice SDK, which handles sipws-equivalent signaling internally and presents a clean JavaScript API. For Healthcare AI on FastAPI :8084 we never expose sipws to end users; calls always come through Twilio's PSTN edge or via Twilio Voice SDK. Sales Calling AI's 5 concurrent outbound calls per tenant fire from the server side, so no browser sipws is needed. After-Hours AI places a simultaneous call and SMS to on-call staff with a 120-second timeout, also server-originated. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, 14-day trial, and 22% affiliate, the policy is "Voice SDK for browser, REST + Twilio for server-side, no DIY sipws stack".
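A per-tenant cap like the 5 concurrent outbound calls mentioned above can be enforced with a small in-memory counter. This is a hypothetical sketch, not CallSphere's implementation: `TenantCallLimiter` and its methods are invented for illustration, and a multi-process deployment would need shared state (for example a database or cache) instead of a local map.

```typescript
// Hypothetical admission control: at most maxPerTenant active calls per tenant.
class TenantCallLimiter {
  private readonly active = new Map<string, number>();
  private readonly maxPerTenant: number;

  constructor(maxPerTenant: number) {
    this.maxPerTenant = maxPerTenant;
  }

  // Returns true and reserves a slot if the tenant is under its cap.
  tryAcquire(tenantId: string): boolean {
    const current = this.active.get(tenantId) ?? 0;
    if (current >= this.maxPerTenant) return false;
    this.active.set(tenantId, current + 1);
    return true;
  }

  // Frees a slot when a call ends; extra releases are ignored.
  release(tenantId: string): void {
    const current = this.active.get(tenantId) ?? 0;
    if (current <= 1) {
      this.active.delete(tenantId);
    } else {
      this.active.set(tenantId, current - 1);
    }
  }

  activeCalls(tenantId: string): number {
    return this.active.get(tenantId) ?? 0;
  }
}
```

A server-side dialer would call `tryAcquire` before placing each outbound call and `release` from the call-completed callback, queueing or rejecting work when `tryAcquire` returns false.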
ws, or FreeSWITCH with mod_sofia ws transport.
Is sipws faster than WebRTC for AI voice? No. The signaling transport is irrelevant to media latency. WebRTC media is the same regardless.
Why not use sipws + WebRTC media? That is exactly what JsSIP/SIP.js do. The choice is whether you build the JsSIP-style stack yourself or use an SDK like Twilio Voice SDK that abstracts it.
Does sipws support SIP REFER for transfer? Yes. Transfer behavior is identical to wired SIP; the WebSocket is just a transport.
What about WebTransport? Experimental in 2026. Sipws over WebTransport is in draft work in the IETF SIPCORE working group, but there are no production deployments yet.
Are there NAT traversal issues with sipws? Fewer than with wired SIP, because the browser opens an outbound WSS connection to your server, which traverses most NATs cleanly. Media still needs ICE.
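To illustrate the transfer answer above: a REFER sent over the WebSocket is an ordinary in-dialog SIP request whose `Refer-To` header names the transfer target (RFC 3515). The sketch below assumes the client has already stored the dialog state from the established call; `DialogState` and `buildRefer` are hypothetical names, and the URIs in the usage note are made up.

```typescript
// Dialog state a client would have saved after the INVITE succeeded.
// Every field name here is an assumption made for illustration.
interface DialogState {
  callId: string;
  localUri: string;     // our address of record
  localTag: string;
  remoteUri: string;    // the peer's address
  remoteTag: string;
  remoteTarget: string; // Contact URI learned from the peer
  localContact: string; // our Contact URI, e.g. with the .invalid host
  viaHost: string;
  nextCseq: number;
}

function buildRefer(d: DialogState, referTo: string, branch: string): string {
  return [
    `REFER ${d.remoteTarget} SIP/2.0`,
    `Via: SIP/2.0/WSS ${d.viaHost};branch=${branch}`,
    "Max-Forwards: 70",
    `From: <${d.localUri}>;tag=${d.localTag}`,
    `To: <${d.remoteUri}>;tag=${d.remoteTag}`,
    `Call-ID: ${d.callId}`,
    `CSeq: ${d.nextCseq} REFER`,
    `Contact: <${d.localContact}>`,
    `Refer-To: <${referTo}>`, // where the peer should send the call
    "Content-Length: 0",
    "",
    "",
  ].join("\r\n");
}
```

For example, handing a call from an AI agent to a human might use a `referTo` such as `sip:support@callsphere.ai` (an invented address). The only sipws-specific parts are the `WSS` Via transport and the placeholder host.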
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.