By Sagar Shankaran, Founder of CallSphere
SFrame plus Encoded Transform finally makes WebRTC E2EE shippable for AI voice. Here is the architecture, the SFU compromises, and the production pattern.
Key takeaways
Plain WebRTC is hop-by-hop encrypted. Once you put an SFU in the middle, the SFU sees plaintext audio. SFrame plus Encoded Transform fixes that — and 2026 is the first year you can rely on it cross-browser.
DTLS-SRTP encrypts the leg between the browser and the next hop. If that next hop is an SFU (LiveKit, mediasoup, Janus, Pion-based), the SFU decrypts every frame to forward it. Regulators are fine with that as long as the SFU is inside your BAA. They are not fine with it for cross-tenant SaaS.
E2EE pushes encryption up one layer: frames are encrypted by the publisher, decrypted only by the legitimate subscribers, and the SFU only sees ciphertext. That is the exact threat model HIPAA, SOC 2 CC6.7, and most financial-services audits actually want.
There is a second reason that has gotten louder in 2026: AI voice recording. Customers want proof that no third-party SFU operator could ever replay or train on their conversations. SFrame plus a customer-controlled key gives you that proof.
```mermaid
flowchart LR
  P[Publisher] -- encrypt SFrame --> SFU
  SFU -- decrypt SFrame --> S1[Subscriber 1]
  SFU -- decrypt SFrame --> S2[Subscriber 2]
  KMS[Group key server] -. distributes keys .-> P
  KMS -. distributes keys .-> S1
  KMS -. distributes keys .-> S2
```
SFrame (RFC 9605) defines a per-frame AEAD wrapping with a sender-chosen key id. Keys ride a separate channel — usually MLS (Messaging Layer Security) or an out-of-band group key agreement. The SFU forwards the wrapped frame untouched.
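To make the "sender-chosen key id" concrete, here is a minimal sketch of how the SFrame header could be encoded, based on my reading of the header layout in RFC 9605: one config byte (`X|K|Y|C`) followed by the optional key id and counter bytes. The function names `minimalBigEndian` and `encodeSFrameHeader` are illustrative, not from any library, and the sketch limits both values to JavaScript safe integers for simplicity.

```typescript
// Encode a non-negative safe integer in the fewest big-endian bytes (at least one).
function minimalBigEndian(value: number): Uint8Array {
  const bytes: number[] = [];
  let v = value;
  do {
    bytes.unshift(v % 256);
    v = Math.floor(v / 256);
  } while (v > 0);
  return Uint8Array.from(bytes);
}

// Build an SFrame header: config byte, then KID bytes, then CTR bytes.
// Values below 8 are carried inline in the config byte's 3-bit field;
// larger values set the extension flag and store (byte length - 1) instead.
function encodeSFrameHeader(kid: number, ctr: number): Uint8Array {
  for (const n of [kid, ctr]) {
    if (!Number.isSafeInteger(n) || n < 0) {
      throw new RangeError("kid and ctr must be non-negative safe integers");
    }
  }
  const kidBytes = kid < 8 ? new Uint8Array(0) : minimalBigEndian(kid);
  const ctrBytes = ctr < 8 ? new Uint8Array(0) : minimalBigEndian(ctr);
  const kidField = kid < 8 ? kid : 0x8 | (kidBytes.length - 1);
  const ctrField = ctr < 8 ? ctr : 0x8 | (ctrBytes.length - 1);

  const header = new Uint8Array(1 + kidBytes.length + ctrBytes.length);
  header[0] = (kidField << 4) | ctrField; // X,K in the high nibble; Y,C in the low nibble
  header.set(kidBytes, 1);
  header.set(ctrBytes, 1 + kidBytes.length);
  return header;
}
```

Small key ids and counters cost a single byte of overhead, which is why short-lived calls with a handful of keys stay cheap on the wire. The header is authenticated but not encrypted, so the receiver can read the key id before choosing which key to decrypt with.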
For AI voice agents the AI itself is just another subscriber that holds a key, so the agent can transcribe and respond. The SFU still cannot. The agent's container has access to the key only inside the customer's trusted VPC; the key never crosses tenants.
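One way to picture "the key never crosses tenants" is a key ring that is bound to a single tenant and refuses key material issued for anyone else. The sketch below is purely illustrative: `TenantKeyRing`, its methods, and the idea that keys arrive tagged with a tenant id are assumptions for the example, not a description of CallSphere's implementation.

```typescript
// Hypothetical tenant-scoped key ring used by an agent container or a browser client.
class TenantKeyRing {
  private readonly tenantId: string;
  private readonly keys = new Map<number, Uint8Array>();

  constructor(tenantId: string) {
    this.tenantId = tenantId;
  }

  // Store key material under its key id, rejecting keys issued for another tenant.
  add(issuedForTenant: string, kid: number, key: Uint8Array): void {
    if (issuedForTenant !== this.tenantId) {
      throw new Error(
        "key issued for tenant " + issuedForTenant + " rejected by ring for " + this.tenantId,
      );
    }
    this.keys.set(kid, key.slice()); // defensive copy so callers cannot mutate stored keys
  }

  // Look up the key named by the key id in an incoming SFrame header.
  get(kid: number): Uint8Array | undefined {
    return this.keys.get(kid);
  }
}
```

The agent and the human participants would each hold their own ring instance; the SFU holds none, which is what keeps it limited to ciphertext.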
CallSphere ships SFrame on top of OpenAI Realtime in two patterns:
Across 37 agents, 90+ tools, and 115+ database tables we treat E2EE as the default for any vertical where the customer asks for SOC 2 + HIPAA evidence. Pricing remains $149/$499/$1499; E2EE is on the Pro and Enterprise tiers, with a 14-day trial across all six verticals (real estate, healthcare, behavioral health, legal, salon, insurance). Affiliates earn 22% — see /affiliate.
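```ts
// e2ee-worker.js
// Simplified sketch: a random 12-byte IV is prepended to the ciphertext.
// This is NOT the RFC 9605 wire format (kid + counter header, salt-derived nonce).
let cipher;

onmessage = async ({ data }) => {
  if (data.op === "setKey") {
    cipher = await crypto.subtle.importKey(
      "raw", data.key, { name: "AES-GCM" }, false, ["encrypt", "decrypt"],
    );
  }
};

// Illustrative helper: prepend the IV so the receiver can recover it.
function packSFrame(iv, ct) {
  const out = new Uint8Array(iv.byteLength + ct.byteLength);
  out.set(iv, 0);
  out.set(new Uint8Array(ct), iv.byteLength);
  return out.buffer; // frame.data expects an ArrayBuffer
}

// Illustrative helper: split a received payload back into IV and ciphertext.
function unpackSFrame(buf) {
  const bytes = new Uint8Array(buf);
  return { iv: bytes.slice(0, 12), ct: bytes.slice(12) };
}

onrtctransform = (event) => {
  const { readable, writable, options } = event.transformer;
  const direction = options.role; // "sender" or "receiver"
  readable.pipeThrough(new TransformStream({
    async transform(frame, controller) {
      if (!cipher) return; // no key yet: drop the frame rather than forward plaintext
      try {
        if (direction === "sender") {
          const iv = crypto.getRandomValues(new Uint8Array(12));
          const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, cipher, frame.data);
          frame.data = packSFrame(iv, ct);
        } else {
          const { iv, ct } = unpackSFrame(frame.data);
          frame.data = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, cipher, ct);
        }
        controller.enqueue(frame);
      } catch {
        // Encrypt or decrypt failed (wrong key, truncated frame): drop the frame.
      }
    },
  })).pipeTo(writable);
};
```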
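A random IV per frame works for a demo, but as I understand RFC 9605 the nonce is instead derived by XORing a per-key salt with the frame counter carried in the header, so the receiver can rebuild it without extra bytes on the wire. A minimal sketch of that derivation for a 12-byte AES-GCM nonce follows; `deriveNonce` is an illustrative name, and the counter is limited to a JavaScript safe integer here for simplicity.

```typescript
// Derive a 12-byte nonce: salt XOR big-endian counter, left-padded with zeros.
function deriveNonce(salt: Uint8Array, counter: number): Uint8Array {
  if (salt.length !== 12) {
    throw new RangeError("salt must be 12 bytes for AES-GCM");
  }
  if (!Number.isSafeInteger(counter) || counter < 0) {
    throw new RangeError("counter must be a non-negative safe integer");
  }
  const nonce = Uint8Array.from(salt); // copy so the caller's salt is never mutated
  let c = counter;
  // Walk from the least significant byte (index 11) toward the front.
  for (let i = 11; i >= 0 && c > 0; i--) {
    nonce[i] = (nonce[i] ?? 0) ^ (c % 256);
    c = Math.floor(c / 256);
  }
  return nonce;
}
```

Because AES-GCM breaks badly if a nonce repeats under the same key, the sender has to increment the counter for every frame and rotate the key before the counter space is exhausted.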
Does E2EE break my AI agent? Only if you forget to give the agent a key. Treat the agent like any other party.
Can the SFU still rewrite simulcast layers? Yes — SFrame intentionally leaves header fields the SFU needs for forwarding decisions in the clear.
Is this the same as DTLS-SRTP? No. DTLS-SRTP is hop-by-hop. SFrame is end-to-end on top of it.
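Will Safari support it? According to MDN's compatibility data, Safari has shipped RTCRtpScriptTransform, the worker-based Encoded Transform API, since Safari 15.4. Check the compatibility table against the exact versions you target.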
What about latency? SFrame adds 0.1–1 ms per frame on modern hardware, which is well below the point where added delay becomes audible in conversation.
Is there an open-source library? Yes — sframe-js, Janus's SFrame plugin, and LiveKit's built-in E2EE module are all production-ready.
Does E2EE survive simulcast? Yes — SFrame encrypts payload only; SFU layer decisions stay outside the encrypted envelope.
What if a participant cannot get the key? They cannot decrypt. Audit your key-distribution path; the failure mode is silent garbage audio.
Three rules from running E2EE in production for a year:
Most teams that ship E2EE successfully also write a small "is the key flowing?" liveness check into their UI — a tiny indicator that turns red the moment a participant drifts off the current key. That single feature has caught more configuration bugs than any test.
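A liveness check like this can be very small. The sketch below assumes each client reports the key id it is currently using and the time of its last successful decrypt; the `KeyStatus` shape, the `findStaleParticipants` name, and the 2-second default threshold are all hypothetical choices for illustration.

```typescript
interface KeyStatus {
  participantId: string;
  keyId: number; // key id the participant last reported using
  lastDecryptOkMs: number; // timestamp (ms) of the last frame that decrypted cleanly
}

// Return the participants whose indicator should turn red: they are either on an
// old key id or have not decrypted a frame successfully within maxSilenceMs.
function findStaleParticipants(
  statuses: KeyStatus[],
  currentKeyId: number,
  nowMs: number,
  maxSilenceMs: number = 2000,
): string[] {
  return statuses
    .filter((s) => s.keyId !== currentKeyId || nowMs - s.lastDecryptOkMs > maxSilenceMs)
    .map((s) => s.participantId);
}
```

Calling this on a timer with `Date.now()` and flagging every returned id in the UI is one way to surface the silent-garbage failure mode before a user reports it.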
© 2026 CallSphere LLC. All rights reserved.