---
title: "End-to-End Encryption (E2EE) for AI Voice Agents with SFrame in 2026"
description: "SFrame plus Encoded Transform finally makes WebRTC E2EE shippable for AI voice. Here is the architecture, the SFU compromises, and the production pattern."
canonical: https://callsphere.ai/blog/vw3e-webrtc-e2ee-sframe-ai-voice-agents-2026
category: "AI Infrastructure"
tags: ["WebRTC", "E2EE", "SFrame", "Voice AI", "Security"]
author: "CallSphere Team"
published: 2026-03-19T00:00:00.000Z
updated: 2026-05-07T09:59:23.951Z
---

# End-to-End Encryption (E2EE) for AI Voice Agents with SFrame in 2026

> SFrame plus Encoded Transform finally makes WebRTC E2EE shippable for AI voice. Here is the architecture, the SFU compromises, and the production pattern.

> Plain WebRTC is hop-by-hop encrypted. Once you put an SFU in the middle, the SFU sees plaintext audio. SFrame plus Encoded Transform fixes that — and 2026 is the first year you can rely on it cross-browser.

## Why E2EE matters for AI voice

DTLS-SRTP encrypts the leg between the browser and the next hop. If that next hop is an SFU (LiveKit, mediasoup, Janus, Pion-based), the SFU decrypts every frame to forward it. Regulators are fine with that as long as the SFU is inside your BAA. They are not fine with it for cross-tenant SaaS.

E2EE pushes encryption up one layer: frames are encrypted by the publisher, decrypted only by the legitimate subscribers, and the SFU only sees ciphertext. That is the exact threat model HIPAA, SOC 2 CC6.7, and most financial-services audits actually want.

There is a second reason that has gotten louder in 2026: AI voice recording. Customers want proof that no third-party SFU operator could ever replay or train on their conversations. SFrame plus a customer-controlled key gives you that proof.

## Architecture: SFrame on top of WebRTC

```mermaid
flowchart LR
  P[Publisher] -- encrypt SFrame --> SFU
  SFU -- ciphertext --> S1[Subscriber 1]
  SFU -- ciphertext --> S2[Subscriber 2]
  KMS -. distributes keys .-> P
  KMS -. distributes keys .-> S1
  KMS -. distributes keys .-> S2
```

SFrame (RFC 9605) defines a per-frame AEAD wrapping with a sender-chosen key id. Keys ride a separate channel — usually MLS (Messaging Layer Security) or an out-of-band group key agreement. The SFU forwards the wrapped frame untouched.
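
To make the "key id plus counter" header concrete, here is a minimal sketch of the RFC 9605 header layout: one config byte holding the X, K, Y, and C fields, followed by the key ID and counter bytes when they do not fit in three bits. The function names are illustrative, and the sketch uses plain numbers (safe integers only) where the RFC allows up to 64 bits.

```ts
/** Minimal big-endian encoding of a non-negative integer (at least one byte). */
function toBytes(value: number): number[] {
  const out: number[] = [];
  let v = value;
  do {
    out.unshift(v % 256);
    v = Math.floor(v / 256);
  } while (v > 0);
  return out;
}

/** Encode an SFrame header for the given key ID (KID) and counter (CTR). */
function encodeSFrameHeader(kid: number, ctr: number): Uint8Array {
  for (const v of [kid, ctr]) {
    if (!Number.isSafeInteger(v) || v < 0) {
      throw new RangeError("KID and CTR must be non-negative safe integers");
    }
  }
  let config = 0;
  const body: number[] = [];

  if (kid < 8) {
    config |= kid << 4; // X = 0: the K field carries the key ID itself
  } else {
    const bytes = toBytes(kid);
    config |= 0x80 | ((bytes.length - 1) << 4); // X = 1: K carries length - 1
    body.push(...bytes);
  }

  if (ctr < 8) {
    config |= ctr; // Y = 0: the C field carries the counter itself
  } else {
    const bytes = toBytes(ctr);
    config |= 0x08 | (bytes.length - 1); // Y = 1: C carries length - 1
    body.push(...bytes);
  }

  return Uint8Array.from([config, ...body]);
}

/** Hex helper for inspecting headers in logs. */
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}
```

For example, `toHex(encodeSFrameHeader(0x123, 0x4567))` yields `9901234567`: a config byte of `0x99`, a two-byte KID, and a two-byte CTR. Small values collapse into the config byte alone, which is why both ends must agree on the parsing rules.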

For AI voice agents the AI itself is just another subscriber that holds a key, so the agent can transcribe and respond. The SFU still cannot. The agent's container has access to the key only inside the customer's trusted VPC; the key never crosses tenants.

## CallSphere implementation

CallSphere ships SFrame on top of OpenAI Realtime in two patterns:

- **Healthcare** — Patient browser → SFU (HIPAA VPC) → clinician browser + AI triage agent. The SFU never sees plaintext PHI; the AI agent runs inside the BAA and is a key holder. Recording happens on a dedicated key-holding recorder pod. See [/industries/healthcare](/industries/healthcare).
- **Real Estate (OneRoof)** — Buyer + agent + AI dialer share an SFrame group. The Pion-based gateway (Go 1.23) still bridges to PSTN, but the bridge is itself a key holder. The 6-container pod (CRM, MLS listings, calendar, SMS, audit, transcript) runs against decrypted audio inside the trusted boundary only. See [/industries/real-estate](/industries/real-estate).

Across 37 agents, 90+ tools, and 115+ database tables we treat E2EE as the default for any vertical where the customer asks for SOC 2 + HIPAA evidence. Pricing remains $149/$499/$1499; E2EE is on the Pro and Enterprise tiers, with a 14-day trial across all six verticals (real estate, healthcare, behavioral health, legal, salon, insurance). Affiliates earn 22% — see [/affiliate](/affiliate).

## Code snippet (Worker side)

```ts
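// e2ee-worker.js
// Simplified framing for illustration: [12-byte IV | AES-GCM ciphertext + tag].
// A conformant RFC 9605 sender writes an SFrame header (KID + CTR) instead and
// derives the nonce from the counter. packSFrame/unpackSFrame are local helpers,
// not a library API.
const IV_LEN = 12;
let cipher; // CryptoKey, set via postMessage({ op: "setKey", key })

onmessage = async ({ data }) => {
  if (data.op === "setKey") {
    cipher = await crypto.subtle.importKey(
      "raw", data.key, { name: "AES-GCM" }, false, ["encrypt", "decrypt"],
    );
  }
};

// Prepend the IV to the ciphertext so the receiver can recover it.
function packSFrame(iv, ct) {
  const out = new Uint8Array(iv.byteLength + ct.byteLength);
  out.set(iv, 0);
  out.set(new Uint8Array(ct), iv.byteLength);
  return out.buffer;
}

// Split a received frame back into IV and ciphertext.
function unpackSFrame(buf) {
  const bytes = new Uint8Array(buf);
  return { iv: bytes.subarray(0, IV_LEN), ct: bytes.subarray(IV_LEN) };
}

onrtctransform = (event) => {
  const { readable, writable, options } = event.transformer;
  const direction = options.role; // "sender" or "receiver"
  readable.pipeThrough(new TransformStream({
    async transform(frame, controller) {
      if (!cipher) return; // no key yet: drop the frame rather than leak plaintext
      try {
        if (direction === "sender") {
          const iv = crypto.getRandomValues(new Uint8Array(IV_LEN));
          const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, cipher, frame.data);
          frame.data = packSFrame(iv, ct);
        } else {
          const { iv, ct } = unpackSFrame(frame.data);
          frame.data = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, cipher, ct);
        }
        controller.enqueue(frame);
      } catch {
        // Wrong key or tampered frame: drop it instead of forwarding garbage.
      }
    },
  })).pipeTo(writable);
};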
```
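
The worker above draws a random IV for every frame. RFC 9605 instead forms the AEAD nonce by XOR-ing a per-key salt with the frame counter, so the nonce never has to travel separately from the header. A minimal sketch of that step, assuming a 12-byte AES-GCM nonce and a hypothetical `deriveNonce` helper (how the salt itself is derived from the base key is not shown):

```ts
const NONCE_LEN = 12; // AES-GCM nonce size in bytes

/**
 * XOR the big-endian counter into the low-order bytes of the salt.
 * `salt` is assumed to be the per-key salt; `ctr` is the SFrame counter.
 */
function deriveNonce(salt: Uint8Array, ctr: number): Uint8Array {
  if (salt.length !== NONCE_LEN) {
    throw new RangeError(`salt must be ${NONCE_LEN} bytes`);
  }
  if (!Number.isSafeInteger(ctr) || ctr < 0) {
    throw new RangeError("ctr must be a non-negative safe integer");
  }
  const nonce = Uint8Array.from(salt); // copy so the salt is never mutated
  let v = ctr;
  for (let i = NONCE_LEN - 1; i >= 0 && v > 0; i--) {
    nonce[i] ^= v % 256;
    v = Math.floor(v / 256);
  }
  return nonce;
}
```

Because the counter must never repeat under the same key, a sender that uses this scheme has to persist or reset its counter together with the key, which is one more reason to rotate keys rather than reuse them across calls.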

## Build steps

1. Decide your group-key story first: MLS, Signal-style ratchet, or a simpler shared symmetric key issued by your auth service. MLS is the industry direction.
2. Stand up a key server inside your VPC; never let the SFU see the keys.
3. Wire `RTCRtpScriptTransform` on every sender and receiver; both ends must implement the same SFrame profile.
4. Provision your SFU to be SFrame-aware so it does not strip headers. LiveKit, mediasoup, and Janus all have current implementations.
5. Add a secondary path for the AI agent so it joins the group as a key-holding peer, not as a server-side tap.
6. Audit `getStats` carefully — packet sizes leak even when contents do not.
7. Plan key rotation: rotate the group key on every join and leave, so new members cannot decrypt earlier media and departed members cannot decrypt what follows.
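
Step 3 can be sketched as a small helper that assigns a transform to every sender and receiver. `attachE2ee` and `HasTransform` are hypothetical names; the transform factory is injected so the helper stays independent of the browser globals, and the `role` option matches what the worker reads from `event.transformer.options`.

```ts
type Role = "sender" | "receiver";

/** Anything with a settable `transform` slot, such as RTCRtpSender or RTCRtpReceiver. */
interface HasTransform {
  transform: unknown;
}

/** Attach a freshly created transform to each sender and receiver; returns the count. */
function attachE2ee(
  senders: HasTransform[],
  receivers: HasTransform[],
  makeTransform: (role: Role) => unknown,
): number {
  for (const s of senders) s.transform = makeTransform("sender");
  for (const r of receivers) r.transform = makeTransform("receiver");
  return senders.length + receivers.length;
}

// Browser usage (not executed here; `pc` and `rawKeyBytes` are assumed to exist):
//   const worker = new Worker("e2ee-worker.js");
//   worker.postMessage({ op: "setKey", key: rawKeyBytes });
//   attachE2ee(pc.getSenders(), pc.getReceivers(),
//     (role) => new RTCRtpScriptTransform(worker, { role }));
```

Receivers that appear later, for example in a `track` event handler, need the same treatment, otherwise their audio reaches the decoder still encrypted.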

## Common pitfalls

- **Forgetting simulcast layers** — SFU may rewrite layer ids; SFrame encrypts payload only, but you must ensure the SFU does not touch payload bytes.
- **Mismatched header parsing** — RFC 9605 encodes the key ID (KID) and counter (CTR) with variable lengths signalled in a config byte. A single off-by-one in either parser breaks decryption for the entire group.
- **Recording outside the trust boundary** — if your recording service does not hold a key, you record ciphertext you cannot replay. Plan for a dedicated recorder participant.
- **Performance budget** — AES-GCM on a 240-byte Opus frame takes ~80 µs on M2; on entry-level Android ARMv8 it is closer to 600 µs. Profile both ends.
- **Key compromise without rotation** — one static key shared across calls is the most common audit failure.
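
The header-parsing pitfall is easier to see with a parser in hand. The sketch below reads the RFC 9605 config byte (X, K, Y, C fields) and the variable-length KID and CTR that follow it. `parseSFrameHeader` is an illustrative name, and plain numbers are used for brevity, so values above 2^53 would lose precision; a production parser would likely use `bigint`.

```ts
interface SFrameHeader {
  kid: number;          // key ID
  ctr: number;          // frame counter
  headerLength: number; // bytes consumed, i.e. where the ciphertext starts
}

/** Read `length` bytes at `offset` as a big-endian unsigned integer. */
function readUint(bytes: Uint8Array, offset: number, length: number): number {
  let v = 0;
  for (let i = 0; i < length; i++) v = v * 256 + bytes[offset + i];
  return v;
}

function parseSFrameHeader(frame: Uint8Array): SFrameHeader {
  if (frame.length < 1) throw new RangeError("empty frame");
  const config = frame[0];
  const x = (config & 0x80) !== 0; // extended key ID flag
  const k = (config >> 4) & 0x07;  // key ID, or its length minus one
  const y = (config & 0x08) !== 0; // extended counter flag
  const c = config & 0x07;         // counter, or its length minus one

  const kidLen = x ? k + 1 : 0;
  const ctrLen = y ? c + 1 : 0;
  const headerLength = 1 + kidLen + ctrLen;
  if (frame.length < headerLength) {
    throw new RangeError("truncated SFrame header");
  }

  const kid = x ? readUint(frame, 1, kidLen) : k;
  const ctr = y ? readUint(frame, 1 + kidLen, ctrLen) : c;
  return { kid, ctr, headerLength };
}
```

Reading the length fields as "length" rather than "length minus one" is exactly the kind of off-by-one that shifts the ciphertext boundary and makes every frame fail authentication.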

## FAQ

**Does E2EE break my AI agent?** Only if you forget to give the agent a key. Treat the agent like any other party.

**Can the SFU still rewrite simulcast layers?** Yes — SFrame intentionally leaves header fields the SFU needs for forwarding decisions in the clear.

**Is this the same as DTLS-SRTP?** No. DTLS-SRTP is hop-by-hop. SFrame is end-to-end on top of it.

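**Will Safari support it?** Safari has shipped `RTCRtpScriptTransform` since 15.4, so the worker-based transform shown above is available there. Test against the exact browser versions you support.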

**What about latency?** SFrame adds 0.1–1 ms per frame on modern hardware. Below the audible threshold.

**Is there an open-source library?** Yes — sframe-js, Janus's SFrame plugin, and LiveKit's built-in E2EE module are all production-ready.

**Does E2EE survive simulcast?** Yes — SFrame encrypts payload only; SFU layer decisions stay outside the encrypted envelope.

**What if a participant cannot get the key?** They cannot decrypt. Audit your key-distribution path; the failure mode is silent garbage audio.

## Production playbook for AI voice teams in 2026

Three rules from running E2EE in production for a year:

1. **Recorder-as-participant.** Do not tap audio at the SFU. Run a dedicated recorder pod that joins the session, holds a key, and writes encrypted-at-rest WebM to your bucket.
2. **Rotate on every join/leave.** Forward secrecy is the whole point. The cost is one extra MLS round per change; the audit benefit is enormous.
3. **Log key-id transitions.** Every SFrame frame carries a key id. Persist transitions per session to your audit table; SOC 2 reviewers will ask for this.
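
Rule 3 can be sketched as a small in-memory log that records a row only when a participant's key id changes. `KeyTransitionLog` and its field names are hypothetical; writing the entries to your audit table is left out.

```ts
interface KeyTransition {
  sessionId: string;
  participantId: string;
  fromKid: number | null; // null for the first key seen from this participant
  toKid: number;
  atMs: number;           // caller-supplied timestamp in milliseconds
}

class KeyTransitionLog {
  readonly entries: KeyTransition[] = [];
  private readonly sessionId: string;
  private readonly lastKid = new Map<string, number>();

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  /** Call once per received frame; returns true when a transition was recorded. */
  observe(participantId: string, kid: number, atMs: number): boolean {
    const prev = this.lastKid.get(participantId);
    if (prev === kid) return false; // same key as the previous frame: nothing to log
    this.lastKid.set(participantId, kid);
    this.entries.push({
      sessionId: this.sessionId,
      participantId,
      fromKid: prev === undefined ? null : prev,
      toKid: kid,
      atMs,
    });
    return true;
  }
}
```

Logging transitions rather than every frame keeps the audit table small: a call with three rotations produces a handful of rows per participant, not thousands.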

Most teams that ship E2EE successfully also write a small "is the key flowing?" liveness check into their UI — a tiny indicator that turns red the moment a participant drifts off the current key. That single feature has caught more configuration bugs than any test.
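
One possible shape for such a liveness check, as a sketch: compare the last key id each participant was seen using against the current group key id, and allow a short grace period after a rotation. The function name, the state shape, and the 2-second default are all assumptions for illustration.

```ts
interface ParticipantKeyState {
  participantId: string;
  lastKid: number | null; // last key id observed from this participant, if any
}

/**
 * Return the participants that are not on the current key.
 * During the grace window after a rotation nobody is reported,
 * since the new key may still be propagating.
 */
function staleParticipants(
  states: ParticipantKeyState[],
  currentKid: number,
  rotatedAtMs: number,
  nowMs: number,
  graceMs: number = 2000,
): string[] {
  if (nowMs - rotatedAtMs < graceMs) return [];
  return states
    .filter((s) => s.lastKid !== currentKid)
    .map((s) => s.participantId);
}
```

A UI could poll this once a second and turn the indicator red for any returned participant id.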

## Sources

- [https://datatracker.ietf.org/doc/rfc9605/](https://datatracker.ietf.org/doc/rfc9605/)
- [https://webrtchacks.com/true-end-to-end-encryption-with-webrtc-insertable-streams/](https://webrtchacks.com/true-end-to-end-encryption-with-webrtc-insertable-streams/)
- [https://www.meetecho.com/blog/janus-e2ee-sframe/](https://www.meetecho.com/blog/janus-e2ee-sframe/)
- [https://blog.mozilla.org/webrtc/end-to-end-encrypt-webrtc-in-all-browsers/](https://blog.mozilla.org/webrtc/end-to-end-encrypt-webrtc-in-all-browsers/)
- [https://www.digitalsamba.com/blog/the-power-of-e2ee-in-webrtc-unlocking-a-world-of-secure-communication](https://www.digitalsamba.com/blog/the-power-of-e2ee-in-webrtc-unlocking-a-world-of-secure-communication)
- [https://medooze.medium.com/sframe-js-end-to-end-encryption-for-webrtc-f9a83a997d6d](https://medooze.medium.com/sframe-js-end-to-end-encryption-for-webrtc-f9a83a997d6d)

Want a HIPAA-grade voice agent? Start a [/trial](/trial), see [/industries/healthcare](/industries/healthcare), or read [/pricing](/pricing).
