SRTP Key Exchange for HIPAA in 2026: DTLS-SRTP, SDES, and the Bridge Between Them

SDES + TLS works for SIP trunks; DTLS-SRTP wins for WebRTC. For HIPAA-bound AI voice that crosses both worlds, the SBC is where the translation happens. Here is the 2026 reality.

"We use SRTP" is half an answer. The real question is how the keys get exchanged: in the SDP body protected by signaling TLS, or on the media path through a DTLS handshake. Both can satisfy HIPAA. Picking wrong creates a quiet vulnerability that no auditor will catch but a determined attacker will.

Background

flowchart LR
  Phone["PSTN caller"] --> Carrier["Carrier"]
  Carrier -- "SIP INVITE" --> SBC["Session Border Controller"]
  SBC -- "SIP" --> PBX["Twilio / Asterisk"]
  PBX -- "RTP · Opus" --> Bridge["AI Voice Gateway"]
  Bridge --> AI["OpenAI Realtime"]
  AI --> Bridge
  Bridge --> PBX
CallSphere reference architecture

Secure RTP (RFC 3711) encrypts and authenticates RTP and RTCP packets. Its baseline transforms are AES in counter mode (AES-CM) with HMAC-SHA1 authentication; RFC 7714 adds AES-GCM, an AEAD mode that authenticates without a separate HMAC. The standard does not specify how the master key gets to both endpoints; that is the job of a separate key exchange protocol.

Two main options dominate in 2026. SDES (RFC 4568) carries the SRTP master key and salt inside the SDP body of SIP signaling, base64-encoded in an a=crypto: attribute. It is safe only when the SIP signaling is protected with TLS on every hop; otherwise the key travels in cleartext, and even with TLS each SIP proxy on the path can read it. DTLS-SRTP (RFC 5763 and RFC 5764) runs a DTLS handshake on the media path itself; the SRTP keys are derived from the DTLS master secret. The SDP carries only DTLS fingerprints, not keys.

SDES is dominant on traditional SIP trunks. DTLS-SRTP is mandatory in WebRTC. For HIPAA-bound AI voice products, both are acceptable encryption mechanisms; the trick is making sure the boundary between SIP and WebRTC translates correctly.

Technical deep-dive

The SDES exchange in an INVITE looks like:

m=audio 49170 RTP/SAVP 0 8
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:DhxqDt+Pw/3qD6FnJ4kEwImGRJ5lTZUjxVrn5Q==
a=crypto:2 AES_256_CM_HMAC_SHA1_80 inline:S2RXxN+wKxIhLcF7HxJzC5pH4QrJzG7lA9XbF...

If the carrying INVITE is over UDP/5060 cleartext, this key is visible in any packet capture. SDES is only safe over SIP/TLS or SIPS.
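
A minimal Python sketch of how a gateway might parse an a=crypto attribute and refuse SDES on unprotected signaling. The names parse_crypto, CryptoOffer, and sdes_is_safe are hypothetical, the suite table lists only AES-CM suites from RFC 4568 and RFC 6188, and treating "TLS" and "WSS" as the only safe signaling transports is an assumption rather than any SBC's actual policy.

```python
import base64
import re
from dataclasses import dataclass

# Master key + master salt length in bytes for each suite (assumed subset).
KEY_SALT_BYTES = {
    "AES_CM_128_HMAC_SHA1_80": 16 + 14,
    "AES_CM_128_HMAC_SHA1_32": 16 + 14,
    "AES_256_CM_HMAC_SHA1_80": 32 + 14,
    "AES_256_CM_HMAC_SHA1_32": 32 + 14,
}

# Matches "a=crypto:<tag> <suite> inline:<base64>"; optional "|lifetime|MKI"
# parameters after the key are left unparsed.
CRYPTO_RE = re.compile(r"^a=crypto:(\d+) (\S+) inline:([A-Za-z0-9+/=]+)")


@dataclass
class CryptoOffer:
    tag: int
    suite: str
    key_salt: bytes  # concatenated master key and master salt


def parse_crypto(line: str) -> CryptoOffer:
    """Parse one a=crypto line and check the key length against its suite."""
    match = CRYPTO_RE.match(line.strip())
    if match is None:
        raise ValueError("not an a=crypto attribute")
    tag, suite, b64 = match.groups()
    expected = KEY_SALT_BYTES.get(suite)
    if expected is None:
        raise ValueError(f"unsupported crypto suite: {suite}")
    key_salt = base64.b64decode(b64, validate=True)
    if len(key_salt) != expected:
        raise ValueError(f"{suite} needs {expected} bytes, got {len(key_salt)}")
    return CryptoOffer(int(tag), suite, key_salt)


def sdes_is_safe(signaling_transport: str) -> bool:
    """SDES keys are only protected when the signaling hop itself is encrypted."""
    return signaling_transport.upper() in {"TLS", "WSS"}
```

A caller would run parse_crypto on each a=crypto line of the offer and drop the call when sdes_is_safe returns False for the transport the INVITE arrived on. Placeholder keys in documentation are often the wrong length for their suite, so a strict check like this one may reject them.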

DTLS-SRTP is different:

m=audio 49170 UDP/TLS/RTP/SAVPF 111
a=fingerprint:sha-256 D2:9A:5A:7E:...:6F
a=setup:actpass
a=ice-ufrag:F7gI
a=ice-pwd:x9cml/YzichV2+XlhiMu8g

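Only the certificate fingerprint travels in SDP. The actual DTLS handshake happens on the media path, on the same ports as RTP; the SRTP master keys and salts for each direction are then derived from the DTLS master secret with the TLS exporter (RFC 5705) under the label EXTRACTOR-dtls_srtp, as RFC 5764 specifies. Because no key ever appears in signaling, a passive eavesdropper on cleartext signaling learns nothing that decrypts the media. An active attacker who can rewrite the SDP can substitute its own fingerprint and sit in the middle, though, so the guarantee holds only if the fingerprint is authentic; integrity-protected signaling (SIP over TLS, or WSS for WebRTC) still matters for that reason.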
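
The two checks an endpoint performs around the handshake can be sketched in Python. This is an illustration under stated assumptions, not a DTLS implementation: the handshake and the exporter call are left to a DTLS library, and sdp_fingerprint, fingerprint_matches, and split_srtp_keying_material are hypothetical helper names. The split order (client key, server key, client salt, server salt) follows RFC 5764, and the default lengths assume an AES-128 counter-mode profile.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format the SHA-256 hash of a DER certificate as an SDP fingerprint value."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "sha-256 " + ":".join(pairs)


def fingerprint_matches(sdp_value: str, peer_cert_der: bytes) -> bool:
    """Compare the fingerprint from SDP with the certificate seen in the handshake."""
    return sdp_value.strip().upper() == sdp_fingerprint(peer_cert_der).upper()


def split_srtp_keying_material(material: bytes, key_len: int = 16,
                               salt_len: int = 14) -> dict:
    """Split exporter output into per-direction SRTP master key + salt."""
    expected = 2 * (key_len + salt_len)
    if len(material) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(material)}")
    client_key = material[:key_len]
    server_key = material[key_len:2 * key_len]
    client_salt = material[2 * key_len:2 * key_len + salt_len]
    server_salt = material[2 * key_len + salt_len:]
    return {
        "client": client_key + client_salt,  # keys media sent by the DTLS client
        "server": server_key + server_salt,  # keys media sent by the DTLS server
    }
```

If fingerprint_matches returns False, the session should be torn down before any media is accepted, since the certificate presented in the handshake is not the one the signaling vouched for.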

For HIPAA the key control is "encryption in transit for ePHI". Both approaches satisfy the rule. The auditor questions to expect:

  • What cipher suites are negotiated? AES-CM is acceptable; AES-GCM preferred.
  • Is signaling protected? TLS 1.2+ minimum, TLS 1.3 strongly preferred.
  • Are keys logged anywhere? They should not be. Pen-test the bridge for key leakage in logs.
  • How is the SBC configured at the SIP/WebRTC boundary? It must terminate one and originate the other; cleartext mid-bridge is a violation.
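
For the key-logging question, one way to test a bridge is to scan its log output for SDES key material and to redact it before SDP is ever written. The sketch below is a hedged Python example; redact_sdp_keys and contains_sdes_key are hypothetical names, and the pattern only covers the inline: form of a=crypto.

```python
import re

# Captures everything up to and including "inline:" so only the key is replaced.
_INLINE_KEY = re.compile(r"(a=crypto:\d+ \S+ inline:)[A-Za-z0-9+/=]+")


def redact_sdp_keys(text: str) -> str:
    """Replace every SDES inline key in the text with a placeholder."""
    return _INLINE_KEY.sub(r"\1[REDACTED]", text)


def contains_sdes_key(text: str) -> bool:
    """Return True if the text still carries an unredacted SDES inline key."""
    return _INLINE_KEY.search(text) is not None
```

Running contains_sdes_key over archived logs gives a quick leakage check, while redact_sdp_keys belongs in the logging path itself. DTLS fingerprints pass through unchanged because they are not secret.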

CallSphere implementation

CallSphere is HIPAA-aligned and SOC 2-aligned. Every leg uses Twilio Programmable Voice with SIP/TLS and SRTP enforced; SDES over TLS protects the keys. For Healthcare AI on FastAPI :8084, the WebSocket bridge to OpenAI Realtime runs over WSS (TLS 1.3), so the RTP-equivalent media frames inside the WebSocket are TLS-protected between the bridge and OpenAI. Sales Calling AI runs 5 concurrent outbound calls per tenant, all SRTP. After-Hours AI uses Twilio simultaneous call and SMS with a 120-second timeout, and every leg is encrypted. Across 37 agents, 90+ tools, 115+ DB tables, and all three pricing tiers ($149/$499/$1499), the encryption posture is uniform per vertical, with quarterly cipher-suite review and audit-log retention.

Implementation steps

  1. Force SIP/TLS on every external trunk; reject UDP/5060 entirely. Use TLS 1.2 minimum, prefer 1.3.
  2. Enforce SRTP in the SDP offer/answer: reject any SIP call whose audio m= line is not RTP/SAVP with an a=crypto line, and accept a=crypto only over TLS-protected SIP.
  3. For browser-to-AI sessions, use DTLS-SRTP exclusively (WebRTC mandate).
  4. At the SBC/bridge between SIP and WebRTC, terminate SDES on the SIP side and originate DTLS-SRTP on the WebRTC side; never run cleartext between them.
  5. Pin AES-256 GCM where supported (RFC 7714); fall back to AES-128 CM only for legacy interop.
  6. Audit your logs - no SDP body containing key material should ever appear in plaintext logs.
  7. Run a packet capture from outside; verify no SRTP key is visible and that DTLS handshake actually happens.
  8. Schedule quarterly cipher-suite review and annual third-party pen test.
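
Steps 1, 2, 3, and 5 can be sketched as a single admission check on an incoming offer. This Python example is a simplified illustration, not an SBC configuration: evaluate_offer, Decision, and the preference list are hypothetical, only the first audio m= line is inspected, and the GCM suite names are the SDES names registered by RFC 7714.

```python
from dataclasses import dataclass

SECURE_SIGNALING = {"TLS", "WSS"}  # assumed transport labels

# Most preferred first: AES-GCM where offered, AES-CM as legacy fallback.
SDES_PREFERENCE = [
    "AEAD_AES_256_GCM",
    "AEAD_AES_128_GCM",
    "AES_256_CM_HMAC_SHA1_80",
    "AES_CM_128_HMAC_SHA1_80",
]


@dataclass
class Decision:
    accept: bool
    mode: str    # "sdes", "dtls-srtp", or "none"
    reason: str


def evaluate_offer(sdp: str, signaling_transport: str) -> Decision:
    """Accept an offer only if both signaling and media are protected."""
    if signaling_transport.upper() not in SECURE_SIGNALING:
        return Decision(False, "none", "signaling is not TLS-protected")
    lines = [ln.strip() for ln in sdp.splitlines() if ln.strip()]
    media = [ln.split() for ln in lines if ln.startswith("m=audio")]
    if not media or len(media[0]) < 3:
        return Decision(False, "none", "no usable audio m= line")
    proto = media[0][2]
    has_fingerprint = any(ln.startswith("a=fingerprint:") for ln in lines)
    suites = [ln.split()[1] for ln in lines
              if ln.startswith("a=crypto:") and len(ln.split()) >= 3]
    if proto.startswith("UDP/TLS/RTP/SAVP") and has_fingerprint:
        return Decision(True, "dtls-srtp", "fingerprint present")
    if proto in {"RTP/SAVP", "RTP/SAVPF"} and suites:
        for suite in SDES_PREFERENCE:
            if suite in suites:
                return Decision(True, "sdes", f"selected {suite}")
        return Decision(False, "sdes", "no acceptable crypto suite offered")
    return Decision(False, "none", "media is not SRTP")
```

The check deliberately tests signaling before media, so an offer with a valid a=crypto line is still rejected when it arrives over cleartext SIP.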

FAQ

Is SDES less secure than DTLS-SRTP for HIPAA? Functionally similar when SIP/TLS is enforced. DTLS-SRTP is more robust because it does not depend on signaling-layer encryption.

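Does HIPAA require a specific key exchange? No. The transmission security standard at 45 CFR 164.312(e)(1) requires technical measures that guard against unauthorized access to ePHI in transit, and its encryption specification at 164.312(e)(2)(ii) is addressable ("whenever deemed appropriate") without naming a mechanism. SDES + TLS is a reasonable way to meet it.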

What about ZRTP or MIKEY? ZRTP (RFC 6189) is interesting (peer-to-peer, no PKI) but rare in 2026. MIKEY (RFC 3830) is mostly IMS/3GPP. SDES and DTLS-SRTP cover the large majority of real deployments.

Will downgrade attacks affect us? Configure your SBC to refuse non-TLS signaling and non-SRTP media; that closes the obvious downgrade vectors.

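Are there forward-secrecy concerns with SDES? Yes. The SDES master key is carried in the SDP itself, so anyone who later obtains that SDP (from logs, session storage, or a SIP proxy on the path) can decrypt recorded media from that session. DTLS-SRTP with an ephemeral (EC)DHE key exchange provides forward secrecy: the keys never appear in signaling and cannot be recomputed later from long-term credentials.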
