By Sagar Shankaran, Founder of CallSphere
EVS delivers up to 20 kHz audio bandwidth and superwideband fidelity for VoLTE/VoNR calls, but most AI voice stacks transcode it down to 16 kHz before the model ever sees it. Here is the 2026 reality.
Key takeaways
The Enhanced Voice Services codec carries the cleanest audio a US mobile carrier can deliver: up to 20 kHz of audio bandwidth (fullband), packet-loss-aware, jitter-resilient. The bitter joke for AI voice builders in 2026 is that almost none of that quality reaches the model: somewhere between the carrier and the GPU, the call is transcoded to PCMU.
```mermaid
flowchart LR
    Phone["PSTN caller"] --> Carrier["Carrier"]
    Carrier -- "SIP INVITE" --> SBC["Session Border Controller"]
    SBC -- "SIP" --> PBX["Twilio / Asterisk"]
    PBX -- "RTP · Opus" --> Bridge["AI Voice Gateway"]
    Bridge --> AI["OpenAI Realtime"]
    AI --> Bridge
    Bridge --> PBX
```

EVS was standardized by 3GPP in September 2014 as the successor to AMR-WB. It is the official codec for VoLTE and VoNR, supports narrowband, wideband (16 kHz), super-wideband (32 kHz), and fullband (48 kHz) sampling, and was specifically designed for mobile networks: channel-aware coding, packet-loss concealment, and jitter buffer integration that AMR-WB never had. As of 2024, around 200 smartphone models support EVS, and most major US carriers offer it on VoLTE.
For AI voice, EVS is appealing on paper. Wider audio bandwidth preserves fricatives ("s", "f", "th"), which improves ASR accuracy on names, alphanumerics, and noisy speakers. Channel-aware coding masks the small packet losses that can make Whisper or Deepgram hallucinate.
The catch is that EVS rarely survives the trip from the phone to your model. A typical AI inbound call from a mobile takes this path:
```text
[Mobile UE] --EVS SWB 24.4 kbps--> [Carrier IMS / VoLTE core]
  --EVS or AMR-WB--> [SIP trunk to your provider (Twilio/Telnyx/Bandwidth)]
  --PCMU 8 kHz 64 kbps--> [Your SIP gateway]
  --L16 16 kHz mono--> [WebSocket bridge]
  --Opus 48 kHz mono--> [OpenAI Realtime API]
```
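To see why one narrowband hop dominates, here is a minimal Python sketch that walks the path above and reports the hop with the narrowest audio bandwidth. The per-hop figures are approximate nominal values used as assumptions for illustration: G.711 at about 300 to 3400 Hz, AMR-WB at about 50 to 7000 Hz, L16 at 16 kHz capped by its 8 kHz Nyquist limit, EVS SWB at no more than 16 kHz (half its 32 kHz sampling rate), and fullband Opus at 20 kHz. `Hop` and `effective_bandwidth` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    codec: str
    max_audio_hz: int  # approximate upper edge of the audio band this hop carries


def effective_bandwidth(path: list[Hop]) -> Hop:
    """Return the hop that caps the audio the model sees.

    Once a hop discards high frequencies, later hops cannot restore them,
    even if they run at a higher sample rate.
    """
    return min(path, key=lambda hop: hop.max_audio_hz)


# Assumed nominal figures for the call path shown above.
PATH = [
    Hop("EVS SWB 24.4 kbps", 16_000),
    Hop("AMR-WB", 7_000),       # worst case for the "EVS or AMR-WB" leg
    Hop("PCMU 8 kHz", 3_400),
    Hop("L16 16 kHz", 8_000),
    Hop("Opus 48 kHz", 20_000),
]

if __name__ == "__main__":
    bottleneck = effective_bandwidth(PATH)
    print(f"Model hears at most ~{bottleneck.max_audio_hz} Hz (limited by {bottleneck.codec})")
```

Swapping the PCMU hop for G.722 in this model raises the ceiling to roughly 7 kHz, which is why the sections below focus on negotiating that hop rather than on the Opus leg into the model.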
The first transcoding step is the killer. Most US SIP trunks present PCMU (G.711 mu-law, 8 kHz) in the SDP offer because every legacy PBX and TDM gateway downstream supports it. Even when you ask for Opus or G.722, the carrier media gateway often falls back to PCMU: the originating leg was EVS, and the gateway prefers a known transcode path. You inherit narrowband audio whether you want it or not.
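A quick way to confirm which codec you inherited is to read the SDP the far side actually sent. This is a minimal Python sketch, not a full RFC 4566 parser: it looks only at the first `m=audio` section and falls back to the RFC 3551 static payload types (0 = PCMU, 8 = PCMA, 9 = G722, 18 = G729) when no `a=rtpmap` line is present. The function and constant names are hypothetical.

```python
# RFC 3551 static payload types that may appear without an a=rtpmap line.
STATIC_PAYLOAD_TYPES = {0: "PCMU/8000", 8: "PCMA/8000", 9: "G722/8000", 18: "G729/8000"}


def offered_audio_codecs(sdp: str) -> list[str]:
    """Return the codecs of the first m=audio section, in the sender's preference order."""
    order: list[int] = []
    rtpmap: dict[int, str] = {}
    in_audio = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            if in_audio:
                break  # only the first audio section matters here
            in_audio = line.startswith("m=audio ")
            if in_audio:
                # m=audio <port> <proto> <payload types...>
                order = [int(pt) for pt in line.split()[3:]]
        elif in_audio and line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(maxsplit=1)
            rtpmap[int(pt)] = encoding
    return [rtpmap.get(pt, STATIC_PAYLOAD_TYPES.get(pt, f"unknown/{pt}")) for pt in order]
```

If the first entry comes back as `PCMU/8000`, the leg into your gateway is narrowband no matter what the caller's handset negotiated with the carrier.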
If you can negotiate G.722 (HD voice, 16 kHz wideband) end-to-end, ASR accuracy jumps measurably on names and digits. If you can negotiate Opus, you also gain dynamic bitrate, but very few US carriers will hand you Opus on a SIP trunk in 2026. The realistic upgrade path is G.722 plus careful SDP negotiation.
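When you control the SIP user agent yourself (for example an Asterisk box or a custom SIP stack, rather than a hosted trunk's codec settings), stating the preference comes down to payload-type order in the offer. The sketch below is a hypothetical helper, assuming a single audio stream over plain RTP/AVP, that lists G.722 ahead of PCMU and DTMF events; the answerer may still pick PCMU. It also shows the RFC 3551 quirk that trips people up: G.722 samples at 16 kHz but is signalled with an 8000 Hz RTP clock rate.

```python
def build_audio_offer(ip: str, port: int, session_id: int = 0) -> str:
    """Build a minimal SDP offer that prefers G.722, then PCMU, plus DTMF events."""
    lines = [
        "v=0",
        f"o=- {session_id} {session_id} IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        # Payload-type order expresses preference: 9 = G.722, 0 = PCMU, 101 = DTMF.
        f"m=audio {port} RTP/AVP 9 0 101",
        # G.722 samples at 16 kHz, but RFC 3551 fixes its RTP clock rate at 8000.
        "a=rtpmap:9 G722/8000",
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:101 telephone-event/8000",
        "a=fmtp:101 0-16",
        "a=ptime:20",
        "a=sendrecv",
    ]
    # SDP lines are CRLF-terminated.
    return "\r\n".join(lines) + "\r\n"
```

Keeping PCMU in the list matters: an offer with G.722 alone is more likely to be rejected outright than answered with a usable codec.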
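```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Twilio TwiML: <Parameter> is forwarded to the bridge as a custom parameter;
     it does not change the codec. Media Streams deliver 8 kHz mu-law. -->
<Response>
  <Connect>
    <Stream url="wss://bridge.callsphere.ai/realtime">
      <Parameter name="codec" value="audio/g722"/>
    </Stream>
  </Connect>
</Response>
```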
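On the bridge side, a value set with `<Parameter>` arrives in the `customParameters` of the Media Streams `start` message, next to the `mediaFormat` Twilio actually used. This minimal Python sketch shows how a bridge might record both and decode `media` payloads; the function name and the `state` dict are hypothetical, and a real bridge would run this inside its WebSocket receive loop.

```python
import base64
import json
from typing import Optional


def handle_twilio_message(raw: str, state: dict) -> Optional[bytes]:
    """Process one Twilio Media Streams WebSocket message.

    Returns raw PCMU bytes for 'media' events and None otherwise.
    `state` is a caller-owned dict that collects stream metadata.
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        start = msg["start"]
        state["stream_sid"] = start.get("streamSid")
        # The codec Twilio actually delivers, e.g. audio/x-mulaw at 8000 Hz.
        state["media_format"] = start.get("mediaFormat", {})
        # Values from <Parameter> elements in the TwiML.
        state["custom"] = start.get("customParameters", {})
        return None
    if event == "media":
        # Payload is base64-encoded audio in the format announced at start.
        return base64.b64decode(msg["media"]["payload"])
    if event == "stop":
        state["stopped"] = True
    return None
```

Logging `media_format` alongside the custom parameter makes it obvious when the requested codec and the delivered codec disagree.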
CallSphere terminates every inbound and outbound call on Twilio Programmable Voice across all six verticals (Healthcare AI, Real Estate AI, Sales Calling AI, Salon AI, IT Helpdesk AI, After-Hours AI). Twilio offers PCMU and PCMA (G.711 mu-law and A-law) on SIP trunks, plus Opus on the Voice SDK and Conversation Relay. For Healthcare AI on FastAPI :8084, we forward audio over WebSocket as L16 16 kHz to OpenAI Realtime; the WebSocket bridge decodes PCMU to linear PCM and upsamples it to 16 kHz with a sinc filter so the model always sees a consistent sample rate. Sales Calling AI fires up to 5 concurrent outbound calls per tenant; After-Hours AI rings on-call staff with a Twilio simultaneous call-plus-SMS pattern and a 120-second timeout. The 37 agents and 90+ tools across 115+ DB tables run on $149/$499/$1499 plans with a 14-day trial and a 22% affiliate program. EVS does not enter the picture for us yet, but G.722 negotiation is on the 2026 roadmap once Twilio exposes it on more SKUs.
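The decode-and-upsample step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CallSphere's bridge code: it uses the standard G.711 mu-law expansion, a Hann-windowed sinc kernel reaching 16 input samples on each side (an arbitrary choice), and hypothetical function names. A production bridge would more likely call a vectorized resampler such as `scipy.signal.resample_poly` or libsamplerate.

```python
import math
import struct


def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte into a signed 16-bit PCM sample."""
    code = ~code & 0xFF                      # mu-law bytes are stored inverted
    exponent = (code & 0x70) >> 4
    magnitude = (((code & 0x0F) << 3) + 0x84) << exponent
    magnitude -= 0x84                        # remove the encoder bias
    return -magnitude if code & 0x80 else magnitude


def _windowed_sinc(t: float, half_width: int) -> float:
    """Hann-windowed sinc kernel; t is measured in input-sample units."""
    if abs(t) >= half_width:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    return sinc * window


def upsample_2x(samples: list[int], half_width: int = 16) -> list[int]:
    """Band-limited 2x interpolation (8 kHz -> 16 kHz), clipped to int16."""
    n = len(samples)
    out = []
    for j in range(2 * n):
        center = j / 2                       # output position in input-sample units
        base = j // 2
        acc = 0.0
        for i in range(max(0, base - half_width + 1), min(n, base + half_width + 1)):
            acc += samples[i] * _windowed_sinc(center - i, half_width)
        out.append(max(-32768, min(32767, round(acc))))
    return out


def pcmu_8k_to_l16_16k(payload: bytes) -> bytes:
    """Decode a PCMU payload and return 16 kHz little-endian 16-bit PCM."""
    linear = [ulaw_to_linear(b) for b in payload]
    upsampled = upsample_2x(linear)
    return struct.pack(f"<{len(upsampled)}h", *upsampled)
```

Even-indexed output samples reproduce the input exactly, because the sinc kernel is zero at every nonzero integer offset; only the in-between samples are interpolated. The result has a 16 kHz sample rate but still carries nothing above 4 kHz, which is the point of the section: upsampling restores a consistent format, not the lost bandwidth.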
Can OpenAI Realtime accept EVS directly? No. Realtime accepts 16-bit PCM, G.711 (mu-law and A-law), and Opus over WebRTC. EVS needs a transcoder upstream.
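When the trunk hands you G.711, one way to avoid an extra transcode is to pass it straight through. The sketch below maps a trunk codec to the `input_audio_format` values documented for the beta Realtime WebSocket API (`pcm16`, `g711_ulaw`, `g711_alaw`) and refuses EVS. Newer API versions describe audio formats with a different session shape, so treat these field names as an assumption and check current docs; the mapping table and function name are hypothetical.

```python
import json

# Hypothetical mapping from trunk codec to beta Realtime input_audio_format values.
# EVS has no entry: it must be transcoded before it reaches the API.
REALTIME_INPUT_FORMATS = {
    "PCMU": "g711_ulaw",
    "PCMA": "g711_alaw",
    "L16": "pcm16",  # beta API pcm16 = 16-bit little-endian mono at 24 kHz
}


def session_update_for(codec: str) -> str:
    """Build a session.update event that passes the trunk codec through unchanged."""
    try:
        fmt = REALTIME_INPUT_FORMATS[codec.upper()]
    except KeyError:
        raise ValueError(f"{codec} is not accepted by Realtime; transcode it upstream") from None
    event = {
        "type": "session.update",
        "session": {"input_audio_format": fmt, "output_audio_format": fmt},
    }
    return json.dumps(event)
```

Passing G.711 through skips a resampling step but also locks the model to narrowband input, so it trades latency and CPU for nothing on the audio-quality side.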
Is G.722 enough, or do I need fullband? For ASR on names, digits, and short utterances, G.722 (50 Hz to 7 kHz) recovers most of the gain. Fullband matters more for music-on-hold, IVR audio quality, and human listening, less for AI accuracy.
Does Twilio expose EVS at all? Not on standard SIP trunks. Some carrier interconnects negotiate AMR-WB; EVS is rare on the egress side.
Will EVS adoption grow for AI in 2026? Likely yes for direct mobile-to-AI products that bypass legacy SIP, less so for traditional contact-center deployments.
How much does the codec actually move ASR accuracy? On clean speech, 1-3% absolute WER. On accented speech, names, and noisy environments, 5-10% absolute. The longer the utterance, the larger the gap.
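To measure a gap like this on your own calls, transcribe the same utterances through each codec path and compare word error rate against a shared reference transcript. WER is (substitutions + deletions + insertions) divided by the number of reference words; the sketch below computes it with a word-level Levenshtein distance. The function name is hypothetical, and real evaluations usually normalize casing, punctuation, and number formatting before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1] / len(ref)
```

The absolute difference between two codec paths is then simply the WER of one path minus the WER of the other, computed over the same reference set.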
Start a 14-day trial to hear our codec stack in production, see pricing for $149/$499/$1499 tiers, or contact us about wideband audio negotiation for high-stakes voice AI.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.