EVS Codec for HD Voice and AI Quality in 2026: What Actually Reaches the Model
EVS delivers up to 20 kHz of audio bandwidth on VoLTE/VoNR calls, but most AI voice stacks receive it transcoded down to 8 kHz PCMU and then resample to 16 kHz before the model ever sees it. Here is the 2026 reality.
The Enhanced Voice Services (EVS) codec carries the cleanest audio a US mobile carrier can deliver: 20 kHz fullband, packet-loss-aware, jitter-resilient. The bitter joke for AI voice builders in 2026 is that almost none of that quality reaches the model. Somewhere between the carrier and the bridge, the call is transcoded to PCMU on its way to the GPU.
Background
flowchart LR
Phone["PSTN caller"] --> Carrier["Carrier"]
Carrier -- "SIP INVITE" --> SBC["Session Border Controller"]
SBC -- "SIP" --> PBX["Twilio / Asterisk"]
PBX -- "RTP · Opus" --> Bridge["AI Voice Gateway"]
Bridge --> AI["OpenAI Realtime"]
AI --> Bridge
Bridge --> PBX
EVS was standardized by 3GPP in September 2014 as the successor to AMR-WB. It is the official codec for VoLTE and VoNR, supports narrowband, wideband (16 kHz), super-wideband (32 kHz), and fullband (48 kHz) sampling, and was specifically designed for mobile networks: channel-aware coding, packet-loss concealment, and jitter buffer integration that AMR-WB never had. As of 2024, around 200 smartphone models support EVS, and most major US carriers offer it on VoLTE.
For AI voice, EVS is appealing on paper. Wider audio bandwidth preserves the high-frequency energy of fricatives ("s", "f", "th"), which improves ASR accuracy on names, alphanumerics, and noisy speakers. Channel-aware coding masks the small packet losses that make Whisper or Deepgram hallucinate.
Technical deep-dive
The catch is that EVS rarely survives the trip from the phone to your model. A typical AI inbound call from a mobile takes this path:
[Mobile UE] --EVS SWB 24.4 kbps--> [Carrier IMS / VoLTE core]
--EVS or AMR-WB--> [SIP trunk to your provider (Twilio/Telnyx/Bandwidth)]
--PCMU 8 kHz 64 kbps--> [Your SIP gateway]
--L16 16 kHz mono--> [WebSocket bridge]
--Opus 48 kHz mono--> [OpenAI Realtime API]
The drop to PCMU is the killer. Most US SIP trunks present PCMU (G.711 mu-law, 8 kHz) on the SDP offer because every legacy PBX and TDM gateway downstream supports it. Even when you ask for Opus or G.722, the carrier media gateway often falls back to PCMU because the originating leg was EVS and the gateway prefers a known transcode path. You inherit narrowband audio whether you wanted it or not.
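To make the bottleneck concrete, here is a minimal Python sketch that models each leg of the path above by its sample rate and nominal audio passband, then reports the narrowest one. The passband figures are illustrative assumptions (commonly quoted nominal values, not measurements), and `Hop`, `PATH`, `effective_bandwidth_hz`, `bottleneck`, and `pcm_bitrate_kbps` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    sample_rate_hz: int  # sampling rate of the audio on this leg
    passband_hz: int     # assumed nominal upper edge of the coded audio band


# Illustrative figures only; real passbands depend on codec mode and gateway.
PATH = [
    Hop("EVS SWB (handset)", 32_000, 16_000),
    Hop("PCMU (SIP trunk)", 8_000, 3_400),
    Hop("L16 16 kHz (bridge)", 16_000, 8_000),
    Hop("Opus 48 kHz (model leg)", 48_000, 20_000),
]


def ceiling_hz(hop: Hop) -> int:
    """A leg cannot carry more than its passband or its Nyquist limit."""
    return min(hop.passband_hz, hop.sample_rate_hz // 2)


def effective_bandwidth_hz(path: list[Hop]) -> int:
    """Bandwidth that survives the whole chain: the narrowest leg wins."""
    return min(ceiling_hz(hop) for hop in path)


def bottleneck(path: list[Hop]) -> Hop:
    """Return the leg that sets the effective bandwidth."""
    return min(path, key=ceiling_hz)


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bitrate of uncompressed or companded mono audio."""
    return sample_rate_hz * bits_per_sample / 1000


if __name__ == "__main__":
    print(bottleneck(PATH).name, effective_bandwidth_hz(PATH), "Hz")
    print("PCMU:", pcm_bitrate_kbps(8_000, 8), "kbps")
    print("L16 16 kHz:", pcm_bitrate_kbps(16_000, 16), "kbps")
```

Under these assumptions the PCMU leg caps the chain at roughly 3.4 kHz. Every later leg runs at a higher sample rate, but none of them can put back what that leg removed.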
If you can negotiate G.722 (HD voice, 16 kHz wideband) end-to-end, ASR accuracy jumps measurably on names and digits. If you can negotiate Opus, you also gain dynamic bitrate, but very few US carriers will hand you Opus on a SIP trunk in 2026. The realistic upgrade path is G.722 plus careful SDP negotiation.
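<!-- Twilio TwiML: passes a wideband preference to the bridge as a custom parameter.
     <Parameter> values arrive in the stream's "start" message; they do not change
     the call's negotiated codec, and Media Streams audio is 8 kHz mu-law. -->
<Response>
  <Connect>
    <Stream url="wss://bridge.callsphere.ai/realtime">
      <Parameter name="codec" value="audio/g722"/>
    </Stream>
  </Connect>
</Response>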
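Because codec selection on a SIP trunk happens in SDP, it helps to see what the offer actually contains. The sketch below is a deliberately small, assumption-laden parser, not a full SDP implementation: it reads only the first `m=audio` section, knows only three static payload types (0 = PCMU, 8 = PCMA, 9 = G722, per the RTP/AVP profile), and the function names `offered_codecs` and `pick_codec` are hypothetical.

```python
# Static RTP/AVP payload types that may appear without an a=rtpmap line.
STATIC_TYPES = {0: ("PCMU", 8000), 8: ("PCMA", 8000), 9: ("G722", 8000)}


def offered_codecs(sdp: str) -> list[tuple[int, str, int]]:
    """Return (payload_type, name, clock_rate) in the offer's preference order."""
    order: list[int] = []
    rtpmap: dict[int, tuple[str, int]] = {}
    in_audio = False
    parsed_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if parsed_audio:
                break  # only the first audio section is considered
            in_audio = line.startswith("m=audio ")
            if in_audio:
                # m=audio <port> <proto> <fmt> <fmt> ...
                order = [int(pt) for pt in line.split()[3:]]
                parsed_audio = True
        elif in_audio and line.startswith("a=rtpmap:"):
            pt_str, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            parts = encoding.strip().split("/")
            rtpmap[int(pt_str)] = (parts[0].upper(), int(parts[1]))
    result = []
    for pt in order:
        entry = rtpmap.get(pt) or STATIC_TYPES.get(pt)
        if entry is not None:
            result.append((pt, entry[0], entry[1]))
    return result


def pick_codec(codecs, preference=("OPUS", "G722", "PCMU")):
    """Pick the first offered codec that matches our own preference order."""
    for wanted in preference:
        for codec in codecs:
            if codec[1] == wanted:
                return codec
    return None


EXAMPLE_OFFER = """v=0
o=- 0 0 IN IP4 203.0.113.10
s=-
c=IN IP4 203.0.113.10
t=0 0
m=audio 49170 RTP/AVP 0 8 9 101
a=rtpmap:0 PCMU/8000
a=rtpmap:9 G722/8000
a=rtpmap:101 telephone-event/8000
"""

if __name__ == "__main__":
    codecs = offered_codecs(EXAMPLE_OFFER)
    print([name for _, name, _ in codecs])
    print(pick_codec(codecs))
```

One detail that trips people up: G.722 is advertised with an RTP clock rate of 8000 even though the audio is sampled at 16 kHz, so the `rtpmap` line alone does not tell you the audio bandwidth.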
CallSphere implementation
CallSphere terminates every inbound and outbound call on Twilio Programmable Voice across all six verticals (Healthcare AI, Real Estate AI, Sales Calling AI, Salon AI, IT Helpdesk AI, After-Hours AI). Twilio offers PCMU and PCMA (G.711 mu-law and A-law) on SIP trunks, plus Opus on the Voice SDK and Conversation Relay. For Healthcare AI on FastAPI (port 8084), we forward audio over WebSocket as L16 16 kHz to OpenAI Realtime; the WebSocket bridge upsamples PCMU to 16 kHz with a sinc filter so the model sees a consistent sample rate. Upsampling does not restore the band above 3.4 kHz that PCMU discarded; it only standardizes the format. Sales Calling AI fires up to 5 concurrent outbound calls per tenant; After-Hours AI rings on-call staff with a Twilio simultaneous call-plus-SMS pattern and a 120-second timeout. The 37 agents and 90+ tools across 115+ DB tables run on $149/$499/$1499 plans with a 14-day trial and a 22% affiliate program. EVS does not enter the picture for us yet, but G.722 negotiation is on the 2026 roadmap once Twilio exposes it on more SKUs.
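For readers who want to see what "upsample PCMU to 16 kHz with a sinc filter" can look like, here is a standard-library-only Python sketch. It is not CallSphere's production code: it assumes a standard G.711 mu-law decode followed by a 2x windowed-sinc (Blackman) interpolator, processes a whole buffer offline with zero padding at the edges, and every function name is hypothetical. A real bridge would more likely use a streaming polyphase resampler from a DSP library.

```python
import math


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed linear sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u & 0x80 else magnitude


def design_interpolator(half_width: int = 16) -> list[float]:
    """Blackman-windowed sinc, cutoff at a quarter of the output rate, gain 2."""
    taps = []
    for n in range(-half_width, half_width + 1):
        x = n / 2.0
        ideal = 1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = (0.42
                  + 0.5 * math.cos(math.pi * n / half_width)
                  + 0.08 * math.cos(2 * math.pi * n / half_width))
        taps.append(ideal * window)
    return taps


def upsample_2x(samples: list[int], taps: list[float]) -> list[float]:
    """Zero-stuff by 2 and low-pass; zero-stuffed positions are skipped."""
    half = len(taps) // 2
    out = []
    for n in range(2 * len(samples)):
        acc = 0.0
        for k in range(-half, half + 1):
            m = n - k
            if m % 2 == 0:          # odd positions hold stuffed zeros
                i = m // 2
                if 0 <= i < len(samples):
                    acc += taps[k + half] * samples[i]
        out.append(acc)
    return out


def pcmu_to_l16_16k(payload: bytes) -> list[int]:
    """8 kHz mu-law bytes in, 16 kHz 16-bit linear samples out."""
    linear = [ulaw_to_pcm16(b) for b in payload]
    upsampled = upsample_2x(linear, design_interpolator())
    return [max(-32768, min(32767, round(s))) for s in upsampled]


if __name__ == "__main__":
    frame = bytes([0xFF] * 160)        # 20 ms of mu-law silence
    print(len(pcmu_to_l16_16k(frame)))  # 320 samples at 16 kHz
```

With this filter shape the original 8 kHz samples pass through unchanged at the even output positions, and only the in-between samples are interpolated. The contrast with naive zero-stuffing is that the low-pass stage removes the spectral image that would otherwise appear between 4 and 8 kHz.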
Implementation steps
- Inspect your SIP provider's offered codecs in the SDP body of the inbound INVITE; PCMU is almost always primary.
- If your provider supports G.722, request it explicitly in the answer SDP and pin its payload type.
- On the WebSocket bridge, upsample to 16 kHz with a high-quality sinc or polyphase filter, not a naive zero-stuffer.
- Measure ASR word error rate on a held-out set of names and digits before and after; target a 15-20% relative WER drop with G.722 over PCMU.
- Watch the carrier reports: if the upstream leg is EVS but you negotiate PCMU, you are paying for the worst of both worlds.
- For mobile-heavy inbound (60%+ from cell phones), evaluate a media gateway that can transcode EVS to a wideband codec such as G.722 instead of dropping to PCMU.
- Re-run ASR benchmarks quarterly; codec interop on US trunks shifts faster than you expect.
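The WER measurement step above can be sketched in a few lines of Python. This is a minimal word-level edit-distance implementation under simple assumptions (lowercasing and whitespace tokenization only, no punctuation or number normalization); the function names and the toy transcripts are hypothetical and are not benchmark data.

```python
def word_errors(ref_words: list[str], hyp_words: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref in enumerate(ref_words, start=1):
        cur = [i]
        for j, hyp in enumerate(hyp_words, start=1):
            cost = 0 if ref == hyp else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word errors over total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.lower().split()
        errors += word_errors(ref_words, hyp.lower().split())
        words += len(ref_words)
    return errors / words


def relative_drop(wer_before: float, wer_after: float) -> float:
    """Relative WER reduction, e.g. 0.20 -> 0.16 is a 20% relative drop."""
    return (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    # Toy transcripts for illustration only.
    refs = ["call doctor nguyen at extension four seven", "my id is b as in bravo"]
    pcmu = ["call doctor win at extension four seven", "my id is d as in bravo"]
    g722 = ["call doctor nguyen at extension four seven", "my id is d as in bravo"]
    before, after = wer(refs, pcmu), wer(refs, g722)
    print(f"PCMU WER {before:.3f}, G.722 WER {after:.3f}, "
          f"relative drop {relative_drop(before, after):.0%}")
```

Keep the reference set fixed between runs and report both numbers. A relative drop is the change divided by the starting WER, so a 15-20% relative improvement can correspond to only a few points of absolute WER.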
FAQ
Can OpenAI Realtime accept EVS directly? No. Realtime accepts PCM 16-bit, G.711, and Opus. EVS would need a transcoder upstream.
Is G.722 enough, or do I need fullband? For ASR on names, digits, and short utterances, G.722 (50 Hz to 7 kHz) recovers most of the gain. Fullband matters more for music-on-hold, IVR audio quality, and human listening, less for AI accuracy.
Does Twilio expose EVS at all? Not on standard SIP trunks. Some carrier interconnects negotiate AMR-WB; EVS is rare on the egress side.
Will EVS adoption grow for AI in 2026? Likely yes for direct mobile-to-AI products that bypass legacy SIP, less so for traditional contact-center deployments.
How much does the codec actually move ASR accuracy? On clean speech, about 1-3 percentage points of absolute WER. On accented speech, names, and noisy environments, 5-10 points. The longer the utterance, the larger the gap.
Sources
- VoiceAge: Enhanced Voice Services (EVS) codec
- Wikipedia: Enhanced Voice Services
- GetStream: WebRTC Codecs - What's supported?
Start a 14-day trial to hear our codec stack in production, see pricing for $149/$499/$1499 tiers, or contact us about wideband audio negotiation for high-stakes voice AI.