By Sagar Shankaran, Founder of CallSphere
How RTP carries AI voice end-to-end, why Opus matters more than G.711 for model accuracy, and the codec negotiation patterns that ship in 2026.
Key takeaways
If you care about the accuracy of your AI voice agent, the codec choice on the carrier leg matters more than the model choice. The full-fidelity end of the call is where the model lives; the squeezed end is where the PSTN forces 8 kHz mu-law. The 2026 patterns push the high-fidelity boundary as far toward the caller as possible.
flowchart LR
Phone["PSTN caller"] --> Carrier["Carrier"]
Carrier -- "SIP INVITE" --> SBC["Session Border Controller"]
SBC -- "SIP" --> PBX["Twilio / Asterisk"]
PBX -- "RTP · Opus" --> Bridge["AI Voice Gateway"]
Bridge --> AI["OpenAI Realtime"]
AI --> Bridge
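Bridge --> PBX
Voice over IP audio is encoded by a codec on the sender, packetized into RTP, and decoded by a codec on the receiver. The codecs that matter for AI agents are G.711 (mu-law or a-law, 8 kHz narrowband), G.722 (16 kHz wideband), and Opus (variable bitrate, from narrowband up to 48 kHz fullband).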
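For voice to feel natural, one-way latency should stay under ~150 ms, the figure ITU-T G.114 recommends. Opus supports configurable frame sizes from 2.5 to 60 ms, with 20 ms the usual default; shorter frames help the latency budget at the cost of more per-packet overhead.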
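To make the latency budget concrete, here is a minimal sketch that adds up the contributors to one-way delay for a single call leg. The class name `LegBudget` and every number below are illustrative assumptions, not measurements from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegBudget:
    """Hypothetical one-way delay contributors for one call leg, in ms."""

    frame_ms: float          # codec frame duration (packetization delay)
    lookahead_ms: float      # codec algorithmic look-ahead
    jitter_buffer_ms: float  # receiver-side jitter buffer depth
    network_ms: float        # one-way network transit

    def one_way_ms(self) -> float:
        return (
            self.frame_ms
            + self.lookahead_ms
            + self.jitter_buffer_ms
            + self.network_ms
        )


def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of audio samples carried by one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


# Assumed values for illustration only.
opus_20ms = LegBudget(frame_ms=20, lookahead_ms=6.5, jitter_buffer_ms=40, network_ms=30)
opus_10ms = LegBudget(frame_ms=10, lookahead_ms=6.5, jitter_buffer_ms=40, network_ms=30)

if __name__ == "__main__":
    for name, leg in (("20 ms frames", opus_20ms), ("10 ms frames", opus_10ms)):
        print(f"{name}: {leg.one_way_ms()} ms one-way, budget is 150 ms")
    print("G.711 at 8 kHz, 20 ms:", samples_per_frame(8000, 20), "samples per packet")
```

Halving the frame size saves 10 ms per leg in this sketch, which adds up when a call crosses several legs before reaching the model.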
The 2026 trend: keep Opus all the way from caller to model where possible (WebRTC client to your AI), and only collapse to G.711 when crossing into the PSTN. Every codec hop adds artifacts that hurt model accuracy.
Codec negotiation happens in the SDP body of the SIP INVITE. The caller sends an offer listing supported codecs in priority order. The receiver answers with the codec it picks. RTP carries the encoded payloads on the negotiated UDP port pair.
For a typical AI voice call:
Most of the audio quality loss is in that PSTN G.711 leg. There is nothing your software can do about it; the originating carrier picked the codec when the call left the caller's phone. What you control is everything after the carrier hands the call to you: keep it Opus, keep it 16 or 24 kHz, and avoid double-transcoding through low-rate codecs.
CallSphere uses Twilio across all products. PSTN inbound to the Healthcare AI receptionist on FastAPI :8084 is G.711 mu-law on the carrier leg, then bridged to OpenAI Realtime, where the audio is up-sampled to 24 kHz internally. Sales Calling AI (five concurrent outbound calls on Twilio Programmable Voice) and After-Hours AI (simultaneous Twilio call plus SMS, with a 120-second timeout) follow the same path.
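For readers who want to see what the G.711 leg means at the sample level, here is a minimal sketch of mu-law decoding followed by a naive 3x upsample from 8 kHz to 24 kHz. It is not CallSphere's or OpenAI's implementation; the function names are hypothetical, and the linear interpolation is a stand-in for the filtered resampler a production system would use.

```python
def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear PCM sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(payload: bytes) -> list[int]:
    """Decode a whole RTP payload of mu-law bytes."""
    return [ulaw_to_pcm16(b) for b in payload]


def upsample_3x(samples: list[int]) -> list[int]:
    """Naive 8 kHz to 24 kHz upsample by linear interpolation (illustration only)."""
    out: list[int] = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        step = nxt - cur
        out.extend((cur, cur + step // 3, cur + 2 * step // 3))
    return out


if __name__ == "__main__":
    frame = bytes([0xFF] * 160)   # 20 ms of mu-law silence at 8 kHz
    pcm_24k = upsample_3x(decode_ulaw(frame))
    print(len(pcm_24k), "samples at 24 kHz")
```

Upsampling changes the sample rate, not the information content: the result still carries only the narrowband audio that survived the 8 kHz leg.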
For browser and mobile demo paths (the /demo page), CallSphere uses Opus end-to-end via WebRTC, which produces noticeably crisper audio and slightly higher model accuracy than the PSTN paths. The 37 agents, 90+ tools, 115+ database tables, HIPAA and SOC 2 controls, and the $149/$499/$1499 pricing for 1/3/10 numbers do not change based on codec — but customers running large headsets or wideband phones see better outcomes.
<!-- FreeSWITCH SIP profile: prefer Opus, fall back to G.722, then G.711 -->
<settings>
  <param name="inbound-codec-prefs" value="opus,G722,PCMU,PCMA"/>
  <param name="outbound-codec-prefs" value="opus,G722,PCMU,PCMA"/>
  <!-- "generous" follows the caller's codec order; "greedy" enforces the local order above -->
  <param name="inbound-codec-negotiation" value="generous"/>
  <param name="enable-3pcc" value="proxy"/>
  <!-- Hang up after 300 s without RTP, or 1800 s while on hold -->
  <param name="rtp-timeout-sec" value="300"/>
  <param name="rtp-hold-timeout-sec" value="1800"/>
</settings>
Does Opus work over the PSTN? No, the PSTN forces G.711 on the last mile. Opus matters on the legs you control: browser, mobile, internal SIP.
Will my AI accuracy improve if I force G.722 on the carrier leg? Slightly, on calls where the originating carrier supports it end-to-end. Most US PSTN paths still collapse to G.711 somewhere.
What about transcoding cost? Modern SBCs and softswitches transcode in software at low cost. The bigger penalty is latency and audio fidelity, not CPU.
Should I use 8 kHz Opus or 16 kHz Opus? 16 kHz where bandwidth permits. The model accuracy gain from 16 kHz is real.
Does the OpenAI Realtime API care which codec I send? Over WebSocket it accepts 16-bit PCM at 24 kHz, G.711 mu-law, and G.711 a-law; Opus applies when you connect over WebRTC. Internally it works on 24 kHz PCM regardless.
Start a 14-day trial, book a demo to hear Opus end-to-end, or read the Twilio integration page.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.