G.722 vs Opus for Voice AI in 2026: The Real Tradeoff Beyond Bitrate

G.722 fixed 64 kbps with sub-2 ms latency or Opus 6-510 kbps with adaptive bitrate? For voice AI agents the answer depends on whether your media path crosses a SIP trunk or stays end-to-end WebRTC.

Opus wins almost every codec benchmark on the public internet, but G.722 still wins the SIP trunk negotiation 90% of the time. For an AI voice builder in 2026, picking between them is less a quality argument than an architecture argument.

Background

flowchart TD
  Out[Outbound campaign] --> Twilio[Twilio Voice API]
  Twilio --> STIR[STIR/SHAKEN attestation]
  STIR --> Carrier[Originating carrier]
  Carrier --> Term[Terminating carrier]
  Term --> Recipient[Recipient phone]
  Recipient --> Webhook[/voice webhook/]
  Webhook --> Agent[AI sales agent]
CallSphere reference architecture

G.722 was standardized by the ITU-T in 1988 and is the original HD voice codec: a 7 kHz audio band at a 16 kHz sample rate and a fixed 64 kbps, with sub-2 ms algorithmic delay. It is the de facto interop codec for wideband SIP. Opus was standardized by the IETF as RFC 6716 in 2012, supports variable bitrates from 6 kbps to 510 kbps and sample rates from 8 to 48 kHz, and is mandatory to implement in WebRTC.

For voice AI specifically, both are reasonable choices. Opus surpasses G.722 on raw audio quality, even when it runs below 32 kbps against G.722's fixed 64 kbps, and it adapts to network conditions. G.722 is rock-stable, ubiquitous, and adds essentially zero CPU on the encode/decode path.

Technical deep-dive

The key tradeoffs:

| Aspect | G.722 | Opus |
| --- | --- | --- |
| Bitrate | Fixed 64 kbps | 6-510 kbps adaptive |
| Sample rate | 16 kHz | 8/12/16/24/48 kHz |
| Algorithmic delay | <2 ms | 5-22.5 ms (frame-dependent) |
| FEC / PLC | Basic | Inband FEC + DTX |
| SIP trunk support | Universal | Rare in US 2026 |
| WebRTC | Optional | Mandatory |
| ASR quality | Wideband, clear | Better at variable bitrate |

For the latency-sensitive AI voice path, algorithmic delay matters. G.722 adds about 1.5 ms on encode. Opus adds 5 ms at its smallest 2.5 ms frame and 22.5 ms on a 20 ms frame in low-delay mode; its default mode carries a further 4 ms of look-ahead. Once you add a jitter buffer (40-100 ms is typical), that roughly 20 ms saving is noise.
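To put the codec delay in context, the arithmetic can be sketched as below. This is an illustrative calculation only: it reuses the delay figures quoted above, the function name `codec_share` is made up for this example, and packetization time, network transit, and model inference are deliberately left out.

```python
G722_DELAY_MS = 1.5   # figure quoted in the text
OPUS_DELAY_MS = 22.5  # figure quoted in the text for a 20 ms frame


def codec_share(codec_delay_ms: float, jitter_buffer_ms: float) -> float:
    """Fraction of (codec delay + jitter buffer) that the codec contributes."""
    return codec_delay_ms / (codec_delay_ms + jitter_buffer_ms)


# Difference in algorithmic delay between the two codecs.
saving_ms = OPUS_DELAY_MS - G722_DELAY_MS

for jitter_ms in (40, 60, 100):  # spans the typical range quoted in the text
    g722_total = G722_DELAY_MS + jitter_ms
    opus_total = OPUS_DELAY_MS + jitter_ms
    print(
        f"jitter buffer {jitter_ms} ms: G.722 {g722_total:.1f} ms, "
        f"Opus {opus_total:.1f} ms, Opus codec share "
        f"{codec_share(OPUS_DELAY_MS, jitter_ms):.0%}"
    )
```

The wider the jitter buffer, the smaller the codec's share of the total, which is why buffer tuning usually moves end-to-end latency more than codec choice does.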

For ASR accuracy, the picture is more interesting. G.722 delivers a fixed wideband signal that ASR engines have decades of training on. Opus at 32 kbps on a clean network is comparable; at 5% packet loss Opus's inband FEC pulls ahead because lost frames get reconstructed. At 15% packet loss Opus pulls way ahead.

# SDP offer with both codecs, Opus preferred
m=audio 49170 RTP/AVP 111 9 0
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000

Note that the G722/8000 entry is a historical RTP quirk documented in RFC 3551 (section 4.5.2): G.722 actually samples at 16 kHz, but its RTP clock rate was registered as 8000 and is kept for backward compatibility.
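A minimal sketch of how a media bridge might read the preference order out of that offer and correct for the G.722 clock quirk. The helper names (`codec_preferences`, `timestamp_step`) are hypothetical, the parser only handles a single audio section, and it skips payload types that have no `a=rtpmap` line, so treat it as an illustration rather than a full SDP parser.

```python
import re

SDP_OFFER = """\
m=audio 49170 RTP/AVP 111 9 0
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
"""

RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)")


def codec_preferences(sdp: str) -> list[tuple[int, str, int, int]]:
    """Return (payload_type, name, rtp_clock_hz, audio_sample_hz) in offer order."""
    order: list[int] = []
    maps: dict[int, tuple[str, int]] = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio"):
            # Payload types follow "m=audio <port> <proto>", most preferred first.
            order = [int(pt) for pt in line.split()[3:]]
        match = RTPMAP.match(line)
        if match:
            maps[int(match.group(1))] = (match.group(2), int(match.group(3)))
    prefs = []
    for pt in order:
        if pt not in maps:
            continue  # simplification: ignore payload types without an rtpmap line
        name, clock = maps[pt]
        # G.722 advertises an 8000 Hz RTP clock but carries 16 kHz audio.
        sample_hz = 16000 if name.upper() == "G722" else clock
        prefs.append((pt, name, clock, sample_hz))
    return prefs


def timestamp_step(rtp_clock_hz: int, ptime_ms: int) -> int:
    """RTP timestamp increment per packet for a given packetization time."""
    return rtp_clock_hz * ptime_ms // 1000


for pt, name, clock, sample_hz in codec_preferences(SDP_OFFER):
    print(pt, name, clock, sample_hz, timestamp_step(clock, 20))
```

The practical consequence is that a 20 ms G.722 packet advances the RTP timestamp by 160 even though it carries 320 audio samples, so resamplers should key off the audio sample rate, not the RTP clock.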

CallSphere implementation

CallSphere uses Twilio across all six verticals. For inbound and outbound PSTN we negotiate PCMU as the primary codec because that is what Twilio's standard SIP trunk supports; on Twilio Voice SDK browser calls and Conversation Relay the path is Opus end-to-end. The FastAPI :8084 Healthcare bridge upsamples to 16 kHz before forwarding to OpenAI Realtime. Sales Calling AI runs 5 concurrent outbound calls and is sensitive to packet loss on cell phone destinations, so we tune the jitter buffer to 60 ms for that product. After-Hours AI places a simultaneous call and SMS to on-call staff with a 120-second timeout, where reliability matters more than codec fidelity. With 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2, $149/$499/$1499 pricing, and a 14-day trial, codec choice cascades through every product.

Implementation steps

  1. Map your call paths: PSTN inbound, PSTN outbound, browser-to-AI, browser-to-PSTN. Each leg negotiates separately.
  2. For PSTN legs in 2026, accept PCMU and offer G.722 if your trunk lists it; do not assume Opus is available.
  3. For browser legs always use Opus at 32-64 kbps with inband FEC enabled.
  4. Pin RTP payload types in your SDP answer; some SBCs renegotiate mid-call and that breaks AI streaming.
  5. Measure packet loss in your CDR. If sustained loss is >2%, prefer Opus where available.
  6. Run an ASR A/B test: same calls transcribed via G.722 path and Opus path, compare WER on names and digits.
  7. Avoid letting the SIP gateway transcode mid-call; either pick a codec the AI bridge can accept natively or transcode once at the edge.
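Steps 2, 3, and 5 above amount to a small per-leg policy, which can be sketched as follows. This is one possible reading of those steps, not CallSphere's actual negotiation code: the `Leg` type, the `choose_codec` function, and the leg labels `"pstn"` and `"browser"` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    kind: str                 # "pstn" or "browser" (labels assumed for this sketch)
    offered: tuple[str, ...]  # codec names the far end listed in its SDP
    loss_pct: float = 0.0     # sustained packet loss measured from CDRs


def choose_codec(leg: Leg) -> str:
    """Pick a codec for one call leg following the steps above."""
    offered = {codec.upper() for codec in leg.offered}

    # Step 3: browser legs use Opus whenever it is offered.
    if leg.kind == "browser" and "OPUS" in offered:
        return "OPUS"

    # Step 5: sustained loss above 2% favours Opus where it is available.
    if leg.loss_pct > 2.0 and "OPUS" in offered:
        return "OPUS"

    # Step 2: otherwise take G.722 if the trunk lists it, else fall back to PCMU.
    for codec in ("G722", "PCMU"):
        if codec in offered:
            return codec

    raise ValueError(f"no supported codec in offer: {sorted(offered)}")


print(choose_codec(Leg("browser", ("opus", "G722", "PCMU"))))
print(choose_codec(Leg("pstn", ("G722", "PCMU"))))
print(choose_codec(Leg("pstn", ("opus", "G722", "PCMU"), loss_pct=3.5)))
```

Because each leg negotiates separately (step 1), the function is called once per leg; a browser-to-PSTN call may legitimately end up with Opus on one side and PCMU on the other, with a single transcode at the edge.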

FAQ

Should I force Opus on every leg if I can? On WebRTC legs, yes. On SIP legs, only if your trunk and your AI bridge both support it; otherwise you add an unnecessary transcode.

Does Opus's variable bitrate confuse the model? No. The model receives PCM samples after decode, not the codec frames. Bitrate variation only affects what the network carries.

Is G.722 still patent-encumbered? The original patents expired long ago. It is freely implementable and ubiquitous in open source (FreeSWITCH, Asterisk, PJSIP).

What about G.729 or AMR-WB? G.729 is narrowband (8 kHz sampling), which is worse for ASR than G.722 and offers no advantage for AI. AMR-WB is similar in quality to G.722 but more common on mobile interconnect than on SIP trunks.

How much CPU do these add? G.722 is essentially free (sub-1% of a vCPU per call). Opus at 48 kHz with FEC is 2-5% per call. That is negligible at our concurrency and only starts to matter once you cross a few thousand simultaneous calls.
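As a rough capacity check, the per-call percentages above can be turned into a vCPU count. The function name `vcpus_needed` and the 3,000-call figure are assumptions chosen for illustration, and the result covers codec work only, not ASR, the model, or signaling.

```python
import math


def vcpus_needed(calls: int, cpu_pct_per_call: float) -> int:
    """Whole vCPUs consumed by codec work alone at a given concurrency."""
    return math.ceil(calls * cpu_pct_per_call / 100)


CALLS = 3000  # assumed concurrency, "a few thousand simultaneous calls"

print("G.722 at 1% per call:", vcpus_needed(CALLS, 1.0))
print("Opus at 2% per call:", vcpus_needed(CALLS, 2.0))
print("Opus at 5% per call:", vcpus_needed(CALLS, 5.0))
```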
