By Sagar Shankaran, Founder of CallSphere
PCM16 at 24kHz burns 384 kbps. Opus at 32 kbps delivers indistinguishable quality. The bandwidth math says use Opus. The vendor API math sometimes says use PCM16. Here is when each wins.
flowchart LR
Twilio["Twilio Media Streams"] -- "WS · μlaw 8kHz" --> Bridge["FastAPI Bridge :8084"]
Bridge -- "PCM16 24kHz" --> OAI["OpenAI Realtime"]
OAI --> Bridge
Bridge --> Twilio
Bridge --> Logs[(structured logs · OTel)]

In 2026, voice agent infrastructure has two dominant audio formats: PCM16 (uncompressed linear PCM, 16-bit) and Opus (the modern WebRTC codec). Vendors handle them inconsistently: OpenAI Realtime accepts both but bills the same per-token rate; ElevenLabs prefers Opus for streaming TTS; Twilio Media Streams sends mu-law 8kHz by default.
The bandwidth gap is enormous (12× between 384 kbps PCM16 and 32 kbps Opus), but the cost picture is more complicated than bytes on the wire. Egress, codec CPU, latency, and quality all interact.
PCM16 at 24kHz (typical for OpenAI Realtime):
Opus at 24kHz wideband, 32 kbps:
mu-law 8kHz (PSTN/Twilio default):
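The raw numbers behind these three formats can be checked with a few lines of arithmetic. The sketch below assumes mono audio and counts payload only; transport overhead (RTP or WebSocket framing, base64 encoding) is not modeled, and the function and dictionary names are illustrative rather than taken from any vendor SDK.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed (or fixed-width companded) audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def megabytes_per_minute(bitrate_kbps):
    """Payload volume for one direction of audio, in MB per minute."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


# Assumption: mono streams. Opus is a configured target bitrate, not derived
# from sample width, so it is entered directly.
FORMATS_KBPS = {
    "pcm16_24khz": pcm_bitrate_kbps(24_000, 16),  # 16-bit linear PCM
    "opus_24khz": 32.0,                           # target bitrate from the article
    "mulaw_8khz": pcm_bitrate_kbps(8_000, 8),     # G.711 mu-law, 8 bits per sample
}

if __name__ == "__main__":
    for name, kbps in FORMATS_KBPS.items():
        print(f"{name}: {kbps:.0f} kbps, {megabytes_per_minute(kbps):.2f} MB/min")
```

Under these assumptions PCM16 at 24kHz works out to 384 kbps and mu-law at 8kHz to 64 kbps, which puts Opus at 32 kbps at half the PSTN bitrate and one twelfth of PCM16.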
Assume 10,000 concurrent voice agents, an average 5-minute call, and a 60/40 caller-to-agent talk split.
PCM16 24kHz both directions:
Opus 24kHz both directions:
mu-law 8kHz:
So at 10k concurrent, Opus saves $400–$1,500/month on egress vs PCM16 with no audible quality penalty for speech.
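To rerun this comparison with your own traffic and pricing, a rough model like the following may help. It is a sketch under stated assumptions: both directions stream continuously for the whole call (the 60/40 talk split and any silence suppression are ignored), and `calls_per_month` and `usd_per_gb` are hypothetical inputs you supply, since the article does not state them.

```python
def call_megabytes(bitrate_kbps, call_seconds=300, directions=2):
    """Payload volume of one call in MB, assuming continuous streaming."""
    bytes_total = bitrate_kbps * 1000 / 8 * call_seconds * directions
    return bytes_total / 1_000_000


def monthly_egress_cost(bitrate_kbps, calls_per_month, usd_per_gb,
                        call_seconds=300, egress_directions=1):
    """Monthly egress cost in USD for a given codec bitrate.

    calls_per_month and usd_per_gb are hypothetical inputs; cloud providers
    usually bill only outbound traffic, hence egress_directions=1 by default.
    """
    mb_per_call = call_megabytes(bitrate_kbps, call_seconds, egress_directions)
    gigabytes = mb_per_call * calls_per_month / 1000
    return gigabytes * usd_per_gb


if __name__ == "__main__":
    for name, kbps in [("PCM16 24kHz", 384), ("Opus 24kHz", 32), ("mu-law 8kHz", 64)]:
        print(f"{name}: {call_megabytes(kbps):.1f} MB per 5-minute call")
```

Because cost scales linearly with bitrate in this model, the PCM16-to-Opus ratio stays at 12× regardless of the price or call volume you plug in; only the absolute dollar figure changes.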
CallSphere uses a per-segment codec policy across the 6 verticals — 37 agents, 90+ tools, 115+ DB tables:
The egress savings on the public-internet hops help fund the lower pricing tiers ($149 / $499 / $1499); bandwidth is a real line item at our scale. The 14-day no-card trial lets you A/B Opus vs PCM16 on the demo cards.
Is Opus quality really indistinguishable from PCM at 32 kbps? Yes for voice — listening tests show Opus 32 kbps wideband is transparent for speech. Music needs more.
Why does OpenAI Realtime want PCM16 24kHz? Direct tensor input, no decode CPU on the inference path. Simplifies their pipeline.
Is mu-law dead? For AI input, yes. Use it only for PSTN bridges and transcode once.
What about Opus 48kHz or 16kHz? 24kHz is the sweet spot for voice AI. 48kHz is overkill for speech. 16kHz is too narrow for natural-sounding TTS.
Does codec choice affect STT accuracy? Slightly. PCM16 24kHz wins by 0.5–1.5 percentage points of WER on hard accents vs Opus at 32 kbps. The gap closes at 64 kbps Opus.
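The advice to transcode mu-law once at the PSTN bridge matches the flow in the diagram, where 8kHz mu-law arrives from Twilio and PCM16 at 24kHz goes to the model. A minimal sketch of that single hop might look like the code below. It uses the standard G.711 mu-law expansion and a naive 3× linear-interpolation upsample; a production bridge would more likely use a proper resampler with low-pass filtering, and the function names here are hypothetical, not CallSphere's or any vendor's API.

```python
import struct

ULAW_BIAS = 0x84  # bias constant from the G.711 mu-law reference decoder


def ulaw_decode(byte):
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF
    magnitude = (((u & 0x0F) << 3) + ULAW_BIAS) << ((u & 0x70) >> 4)
    return ULAW_BIAS - magnitude if u & 0x80 else magnitude - ULAW_BIAS


def upsample_3x(samples):
    """Naive 8kHz -> 24kHz upsample via linear interpolation (no filtering)."""
    out = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        step = nxt - current
        out.extend((current, current + step // 3, current + 2 * step // 3))
    return out


def ulaw_8k_to_pcm16_24k(ulaw_bytes):
    """Transcode a mu-law 8kHz frame to little-endian PCM16 at 24kHz."""
    linear = [ulaw_decode(b) for b in ulaw_bytes]
    upsampled = upsample_3x(linear)
    return struct.pack(f"<{len(upsampled)}h", *upsampled)


if __name__ == "__main__":
    frame = bytes([0xFF] * 160)  # 20 ms of mu-law silence at 8kHz
    print(len(ulaw_8k_to_pcm16_24k(frame)), "bytes of PCM16 24kHz")
```

Doing this once at the bridge, rather than at every hop, avoids stacking quantization and resampling error on audio that is already band-limited by the PSTN.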
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.