By Sagar Shankaran, Founder of CallSphere
MOS 4.3+ is the band where AI voice feels human. Drop below 3.6 and conversations break. Here is how to measure, improve, and alert on MOS in production AI voice using G.711, Opus, and the underlying packet loss / jitter / latency math.
Key takeaways
Mean Opinion Score is the only call quality metric that matters in production AI voice in 2026. The ITU-T scale runs from 1 to 5; the target is 4.3 or higher, and conversations start to break around 3.6. Objective MOS takes packet loss, jitter, one-way latency, and codec choice and converts them into a perceptual score via the E-Model. AI voice deployments that sustain MOS below 4.0 will see customer satisfaction drop sharply, even if every other metric looks green.
MOS was standardized by the ITU-T (Recommendation P.800) in the 1990s as a subjective listening test: panels of listeners rate audio samples on a 1-to-5 scale, and the average rating is the Mean Opinion Score. The objective MOS used in VoIP is computed with the E-Model (ITU-T G.107), which maps network metrics (latency, jitter, packet loss) plus codec choice into an R-factor and then into a MOS prediction.
The bands matter. MOS 4.3 to 5.0 is "excellent": indistinguishable from in-person. MOS 4.0 to 4.3 is "good": what a typical PSTN call sounds like. MOS 3.6 to 4.0 is "fair": users notice but tolerate it. MOS below 3.6 is "poor": users start to abandon. AI voice has a tighter floor because LLM-generated speech already risks sounding robotic; combine that with a network delivering MOS 3.5 and conversations break.
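Turning those bands into a label is a small lookup. The sketch below is a hypothetical helper (`mos_band` is an illustrative name, not a CallSphere API). It assumes each band's lower edge is inclusive, since the text does not say which side a boundary value such as 4.0 falls on, and it treats everything from 4.0 up to 4.3 as "good".

```python
def mos_band(mos: float) -> str:
    """Map a MOS value to the article's quality bands (lower edges inclusive, an assumption)."""
    if not 1.0 <= mos <= 5.0:
        raise ValueError(f"MOS must be on the 1-5 scale, got {mos}")
    if mos >= 4.3:
        return "excellent"
    if mos >= 4.0:
        return "good"
    if mos >= 3.6:
        return "fair"
    return "poor"


for score in (4.41, 4.1, 3.8, 3.4):
    print(score, mos_band(score))
```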
The dominant variables in 2026 are codec choice (G.711 caps at MOS 4.4 best case; Opus can hit 4.5+ at higher bitrates), one-way latency (target under 150 ms; degrades after 200 ms), packet loss (target under 1 percent; noticeable above 3 percent), and jitter (target under 30 ms; noticeable above 50 ms).
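The three network targets can be checked per call with a sketch like the one below. `CallStats` and `grade` are hypothetical names; the limits are the ones quoted above, with "degrades after 200 ms" treated as latency's "noticeable" line. Codec is left out because it is a configuration choice rather than a per-call measurement.

```python
from dataclasses import dataclass


@dataclass
class CallStats:
    one_way_latency_ms: float
    jitter_ms: float
    packet_loss_pct: float


# (target, noticeable) limits per metric, copied from the paragraph above.
THRESHOLDS = {
    "one_way_latency_ms": (150.0, 200.0),
    "jitter_ms": (30.0, 50.0),
    "packet_loss_pct": (1.0, 3.0),
}


def grade(stats: CallStats) -> dict:
    """Label each metric 'ok' (under target), 'watch' (at or over target) or 'bad' (over the noticeable limit)."""
    labels = {}
    for name, (target, noticeable) in THRESHOLDS.items():
        value = getattr(stats, name)
        if value > noticeable:
            labels[name] = "bad"
        elif value >= target:
            labels[name] = "watch"
        else:
            labels[name] = "ok"
    return labels


print(grade(CallStats(one_way_latency_ms=120, jitter_ms=45, packet_loss_pct=3.5)))
# {'one_way_latency_ms': 'ok', 'jitter_ms': 'watch', 'packet_loss_pct': 'bad'}
```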
flowchart TD
A[RTP packets in / out] --> B[Per-call telemetry collector]
B --> C[Compute one-way latency]
B --> D[Compute jitter]
B --> E[Compute packet loss]
C --> F[E-Model R-factor]
D --> F
E --> F
F --> G[Predicted MOS]
G --> H{MOS < 4.0?}
H -->|Yes| I[Alert + investigate]
H -->|No| J[Log and continue]
I --> K[Codec / network / transcoder root cause]
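The collector's jitter and loss steps usually mean the RTP receiver statistics defined in RFC 3550: interarrival jitter is a running estimate updated as J += (|D| - J) / 16, where D is the change in transit time between consecutive packets, and loss is expected packets (highest sequence number minus the first, plus one) minus packets received. This is a minimal sketch, not CallSphere's collector. It assumes sequence numbers and timestamps are already unwrapped (RTP's 16-bit sequence and 32-bit timestamp wraparound is not handled), and `RtpReceiverStats` is a hypothetical name. One-way latency is deliberately absent: it cannot be read from RTP alone and is commonly approximated as half the round-trip time derived from RTCP receiver reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtpReceiverStats:
    """Per-stream receiver stats after RFC 3550 (jitter: section 6.4.1, loss: appendix A.3)."""
    clock_rate: int = 8000              # RTP clock rate in Hz (8000 for G.711; Opus uses 48000)
    jitter: float = 0.0                 # running interarrival jitter, in RTP timestamp units
    received: int = 0                   # packets actually received
    base_seq: Optional[int] = None      # first sequence number seen
    max_seq: int = 0                    # highest sequence number seen
    prev_transit: Optional[float] = None

    def on_packet(self, ext_seq: int, rtp_ts: int, arrival_s: float) -> None:
        """Update stats for one packet, in arrival order. ext_seq and rtp_ts are assumed unwrapped."""
        # Transit = arrival time (converted to RTP clock units) minus the sender timestamp.
        # The unknown sender/receiver clock offset cancels in the difference below.
        transit = arrival_s * self.clock_rate - rtp_ts
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)
            self.jitter += (d - self.jitter) / 16.0   # RFC 3550 smoothing with gain 1/16
        self.prev_transit = transit

        if self.base_seq is None:
            self.base_seq = ext_seq
        self.max_seq = max(self.max_seq, ext_seq)
        self.received += 1

    @property
    def jitter_ms(self) -> float:
        return self.jitter * 1000.0 / self.clock_rate

    @property
    def loss_pct(self) -> float:
        if self.base_seq is None:
            return 0.0
        expected = self.max_seq - self.base_seq + 1
        lost = max(expected - self.received, 0)   # duplicates can make this negative; clamp
        return 100.0 * lost / expected


# 20 ms G.711 frames (160 timestamp units each); two packets lost, every third packet 4 ms late.
stats = RtpReceiverStats(clock_rate=8000)
for seq in range(100):
    if seq in (17, 58):
        continue
    wobble = 0.004 if seq % 3 == 0 else 0.0
    stats.on_packet(seq, seq * 160, seq * 0.020 + wobble)
print(f"jitter {stats.jitter_ms:.2f} ms, loss {stats.loss_pct:.1f}%")
```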
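The E-Model is the canonical conversion. With the default G.107 parameters the R-factor reduces to R = 93.2 - Id - Ie,eff, where Id is the delay impairment (driven by one-way latency) and Ie,eff is the effective equipment impairment, which combines codec distortion with packet loss. Jitter has no term of its own: it shows up as added delay (the jitter buffer) and as added loss (packets that arrive too late to play out). R then maps to MOS as MOS = 1 + 0.035R + 0.000007R(R - 60)(100 - R) for 0 < R < 100, with MOS = 1 for R ≤ 0 and MOS = 4.5 for R ≥ 100. Most VoIP monitoring tools (Twilio Voice Insights, Obkio, Paessler) report a MOS estimate built from these same inputs.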
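A compact version of that conversion follows, as a sketch rather than a full G.107 implementation (note that G.107's cubic coefficient is 7 × 10^-6, i.e. 0.000007). Assumptions, all mine rather than the article's: default G.107 parameters so R starts at 93.2; the simplified delay impairment Id = 0.024d + 0.11(d - 177.3)H(d - 177.3) from Cole and Rosenbluth instead of the full G.107 delay model; random (non-bursty) loss; a jitter buffer that adds about twice the measured jitter to delay; and Ie = 0, Bpl = 25.1 for G.711 with packet loss concealment, values commonly cited from ITU-T G.113 Appendix I. Function names are illustrative.

```python
def delay_impairment(delay_ms: float) -> float:
    """Id, simplified (Cole & Rosenbluth): the slope steepens past 177.3 ms."""
    excess = max(delay_ms - 177.3, 0.0)
    return 0.024 * delay_ms + 0.11 * excess


def effective_equipment_impairment(loss_pct: float, ie: float = 0.0,
                                   bpl: float = 25.1, burst_r: float = 1.0) -> float:
    """Ie,eff (G.107): codec impairment Ie, raised by packet loss Ppl given in percent."""
    return ie + (95.0 - ie) * loss_pct / (loss_pct / burst_r + bpl)


def r_to_mos(r: float) -> float:
    """G.107 mapping from R-factor to predicted MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def predict_mos(latency_ms: float, jitter_ms: float, loss_pct: float,
                ie: float = 0.0, bpl: float = 25.1) -> float:
    # Assumption: the jitter buffer holds ~2x the measured jitter, which adds directly to delay.
    effective_delay_ms = latency_ms + 2.0 * jitter_ms
    r = 93.2 - delay_impairment(effective_delay_ms) - effective_equipment_impairment(loss_pct, ie, bpl)
    return r_to_mos(r)


print(round(predict_mos(0, 0, 0), 2))       # 4.41: the G.711 ceiling quoted above
print(round(predict_mos(80, 20, 1.0), 2))   # a plausible mid-quality path
```

Passing a different `ie`/`bpl` pair is how another codec would be modelled; wideband codecs such as Opus are properly scored on the extended wideband E-Model scale, which this sketch does not cover.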
CallSphere measures MOS on every call across our six verticals. The Twilio Media Streams bridge that feeds OpenAI Realtime captures RTP-level telemetry; our call_quality table (one of 115+ DB tables) stores per-call latency, jitter, packet loss, and computed MOS. Healthcare AI calls are tagged with the patient ID (HIPAA-compliant) so we can correlate quality with clinical outcomes. Sales Calling AI tags with the lead ID so we can correlate quality with conversion rate. The MOS dashboard (one of 90+ tools) surfaces per-tenant rolling averages and triggers alerts when MOS drops below 4.0 sustained for 5 minutes. Default codec is Opus at 16 kHz for the AI side, transcoded to G.711 at the PSTN edge. Scale ($1499/mo) tenants get a per-call MOS report in the admin console; Growth ($499/mo) tenants get aggregate weekly. The 22% affiliate program credits Scale upgrades driven by quality SLAs.
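The "below 4.0 sustained for 5 minutes" rule can be sketched as a small state machine, similar in spirit to a Prometheus alerting rule with a `for: 5m` clause. This is not CallSphere's code: `SustainedMosAlert` is a hypothetical class, the 60-second rolling-average window is an assumption (the text does not give one), and you would keep one instance per tenant.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class SustainedMosAlert:
    """Hypothetical per-tenant rule: fire when rolling-average MOS stays under a threshold for hold_s seconds."""

    def __init__(self, threshold: float = 4.0, window_s: float = 60.0, hold_s: float = 300.0):
        self.threshold = threshold
        self.window_s = window_s        # rolling-average window (assumed; not stated in the text)
        self.hold_s = hold_s            # "sustained for 5 minutes"
        self.samples: Deque[Tuple[float, float]] = deque()   # (timestamp_s, mos)
        self.below_since: Optional[float] = None

    def observe(self, ts: float, mos: float) -> bool:
        """Record one per-call MOS sample; return True while the alert should be firing."""
        self.samples.append((ts, mos))
        # Drop samples that have aged out of the rolling window (the newest always stays).
        while self.samples[0][0] < ts - self.window_s:
            self.samples.popleft()
        rolling = sum(m for _, m in self.samples) / len(self.samples)

        if rolling >= self.threshold:
            self.below_since = None     # recovered: reset the hold timer
            return False
        if self.below_since is None:
            self.below_since = ts       # first moment under the threshold
        return ts - self.below_since >= self.hold_s


alerts = {"tenant-a": SustainedMosAlert()}   # one instance per tenant
for t in range(0, 301, 30):
    firing = alerts["tenant-a"].observe(t, 3.8)
print(firing)  # True: 300 s continuously below 4.0
```

Requiring the condition to hold for the whole period, rather than alerting on a single bad sample, keeps one noisy call from paging anyone.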
What is a "good" MOS for AI voice? 4.0 minimum sustained; 4.3 or higher ideal. Below 4.0 the AI's already-synthetic speech starts to feel robotic; below 3.6 conversations break.
Does Opus beat G.711 for AI voice? Yes, when Opus runs end to end. Opus at 16 kHz can hit MOS 4.5+; G.711 caps at 4.4. The catch is that the PSTN side is typically G.711-only, so you transcode at the edge and lose some of the gain.
How much does latency hurt MOS? Significantly above 150 ms one-way. The E-Model penalizes latency steeply after 175 ms because it adds conversational delay that listeners notice.
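A worked example makes the knee visible. It assumes the simplified delay impairment from Cole and Rosenbluth (an approximation of the G.107 delay terms, where H is the unit step and d is one-way delay in ms) and an otherwise ideal path (G.711, no loss), so R starts at 93.2:

```latex
\begin{aligned}
I_d(d) &= 0.024\,d + 0.11\,(d - 177.3)\,H(d - 177.3) \\
d = 150\ \text{ms}: \quad I_d &= 3.6, \quad R = 93.2 - 3.6 = 89.6, \quad \text{MOS} \approx 4.33 \\
d = 250\ \text{ms}: \quad I_d &\approx 6.0 + 8.0 = 14.0, \quad R \approx 79.2, \quad \text{MOS} \approx 3.99
\end{aligned}
```

The extra 100 ms costs about a third of a MOS point and drops an otherwise perfect call just below the 4.0 line.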
Can I improve MOS just by buying better internet? Sometimes. If packet loss is the dominant variable, yes. If it is codec or one-way latency due to physical distance, no.
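A rough illustration of why loss responds to a better link while distance does not uses the G.107 effective equipment impairment. The values are assumptions: G.711 with packet loss concealment (Ie = 0, Bpl = 25.1, commonly cited from ITU-T G.113 Appendix I), random loss (BurstR = 1), and no delay impairment:

```latex
\begin{aligned}
I_{e,\text{eff}} &= I_e + (95 - I_e)\,\frac{P_{pl}}{P_{pl}/\mathit{BurstR} + B_{pl}} \\
P_{pl} = 3\%: \quad I_{e,\text{eff}} &= \frac{95 \cdot 3}{28.1} \approx 10.1, \quad R \approx 83.1, \quad \text{MOS} \approx 4.13 \\
P_{pl} = 1\%: \quad I_{e,\text{eff}} &= \frac{95 \cdot 1}{26.1} \approx 3.6, \quad R \approx 89.6, \quad \text{MOS} \approx 4.33
\end{aligned}
```

Cutting loss from 3 to 1 percent recovers about 0.2 MOS. Latency caused by physical distance sits in the separate delay term Id, which no amount of bandwidth reduces.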
Does CallSphere expose MOS metrics? Yes, on Growth and Scale plans. Per-call detail is on Scale; aggregate weekly is on Growth. Starter shows session-level "good/fair/poor" labels only.
Start a 14-day trial with full MOS visibility, browse pricing for per-call analytics on Scale, or book a demo. Partners earn 22% via the affiliate program; send SLA questions through the contact page.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.