By Sagar Shankaran, Founder of CallSphere
RTCStatsReport is the only telemetry source that reflects what the user actually heard. Here is the 2026 mapping from getStats fields to AI voice SLOs.
Key takeaways
Server logs lie about audio quality. The only ground truth is what the receiver's jitter buffer experienced — and that is exclusively in `RTCStatsReport`. For AI voice in 2026, getStats is the SLO data plane.
Old-school voice apps measured MOS at the SBC. AI voice agents do not have an SBC at the user edge — the browser is the edge. A 2% packet-loss spike in the user's coffee-shop Wi-Fi never appears in your server traces. It only appears in `getStats` on the browser, and only the values that come from the receiver-side report tell you what actually played.
The W3C Statistics API (webrtc-stats) defines stable identifiers for every relevant metric: `inbound-rtp`, `outbound-rtp`, `remote-inbound-rtp`, `candidate-pair`, `media-source`, and `media-playout` (the new one, important for jitter-buffer health). The 2026 update added `ecn-ce-marks-received` for L4S detection and finalized `audioLevel` semantics across browsers.
```mermaid
flowchart LR
  Browser -- getStats every 2s --> Collector
  Collector -- diff over time --> Metrics[(Prometheus / Datadog)]
  Metrics --> SLO[Voice SLO dashboard]
  Metrics --> Alert["PagerDuty if p99 jitter > 30ms"]
```
You poll `pc.getStats()` once every 2 seconds, diff the cumulative counters against the previous sample, and emit gauge metrics. Sub-second polling does not help: values derived from RTCP, such as round-trip time, update at most once per report (every 1–5 s), and counter deltas over shorter windows are too noisy to alert on.
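The diffing step can be sketched as a small pure function. This is a minimal sketch, not the collector described in this post: `InboundSample`, `IntervalLoss`, and `intervalLoss` are hypothetical names, and only the two field names (`packetsReceived`, `packetsLost`) come from the `inbound-rtp` report. It assumes you want to skip an interval when a counter restarts rather than emit a misleading value.

```typescript
// Cumulative counters copied from an inbound-rtp report.
interface InboundSample {
  packetsReceived: number;
  packetsLost: number;
}

// Per-interval values derived from two consecutive samples.
interface IntervalLoss {
  received: number;
  lost: number;
  lossRate: number;
}

// Hypothetical helper: converts two cumulative samples into interval values.
function intervalLoss(prev: InboundSample, curr: InboundSample): IntervalLoss | null {
  const received = curr.packetsReceived - prev.packetsReceived;
  // A negative delta means the counter restarted, so skip this interval.
  if (received < 0) return null;
  // packetsLost can decrease when late or duplicate packets arrive; clamp at zero.
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost);
  const expected = received + lost;
  return { received, lost, lossRate: expected === 0 ? 0 : lost / expected };
}
```

Returning `null` on a restart keeps a single bad interval out of the p99 calculations instead of recording it as a loss spike.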
CallSphere ships getStats collection in the browser SDK that powers all six verticals (real estate, healthcare, behavioral health, legal, salon, insurance). Polling at 2 s, we ship 14 fields per call to a Prometheus pushgateway routed through our Pion Go gateway 1.23 over NATS to the 6-container pod (CRM, MLS, calendar, SMS, audit, transcript).
The single most useful derived metric is `concealedSamples / totalSamplesReceived` — anything above 1.5% means the AI agent's response was audibly degraded, regardless of how clean the model output was. Across 37 agents, 90+ tools, 115+ database tables we alert on p99 RTT, p99 jitter, and concealment ratio. Real Estate (OneRoof, /industries/real-estate) sees the lowest concealment because of the noise-suppression worker described in the Insertable Streams post. SOC 2 + HIPAA logs only carry derived numbers — never raw audio level. Pricing $149/$499/$1499; affiliates 22% — see /affiliate.
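```ts
// Previous inbound-rtp samples, keyed by stats id, used to diff cumulative counters.
// `pc` (the RTCPeerConnection) and `emit` (the metrics sink) are assumed to exist.
const prev = new Map<string, any>();

setInterval(async () => {
  const stats = await pc.getStats();
  for (const [id, report] of stats) {
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      const last = prev.get(id);
      if (last) {
        const dPackets = report.packetsReceived - last.packetsReceived;
        // packetsLost can decrease when late or duplicate packets arrive, so clamp at 0.
        const dLost = Math.max(0, report.packetsLost - last.packetsLost);
        emit("voice.inbound.lossRate", dLost / Math.max(1, dPackets + dLost));
        emit("voice.inbound.jitter", report.jitter); // seconds
        // concealedSamples and totalSamplesReceived live on inbound-rtp, not media-playout.
        const dConcealed = report.concealedSamples - last.concealedSamples;
        const dSamples = report.totalSamplesReceived - last.totalSamplesReceived;
        emit("voice.playout.concealed", dConcealed / Math.max(1, dSamples));
      }
      prev.set(id, report);
    }
    // `selected` is Firefox-only; nominated + "succeeded" approximates it elsewhere.
    if (
      report.type === "candidate-pair" &&
      (report.selected || (report.nominated && report.state === "succeeded"))
    ) {
      emit("voice.rtt", report.currentRoundTripTime); // seconds
      // Candidate ids are opaque strings, so look up the candidate reports and check their type.
      const local = stats.get(report.localCandidateId);
      const remote = stats.get(report.remoteCandidateId);
      const relayed = local?.candidateType === "relay" || remote?.candidateType === "relay";
      emit("voice.relay", relayed ? 1 : 0);
    }
  }
}, 2000);
```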
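Is concealedSamples in every browser? The spec defines `concealedSamples` and `totalSamplesReceived` on audio `inbound-rtp` reports, so read them there and guard against `undefined` in browsers that do not expose them yet. `media-playout` is in Chrome 110+, Safari 17+, Firefox 122+.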
Why not poll faster? Values derived from RTCP, such as round-trip time, change only when a new report arrives, every 1–5 seconds; polling faster gives little new data and noisier deltas.
What about MOS? No browser computes MOS for you. Compute it server-side from packet loss, jitter, and RTT using ITU-T G.107.
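A rough server-side estimate can be sketched as follows. The R-to-MOS mapping in `rFactorToMos` is the conversion formula from ITU-T G.107. The `estimateRFactor` function is an assumption: it uses a widely circulated simplification of the E-model (one-way latency approximated as RTT/2, jitter counted twice, 2.5 R points per percent of loss), not the full G.107 computation. All type and function names are hypothetical.

```typescript
interface NetworkSample {
  rttMs: number;       // round-trip time in milliseconds
  jitterMs: number;    // interarrival jitter in milliseconds
  lossPercent: number; // packet loss over the interval, 0 to 100
}

// Simplified R-factor heuristic (assumption, not the full E-model).
function estimateRFactor(s: NetworkSample): number {
  const effectiveLatency = s.rttMs / 2 + 2 * s.jitterMs + 10;
  const base =
    effectiveLatency < 160
      ? 93.2 - effectiveLatency / 40
      : 93.2 - (effectiveLatency - 120) / 10;
  return Math.min(100, Math.max(0, base - 2.5 * s.lossPercent));
}

// R-to-MOS conversion from ITU-T G.107, clamped to the 1 to 4.5 range.
function rFactorToMos(r: number): number {
  if (r <= 0) return 1;
  if (r >= 100) return 4.5;
  const mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6;
  return Math.max(1, mos);
}

// Convenience wrapper combining both steps.
function estimateMos(s: NetworkSample): number {
  return rFactorToMos(estimateRFactor(s));
}
```

Remember that `jitter` and `currentRoundTripTime` arrive from `getStats` in seconds, so multiply by 1000 before building a `NetworkSample`.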
Are stats privacy-sensitive? `audioLevel` can leak whether someone is speaking. Strip it from anything that hits a third-party APM.
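One way to enforce that rule is a redaction step between the collector and the exporter. This is a minimal sketch with hypothetical names (`StatsEntry`, `SENSITIVE_FIELDS`, `redactStats`). Blocking `audioLevel` follows the advice above; blocking `totalAudioEnergy` is an added assumption, on the grounds that an average audio level can be derived from it.

```typescript
type StatsEntry = Record<string, unknown>;

// Fields that can reveal whether someone is speaking.
// audioLevel comes from the text above; totalAudioEnergy is an assumption.
const SENSITIVE_FIELDS: string[] = ["audioLevel", "totalAudioEnergy"];

// Returns a copy of the entry without the blocked fields; the input is not mutated.
function redactStats(entry: StatsEntry, blocked: string[] = SENSITIVE_FIELDS): StatsEntry {
  const clean: StatsEntry = {};
  for (const key of Object.keys(entry)) {
    if (blocked.indexOf(key) === -1) {
      clean[key] = entry[key];
    }
  }
  return clean;
}
```

Applying the filter just before export keeps the raw values available for first-party analysis while keeping them out of the third-party payload.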
Is there a single field that summarizes call quality? No — combine RTT, jitter, and concealment. A weighted score works well.
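A weighted score along those lines might look like the sketch below. The jitter budget (30 ms) and concealment budget (1.5%) reuse the alert thresholds mentioned earlier in this post; the RTT budget and all three weights are hypothetical placeholders that you would tune against your own calls.

```typescript
interface QualityInputs {
  rttMs: number;
  jitterMs: number;
  concealmentRatio: number; // 0 to 1
}

// Jitter and concealment budgets come from the post; the RTT budget is hypothetical.
const BUDGETS: QualityInputs = { rttMs: 300, jitterMs: 30, concealmentRatio: 0.015 };
// Hypothetical weights; they sum to 1 so the score stays within 0 to 1.
const WEIGHTS: QualityInputs = { rttMs: 0.25, jitterMs: 0.25, concealmentRatio: 0.5 };

// Fraction of the budget consumed, clamped to the 0 to 1 range.
function penalty(value: number, budget: number): number {
  return Math.min(1, Math.max(0, value / budget));
}

// Returns 1 when every metric is at zero and 0 when every metric is at or past its budget.
function qualityScore(q: QualityInputs): number {
  const weighted =
    WEIGHTS.rttMs * penalty(q.rttMs, BUDGETS.rttMs) +
    WEIGHTS.jitterMs * penalty(q.jitterMs, BUDGETS.jitterMs) +
    WEIGHTS.concealmentRatio * penalty(q.concealmentRatio, BUDGETS.concealmentRatio);
  return 1 - weighted;
}
```

Concealment gets the largest weight here only because it is the metric closest to what the listener heard; that weighting is a judgment call, not a measured result.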
What is the cost of `getStats`? Sub-millisecond on modern hardware; safe to call every 2 s.
Does it work in workers? No. `RTCPeerConnection` is not exposed in workers, so `getStats` can only be called from the document's main thread. Pull the data out there, then ship it to a worker for analysis.
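The hand-off can be sketched as a snapshot step that copies each stats dictionary into a plain array before posting it. `StatsLike` and `snapshotStats` are hypothetical names; the only assumption about the input is a Map-style `forEach`, which `RTCStatsReport` provides.

```typescript
// Anything with a Map-style forEach works here, including RTCStatsReport.
interface StatsLike {
  forEach(callback: (value: any, key: string) => void): void;
}

// Hypothetical helper: shallow-copies every stats dictionary into a plain array
// so the result can be passed to postMessage.
function snapshotStats(report: StatsLike): Array<Record<string, unknown>> {
  const out: Array<Record<string, unknown>> = [];
  report.forEach((value) => {
    out.push({ ...value });
  });
  return out;
}

// Usage on the main thread (analysisWorker is a hypothetical Worker instance):
//   const snapshot = snapshotStats(await pc.getStats());
//   analysisWorker.postMessage(snapshot);
```

Copying on the main thread keeps the work there to a shallow clone per report, and the diffing and percentile math can then run in the worker.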
Can I correlate stats with model latency? Yes — emit `response.first_audio_delay` from the DataChannel side and join with `getStats` server-side.
Three rules from shipping getStats observability across 37 agents:
The most underrated metric is `media-source.totalAudioEnergy` divided by elapsed time — silent calls have a known energy-vs-time signature, and "user said nothing for 30 s" is a very specific failure mode that needs a dedicated alert.
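A dedicated silence alert can be sketched from two consecutive `media-source` samples. The level calculation follows the approach described in the W3C stats spec: the square root of the change in `totalAudioEnergy` divided by the change in `totalSamplesDuration` gives the average audio level over the interval. The class name, the `0.01` level threshold, and the default parameters are hypothetical; only the 30-second window comes from the text above.

```typescript
// Cumulative values copied from a media-source report.
interface AudioSourceSample {
  totalAudioEnergy: number;
  totalSamplesDuration: number; // seconds of audio seen so far
}

// Average audio level (0 to 1) between two samples.
function intervalAudioLevel(prev: AudioSourceSample, curr: AudioSourceSample): number {
  const dEnergy = curr.totalAudioEnergy - prev.totalAudioEnergy;
  const dDuration = curr.totalSamplesDuration - prev.totalSamplesDuration;
  if (dDuration <= 0 || dEnergy <= 0) return 0;
  return Math.sqrt(dEnergy / dDuration);
}

// Hypothetical detector: accumulates consecutive quiet audio time.
class SilenceDetector {
  private silentSeconds = 0;
  private readonly levelThreshold: number;
  private readonly alertAfterSeconds: number;

  constructor(levelThreshold: number = 0.01, alertAfterSeconds: number = 30) {
    this.levelThreshold = levelThreshold; // hypothetical default
    this.alertAfterSeconds = alertAfterSeconds; // the 30 s window from the text
  }

  // Feed one polling interval; returns true once silence has lasted long enough.
  update(prev: AudioSourceSample, curr: AudioSourceSample): boolean {
    const dDuration = Math.max(0, curr.totalSamplesDuration - prev.totalSamplesDuration);
    if (intervalAudioLevel(prev, curr) < this.levelThreshold) {
      this.silentSeconds += dDuration;
    } else {
      this.silentSeconds = 0; // any audible interval resets the window
    }
    return this.silentSeconds >= this.alertAfterSeconds;
  }
}
```

Note that this sketch only counts time for which the source delivered samples, so a microphone that produces no samples at all would need a separate check.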
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.