AI Infrastructure

Reading RTCStatsReport for AI Voice Agents: 2026 Field Guide

RTCStatsReport is the only telemetry source that reflects what the user actually heard. Here is the 2026 mapping from getStats fields to AI voice SLOs.

Server logs lie about audio quality. The only ground truth is what the receiver's jitter buffer experienced — and that is exclusively in `RTCStatsReport`. For AI voice in 2026, getStats is the SLO data plane.

Why getStats matters more for AI voice

Old-school voice apps measured MOS at the SBC. AI voice agents do not have an SBC at the user edge — the browser is the edge. A 2% packet-loss spike in the user's coffee-shop Wi-Fi never appears in your server traces. It only appears in `getStats` on the browser, and only the values that come from the receiver-side report tell you what actually played.

The W3C Statistics API (webrtc-stats) defines stable identifiers for every relevant metric: `inbound-rtp`, `outbound-rtp`, `remote-inbound-rtp`, `candidate-pair`, `media-source`, and `media-playout` (the new one, important for jitter-buffer health). The 2026 update added `ecn-ce-marks-received` for L4S detection and finalized `audioLevel` semantics across browsers.

Architecture pattern

```mermaid
flowchart LR
  Browser -- getStats every 2s --> Collector
  Collector -- diff over time --> Metrics[(Prometheus / Datadog)]
  Metrics --> SLO[Voice SLO dashboard]
  Metrics --> Alert[PagerDuty if p99 jitter > 30ms]
```

You poll `pc.getStats()` once every 2 seconds, diff the cumulative counters against the previous sample, and emit gauge metrics. Sub-second polling does not help — the underlying counters update at most once per RTCP report (every 1–5 s).

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Key fields to track

  • inbound-rtp: `packetsLost`, `jitter`, `packetsReceived`, `bytesReceived`, `audioLevel`.
  • media-playout: `totalSamplesReceived`, `concealedSamples`, `silentConcealedSamples`. Concealed samples are the smoking gun for "did the user hear glitches?"
  • remote-inbound-rtp: `roundTripTime`, `fractionLost`. Tells you what the agent sees coming from the user.
  • candidate-pair (selected): `currentRoundTripTime`, `bytesSent`, `bytesReceived`, `localCandidateId` / `remoteCandidateId` (use to detect TURN relay).
  • media-source: `audioLevel`, `totalAudioEnergy`. Use sparingly — they leak speaker activity.
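As a sketch of what that subset looks like in code, here are hand-narrowed TypeScript views of two of those dictionaries plus a picker. The names (`InboundAudio`, `pickInboundAudio`) are ours, not from the spec, and plain objects stand in for `RTCStats` so the helper runs outside a browser:

```typescript
// Narrowed views of the webrtc-stats dictionaries listed above.
interface InboundAudio {
  packetsLost: number;
  jitter: number;            // seconds (RFC 3550 interarrival jitter)
  packetsReceived: number;
  bytesReceived: number;
}

interface MediaPlayout {
  totalSamplesReceived: number;
  concealedSamples: number;
  silentConcealedSamples: number;
}

// Pick the audio inbound-rtp report out of an iterable of reports,
// as you would get from iterating pc.getStats() values.
function pickInboundAudio(reports: Iterable<any>): InboundAudio | undefined {
  for (const r of reports) {
    if (r.type === "inbound-rtp" && r.kind === "audio") {
      return r as InboundAudio;
    }
  }
  return undefined;
}
```

The narrowing matters in practice: shipping only the fields you consume keeps the per-call payload small and avoids accidentally forwarding privacy-sensitive members.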

CallSphere implementation

CallSphere ships getStats collection in the browser SDK that powers all six verticals (real estate, healthcare, behavioral health, legal, salon, insurance). Polling at 2 s, we ship 14 fields per call to a Prometheus pushgateway, routed through our Pion gateway (Go 1.23) over NATS to the 6-container pod (CRM, MLS, calendar, SMS, audit, transcript).

The single most useful derived metric is `concealedSamples / totalSamplesReceived` — anything above 1.5% means the AI agent's response was audibly degraded, no matter how clean the model output was. Across 37 agents, 90+ tools, and 115+ database tables, we alert on p99 RTT, p99 jitter, and the concealment ratio. Real Estate (OneRoof, /industries/real-estate) sees the lowest concealment thanks to the noise-suppression worker described in the Insertable Streams post. SOC 2 and HIPAA audit logs carry only derived numbers — never raw audio levels. Pricing is $149/$499/$1499; affiliates earn 22% — see /affiliate.
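That bar can be written down as a tiny predicate (the helper name is ours; 1.5% is the threshold from the paragraph above):

```typescript
// Flag a sampling window as audibly degraded when the concealment ratio
// exceeds the 1.5% bar. Threshold taken from our SLO, not from any spec.
const CONCEALMENT_SLO = 0.015;

function isAudiblyDegraded(concealedSamples: number, totalSamplesReceived: number): boolean {
  const ratio = concealedSamples / Math.max(1, totalSamplesReceived);
  return ratio > CONCEALMENT_SLO;
}
```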

Code snippet

```ts
// Keep the previous sample per report id so cumulative counters can be diffed.
let prev = new Map<string, any>();

setInterval(async () => {
  const stats = await pc.getStats();
  for (const [id, report] of stats) {
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      const last = prev.get(id);
      if (last) {
        const dPackets = report.packetsReceived - last.packetsReceived;
        const dLost = report.packetsLost - last.packetsLost;
        const lossRate = dLost / Math.max(1, dPackets + dLost);
        emit("voice.inbound.lossRate", lossRate);
        emit("voice.inbound.jitter", report.jitter);
      }
      prev.set(id, report);
    }

    if (report.type === "media-playout") {
      emit(
        "voice.playout.concealed",
        report.concealedSamples / Math.max(1, report.totalSamplesReceived),
      );
    }

    // `selected` is Chromium-only; the spec-compliant check is
    // transport.selectedCandidatePairId === report.id.
    if (report.type === "candidate-pair" && report.selected) {
      emit("voice.rtt", report.currentRoundTripTime);
      // Candidate ids are opaque strings, so resolve the remote-candidate
      // report and inspect candidateType to detect a TURN relay.
      const remote = stats.get(report.remoteCandidateId);
      emit("voice.relay", remote?.candidateType === "relay" ? 1 : 0);
    }
  }
}, 2000);
```

Build steps

  1. Poll every 2 s; diff cumulative counters; never store absolute values.
  2. Tag every metric with `agentId`, `vertical`, `sessionId`, `browser`.
  3. Route raw values through a server you control before any third-party APM. Browser-emitted PHI in stats is a real concern.
  4. Build SLO dashboards on three numbers: RTT, jitter, concealment ratio.
  5. Alert on concealment > 1.5% sustained for 30 s. That is the user-noticed bar.
  6. Cross-reference `localCandidateId` / `remoteCandidateId` with the candidate-pair report to detect when a session is on TURN.
  7. Persist a 24-hour rolling sample for forensic playback; engineers will need it.
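Step 5's sustained-window condition is easy to get wrong with a naive threshold check. A sketch of the state machine, under assumed names (this is not a library API):

```typescript
// Fire only when a value stays above threshold for a sustained wall-clock
// window, e.g. concealment > 1.5% for 30 s.
class SustainedAlert {
  private since: number | null = null;

  constructor(
    private readonly threshold: number,
    private readonly holdMs: number,
  ) {}

  // Feed one sample; returns true once the value has been continuously
  // above threshold for at least holdMs.
  sample(value: number, nowMs: number): boolean {
    if (value <= this.threshold) {
      this.since = null; // dipped below: reset the window
      return false;
    }
    this.since ??= nowMs;
    return nowMs - this.since >= this.holdMs;
  }
}
```

Wired up as `new SustainedAlert(0.015, 30_000)` and fed every 2 s poll, this fires only after 15 consecutive bad samples, which matches the "user-noticed bar" in step 5.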

Common pitfalls

  • Polling `getStats` from the main thread — it is a fast call but during garbage collection it can spike to 30 ms. Schedule on `requestIdleCallback` if you can.
  • Treating absolute counters as gauges — `packetsLost` is monotonic. Always diff.
  • Ignoring `media-playout` — the smoking gun for user-perceived audio glitches lives only here.
  • Cross-tab leakage — multi-tab apps may have multiple PeerConnections. Tag stats per PC.
  • Privacy via `audioLevel` — do not ship raw `audioLevel` to a third-party APM; it leaks speaker activity timing.
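The monotonic-counter pitfall has a one-line guard worth adding. A hedged sketch; the reset heuristic is our assumption (an SSRC change restarts counters at zero):

```typescript
// Diff a cumulative counter, guarding against resets: if the counter went
// backwards (e.g. SSRC change restarted it at 0), treat the current value
// as the whole delta rather than emitting a huge negative number.
function counterDelta(current: number, previous: number): number {
  const d = current - previous;
  return d >= 0 ? d : current;
}
```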

FAQ

Is concealedSamples in every browser? Yes — `media-playout` is in Chrome 110+, Safari 17+, Firefox 122+.

Why not poll faster? RTCP only updates the underlying counters every 1–5 seconds; polling faster gives no new data.

What about MOS? No browser computes MOS for you. Compute it server-side from packet loss, jitter, and RTT using ITU-T G.107.
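A hedged sketch of that server-side computation, using the G.107 Annex B R-to-MOS mapping with a simplified R-factor. The `effectiveLatency` term (latency + 2 × jitter + 10 ms) is a common field approximation, not part of G.107 itself, so treat the output as a proxy, not a calibrated MOS:

```typescript
// Simplified E-model: start from R = 93.2, subtract delay and loss
// impairments, then map R to MOS via the G.107 Annex B polynomial.
function mosFromNetwork(latencyMs: number, jitterMs: number, lossPct: number): number {
  const effectiveLatency = latencyMs + 2 * jitterMs + 10;
  let r = effectiveLatency < 160
    ? 93.2 - effectiveLatency / 40
    : 93.2 - (effectiveLatency - 120) / 10;
  r -= 2.5 * lossPct;          // crude loss impairment
  r = Math.max(0, Math.min(100, r));
  return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r);
}
```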

Are stats privacy-sensitive? `audioLevel` can leak whether someone is speaking. Strip it from anything that hits a third-party APM.


Is there a single field that summarizes call quality? No — combine RTT, jitter, and concealment. A weighted score works well.

What is the cost of `getStats`? Sub-millisecond on modern hardware; safe to call every 2 s.

Does it work in workers? No — `RTCPeerConnection` (and with it `getStats`) is exposed only on the main thread. Pull the data out there, then ship it to a worker for analysis.

Can I correlate stats with model latency? Yes — emit `response.first_audio_delay` from the DataChannel side and join with `getStats` server-side.
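A sketch of that server-side join, with hypothetical names (`StatsSample`, `ModelEvent` are ours): match each model-latency event to the nearest-in-time getStats sample for the same session.

```typescript
interface StatsSample { sessionId: string; tMs: number; rttMs: number }
interface ModelEvent { sessionId: string; tMs: number; firstAudioDelayMs: number }

// Nearest-neighbour join on timestamp within a session. Linear scan is fine
// at a 2 s sampling cadence; sort + binary search if volumes grow.
function nearestSample(ev: ModelEvent, samples: StatsSample[]): StatsSample | undefined {
  let best: StatsSample | undefined;
  for (const s of samples) {
    if (s.sessionId !== ev.sessionId) continue;
    if (!best || Math.abs(s.tMs - ev.tMs) < Math.abs(best.tMs - ev.tMs)) best = s;
  }
  return best;
}
```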

Production playbook for AI voice teams in 2026

Three rules from shipping getStats observability across 37 agents:

  1. Compute MOS server-side. ITU-T G.107 plus packet loss, jitter, and RTT gives you a reliable per-call MOS proxy. Do not trust client-side approximations.
  2. Tag the candidate type. `relay` vs `srflx` vs `host` is the single best predictor of call quality. Tag every metric with it.
  3. Sample, do not aggregate, on the client. Send raw per-second samples; aggregate in your TSDB. Aggregation on the client throws away the noise pattern that points at the actual problem.

The most underrated metric is `media-source.totalAudioEnergy` divided by elapsed time — silent calls have a known energy-vs-time signature, and "user said nothing for 30 s" is a very specific failure mode that needs a dedicated alert.
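A minimal sketch of that silence check. Since `totalAudioEnergy` is cumulative (a running sum of squared samples), dividing by elapsed time gives average power; the 1e-4 energy-per-second floor below is an assumed threshold, so calibrate it against your own traffic:

```typescript
// Assumed silence floor: average audio energy per second below this value
// is treated as "user said nothing". Tune per deployment.
const SILENCE_ENERGY_PER_SECOND = 1e-4;

function looksSilent(totalAudioEnergy: number, elapsedSeconds: number): boolean {
  if (elapsedSeconds <= 0) return false;
  return totalAudioEnergy / elapsedSeconds < SILENCE_ENERGY_PER_SECOND;
}
```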

Watch list 2026

  • `ecn-ce-marks-received` lands in Chromium 132 and lights up L4S adoption tracking — see the L4S post.
  • `media-playout.delay` is being added to the spec; it gives you the actual playout delay, not just concealment.
  • `remote-outbound-rtp` becomes mandatory in the 2026-Q3 spec update; previously many browsers omitted it, breaking RTT calculations in some flows.
  • Privacy guidance is being formalised in webrtc-stats — expect tighter rules on what `audioLevel` and `totalAudioEnergy` you can ship to third parties without consent.


See live SLOs on /demo, see /pricing, or start a /trial.
