Reading RTCStatsReport for AI Voice Agents: 2026 Field Guide
RTCStatsReport is the only telemetry source that reflects what the user actually heard. Here is the 2026 mapping from getStats fields to AI voice SLOs.
Server logs lie about audio quality. The only ground truth is what the receiver's jitter buffer experienced — and that is exclusively in `RTCStatsReport`. For AI voice in 2026, getStats is the SLO data plane.
Why getStats matters more for AI voice
Old-school voice apps measured MOS at the SBC. AI voice agents do not have an SBC at the user edge — the browser is the edge. A 2% packet-loss spike in the user's coffee-shop Wi-Fi never appears in your server traces. It only appears in `getStats` on the browser, and only the values that come from the receiver-side report tell you what actually played.
The W3C Statistics API (webrtc-stats) defines stable identifiers for every relevant metric: `inbound-rtp`, `outbound-rtp`, `remote-inbound-rtp`, `candidate-pair`, `media-source`, and `media-playout` (the new one, important for jitter-buffer health). The 2026 update added `ecn-ce-marks-received` for L4S detection and finalized `audioLevel` semantics across browsers.
Architecture pattern
```mermaid
flowchart LR
  Browser -- getStats every 2s --> Collector
  Collector -- diff over time --> Metrics[(Prometheus / Datadog)]
  Metrics --> SLO[Voice SLO dashboard]
  Metrics --> Alert[PagerDuty if p99 jitter > 30ms]
```
You poll `pc.getStats()` once every 2 seconds, diff the cumulative counters against the previous sample, and emit gauge metrics. Sub-second polling does not help — the underlying counters update at most once per RTCP report (every 1–5 s).
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Key fields to track
- inbound-rtp: `packetsLost`, `jitter`, `packetsReceived`, `bytesReceived`, `audioLevel`.
- media-playout: `totalSamplesReceived`, `concealedSamples`, `silentConcealedSamples`. Concealed samples are the smoking gun for "did the user hear glitches?"
- remote-inbound-rtp: `roundTripTime`, `fractionLost`. Tells you what the agent sees coming from the user.
- candidate-pair (selected): `currentRoundTripTime`, `bytesSent`, `bytesReceived`, `localCandidateId` / `remoteCandidateId` (resolve these ids to their `local-candidate` / `remote-candidate` reports and check `candidateType` to detect TURN relay).
- media-source: `audioLevel`, `totalAudioEnergy`. Use sparingly — they leak speaker activity.
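Pulled together, the fields above flatten into one sample per poll. A minimal sketch — the `summarize` helper and the `VoiceSample` shape are illustrative, not part of any SDK — that works on any `Map` of stats dictionaries, so it is testable without a live `RTCPeerConnection`:

```typescript
interface VoiceSample {
  packetsLost?: number;
  jitter?: number;
  concealmentRatio?: number;
  rttSeconds?: number;
  onRelay?: boolean;
}

// Flatten one RTCStatsReport poll into a single sample object.
function summarize(stats: Map<string, any>): VoiceSample {
  const sample: VoiceSample = {};
  for (const report of stats.values()) {
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      sample.packetsLost = report.packetsLost;
      sample.jitter = report.jitter;
    }
    if (report.type === "media-playout") {
      sample.concealmentRatio =
        report.concealedSamples / Math.max(1, report.totalSamplesReceived);
    }
    if (report.type === "transport" && report.selectedCandidatePairId) {
      const pair = stats.get(report.selectedCandidatePairId);
      if (pair) {
        sample.rttSeconds = pair.currentRoundTripTime;
        // The candidate id is opaque; resolve it to the
        // remote-candidate report and check candidateType.
        const remote = stats.get(pair.remoteCandidateId);
        sample.onRelay = remote?.candidateType === "relay";
      }
    }
  }
  return sample;
}
```

Note the two-step lookup for relay detection: the `transport` report names the selected candidate pair, and the pair's `remoteCandidateId` names the `remote-candidate` report that actually carries `candidateType`.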
CallSphere implementation
CallSphere ships getStats collection in the browser SDK that powers all six verticals (real estate, healthcare, behavioral health, legal, salon, insurance). Polling at 2 s intervals, the SDK ships 14 fields per call to a Prometheus pushgateway, routed through our Pion-based Go 1.23 gateway over NATS to the six-container pod (CRM, MLS, calendar, SMS, audit, transcript).
The single most useful derived metric is `concealedSamples / totalSamplesReceived`: anything above 1.5% means the agent's response was audibly degraded, regardless of how clean the model output was. Across 37 agents, 90+ tools, and 115+ database tables we alert on p99 RTT, p99 jitter, and the concealment ratio. Real Estate (OneRoof, /industries/real-estate) sees the lowest concealment thanks to the noise-suppression worker described in the Insertable Streams post. SOC 2 and HIPAA logs carry only derived numbers — never raw audio levels. Pricing is $149/$499/$1499; affiliates earn 22% — see /affiliate.
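Because both counters are cumulative, the ratio is best computed over the polling window rather than over the whole call — a glitch late in a long call would otherwise be diluted by minutes of clean audio. A sketch; the helper names are illustrative:

```typescript
// Per-window concealment ratio from two cumulative samples.
function windowConcealment(
  prev: { concealedSamples: number; totalSamplesReceived: number },
  curr: { concealedSamples: number; totalSamplesReceived: number }
): number {
  const dConcealed = curr.concealedSamples - prev.concealedSamples;
  const dTotal = curr.totalSamplesReceived - prev.totalSamplesReceived;
  return dConcealed / Math.max(1, dTotal);
}

// Above 1.5% in a window, the degradation is audible to the user.
const audiblyDegraded = (ratio: number): boolean => ratio > 0.015;
```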
Code snippet
```ts
const prev = new Map<string, any>();

setInterval(async () => {
  const stats = await pc.getStats();
  for (const [id, report] of stats) {
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      const last = prev.get(id);
      if (last) {
        // Counters are cumulative; diff against the previous sample.
        const dPackets = report.packetsReceived - last.packetsReceived;
        const dLost = report.packetsLost - last.packetsLost;
        const lossRate = dLost / Math.max(1, dPackets + dLost);
        emit("voice.inbound.lossRate", lossRate);
        emit("voice.inbound.jitter", report.jitter);
      }
      prev.set(id, report);
    }
    if (report.type === "media-playout") {
      emit(
        "voice.playout.concealed",
        report.concealedSamples / Math.max(1, report.totalSamplesReceived)
      );
    }
    // The selected pair comes from the transport report; the candidate
    // id is opaque, so resolve it to the remote-candidate report
    // instead of string-matching "relay" against the id.
    if (report.type === "transport" && report.selectedCandidatePairId) {
      const pair = stats.get(report.selectedCandidatePairId);
      if (pair) {
        emit("voice.rtt", pair.currentRoundTripTime);
        const remote = stats.get(pair.remoteCandidateId);
        emit("voice.relay", remote?.candidateType === "relay" ? 1 : 0);
      }
    }
  }
}, 2000);
```
Build steps
- Poll every 2 s; diff cumulative counters; never store absolute values.
- Tag every metric with `agentId`, `vertical`, `sessionId`, `browser`.
- Route raw values through a server you control before any third-party APM. Browser-emitted PHI in stats is a real concern.
- Build SLO dashboards on three numbers: RTT, jitter, concealment ratio.
- Alert on concealment > 1.5% sustained for 30 s. That is the user-noticed bar.
- Cross-reference the candidate-pair report's `localCandidateId` / `remoteCandidateId` with their `local-candidate` / `remote-candidate` reports to detect when a session is on TURN.
- Persist a 24-hour rolling sample for forensic playback; engineers will need it.
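The "sustained for 30 s" rule above needs debouncing so a single bad RTCP interval does not page anyone. A sketch assuming 2 s polling, so 15 consecutive windows ≈ 30 s; the class name is illustrative:

```typescript
// Fires only after the concealment ratio stays above the bar
// for `windowsNeeded` consecutive polls (15 × 2 s ≈ 30 s).
class SustainedAlert {
  private streak = 0;

  constructor(
    private threshold = 0.015,
    private windowsNeeded = 15
  ) {}

  // Feed one per-window ratio; returns true once the alert should fire.
  observe(ratio: number): boolean {
    this.streak = ratio > this.threshold ? this.streak + 1 : 0;
    return this.streak >= this.windowsNeeded;
  }
}
```

Any clean window resets the streak, which is exactly the behavior you want for transient coffee-shop Wi-Fi blips.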
Common pitfalls
- Polling `getStats` from the main thread — it is a fast call but during garbage collection it can spike to 30 ms. Schedule on `requestIdleCallback` if you can.
- Treating absolute counters as gauges — `packetsLost` is monotonic. Always diff.
- Ignoring `media-playout` — the smoking gun for user-perceived audio glitches lives only here.
- Cross-tab leakage — multi-tab apps may have multiple PeerConnections. Tag stats per PC.
- Privacy via `audioLevel` — do not ship raw `audioLevel` to a third-party APM; it leaks speaker activity timing.
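The monotonic-counter pitfall has a second face: after an ICE restart or SSRC change a counter can start over, making the naive diff negative and poisoning your gauges. A reset-safe delta sketch; the helper is illustrative:

```typescript
// Diff two cumulative readings, treating a backwards jump
// (counter reset after an ICE restart / new stream) as a fresh start
// rather than emitting a huge negative delta.
function safeDelta(prevValue: number | undefined, currValue: number): number {
  if (prevValue === undefined || currValue < prevValue) return 0;
  return currValue - prevValue;
}
```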
FAQ
Is concealedSamples in every browser? Yes — `media-playout` is in Chrome 110+, Safari 17+, Firefox 122+.
Why not poll faster? RTCP only updates the underlying counters every 1–5 seconds; polling faster gives no new data.
What about MOS? No browser computes MOS for you. Compute it server-side from packet loss, jitter, and RTT using ITU-T G.107.
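That server-side computation can be made concrete with a heavily simplified E-model sketch. The full G.107 model has many more impairment terms; the coefficients below follow a widely used simplification, and the loss penalty of 2.5 R-points per percent is an assumption you should calibrate per codec:

```typescript
// Simplified E-model (ITU-T G.107 family) MOS estimate.
// latencyMs: one-way latency; jitterMs: mean jitter; lossPercent: 0–100.
function estimateMos(
  latencyMs: number,
  jitterMs: number,
  lossPercent: number
): number {
  // Jitter-buffer delay counts double; +10 ms codec/processing fudge.
  const effectiveLatency = latencyMs + 2 * jitterMs + 10;
  let r =
    effectiveLatency < 160
      ? 93.2 - effectiveLatency / 40
      : 93.2 - (effectiveLatency - 120) / 10;
  r -= 2.5 * lossPercent; // rough per-percent loss impairment
  r = Math.max(0, Math.min(100, r));
  // Standard R-factor → MOS mapping.
  const mos = 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r);
  return Math.round(mos * 100) / 100;
}
```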
Are stats privacy-sensitive? `audioLevel` can leak whether someone is speaking. Strip it from anything that hits a third-party APM.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Is there a single field that summarizes call quality? No — combine RTT, jitter, and concealment. A weighted score works well.
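One way to build that weighted score — the weights and the "unusable" ceilings here are assumptions to tune against your own incident history, not anything the spec defines:

```typescript
// 0–100 call-quality score from the three SLO inputs.
// Each input is normalized against a "call is unusable" ceiling,
// then the penalties are blended with fixed weights.
function qualityScore(
  rttMs: number,
  jitterMs: number,
  concealmentRatio: number
): number {
  const norm = (value: number, worst: number) =>
    Math.min(1, Math.max(0, value / worst));
  const penalty =
    0.3 * norm(rttMs, 400) +            // 400 ms RTT ≈ unusable
    0.3 * norm(jitterMs, 60) +          // 60 ms jitter ≈ unusable
    0.4 * norm(concealmentRatio, 0.05); // 5% concealment ≈ unusable
  return Math.round((1 - penalty) * 100);
}
```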
What is the cost of `getStats`? Sub-millisecond on modern hardware; safe to call every 2 s.
Does it work in workers? No — `RTCPeerConnection`, and therefore `getStats`, is only available on the main thread. Pull the data out there, then ship it to a worker for analysis.
Can I correlate stats with model latency? Yes — emit `response.first_audio_delay` from the DataChannel side and join with `getStats` server-side.
Production playbook for AI voice teams in 2026
Three rules from shipping getStats observability across 37 agents:
- Compute MOS server-side. ITU-T G.107 plus packet loss, jitter, and RTT gives you a reliable per-call MOS proxy. Do not trust client-side approximations.
- Tag the candidate type. `relay` vs `srflx` vs `host` is the single best predictor of call quality. Tag every metric with it.
- Sample, do not aggregate, on the client. Send raw per-second samples; aggregate in your TSDB. Aggregation on the client throws away the noise pattern that points at the actual problem.
The most underrated metric is `media-source.totalAudioEnergy` divided by elapsed time — silent calls have a known energy-vs-time signature, and "user said nothing for 30 s" is a very specific failure mode that needs a dedicated alert.
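A sketch of that energy-over-time check. `totalAudioEnergy` is the cumulative sum of squared audio levels over time, so dividing its delta by elapsed seconds yields the mean squared level for the window; the threshold below is an assumption to calibrate against real captures:

```typescript
// Detects a "user said nothing" window from two media-source samples.
// totalAudioEnergy is cumulative (sum of audioLevel² · dt), so the
// delta over elapsed time is the window's mean squared level.
function isSilentWindow(
  prev: { totalAudioEnergy: number; timestamp: number },
  curr: { totalAudioEnergy: number; timestamp: number },
  threshold = 1e-4 // assumed; calibrate per deployment
): boolean {
  const elapsedSec = (curr.timestamp - prev.timestamp) / 1000;
  if (elapsedSec <= 0) return false;
  const meanSquareLevel =
    (curr.totalAudioEnergy - prev.totalAudioEnergy) / elapsedSec;
  return meanSquareLevel < threshold;
}
```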
Watch list 2026
- `ecn-ce-marks-received` lands in Chromium 132 and lights up L4S adoption tracking — see the L4S post.
- `media-playout.delay` is being added to the spec; it gives you the actual playout delay, not just concealment.
- `remote-outbound-rtp` becomes mandatory in the 2026-Q3 spec update; previously many browsers omitted it, breaking RTT calculations in some flows.
- Privacy guidance is being formalised in webrtc-stats — expect tighter rules on what `audioLevel` and `totalAudioEnergy` you can ship to third parties without consent.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.