---
title: "Reading RTCStatsReport for AI Voice Agents: 2026 Field Guide"
description: "RTCStatsReport is the only telemetry source that reflects what the user actually heard. Here is the 2026 mapping from getStats fields to AI voice SLOs."
canonical: https://callsphere.ai/blog/vw3e-rtcstatsreport-interpretation-ai-agents-2026
category: "AI Infrastructure"
tags: ["WebRTC", "getStats", "RTCStatsReport", "Observability", "Voice AI"]
author: "CallSphere Team"
published: 2026-03-25T00:00:00.000Z
updated: 2026-05-07T09:59:24.517Z
---

# Reading RTCStatsReport for AI Voice Agents: 2026 Field Guide

> RTCStatsReport is the only telemetry source that reflects what the user actually heard. Here is the 2026 mapping from getStats fields to AI voice SLOs.

> Server logs lie about audio quality. The only ground truth is what the receiver's jitter buffer experienced — and that is exclusively in `RTCStatsReport`. For AI voice in 2026, getStats is the SLO data plane.

## Why getStats matters more for AI voice

Old-school voice apps measured MOS at the SBC. AI voice agents do not have an SBC at the user edge — the browser is the edge. A 2% packet-loss spike in the user's coffee-shop Wi-Fi never appears in your server traces. It only appears in `getStats` on the browser, and only the values that come from the receiver-side report tell you what actually played.

The W3C Statistics API (webrtc-stats) defines a stable `type` for every relevant report: `inbound-rtp`, `outbound-rtp`, `remote-inbound-rtp`, `candidate-pair`, `media-source`, and `media-playout` (the newest, covering the audio playout path downstream of the jitter buffer). The 2026 update added `ecn-ce-marks-received` for L4S detection and finalized `audioLevel` semantics across browsers.

## Architecture pattern

```mermaid
flowchart LR
  Browser -- getStats every 2s --> Collector
  Collector -- diff over time --> Metrics[(Prometheus / Datadog)]
  Metrics --> SLO[Voice SLO dashboard]
  Metrics --> Alert[PagerDuty if p99 jitter > 30ms]
```

You poll `pc.getStats()` once every 2 seconds, diff the cumulative counters against the previous sample, and emit gauge metrics. Sub-second polling adds little: RTCP-derived values such as `roundTripTime` and `fractionLost` on `remote-inbound-rtp` only refresh when an RTCP report arrives (every 1–5 s).
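The diff step is easy to get subtly wrong, so here is a minimal TypeScript sketch of it. It assumes you copy each report's numeric fields into a plain object, along with its `timestamp`, which webrtc-stats reports in milliseconds. The `CounterDiffer` class and its `diff` method are hypothetical names and do not come from any library.

```typescript
// A stats object reduced to its numeric fields, e.g. copied from an inbound-rtp report.
type Counters = { timestamp: number } & Record<string, number>;

interface IntervalDelta {
  elapsedSec: number;
  deltas: Record<string, number>;
}

class CounterDiffer {
  // Last sample seen per stats id, so each report is only diffed against itself.
  private prev = new Map<string, Counters>();

  // Returns undefined for the first sample of an id (nothing to diff against yet).
  diff(id: string, sample: Counters, fields: string[]): IntervalDelta | undefined {
    const last = this.prev.get(id);
    this.prev.set(id, sample);
    if (!last) return undefined;
    // RTCStats timestamps are DOMHighResTimeStamp values in milliseconds.
    const elapsedSec = (sample.timestamp - last.timestamp) / 1000;
    if (elapsedSec <= 0) return undefined;
    const deltas: Record<string, number> = {};
    for (const f of fields) {
      // Counters should be monotonic; clamp defensively so a reset never goes negative.
      deltas[f] = Math.max(0, (sample[f] ?? 0) - (last[f] ?? 0));
    }
    return { elapsedSec, deltas };
  }
}
```

Keying by stats `id` keeps two streams from being diffed against each other. To do the same across PeerConnections, prefix the id with a per-connection tag. If your metrics backend prefers rates, divide a delta by `elapsedSec`.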

## Key fields to track

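- **inbound-rtp** (audio): `packetsLost`, `jitter`, `packetsReceived`, `bytesReceived`, `audioLevel`, plus the concealment counters `totalSamplesReceived`, `concealedSamples`, and `silentConcealedSamples`. Concealed samples are the smoking gun for "did the user hear glitches?"
- **media-playout**: `synthesizedSamplesDuration`, `synthesizedSamplesEvents`, `totalSamplesDuration`. Synthesized samples are audio the playout device had to invent because nothing was ready on time. They are the playout-side counterpart to concealment.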
- **remote-inbound-rtp**: `roundTripTime`, `fractionLost`. Tells you what the *agent* sees coming from the user.
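- **candidate-pair** (the selected one, found via the `transport` report's `selectedCandidatePairId`): `currentRoundTripTime`, `bytesSent`, `bytesReceived`, `localCandidateId` / `remoteCandidateId` (resolve these to the candidate reports and check `candidateType` to detect TURN relay).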
- **media-source**: `audioLevel`, `totalAudioEnergy`. Use sparingly — they leak speaker activity.
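To make the field list concrete, the sketch below pulls those values into one flat object per poll. It makes a few assumptions:

- There is a single audio stream per direction.
- The active pair is found through the spec's `transport.selectedCandidatePairId`.
- `voiceSnapshot` and `VoiceSnapshot` are illustrative names only.

It leaves `media-source` out on purpose, for the privacy reason above.

```typescript
// Structural type so this works with RTCStatsReport in the browser or a Map in tests.
type StatsLike = { get(id: string): any; values(): Iterable<any> };

interface VoiceSnapshot {
  jitterSec?: number;            // inbound-rtp: receive-side jitter estimate (seconds)
  packetsLost?: number;          // inbound-rtp: cumulative
  packetsReceived?: number;      // inbound-rtp: cumulative
  concealedSamples?: number;     // inbound-rtp: cumulative
  totalSamplesReceived?: number; // inbound-rtp: cumulative
  remoteRttSec?: number;         // remote-inbound-rtp: RTT measured via RTCP
  remoteFractionLost?: number;   // remote-inbound-rtp: loss the agent reports for our audio
  pairRttSec?: number;           // selected candidate-pair: ICE-level RTT
  relayed?: boolean;             // true if either end of the selected pair is TURN
}

// Assumes one audio stream per direction; with several, key snapshots by stats id.
function voiceSnapshot(report: StatsLike): VoiceSnapshot {
  const snap: VoiceSnapshot = {};
  let selectedPairId: string | undefined;
  for (const s of report.values()) {
    if (s.type === "inbound-rtp" && s.kind === "audio") {
      snap.jitterSec = s.jitter;
      snap.packetsLost = s.packetsLost;
      snap.packetsReceived = s.packetsReceived;
      snap.concealedSamples = s.concealedSamples;
      snap.totalSamplesReceived = s.totalSamplesReceived;
    } else if (s.type === "remote-inbound-rtp" && s.kind === "audio") {
      snap.remoteRttSec = s.roundTripTime;
      snap.remoteFractionLost = s.fractionLost;
    } else if (s.type === "transport" && s.selectedCandidatePairId) {
      selectedPairId = s.selectedCandidatePairId;
    }
  }
  const pair = selectedPairId ? report.get(selectedPairId) : undefined;
  if (pair) {
    snap.pairRttSec = pair.currentRoundTripTime;
    // Candidate ids are opaque strings; candidateType lives on the candidate reports.
    const local = report.get(pair.localCandidateId);
    const remote = report.get(pair.remoteCandidateId);
    snap.relayed = local?.candidateType === "relay" || remote?.candidateType === "relay";
  }
  return snap;
}
```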

## CallSphere implementation

CallSphere ships getStats collection in the browser SDK that powers all six verticals (real estate, healthcare, behavioral health, legal, salon, insurance). Polling at 2 s, we ship 14 fields per call to a Prometheus pushgateway routed through our Pion Go gateway 1.23 over NATS to the 6-container pod (CRM, MLS, calendar, SMS, audit, transcript).

The single most useful derived metric is `concealedSamples / totalSamplesReceived` from audio `inbound-rtp`, computed over each polling interval from the counter deltas. Anything above 1.5% means the AI agent's response was audibly degraded, regardless of how clean the model output was. Across 37 agents, 90+ tools, and 115+ database tables, we alert on p99 RTT, p99 jitter, and concealment ratio. Real Estate (OneRoof, [/industries/real-estate](/industries/real-estate)) sees the lowest concealment because of the noise-suppression worker described in the Insertable Streams post. SOC 2 and HIPAA logs carry only derived numbers, never raw audio level. Pricing: $149/$499/$1499; affiliates: 22% (see [/affiliate](/affiliate)).
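Here is a minimal sketch of that ratio. It is computed from two consecutive cumulative `inbound-rtp` samples, so each value reflects only the last polling interval. The 1.5% constant is the figure quoted above; the function names are hypothetical.

```typescript
// Cumulative concealment counters from one audio inbound-rtp report.
interface ConcealmentCounters {
  concealedSamples: number;
  totalSamplesReceived: number; // per the spec, this already includes concealed samples
}

// The 1.5% bar quoted above; tune it against your own listening tests.
const CONCEALMENT_THRESHOLD = 0.015;

// Share of samples concealed during the interval between two polls.
function intervalConcealmentRatio(prev: ConcealmentCounters, curr: ConcealmentCounters): number {
  const dTotal = curr.totalSamplesReceived - prev.totalSamplesReceived;
  // Nothing received in the interval (e.g. the track was not playing): report 0, not NaN.
  if (dTotal <= 0) return 0;
  const dConcealed = Math.max(0, curr.concealedSamples - prev.concealedSamples);
  return dConcealed / dTotal;
}

function isAudiblyDegraded(ratio: number): boolean {
  return ratio > CONCEALMENT_THRESHOLD;
}
```

Diffing matters here. After a few minutes of clean audio, the lifetime ratio barely moves, so a 10-second glitch late in a call would hide inside it.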

## Code snippet

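```ts
// Assumed to exist in your app: the live peer connection and a metrics sink.
declare const pc: RTCPeerConnection;
declare function emit(name: string, value: number): void;

// Previous audio inbound-rtp sample per stats id, used to diff cumulative counters.
const prev = new Map<string, any>();

setInterval(async () => {
  const stats = await pc.getStats();
  let selectedPairId: string | undefined;

  for (const [id, report] of stats) {
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      const last = prev.get(id);
      if (last) {
        const dPackets = report.packetsReceived - last.packetsReceived;
        const dLost = Math.max(0, report.packetsLost - last.packetsLost);
        emit("voice.inbound.lossRate", dLost / Math.max(1, dPackets + dLost));
        // Concealment counters live on audio inbound-rtp and are cumulative too.
        const dSamples = report.totalSamplesReceived - last.totalSamplesReceived;
        const dConcealed = report.concealedSamples - last.concealedSamples;
        emit("voice.inbound.concealedRatio", dConcealed / Math.max(1, dSamples));
      }
      // jitter is a current estimate in seconds; convert to ms to match the 30 ms alert.
      emit("voice.inbound.jitterMs", report.jitter * 1000);
      prev.set(id, report);
    }
    // Standard way to find the active pair: the transport report points at it.
    if (report.type === "transport" && report.selectedCandidatePairId) {
      selectedPairId = report.selectedCandidatePairId;
    }
  }

  // Fallback for Firefox, which exposes a non-standard `selected` flag instead.
  const pair = selectedPairId
    ? stats.get(selectedPairId)
    : [...stats.values()].find((r) => r.type === "candidate-pair" && r.selected);
  if (pair) {
    emit("voice.rtt", pair.currentRoundTripTime);
    // Candidate ids are opaque; read candidateType from the candidate reports.
    const local = stats.get(pair.localCandidateId);
    const remote = stats.get(pair.remoteCandidateId);
    const relayed = local?.candidateType === "relay" || remote?.candidateType === "relay";
    emit("voice.relay", relayed ? 1 : 0);
  }
}, 2000);
```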

## Build steps

1. Poll every 2 s; diff cumulative counters; never store absolute values.
2. Tag every metric with `agentId`, `vertical`, `sessionId`, `browser`.
3. Route raw values through a server you control before any third-party APM. Browser-emitted PHI in stats is a real concern.
4. Build SLO dashboards on three numbers: RTT, jitter, concealment ratio.
5. Alert on concealment > 1.5% sustained for 30 s. That is the user-noticed bar.
6. Resolve the selected candidate-pair's `localCandidateId` / `remoteCandidateId` against the `local-candidate` / `remote-candidate` reports. A `candidateType` of `relay` on either side means the session is on TURN.
7. Persist a 24-hour rolling sample for forensic playback; engineers will need it.
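Step 5 needs a little state, because one bad 2 s sample should not page anyone. This sketch assumes one value per poll and wall-clock timestamps in milliseconds. `SustainedThresholdAlert` is a hypothetical helper, and its defaults mirror the 1.5% / 30 s bar above.

```typescript
// Fires once a metric has stayed above `threshold` for at least `windowMs`.
class SustainedThresholdAlert {
  private breachStartMs: number | undefined = undefined;
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold = 0.015, windowMs = 30_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Feed one value per poll; returns true while the breach has lasted >= windowMs.
  observe(value: number, nowMs: number): boolean {
    if (value <= this.threshold) {
      this.breachStartMs = undefined; // a single healthy sample resets the window
      return false;
    }
    if (this.breachStartMs === undefined) this.breachStartMs = nowMs;
    return nowMs - this.breachStartMs >= this.windowMs;
  }
}
```

Using elapsed time rather than counting 15 samples keeps the window at 30 s even if a poll is skipped. Any single healthy sample restarts the clock, which is the strict reading of "sustained".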

## Common pitfalls

- **Polling `getStats` from the main thread** — it is a fast call but during garbage collection it can spike to 30 ms. Schedule on `requestIdleCallback` if you can.
- **Treating absolute counters as gauges** — `packetsLost` is monotonic. Always diff.
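- **Ignoring concealment counters**: the smoking gun for user-perceived audio glitches is `concealedSamples` on audio `inbound-rtp`, and `media-playout` adds the playout-side view through `synthesizedSamplesDuration`.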
- **Cross-tab leakage** — multi-tab apps may have multiple PeerConnections. Tag stats per PC.
- **Privacy via `audioLevel`** — do not ship raw `audioLevel` to a third-party APM; it leaks speaker activity timing.

## FAQ

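**Is concealedSamples in every browser?** It lives on the audio `inbound-rtp` report, not `media-playout`. Field coverage for both still differs across browsers and versions, so feature-detect (for example `report.concealedSamples !== undefined`) before emitting it.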

**Why not poll faster?** RTCP-derived values such as `roundTripTime` and `fractionLost` only refresh every 1–5 seconds, so polling faster gives no new data for them.

**What about MOS?** No browser computes MOS for you. Compute it server-side from packet loss, jitter, and RTT using ITU-T G.107.

**Are stats privacy-sensitive?** `audioLevel` can leak whether someone is speaking. Strip it from anything that hits a third-party APM.

**Is there a single field that summarizes call quality?** No — combine RTT, jitter, and concealment. A weighted score works well.

**What is the cost of `getStats`?** Sub-millisecond on modern hardware; safe to call every 2 s.

**Does it work in workers?** No. `RTCPeerConnection`, and therefore `getStats`, is only exposed on the main thread. Pull the data out there, then `postMessage` it to a worker for analysis.

**Can I correlate stats with model latency?** Yes — emit `response.first_audio_delay` from the DataChannel side and join with `getStats` server-side.

## Production playbook for AI voice teams in 2026

Three rules from shipping getStats observability across 37 agents:

1. **Compute MOS server-side.** ITU-T G.107 plus packet loss, jitter, and RTT gives you a reliable per-call MOS proxy. Do not trust client-side approximations.
2. **Tag the candidate type.** `relay` vs `srflx` vs `host` is the single best predictor of call quality. Tag every metric with it.
3. **Sample, do not aggregate, on the client.** Send the raw per-poll samples (one every 2 s) and aggregate in your TSDB. Aggregation on the client throws away the noise pattern that points at the actual problem.
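Rule 1 can be sketched with a widely circulated simplification of the G.107 E-model rather than the full computation. Everything except the final R-to-MOS mapping is an assumption of that simplification:

- one-way delay is approximated as RTT / 2;
- jitter is weighted double;
- a fixed 10 ms of processing delay is added;
- each percent of loss costs 2.5 R points.

Inputs are in milliseconds and percent, so multiply the spec's `roundTripTime` and `jitter` (both in seconds) by 1000 first. `estimateMos` is a hypothetical name.

```typescript
// Simplified E-model MOS estimate; run it server-side on per-interval samples.
// All constants except the final R-to-MOS mapping come from the simplification,
// not from the full ITU-T G.107 computation.
function estimateMos(rttMs: number, jitterMs: number, lossPct: number): number {
  // Assumption: one-way delay ~ RTT / 2, jitter counted twice, plus 10 ms processing.
  const effectiveLatency = rttMs / 2 + 2 * jitterMs + 10;
  // Delay impairment grows much faster once effective latency passes ~160 ms.
  let r = effectiveLatency < 160
    ? 93.2 - effectiveLatency / 40
    : 93.2 - (effectiveLatency - 120) / 10;
  r -= 2.5 * lossPct; // linear loss penalty from the same simplification
  // G.107 mapping from R-factor to MOS, clamped at the ends of its range.
  if (r < 0) return 1;
  if (r > 100) return 4.5;
  return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r);
}
```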

The most underrated metric is `media-source.totalAudioEnergy` divided by elapsed time — silent calls have a known energy-vs-time signature, and "user said nothing for 30 s" is a very specific failure mode that needs a dedicated alert.
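The energy signal can be turned into an interval audio level with the approach the webrtc-stats spec describes for `totalAudioEnergy`. You take the square root of the change in energy over the change in `totalSamplesDuration` (which tracks elapsed audio time), giving an RMS level on a 0..1 scale. The sketch below does that and adds a hypothetical `silentFor` check. Its 0.01 level threshold is a placeholder to tune, not a recommended value.

```typescript
// Cumulative counters from the local media-source (microphone) report.
interface EnergyCounters {
  totalAudioEnergy: number;
  totalSamplesDuration: number; // seconds of audio covered by totalAudioEnergy
}

// RMS audio level over the interval between two polls, on a 0..1 scale.
function intervalAudioLevel(prev: EnergyCounters, curr: EnergyCounters): number {
  const dDuration = curr.totalSamplesDuration - prev.totalSamplesDuration;
  if (dDuration <= 0) return 0;
  const dEnergy = Math.max(0, curr.totalAudioEnergy - prev.totalAudioEnergy);
  return Math.sqrt(dEnergy / dDuration);
}

// Hypothetical check: true once every interval in the trailing window stayed
// below `level`. Threshold and window are placeholders to tune per deployment.
function silentFor(
  levels: { tMs: number; level: number }[],
  windowMs: number,
  level = 0.01,
): boolean {
  if (levels.length === 0) return false;
  const end = levels[levels.length - 1].tMs;
  if (end - levels[0].tMs < windowMs) return false; // not enough history yet
  return levels.filter((s) => end - s.tMs <= windowMs).every((s) => s.level < level);
}
```

Only the derived boolean needs to leave the browser. That fits the rule of never shipping raw `audioLevel` to a third-party APM.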

## Watch list 2026

- **`ecn-ce-marks-received`** lands in Chromium 132 and lights up L4S adoption tracking — see the L4S post.
- **`media-playout.delay`** is being added to the spec; it gives you the actual playout delay, not just concealment.
- **`remote-outbound-rtp` becomes mandatory** in the 2026-Q3 spec update; previously many browsers omitted it, breaking RTT calculations in some flows.
- **Privacy guidance** is being formalised in webrtc-stats — expect tighter rules on what `audioLevel` and `totalAudioEnergy` you can ship to third parties without consent.

## Sources

- [https://www.w3.org/TR/webrtc-stats/](https://www.w3.org/TR/webrtc-stats/)
- [https://bloggeek.me/getstats/](https://bloggeek.me/getstats/)
- [https://www.100ms.live/blog/measuring-webrtc-call-quality-part-1](https://www.100ms.live/blog/measuring-webrtc-call-quality-part-1)
- [https://www.webrtc-developers.com/interpreting-webrtc-statistics-with-rtcstats/](https://www.webrtc-developers.com/interpreting-webrtc-statistics-with-rtcstats/)
- [https://github.com/peermetrics/webrtc-stats](https://github.com/peermetrics/webrtc-stats)
- [https://github.com/oanguenot/webrtcmetrics](https://github.com/oanguenot/webrtcmetrics)

See live SLOs on [/demo](/demo), see [/pricing](/pricing), or start a [/trial](/trial).

