---
title: "Jitter Buffer Tuning for AI Agent Latency in 2026: Less Buffer, More Brain"
description: "Default jitter buffers (60-200 ms) are tuned for human ears, not LLM turn-taking. Here is how to retune them for AI voice agents without melting your audio quality on cell phone calls."
canonical: https://callsphere.ai/blog/vw3d-jitter-buffer-tuning-ai-latency-2026
category: "AI Infrastructure"
tags: ["Jitter Buffer", "Latency", "RTP", "Voice AI", "VAD"]
author: "CallSphere Team"
published: 2026-03-19T00:00:00.000Z
updated: 2026-05-07T09:59:38.191Z
---

# Jitter Buffer Tuning for AI Agent Latency in 2026: Less Buffer, More Brain

> Default jitter buffers (60-200 ms) are tuned for human ears, not LLM turn-taking. Here is how to retune them for AI voice agents without melting your audio quality on cell phone calls.

> A 200 ms jitter buffer is invisible to a human listener and devastating to an AI voice agent. It is one of the easiest end-to-end latency improvements you can ship, and one of the most overlooked.

## Background

```mermaid
flowchart LR
  UA[SIP UA] -- REGISTER --> Reg[Registrar]
  UA -- INVITE --> Proxy[SIP Proxy]
  Proxy --> Dispatcher[Kamailio dispatcher]
  Dispatcher --> Worker1[FreeSWITCH worker]
  Dispatcher --> Worker2[FreeSWITCH worker]
  Worker1 --> AI[(AI agent)]
  Worker2 --> AI
```

CallSphere reference architecture

A jitter buffer holds incoming RTP packets briefly to absorb network-induced timing variation before play-out. Adaptive jitter buffers (AJB) adjust their depth between configured min and max based on observed jitter. Default values across SIP stacks (Asterisk, FreeSWITCH, PJSIP, Twilio) cluster around 60-200 ms minimum and 200-500 ms maximum. Those defaults are tuned for one-way listening: the buffer can grow to absorb a 300 ms reordering event without dropouts, and the human ear forgives the latency.
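To make the adaptive behavior concrete, here is a minimal sketch of the idea (not any particular stack's actual algorithm): track smoothed inter-arrival jitter with the RFC 3550 estimator and keep the play-out delay a fixed multiple of it, clamped between the configured min and max. The `headroom` multiplier is an assumption for illustration.

```python
class AdaptiveJitterBuffer:
    """Sketch of an adaptive jitter buffer's depth controller."""

    def __init__(self, min_ms=20.0, max_ms=120.0, headroom=2.0):
        self.min_ms = min_ms      # floor: never buffer less than this
        self.max_ms = max_ms      # ceiling: cap the added latency
        self.headroom = headroom  # target depth = headroom x observed jitter
        self.jitter_ms = 0.0      # smoothed jitter estimate
        self.depth_ms = min_ms

    def on_packet(self, transit_delta_ms):
        """Update depth from one packet's transit-time delta (ms)."""
        # RFC 3550 jitter estimator: J += (|D| - J) / 16
        self.jitter_ms += (abs(transit_delta_ms) - self.jitter_ms) / 16.0
        target = self.headroom * self.jitter_ms
        self.depth_ms = min(self.max_ms, max(self.min_ms, target))
        return self.depth_ms
```

On a clean path the estimate converges low and the buffer pins at its floor; on a noisy path the target climbs until the ceiling caps it, which is exactly the min/max trade the rest of this article tunes.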

For AI voice that math is wrong. The AI's loop is: voice activity detection -> end-of-turn wait -> ASR partial -> ASR final -> LLM token stream -> TTS first byte -> RTP send. Every millisecond of jitter buffer on inbound RTP becomes a millisecond of additional turn-taking lag. The user perceives the delay between finishing their sentence and the AI starting its response. Studies put the human "feels natural" threshold at around 800 ms; modern AI voice ships in the 600-1500 ms range. Cutting 100 ms off the jitter buffer is a meaningful share of that budget.
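The budget math is worth writing down. The stage numbers below are illustrative assumptions, not measurements, but they show why a 120 ms default buffer is a large slice of a ~1 s turn:

```python
# Back-of-envelope turn-taking budget (illustrative numbers, not measurements).
budget_ms = {
    "inbound_jitter_buffer": 120,  # default-ish adaptive buffer depth
    "vad_end_of_turn":       300,
    "asr_final":             150,
    "llm_first_token":       250,
    "tts_first_byte":        150,
    "network_rtp":            60,
}
total = sum(budget_ms.values())
retuned = total - budget_ms["inbound_jitter_buffer"] + 40  # cut buffer to 40 ms
print(total, retuned)  # -> 1030 950
```

Cutting the buffer from 120 ms to 40 ms takes roughly 8% off the total turn time in this sketch, with no model or ASR changes.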

## Technical deep-dive

Three knobs matter: minimum depth, maximum depth, and adaptation aggressiveness.

```ini
; Asterisk rtp.conf: RTP transport settings for an AI voice agent
[general]
rtpstart=10000
rtpend=20000
rtpchecksums=no
strictrtp=yes
icesupport=no

; Jitter buffer knobs live in the channel driver config, not rtp.conf:
; sip.conf [general] for chan_sip (chan_pjsip uses the JITTERBUFFER
; dialplan function with equivalent parameters)
jbenable=yes
jbforce=yes
jbmaxsize=120          ; cap the adaptive buffer at 120 ms
jbresyncthreshold=500  ; resync on a 500 ms timestamp jump
jbimpl=adaptive
jbtargetextra=20       ; pad the target slightly above measured jitter
```

```xml
<!-- FreeSWITCH dialplan sketch matching the description below: 20 ms min,
     80 ms max, 60 ms target. Verify the jitterbuffer_msec field order
     against your FreeSWITCH version's docs before shipping. -->
<action application="set" data="jitterbuffer_msec=20:80:60"/>
<action application="set" data="rtp_jitter_buffer_plc=true"/>
<action application="set" data="rtp_jitter_buffer_during_bridge=true"/>
```

The FreeSWITCH triplet sets minimum 20 ms, maximum 80 ms, and target 60 ms. That is aggressive but right for a low-loss path. The PLC flag turns on packet-loss concealment so a single dropped packet does not punch a hole through to the ASR engine.

For PJSIP-based Twilio Voice SDK clients, the relevant settings are `audioJitterBufferMaxPackets` and `audioJitterBufferFastAccelerate` in the createPeerConnection options. WebRTC defaults are roughly 50-200 ms depending on the implementation; Chrome adapted faster than Firefox in 2026 measurements.

The risk is over-cutting: a buffer too shallow drops packets on burst jitter. Symptom: choppy ASR, missing words at the start of utterances, ASR engines transcribing "hello" as "ello".

## CallSphere implementation

CallSphere runs Twilio Programmable Voice across all six verticals. For Healthcare AI on FastAPI :8084 we accept Twilio Media Streams over WebSocket; Twilio's send-side jitter is sub-30 ms median, so we run no additional jitter buffer between the WebSocket and OpenAI Realtime, relying instead on Realtime's own server VAD turn detection. For Sales Calling AI's 5 concurrent outbound calls per tenant we tune our internal bridge buffer to 40-80 ms because cell phone destinations have higher native jitter. After-Hours AI uses simultaneous call + SMS to on-call staff with a 120-second timeout, where reliability beats latency optimization. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 compliance, $149/$499/$1499 pricing, and a 14-day trial, the buffer policy is documented per vertical with measured turn-taking latency for each.

## Implementation steps

1. Establish your baseline: log RTP arrival timestamps for 1000 calls and compute the 95th percentile inter-packet delta minus the expected packetization interval.
2. If P95 jitter is under 30 ms, you are over-buffered with defaults; cut min to 20 ms and max to 80 ms.
3. If P95 jitter is over 80 ms, you are likely on a noisy mobile path; keep max at 120-150 ms but reduce min to 40 ms.
4. Turn on PLC unconditionally. A reconstructed packet beats a hole.
5. Measure end-to-end turn-taking latency before and after with a stopwatch script: send "hello", wait, time first AI audio byte.
6. Add jitter and loss alerts to your CDR pipeline; sustained P95 above 100 ms means a peering or SBC problem.
7. For OpenAI Realtime, prefer server VAD turn detection over client; the model has its own end-of-turn classifier that handles small jitter implicitly.
8. Re-run quarterly; carrier and SIP peering paths shift.
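Step 1 can be sketched in a few lines. This assumes you have per-call RTP arrival timestamps in milliseconds and a known packetization interval (20 ms for typical G.711/Opus); the synthetic `arrivals` data below is purely illustrative:

```python
def p95_jitter_ms(arrivals, ptime_ms=20.0):
    """P95 of |inter-arrival gap - ideal cadence| over one call's packets."""
    deltas = [b - a for a, b in zip(arrivals, arrivals[1:])]
    deviations = sorted(abs(d - ptime_ms) for d in deltas)
    idx = max(0, int(0.95 * len(deviations)) - 1)
    return deviations[idx]

# Synthetic example: a 20 ms cadence where every 10th packet lands 30 ms late.
arrivals = [i * 20.0 + (30.0 if i % 10 == 0 else 0.0) for i in range(100)]
print(p95_jitter_ms(arrivals))  # -> 30.0
```

Run this over 1000 calls, take the distribution of per-call P95 values, and apply the thresholds in steps 2 and 3.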

## FAQ

**Why not just remove the jitter buffer entirely?**
Even a 5 ms buffer is doing useful work because RTP arrival is bursty. Zero buffer creates audio glitches the ASR engine hears as phantom syllables.

**Does Twilio expose jitter buffer config on Programmable Voice?**
Not on the trunk side, but on Voice SDK clients you can tune it in the WebRTC PeerConnection. On Media Streams the server-side buffer is short and you can effectively bypass it.

**Will this break for callers on slow networks?**
You will see more PLC events and occasional ASR misreads on bad networks. Set a fallback buffer depth that triggers on detected loss above 5%.
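A minimal sketch of that fallback policy (a hypothetical control loop, not a Twilio or Asterisk API): widen the buffer when measured loss crosses the trigger threshold, and shrink back only once loss drops well below it, so the depth does not flap on a borderline path. The hysteresis band values are assumptions.

```python
class FallbackBufferPolicy:
    """Loss-triggered jitter buffer depth with hysteresis (sketch)."""

    def __init__(self, normal_ms=40, fallback_ms=120, up_pct=5.0, down_pct=2.0):
        self.normal_ms, self.fallback_ms = normal_ms, fallback_ms
        self.up_pct, self.down_pct = up_pct, down_pct  # hysteresis band
        self.depth_ms = normal_ms

    def update(self, loss_pct):
        """Feed the latest measurement window's loss %; get the depth to apply."""
        if loss_pct > self.up_pct:
            self.depth_ms = self.fallback_ms  # lossy path: widen the buffer
        elif loss_pct < self.down_pct:
            self.depth_ms = self.normal_ms    # recovered: shrink back down
        return self.depth_ms                  # in the band: hold current depth
```

The band between `down_pct` and `up_pct` is what prevents oscillation on a path hovering near 5% loss.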

**How does this interact with Opus FEC?**
Opus in-band FEC embeds a low-bitrate copy of each frame in the following packet, so the decoder can reconstruct a single lost frame from its successor. A small jitter buffer plus FEC is therefore more robust than a large jitter buffer with a codec like G.722 that has no FEC.

**What about the outbound side?**
Outbound RTP is sent on a fixed cadence; jitter buffer only matters on the receiver. The AI as receiver is the side to optimize.

## Sources

- [Wildix: RTP, RTCP and Jitter Buffer](https://blog.wildix.com/rtp-rtcp-jitter-buffer/)
- [Telnyx: Adaptive Jitter Buffer for SIP Trunking](https://developers.telnyx.com/docs/voice/sip-trunking/features/jitter-buffer)
- [PJSIP: Jitter Buffer Features and Operations](https://docs.pjsip.org/en/2.15.1/specific-guides/audio/jitter_buffer.html)

Start a [14-day trial](/trial) and feel the latency, see [pricing](/pricing) for $149/$499/$1499 tiers, or [contact us](/contact) about jitter buffer tuning for high-stakes voice AI.

---

Source: https://callsphere.ai/blog/vw3d-jitter-buffer-tuning-ai-latency-2026
