---
title: "The Latency Budget for AI Voice Agents Across PSTN in 2026"
description: "Where every millisecond goes between caller and AI: PSTN, carrier, STT, LLM, TTS, and back. The component-level targets that ship in 2026 and how to hit them."
canonical: https://callsphere.ai/blog/vw1d-latency-budget-pstn-ai-stack-2026
category: "AI Engineering"
tags: ["VoIP", "SIP", "Telephony", "AI Voice Agents"]
author: "CallSphere Team"
published: 2026-04-26T00:00:00.000Z
updated: 2026-05-07T09:32:10.928Z
---

# The Latency Budget for AI Voice Agents Across PSTN in 2026

> Where every millisecond goes between caller and AI: PSTN, carrier, STT, LLM, TTS, and back. The component-level targets that ship in 2026 and how to hit them.

> Humans expect a reply within roughly 500 to 700 ms in natural conversation. Anything past one second feels artificial; past two seconds the caller starts talking over the agent. The 2026 latency budget for an AI phone agent is unforgiving, and the math is well understood.

## Background: the 2026 latency picture

```mermaid
flowchart TD
  Out[Outbound campaign] --> Twilio[Twilio Voice API]
  Twilio --> STIR[STIR/SHAKEN attestation]
  STIR --> Carrier[Originating carrier]
  Carrier --> Term[Terminating carrier]
  Term --> Recipient[Recipient phone]
  Recipient --> Webhook[/voice webhook/]
  Webhook --> Agent[AI sales agent]
```

CallSphere reference architecture
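The last hop in the diagram is an HTTP webhook that answers the call and hands the audio to the agent. A minimal sketch of that handler, assuming a hypothetical WebSocket endpoint (`wss://agent.example.com/relay`) and building the TwiML by hand rather than via the Twilio helper library:

```python
# Sketch of a /voice webhook response: bridge the inbound PSTN call to the
# AI agent's WebSocket stream using Twilio's ConversationRelay TwiML verb.
# The WebSocket URL is a placeholder, not a real endpoint.
def voice_webhook() -> str:
    """Return the TwiML document served to Twilio when the call connects."""
    ws_url = "wss://agent.example.com/relay"  # hypothetical agent endpoint
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        "<Connect>"
        f'<ConversationRelay url="{ws_url}" />'
        "</Connect>"
        "</Response>"
    )

print(voice_webhook())
```

In a real deployment this string would be returned from the web framework of your choice with a `text/xml` content type; the point is only that every millisecond before this response is served is part of the budget too.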

Twilio published explicit November 2025 targets that the industry has converged on:

- **Mouth-to-ear turn gap** (what the human perceives): target 1,115 ms, upper limit 1,400 ms.
- **Platform turn gap** (internal processing only): target 885 ms, upper limit 1,100 ms.
- **STT first transcript**: target 350 ms, upper limit 500 ms.
- **LLM time-to-first-token**: target 375 ms, upper limit 750 ms.
- **TTS first byte**: target 100 ms, upper limit 250 ms.

ConversationRelay reports these component timings per turn, which is how the targets above are verified on live traffic.
## FAQ

**What's the single biggest latency win in 2026?**
Switching from a cascaded STT → LLM → TTS pipeline to a speech-to-speech model (e.g., OpenAI Realtime), which removes two model handoffs from the path.

**Does carrier choice really matter?**
Yes, especially at p95 and in regions where the carrier's path differs significantly from your provider's path. Telnyx's private backbone matters most outside the major US metros.

**What's the floor for human-feel latency?**
About 500 to 700 ms mouth-to-ear. Below that, the human experience improves only marginally.

**Can I get under 500 ms?**
It's possible with end-to-end speech-to-speech and optimized infrastructure, but the PSTN floor is about 500 ms by itself. WebRTC paths can go lower.

**What's the most overlooked optimization?**
End-of-turn detection. Tuning it carefully shaves 100+ ms off perceived latency.

## Sources

- [Twilio Blog: Core Latency in AI Voice Agents](https://www.twilio.com/en-us/blog/developers/best-practices/guide-core-latency-ai-voice-agents)
- [Webex / Cisco: Building Voice AI That Can Keep Up with Real Conversations](https://blog.webex.com/engineering/building-voice-ai-that-can-keep-up-with-real-conversations/)
- [Ruh AI: Voice AI Latency Optimization 2026](https://www.ruh.ai/blogs/voice-ai-latency-optimization)
- [Forasoft: OpenAI Realtime API Production Voice Agents 2026](https://www.forasoft.com/blog/article/openai-realtime-api-voice-agent-production-guide-2026)

Start a [14-day trial](/trial) and measure CallSphere's latency on your own calls, see [pricing](/pricing), or compare with the [Twilio integration](/integrations/twilio).

