---
title: "Twilio Voice <Stream> Bidirectional Patterns for AI Agents (2026)"
description: "Bidirectional Media Streams ship raw mulaw both directions over a single WebSocket. We break down the four patterns CallSphere ships in production: proxy-to-OpenAI, sidecar STT, conference fork, and replay-on-reconnect."
canonical: https://callsphere.ai/blog/vw8d-twilio-voice-stream-bidirectional-patterns-2026
category: "AI Infrastructure"
tags: ["Twilio", "Media Streams", "WebSocket", "OpenAI Realtime", "Voice AI"]
author: "CallSphere Team"
published: 2026-03-15T00:00:00.000Z
updated: 2026-05-07T22:22:59.688Z
---

# Twilio Voice `<Stream>` Bidirectional Patterns for AI Agents (2026)

> Bidirectional Media Streams ship raw mulaw both directions over a single WebSocket. We break down the four patterns CallSphere ships in production: proxy-to-OpenAI, sidecar STT, conference fork, and replay-on-reconnect.

> **TL;DR**: Bidirectional `<Stream>` is the cleanest path from PSTN to a Realtime LLM. Send mulaw 8 kHz both ways, mark every chunk with a sequence number, and gate barge-in on the `mark` event, not on the audio buffer.

## Background

Twilio's `<Stream>` verb opens a WebSocket from the call leg to your server. In **unidirectional** mode you only receive audio (good for transcription). In **bidirectional** mode (`<Connect><Stream>`) you can also push base64-encoded mulaw frames *back* into the call. That second direction is what unlocks AI voice agents: you stream OpenAI Realtime / Deepgram Aura / ElevenLabs output straight onto the PSTN line without a second SIP leg.

Anatomy of a stream:

- **start** event — once per call, contains `streamSid`, `callSid`, `accountSid`, custom parameters.
- **media** events — 20 ms mulaw frames, base64-encoded, ~50 per second per direction.
- **mark** events — your own labels. Twilio echoes them back when the corresponding outbound audio finishes playing. This is the only reliable barge-in signal.
- **stop** event — leg ended.
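
The four event types above map naturally onto a discriminated union. The following is a minimal sketch, assuming the JSON field names used in Twilio's Media Streams messages (`event`, `start.streamSid`, `media.payload`, `mark.name`); the type names and the `parseStreamEvent` helper are illustrative and not part of any SDK. It also works out the frame size implied by 20 ms of 8 kHz mulaw.

```typescript
// Illustrative types for the four events; only the fields used here are modeled.
type StartEvent = {
  event: "start";
  start: { streamSid: string; callSid: string; accountSid: string; customParameters?: Record<string, string> };
};
type MediaEvent = { event: "media"; media: { payload: string } }; // payload is base64 mulaw
type MarkEvent = { event: "mark"; mark: { name: string } };
type StopEvent = { event: "stop" };
type StreamEvent = StartEvent | MediaEvent | MarkEvent | StopEvent;

// Hypothetical helper: returns null for messages it does not model (for example "connected").
function parseStreamEvent(raw: string): StreamEvent | null {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "start":
    case "media":
    case "mark":
    case "stop":
      return msg as StreamEvent;
    default:
      return null;
  }
}

// Frame arithmetic: 8000 samples/s for 20 ms, one byte per mulaw sample.
const BYTES_PER_FRAME = (8000 * 20) / 1000;
// Base64 encodes every 3 bytes as 4 characters, padded to a multiple of 4.
const BASE64_CHARS_PER_FRAME = Math.ceil(BYTES_PER_FRAME / 3) * 4;
```

Under these assumptions each `media` payload is 160 bytes of audio, or 216 base64 characters, which is a useful sanity check when a TTS engine hands back frames of a different size.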

## Architecture / config

```mermaid
flowchart LR
  PSTN[Caller / PSTN] --> TW[Twilio Voice]
  TW -- TwiML <Stream bidirectional> --> WS[wss://yourapp/stream]
  WS -- inbound mulaw --> STT[STT or Realtime API]
  STT --> LLM[LLM turn]
  LLM --> TTS[TTS or Realtime API]
  TTS -- outbound mulaw --> WS
  WS -- "mark" events --> BARGE[Barge-in detector]
  BARGE -- "clear" --> WS
```

Four patterns we run in production:

1. **Proxy-to-Realtime** — your WS server proxies frames straight into OpenAI Realtime over a second WS. ~120 ms median round trip.
2. **Sidecar STT + LLM + TTS** — split STT (Deepgram), LLM (Anthropic / OpenAI Chat), TTS (ElevenLabs streaming). Higher latency (~450 ms) but per-stage observability.
3. **Conference fork** — call goes into a Twilio `<Conference>`, you fork audio to your AI stream, and an AI participant is added back via a TwiML App. Useful when the AI acts as a third party on the call.
4. **Replay-on-reconnect** — buffer last 8 s of inbound + last 4 s of outbound on Redis; on `stop` followed by a new `start` with the same `callSid`, replay so the LLM has continuity.
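
Pattern 4 can be sketched as a pair of bounded frame buffers per call. This is an in-memory stand-in for the Redis buffer, assuming 20 ms frames (so 8 s is 400 frames and 4 s is 200 frames); `ReplayBuffer` and `bufferFor` are hypothetical names, and a production version would need expiry and persistence.

```typescript
const FRAME_MS = 20; // assumed frame duration

class ReplayBuffer {
  private inbound: string[] = [];
  private outbound: string[] = [];
  private inboundMax: number;
  private outboundMax: number;

  constructor(inboundMs: number = 8000, outboundMs: number = 4000) {
    this.inboundMax = Math.floor(inboundMs / FRAME_MS);
    this.outboundMax = Math.floor(outboundMs / FRAME_MS);
  }

  // Drop the oldest frames once the window is full.
  private static trim(buf: string[], max: number): void {
    if (buf.length > max) buf.splice(0, buf.length - max);
  }

  pushInbound(frameB64: string): void {
    this.inbound.push(frameB64);
    ReplayBuffer.trim(this.inbound, this.inboundMax);
  }

  pushOutbound(frameB64: string): void {
    this.outbound.push(frameB64);
    ReplayBuffer.trim(this.outbound, this.outboundMax);
  }

  // Copy of both windows, to replay when a new `start` arrives for a known call.
  snapshot(): { inbound: string[]; outbound: string[] } {
    return { inbound: this.inbound.slice(), outbound: this.outbound.slice() };
  }
}

// Keyed by callSid rather than streamSid, so the buffer survives a reconnect.
const buffers: { [callSid: string]: ReplayBuffer } = {};

function bufferFor(callSid: string): ReplayBuffer {
  if (!buffers[callSid]) buffers[callSid] = new ReplayBuffer();
  return buffers[callSid];
}
```

Keying on `callSid` is the design choice that matters here: a reconnect produces a new `streamSid`, so anything keyed on the stream would be orphaned.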

## CallSphere implementation

CallSphere runs **Twilio across all six verticals**. The Healthcare agent fronts a FastAPI service on port `:8084` that proxies bidirectional audio into OpenAI Realtime; Sales runs five concurrent outbound calls per account with separate WS workers; the After-hours agent fires a simultaneous voice call + SMS in a 120-second race. Every leg flows through the same `/twilio/stream` Fastify route, with `streamSid` keyed into Postgres for replay.

Stack snapshot:

- **37 specialized agents · 90+ tools · 115+ DB tables · 6 verticals**.
- **HIPAA + SOC 2** — TLS to the WS, mulaw recording opt-in per tenant, BAA covers Twilio + OpenAI.
- **$149 / $499 / $1499 plans · 14-day trial · 22% lifetime affiliate**.

## Build steps with code

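```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <!-- Placeholder host; the path matches the /twilio/stream route below -->
    <Stream url="wss://yourapp/twilio/stream" />
  </Connect>
</Response>
```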

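```ts
// Fastify WS handler: inbound frames go to the model, mark-gated barge-in.
import Fastify from "fastify";
import websocket from "@fastify/websocket";

// `openai` is an app-level wrapper around the Realtime WebSocket, defined elsewhere.
// `conn.socket` follows the older @fastify/websocket handler signature.
declare const openai: {
  sendAudio(b64: string): void;
  flush(): void;
  on(event: "audio", cb: (b64: string) => void): void;
  off(event: "audio", cb: (b64: string) => void): void;
};
const app = Fastify();
app.register(websocket);
app.get("/twilio/stream", { websocket: true }, (conn) => {
  let streamSid = "";
  const onAudio = (b64: string) => {
    if (!streamSid) return; // Twilio drops outbound frames that lack streamSid
    conn.socket.send(JSON.stringify({ event: "media", streamSid, media: { payload: b64 } }));
    conn.socket.send(JSON.stringify({ event: "mark", streamSid, mark: { name: "tts-end" } }));
  };
  openai.on("audio", onAudio);
  conn.socket.on("message", (raw) => {
    const evt = JSON.parse(raw.toString());
    if (evt.event === "start") streamSid = evt.start.streamSid;
    if (evt.event === "media") openai.sendAudio(evt.media.payload);
    if (evt.event === "mark" && evt.mark.name === "tts-end") openai.flush();
  });
  conn.socket.on("close", () => openai.off("audio", onAudio)); // avoid leaking listeners
});
```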
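
The handler above tags every chunk with the same mark name, which makes it hard to tell which chunk has finished playing. One way to gate barge-in on marks is to give each chunk a unique name and track the ones Twilio has not echoed yet. The sketch below assumes that approach; `BargeInGate` is a hypothetical helper, `send` is whatever function writes JSON to the Twilio socket, and the caller-speech signal is assumed to come from your own VAD or the model's speech-start event. The `media`, `mark`, and `clear` message shapes are the ones Twilio accepts on a bidirectional stream.

```typescript
// Hypothetical helper: tracks marks sent to Twilio that have not been echoed back yet.
class BargeInGate {
  private pending: string[] = [];
  private seq = 0;
  private streamSid: string;
  private send: (msg: object) => void;

  constructor(streamSid: string, send: (msg: object) => void) {
    this.streamSid = streamSid;
    this.send = send;
  }

  // Send one outbound audio chunk followed by a uniquely named mark.
  sendAudio(payloadB64: string): void {
    const name = "tts-" + this.seq++;
    this.send({ event: "media", streamSid: this.streamSid, media: { payload: payloadB64 } });
    this.send({ event: "mark", streamSid: this.streamSid, mark: { name: name } });
    this.pending.push(name);
  }

  // Call when Twilio echoes a mark: that chunk, and every earlier one, has played.
  onMarkEcho(name: string): void {
    const i = this.pending.indexOf(name);
    if (i >= 0) this.pending.splice(0, i + 1);
  }

  // Agent audio is still queued or playing while any mark is un-echoed.
  isAgentSpeaking(): boolean {
    return this.pending.length > 0;
  }

  // Call when caller speech is detected. Returns true if playback was interrupted.
  onCallerSpeech(): boolean {
    if (!this.isAgentSpeaking()) return false;
    this.send({ event: "clear", streamSid: this.streamSid }); // flush Twilio's playback buffer
    this.pending = [];
    return true;
  }
}
```

Echoes that arrive for marks already dropped by `onCallerSpeech` are simply ignored, since `onMarkEcho` only acts on names it still tracks.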

## Pitfalls

- **Using `<Start><Stream>` instead of `<Connect><Stream>`**: `<Start>` streams are receive-only, so you'll silently get one-way audio and waste an afternoon.
- **Not echoing `streamSid`** in outbound media — Twilio drops the frame.
- **Using sample rate 16 kHz**: `<Stream>` is mulaw 8 kHz only on PSTN, so resample before sending.
- **Treating audio buffer length as barge-in** — race condition. Trust `mark` events.
- **Logging full base64 frames** — explodes Datadog cost; log every 200th frame at most.
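
For the logging pitfall, a small sampler keeps frame logs bounded. This is a sketch only; `makeFrameLogger` is a hypothetical helper that returns a log line for every Nth frame and records metadata instead of the payload itself.

```typescript
// Hypothetical sampler: yields a log line for every `every`-th frame, null otherwise.
function makeFrameLogger(every: number = 200): (payloadB64: string) => string | null {
  let count = 0;
  return (payloadB64: string) => {
    count++;
    if (count % every !== 0) return null;
    // Log the frame count and payload size only, never the base64 audio.
    return "frame=" + count + " b64_len=" + payloadB64.length;
  };
}
```

In a handler you would call the returned function once per `media` event and pass any non-null result to your logger.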

## FAQ

**Q: How many bidirectional streams per Twilio account?**
Default cap is 100 concurrent; raise via support ticket. We run 800 concurrent in production.

**Q: Mulaw vs PCM?**
PSTN is mulaw 8 kHz. Twilio `<Stream>` does not transcode for you, so your TTS must output mulaw or you must resample server-side.
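
If your TTS only emits 16-bit PCM at 16 kHz, the server-side conversion might look like the sketch below. The encoder follows the standard G.711 mu-law algorithm (bias 0x84, clip 32635). The 2:1 decimation simply averages sample pairs, which is a crude low-pass chosen here for brevity; a proper resampler with an anti-aliasing filter will sound better. Both function names are illustrative.

```typescript
// G.711 mu-law encoder for one 16-bit signed PCM sample.
function linearToMulaw(sample: number): number {
  const BIAS = 0x84;
  const CLIP = 32635;
  const sign = sample < 0 ? 0x80 : 0;
  const s = Math.min(Math.abs(sample), CLIP) + BIAS;
  // Exponent is the position of the highest set bit among bits 14..7.
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Crude 16 kHz to 8 kHz conversion: average each pair of samples, then encode.
function pcm16kToMulaw8k(pcm: Int16Array): Uint8Array {
  const out = new Uint8Array(pcm.length >> 1);
  for (let i = 0; i < out.length; i++) {
    const avg = (pcm[2 * i] + pcm[2 * i + 1]) >> 1;
    out[i] = linearToMulaw(avg);
  }
  return out;
}
```

Silence (a sample value of 0) encodes to `0xFF`, which is a quick way to confirm the encoder is wired correctly before base64-encoding the output into `media` payloads.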

**Q: Can I record while streaming?**
Yes. `<Stream>` works alongside standard call recording. Recordings are stored separately.

**Q: How do I detect dropped streams?**
Watch for `stop` events without prior `mark` echoes within 5 s. Reconnect with replay buffer.
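
One reading of that rule, sketched under stated assumptions: treat a `stop` as a drop when outbound audio was still awaiting a mark echo and no echo had arrived in the preceding 5 s. The `StreamHealth` shape and `isLikelyDrop` are hypothetical names, and the timestamps are injected so the logic stays testable.

```typescript
const MARK_ECHO_TIMEOUT_MS = 5000;

// Hypothetical per-stream bookkeeping, updated as marks are sent and echoed.
interface StreamHealth {
  pendingMarks: number; // marks sent but not yet echoed
  lastMarkEchoAtMs: number; // timestamp of the most recent mark echo
}

// True when `stop` arrived while audio was outstanding and echoes had gone quiet.
function isLikelyDrop(h: StreamHealth, stopAtMs: number): boolean {
  return h.pendingMarks > 0 && stopAtMs - h.lastMarkEchoAtMs > MARK_ECHO_TIMEOUT_MS;
}
```

A `stop` that arrives with no pending marks is more likely a normal hangup, so it should not trigger the replay path.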

**Q: Latency floor?**
~80 ms one-way Twilio→WS in us-east-1. Add LLM + TTS to estimate end-to-end.

## Sources

- [Twilio Docs — Media Streams Overview](https://www.twilio.com/docs/voice/media-streams)
- [Twilio Blog — AI Voice Assistant + OpenAI Realtime](https://www.twilio.com/en-us/blog/voice-ai-assistant-openai-realtime-api-node)
- [Twilio Blog — ConversationRelay Launch](https://www.twilio.com/en-us/blog/products/launches/the-evolution-of-conversation-relay)
- [Twilio Blog — Agent Connect GA](https://www.twilio.com/en-us/blog/products/launches/agent-connect)

