---
title: "Replace Vapi with Twilio Media Streams + GPT-4o Realtime"
description: "Vapi orchestrates STT, LLM and TTS as separate services — fast, but you pay 3 vendor markups. Collapse the stack to Twilio + GPT-4o Realtime and own the orchestration."
canonical: https://callsphere.ai/blog/vw3h-replace-vapi-with-twilio-media-streams-gpt-4o-realtime
category: "AI Engineering"
tags: ["Vapi", "Twilio", "OpenAI Realtime", "Migration", "Tutorial"]
author: "CallSphere Team"
published: 2026-03-19T00:00:00.000Z
updated: 2026-05-07T09:59:34.124Z
---

# Replace Vapi with Twilio Media Streams + GPT-4o Realtime

> Vapi orchestrates STT, LLM and TTS as separate services — fast, but you pay 3 vendor markups. Collapse the stack to Twilio + GPT-4o Realtime and own the orchestration.

> **TL;DR** — Vapi is a great prototyping layer (5-minute "hello phone"), but at scale the orchestration cost and the inability to share state across calls become painful. Move to Twilio Media Streams + GPT-4o Realtime, keep the same assistant config in code, and cut per-minute cost by ~30%.

## What you'll build

A Node.js service that exposes the same surface as a Vapi assistant (`POST /assistant/start`, server-tool webhooks, transcript stream) but speaks directly to GPT-4o Realtime. Your existing Vapi tool URLs keep working with a thin shim.

## Prerequisites

1. Vapi account with at least one assistant in production and 30+ days of `call.ended` webhooks logged.
2. Twilio account with a programmable voice number.
3. OpenAI API key with Realtime access.
4. Node.js 22+, `ws`, `fastify`, `@fastify/websocket`, `twilio`.
5. A diff tool — you'll be comparing tool-call sequences.

## Architecture

```mermaid
flowchart LR
  C[Caller] -->|PSTN| TW[Twilio Number]
  TW -->|Media Streams WSS| SH[Shim Service]
  SH -->|WSS Realtime| OAI[GPT-4o Realtime]
  SH -->|HTTPS webhooks| TOOLS[Your existing Vapi tool URLs]
```

## Step 1 — Export the Vapi assistant config

```bash
curl -H "Authorization: Bearer $VAPI_KEY" \
  "https://api.vapi.ai/assistant/$ID" > assistant.json
```

You only need three sections: `model.messages` (system prompt), `model.tools` (function definitions), and `voice` settings.

## Step 2 — Translate to OpenAI session.update

```js
import fs from "node:fs";
const v = JSON.parse(fs.readFileSync("assistant.json", "utf8"));
const sessionUpdate = {
  type: "session.update",
  session: {
    instructions: v.model.messages.find(m => m.role === "system").content,
    voice: v.voice?.voiceId === "rachel" ? "shimmer" : "alloy",
    input_audio_format: "g711_ulaw",
    output_audio_format: "g711_ulaw",
    turn_detection: { type: "server_vad", silence_duration_ms: 320 },
    tools: v.model.tools.map(t => ({
      type: "function",
      name: t.function.name,
      description: t.function.description,
      parameters: t.function.parameters,
    })),
  },
};
```

## Step 3 — Wire Twilio Media Streams to OpenAI

```js
import Fastify from "fastify";
import websocket from "@fastify/websocket";
import WebSocket from "ws";

const app = Fastify();
await app.register(websocket);

app.get("/media", { websocket: true }, (conn) => {
  let streamSid; // Twilio rejects outbound media frames without this
  const oai = new WebSocket(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
    { headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
                 "OpenAI-Beta": "realtime=v1" }});
  oai.on("open", () => oai.send(JSON.stringify(sessionUpdate)));
  conn.socket.on("message", (raw) => {
    const ev = JSON.parse(raw);
    if (ev.event === "start") streamSid = ev.start.streamSid;
    if (ev.event === "media")
      oai.send(JSON.stringify({ type: "input_audio_buffer.append",
                                audio: ev.media.payload }));
  });
  oai.on("message", (raw) => {
    const ev = JSON.parse(raw);
    if (ev.type === "response.audio.delta")
      conn.socket.send(JSON.stringify({ event: "media", streamSid,
                                        media: { payload: ev.delta }}));
  });
  conn.socket.on("close", () => oai.close()); // caller hung up
});
```
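
The stream handler above only receives traffic once the number's voice webhook answers with TwiML that opens a Media Stream. A minimal sketch of that TwiML builder, with `host` standing in for wherever you deploy the shim:

```javascript
// Voice webhook body: tells Twilio to open a bidirectional Media Stream
// to the shim's /media WebSocket once the call connects.
function mediaStreamTwiml(host) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="wss://${host}/media" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

Register a plain HTTP route for it (e.g. `app.post("/incoming-call", (req, reply) => reply.type("text/xml").send(mediaStreamTwiml("your-shim.example.com")))`) and set that route as the number's voice URL.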

## Step 4 — Forward tool calls to your existing Vapi webhooks

Vapi posts `{message:{toolCalls:[{function:{name,arguments}}]}}`. Mirror that envelope so endpoints don't change:

```js
async function handleToolCall(name, args, callId) {
  // One env var per tool, e.g. TOOL_URL_CHECK_AVAILABILITY
  const url = process.env[`TOOL_URL_${name.toUpperCase()}`];
  if (!url) throw new Error(`No tool URL configured for "${name}"`);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: { toolCalls: [{ id: callId,
      type: "function", function: { name, arguments: JSON.stringify(args) }}]}}),
  });
  const data = await res.json();
  return data.results?.[0]?.result ?? data;
}
```
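
The webhook leg is only half the loop: the model won't keep talking until the result comes back over the Realtime socket. A sketch of that return path, using the Realtime event names from OpenAI's docs (`callTool` is whatever executes the webhook — `handleToolCall` above, in this shim):

```javascript
// After the model finishes streaming a function call, run the webhook and
// hand the result back, then ask the model to produce the follow-up audio.
async function relayToolResult(oai, ev, callTool) {
  if (ev.type !== "response.function_call_arguments.done") return false;
  const result = await callTool(ev.name, JSON.parse(ev.arguments), ev.call_id);
  oai.send(JSON.stringify({
    type: "conversation.item.create",
    item: { type: "function_call_output",
            call_id: ev.call_id,
            output: JSON.stringify(result) },
  }));
  oai.send(JSON.stringify({ type: "response.create" })); // resume speaking
  return true;
}
```

Call it from the `oai.on("message")` handler alongside the audio-delta branch.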

## Step 5 — Stream transcripts to your warehouse

OpenAI emits `response.audio_transcript.done`; capture both sides and write to BigQuery or Postgres for the same dashboards Vapi gave you.
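
A minimal mapper sketch for those events; the event types are from the Realtime API (the caller side only fires if `input_audio_transcription` is enabled in `session.update`), and the returned rows feed whatever BigQuery/Postgres insert you already use:

```javascript
// Map Realtime transcript events to warehouse rows: assistant side arrives
// on response.audio_transcript.done, caller side on
// conversation.item.input_audio_transcription.completed.
function transcriptRow(ev, callSid) {
  if (ev.type === "response.audio_transcript.done")
    return { callSid, speaker: "assistant", text: ev.transcript };
  if (ev.type === "conversation.item.input_audio_transcription.completed")
    return { callSid, speaker: "caller", text: ev.transcript };
  return null; // not a transcript event
}
```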

## Step 6 — Smoke-test against the Vapi reference

Hit the same DID through both stacks 50 times and diff tool-call order. Anything > 5% drift means your prompt is too implicit — add explicit tool-use guardrails.
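
Drift can be computed mechanically. One sketch, assuming you've reduced each call's log to an ordered array of tool names:

```javascript
// Fraction of calls whose tool-call sequence differs from the Vapi reference.
// Runs are paired by index; order within a run matters.
function toolCallDrift(referenceRuns, candidateRuns) {
  const key = (run) => run.join(">");
  let mismatches = 0;
  for (let i = 0; i < referenceRuns.length; i++) {
    if (key(referenceRuns[i]) !== key(candidateRuns[i] ?? [])) mismatches++;
  }
  return mismatches / referenceRuns.length;
}
```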

## Step 7 — Cut over per-DID

Twilio lets you point each number at a different webhook. Move one number a day for a week.
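
A rollout-plan sketch for that cadence; the Twilio side is a separate step per entry, on its date (e.g. `client.incomingPhoneNumbers(sid).update({ voiceUrl })` in the Node SDK, not shown here):

```javascript
// One DID per day, starting from a given ISO date.
const DAY_MS = 86_400_000;
function cutoverSchedule(numberSids, startIso) {
  const start = Date.parse(startIso); // bare ISO dates parse as UTC midnight
  return numberSids.map((sid, i) => ({
    sid,
    cutoverOn: new Date(start + i * DAY_MS).toISOString().slice(0, 10),
  }));
}
```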

## Common pitfalls

- **Tool schemas with `oneOf`.** OpenAI Realtime is stricter than Vapi — flatten unions.
- **Voice mismatch.** Pre-record a 10-second comparison clip per voice to avoid customer surprise.
- **Silent failures.** Always log `response.error` events; Vapi hid these.
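
A naive flattener sketch for the `oneOf` pitfall — it merges every branch's properties into a single object schema, which is lossy (exclusivity is gone), so the "pick one variant" rule moves into the description for the model to follow:

```javascript
// Lossy oneOf flattener for Realtime tool schemas: union branches become
// one property bag, and the exclusivity constraint becomes prose.
function flattenOneOf(schema) {
  if (!schema.oneOf) return schema;
  const properties = {};
  for (const branch of schema.oneOf) {
    Object.assign(properties, branch.properties ?? {});
  }
  const description =
    `${schema.description ?? ""} Provide exactly one variant's fields.`.trim();
  return { type: "object", properties, description };
}
```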

## How CallSphere does this in production

CallSphere does not use Vapi or any orchestration vendor — every voice path is direct OpenAI Realtime, ElevenLabs or self-hosted Whisper, glued by a CallSphere-owned dispatcher. 37 specialist agents, 90+ tools, 115+ DB tables. Healthcare runs FastAPI on :8084 with HIPAA logging, OneRoof Property dispatches across 10 specialists over WebRTC + Pion + NATS, Salon ships ElevenLabs with `GB-YYYYMMDD-###` booking IDs. Try the [demo](/demo) or compare on [/compare/vapi](/compare/vapi).

## FAQ

**Does Vapi block exports?** No — assistants and tools are JSON-exportable.

**What about Vapi's analytics?** Replace with Postgres + Metabase or Honeycomb; richer and cheaper at scale.

**Can I keep Vapi for prototyping?** Yes — many teams prototype on Vapi, ship on direct.

**Latency parity?** OpenAI Realtime hits 600–800ms; Vapi runs 750–1100ms. Direct usually wins.

**Cost at 50k min/mo?** Vapi: ~$5,250. Direct: ~$3,400 + Twilio.
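
For plugging in your own volume, a back-of-envelope helper using the per-minute rates implied by those totals (Twilio's PSTN per-minute charge is extra, as noted):

```javascript
// Implied per-minute rates from the 50k min/mo figures above.
const VAPI_PER_MIN = 5250 / 50_000;    // $0.105/min
const DIRECT_PER_MIN = 3400 / 50_000;  // $0.068/min
const monthlySavings = (minutes) => minutes * (VAPI_PER_MIN - DIRECT_PER_MIN);
```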

## Sources

- [Vapi Docs](https://docs.vapi.ai/quickstart/introduction)
- [OpenAI Realtime Quickstart](https://platform.openai.com/docs/guides/realtime)
- [Twilio + OpenAI integration](https://www.twilio.com/en-us/blog/twilio-openai-realtime-api-launch-integration)
- [/compare/vapi](https://callsphere.ai/compare/vapi)

