TL;DR — Retell ships a polished flow builder and $0.07/min PSTN, but it gates real WebRTC, agent-side video and multi-room orchestration. LiveKit Cloud's official OpenAI Realtime plugin gives you all three on infrastructure OpenAI itself partnered with for Advanced Voice.

What you'll build

A Python LiveKit agent that connects to a room, talks to OpenAI Realtime, supports video frame input (when the user shares camera), and warm-transfers to a human via SIP. Same UX a Retell flow gives you, but you own the audio path end-to-end.

Prerequisites

Retell account with at least one production flow + 30 days of analytics.
LiveKit Cloud project (free tier covers prototyping).
OpenAI API key with Realtime access (gpt-4o-realtime-preview).
Python 3.11+, livekit-agents, livekit-plugins-openai.
Twilio Elastic SIP for PSTN ingress.
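
Before starting the worker, it can help to confirm the credentials from the list above are actually set. The sketch below assumes the conventional `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET` and `OPENAI_API_KEY` environment variables; the `missing_settings` helper is hypothetical and not part of any SDK.

```python
import os

# Environment variables the agent is assumed to read at startup.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_settings()` with no arguments checks the real environment; an empty list means the worker has what it needs to connect.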

Architecture

sequenceDiagram
  participant C as Caller (PSTN)
  participant SIP as Twilio Elastic SIP
  participant LK as LiveKit Cloud SFU
  participant AG as Python agent
  participant OAI as OpenAI Realtime
  C->>SIP: PSTN INVITE
  SIP->>LK: SIP -> Room
  AG->>LK: dispatch agent into Room
  AG->>OAI: WSS Realtime
  LK<-->AG: bidirectional Opus
  AG-->>LK: TTS audio frames
  LK-->>SIP: Opus
  SIP-->>C: PSTN audio

Step 1 — Spin up the agent

```python
from pathlib import Path

from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import openai


class Receptionist(Agent):
    def __init__(self):
        # The system prompt lives in its own file so it can change without a code edit.
        super().__init__(instructions=Path("prompts/recept.md").read_text())


async def entrypoint(ctx: agents.JobContext):
    # The Realtime model handles speech in and speech out in one connection.
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(
            voice="alloy",
            model="gpt-4o-realtime-preview-2025-06-03",
        ),
    )
    await session.start(
        agent=Receptionist(),
        room=ctx.room,
        # Video input is used in Step 5.
        room_input_options=RoomInputOptions(video_enabled=True),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

Step 2 — Wire Twilio Elastic SIP to LiveKit

LiveKit's SIP service needs an inbound trunk. Create one:

```bash
lk sip inbound create '{
  "name": "twilio-trunk",
  "numbers": ["+18452345678"],
  "auth_username": "lk_user",
  "auth_password": ""
}'
```

Then, on the Twilio side, point the trunk's "Origination URI" at sip:<your-lk-host>:5060.
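
If you manage several numbers, generating the trunk definition from code helps avoid typos. The sketch below builds a JSON document with the same fields as the command in this step and rejects numbers that are not in E.164 form. The `build_inbound_trunk` helper is hypothetical, and the regular expression is only a loose format check, not full phone-number validation.

```python
import json
import re

# Loose E.164 check: a plus sign, a non-zero digit, then up to 14 more digits.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def build_inbound_trunk(name, numbers, username, password):
    """Return the trunk definition as a JSON string, validating each number."""
    bad = [number for number in numbers if not E164.match(number)]
    if bad:
        raise ValueError(f"not in E.164 format: {bad}")
    trunk = {
        "name": name,
        "numbers": list(numbers),
        "auth_username": username,
        "auth_password": password,
    }
    return json.dumps(trunk, indent=2)
```

The returned string can be pasted into the CLI call or written to a file kept under version control (without the password).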

Step 3 — Define tools

LiveKit agents register tools via decorators:

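```python
from livekit.agents import Agent, RunContext, function_tool


class Receptionist(Agent):
    @function_tool
    async def book_appointment(
        self, ctx: RunContext, name: str, phone: str, slot_iso: str
    ) -> dict:
        """Book an appointment for the caller at the given ISO 8601 time slot."""
        # `crm` stands in for your own booking client; it is not part of LiveKit.
        return await crm.book(name=name, phone=phone, when=slot_iso)
```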
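
Models sometimes pass loosely formatted arguments, so it is worth validating them before they reach the CRM. A minimal sketch, assuming the slot arrives as an ISO 8601 string with a UTC offset; `parse_slot` is a hypothetical helper you would call at the top of `book_appointment`.

```python
from datetime import datetime, timezone


def parse_slot(slot_iso: str) -> datetime:
    """Parse an ISO 8601 slot and normalize it to UTC.

    Raises ValueError when the string is malformed or has no UTC offset,
    so the tool can ask the caller to repeat the time instead of guessing.
    """
    when = datetime.fromisoformat(slot_iso)
    if when.tzinfo is None:
        raise ValueError("slot_iso must include a UTC offset")
    return when.astimezone(timezone.utc)
```

Rejecting naive timestamps up front avoids silently booking a slot in the server's local time zone.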

Step 4 — Warm transfer to a human

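```python
class Receptionist(Agent):
    @function_tool
    async def transfer_to_human(self, ctx: RunContext, reason: str):
        """Hand the caller over to a human agent."""
        # Tell the other participants in the room why the call is moving.
        await ctx.proc.job.room.local_participant.publish_data(
            f"transferring: {reason}".encode(), reliable=True
        )
        # `sip` is assumed to be a SIP helper already in scope; check the exact
        # transfer call against your installed livekit-agents version.
        await sip.transfer_call(
            participant=ctx.proc.job.participant,
            sip_uri="sip:human@your-pbx",
        )
```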

Step 5 — Add video understanding

OpenAI Realtime now accepts video frames. RoomInputOptions(video_enabled=True) in Step 1 already turns on video input; once the user shares their camera, the model receives frames from that track.

Step 6 — Migrate Retell flows

Retell flow nodes typically map to a single Agent + tools. Where Retell uses "branch" nodes, hand off to a separate specialist agent instead; in livekit-agents 1.x a function tool can return another Agent instance to transfer control cleanly.
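
To plan the migration before writing any agent code, you can group exported nodes by the specialist that will own them. The sketch below is illustrative only: the node dictionaries (`agent`, `type`, `name`, `target`) are a hypothetical shape, not Retell's actual export format, and `plan_agents` is not part of LiveKit.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPlan:
    name: str
    tools: list = field(default_factory=list)      # function nodes become tools
    handoffs: list = field(default_factory=list)   # branch nodes become handoffs


def plan_agents(nodes):
    """Group flow nodes into one plan per agent, in first-seen order."""
    plans = {}
    for node in nodes:
        plan = plans.setdefault(node["agent"], AgentPlan(name=node["agent"]))
        if node["type"] == "function":
            plan.tools.append(node["name"])
        elif node["type"] == "branch":
            plan.handoffs.append(node["target"])
        # Conversation nodes fold into the agent's instructions, so they add no entry.
    return plans
```

Each resulting plan tells you which tools to register on an agent and which other agents it needs to be able to reach.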

Step 7 — Cutover plan

Move 10% of inbound to LiveKit for a week. Compare booking conversion and average handle time. Ramp 10 → 30 → 70 → 100.
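
One way to keep the split stable during the ramp is to bucket callers deterministically, so the same caller always lands on the same stack and a caller routed at 10% stays routed at 30%. This is a sketch under the assumption that you route on caller ID; `routes_to_livekit` is a hypothetical helper, and applying its decision on the Twilio side is left to your routing layer.

```python
import hashlib

# Ramp percentages from the cutover plan.
RAMP_STAGES = (10, 30, 70, 100)


def routes_to_livekit(caller_id: str, percent: int) -> bool:
    """Return True when this caller falls inside the current ramp percentage."""
    digest = hashlib.sha256(caller_id.encode("utf-8")).digest()
    # Map the first four bytes of the hash onto a bucket from 0 to 99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket depends only on the caller ID, raising the percentage only adds callers to the LiveKit side, which keeps the week-over-week comparison of conversion and handle time cleaner.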

Common pitfalls

SIP NAT. LiveKit Cloud handles SIP traversal; on-prem deployments need a public IP.
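Plugin version drift. Pin livekit-agents and livekit-plugins-openai to exact versions; the API moves quarterly, and the AgentSession interface used in Step 1 belongs to the 1.x line, not 0.x.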
Long-form context. Realtime drops oldest turns past ~25k tokens; offload to a tool that fetches context on demand.
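
The "fetch context on demand" idea can be sketched as a small store that keeps the full transcript outside the model and returns only the turns that match a query. Everything here is hypothetical: `ContextStore` is not a LiveKit or OpenAI class, and the keyword match is a stand-in for whatever search you actually use. In practice you would expose `search` through a function tool.

```python
class ContextStore:
    """Keeps every turn of the call so older ones can be looked up later."""

    def __init__(self):
        self._turns = []

    def add(self, speaker: str, text: str) -> None:
        self._turns.append((speaker, text))

    def search(self, keyword: str, limit: int = 3) -> list:
        """Return up to `limit` of the most recent turns containing the keyword."""
        if limit <= 0:
            return []
        needle = keyword.lower()
        hits = [f"{speaker}: {text}" for speaker, text in self._turns
                if needle in text.lower()]
        return hits[-limit:]
```

Returning only a few matching turns keeps the tool result small, so the lookup itself does not push the session back toward the context limit.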

How CallSphere does this in production

CallSphere's OneRoof Property stack runs 10 specialists over WebRTC + Pion + NATS — a similar shape to LiveKit but self-hosted for cost at scale. Healthcare uses OpenAI Realtime over FastAPI :8084 with HIPAA logging across 14 tools. Salon runs ElevenLabs voices and GB-YYYYMMDD-### references on 4 agents. 37 agents total, 90+ tools, 115+ DB tables, 6 verticals. /compare/retell · /pricing.

FAQ

Will Retell's voice match LiveKit's? Both use OpenAI voices — choose the same voice parameter.

Latency? LiveKit Cloud + Realtime hits 550–700ms p50.

SOC2/HIPAA? LiveKit Cloud is SOC2; HIPAA via BAA.

Can I A/B inside the same number? Yes — Twilio's <Dial><Sip> weights work fine.

Outbound campaigns? Use livekit-cli sip create-call to dial out.
