Replace Retell With LiveKit Cloud + OpenAI Realtime
Retell's flow builder is great for SMB. When you need real WebRTC, video frames and SFU control, LiveKit Cloud + Realtime is the right move. Full code path.
TL;DR — Retell ships a polished flow builder and $0.07/min PSTN, but it gates real WebRTC, agent-side video and multi-room orchestration. LiveKit Cloud's official OpenAI Realtime plugin gives you all three on infrastructure OpenAI itself partnered with for Advanced Voice.
What you'll build
A Python LiveKit agent that connects to a room, talks to OpenAI Realtime, supports video frame input (when the user shares camera), and warm-transfers to a human via SIP. Same UX a Retell flow gives you, but you own the audio path end-to-end.
Prerequisites
- Retell account with at least one production flow + 30 days of analytics.
- LiveKit Cloud project (free tier covers prototyping).
- OpenAI API key with Realtime access (gpt-4o-realtime-preview).
- Python 3.11+, livekit-agents, livekit-plugins-openai.
- Twilio Elastic SIP for PSTN ingress.
Architecture
sequenceDiagram
participant C as Caller (PSTN)
participant SIP as Twilio Elastic SIP
participant LK as LiveKit Cloud SFU
participant AG as Python agent
participant OAI as OpenAI Realtime
C->>SIP: PSTN INVITE
SIP->>LK: SIP -> Room
AG->>LK: dispatch agent into Room
AG->>OAI: WSS Realtime
LK<-->AG: bidirectional Opus
AG-->>LK: TTS audio frames
LK-->>SIP: Opus
SIP-->>C: PSTN audio
Step 1 — Spin up the agent
```python
from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import openai


class Receptionist(Agent):
    def __init__(self):
        # The system prompt lives in a local file so it can be edited without code changes.
        with open("prompts/recept.md") as f:
            super().__init__(instructions=f.read())


async def entrypoint(ctx: agents.JobContext):
    # The Realtime model handles speech in and speech out over one connection.
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(
            voice="alloy",
            model="gpt-4o-realtime-preview-2025-06-03",
        ),
    )
    await session.start(
        agent=Receptionist(),
        room=ctx.room,
        # video_enabled lets the agent receive camera frames (see Step 5).
        room_input_options=RoomInputOptions(video_enabled=True),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
Step 2 — Wire Twilio Elastic SIP to LiveKit
LiveKit's SIP service needs an inbound trunk. Create one:
```bash
lk sip inbound create '{
  "name": "twilio-trunk",
  "numbers": ["+18452345678"],
  "auth_username": "lk_user",
  "auth_password": "…"
}'
```
Then on the Twilio side point the trunk's "Origination URI" at sip:<your-lk-host>:5060.
Step 3 — Define tools
LiveKit agents register tools via decorators:
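```python
from livekit.agents import Agent, function_tool, RunContext

# `crm` stands for your own booking client; it is not defined in this snippet.


class Receptionist(Agent):
    @function_tool
    async def book_appointment(
        self, ctx: RunContext, name: str, phone: str, slot_iso: str
    ) -> dict:
        # The model supplies the arguments; the return value is fed back to it.
        return await crm.book(name=name, phone=phone, when=slot_iso)
```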
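Because the model fills in `phone` and `slot_iso` as free-form strings, it can help to check them before they reach the CRM. The following is a minimal sketch using only the standard library; `validate_booking_args`, its digit-count rule and the choice to treat naive timestamps as UTC are all assumptions for illustration, not part of LiveKit or the original tool.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def validate_booking_args(
    phone: str, slot_iso: str, now: Optional[datetime] = None
) -> datetime:
    """Return the parsed slot, or raise ValueError with a message the agent can relay."""
    now = now or datetime.now(timezone.utc)

    # Strip formatting characters and check for a plausible number of digits.
    digits = re.sub(r"\D", "", phone)
    if not 10 <= len(digits) <= 15:
        raise ValueError("phone number must have 10 to 15 digits")

    try:
        slot = datetime.fromisoformat(slot_iso)
    except ValueError:
        raise ValueError("slot must be an ISO 8601 timestamp") from None

    # Assumption: timestamps without an offset are interpreted as UTC.
    if slot.tzinfo is None:
        slot = slot.replace(tzinfo=timezone.utc)
    if slot <= now:
        raise ValueError("slot must be in the future")
    return slot
```

Calling this at the top of `book_appointment` and returning the error text to the model gives it a chance to re-ask the caller instead of booking a bad slot.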
Step 4 — Warm transfer to a human
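```python
from livekit import api
from livekit.agents import function_tool, get_job_context, RunContext


# Method on Receptionist. Verify these calls against your installed SDK version.
@function_tool
async def transfer_to_human(self, ctx: RunContext, reason: str):
    job_ctx = get_job_context()
    room = job_ctx.room
    # Announce the transfer on the data channel (for dashboards or logging).
    await room.local_participant.publish_data(
        f"transferring: {reason}".encode(), reliable=True)
    # Assumption: the caller is the only remote participant in the room.
    caller = next(iter(room.remote_participants.values()))
    await job_ctx.api.sip.transfer_sip_participant(
        api.TransferSIPParticipantRequest(
            room_name=room.name,
            participant_identity=caller.identity,
            transfer_to="sip:human@your-pbx",
        ))
```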
Step 5 — Add video understanding
OpenAI Realtime now accepts video frames. RoomInputOptions(video_enabled=True) already enabled it; ask the user to share their camera and the model will consume frames.
Step 6 — Migrate Retell flows
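Retell flow nodes typically map to a single Agent plus its tools. Where Retell uses "branch" nodes, hand the conversation off to a specialist Agent instead; in current livekit-agents releases a handoff is done by returning the new Agent from a function tool, which keeps specialist transfers clean.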
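One way to picture the mapping, as a plain-Python sketch: each specialist is a record with its own instructions and tool names, and a branch node becomes a lookup from intent to specialist. The `Specialist` dataclass, the intent labels and `resolve_branch` are hypothetical names for illustration; they are not Retell or LiveKit APIs, and a real migration would build LiveKit `Agent` subclasses from records like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specialist:
    name: str
    instructions: str
    tools: tuple = ()


FRONT_DESK = Specialist("front_desk", "Greet the caller and find out what they need.")
BOOKING = Specialist("booking", "Book or reschedule appointments.", ("book_appointment",))
HUMAN = Specialist("human", "Hand the call to a person.", ("transfer_to_human",))

# One entry per outgoing edge of the (hypothetical) branch node.
BRANCHES = {
    "book": BOOKING,
    "reschedule": BOOKING,
    "complaint": HUMAN,
}


def resolve_branch(intent: str, current: Specialist = FRONT_DESK) -> Specialist:
    """Return the specialist for an intent, staying put when no branch matches."""
    return BRANCHES.get(intent, current)
```

Keeping the table as data makes it easy to diff against the exported Retell flow and spot branches that have no specialist yet.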
Step 7 — Cutover plan
Move 10% of inbound to LiveKit for a week. Compare booking conversion and average handle time. Ramp 10 → 30 → 70 → 100.
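A staged ramp is easier to compare when each caller lands on the same platform every time they call. The sketch below shows one possible way to do that with a stable hash of the caller ID; `bucket`, `route` and the use of the caller's number as the key are assumptions for illustration, and the actual split would be enforced wherever your inbound routing lives.

```python
import hashlib

# Ramp stages from the cutover plan, as percent of inbound sent to LiveKit.
RAMP_STAGES = (10, 30, 70, 100)


def bucket(caller_id: str) -> int:
    """Map a caller ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(caller_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(caller_id: str, livekit_percent: int) -> str:
    """Send a caller to LiveKit if their bucket falls under the current stage."""
    return "livekit" if bucket(caller_id) < livekit_percent else "retell"
```

Because the bucket never changes, a caller who was routed to LiveKit at 10% stays there at 30%, 70% and 100%, so conversion and handle-time comparisons are not muddied by callers switching sides mid-ramp.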
Common pitfalls
- SIP NAT. LiveKit Cloud handles SIP traversal; on-prem deployments need a public IP.
- Plugin version drift. Pin livekit-plugins-openai==0.x; the API moves quarterly.
- Long-form context. Realtime drops oldest turns past ~25k tokens; offload to a tool that fetches context on demand.
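For the long-form context pitfall, one possible approach is to archive older turns yourself before they fall out of the window and expose a lookup the model can call as a tool. This is a minimal sketch under stated assumptions: `split_for_budget` and `fetch_context` are hypothetical helpers, the word-count tokenizer is a crude stand-in for a real one, and keyword matching stands in for whatever retrieval you actually use.

```python
def split_for_budget(turns, max_tokens, count_tokens=None):
    """Split turns into (archived, kept); kept is the newest suffix within budget."""
    if count_tokens is None:
        # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
        count_tokens = lambda text: len(text.split())
    total = 0
    keep_from = len(turns)
    # Walk backwards from the newest turn until the budget is exhausted.
    for i in range(len(turns) - 1, -1, -1):
        total += count_tokens(turns[i])
        if total > max_tokens:
            break
        keep_from = i
    return turns[:keep_from], turns[keep_from:]


def fetch_context(archived, keyword):
    """Tool-style lookup: return archived turns that mention the keyword."""
    needle = keyword.lower()
    return [turn for turn in archived if needle in turn.lower()]
```

In an agent, `fetch_context` would sit behind a function tool so the model can pull an old detail back in only when the caller refers to it.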
How CallSphere does this in production
CallSphere's OneRoof Property stack runs 10 specialists over WebRTC + Pion + NATS, a similar shape to LiveKit but self-hosted for cost at scale. Healthcare uses OpenAI Realtime over FastAPI :8084 with HIPAA logging across 14 tools. Salon runs ElevenLabs voices and GB-YYYYMMDD-### references on 4 agents. 37 agents total, 90+ tools, 115+ DB tables, 6 verticals.
FAQ
Will Retell's voice match LiveKit's? Both use OpenAI voices — choose the same voice parameter.
Latency? LiveKit Cloud + Realtime hits 550–700ms p50.
SOC2/HIPAA? LiveKit Cloud is SOC2; HIPAA via BAA.
Can I A/B inside the same number? Yes — Twilio's <Dial><Sip> weights work fine.
Outbound campaigns? Use livekit-cli sip create-call to dial out.