By Sagar Shankaran, Founder of CallSphere
Retell's flow builder is great for SMB. When you need real WebRTC, video frames and SFU control, LiveKit Cloud + Realtime is the right move. Full code path.
Key takeaways
TL;DR — Retell ships a polished flow builder and $0.07/min PSTN, but it gates real WebRTC, agent-side video and multi-room orchestration. LiveKit Cloud's official OpenAI Realtime plugin gives you all three on infrastructure OpenAI itself partnered with for Advanced Voice.
This guide builds a Python LiveKit agent that connects to a room, talks to OpenAI Realtime, supports video frame input (when the user shares their camera), and warm-transfers to a human via SIP. It gives the same UX as a Retell flow, but you own the audio path end-to-end.
livekit-agents, livekit-plugins-openai.
sequenceDiagram
participant C as Caller (PSTN)
participant SIP as Twilio Elastic SIP
participant LK as LiveKit Cloud SFU
participant AG as Python agent
participant OAI as OpenAI Realtime
C->>SIP: PSTN INVITE
SIP->>LK: SIP -> Room
AG->>LK: dispatch agent into Room
AG->>OAI: WSS Realtime
LK<-->AG: bidirectional Opus
AG-->>LK: TTS audio frames
LK-->>SIP: Opus
SIP-->>C: PSTN audio
```python
from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import openai


class Receptionist(Agent):
    def __init__(self):
        # Load the system prompt from a file next to the agent code.
        super().__init__(instructions=open("prompts/recept.md").read())


async def entrypoint(ctx: agents.JobContext):
    # One session per job: the Realtime model handles speech in and speech out.
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(
            voice="alloy",
            model="gpt-4o-realtime-preview-2025-06-03",
        ),
    )
    await session.start(
        agent=Receptionist(),
        room=ctx.room,
        room_input_options=RoomInputOptions(video_enabled=True),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
LiveKit's SIP service needs an inbound trunk. Create one:
```bash
lk sip inbound create '{
"name": "twilio-trunk",
"numbers": ["+18452345678"],
"auth_username": "lk_user",
"auth_password": "
```
Then on the Twilio side point the trunk's "Origination URI" at sip:<your-lk-host>:5060.
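The trunk definition can also be generated from code, which keeps the password out of shell history and catches badly formatted numbers early. The following is a minimal sketch, assuming the same four fields shown in the command above. The `build_inbound_trunk` and `origination_uri` helpers, the `LK_SIP_PASSWORD` environment variable, and the example host are hypothetical names for illustration, not part of LiveKit's tooling.

```python
import json
import os
import re

# Loose E.164 check: "+" followed by 2 to 15 digits, no leading zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def build_inbound_trunk(name, numbers, username, password):
    """Return the trunk definition as a dict, mirroring the fields used above."""
    bad = [n for n in numbers if not E164.match(n)]
    if bad:
        raise ValueError(f"not E.164 formatted: {bad}")
    if not password:
        raise ValueError("auth_password must not be empty")
    return {
        "name": name,
        "numbers": list(numbers),
        "auth_username": username,
        "auth_password": password,
    }


def origination_uri(lk_host, port=5060):
    """Build the SIP URI for the Twilio trunk's Origination URI field."""
    return f"sip:{lk_host}:{port}"


if __name__ == "__main__":
    trunk = build_inbound_trunk(
        name="twilio-trunk",
        numbers=["+18452345678"],
        username="lk_user",
        # Hypothetical env var; the fallback is only a dry-run placeholder.
        password=os.environ.get("LK_SIP_PASSWORD", "change-me"),
    )
    print(json.dumps(trunk, indent=2))
    # Hypothetical host; substitute your own LiveKit SIP host.
    print(origination_uri("your-lk-host.example"))
```

Validating the numbers before creating the trunk avoids a class of silent routing failures where the number on the trunk does not match what the carrier sends.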
LiveKit agents register tools via decorators:
```python
from livekit.agents import Agent, function_tool, RunContext


class Receptionist(Agent):
    @function_tool
    async def book_appointment(
        self, ctx: RunContext, name: str, phone: str, slot_iso: str
    ) -> dict:
        # `crm` stands for your own CRM client, defined elsewhere in your code.
        return await crm.book(name=name, phone=phone, when=slot_iso)
```
```python
# Method on the Receptionist agent.
@function_tool
async def transfer_to_human(self, ctx: RunContext, reason: str):
    # Announce the transfer on the room's data channel, then move the caller over SIP.
    await ctx.proc.job.room.local_participant.publish_data(
        f"transferring: {reason}".encode(), reliable=True
    )
    await sip.transfer_call(
        participant=ctx.proc.job.participant, sip_uri="sip:human@your-pbx"
    )
```
OpenAI Realtime now accepts video frames. RoomInputOptions(video_enabled=True) already enabled it; ask the user to share their camera and the model will consume frames.
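Retell flow nodes typically map to a single Agent plus tools. Where Retell uses "branch" nodes, hand off to a specialist Agent instead; in livekit-agents 1.x this is typically done by returning the new Agent from a function tool, which gives clean specialist transfers.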
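To make the mapping concrete, here is a framework-free sketch: each Retell-style action node becomes a tool on one agent, and each branch node becomes a handoff to a specialist. The `FlowAgent` class, the intent names, and the two example agents are hypothetical stand-ins for illustration, not LiveKit's classes. In a real agent the same shape would be expressed with `Agent` subclasses and `@function_tool` methods.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class FlowAgent:
    """Hypothetical stand-in for an agent: a prompt, some tools, some handoffs."""

    name: str
    instructions: str
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    handoffs: Dict[str, "FlowAgent"] = field(default_factory=dict)

    def route(self, intent: str) -> Optional["FlowAgent"]:
        # A "branch" node becomes a lookup from intent to specialist agent.
        # None means the current agent keeps the call.
        return self.handoffs.get(intent)


def book_appointment(name: str, slot_iso: str) -> str:
    # An "action" node becomes a plain tool function.
    return f"booked {name} at {slot_iso}"


billing = FlowAgent(name="billing", instructions="Handle invoices and refunds.")
receptionist = FlowAgent(
    name="receptionist",
    instructions="Greet callers and book appointments.",
    tools={"book_appointment": book_appointment},
    handoffs={"billing_question": billing},
)

if __name__ == "__main__":
    nxt = receptionist.route("billing_question")
    print(nxt.name if nxt else "stay with receptionist")
    print(receptionist.tools["book_appointment"]("Ana", "2026-01-15T10:00:00"))
```

Writing the flow down in this form before porting it makes it easy to see which nodes are really tools and which ones deserve their own specialist prompt.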
Move 10% of inbound to LiveKit for a week. Compare booking conversion and average handle time. Ramp 10 → 30 → 70 → 100.
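One way to implement the ramp is a deterministic split on the caller's number, so the same caller always lands on the same stack within a stage and the conversion comparison is not muddied by callers bouncing between systems. This is a minimal sketch: the `bucket` and `route_call` functions and the "livekit"/"retell" labels are illustrative assumptions, and the real routing decision would live wherever your inbound calls are dispatched.

```python
import hashlib

RAMP_STAGES = [10, 30, 70, 100]  # percent of inbound sent to LiveKit per stage


def bucket(caller_number: str) -> int:
    """Map a caller to a stable bucket in [0, 100) using SHA-256."""
    digest = hashlib.sha256(caller_number.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route_call(caller_number: str, livekit_percent: int) -> str:
    """Return "livekit" when the caller's bucket falls under the current percentage."""
    if not 0 <= livekit_percent <= 100:
        raise ValueError("livekit_percent must be between 0 and 100")
    return "livekit" if bucket(caller_number) < livekit_percent else "retell"


if __name__ == "__main__":
    # Synthetic caller numbers, only to show the observed split per stage.
    callers = [f"+1845234{n:04d}" for n in range(1000)]
    for pct in RAMP_STAGES:
        share = sum(route_call(c, pct) == "livekit" for c in callers) / len(callers)
        print(f"stage {pct:>3}% -> observed {share:.1%}")
```

Because each caller's bucket is fixed, raising the percentage only moves callers from Retell to LiveKit and never back, which keeps week-over-week cohorts comparable.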
livekit-plugins-openai==0.x — the API moves quarterly.
CallSphere's OneRoof Property stack runs 10 specialists over WebRTC + Pion + NATS — a similar shape to LiveKit but self-hosted for cost at scale. Healthcare uses OpenAI Realtime over FastAPI :8084 with HIPAA logging across 14 tools. Salon runs ElevenLabs voices and GB-YYYYMMDD-### references on 4 agents. 37 agents total, 90+ tools, 115+ DB tables, 6 verticals.
Will Retell's voice match LiveKit's? Both use OpenAI voices — choose the same voice parameter.
Latency? LiveKit Cloud + Realtime hits 550–700ms p50.
SOC2/HIPAA? LiveKit Cloud is SOC2; HIPAA via BAA.
Can I A/B inside the same number? Yes — Twilio's <Dial><Sip> weights work fine.
Outbound campaigns? Use livekit-cli sip create-call to dial out.
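The latency figure in the FAQ is a p50, so it is worth computing percentiles from your own calls rather than averages. A minimal sketch, assuming you already log per-turn response latency in milliseconds; the `latency_percentiles` helper and the sample values are made up for illustration.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50 and p95 of per-turn latencies given in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index 49 is the 50th percentile, 94 the 95th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    turns = [540, 580, 610, 650, 690, 720, 900, 1250]
    stats = latency_percentiles(turns)
    print(f"p50={stats['p50']:.0f}ms p95={stats['p95']:.0f}ms")
```

Tracking p95 alongside p50 surfaces the slow turns that callers actually notice, which a median alone hides.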
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.