Build a Voice Agent with LiveKit Agents in Python (2026 Tutorial)
By Sagar Shankaran, Founder of CallSphere
Wire LiveKit Agents 1.x, Deepgram STT, GPT-4o, and ElevenLabs TTS into a sub-700ms WebRTC voice agent. Real Python code, room dispatch, and prod pitfalls.
Key takeaways
TL;DR: LiveKit Agents 1.x is the most-adopted open-source voice runtime of 2026. Two files (a Python entrypoint and a Dockerfile) give you a WebRTC voice agent with STT-LLM-TTS pipelining, server-side VAD, interruption handling, and one-command deploy to LiveKit Cloud or a self-hosted SFU.
What you'll build
A LiveKit room participant that auto-joins any room called support-*, transcribes the caller with Deepgram Nova-3, reasons with GPT-4o, and speaks back through ElevenLabs Turbo v2.5 — all under 700ms voice-to-voice on the default plan.
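The `support-*` naming rule can be pictured as a glob match on the room name. The sketch below is a plain-Python illustration, assuming the pattern behaves like a shell-style glob; `should_join` and `ROOM_PATTERN` are hypothetical names, not part of the LiveKit API, and real dispatch is configured on the LiveKit side.

```python
from fnmatch import fnmatchcase

# Assumption: the "support-*" rule from the text behaves like a shell-style glob.
ROOM_PATTERN = "support-*"


def should_join(room_name: str, pattern: str = ROOM_PATTERN) -> bool:
    """Return True when the room name matches the pattern (case-sensitive)."""
    return fnmatchcase(room_name, pattern)
```

With this rule, `should_join("support-1234")` is true and `should_join("sales-1234")` is false, which is the behaviour the rest of the tutorial relies on.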
Architecture
flowchart LR
CL[Caller browser/SIP] -- WebRTC --> SFU[LiveKit SFU]
SFU -- audio track --> AG[Python agents worker]
AG -- STT --> DG[Deepgram Nova-3]
AG -- LLM --> OA[OpenAI GPT-4o]
AG -- TTS --> EL[ElevenLabs Turbo 2.5]
AG -- audio track --> SFU --> CL
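Because the pipeline streams, voice-to-voice latency is roughly the sum of how long each stage takes to produce its first usable output. The following sketch adds those stages up and compares them with the 700 ms target; the stage names and every number in `EXAMPLE` are made-up placeholders for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageLatency:
    name: str
    millis: float  # time until this stage emits its first usable output


def voice_to_voice_ms(stages):
    """Sum per-stage first-output latencies into one voice-to-voice estimate."""
    return sum(stage.millis for stage in stages)


def within_budget(stages, budget_ms=700.0):
    """Return True when the summed estimate fits inside the budget."""
    return voice_to_voice_ms(stages) <= budget_ms


# Placeholder numbers for illustration only; they are NOT measurements.
EXAMPLE = [
    StageLatency("endpointing (VAD)", 200),
    StageLatency("STT final transcript", 100),
    StageLatency("LLM first token", 250),
    StageLatency("TTS first audio", 100),
    StageLatency("network round trips", 40),
]
```

Replacing the placeholders with numbers from your own traces shows which hop to tune first.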
Step 1 — Install + bootstrap
```bash
pip install "livekit-agents[deepgram,openai,elevenlabs,silero]~=1.0"
lk app create --template agent-starter-python my-agent
cd my-agent && cp .env.example .env  # add LIVEKIT_URL, OPENAI_API_KEY, etc.
```
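A missing key in `.env` usually surfaces as a confusing failure mid-call, so it can help to check the environment before the worker starts. This is a minimal sketch: `missing_env` and `REQUIRED_VARS` are hypothetical helpers, and only the two variable names mentioned above are listed, so extend the tuple with whatever else your `.env.example` defines.

```python
import os

# Only the two names mentioned in the text are listed here; extend the tuple
# with whatever other keys your .env.example defines.
REQUIRED_VARS = ("LIVEKIT_URL", "OPENAI_API_KEY")


def missing_env(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_env()` at the top of the entrypoint file and exiting early when the list is non-empty gives a clearer error than a failed API call during a session.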
Step 2 — Define the agent class
```python
from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import openai, deepgram, elevenlabs, silero


class Concierge(Agent):
    def __init__(self) -> None:
        # The instructions act as the system prompt for the LLM.
        super().__init__(
            instructions="You are a friendly clinic concierge. "
            "Confirm the appointment, then ask if anything else.",
        )
```
Step 3 — Wire the session
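```python
async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # One session wires the STT -> LLM -> TTS pipeline plus VAD.
    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="en-US"),
        llm=openai.LLM(model="gpt-4o"),
        # Depending on the plugin release, the voice may need to be passed
        # as an ElevenLabs voice ID (voice_id=...) rather than a name.
        tts=elevenlabs.TTS(voice="rachel", model="eleven_turbo_v2_5"),
        vad=silero.VAD.load(),
    )

    await session.start(
        room=ctx.room,
        agent=Concierge(),
        # Depending on the release, this may expect a filter object from the
        # separate noise-cancellation plugin instead of a boolean.
        room_input_options=RoomInputOptions(noise_cancellation=True),
    )

    # Have the agent speak first.
    await session.generate_reply(instructions="Greet the caller warmly.")
```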
Step 4 — Add a tool with the function decorator
```python
from livekit.agents.llm import function_tool


class Concierge(Agent):
    # Add this method to the Concierge class from Step 2 (keep its constructor).
    @function_tool
    async def book_slot(self, iso_time: str) -> str:
        """Book the requested ISO-8601 slot in the clinic calendar."""
        # call your real backend here
        return f"Booked {iso_time}"
```
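The LLM fills in `iso_time`, so the string is worth validating before it reaches a calendar backend. Below is a minimal sketch of that check using only the standard library; `parse_slot` and `book_slot_reply` are hypothetical helpers, and note that `datetime.fromisoformat` accepts only a subset of ISO-8601 on Python versions before 3.11.

```python
from datetime import datetime
from typing import Optional


def parse_slot(iso_time: str) -> Optional[datetime]:
    """Parse an ISO-8601 timestamp; return None instead of raising on bad input."""
    try:
        return datetime.fromisoformat(iso_time.strip())
    except ValueError:
        return None


def book_slot_reply(iso_time: str) -> str:
    """Build the string a tool like book_slot could return to the LLM."""
    slot = parse_slot(iso_time)
    if slot is None:
        return "That time was not valid ISO-8601; please ask the caller to repeat it."
    # A real backend call would go here.
    return f"Booked {slot.isoformat()}"
```

Returning a readable error string, rather than raising, lets the model recover by asking the caller again.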
Step 5 — Run + auto-dispatch
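```python
if __name__ == "__main__":
    # Note: per the LiveKit docs, setting agent_name opts the worker into
    # explicit dispatch; drop it if the worker should auto-join every new room.
    agents.cli.run_app(
        agents.WorkerOptions(entrypoint_fnc=entrypoint, agent_name="concierge"),
    )
```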
```bash
python agent.py dev    # local hot-reload
python agent.py start  # production worker
```
Step 6 — Deploy
LiveKit Cloud: lk cloud agents deploy --agent concierge. Self-hosted: any container platform — agents pull jobs over WebSocket from your LiveKit server, so no inbound port is needed.
Step 7 — Wire SIP/PSTN
Use lk sip create-trunk + a SIPDispatchRule that routes inbound DIDs to the same room pattern; the worker auto-attaches.
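To make the routing idea concrete, here is one way an inbound number could map onto the `support-` room pattern. It is a plain-Python illustration only: the naming scheme and `room_for_did` are hypothetical, and in practice the dispatch rule on the LiveKit side decides the room name.

```python
import re

ROOM_PREFIX = "support-"  # matches the room pattern the worker listens for


def room_for_did(did: str, prefix: str = ROOM_PREFIX) -> str:
    """Derive a room name from a dialed number by keeping only its digits."""
    digits = re.sub(r"\D", "", did)
    if not digits:
        raise ValueError(f"no digits found in DID: {did!r}")
    return prefix + digits
```

For instance, `room_for_did("+1 (555) 010-0199")` yields `support-15550100199`.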
Pitfalls
- VAD model size: Silero VAD is fine, but on >40 concurrent rooms per worker switch to LiveKit's hosted turn-detector for lower CPU.
- Plugin version drift: Pin `livekit-agents~=1.0` and refresh plugins together; STT and TTS plugins share the worker contract.
- Egress to PSTN: SIP works, but Twilio Elastic SIP requires `secure: false` or a TLS cert chain on your trunk.
- Cold start: Each Python worker takes ~3s to load Silero; pre-warm with `num_idle_processes=2`.
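The version-drift pitfall can be caught with a small check in CI. The sketch below assumes that "in step" means sharing the same major.minor as `livekit-agents`, which is an assumption rather than a documented rule; `major_minor` and `drifted` are hypothetical helpers, and the version strings in the usage note are invented.

```python
def major_minor(version: str) -> tuple:
    """Reduce '1.0.23' to (1, 0); assumes plain dotted numeric versions."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def drifted(versions: dict, core: str = "livekit-agents") -> list:
    """List packages whose major.minor differs from the core package's."""
    target = major_minor(versions[core])
    return sorted(
        name for name, ver in versions.items()
        if major_minor(ver) != target
    )
```

Given `{"livekit-agents": "1.0.23", "livekit-plugins-deepgram": "1.0.20", "livekit-plugins-openai": "0.12.1"}`, the function flags only `livekit-plugins-openai`. The input dict could be built from `importlib.metadata.version` for each installed package.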
How CallSphere does this
CallSphere ships 37 production agents across 6 verticals with 90+ tools and 115+ Postgres tables. Healthcare, OneRoof real-estate, Salon, Sales, Behavioral Health, and Trades stacks all run a LiveKit Agents fleet handling 1.2M voice minutes/month at ~720ms p95 voice-to-voice. Pricing is $149/$499/$1,499 with a 14-day no-card trial and a 22% recurring affiliate.
FAQ
Can I bring my own LLM? Yes — openai.LLM(base_url=...) works with Groq, Together, vLLM, Ollama, and Anthropic via gateways.
Does it support Realtime API? openai.realtime.RealtimeModel() replaces STT+LLM+TTS with one model — drop the three plugins and pass llm=....
Latency tuning? Use turn-detector + VAD pre-emption + ElevenLabs streaming WebSocket; expect 600-750ms p50.
Multi-tenant isolation? Each room gets its own worker process; use job metadata to route by tenant.
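Routing by tenant comes down to reading a tenant identifier out of the job metadata. Here is a minimal sketch, assuming the metadata arrives as a JSON string with a `tenant_id` key; that shape, the key name, and the `default` fallback are all assumptions made for illustration.

```python
import json

DEFAULT_TENANT = "default"  # hypothetical fallback tenant


def tenant_from_metadata(raw: str, key: str = "tenant_id") -> str:
    """Extract a tenant id from a JSON metadata string, falling back to a default."""
    if not raw:
        return DEFAULT_TENANT
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return DEFAULT_TENANT
    if not isinstance(data, dict):
        return DEFAULT_TENANT
    value = data.get(key)
    return str(value) if value else DEFAULT_TENANT
```

Inside the entrypoint, the result can select per-tenant prompts, voices, or backend credentials before the session starts.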
Sources
- LiveKit Docs - Voice AI Quickstart - https://docs.livekit.io/agents/quickstarts/voice-agent/
- LiveKit Blog - Build Your First AI Voice Agent in Python - https://livekit.com/blog/build-your-first-ai-voice-agent-python
- GitHub - livekit/agents - https://github.com/livekit/agents
- ForaSoft - LiveKit AI Agents 2026 Playbook - https://www.forasoft.com/blog/article/livekit-ai-agents-guide
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.