Build a Voice Agent with LiveKit Agents in Python (2026 Tutorial)
Wire LiveKit Agents 1.x, Deepgram STT, GPT-4o, and ElevenLabs TTS into a sub-700ms WebRTC voice agent. Real Python code, room dispatch, and prod pitfalls.
TL;DR — LiveKit Agents 1.x is the most-adopted open-source voice runtime of 2026. One Python entrypoint plus one Dockerfile give you a WebRTC voice agent with STT-LLM-TTS pipelining, server-side VAD, interruption handling, and one-command deploy to LiveKit Cloud or a self-hosted SFU.
What you'll build
A LiveKit room participant that auto-joins any room called support-*, transcribes the caller with Deepgram Nova-3, reasons with GPT-4o, and speaks back through ElevenLabs Turbo v2.5 — all under 700ms voice-to-voice on the default plan.
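The `support-*` matching itself is plain glob logic; a minimal stdlib sketch of the predicate a dispatch hook could apply (the `should_handle` helper is illustrative, not part of the LiveKit API):

```python
from fnmatch import fnmatch

# Illustrative helper: decide whether this worker should take a room.
# LiveKit's own dispatch is configured on the worker; this just shows
# the glob semantics of a "support-*" room filter.
def should_handle(room_name: str, pattern: str = "support-*") -> bool:
    return fnmatch(room_name, pattern)

print(should_handle("support-4821"))  # True
print(should_handle("lobby"))         # False
```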
Architecture
```mermaid
flowchart LR
    CL[Caller browser/SIP] -- WebRTC --> SFU[LiveKit SFU]
    SFU -- audio track --> AG[Python agents worker]
    AG -- STT --> DG[Deepgram Nova-3]
    AG -- LLM --> OA[OpenAI GPT-4o]
    AG -- TTS --> EL[ElevenLabs Turbo 2.5]
    AG -- audio track --> SFU --> CL
```
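What keeps this pipeline fast is that the stages stream into each other instead of running back-to-back: TTS starts on the first LLM token, not the full reply. A toy stdlib sketch of that overlap (the stage functions are stand-ins, not LiveKit APIs):

```python
from typing import Iterator

# Stand-in stages: each consumes a stream and yields output as soon as
# a chunk is ready, so downstream work starts before upstream finishes.
def stt(frames: Iterator[str]) -> Iterator[str]:
    for f in frames:
        yield f.upper()       # pretend transcription

def llm(words: Iterator[str]) -> Iterator[str]:
    for w in words:
        yield f"re:{w}"       # pretend token-by-token reply

def tts(tokens: Iterator[str]) -> Iterator[str]:
    for t in tokens:
        yield f"<audio:{t}>"  # pretend synthesized audio chunk

first_chunk = next(tts(llm(stt(iter(["hello", "world"])))))
print(first_chunk)  # <audio:re:HELLO>, emitted after one frame, not all
```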
Step 1 — Install + bootstrap
```bash
pip install "livekit-agents[deepgram,openai,elevenlabs,silero]~=1.0"
lk app create --template agent-starter-python my-agent
cd my-agent && cp .env.example .env  # add LIVEKIT_URL, OPENAI_API_KEY, etc.
```
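Most first-run failures are missing credentials, so a preflight check pays for itself. A stdlib sketch; the `REQUIRED` names below are assumptions, and your generated `.env.example` is the source of truth:

```python
import os

# Assumed variable names; check your .env.example for the real list.
REQUIRED = [
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY", "DEEPGRAM_API_KEY", "ELEVEN_API_KEY",
]

def missing_keys(env: dict) -> list:
    """Return required keys that are unset or empty in the given mapping."""
    return [k for k in REQUIRED if not env.get(k)]

print(missing_keys(dict(os.environ)))  # [] once your .env is loaded
```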
Step 2 — Define the agent class
```python
from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import openai, deepgram, elevenlabs, silero


class Concierge(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a friendly clinic concierge. "
            "Confirm the appointment, then ask if anything else.",
        )
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Wire the session
```python
async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="en-US"),
        llm=openai.LLM(model="gpt-4o"),
        tts=elevenlabs.TTS(voice="rachel", model="eleven_turbo_v2_5"),
        vad=silero.VAD.load(),
    )
    await session.start(
        room=ctx.room,
        agent=Concierge(),
        room_input_options=RoomInputOptions(noise_cancellation=True),
    )
    await session.generate_reply(instructions="Greet the caller warmly.")
```
Step 4 — Add a tool with the function decorator
```python
from livekit.agents.llm import function_tool


class Concierge(Agent):
    @function_tool
    async def book_slot(self, iso_time: str) -> str:
        """Book the requested ISO-8601 slot in the clinic calendar."""
        # call your real backend here
        return f"Booked {iso_time}"
```
Step 5 — Run + auto-dispatch
```python
if __name__ == "__main__":
    agents.cli.run_app(
        agents.WorkerOptions(entrypoint_fnc=entrypoint, agent_name="concierge"),
    )
```
```bash
python agent.py dev    # local hot-reload
python agent.py start  # production worker
```
Step 6 — Deploy
LiveKit Cloud: `lk cloud agents deploy --agent concierge`. Self-hosted: any container platform — agents pull jobs over WebSocket from your LiveKit server, so no inbound port is needed.
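The pull model is what makes deploy simple: the worker dials out, and when the socket drops it retries with jittered backoff. livekit-agents handles reconnection internally; this stdlib sketch only illustrates the retry schedule a pull-based worker relies on:

```python
import random

# Jittered exponential backoff, capped at `cap` seconds: the classic
# retry schedule for an outbound WebSocket that may drop at any time.
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

delays = backoff_delays(5)
print([round(d, 2) for d in delays])  # values vary; each is <= 30s
```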
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 7 — Wire SIP/PSTN
Use `lk sip create-trunk` plus a `SIPDispatchRule` that routes inbound DIDs to the same room pattern; the worker auto-attaches.
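A dispatch rule ultimately maps a DID to a room name the worker's pattern matches, and that mapping is plain string work. A stdlib sketch; the `support-` prefix mirrors the room pattern above, and `room_for_did` is not a LiveKit API:

```python
import re

def room_for_did(did: str, prefix: str = "support-") -> str:
    """Normalize an inbound DID to a deterministic room name."""
    digits = re.sub(r"\D", "", did)  # strip +, spaces, parens, dashes
    return f"{prefix}{digits}"

print(room_for_did("+1 (415) 555-0134"))  # support-14155550134
```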
Pitfalls
- VAD model size: Silero VAD is fine, but at >40 concurrent rooms per worker switch to LiveKit's hosted turn-detector for lower CPU.
- Plugin version drift: Pin `livekit-agents~=1.0` and refresh plugins together — STT and TTS plugins share the worker contract.
- Egress to PSTN: SIP works, but Twilio Elastic SIP requires `secure: false` or a TLS cert chain on your trunk.
- Cold start: Each Python worker takes ~3s to load Silero — pre-warm with `num_idle_processes=2`.
How CallSphere does this
CallSphere ships 37 production agents across 6 verticals with 90+ tools and 115+ Postgres tables. Healthcare, OneRoof real-estate, Salon, Sales, Behavioral Health, and Trades stacks all run a LiveKit Agents fleet handling 1.2M voice minutes/month at ~720ms p95 voice-to-voice. Pricing is $149/$499/$1,499 with a 14-day no-card trial and a 22% recurring affiliate.
FAQ
Can I bring my own LLM? Yes — `openai.LLM(base_url=...)` works with Groq, Together, vLLM, Ollama, and Anthropic via gateways.
Does it support the Realtime API? `openai.realtime.RealtimeModel()` replaces STT+LLM+TTS with one model — drop the three plugins and pass `llm=...`.
Latency tuning? Use turn-detector + VAD pre-emption + ElevenLabs streaming WebSocket; expect 600-750ms p50.
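As a sanity check on those numbers: when the stages are pipelined off first tokens, voice-to-voice latency is roughly the sum of per-stage first-output times. The figures below are illustrative assumptions, not measurements; tune against your own traces:

```python
# Illustrative per-stage budgets in ms (assumed, not measured).
budget = {
    "vad_endpoint": 100,     # silence detection before the turn ends
    "stt_final": 150,        # Deepgram final transcript after endpoint
    "llm_first_token": 250,  # GPT-4o time-to-first-token
    "tts_first_byte": 150,   # ElevenLabs streaming first audio chunk
    "network_rtt": 50,       # WebRTC hops both ways
}
total = sum(budget.values())
print(total)  # 700
```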
Multi-tenant isolation? Each room gets its own worker process; use job metadata to route by tenant.
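Job metadata is a free-form string, so tenant routing reduces to parsing it defensively. A sketch assuming the metadata carries JSON with a `tenant` field; that shape is this example's assumption, not a LiveKit contract:

```python
import json

def tenant_from_metadata(metadata: str, default: str = "public") -> str:
    """Extract a tenant id from job metadata, falling back safely."""
    try:
        value = json.loads(metadata or "{}")
        return value.get("tenant", default) if isinstance(value, dict) else default
    except json.JSONDecodeError:
        return default

print(tenant_from_metadata('{"tenant": "clinic-a"}'))  # clinic-a
print(tenant_from_metadata("not json"))                # public
```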
Sources
- LiveKit Docs - Voice AI Quickstart - https://docs.livekit.io/agents/quickstarts/voice-agent/
- LiveKit Blog - Build Your First AI Voice Agent in Python - https://livekit.com/blog/build-your-first-ai-voice-agent-python
- GitHub - livekit/agents - https://github.com/livekit/agents
- ForaSoft - LiveKit AI Agents 2026 Playbook - https://www.forasoft.com/blog/article/livekit-ai-agents-guide
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available, no signup required.