Build a Voice Agent with LiveKit Agents in Python (2026 Tutorial)

By Sagar Shankaran, Founder of CallSphere

Quick answer

Wire LiveKit Agents 1.x, Deepgram STT, GPT-4o, and ElevenLabs TTS into a WebRTC voice agent that targets sub-700ms voice-to-voice latency. Includes working Python code, room dispatch, and production pitfalls.

Key takeaways

TL;DR: LiveKit Agents 1.x is the most-adopted open-source voice runtime of 2026. Two files (a Python entrypoint and a Dockerfile) give you a WebRTC voice agent with STT-LLM-TTS pipelining, server-side VAD, interruption handling, and one-command deploy to LiveKit Cloud or a self-hosted SFU.

What you'll build

A LiveKit room participant that is dispatched into every room named support-*, transcribes the caller with Deepgram Nova-3, reasons with GPT-4o, and speaks back through ElevenLabs Turbo v2.5, with a voice-to-voice target of under 700ms on the default plan.
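The support-* rule is a glob-style room-name pattern. LiveKit applies room routing in your dispatch rule, not in worker code, so the helper below is only a hypothetical illustration (`matches_support_room` is not a LiveKit API) of which names that pattern covers. A check like this could also serve as a defensive guard at the top of the entrypoint, assuming you want the worker to refuse rooms outside the pattern.

```python
from fnmatch import fnmatchcase

# Pattern taken from the tutorial; the helper itself is hypothetical.
SUPPORT_ROOM_PATTERN = "support-*"


def matches_support_room(room_name: str, pattern: str = SUPPORT_ROOM_PATTERN) -> bool:
    """Return True if room_name fits the glob pattern (case-sensitive match)."""
    return fnmatchcase(room_name, pattern)
```

Note that `*` also matches an empty suffix, so a room named exactly `support-` passes, and `Support-9` does not because the comparison is case-sensitive.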

Architecture

```mermaid
flowchart LR
  CL[Caller browser/SIP] -- WebRTC --> SFU[LiveKit SFU]
  SFU -- audio track --> AG[Python agents worker]
  AG -- STT --> DG[Deepgram Nova-3]
  AG -- LLM --> OA[OpenAI GPT-4o]
  AG -- TTS --> EL[ElevenLabs Turbo v2.5]
  AG -- audio track --> SFU --> CL
```

Step 1 — Install + bootstrap

```bash
pip install "livekit-agents[deepgram,openai,elevenlabs,silero]~=1.0"
lk app create --template agent-starter-python my-agent
cd my-agent && cp .env.example .env
# In .env, set LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET,
# OPENAI_API_KEY, DEEPGRAM_API_KEY, and ELEVEN_API_KEY.
```
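A missing key usually surfaces only when the first call arrives, so a preflight check before starting the worker can save a confusing failure. A minimal sketch, assuming the variable names the LiveKit worker and the Deepgram, OpenAI, and ElevenLabs plugins read by default (compare against your `.env.example`); `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping

# Assumed default names; adjust if your template or plugins use different ones.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVEN_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required variable names that are unset or empty, in declaration order."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Call it at startup and exit with the returned names if the list is non-empty; passing a plain dict instead of `os.environ` keeps it easy to test.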

Step 2 — Define the agent class

```python
from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import openai, deepgram, elevenlabs, silero


class Concierge(Agent):
    def __init__(self) -> None:
        # The instructions act as the system prompt for every turn.
        super().__init__(
            instructions="You are a friendly clinic concierge. "
            "Confirm the appointment, then ask if there is anything else.",
        )
```

Step 3 — Wire the session

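```python
from livekit.plugins import noise_cancellation  # pip install livekit-plugins-noise-cancellation


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()  # join the room this job was dispatched to

    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="en-US"),
        llm=openai.LLM(model="gpt-4o"),
        # The plugin takes an ElevenLabs voice ID, not a display name;
        # "21m00Tcm4TlvDq8ikWAM" is the stock "Rachel" voice.
        tts=elevenlabs.TTS(voice_id="21m00Tcm4TlvDq8ikWAM", model="eleven_turbo_v2_5"),
        vad=silero.VAD.load(),
    )

    await session.start(
        room=ctx.room,
        agent=Concierge(),
        # noise_cancellation takes a model object such as BVC(), not True.
        room_input_options=RoomInputOptions(noise_cancellation=noise_cancellation.BVC()),
    )

    # Speak first instead of waiting for the caller.
    await session.generate_reply(instructions="Greet the caller warmly.")
```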
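Pipelining is what lets the agent start speaking before the LLM has finished its reply. `AgentSession` does this internally; the sketch below is only a conceptual, standard-library model of the idea, not LiveKit's tokenizer. It buffers streamed LLM fragments and releases each complete sentence as soon as its closing punctuation arrives, which is the unit a streaming TTS would be fed. The fake stream and the simplified sentence rule are assumptions for illustration.

```python
import asyncio
import re
from typing import AsyncIterator, List

# Simplified rule: a sentence ends at ., ! or ? followed by whitespace.
# Real tokenizers also handle abbreviations, decimals, and other languages.
_SENTENCE_END = re.compile(r"([.!?])\s+")


async def sentences_from_tokens(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield each complete sentence as soon as it closes in the token stream."""
    buffer = ""
    async for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            yield buffer[: match.end(1)].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush the last fragment when the LLM stream ends


async def fake_llm_stream() -> AsyncIterator[str]:
    # Stand-in for streamed LLM output arriving in arbitrary fragments.
    for fragment in ["Hi! Your appoint", "ment is confirmed for ", "3 PM. Anything ", "else?"]:
        await asyncio.sleep(0)
        yield fragment


async def main() -> List[str]:
    return [sentence async for sentence in sentences_from_tokens(fake_llm_stream())]
```

`asyncio.run(main())` returns `["Hi!", "Your appointment is confirmed for 3 PM.", "Anything else?"]`; the first sentence is available after the first fragment, long before the stream ends.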

Step 4 — Add a tool with the function decorator

```python
from livekit.agents import Agent
from livekit.agents.llm import function_tool


class Concierge(Agent):
    # Add this method to the Concierge class from Step 2 (keep its __init__).
    @function_tool
    async def book_slot(self, iso_time: str) -> str:
        """Book the requested ISO-8601 slot in the clinic calendar."""
        # Call your real booking backend here.
        return f"Booked {iso_time}"
```
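The LLM fills `iso_time` from free-form speech, so it is worth validating before touching the calendar. A minimal sketch, assuming your backend wants timezone-aware datetimes: the hypothetical `parse_slot` helper uses only the standard library and returns an error string the tool can hand back to the LLM instead of raising.

```python
from datetime import datetime
from typing import Union


def parse_slot(iso_time: str) -> Union[datetime, str]:
    """Parse an ISO-8601 timestamp; return a datetime, or an error message for the LLM."""
    text = iso_time.strip()
    if text.endswith("Z"):
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
        text = text[:-1] + "+00:00"
    try:
        slot = datetime.fromisoformat(text)
    except ValueError:
        return f"'{iso_time}' is not a valid ISO-8601 time; ask the caller to repeat it."
    if slot.tzinfo is None:
        return "The time has no timezone; confirm the caller's timezone first."
    return slot
```

Inside `book_slot`, call `parse_slot(iso_time)` first and return the message unchanged when the result is a `str`; returning text rather than raising lets the model recover conversationally.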

Step 5 — Run + auto-dispatch

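```python
if __name__ == "__main__":
    agents.cli.run_app(
        # Setting agent_name switches the worker to explicit dispatch: it joins only
        # rooms it is dispatched to (dispatch API, participant token, or SIP dispatch rule).
        agents.WorkerOptions(entrypoint_fnc=entrypoint, agent_name="concierge"),
    )
```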

```bash
python agent.py dev    # local development with hot reload
python agent.py start  # production worker
```

Step 6 — Deploy

LiveKit Cloud: lk cloud agents deploy --agent concierge. Self-hosted: any container platform — agents pull jobs over WebSocket from your LiveKit server, so no inbound port is needed.

Step 7 — Wire SIP/PSTN

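Create an inbound trunk with lk sip create-trunk, then add a SIPDispatchRule that routes inbound DIDs into rooms following the same support-* pattern. Because the worker registers with agent_name="concierge", the dispatch rule must also name that agent; the worker then joins each new call room automatically.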

Pitfalls

  • VAD model size: Silero VAD is fine, but on >40 concurrent rooms per worker switch to LiveKit's hosted turn-detector for lower CPU.
  • Plugin version drift: Pin livekit-agents~=1.0 and refresh plugins together — STT and TTS plugins share the worker contract.
  • Egress to PSTN: SIP works, but Twilio Elastic SIP requires secure: false or a TLS cert chain on your trunk.
  • Cold start: Loading Silero takes ~3s in each job process. Keep spare processes warm with num_idle_processes=2 in WorkerOptions, and load the VAD once per process in a prewarm_fnc.

How CallSphere does this

CallSphere ships 37 production agents across 6 verticals with 90+ tools and 115+ Postgres tables. Healthcare, OneRoof real-estate, Salon, Sales, Behavioral Health, and Trades stacks all run a LiveKit Agents fleet handling 1.2M voice minutes/month at ~720ms p95 voice-to-voice. Pricing is $149/$499/$1,499 with a 14-day no-card trial and a 22% recurring affiliate.

FAQ

Can I bring my own LLM? Yes — openai.LLM(base_url=...) works with Groq, Together, vLLM, Ollama, and Anthropic via gateways.

Does it support Realtime API? openai.realtime.RealtimeModel() replaces STT+LLM+TTS with one model — drop the three plugins and pass llm=....

Latency tuning? Use turn-detector + VAD pre-emption + ElevenLabs streaming WebSocket; expect 600-750ms p50.
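To check figures like 600-750ms p50 or ~720ms p95 against your own deployment, log voice-to-voice latency per turn (end of caller speech to first agent audio) and compute percentiles. A minimal sketch using the nearest-rank method; the sample values are made up for illustration and are not measurements.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative per-turn latencies in milliseconds (not real measurements).
turns_ms = [610, 640, 580, 700, 655, 720, 690, 1150, 630, 670]
p50 = percentile(turns_ms, 50)  # 655
p95 = percentile(turns_ms, 95)  # 1150
```

The single 1,150 ms outlier sets p95 while barely moving p50; with only ten turns, p95 is simply the maximum, so collect many more turns before comparing against a published p95.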

Multi-tenant isolation? Each job (one per room dispatch) runs in its own process inside the worker; pass the tenant in the job metadata when you dispatch, and route on it.
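A sketch of tenant routing, assuming you dispatch the agent with a JSON metadata string such as `{"tenant": "clinic-a"}`. The `tenant` field, `TenantConfig`, and the `TENANT_CONFIG` table are hypothetical. In the worker, the raw string would come from `ctx.job.metadata`; the parsing is kept separate here so it runs without LiveKit installed.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    instructions: str
    voice_id: str


# Hypothetical per-tenant settings; in production these would come from your database.
TENANT_CONFIG = {
    "clinic-a": TenantConfig("You are the concierge for Clinic A.", "voice-a"),
    "clinic-b": TenantConfig("You are the concierge for Clinic B.", "voice-b"),
}


def resolve_tenant(job_metadata: str) -> TenantConfig:
    """Pick a tenant config from dispatch metadata; reject unknown or malformed input."""
    try:
        payload = json.loads(job_metadata or "{}")
    except json.JSONDecodeError as exc:
        raise ValueError(f"job metadata is not valid JSON: {exc}") from exc
    tenant = payload.get("tenant") if isinstance(payload, dict) else None
    if not isinstance(tenant, str) or tenant not in TENANT_CONFIG:
        raise ValueError(f"unknown tenant: {tenant!r}")
    return TENANT_CONFIG[tenant]
```

Failing closed on unknown tenants keeps one customer's prompt or voice from leaking into another's call; the entrypoint would build the `Agent` and TTS from the returned config.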

Written by

Sagar Shankaran · Founder, CallSphere

Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
