Build a Voice Agent with LiveKit Agents in Python (2026 Tutorial)

Wire LiveKit Agents 1.x, Deepgram STT, GPT-4o, and ElevenLabs TTS into a sub-700ms WebRTC voice agent. Real Python code, room dispatch, and prod pitfalls.

TL;DR: LiveKit Agents 1.x is the most-adopted open-source voice runtime of 2026. Two files (a Python entrypoint and a Dockerfile) give you a WebRTC voice agent with STT-LLM-TTS pipelining, server-side VAD, interruption handling, and one-command deploy to LiveKit Cloud or a self-hosted SFU.

What you'll build

A LiveKit room participant that auto-joins any room called support-*, transcribes the caller with Deepgram Nova-3, reasons with GPT-4o, and speaks back through ElevenLabs Turbo v2.5 — all under 700ms voice-to-voice on the default plan.
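The support-* rule is a plain prefix glob. As a rough illustration only (this is not how LiveKit configures dispatch, and `should_handle`, `ROOM_PATTERN`, and the sample room names are hypothetical), the match can be sketched with Python's standard `fnmatch` module:

```python
from fnmatch import fnmatchcase

# Hypothetical constant: the room naming convention used in this tutorial.
ROOM_PATTERN = "support-*"


def should_handle(room_name: str, pattern: str = ROOM_PATTERN) -> bool:
    """Return True when the room name matches the glob pattern (case-sensitive)."""
    return fnmatchcase(room_name, pattern)


if __name__ == "__main__":
    # Illustrative room names only.
    for name in ("support-1234", "sales-42", "Support-1"):
        print(name, should_handle(name))
```

The match is case-sensitive on purpose, so a room named Support-1 would not be picked up; normalize room names upstream if that matters for your setup.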

Architecture

```mermaid
flowchart LR
  CL[Caller browser/SIP] -- WebRTC --> SFU[LiveKit SFU]
  SFU -- audio track --> AG[Python agents worker]
  AG -- STT --> DG[Deepgram Nova-3]
  AG -- LLM --> OA[OpenAI GPT-4o]
  AG -- TTS --> EL[ElevenLabs Turbo 2.5]
  AG -- audio track --> SFU --> CL
```
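To make the STT-LLM-TTS hand-off concrete, here is a toy sketch with chained async generators. It is not LiveKit's implementation: `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins that only show how each stage can consume the previous stage's output as it arrives instead of waiting for the whole utterance.

```python
import asyncio
from typing import AsyncIterator


async def fake_stt(audio_chunks: list[str]) -> AsyncIterator[str]:
    """Stand-in for streaming STT: one transcript fragment per audio chunk."""
    for chunk in audio_chunks:
        await asyncio.sleep(0)  # yield control, as a real network call would
        yield chunk.strip()


async def fake_llm(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM: replies to each fragment as it arrives."""
    async for text in transcripts:
        yield f"reply to {text}"


async def fake_tts(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in for streaming TTS: turns each reply into an 'audio' frame."""
    async for reply in replies:
        yield reply.encode("utf-8")


async def run_pipeline(audio_chunks: list[str]) -> list[bytes]:
    """Chain the three stages and collect the frames that come out the far end."""
    frames = []
    async for frame in fake_tts(fake_llm(fake_stt(audio_chunks))):
        frames.append(frame)
    return frames


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([" hello ", "book 3pm "])))
```

In the real worker the framework owns this plumbing; the sketch is only meant to show why the first audio frame can leave before the last transcript fragment has arrived.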

Step 1 — Install + bootstrap

```bash
pip install "livekit-agents[deepgram,openai,elevenlabs,silero]~=1.0"
lk app create --template agent-starter-python my-agent
cd my-agent && cp .env.example .env  # add LIVEKIT_URL, OPENAI_API_KEY, etc.
```

Step 2 — Define the agent class

```python
from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import openai, deepgram, elevenlabs, silero


class Concierge(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a friendly clinic concierge. "
            "Confirm the appointment, then ask if anything else.",
        )
```

Step 3 — Wire the session

```python
async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="en-US"),
        llm=openai.LLM(model="gpt-4o"),
        tts=elevenlabs.TTS(voice="rachel", model="eleven_turbo_v2_5"),
        vad=silero.VAD.load(),
    )
    await session.start(
        room=ctx.room,
        agent=Concierge(),
        room_input_options=RoomInputOptions(noise_cancellation=True),
    )
    await session.generate_reply(instructions="Greet the caller warmly.")
```

Step 4 — Add a tool with the function decorator

```python
from livekit.agents.llm import function_tool


class Concierge(Agent):
    @function_tool
    async def book_slot(self, iso_time: str) -> str:
        """Book the requested ISO-8601 slot in the clinic calendar."""
        # call your real backend here
        return f"Booked {iso_time}"
```
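Because the LLM supplies `iso_time` as free text, it is worth validating the value before it reaches your backend. A minimal sketch, assuming you want timezone-aware slots normalized to UTC (`parse_slot` is a hypothetical helper, not part of LiveKit):

```python
from datetime import datetime, timezone


def parse_slot(iso_time: str) -> datetime:
    """Parse an ISO-8601 string and normalize it to UTC.

    Raises ValueError for unparseable input or for timestamps without an offset.
    """
    parsed = datetime.fromisoformat(iso_time)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are rejected rather than guessed at.
        raise ValueError("slot must include a UTC offset, e.g. 2026-03-01T15:00:00+00:00")
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    print(parse_slot("2026-03-01T15:00:00+02:00").isoformat())
```

Inside `book_slot` you could catch the `ValueError` and return a short message asking the caller to repeat the time, so the model gets a chance to retry instead of booking a malformed slot.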

Step 5 — Run + auto-dispatch

```python
if __name__ == "__main__":
    agents.cli.run_app(
        agents.WorkerOptions(entrypoint_fnc=entrypoint, agent_name="concierge"),
    )
```

```bash
python agent.py dev    # local hot-reload
python agent.py start  # production worker
```

Step 6 — Deploy

LiveKit Cloud: lk cloud agents deploy --agent concierge. Self-hosted: any container platform — agents pull jobs over WebSocket from your LiveKit server, so no inbound port is needed.

Step 7 — Wire SIP/PSTN

Use lk sip create-trunk + a SIPDispatchRule that routes inbound DIDs to the same room pattern; the worker auto-attaches.

Pitfalls

  • VAD model size: Silero VAD is fine, but on >40 concurrent rooms per worker switch to LiveKit's hosted turn-detector for lower CPU.
  • Plugin version drift: Pin livekit-agents~=1.0 and refresh plugins together — STT and TTS plugins share the worker contract.
  • Egress to PSTN: SIP works, but Twilio Elastic SIP requires secure: false or a TLS cert chain on your trunk.
  • Cold start: Each Python worker takes ~3s to load Silero — pre-warm with num_idle_processes=2.

How CallSphere does this

CallSphere ships 37 production agents across 6 verticals with 90+ tools and 115+ Postgres tables. Healthcare, OneRoof real-estate, Salon, Sales, Behavioral Health, and Trades stacks all run a LiveKit Agents fleet handling 1.2M voice minutes/month at ~720ms p95 voice-to-voice. Pricing is $149/$499/$1,499 with a 14-day no-card trial and a 22% recurring affiliate.

FAQ

Can I bring my own LLM? Yes — openai.LLM(base_url=...) works with Groq, Together, vLLM, Ollama, and Anthropic via gateways.

Does it support Realtime API? openai.realtime.RealtimeModel() replaces STT+LLM+TTS with one model — drop the three plugins and pass llm=....

Latency tuning? Use turn-detector + VAD pre-emption + ElevenLabs streaming WebSocket; expect 600-750ms p50.
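To check your own numbers against that range, measure voice-to-voice latency per turn and summarize it. A minimal sketch using only the standard library; `latency_summary` is a hypothetical helper and the sample values are made up for illustration, not measurements:

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Return p50 and p95 for a list of voice-to-voice latency samples (ms)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=20, method="inclusive")
    return {"p50": statistics.median(samples_ms), "p95": cuts[18]}


if __name__ == "__main__":
    # Illustrative values only.
    print(latency_summary([620, 700, 680, 710, 650]))
```

Track p95 alongside p50: a healthy median can hide the slow turns that callers actually notice.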

Multi-tenant isolation? Each room gets its own worker process; use job metadata to route by tenant.
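Job metadata arrives as a string, so tenant routing usually starts with a defensive parse. A minimal sketch, assuming the dispatcher sends JSON with a `tenant_id` key (the key name, `DEFAULT_TENANT`, and `tenant_from_metadata` are all hypothetical):

```python
import json

DEFAULT_TENANT = "default"  # hypothetical fallback tenant


def tenant_from_metadata(metadata: str) -> str:
    """Extract a tenant identifier from a JSON metadata string.

    Falls back to DEFAULT_TENANT when the metadata is empty, is not valid
    JSON, or lacks a non-empty string "tenant_id" key (an assumed key name).
    """
    if not metadata:
        return DEFAULT_TENANT
    try:
        payload = json.loads(metadata)
    except json.JSONDecodeError:
        return DEFAULT_TENANT
    if not isinstance(payload, dict):
        return DEFAULT_TENANT
    tenant = payload.get("tenant_id")
    return tenant if isinstance(tenant, str) and tenant else DEFAULT_TENANT


if __name__ == "__main__":
    print(tenant_from_metadata('{"tenant_id": "clinic-7"}'))
```

In the entrypoint you would pass the job's metadata string to this helper and use the result to pick per-tenant instructions, tools, or credentials.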
