
Build a Voice Agent with LiveKit Agents Python SDK 1.5 (2026)

LiveKit Agents 1.5 (April 2026) added an audio-based interruption model and native MCP tools. Here's a full self-hosted LiveKit voice agent with adaptive turn detection.

TL;DR — LiveKit Agents 1.5 ships an ML model that distinguishes real interruptions from "mm-hmm", coughs, and background noise. Combine that with native MCP tool support and the framework is the most polished OSS voice-agent stack in May 2026.

What you'll build

A self-hosted LiveKit server + an Agents 1.5 Python worker that joins rooms, runs Deepgram STT + OpenAI LLM + ElevenLabs TTS (all swappable), and exposes a calculator MCP tool. Browser test page included.

Prerequisites

  1. Docker (for the LiveKit server) + Python 3.11.
  2. pip install "livekit-agents[deepgram,openai,elevenlabs,silero,turn-detector]" python-dotenv.
  3. API keys for Deepgram, OpenAI, ElevenLabs (or swap to local providers).
  4. livekit-server Docker image.

Architecture

```mermaid
flowchart LR
  BR[Browser] -->|WebRTC| LK[LiveKit Server]
  LK <-->|Room| AG[Agents Worker 1.5]
  AG --> STT[Deepgram]
  AG --> LLM[OpenAI gpt-4o-mini]
  AG --> TTS[ElevenLabs]
  AG --> MCP[MCP Server -> tools]
```

Step 1 — Run LiveKit server

```bash
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
  -e LIVEKIT_KEYS="devkey: secret" \
  livekit/livekit-server --dev
```

This is fine for local dev; for production use Helm + a dedicated SFU.

Step 2 — Define the agent

```python
# agent.py
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomInputOptions,
    WorkerOptions,
)
from livekit.plugins import deepgram, elevenlabs, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()


class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions=(
            "You are a polite, concise voice assistant. Keep replies under 2 sentences. "
            "If asked to compute, call the calculator tool."))


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt=deepgram.STT(model="nova-2", language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=elevenlabs.TTS(voice="EXAVITQu4vr4xnSDxMaL"),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),  # 1.5 ML interruption model
    )
    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
    )
    await ctx.connect()
    await session.generate_reply(instructions="Greet the user and ask what they need.")


if __name__ == "__main__":
    agents.cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Step 3 — .env

```bash
LIVEKIT_URL=ws://127.0.0.1:7880
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
OPENAI_API_KEY=sk-...
DEEPGRAM_API_KEY=...
ELEVEN_API_KEY=...
```

Step 4 — Run the agent worker

```bash
python agent.py dev
```

The worker registers with LiveKit and waits for rooms.
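Before wiring up a browser, you can sanity-check the STT → LLM → TTS pipeline straight from the terminal. Assuming the standard Agents 1.x CLI (the same entrypoint that provides `dev`), a console mode runs the agent locally:

```shell
# Talk to the agent in the terminal, no browser client needed
python agent.py console
```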

Step 5 — Add an MCP tool (1.5 native)

```python
from livekit.agents import RunContext, function_tool


@function_tool
async def calculator(ctx: RunContext, expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    import ast
    import operator as op

    OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

    def ev(n):
        # Allow only numeric literals and the four binary operators above
        if isinstance(n, ast.Constant):
            return n.value
        if isinstance(n, ast.BinOp):
            return OPS[type(n.op)](ev(n.left), ev(n.right))
        raise ValueError("bad expression")

    return str(ev(ast.parse(expression, mode="eval").body))


class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="...", tools=[calculator])
```
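The AST walk is worth understanding on its own: parsing with `mode="eval"` and whitelisting node types is what keeps the tool safe against arbitrary code. Here is the same logic as a plain function you can run outside the agent (names are illustrative):

```python
import ast
import operator as op

# Whitelist: only these four binary operators are evaluated
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}


def safe_eval(expression: str):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        # Function calls, names, attribute access, etc. all land here
        raise ValueError(f"unsupported expression: {expression!r}")
    return ev(ast.parse(expression, mode="eval").body)


print(safe_eval("2 + 3 * 4"))  # 14
```

Anything outside the whitelist, such as `__import__('os')`, raises `ValueError` instead of executing.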

For production, point AgentSession(mcp_servers=[...]) at a remote MCP server (Anthropic, GitHub, your own).
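As a wiring fragment (assuming the `livekit.agents.mcp` module shipped with 1.x, and a hypothetical `https://example.com/mcp` endpoint), the remote-server variant looks roughly like this; the tools the server advertises become callable by the LLM without any `@function_tool` code on your side:

```python
from livekit.agents import mcp

session = AgentSession(
    # ...stt/llm/tts/vad configured as in Step 2...
    mcp_servers=[
        mcp.MCPServerHTTP(url="https://example.com/mcp"),  # hypothetical endpoint
    ],
)
```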

Step 6 — Browser test client

Use the agent-starter-react repo or a quick livekit-client snippet to join the same room. The agent answers automatically.
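Whichever client you use, it needs an access token to join the room; in real code you generate one with the `livekit-api` package's `AccessToken` helper. As a dependency-free illustration of what that token actually is (an HS256-signed JWT whose `video` claim carries the room grant), here is a stdlib-only sketch. The exact claim layout is an assumption based on LiveKit's documented token format, so treat it as explanatory, not production code:

```python
import base64
import hashlib
import hmac
import json
import time


def lk_token(api_key: str, api_secret: str, identity: str, room: str, ttl: int = 3600) -> str:
    """Sketch of a LiveKit-style access token: an HS256 JWT with a video grant."""
    def b64(data: bytes) -> str:
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,    # which API key signed this token
        "sub": identity,   # participant identity in the room
        "nbf": now,
        "exp": now + ttl,
        "video": {"roomJoin": True, "room": room},  # the grant the server checks
    }
    signing_input = f"{b64(json.dumps(header).encode())}.{b64(json.dumps(claims).encode())}"
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64(sig)}"


token = lk_token("devkey", "secret", "tester", "demo")
```

With the dev server's `devkey: secret` pair, a token like this is what the browser passes to `room.connect()`.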

Common pitfalls

  • Worker not registering. Check LIVEKIT_URL, and make sure the server address the browser uses is reachable from its network (a phone on your LAN needs your host's LAN IP, not 127.0.0.1 or localhost).
  • Turn detector model download. First run downloads ~120 MB of weights; pre-cache in CI.
  • Interruption model on slow CPUs. It runs fine on M-series and modern x86; on Pi-class hardware, disable it.
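The cold-start download can be avoided by fetching the weights ahead of time. Assuming the standard Agents CLI, the `download-files` subcommand pre-caches plugin models, which is what you want in a Docker build or CI step:

```shell
# Pre-fetch plugin model weights (turn detector, Silero VAD) at build time
python agent.py download-files
```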

How CallSphere does this in production

CallSphere runs OneRoof Property's 10 specialists on a Pion-based WebRTC mesh — same architecture pattern as LiveKit, custom-tuned for property workflows. Healthcare uses 14 HIPAA-aware tools on FastAPI :8084 with OpenAI Realtime; Salon, Dental, F&B and Behavioral round out 6 verticals. Total: 37 agents · 90+ tools · 115+ DB tables. Flat pricing $149/$499/$1499 — 14-day trial · 22% affiliate · /industries/real-estate · /demo.

FAQ

Cloud vs self-hosted LiveKit? Cloud is faster to try; self-hosted is cheaper at scale.

Sub-agent / multi-agent patterns? Yes — session.update_agent(NewAgent()) mid-call.

Realtime API instead of STT/LLM/TTS? Yes — openai.realtime.RealtimeModel().

Mobile? Native iOS/Android SDKs.

Phone numbers? LiveKit Cloud Build plan includes one US DID.

