Build a Voice Agent with LiveKit Agents Python SDK 1.5 (2026)
LiveKit Agents 1.5 (April 2026) added an audio-based interruption model and native MCP tools. Here's a full self-hosted LiveKit voice agent with adaptive turn detection.
TL;DR — LiveKit Agents 1.5 ships an ML model that distinguishes real interruptions from "mm-hmm", coughs, and background noise. Combine that with native MCP tool support and the framework is the most polished OSS voice-agent stack in May 2026.
What you'll build
A self-hosted LiveKit server + an Agents 1.5 Python worker that joins rooms, runs Deepgram STT + OpenAI LLM + ElevenLabs TTS (all swappable), and exposes a calculator MCP tool. Browser test page included.
Prerequisites
- Docker (for the LiveKit server) + Python 3.11.
- `pip install "livekit-agents[deepgram,openai,elevenlabs,silero,turn-detector]" python-dotenv`
- API keys for Deepgram, OpenAI, ElevenLabs (or swap to local providers).
- The `livekit-server` Docker image.
Architecture
```mermaid
flowchart LR
  BR[Browser] -->|WebRTC| LK[LiveKit Server]
  LK <-->|Room| AG[Agents Worker 1.5]
  AG --> STT[Deepgram]
  AG --> LLM[OpenAI gpt-4o-mini]
  AG --> TTS[ElevenLabs]
  AG --> MCP[MCP Server -> tools]
```
Step 1 — Run LiveKit server
```bash
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
  -e LIVEKIT_KEYS="devkey: secret" \
  livekit/livekit-server --dev
```
This is fine for local dev; for production use Helm + a dedicated SFU.
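If you want the same local setup to be reproducible, a compose file keeps the port and key wiring in one place. This is a sketch that simply mirrors the `docker run` flags above, not an official deployment config:

```yaml
# docker-compose.yml — local-dev sketch mirroring the docker run flags above
services:
  livekit:
    image: livekit/livekit-server
    command: --dev
    environment:
      - "LIVEKIT_KEYS=devkey: secret"   # dev credentials only; never ship these
    ports:
      - "7880:7880"        # WebSocket / HTTP
      - "7881:7881"        # TCP fallback
      - "7882:7882/udp"    # media over UDP
```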
Step 2 — Define the agent
```python
# agent.py
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomInputOptions,
    WorkerOptions,
)
from livekit.plugins import deepgram, elevenlabs, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()


class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions=(
            "You are a polite, concise voice assistant. Keep replies under 2 sentences. "
            "If asked to compute, call the calculator tool."))


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt=deepgram.STT(model="nova-2", language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=elevenlabs.TTS(voice="EXAVITQu4vr4xnSDxMaL"),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),  # 1.5 ML interruption model
    )
    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
    )
    await ctx.connect()
    await session.generate_reply(
        instructions="Greet the user and ask what they need.")


if __name__ == "__main__":
    agents.cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
Step 3 — .env
```bash
LIVEKIT_URL=ws://127.0.0.1:7880
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
OPENAI_API_KEY=sk-...
DEEPGRAM_API_KEY=...
ELEVEN_API_KEY=...
```
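A missing key here is the most common reason the worker silently fails later. If you want to fail fast before launch, here is a small stdlib sketch (the `load_env` helper and `REQUIRED` list are illustrative, not part of the SDK — in the agent itself, `load_dotenv()` already handles this):

```python
import os


def load_env(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments ignored."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    os.environ.update(env)
    return env


REQUIRED = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
            "OPENAI_API_KEY", "DEEPGRAM_API_KEY", "ELEVEN_API_KEY"]


def check_env():
    """Exit with a readable message instead of a mid-call stack trace."""
    missing = [k for k in REQUIRED if not os.environ.get(k)]
    if missing:
        raise SystemExit(f"Missing env vars: {', '.join(missing)}")
```

Call `load_env()` then `check_env()` at the top of your worker's entry script.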
Step 4 — Run the agent worker
```bash
python agent.py dev
```
The worker registers with LiveKit and waits for rooms.
Step 5 — Add an MCP tool (1.5 native)
```python
import ast
import operator as op

from livekit.agents import RunContext, function_tool


@function_tool
async def calculator(ctx: RunContext, expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

    def ev(n):
        if isinstance(n, ast.Constant):
            return n.value
        if isinstance(n, ast.BinOp):
            return OPS[type(n.op)](ev(n.left), ev(n.right))
        raise ValueError("unsupported expression")

    return str(ev(ast.parse(expression, mode="eval").body))


class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="...", tools=[calculator])
```
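The safe-eval core of the tool is plain stdlib and can be exercised without LiveKit at all. A standalone sketch (the `safe_eval` name is ours, for testing only):

```python
import ast
import operator as op

# Whitelist-based evaluator: only +, -, *, / over numeric literals.
# Anything else (names, calls, attributes) raises instead of executing.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}


def safe_eval(expression: str):
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return ev(ast.parse(expression, mode="eval").body)


print(safe_eval("2 * (3 + 4)"))  # 14
```

Note the whitelist rejects even unary minus (`ast.UnaryOp`), so `"-3"` fails; extend `ev` if your agent needs it.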
For production, point `AgentSession(mcp_servers=[...])` at a remote MCP server (Anthropic, GitHub, or your own).
Step 6 — Browser test client
Use the agent-starter-react repo or a quick livekit-client snippet to join the same room. The agent answers automatically.
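Whatever client you use needs an access token to join the room. In practice you'd mint one with the `livekit-api` package (`api.AccessToken`), but since LiveKit tokens are standard HS256 JWTs, a stdlib sketch makes the claim structure visible. The identity and room names below are placeholders:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    """Hand-rolled HS256 JWT carrying a LiveKit video grant."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,    # API key goes in the issuer claim
        "sub": identity,   # participant identity
        "nbf": now,
        "exp": now + 3600, # 1-hour validity
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = (b64url(json.dumps(header).encode()) + "."
                     + b64url(json.dumps(claims).encode()))
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


token = make_token("devkey", "secret", "test-user", "demo-room")
```

Pass `token` plus `LIVEKIT_URL` to the browser client; for real deployments, mint tokens server-side with the official SDK rather than this sketch.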
Common pitfalls
- Worker not registering. `LIVEKIT_URL` must be a host the client can actually reach (`ws://127.0.0.1:7880` only works on the same machine; use a LAN-reachable address, not `localhost`, if your client is on a phone).
- Turn detector model download. The first run downloads ~120 MB of weights; pre-cache them in CI (the Agents CLI's `download-files` subcommand can do this).
- Interruption model on slow CPUs. It's fine on M-series and modern x86; on Pi-class hardware, disable it.
FAQ
Cloud vs self-hosted LiveKit? Cloud is faster to try; self-hosted is cheaper at scale.
Sub-agent / multi-agent patterns? Yes — session.update_agent(NewAgent()) mid-call.
Realtime API instead of STT/LLM/TTS? Yes — openai.realtime.RealtimeModel().
Mobile? Native iOS/Android SDKs.
Phone numbers? LiveKit Cloud Build plan includes one US DID.