
Build a Voice Agent with Bolna (Open-Source Production Stack)

Bolna 0.10 wires LiteLLM, Deepgram, ElevenLabs, Twilio and Plivo into one OSS orchestrator. Deploy a full conversational voice agent in under 200 lines of YAML + Python.

TL;DR — Bolna is an end-to-end OSS framework specifically for voice-driven LLM agents. Where Vocode and Pipecat give you primitives, Bolna gives you a YAML-driven assistant that wires STT, LLM (via LiteLLM — OpenAI/DeepSeek/Llama/Cohere/Mistral), TTS and telephony in one config.

What you'll build

A Bolna assistant that answers an inbound Twilio call, qualifies a real-estate lead via a structured prompt, and writes the result to Postgres via a webhook tool.

Prerequisites

  1. Python 3.11, pip install bolna fastapi uvicorn psycopg2-binary.
  2. Redis running (Bolna uses it for state).
  3. Twilio number with Voice + Media Streams.
  4. API keys for Deepgram and ElevenLabs (or LiteLLM-compatible alternatives).
  5. Ollama running with llama3.1:8b (we'll point LiteLLM at it).
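Before registering anything, a quick preflight catches the usual missing pieces. A minimal stdlib sketch, assuming the default local ports (Redis on 6379, Ollama on 11434):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds (i.e. the service is up)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default local ports for the stack; adjust if yours differ.
services = {"redis": 6379, "ollama": 11434}
status = {name: port_open("127.0.0.1", port) for name, port in services.items()}
print(status)
```

If either check comes back False, fix that before moving on; Bolna fails in confusing ways when Redis or the LLM endpoint is missing.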

Architecture

```mermaid
flowchart LR
  PSTN[Caller] --> TW[Twilio]
  TW -->|WSS| BOL[Bolna Orchestrator]
  BOL --> DG[Deepgram STT]
  BOL --> LL[LiteLLM -> Ollama]
  BOL --> EL[ElevenLabs TTS]
  BOL --> RD[(Redis state)]
  BOL -->|webhook| API[Your API]
```

Step 1 — .env configuration

```bash
# .env
TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
DEEPGRAM_AUTH_TOKEN=...
ELEVENLABS_API_KEY=...
REDIS_URL=redis://localhost:6379/0

# LiteLLM points at Ollama
OPENAI_API_BASE=http://127.0.0.1:11434/v1
OPENAI_API_KEY=ollama
```
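If you run the scripts below outside Docker, the .env file isn't loaded automatically. A few lines of stdlib are enough (a sketch; python-dotenv is the robust option):

```python
# Minimal .env loader (sketch; python-dotenv handles quoting, export, etc. properly).
import os

def load_env(path: str = ".env") -> None:
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and anything without a KEY=VALUE shape.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over the file.
            os.environ.setdefault(key.strip(), value.strip())

# load_env()  # afterwards os.environ["DEEPGRAM_AUTH_TOKEN"] etc. are set
```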

Step 2 — Define the assistant

```python


# create_agent.py
import requests

agent = {
    "agent_config": {
        "agent_name": "RealEstate Qualifier",
        "agent_type": "other",
        "agent_welcome_message": "Hi, this is the property concierge. Are you looking to buy, sell, or rent today?",
        "tasks": [{
            "task_type": "conversation",
            "tools_config": {
                "input": {"format": "wav", "provider": "twilio"},
                "output": {"format": "wav", "provider": "twilio"},
                "transcriber": {"provider": "deepgram", "model": "nova-2", "language": "en",
                                "stream": True, "endpointing": 500},
                "synthesizer": {"provider": "elevenlabs", "model": "eleven_turbo_v2",
                                "stream": True, "voice_id": "EXAVITQu4vr4xnSDxMaL"},
                "llm_agent": {"provider": "openai", "model": "llama3.1:8b",
                              "max_tokens": 200, "temperature": 0.4,
                              "extra_config": {"base_url": "http://127.0.0.1:11434/v1"}}
            },
            "task_config": {"hangup_after_silence": 12, "ambient_noise": "office"}
        }],
        "agent_prompts": {
            "system_prompt": (
                "Qualify the caller in 4 questions: intent, budget, timeline, contact. "
                "When done, call the webhook tool 'save_lead' with the JSON payload, then politely end the call."
            )
        }
    }
}

r = requests.post("http://127.0.0.1:5001/agent", json=agent)
print(r.json())
```
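A typo in this nested dict usually surfaces only at call time. A quick structural check before POSTing catches the common mistakes early (a hand-rolled sketch, not an official Bolna validator; `validate_agent` is our own helper):

```python
# Hand-rolled sanity check for the agent dict above (not an official Bolna schema).
def validate_agent(cfg: dict) -> list:
    """Return a list of human-readable problems; empty list means the shape looks right."""
    errors = []
    ac = cfg.get("agent_config", {})
    for key in ("agent_name", "tasks", "agent_prompts"):
        if key not in ac:
            errors.append(f"missing agent_config.{key}")
    for i, task in enumerate(ac.get("tasks", [])):
        tools = task.get("tools_config", {})
        for section in ("transcriber", "synthesizer", "llm_agent"):
            if section not in tools:
                errors.append(f"tasks[{i}] missing tools_config.{section}")
    return errors
```

Run `validate_agent(agent)` before the POST and abort if the list is non-empty.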

Step 3 — Add a webhook tool

```python
agent["agent_config"]["tasks"][0]["tools_config"]["api_tools"] = [{
    "name": "save_lead",
    "description": "Save the qualified lead to CRM.",
    "url": "https://your.api/leads",
    "method": "POST",
    "param_schema": {
        "type": "object",
        "required": ["intent", "budget", "timeline", "contact"],
        "properties": {
            "intent": {"type": "string"},
            "budget": {"type": "string"},
            "timeline": {"type": "string"},
            "contact": {"type": "string"}
        }
    }
}]
```

When the LLM emits a save_lead tool call, Bolna POSTs the structured arguments to this URL as the JSON body.
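On the receiving side (e.g. the /leads handler in your FastAPI app), the body should satisfy param_schema. A framework-agnostic check mirroring that schema (a sketch; a JSON Schema library would do this more rigorously):

```python
# Mirror of param_schema above: four required string fields.
REQUIRED_FIELDS = ("intent", "budget", "timeline", "contact")

def valid_lead(payload: dict) -> bool:
    """True if every required field is present and is a string."""
    return all(isinstance(payload.get(field), str) for field in REQUIRED_FIELDS)
```

Reject anything that fails this check with a 4xx so malformed tool calls show up in Bolna's logs instead of silently corrupting your CRM.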

Step 4 — Run the orchestrator

```bash
docker compose up -d   # bolna server, redis
```

The docker-compose.yml in the Bolna repo wires up the Python server, the Twilio bridge, and Redis. Once it's running, POST your Step 2 config to /agent to register it.
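If you'd rather see the shape than clone first, a compose file for this stack looks roughly like the sketch below. This is an illustration only; service names, the build context, and port 5001 are assumptions here, and the repo's docker-compose.yml is canonical.

```yaml
# Sketch only — defer to the repo's docker-compose.yml.
services:
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
  bolna:
    build: .          # assumes you're in the Bolna repo checkout
    env_file: .env    # the keys from Step 1
    ports: ["5001:5001"]
    depends_on: [redis]
```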

Step 5 — Trigger a call

```python
import requests

r = requests.post("http://127.0.0.1:5001/call", json={
    "agent_id": "<id from step 2>",
    "recipient_phone_number": "+15551234567",
    "from_number": "+18885550000"  # your Twilio DID
})
```

The recipient phone rings; Bolna handles the rest.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Step 6 — Inspect transcripts

```python
import requests

execution_id = "<execution id from the /call response>"
r = requests.get(f"http://127.0.0.1:5001/executions/{execution_id}").json()
for turn in r["transcript"]:
    print(turn["role"], "→", turn["content"])
```
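For an audit trail, the same turns can be persisted as JSON Lines, one turn per line (a sketch; the role/content field names match the loop above, and `dump_transcript` is our own helper):

```python
# Persist transcript turns as JSON Lines for later auditing (sketch).
import json

def dump_transcript(turns, path):
    """Write each turn as one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for turn in turns:
            record = {"role": turn["role"], "content": turn["content"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

JSONL keeps the file appendable and greppable, and each line loads independently with json.loads.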

Common pitfalls

  • Redis is required. Without Redis, Bolna can't track multi-turn state, and calls reset on each utterance.
  • LiteLLM model naming. llama3.1:8b works only if you've set OPENAI_API_BASE to Ollama; otherwise LiteLLM tries OpenAI's catalog.
  • Twilio Media Streams ingress. Your Bolna server must be reachable at a public WSS URL; Twilio Media Streams cannot connect to localhost.

How CallSphere does this in production

CallSphere runs 37 specialist agents across 6 verticals on a tighter-coupled stack (OpenAI Realtime + ElevenLabs + Pion WebRTC + Postgres). Bolna is a great open alternative for teams that want the YAML-config experience and a self-hostable LiteLLM gateway. Healthcare uses 14 HIPAA tools on FastAPI :8084; OneRoof's 10 property specialists are a direct parallel to the qualifier agent above. Plans are a flat $149/$499/$1499 with a 14-day trial and a 22% affiliate program; see /industries/real-estate.

FAQ

Bolna vs Vocode? Bolna is config-driven; Vocode is code-driven.

Plivo support? Yes — swap twilio for plivo under tools_config.input.provider.

Local TTS? Set synthesizer.provider to coqui or piper (community plugins).

Multi-language? Deepgram nova-2-multi + ElevenLabs multilingual.

Latency? ~700–900 ms in our tests with Ollama on the same box.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available, no signup required.