Build a Voice Agent with Vocode Open-Source (Telephony, 2026)
Vocode-core is the modular open-source voice framework with first-class Twilio + Vonage telephony. Here's a phone-ready Vocode agent talking to Ollama, with Deepgram handling STT.
TL;DR — Vocode-core is the framework you reach for when you want a phone number on day one. Twilio and Vonage are first-class transports; STT/TTS/LLM are pluggable. The OSS package has feature parity with the hosted Vocode API for self-hosters.
What you'll build
A Twilio-connected Vocode StreamingConversation that uses Deepgram STT, an Ollama-backed OpenAI shim for the LLM, and ElevenLabs (or Coqui) TTS. Inbound calls hit a FastAPI webhook; the agent answers and chats.
Prerequisites
- Python 3.11 and `pip install "vocode[all]" fastapi uvicorn`.
- Twilio account + a phone number with a Voice webhook.
- Deepgram API key (free tier works).
- Ollama running `llama3.1:8b`.
- An ngrok tunnel (or stable HTTPS URL) for Twilio's webhook.
Architecture
```mermaid
flowchart LR
  PSTN[PSTN Caller] --> TW[Twilio Programmable Voice]
  TW -->|Media Streams WSS| VOC[Vocode StreamingConversation]
  VOC --> DG[Deepgram STT]
  VOC --> OLL[Ollama OpenAI shim]
  VOC --> EL[ElevenLabs TTS]
  EL --> TW
```
Step 1 — Vocode TelephonyServer skeleton
```python
# server.py
import os

from fastapi import FastAPI
from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.telephony import TwilioConfig
from vocode.streaming.synthesizer.eleven_labs_synthesizer import ElevenLabsSynthesizerConfig
from vocode.streaming.telephony.server.base import TelephonyServer, TwilioInboundCallConfig
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriberConfig

app = FastAPI()

agent_config = ChatGPTAgentConfig(
    initial_message=BaseMessage(text="Hi, this is your AI assistant. How can I help?"),
    prompt_preamble="You are a polite, concise phone assistant. Reply in 1-2 sentences.",
    model_name="llama3.1:8b",
    openai_api_base="http://127.0.0.1:11434/v1",  # Ollama's OpenAI-compatible shim
    openai_api_key="ollama",  # any non-empty string passes the SDK validator
    end_conversation_on_goodbye=True,
)

config_manager = ...  # see Step 4

server = TelephonyServer(
    # TelephonyServer wants the bare host. Note: str.lstrip("https://") strips
    # *characters*, not a prefix — use removeprefix (Python 3.9+) instead.
    base_url=os.environ["BASE_URL"].removeprefix("https://"),
    config_manager=config_manager,
    inbound_call_configs=[
        TwilioInboundCallConfig(
            url="/inbound_call",
            agent_config=agent_config,
            twilio_config=TwilioConfig(
                account_sid=os.environ["TWILIO_ACCOUNT_SID"],
                auth_token=os.environ["TWILIO_AUTH_TOKEN"],
            ),
            transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device(
                api_key=os.environ["DEEPGRAM_API_KEY"],
            ),
            synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
                api_key=os.environ["ELEVEN_LABS_API_KEY"],
                voice_id="EXAVITQu4vr4xnSDxMaL",
            ),
        )
    ],
)

app.include_router(server.get_router())
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 2 — Plug Ollama in via the OpenAI shim
Vocode's ChatGPTAgentConfig accepts openai_api_base. Ollama exposes /v1/chat/completions, so it's plug-and-play. Add openai_api_key="ollama" (any non-empty string passes the SDK validator).
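Before pointing the agent at the shim, it helps to sanity-check the exact request Vocode's OpenAI client will end up sending. A stdlib-only sketch of the `/v1/chat/completions` call (endpoint and model name from this guide; the `build_chat_request` helper is illustrative, not part of Vocode):

```python
import json
import urllib.request

OLLAMA_BASE = "http://127.0.0.1:11434/v1"  # Ollama's OpenAI-compatible shim

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build the same /v1/chat/completions call the OpenAI SDK would make."""
    payload = {
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OLLAMA_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # any non-empty key passes
        },
        method="POST",
    )

req = build_chat_request("Say hello in five words.")
print(req.full_url)  # http://127.0.0.1:11434/v1/chat/completions
```

With Ollama running, `urllib.request.urlopen(req)` should return a JSON body whose `choices[0].message.content` is the model's reply — if that works, the agent config above will too.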
Step 3 — Use Coqui XTTS instead of ElevenLabs (optional)
```python
from vocode.streaming.synthesizer.coqui_synthesizer import CoquiSynthesizerConfig

synth = CoquiSynthesizerConfig.from_telephone_output_device(
    voice_id="...",
    voice_name="amy",
)
```
This avoids per-character TTS spend, at the cost of higher latency and voice-clone licensing headaches.
Step 4 — In-memory config manager
```python
from vocode.streaming.telephony.config_manager.in_memory_config_manager import InMemoryConfigManager

config_manager = InMemoryConfigManager()
```
For production, switch to RedisConfigManager so call state survives restarts.
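A hedged sketch of that swap, assuming vocode-core's `RedisConfigManager` and a reachable Redis instance (verify the module path and connection settings against your installed version):

```python
from vocode.streaming.telephony.config_manager.redis_config_manager import RedisConfigManager

# Connection details come from the environment (e.g. REDIS_URL / host+port,
# depending on your vocode-core version); call state now survives restarts.
config_manager = RedisConfigManager()
```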
Step 5 — Run, tunnel, and wire Twilio
```bash
uvicorn server:app --host 0.0.0.0 --port 3000 &
ngrok http 3000

# Set your Twilio number's Voice webhook to https://<your-ngrok-domain>/inbound_call
```
Call your Twilio number — Vocode answers and you're talking to a fully OSS pipeline.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 6 — Add an action (tool call)
```python
from pydantic import BaseModel

from vocode.streaming.action.base_action import BaseAction


class BookSlotParams(BaseModel):
    iso: str


class BookSlotAction(BaseAction[BookSlotParams, dict]):
    description = "Book a slot at the given ISO time."
    parameters_type = BookSlotParams

    async def run(self, action_input):
        # write to your DB / CRM here
        return {"booked": True, "iso": action_input.params.iso}


agent_config.actions = [BookSlotAction()]
```
Vocode actions are how you give the agent real-world side effects.
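Small models behind the shim occasionally hand tools malformed timestamps, so it's worth validating `iso` before it reaches your CRM. A stdlib sketch (the `parse_slot` helper is ours, not a Vocode API):

```python
from datetime import datetime, timezone

def parse_slot(iso: str) -> datetime:
    """Validate the ISO-8601 string a tool call passes; raises ValueError on junk."""
    # Models often emit a trailing "Z"; normalize it to an explicit offset
    # so datetime.fromisoformat accepts it on every supported Python version.
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC when no offset given
    return dt

print(parse_slot("2026-03-01T10:00:00Z"))  # 2026-03-01 10:00:00+00:00
```

Call it at the top of `BookSlotAction.run` and return an error payload on `ValueError` so the agent can ask the caller to restate the time.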
Common pitfalls
- `base_url` strips the scheme. Don't include `https://` — Vocode adds it.
- Twilio media format. Vocode handles mulaw 8 kHz end-to-end; don't transcode.
- Long greetings. Twilio will hang up a stalled call after ~5 s of silence; keep `initial_message` short.
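The first pitfall is why the server snippet avoids `str.lstrip("https://")`: `lstrip` treats its argument as a character set, not a prefix, so hosts starting with those letters get mangled. A safe stdlib helper (the `bare_host` name is ours, not Vocode's):

```python
from urllib.parse import urlparse

def bare_host(url: str) -> str:
    """Return host[:port] with any scheme removed, without character-set surprises."""
    # lstrip strips *characters*:
    # "https://https-example.com".lstrip("https://") == "-example.com"
    parsed = urlparse(url if "//" in url else f"//{url}")
    return parsed.netloc

print(bare_host("https://abc123.ngrok.io"))  # abc123.ngrok.io
```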
How CallSphere does this in production
CallSphere serves 37 specialist agents across 6 verticals — Healthcare's 14 tools on FastAPI :8084 with OpenAI Realtime, OneRoof's 10 specialists on Pion WebRTC, plus Salon, Dental, F&B, Behavioral — backed by 90+ tools and 115+ Postgres tables. Pricing is flat $149 / $499 / $1499 with a 14-day trial, a 22% affiliate program and full SOC 2 controls. See /pricing and /demo.
FAQ
Vocode vs Pipecat? Vocode is more telephony-focused; Pipecat is more pipeline-flexible.
Vocode hosted API still alive? Yes — but the OSS core has parity for self-hosters.
Is Twilio cheap enough? ~$0.0085/min inbound + ~$0.013/min Media Streams in the US.
Can I use Vonage instead? Yes — vocode.streaming.telephony.server.vonage_*.
Tools / actions? First-class via BaseAction; works with both OpenAI and Ollama.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.