Build a Voice Agent with Vocode: Open-Source Python LLM Voice (2026)
Vocode-core wires phone, browser, and Zoom voice into one Python agent class. Build a Twilio inbound bot with GPT-4o and Azure TTS — code + pitfalls.
TL;DR: Vocode-core is one of the earliest (2023) open-source voice-LLM frameworks. It still ships adapters for Twilio, Vonage, Telnyx, Zoom, and the browser, with pluggable Transcriber, Agent, and Synthesizer classes. It fits best when you want a single Python class that runs on both phone and browser.
What you'll build
A FastAPI server that answers a Twilio number, transcribes with Deepgram, reasons with GPT-4o, and speaks back through Azure Neural TTS — under 200 lines of Python.
Architecture
```mermaid
flowchart LR
    PSTN[Caller PSTN] --> TW[Twilio Voice]
    TW -- Media Streams WS --> VC[Vocode FastAPI]
    VC --> TRX[Deepgram Transcriber]
    TRX --> AG[ChatGPTAgent]
    AG --> SY[AzureSynthesizer]
    SY --> TW --> PSTN
```
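Each arrow in the diagram is a streaming hand-off. The sketch below models the Transcriber, Agent, and Synthesizer stages as asyncio tasks connected by queues. The stage functions are hypothetical stand-ins that use text in place of audio, not vocode's classes, and a `None` item serves as the end-of-stream marker.

```python
import asyncio
from typing import Any, Callable, List


async def stage(fn: Callable[[Any], Any], inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Apply fn to every item from inbox and forward the result; None ends the stream."""
    while True:
        item = await inbox.get()
        if item is None:
            await outbox.put(None)  # propagate end-of-stream downstream
            return
        await outbox.put(fn(item))


# Hypothetical stand-ins for Deepgram, GPT-4o and Azure TTS.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


async def run_pipeline(chunks: List[bytes]) -> List[bytes]:
    """Push audio chunks through transcriber -> agent -> synthesizer concurrently."""
    audio_q, text_q, reply_q, out_q = (asyncio.Queue() for _ in range(4))
    tasks = [
        asyncio.create_task(stage(fake_transcribe, audio_q, text_q)),
        asyncio.create_task(stage(fake_agent, text_q, reply_q)),
        asyncio.create_task(stage(fake_synthesize, reply_q, out_q)),
    ]
    for chunk in chunks:
        await audio_q.put(chunk)
    await audio_q.put(None)

    spoken: List[bytes] = []
    while (item := await out_q.get()) is not None:
        spoken.append(item)
    await asyncio.gather(*tasks)
    return spoken
```

Because the stages run concurrently, the synthesizer can work on one reply while the transcriber is already consuming the next chunk. That overlap is what keeps latency low in a real voice pipeline.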
Step 1 — Install
```bash
pip install "vocode==0.1.*" fastapi uvicorn
```
Step 2 — FastAPI app
```python
import os

from fastapi import FastAPI
from vocode.streaming.telephony.server.base import TelephonyServer
from vocode.streaming.telephony.config_manager.redis_config_manager import (
    RedisConfigManager,
)
from vocode.streaming.models.telephony import TwilioConfig
from vocode.streaming.telephony.server.inbound_call_server import InboundCallServer

app = FastAPI()
```
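The agent config in Step 3 reads the Twilio credentials from `os.environ`, and the Deepgram, OpenAI, and Azure clients need keys as well. A small preflight check fails fast with one readable message instead of a `KeyError` mid-call. The non-Twilio variable names below are assumptions about what your vocode version reads; confirm them against its docs.

```python
import os
from typing import List, Mapping, Sequence

# TWILIO_* come from the agent config; the other names are assumptions
# to verify against the vocode version you pinned.
REQUIRED_ENV: Sequence[str] = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
)


def missing_env(
    required: Sequence[str] = REQUIRED_ENV,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


def check_env() -> None:
    """Raise SystemExit listing every missing variable at once."""
    missing = missing_env()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Call `check_env()` near the top of `main.py`, before any config object is built.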
Step 3 — Configure the agent
```python
from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)

# Stores per-call state between the Twilio webhook and the media-stream socket.
config_manager = RedisConfigManager()

# Kept in a variable so the outbound-call step can reuse it.
twilio_config = TwilioConfig(
    account_sid=os.environ["TWILIO_ACCOUNT_SID"],
    auth_token=os.environ["TWILIO_AUTH_TOKEN"],
)

inbound = InboundCallServer(
    agent_config=ChatGPTAgentConfig(
        model_name="gpt-4o",
        initial_message=BaseMessage(text="Thanks for calling — how can I help?"),
        prompt_preamble="You are a friendly clinic concierge.",
    ),
    # The telephone presets match Twilio's 8 kHz mu-law audio.
    transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device(
        endpointing_config=PunctuationEndpointingConfig(),
    ),
    synthesizer_config=AzureSynthesizerConfig.from_telephone_output_device(
        voice_name="en-US-JennyNeural",
    ),
    twilio_config=twilio_config,
    config_manager=config_manager,
)
```
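The `from_telephone_input_device` and `from_telephone_output_device` helpers exist because phone audio is narrowband: Twilio Media Streams carry 8 kHz G.711 mu-law, base64-encoded in each media message. To illustrate what that conversion involves, here is a standard G.711 mu-law decoder in pure Python. The function names are hypothetical, and vocode performs this conversion for you.

```python
import base64
from typing import List


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias of 132
    return -magnitude if sign else magnitude


def decode_media_payload(payload_b64: str) -> List[int]:
    """Turn a base64 mu-law payload (as in a Twilio media event) into PCM samples."""
    return [ulaw_to_pcm16(b) for b in base64.b64decode(payload_b64)]
```

At 8 kHz, one mu-law byte is one sample, so 160 payload bytes hold 20 ms of audio.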
Step 4 — Mount + Twilio webhook
```python
app.include_router(inbound.get_router())
```
In the Twilio Console, set the number's inbound voice webhook to:
POST https://yourhost.com/inbound_call
Vocode handles the TwiML handshake.
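The TwiML handshake works like this: Twilio's webhook expects a TwiML reply, and for Media Streams that reply is a `<Connect><Stream>` verb pointing at a `wss://` URL. The sketch below builds such a response with the standard library. The `/connect_call/{call_id}` path is a hypothetical placeholder, not necessarily the route vocode registers.

```python
import xml.etree.ElementTree as ET


def media_stream_twiml(host: str, call_id: str) -> str:
    """Build TwiML that bridges the call audio to a WebSocket on `host`."""
    if "://" in host:
        raise ValueError("pass a bare host; the wss:// scheme is added here")
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Hypothetical route; Twilio requires the Stream URL to use wss://.
    ET.SubElement(connect, "Stream", url=f"wss://{host}/connect_call/{call_id}")
    return ET.tostring(response, encoding="unicode")
```

Because the Stream URL must be `wss://`, the server has to be reachable over TLS. That requirement is the first item under Pitfalls.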
Step 5 — Add tools (actions)
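```python
from pydantic import BaseModel
from vocode.streaming.action.base_action import BaseAction
from vocode.streaming.models.actions import ActionConfig, ActionInput, ActionOutput

# Typed pydantic models (not bare dicts) give the LLM a JSON schema to fill in.
class BookSlotParams(BaseModel):
    iso: str  # ISO 8601 start time chosen by the caller

class BookSlotResult(BaseModel):
    ok: bool
    iso: str

class BookSlot(BaseAction[ActionConfig, BookSlotParams, BookSlotResult]):
    description = "Book an appointment for the given ISO time."
    parameters_type = BookSlotParams
    response_type = BookSlotResult

    async def run(self, action_input: ActionInput[BookSlotParams]) -> ActionOutput[BookSlotResult]:
        # Stub: replace with a real calendar write.
        result = BookSlotResult(ok=True, iso=action_input.params.iso)
        return ActionOutput(action_type="book_slot", response=result)

agent_config = ChatGPTAgentConfig(model_name="gpt-4o", actions=[BookSlot.get_config()])
```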
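Under the hood, an action is a function-calling tool: the model returns a tool name plus a JSON string of arguments, and the framework validates them and runs the handler. The framework-free sketch below shows that dispatch step for a hypothetical `book_slot` handler. The registry and error shape are illustrative, not vocode's internals. Validating the ISO time before touching a calendar keeps a misheard "tomorrow at ten" from becoming a bad booking.

```python
import json
from datetime import datetime
from typing import Any, Callable, Dict


def book_slot(iso: str) -> Dict[str, Any]:
    """Hypothetical handler: fromisoformat raises ValueError on malformed times."""
    when = datetime.fromisoformat(iso)
    return {"ok": True, "iso": when.isoformat()}


# Hypothetical registry mapping tool names to handlers.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"book_slot": book_slot}


def dispatch_tool_call(name: str, arguments_json: str) -> Dict[str, Any]:
    """Parse the model's JSON arguments and run the matching handler."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool {name!r}"}
    try:
        args = json.loads(arguments_json)
        return handler(**args)  # TypeError if args is not a dict or has wrong keys
    except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError
        return {"ok": False, "error": str(exc)}
```

Returning an error object instead of raising lets the agent read the failure back to the model, which can then ask the caller to repeat the time.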
Step 6 — Outbound calls
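```python
import asyncio

from vocode.streaming.telephony.conversation.outbound_call import OutboundCall


async def main() -> None:
    call = OutboundCall(
        base_url="yourhost.com",
        to_phone="+15551234567",
        from_phone="+15557654321",
        config_manager=config_manager,
        agent_config=agent_config,
        twilio_config=twilio_config,
    )
    # `await` needs a running event loop; inside a FastAPI route, just await this directly.
    await call.start()


asyncio.run(main())
```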
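Twilio expects `to_phone` and `from_phone` in E.164 format (a `+` followed by at most 15 digits), and numbers pasted from a CRM rarely arrive that way. The small normalizer below, with a hypothetical helper name, strips common punctuation and rejects anything that is still not E.164 before you place the call.

```python
import re

# E.164: "+", a non-zero leading digit, then up to 14 more digits (15 max).
E164 = re.compile(r"\+[1-9]\d{1,14}")


def to_e164(raw: str) -> str:
    """Strip spaces, dots, dashes and parentheses, then require E.164."""
    cleaned = re.sub(r"[\s().-]", "", raw)
    if not E164.fullmatch(cleaned):
        raise ValueError(f"not an E.164 number: {raw!r}")
    return cleaned
```

For example, `to_e164("+1 (555) 123-4567")` returns `"+15551234567"`, while a bare `"5551234567"` raises because the country code is missing.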
Step 7 — Run
```bash
uvicorn main:app --host 0.0.0.0 --port 3000 \
  --ssl-keyfile key.pem --ssl-certfile cert.pem
```
Pitfalls
- TLS required: Twilio Media Streams refuses non-WSS — use ngrok or a real cert.
- Redis required: RedisConfigManager is the default; switch to InMemoryConfigManager only for dev.
- Endpointing: Punctuation-based works for English; for non-English, flip to time_based to avoid premature cuts.
- Maintenance pace: Vocode-core moves slower than LiveKit/Pipecat in 2026, so pin versions and run your own fork if you depend on niche providers.
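The two endpointing policies differ in what signals "the caller is done." The simplified sketch below captures the idea: punctuation-based ends the turn when the running transcript ends in sentence-final punctuation, while time-based ends it after a fixed silence. The function names, the punctuation set, and the 0.8 s threshold are illustrative assumptions, not vocode's implementation.

```python
# Sentence-final marks an English ASR transcript typically ends with (assumption).
TERMINAL_PUNCTUATION = (".", "?", "!")


def punctuation_endpoint(transcript: str) -> bool:
    """End the turn as soon as the transcript ends a sentence."""
    return transcript.rstrip().endswith(TERMINAL_PUNCTUATION)


def time_endpoint(last_speech_s: float, now_s: float, silence_s: float = 0.8) -> bool:
    """End the turn once the caller has been silent for `silence_s` seconds."""
    return now_s - last_speech_s >= silence_s
```

For example, `punctuation_endpoint("予約したいです。")` returns False because the Japanese full stop is not in the set. An ASR that inserts a period mid-thought triggers an early cut instead. A silence timer is unaffected by either problem.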
How CallSphere does this
CallSphere ships 37 agents across 6 verticals with 90+ tools and 115+ DB tables. Older outbound campaigns still run on Vocode for the Twilio<>OpenAI path while net-new lines sit on LiveKit/Pipecat. $149/$499/$1,499 · 14-day trial · 22% affiliate.
FAQ
Vocode vs LiveKit? Vocode is simpler for a single phone-first use case; LiveKit scales better to thousands of concurrent rooms.
Zoom support? Yes. The ZoomDialIn adapter builds on OutboundCall to dial into a meeting's phone dial-in number, and it reuses the same agent, transcriber, and synthesizer configs.
Open-source license? MIT — no royalties even at scale.
Is it still maintained? Yes, but at a lower velocity than in 2023; community forks (e.g. gitduck, niveshi) are common.