By Sagar Shankaran, Founder of CallSphere
Vocode-core is the modular open-source voice framework with first-class Twilio + Vonage telephony. Here's a phone-ready Vocode agent talking to Ollama with a Deepgram fallback.
Key takeaways
TL;DR — Vocode-core is the framework you reach for when you want a phone number on day one. Twilio and Vonage are first-class transports; STT/TTS/LLM are pluggable. The OSS package has feature parity with the hosted Vocode API for self-hosters.
A Twilio-connected Vocode StreamingConversation that uses Deepgram STT, an Ollama-backed OpenAI shim for the LLM, and ElevenLabs (or Coqui) TTS. Inbound calls hit a FastAPI webhook; the agent answers and chats.
pip install "vocode[all]" fastapi uvicorn.llama3.1:8b.flowchart LR
PSTN[PSTN Caller] --> TW[Twilio Programmable Voice]
TW -->|Media Streams WSS| VOC[Vocode StreamingConversation]
VOC --> DG[Deepgram STT]
VOC --> OLL[Ollama OpenAI shim]
VOC --> EL[ElevenLabs TTS]
EL --> TW
```python
from fastapi import FastAPI from vocode.streaming.telephony.server.base import TelephonyServer, TwilioInboundCallConfig from vocode.streaming.models.telephony import TwilioConfig from vocode.streaming.agent.openai_chat_agent_config import OpenAIChatAgentConfig from vocode.streaming.models.agent import ChatGPTAgentConfig from vocode.streaming.models.message import BaseMessage from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriberConfig from vocode.streaming.synthesizer.eleven_labs_synthesizer import ElevenLabsSynthesizerConfig import os
app = FastAPI()
agent_config = ChatGPTAgentConfig( initial_message=BaseMessage(text="Hi, this is your AI assistant. How can I help?"), prompt_preamble="You are a polite, concise phone assistant. Reply in 1-2 sentences.", model_name="llama3.1:8b", openai_api_base="http://127.0.0.1:11434/v1", openai_api_key="ollama", end_conversation_on_goodbye=True)
config_manager = ... # see Step 4
server = TelephonyServer( base_url=os.environ["BASE_URL"].lstrip("https://"), config_manager=config_manager, inbound_call_configs=[TwilioInboundCallConfig( url="/inbound_call", agent_config=agent_config, twilio_config=TwilioConfig( account_sid=os.environ["TWILIO_ACCOUNT_SID"], auth_token=os.environ["TWILIO_AUTH_TOKEN"]), transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device( api_key=os.environ["DEEPGRAM_API_KEY"]), synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device( api_key=os.environ["ELEVEN_LABS_API_KEY"], voice_id="EXAVITQu4vr4xnSDxMaL"))])
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
app.include_router(server.get_router()) ```
Vocode's ChatGPTAgentConfig accepts openai_api_base. Ollama exposes /v1/chat/completions, so it's plug-and-play. Add openai_api_key="ollama" (any non-empty string passes the SDK validator).
```python from vocode.streaming.synthesizer.coqui_synthesizer import CoquiSynthesizerConfig synth = CoquiSynthesizerConfig.from_telephone_output_device( voice_id="...", voice_name="amy") ```
This avoids per-character TTS spend at the cost of latency and clone-licence headache.
```python from vocode.streaming.telephony.config_manager.in_memory_config_manager import InMemoryConfigManager config_manager = InMemoryConfigManager() ```
For production, switch to RedisConfigManager so call state survives restarts.
```bash uvicorn server:app --host 0.0.0.0 --port 3000 & ngrok http 3000
```
Call your Twilio number — Vocode answers and you're talking to a fully OSS pipeline.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
```python from vocode.streaming.action.base_action import BaseAction from pydantic import BaseModel
class BookSlotParams(BaseModel): iso: str
class BookSlotAction(BaseAction[BookSlotParams, dict]): description = "Book a slot at the given ISO time." parameters_type = BookSlotParams async def run(self, action_input): # write to your DB / CRM here return {"booked": True, "iso": action_input.params.iso}
agent_config.actions = [BookSlotAction()] ```
Vocode actions are how you give the agent real-world side effects.
base_url strips scheme. Don't include https:// — Vocode adds it.mulaw 8 kHz end-to-end; don't transcode.initial_message short.CallSphere serves 37 specialist agents across 6 verticals — Healthcare's 14 tools on FastAPI :8084 with OpenAI Realtime, OneRoof's 10 specialists on Pion WebRTC, plus Salon, Dental, F&B, Behavioral — backed by 90+ tools and 115+ Postgres tables. Pricing is flat $149 / $499 / $1499 with a 14-day trial, a 22% affiliate program and full SOC 2 controls. See /pricing and /demo.
Vocode vs Pipecat? Vocode is more telephony-focused; Pipecat is more pipeline-flexible.
Vocode hosted API still alive? Yes — but the OSS core has parity for self-hosters.
Is Twilio cheap enough? ~$0.0085/min inbound + ~$0.013/min Media Streams in the US.
Can I use Vonage instead? Yes — vocode.streaming.telephony.server.vonage_*.
Tools / actions? First-class via BaseAction; works with both OpenAI and Ollama.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
See how AI voice agents work for your industry. Live demo available -- no signup required.
A VoIP telephone number is a phone number that routes calls over the internet instead of copper lines. Learn what a VoIP number is, how to get one, what it costs, and how to pair it with an AI voice agent in 2026.
How we built a fault-tolerant HVAC emergency triage and tech-dispatch platform on Kubernetes — three-tier CQRS, 11 micro-agents on the OpenAI Agents SDK + LangGraph, NATS JetStream, DTMF/SMS/WebSocket acceptance, circuit breakers, and an evaluation pipeline that catches regressions before they wake a tech at 3 AM.
Haystack 2.7's Agent component plus an Ollama-served Llama 3.2 gives you tool-calling RAG with citations. Here's a complete pipeline against your own document store.
Run STT, LLM, and TTS entirely on Cloudflare's edge — no OpenAI, no ElevenLabs. Real working code with Whisper, Llama 3.3 70B, and Deepgram Aura.
Version your prompts in git, run a 50-case eval suite on every PR, block merges below threshold, and ship a new agent prompt with confidence — full GitHub Actions tutorial.
Replace expensive outbound SDR tooling with a self-hosted dialer that runs OpenAI Realtime agents at 100 concurrent calls. Full architecture and code.
© 2026 CallSphere LLC. All rights reserved.
Watch how CallSphere handles real customer calls, schedules appointments, and processes payments — live.
Try Live DemoBook a DemoCalculate Your ROI