Build a Voice Agent with Pipecat Self-Hosted (Open Framework, 2026)
Pipecat is the most flexible open voice framework: 20+ STT, 30+ TTS, 20+ LLMs, WebRTC and telephony. Here's a fully self-hosted Pipecat agent on FastAPI — no Pipecat Cloud needed.
TL;DR — Pipecat is the closest thing to a Lego kit for voice agents: pipes that connect STT → LLM → TTS with VAD, interruption handling, and tool calls baked in. The open-source core runs anywhere — no Pipecat Cloud subscription required.
What you'll build
A self-hosted Pipecat 0.0.83+ agent (May 2026) that speaks via SmallWebRTC transport, uses Deepgram STT, an Ollama LLM, and Piper TTS. Browser front-end connects directly with no SFU.
Prerequisites
- Python 3.12 (Pipecat needs 3.11+, recommends 3.12).
- `uv add 'pipecat-ai[silero,deepgram,openai,piper,small-webrtc]'` (or `pip install` the same extras).
- Ollama running with `llama3.1:8b`.
- Optional Deepgram key, or swap in local Whisper.
- A modern browser for the SmallWebRTC client demo.
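With Ollama installed, pull the model ahead of time so the first turn doesn't stall on a download. A quick sketch, assuming a stock Ollama install on the default port:

```bash
# Skip `ollama serve` if Ollama is already running as a service
ollama pull llama3.1:8b
ollama serve
```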
Architecture
```mermaid
flowchart LR
    BR[Browser SmallWebRTC] -->|RTP| PC[Pipecat Pipeline]
    PC --> VAD[Silero VAD]
    PC --> STT[Deepgram or local]
    PC --> LLM[Ollama OpenAI-shim]
    PC --> TTS[Piper local]
    PC -->|RTP| BR
```
Step 1 — Project skeleton
```bash
mkdir pipecat-self && cd pipecat-self
uv init && uv add 'pipecat-ai[silero,deepgram,openai,piper,small-webrtc]'
```
Pipecat's plugin system means you only install what you need.
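If you prefer pip over uv, the equivalent install (quote the extras so the shell doesn't expand the brackets) is:

```bash
pip install 'pipecat-ai[silero,deepgram,openai,piper,small-webrtc]'
```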
Step 2 — Build the bot
```python
# bot.py
import asyncio
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport

async def run_bot(webrtc_connection):
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

    # Ollama exposes an OpenAI-compatible endpoint, so the OpenAI service works as-is
    llm = OpenAILLMService(
        api_key="ollama",  # any non-empty string; Ollama ignores it
        model="llama3.1:8b",
        base_url="http://127.0.0.1:11434/v1",
    )

    tts = PiperTTSService(base_url="http://127.0.0.1:5000", voice_id="en_US-amy-medium")

    ctx = OpenAILLMContext([{"role": "system", "content": "Be concise."}])
    # Create the aggregator pair once and reuse it, so user and assistant
    # turns land in the same context
    context_aggregator = llm.create_context_aggregator(ctx)

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```
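Before wiring up audio, it's worth checking that the Ollama shim actually answers. A minimal sanity check using the openai client directly, with the same model and base_url as bot.py (the openai package ships with the pipecat openai extra; the prompt is illustrative):

```python
# sanity_check.py - confirm the Ollama OpenAI-compatible endpoint responds
from openai import OpenAI

client = OpenAI(api_key="ollama", base_url="http://127.0.0.1:11434/v1")
resp = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(resp.choices[0].message.content)
```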
Step 3 — Run a Piper HTTP server
```bash
pip install piper-tts
python -m piper.http_server --model en_US-amy-medium --port 5000
```
Pipecat's PiperTTSService expects an HTTP endpoint, not a CLI.
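A quick curl confirms the server is up before pointing Pipecat at it. A sketch assuming the default plain-text POST interface of piper.http_server:

```bash
# Writes a playable WAV if Piper is serving on port 5000
curl -X POST -H 'Content-Type: text/plain' \
     --data 'Hello from Piper.' \
     -o hello.wav http://127.0.0.1:5000
```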
Step 4 — FastAPI signaling
```python
# server.py
import asyncio

from fastapi import FastAPI

from pipecat.transports.network.small_webrtc import SmallWebRTCConnection

from bot import run_bot

app = FastAPI()


@app.post("/offer")
async def offer(req: dict):
    conn = SmallWebRTCConnection()
    answer = await conn.handle_offer(req["sdp"], req["type"])
    # Run the bot for this connection without blocking the SDP response
    asyncio.create_task(run_bot(conn))
    return {"sdp": answer.sdp, "type": answer.type}
```
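Run the signaling server with uvicorn; the port is arbitrary as long as the browser client posts its offer to the same one:

```bash
uvicorn server:app --host 0.0.0.0 --port 8000
```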
Step 5 — Minimal browser client
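A minimal sketch of the client: capture the mic, create an offer, POST it to /offer, and play the returned audio track. It assumes the page is served from the same origin as the FastAPI app; the element IDs and the no-trickle ICE wait are illustrative, not part of Pipecat's official client.

```html
<!DOCTYPE html>
<html>
<body>
  <button id="start">Talk</button>
  <audio id="bot" autoplay></audio>
  <script>
    document.getElementById("start").onclick = async () => {
      const pc = new RTCPeerConnection();
      // Play the bot's audio when the remote track arrives
      pc.ontrack = (e) => { document.getElementById("bot").srcObject = e.streams[0]; };
      // Send microphone audio to the pipeline
      const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
      mic.getTracks().forEach((t) => pc.addTrack(t, mic));
      await pc.setLocalDescription(await pc.createOffer());
      // Wait for ICE gathering so the offer carries all candidates (no trickle)
      await new Promise((resolve) => {
        if (pc.iceGatheringState === "complete") return resolve();
        pc.addEventListener("icegatheringstatechange", () => {
          if (pc.iceGatheringState === "complete") resolve();
        });
      });
      // POST the offer to the FastAPI endpoint from Step 4
      const res = await fetch("/offer", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ sdp: pc.localDescription.sdp, type: pc.localDescription.type }),
      });
      await pc.setRemoteDescription(await res.json());
    };
  </script>
</body>
</html>
```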
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 6 — Add a tool call (function calling)
```python
from pipecat.services.llm_service import FunctionCallParams

async def book_demo(params: FunctionCallParams):
    # Return the tool result to the LLM so it can confirm the booking aloud
    await params.result_callback({"booked": True, "slot": params.arguments["slot"]})

llm.register_function("book_demo", book_demo)
ctx = OpenAILLMContext(
    messages=[{"role": "system", "content": "Use book_demo to schedule."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "book_demo",
            "description": "Book a demo slot",
            "parameters": {
                "type": "object",
                "properties": {"slot": {"type": "string"}},
                "required": ["slot"],
            },
        },
    }],
)
```
Common pitfalls
- uv vs pip extras. Some extras (`small-webrtc`) need build tools; install `build-essential` on Linux (see the command after this list).
- Interruption handling. Always set `allow_interruptions=True`, or callers can't cut the bot off mid-reply.
- Ollama OpenAI shim. Use `api_key="ollama"` (any non-empty string); an empty key fails the OpenAI client's validator.
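On Debian/Ubuntu, that's:

```bash
sudo apt-get update && sudo apt-get install -y build-essential
```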
How CallSphere does this in production
CallSphere uses a similar pipeline pattern across our 37 agents in 6 verticals. Healthcare runs 14 tools on FastAPI (port 8084) with OpenAI Realtime; OneRoof's 10 property specialists run on Pion WebRTC; Salon, Dental, F&B, and Behavioral round out the suite, with 90+ tools and 115+ Postgres tables under the hood. Pricing is a flat $149/$499/$1,499 with a 14-day trial and a 22% affiliate program. See /industries/real-estate or /demo.
FAQ
Pipecat vs LiveKit Agents? Pipecat gives you more pipeline flexibility; LiveKit Agents ships with more transport batteries included.
Can I use OpenAI Realtime instead of STT/LLM/TTS? Yes — pipecat.services.openai.realtime.
Phone calls? Use the twilio or telnyx transport plugins.
Multi-tenant? Run multiple PipelineTask instances behind a process pool.
Self-hosted vs Pipecat Cloud? Same code; Cloud just manages scaling.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available; no signup required.