
Build a Voice Agent with Pipecat Self-Hosted (Open Framework, 2026)

Pipecat is the most flexible open voice framework: 20+ STT, 30+ TTS, 20+ LLMs, WebRTC and telephony. Here's a fully self-hosted Pipecat agent on FastAPI — no Pipecat Cloud needed.

TL;DR — Pipecat is the closest thing to a Lego kit for voice agents: pipes that connect STT → LLM → TTS with VAD, interruption handling, and tool calls baked in. The open-source core runs anywhere — no Pipecat Cloud subscription required.

What you'll build

A self-hosted Pipecat 0.0.83+ agent (May 2026) that speaks via SmallWebRTC transport, uses Deepgram STT, an Ollama LLM, and Piper TTS. Browser front-end connects directly with no SFU.

Prerequisites

  1. Python 3.12 (Pipecat needs 3.11+, recommends 3.12).
  2. uv add 'pipecat-ai[silero,deepgram,piper,openai]' or pip install.
  3. Ollama running with llama3.1:8b.
  4. Optional Deepgram key, or swap for local Whisper.
  5. A modern browser for the SmallWebRTC client demo.
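Before installing anything, a quick preflight check can confirm the interpreter version and whether the local services are listening. This is a sketch, not part of Pipecat; the port numbers assume the defaults used later in this guide (Ollama on 11434, Piper on 5000):

```python
import socket
import sys


def python_ok(min_version=(3, 11)) -> bool:
    # Pipecat requires Python 3.11+; 3.12 is recommended.
    return sys.version_info[:2] >= min_version


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    # True if something is listening, e.g. Ollama on 11434 or Piper on 5000.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage: python_ok() and port_open("127.0.0.1", 11434) before proceeding.
```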

Architecture

```mermaid
flowchart LR
  BR[Browser SmallWebRTC] -->|RTP| PC[Pipecat Pipeline]
  PC --> VAD[Silero VAD]
  PC --> STT[Deepgram or local]
  PC --> LLM[Ollama OpenAI-shim]
  PC --> TTS[Piper local]
  PC -->|RTP| BR
```

Step 1 — Project skeleton

```bash
mkdir pipecat-self && cd pipecat-self
uv init && uv add 'pipecat-ai[silero,deepgram,openai,piper,small-webrtc]'
```

Pipecat's plugin system means you only install what you need.
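After Step 1, the resulting `pyproject.toml` should contain roughly the following (a sketch; the exact version pin `uv` writes will differ):

```toml
[project]
name = "pipecat-self"
requires-python = ">=3.12"
dependencies = [
    "pipecat-ai[silero,deepgram,openai,piper,small-webrtc]",
]
```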

Step 2 — Build the bot

```python
# bot.py
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport


async def run_bot(webrtc_connection):
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    # Ollama exposed as an OpenAI-compatible endpoint
    llm = OpenAILLMService(
        api_key="ollama",
        model="llama3.1:8b",
        base_url="http://127.0.0.1:11434/v1",
    )
    tts = PiperTTSService(
        base_url="http://127.0.0.1:5000",
        voice_id="en_US-amy-medium",
    )

    ctx = OpenAILLMContext([{"role": "system", "content": "Be concise."}])
    # Create the aggregator pair once and reuse it on both sides of the LLM.
    ctx_aggregator = llm.create_context_aggregator(ctx)

    pipeline = Pipeline([
        transport.input(),
        stt,
        ctx_aggregator.user(),
        llm,
        tts,
        transport.output(),
        ctx_aggregator.assistant(),
    ])

    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```

Step 3 — Run a Piper HTTP server

```bash
pip install piper-tts
python -m piper.http_server --model en_US-amy-medium --port 5000
```

Pipecat's PiperTTSService expects an HTTP endpoint, not a CLI.
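Before wiring Piper into the pipeline, it's worth smoke-testing the HTTP server directly. The helper below is a sketch using only the standard library; it assumes the server accepts plain text in a POST body and returns WAV audio (behavior may vary by `piper-tts` version):

```python
import urllib.request

PIPER_URL = "http://127.0.0.1:5000"  # matches the server started above


def build_tts_request(base_url: str, text: str) -> urllib.request.Request:
    # Assumption: piper.http_server accepts raw text in the POST body.
    return urllib.request.Request(
        base_url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "out.wav") -> None:
    # Fetch synthesized audio and write it to disk for a quick listen.
    req = build_tts_request(PIPER_URL, text)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```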

Step 4 — FastAPI signaling

```python
# server.py
import asyncio

from fastapi import FastAPI
from pipecat.transports.network.small_webrtc import SmallWebRTCConnection

from bot import run_bot

app = FastAPI()


@app.post("/offer")
async def offer(req: dict):
    conn = SmallWebRTCConnection()
    answer = await conn.handle_offer(req["sdp"], req["type"])
    # Run the bot in the background so the SDP answer returns immediately.
    asyncio.create_task(run_bot(conn))
    return {"sdp": answer.sdp, "type": answer.type}
```

Start it with `uvicorn server:app --port 7860`.

Step 5 — Minimal browser client

The page below captures the microphone, POSTs an SDP offer to `/offer`, and plays the bot's return audio track. This is a minimal sketch using raw browser WebRTC APIs; Pipecat's official JS client SDK is the more robust option for production.

```html
<!doctype html>
<html>
<body>
<audio id="bot" autoplay></audio>
<script>
async function connect() {
  const pc = new RTCPeerConnection();
  // Send the mic upstream to the Pipecat pipeline.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  mic.getTracks().forEach((t) => pc.addTrack(t, mic));
  // Play whatever audio track the bot sends back.
  pc.ontrack = (e) => { document.getElementById("bot").srcObject = e.streams[0]; };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // Exchange SDP with the FastAPI /offer endpoint from Step 4.
  const res = await fetch("/offer", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      sdp: pc.localDescription.sdp,
      type: pc.localDescription.type,
    }),
  });
  await pc.setRemoteDescription(await res.json());
}
connect();
</script>
</body>
</html>
```


Step 6 — Add a tool call (function calling)

```python
from pipecat.services.llm_service import FunctionCallParams


async def book_demo(params: FunctionCallParams):
    await params.result_callback({"booked": True, "slot": params.arguments["slot"]})


llm.register_function("book_demo", book_demo)

ctx = OpenAILLMContext(
    messages=[{"role": "system", "content": "Use book_demo to schedule."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "book_demo",
            "description": "Book a demo slot",
            "parameters": {
                "type": "object",
                "properties": {"slot": {"type": "string"}},
                "required": ["slot"],
            },
        },
    }],
)
```
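The handler above only touches `params.arguments` and `params.result_callback`, so its logic can be exercised without a running pipeline. The stub object below is a hypothetical stand-in for `FunctionCallParams`, not a Pipecat API:

```python
import asyncio
from types import SimpleNamespace


async def book_demo(params):
    # Same body as the handler registered with llm.register_function above.
    await params.result_callback({"booked": True, "slot": params.arguments["slot"]})


async def main():
    results = []

    async def capture(result):
        # Collect whatever the handler reports back to the LLM.
        results.append(result)

    # Stand-in for FunctionCallParams: just arguments + result_callback.
    stub = SimpleNamespace(
        arguments={"slot": "2026-05-01T10:00"},
        result_callback=capture,
    )
    await book_demo(stub)
    return results[0]


print(asyncio.run(main()))
```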

Common pitfalls

  • uv vs pip extras. Some extras (small-webrtc) require build tools — install build-essential on Linux.
  • Interruption handling. Always set allow_interruptions=True or replies cannot be cut off.
  • Ollama OpenAI shim. Use api_key="ollama" (any non-empty string); empty fails the validator.

How CallSphere does this in production

CallSphere uses a similar pipeline pattern across our 37 agents in 6 verticals. Healthcare runs 14 tools on FastAPI :8084 with OpenAI Realtime; OneRoof's 10 property specialists run on Pion WebRTC; Salon, Dental, F&B and Behavioral round out the suite. 90+ tools and 115+ Postgres tables under the hood. Flat $149/$499/$1499. 14-day trial · 22% affiliate · /industries/real-estate · /demo.

FAQ

Pipecat vs LiveKit Agents? Pipecat is more pipeline-flexible; LiveKit is more transport-batteries-included.

Can I use OpenAI Realtime instead of STT/LLM/TTS? Yes — pipecat.services.openai.realtime.

Phone calls? Use the twilio or telnyx transport plugins.

Multi-tenant? Run multiple PipelineTask instances behind a process pool.

Self-hosted vs Pipecat Cloud? Same code; Cloud just manages scaling.

