---
title: "Build a Voice Agent with Pipecat Self-Hosted (Open Framework, 2026)"
description: "Pipecat is the most flexible open voice framework: 20+ STT, 30+ TTS, 20+ LLMs, WebRTC and telephony. Here's a fully self-hosted Pipecat agent on FastAPI — no Pipecat Cloud needed."
canonical: https://callsphere.ai/blog/vw4h-build-voice-agent-pipecat-self-hosted
category: "AI Voice Agents"
tags: ["Pipecat", "Self-hosted", "WebRTC", "Voice Agent", "Tutorial"]
author: "CallSphere Team"
published: 2026-04-09T00:00:00.000Z
updated: 2026-05-07T16:13:46.054Z
---

# Build a Voice Agent with Pipecat Self-Hosted (Open Framework, 2026)

> Pipecat is the most flexible open voice framework: 20+ STT, 30+ TTS, 20+ LLMs, WebRTC and telephony. Here's a fully self-hosted Pipecat agent on FastAPI — no Pipecat Cloud needed.

> **TL;DR** — Pipecat is the closest thing to a Lego kit for voice agents: pipes that connect STT → LLM → TTS with VAD, interruption handling, and tool calls baked in. The open-source core runs anywhere — no Pipecat Cloud subscription required.

## What you'll build

A self-hosted Pipecat 0.0.83+ agent (May 2026) that speaks via the SmallWebRTC transport, uses Deepgram STT, an Ollama-hosted LLM, and Piper TTS. The browser front-end connects directly to your server, with no SFU in between.

## Prerequisites

1. Python 3.12 (Pipecat needs 3.11+, recommends 3.12).
2. `uv add 'pipecat-ai[silero,deepgram,piper,openai,small-webrtc]'` (or the equivalent `pip install`).
3. Ollama running with `llama3.1:8b`.
4. Optional Deepgram key, or swap for local Whisper.
5. A modern browser for the SmallWebRTC client demo.
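
Before wiring anything into Pipecat, it's worth confirming the Ollama prerequisite responds on its OpenAI-compatible endpoint. A small probe, assuming a stock install on the default port 11434 (the `/v1/models` route and helper names here are illustrative, not Pipecat APIs):

```python
import json
import urllib.request


def chat_payload(model: str, user_text: str) -> dict:
    """Body shape for an OpenAI-style /chat/completions call to the Ollama shim."""
    return {"model": model, "messages": [{"role": "user", "content": user_text}]}


def list_models(base_url: str = "http://127.0.0.1:11434/v1") -> list:
    """Return the model ids Ollama advertises on its OpenAI-compatible /models route."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return [m["id"] for m in json.load(resp).get("data", [])]
```

If `llama3.1:8b` is missing from the returned list, `ollama pull llama3.1:8b` fetches it.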

## Architecture

```mermaid
flowchart LR
  BR[Browser SmallWebRTC] -->|RTP| PC
  subgraph PC[Pipecat Pipeline]
    VAD[Silero VAD] --> STT[Deepgram or local]
    STT --> LLM[Ollama OpenAI shim]
    LLM --> TTS[Piper local]
  end
  PC -->|RTP| BR
```

## Step 1 — Project skeleton

```bash
mkdir pipecat-self && cd pipecat-self
uv init && uv add 'pipecat-ai[silero,deepgram,openai,piper,small-webrtc]'
```

Pipecat's plugin system means you only install what you need.
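
Because each service lives behind an optional extra, a missing extra surfaces only when Pipecat tries to import it. A quick check of which plugin modules resolve in your environment (the module paths are assumptions based on the extras installed above; verify against your installed version):

```python
import importlib.util


def has_plugin(module_path: str) -> bool:
    """True if the optional module can be located, without fully importing it."""
    try:
        return importlib.util.find_spec(module_path) is not None
    except ModuleNotFoundError:
        # Parent package itself is missing
        return False


for mod in ("pipecat.audio.vad.silero",
            "pipecat.services.deepgram",
            "pipecat.services.piper",
            "pipecat.services.openai"):
    print(f"{mod}: {'ok' if has_plugin(mod) else 'missing'}")
```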

## Step 2 — Build the bot

```python
# bot.py
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport


async def run_bot(webrtc_connection):
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    # Ollama exposes an OpenAI-compatible endpoint on port 11434
    llm = OpenAILLMService(
        api_key="ollama",  # any non-empty string satisfies the client
        model="llama3.1:8b",
        base_url="http://127.0.0.1:11434/v1",
    )
    tts = PiperTTSService(
        base_url="http://127.0.0.1:5000",
        voice_id="en_US-amy-medium",
    )
    ctx = OpenAILLMContext([{"role": "system", "content": "Be concise."}])
    # One aggregator pair keeps user and assistant turns in the same context
    ctx_agg = llm.create_context_aggregator(ctx)
    pipeline = Pipeline([
        transport.input(),    # audio in from the browser
        stt,                  # speech -> text
        ctx_agg.user(),       # append user turn to context
        llm,                  # text -> text
        tts,                  # text -> speech
        transport.output(),   # audio out to the browser
        ctx_agg.assistant(),  # append assistant turn to context
    ])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```

## Step 3 — Run a Piper HTTP server

```bash
pip install piper-tts
python -m piper.http_server --model en_US-amy-medium --port 5000
```

Pipecat's `PiperTTSService` expects an HTTP endpoint, not a CLI.
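
To confirm the Piper server works before wiring it into Pipecat, you can request a WAV directly. This sketch assumes the server accepts a `text` query parameter on GET, which is how Piper's bundled HTTP server has behaved; check your installed version:

```python
import urllib.parse
import urllib.request


def piper_url(base_url: str, text: str) -> str:
    """Build a synthesis URL for Piper's HTTP server (text passed as a query param)."""
    return f"{base_url}/?{urllib.parse.urlencode({'text': text})}"


def synthesize(text: str, base_url: str = "http://127.0.0.1:5000") -> bytes:
    """Fetch raw WAV bytes for the given text from a running Piper server."""
    with urllib.request.urlopen(piper_url(base_url, text), timeout=10) as resp:
        return resp.read()
```

With the server up, `open("hello.wav", "wb").write(synthesize("Hello from Piper"))` gives you a playable sample of the chosen voice.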

## Step 4 — FastAPI signaling

```python
# server.py
import asyncio

from fastapi import FastAPI

from bot import run_bot
from pipecat.transports.network.small_webrtc import SmallWebRTCConnection

app = FastAPI()


@app.post("/offer")
async def offer(req: dict):
    conn = SmallWebRTCConnection()
    answer = await conn.handle_offer(req["sdp"], req["type"])
    # Run the bot in the background so the HTTP response returns immediately
    asyncio.create_task(run_bot(conn))
    return {"sdp": answer.sdp, "type": answer.type}
```
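
The handler above trusts the request body blindly. A tiny validator you might put in front of it (a hypothetical helper, not part of Pipecat or FastAPI):

```python
def validate_offer(body: dict) -> tuple:
    """Return (sdp, type) from an SDP offer body, raising ValueError on bad input."""
    sdp = body.get("sdp")
    typ = body.get("type")
    if not isinstance(sdp, str) or typ != "offer":
        raise ValueError("body must contain a string 'sdp' and type == 'offer'")
    return sdp, typ
```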

## Step 5 — Minimal browser client

A bare-bones page that captures the microphone, posts an SDP offer to `/offer`, and plays the bot's answer stream. This is a sketch against the server above; adjust the endpoint path if yours differs:

```html
<!doctype html>
<html>
  <body>
    <button id="start">Start</button>
    <audio id="audio" autoplay></audio>
    <script>
      document.getElementById("start").onclick = async () => {
        const pc = new RTCPeerConnection();
        const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
        mic.getTracks().forEach((t) => pc.addTrack(t, mic));
        pc.ontrack = (e) => (document.getElementById("audio").srcObject = e.streams[0]);
        await pc.setLocalDescription(await pc.createOffer());
        // Wait for ICE gathering so the offer carries all candidates (no trickle)
        await new Promise((resolve) => {
          if (pc.iceGatheringState === "complete") return resolve();
          pc.addEventListener("icegatheringstatechange", () => {
            if (pc.iceGatheringState === "complete") resolve();
          });
        });
        const res = await fetch("/offer", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(pc.localDescription),
        });
        await pc.setRemoteDescription(await res.json());
      };
    </script>
  </body>
</html>
```

## Step 6 — Add a tool call (function calling)

```python
from pipecat.services.llm_service import FunctionCallParams


async def book_demo(params: FunctionCallParams):
    # Echo back a confirmation; a real handler would call your booking API
    await params.result_callback({"booked": True, "slot": params.arguments["slot"]})


llm.register_function("book_demo", book_demo)

ctx = OpenAILLMContext(
    messages=[{"role": "system", "content": "Use book_demo to schedule."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "book_demo",
            "description": "Book a demo slot",
            "parameters": {
                "type": "object",
                "properties": {"slot": {"type": "string"}},
                "required": ["slot"],
            },
        },
    }],
)
```
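
Under the hood, the LLM emits JSON arguments and Pipecat invokes the registered handler with them, expecting the result back through an async callback. A dependency-free stand-in shows the round trip (`FakeParams` is purely illustrative, not a Pipecat type):

```python
import asyncio


class FakeParams:
    """Illustrative stand-in for Pipecat's FunctionCallParams: just the two
    attributes the handler uses (an arguments dict and an async result callback)."""

    def __init__(self, arguments, result_callback):
        self.arguments = arguments
        self.result_callback = result_callback


async def book_demo(params):
    # Hand the structured result back via the callback
    await params.result_callback({"booked": True, "slot": params.arguments["slot"]})


async def main():
    results = []

    async def capture(result):
        results.append(result)

    # Simulate the LLM calling book_demo with parsed JSON arguments
    await book_demo(FakeParams({"slot": "2026-04-10T15:00"}, capture))
    return results[0]


result = asyncio.run(main())
print(result)  # {'booked': True, 'slot': '2026-04-10T15:00'}
```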

## Common pitfalls

- **uv vs pip extras.** Some extras (`small-webrtc`) require build tools — install `build-essential` on Linux.
- **Interruption handling.** Without `allow_interruptions=True`, users cannot barge in while the bot is speaking.
- **Ollama OpenAI shim.** Use `api_key="ollama"` (any non-empty string); empty fails the validator.

## How CallSphere does this in production

CallSphere uses a similar pipeline pattern across our 37 agents in 6 verticals. Healthcare runs 14 tools on FastAPI :8084 with OpenAI Realtime; OneRoof's 10 property specialists run on Pion WebRTC; Salon, Dental, F&B and Behavioral round out the suite. 90+ tools and 115+ Postgres tables under the hood. Flat $149/$499/$1499. [14-day trial](/trial) · [22% affiliate](/affiliate) · [/industries/real-estate](/industries/real-estate) · [/demo](/demo).

## FAQ

**Pipecat vs LiveKit Agents?** Pipecat is more pipeline-flexible; LiveKit is more transport-batteries-included.

**Can I use OpenAI Realtime instead of STT/LLM/TTS?** Yes — `pipecat.services.openai.realtime`.

**Phone calls?** Use the `twilio` or `telnyx` transport plugins.

**Multi-tenant?** Run multiple PipelineTask instances behind a process pool.

**Self-hosted vs Pipecat Cloud?** Same code; Cloud just manages scaling.

## Sources

- [Pipecat on GitHub](https://github.com/pipecat-ai/pipecat)
- [Pipecat docs](https://docs.pipecat.ai/getting-started/introduction)
- [AssemblyAI Pipecat tutorial](https://www.assemblyai.com/blog/building-a-voice-agent-with-pipecat)
- [Modal one-second voice-to-voice](https://modal.com/blog/low-latency-voice-bot)

---

Source: https://callsphere.ai/blog/vw4h-build-voice-agent-pipecat-self-hosted
