By Sagar Shankaran, Founder of CallSphere
Pipecat is the most flexible open voice framework: 20+ STT, 30+ TTS, 20+ LLMs, WebRTC and telephony. Here's a fully self-hosted Pipecat agent on FastAPI — no Pipecat Cloud needed.
Key takeaways
TL;DR — Pipecat is the closest thing to a Lego kit for voice agents: pipes that connect STT → LLM → TTS with VAD, interruption handling, and tool calls baked in. The open-source core runs anywhere — no Pipecat Cloud subscription required.
You will build a self-hosted Pipecat 0.0.83+ agent (May 2026) that speaks over the SmallWebRTC transport and uses Deepgram STT, an Ollama LLM, and Piper TTS. The browser front-end connects directly, with no SFU.
Install with uv add 'pipecat-ai[silero,deepgram,piper,openai]' or pip install.
Ollama model: llama3.1:8b.
flowchart LR
BR[Browser SmallWebRTC] -->|RTP| PC[Pipecat Pipeline]
PC --> VAD[Silero VAD]
PC --> STT[Deepgram or local]
PC --> LLM[Ollama OpenAI-shim]
PC --> TTS[Piper local]
PC -->|RTP| BR
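The flow in the diagram can be sketched without Pipecat at all. The block below is only an illustration of the STT → LLM → TTS chaining idea, assuming each stage is an async function; `fake_stt`, `fake_llm`, `fake_tts`, and `run_stages` are hypothetical stand-ins, not Pipecat APIs, and real Pipecat processors exchange frames rather than plain values.

```python
import asyncio
from typing import Awaitable, Callable, List

# A stage takes one item (audio bytes or text) and returns the next item.
Stage = Callable[[object], Awaitable[object]]


async def fake_stt(audio: bytes) -> str:
    # Stand-in for speech-to-text: pretend the audio bytes are UTF-8 text.
    return audio.decode("utf-8")


async def fake_llm(text: str) -> str:
    # Stand-in for the LLM: produce a canned reply that echoes the input.
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    # Stand-in for text-to-speech: encode the reply back to bytes.
    return text.encode("utf-8")


async def run_stages(stages: List[Stage], item: object) -> object:
    # Pass the item through each stage in order, like data through a pipeline.
    for stage in stages:
        item = await stage(item)
    return item


def demo() -> bytes:
    # Run one "utterance" through the three stand-in stages.
    return asyncio.run(run_stages([fake_stt, fake_llm, fake_tts], b"hello"))
```

The point of the shape is that each stage only knows its own input and output, which is why swapping one STT, LLM, or TTS provider for another does not disturb the rest of the chain.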
```bash
mkdir pipecat-self && cd pipecat-self
uv init && uv add 'pipecat-ai[silero,deepgram,openai,piper,small-webrtc]'
```
Pipecat's plugin system means you only install what you need.
```python
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport


async def run_bot(webrtc_connection):
    # Transport options go in a TransportParams object, not a plain dict.
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    # Ollama as an OpenAI-compatible endpoint
    llm = OpenAILLMService(
        api_key="ollama",
        model="llama3.1:8b",
        base_url="http://127.0.0.1:11434/v1",
    )
    # Constructor arguments can differ between Pipecat releases; check yours.
    tts = PiperTTSService(base_url="http://127.0.0.1:5000", voice_id="en_US-amy-medium")
    ctx = OpenAILLMContext([{"role": "system", "content": "Be concise."}])
    # Create the aggregator pair once so the user and assistant halves are a matched pair.
    context_aggregator = llm.create_context_aggregator(ctx)
    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```
```bash
pip install piper-tts
python -m piper.http_server --model en_US-amy-medium --port 5000
```
Pipecat's PiperTTSService expects an HTTP endpoint, not a CLI.
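Because the bot depends on local HTTP services, it can help to confirm they are listening before the first call arrives. The following is a minimal sketch using only the standard library; `port_open` and `missing_services` are hypothetical helper names, and the ports are the ones used in this article (5000 for Piper, 11434 for Ollama).

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(services: dict) -> list:
    """Return the names of services whose (host, port) refuses connections."""
    return [
        name
        for name, (host, port) in services.items()
        if not port_open(host, port)
    ]
```

A startup hook could call `missing_services({"piper": ("127.0.0.1", 5000), "ollama": ("127.0.0.1", 11434)})` and refuse to start if the list is not empty. This only checks that something accepts TCP connections on the port, not that the service behind it is healthy.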
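```python
import asyncio

from fastapi import FastAPI
from pipecat.transports.network.small_webrtc import SmallWebRTCConnection

from bot import run_bot

app = FastAPI()
# Keep references so running bot tasks are not garbage-collected mid-call.
bot_tasks = set()


@app.post("/offer")
async def offer(req: dict):
    conn = SmallWebRTCConnection()
    # Apply the browser's SDP offer; verify these method names against your Pipecat version.
    await conn.initialize(sdp=req["sdp"], type=req["type"])
    task = asyncio.create_task(run_bot(conn))
    bot_tasks.add(task)
    task.add_done_callback(bot_tasks.discard)
    # The answer is a dict that includes the "sdp" and "type" the browser needs.
    return conn.get_answer()
```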
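The `/offer` handler trusts whatever JSON the browser sends. A minimal sketch of checking the body first, assuming the client posts the two fields used above; `validate_offer` is a hypothetical helper, and the `v=0` check relies only on the fact that an SDP description begins with its version line.

```python
def validate_offer(body: dict) -> dict:
    """Return a cleaned {"sdp", "type"} dict, or raise ValueError."""
    sdp = body.get("sdp")
    kind = body.get("type")
    if not isinstance(sdp, str) or not sdp.startswith("v=0"):
        raise ValueError("sdp must be a string starting with 'v=0'")
    if kind != "offer":
        raise ValueError("type must be 'offer'")
    # Drop any extra keys so only the expected fields reach the transport.
    return {"sdp": sdp, "type": kind}
```

In the handler, a `ValueError` from this function could be turned into an HTTP 400 response instead of letting a malformed offer fail deeper in the WebRTC stack.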
```html
```
```python
from pipecat.services.llm_service import FunctionCallParams


async def book_demo(params: FunctionCallParams):
    # Report the tool result back to the LLM so it can confirm the booking.
    await params.result_callback({"booked": True, "slot": params.arguments["slot"]})


llm.register_function("book_demo", book_demo)
ctx = OpenAILLMContext(
    messages=[{"role": "system", "content": "Use book_demo to schedule."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "book_demo",
            "description": "Book a demo slot",
            "parameters": {
                "type": "object",
                "properties": {"slot": {"type": "string"}},
                "required": ["slot"],
            },
        },
    }],
)
```
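Smaller local models sometimes call tools with missing or mistyped arguments, so a handler like `book_demo` may want to check them before acting. The sketch below is a deliberately partial check against the same parameter schema, not a full JSON Schema validator; `check_arguments` and `BOOK_DEMO_SCHEMA` are hypothetical names introduced here for illustration.

```python
# Same parameter schema as the book_demo tool definition above.
BOOK_DEMO_SCHEMA = {
    "type": "object",
    "properties": {"slot": {"type": "string"}},
    "required": ["slot"],
}

# Maps a few JSON Schema type names to Python types; other types are not checked.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_arguments(arguments: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the arguments look usable."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems
```

A handler could call `check_arguments(params.arguments, BOOK_DEMO_SCHEMA)` and, if the list is not empty, return the problems through the result callback so the model can retry with corrected arguments.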
Extras such as small-webrtc require build tools; install build-essential on Linux.
Set allow_interruptions=True, or replies cannot be cut off.
Pass api_key="ollama" (any non-empty string) when pointing at Ollama; an empty string fails the validator.
CallSphere uses a similar pipeline pattern across our 37 agents in 6 verticals. Healthcare runs 14 tools on FastAPI :8084 with OpenAI Realtime; OneRoof's 10 property specialists run on Pion WebRTC; Salon, Dental, F&B and Behavioral round out the suite, with 90+ tools and 115+ Postgres tables under the hood. Pricing is flat at $149/$499/$1499, with a 14-day trial.
Pipecat vs LiveKit Agents? Pipecat is more pipeline-flexible; LiveKit is more transport-batteries-included.
Can I use OpenAI Realtime instead of STT/LLM/TTS? Yes — pipecat.services.openai.realtime.
Phone calls? Use the twilio or telnyx transport plugins.
Multi-tenant? Run multiple PipelineTask instances behind a process pool.
Self-hosted vs Pipecat Cloud? Same code; Cloud just manages scaling.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.