Pipecat is the framework and Pipecat Cloud is the managed runtime, but the open-source Pipecat library is what most teams actually run. It offers frame-based pipelines, 100+ service plugins, MCP-style subagents, native Twilio and WebRTC transports, and a Python-first API that can be prototyped in about 30 minutes. For 2026 voice AI builders not committed to LiveKit's room model, Pipecat is the obvious choice.

Background

Pipecat (formerly DailyAI) is Daily.co's open-source voice and multimodal AI framework. The core abstraction is a Pipeline: a series of FrameProcessors that pass typed Frames (audio, text, image, transcription, function call). Each processor consumes some frames and emits others. The pattern is borrowed from GStreamer and adapted for AI.
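To make the frame-passing idea concrete, here is a minimal sketch in plain Python. It does not use Pipecat; the class names (AudioFrame, ToySTT, ToyLLM, run_pipeline) are hypothetical stand-ins chosen for illustration, and real Pipecat processors are asynchronous and considerably richer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Frame:
    """Base type for everything that flows through the toy pipeline."""


@dataclass
class AudioFrame(Frame):
    pcm: bytes


@dataclass
class TranscriptionFrame(Frame):
    text: str


@dataclass
class TextFrame(Frame):
    text: str


class Processor:
    """Consumes one frame and yields zero or more frames downstream."""

    def process(self, frame: Frame) -> Iterator[Frame]:
        yield frame  # default behaviour: pass the frame through unchanged


class ToySTT(Processor):
    # Hypothetical STT stand-in: pretends the audio bytes are UTF-8 text.
    def process(self, frame: Frame) -> Iterator[Frame]:
        if isinstance(frame, AudioFrame):
            yield TranscriptionFrame(text=frame.pcm.decode("utf-8"))
        else:
            yield frame


class ToyLLM(Processor):
    # Hypothetical LLM stand-in: echoes the transcription back as a reply.
    def process(self, frame: Frame) -> Iterator[Frame]:
        if isinstance(frame, TranscriptionFrame):
            yield TextFrame(text=f"You said: {frame.text}")
        else:
            yield frame


def run_pipeline(processors: List[Processor], frames: Iterable[Frame]) -> List[Frame]:
    """Push every frame through each processor in order."""
    stream = list(frames)
    for processor in processors:
        next_stream: List[Frame] = []
        for frame in stream:
            next_stream.extend(processor.process(frame))
        stream = next_stream
    return stream
```

Each processor only reacts to the frame types it understands and forwards everything else, which is what lets stages be reordered or swapped without touching their neighbours.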

The plugin ecosystem is the moat: 100+ services covering Deepgram, AssemblyAI, OpenAI, Anthropic, Gemini, ElevenLabs, Cartesia, Krisp, and every other major STT, LLM, and TTS provider, plus transports for Twilio Streams, Vonage, Plivo, Daily WebRTC, LiveKit rooms, FastAPI WebSocket, and more. Client SDKs are available for Python, JavaScript, React, iOS, Android, and C++.

Pipecat Subagents (2025) added distributed multi-agent systems where each agent runs its own pipeline and they communicate over a shared message bus. NVIDIA published an official Pipecat-based blueprint for the NIM platform.
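The shared-bus idea can be sketched with nothing but asyncio. This is not the Pipecat Subagents API; MessageBus, billing_agent, and the topic names are hypothetical and exist only to show how agents might exchange messages through a common channel.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List


class MessageBus:
    """Toy in-process pub/sub bus: one asyncio.Queue per subscriber per topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    async def publish(self, topic: str, message: dict) -> None:
        # Fan the message out to every subscriber of the topic.
        for queue in self._subscribers[topic]:
            await queue.put(message)


async def billing_agent(bus: MessageBus, inbox: asyncio.Queue) -> None:
    # Hypothetical subagent: handles one request, then publishes a reply.
    request = await inbox.get()
    await bus.publish(
        "replies", {"from": "billing", "answer": f"handled {request['task']}"}
    )


async def demo() -> dict:
    bus = MessageBus()
    billing_inbox = bus.subscribe("billing")
    replies = bus.subscribe("replies")
    worker = asyncio.create_task(billing_agent(bus, billing_inbox))
    await bus.publish("billing", {"task": "refund"})  # main agent delegates work
    reply = await replies.get()
    await worker
    return reply
```

Running `asyncio.run(demo())` delegates one task and returns the subagent's reply. Because the queues live in one process, a multi-host deployment would need the same publish/subscribe shape backed by an external broker.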

Architecture

graph LR
    A[Twilio Stream / WebRTC] --> B[Transport Input Frame]
    B --> C[VAD Processor]
    C --> D[STT Processor]
    D --> E[Context Aggregator]
    E --> F[LLM Processor]
    F --> G[TTS Processor]
    G --> H[Transport Output]
    H --> I[Twilio / WebRTC out]
    F -.->|tool call| J[Subagent message bus]

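import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketTransport
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService

# API keys are read from the environment (variable names are illustrative).
DG_KEY = os.environ["DEEPGRAM_API_KEY"]
OAI_KEY = os.environ["OPENAI_API_KEY"]
EL_KEY = os.environ["ELEVENLABS_API_KEY"]


async def run_bot(ws):
    # ws is an accepted FastAPI WebSocket; the remaining transport params are elided.
    transport = FastAPIWebsocketTransport(websocket=ws, ...)
    pipeline = Pipeline([
        transport.input(),                                     # inbound audio frames
        DeepgramSTTService(api_key=DG_KEY),                    # audio -> transcription
        OpenAILLMService(api_key=OAI_KEY, model="gpt-4o"),     # text-in, text-out chat model
        ElevenLabsTTSService(api_key=EL_KEY, voice_id="..."),  # text -> audio
        transport.output(),                                    # outbound audio frames
    ])
    runner = PipelineRunner()
    await runner.run(PipelineTask(pipeline))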

CallSphere implementation

CallSphere terminates every call on Twilio across our six verticals: Healthcare AI (FastAPI on port 8084, bridged to OpenAI Realtime), Real Estate AI, Sales Calling AI (5 concurrent outbound calls), Salon AI, IT Helpdesk AI, and After-Hours AI (simultaneous Twilio call and SMS with a 120-second timeout). The platform runs 37 agents, 90+ tools, and 115+ DB tables with HIPAA and SOC 2 compliance, on $149/$499/$1499 plans with a 14-day trial and a 22% affiliate program.

We do not run Pipecat in production. Our orchestration is custom-built around our 90+ tool catalog and 115+ DB tables, with tighter coupling between the agent and our domain models than Pipecat's frame abstraction encourages. For prospects weighing Pipecat against a custom build, our reference shows that the same OpenAI Realtime + Deepgram + ElevenLabs stack runs end-to-end in Pipecat in roughly 60 lines of Python. What that leaves you to build yourself is observability and tool-call governance, which our managed stack provides on top.

Build steps

pip install "pipecat-ai[deepgram,openai,elevenlabs,silero,websocket]" (service integrations ship as extras of the pipecat-ai package; Twilio Media Streams run over the WebSocket transport).
Choose a transport (Twilio WebSocket, Vonage WebSocket, Daily room, LiveKit room, FastAPI WebSocket).
Compose a Pipeline of FrameProcessors: input, VAD, STT, LLM, TTS, output.
Add LLMUserContextAggregator and LLMAssistantContextAggregator for conversation memory.
Define functions with FunctionSchema and register a handler for each; when the LLM requests a call, the handler runs and its result flows back into the pipeline as a FunctionCallResultFrame.
Run with PipelineRunner; integrate metrics with the built-in observer protocol.
Deploy: Pipecat Cloud for managed, Modal/AWS Bedrock AgentCore/Cerebrium for serverless GPU, or your own Kubernetes.
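The context-aggregation and function-calling steps above can be illustrated without Pipecat. The sketch below is an assumption-laden stand-in: register_tool, TOOLS, handle_function_call, and the get_opening_hours tool with its sample data are all hypothetical, and the schema dictionary only loosely mirrors the name/description/properties/required shape that function-calling APIs commonly use.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical tool registry: maps a tool name to its schema and handler.
TOOLS: Dict[str, Dict[str, Any]] = {}


def register_tool(
    name: str, description: str, properties: Dict[str, Any], required: List[str]
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {
            "handler": fn,
            "schema": {
                "name": name,
                "description": description,
                "properties": properties,
                "required": required,
            },
        }
        return fn

    return decorator


@register_tool(
    name="get_opening_hours",
    description="Look up opening hours for a day of the week.",
    properties={"day": {"type": "string"}},
    required=["day"],
)
def get_opening_hours(day: str) -> Dict[str, str]:
    # Sample data for illustration only.
    hours = {"saturday": "10:00-14:00"}
    return {"day": day, "hours": hours.get(day.lower(), "09:00-17:00")}


def handle_function_call(
    context: List[Dict[str, str]], name: str, arguments_json: str
) -> List[Dict[str, str]]:
    """Run the requested tool and append its result to the conversation context."""
    args = json.loads(arguments_json)
    tool = TOOLS[name]
    missing = [key for key in tool["schema"]["required"] if key not in args]
    if missing:
        result: Dict[str, Any] = {"error": f"missing arguments: {missing}"}
    else:
        result = tool["handler"](**args)
    # The appended message is what the model sees on its next turn.
    context.append({"role": "tool", "name": name, "content": json.dumps(result)})
    return context
```

Appending the tool result to the same message list that holds user and assistant turns is the essence of conversation memory: the model's next completion is conditioned on everything in that list.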

Pitfalls

Frame-based pipelines have a learning curve; debug with the FrameLogger processor.
Each service plugin pulls in its own dependency versions; run pip freeze and pin everything.
Silero VAD is the default but adds 10-20 ms; on calls with little background noise, a simpler RMS energy detector can be enough.
Subagent message bus is in-process by default; for multi-host deployments you need an external bus (Redis, NATS).
Pipecat 0.0.x to 0.1.x had breaking API changes; check release notes when upgrading.
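For the VAD pitfall above, an RMS energy detector can be sketched in a few lines of standard-library Python. This is not a Pipecat component; RmsVad, its threshold of 500, and the hangover of 5 frames are hypothetical values that would need tuning against real call audio.

```python
import math
import struct


def rms(pcm16: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class RmsVad:
    """Energy-threshold voice activity detector with a simple hangover."""

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 5) -> None:
        self.threshold = threshold              # assumed value, tune per deployment
        self.hangover_frames = hangover_frames  # quiet frames tolerated before stopping
        self.speaking = False
        self._quiet_run = 0

    def update(self, frame: bytes) -> bool:
        """Feed one audio frame; return True while speech is considered active."""
        if rms(frame) >= self.threshold:
            self.speaking = True
            self._quiet_run = 0
        elif self.speaking:
            self._quiet_run += 1
            if self._quiet_run >= self.hangover_frames:
                self.speaking = False
        return self.speaking
```

The hangover keeps short pauses between words from being reported as end of speech. A fixed threshold like this tends to misfire once background noise rises, which is the case a model-based detector handles better.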

FAQ

Pipecat or LiveKit Agents? Pipecat is more flexible (any transport, any service) and Python-first. LiveKit Agents is more opinionated and tied to LiveKit rooms. Pick on whether your use case fits LiveKit's room model.

Pipecat Cloud or self-host Pipecat? Use Cloud for small teams and workloads under 100 concurrent calls. Self-host when you need GPU placement or compliance isolation.

MCP support? Yes via the FunctionSchema interface; MCP tool servers wrap as Pipecat function calls.

Latency? 500-900 ms voice-to-voice typical with external models; sub-second is achievable with Modal or NVIDIA NIM blueprints.

HIPAA? Pipecat Cloud and Daily are BAA-eligible on enterprise plans. If you self-host, compliance is your responsibility.
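To check the voice-to-voice latency range quoted in the FAQ against your own deployment, a per-stage timer is one simple approach. The sketch below is plain Python, not Pipecat's metrics or observer API; TurnTimer and the stage names are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class TurnTimer:
    """Accumulates wall-clock time per pipeline stage for one conversational turn."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable so the timer can be tested deterministically
        self.stages_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed_ms

    def total_ms(self) -> float:
        return sum(self.stages_ms.values())

    def within_budget(self, budget_ms: float) -> bool:
        return self.total_ms() <= budget_ms
```

Wrapping each stage, for example `with timer.stage("stt"): ...`, and then calling `timer.within_budget(900)` shows which stage consumes most of the turn before you reach for faster infrastructure.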

Sources

Start a 14-day trial of our managed AI voice, see pricing for $149/$499/$1499, or book a demo to compare a Pipecat reference build against our stack.

Pipecat Framework for AI Voice Pipelines in 2026: 100+ Services, Frame-Based Orchestration
