By Sagar Shankaran, Founder of CallSphere
Pipecat 0.0.7x ships a frame-based pipeline for real-time voice. Wire Daily WebRTC, Deepgram, GPT-4o, and Cartesia into a working agent — code + pitfalls.
Key takeaways
TL;DR: Pipecat is an open-source, frame-based pipeline framework from Daily.co. You compose a list of processors (transport → STT → context → LLM → TTS → transport), and Pipecat manages the timing, interruption handling, and back-pressure between them.
What you will build: a Daily room voice agent that joins a call, listens with Deepgram, reasons with GPT-4o, and speaks back with Cartesia Sonic-3. It runs locally with python bot.py and can be deployed to Daily Bots, Cerebrium, or Modal.
flowchart LR
RM[Daily room] --> TR[DailyTransport]
TR --> STT[Deepgram STT]
STT --> CTX[OpenAILLMContext]
CTX --> LLM[OpenAI GPT-4o]
LLM --> TTS[Cartesia Sonic-3]
TTS --> TR --> RM
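To make the "list of processors" idea concrete, here is a toy sketch in plain Python. It is not Pipecat's API: the Frame, Processor, FakeSTT, FakeLLM, FakeTTS, and run_pipeline names are all hypothetical, and real Pipecat processors are asynchronous and streaming. It only illustrates how each stage consumes one kind of frame and emits another.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str      # hypothetical frame types: "audio", "text", "reply", "speech"
    payload: str


class Processor:
    """Base stage: passes every frame through unchanged."""

    def process(self, frame: Frame) -> list[Frame]:
        return [frame]


class FakeSTT(Processor):
    def process(self, frame: Frame) -> list[Frame]:
        # Turn audio frames into text frames; leave everything else alone.
        if frame.kind == "audio":
            return [Frame("text", f"transcript({frame.payload})")]
        return [frame]


class FakeLLM(Processor):
    def process(self, frame: Frame) -> list[Frame]:
        if frame.kind == "text":
            return [Frame("reply", f"reply({frame.payload})")]
        return [frame]


class FakeTTS(Processor):
    def process(self, frame: Frame) -> list[Frame]:
        if frame.kind == "reply":
            return [Frame("speech", f"speak({frame.payload})")]
        return [frame]


def run_pipeline(processors: list[Processor], frames: list[Frame]) -> list[Frame]:
    # Push the frames through each stage in order, like the flowchart above.
    for processor in processors:
        next_frames: list[Frame] = []
        for frame in frames:
            next_frames.extend(processor.process(frame))
        frames = next_frames
    return frames


result = run_pipeline([FakeSTT(), FakeLLM(), FakeTTS()], [Frame("audio", "hello")])
```

Frames that a stage does not recognize pass through untouched, which is why stages can be added or swapped without rewriting the others.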
```bash
python -m venv .venv && source .venv/bin/activate
pip install "pipecat-ai[daily,deepgram,openai,cartesia,silero]"
```
```python
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask, PipelineParams
from pipecat.transports.services.daily import DailyTransport, DailyParams
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.audio.vad.silero import SileroVADAnalyzer
```
```python
async def main(room_url: str, token: str):
    # Daily WebRTC transport. Silero VAD runs on the transport so that
    # incoming audio is VAD-tagged before it reaches the aggregator.
    transport = DailyTransport(
        room_url,
        token,
        "Pipecat Bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    # Speech-to-text, LLM, and text-to-speech services.
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",
        model="sonic-3",
    )

    # Conversation context plus the aggregators that record each turn.
    ctx = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful clinic concierge."}]
    )
    agg = llm.create_context_aggregator(ctx)

    # Order matters: agg.user() before the LLM, agg.assistant() at the end.
    pipeline = Pipeline([
        transport.input(),
        stt,
        agg.user(),
        llm,
        tts,
        transport.output(),
        agg.assistant(),
    ])

    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```
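```python
# Register this inside main(), after `task` is created and before the runner
# starts: the handler uses the local `transport`, `task`, and `ctx` objects.
@transport.event_handler("on_first_participant_joined")
async def on_join(transport, participant):
    await transport.capture_participant_transcription(participant["id"])
    await task.queue_frames([
        ctx.create_user_message("Greet the caller and ask how you can help.")
    ])
```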
```bash
DEEPGRAM_API_KEY=... OPENAI_API_KEY=... CARTESIA_API_KEY=... \
  python bot.py --url https://yourorg.daily.co/agent --token "${DAILY_TOKEN}"
```
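The command above passes --url and --token, but the listing does not show how bot.py reads them. One possible way to wire that up with the standard library is sketched below. The parse_args and run helpers are hypothetical names, not part of Pipecat, and the entrypoint argument stands in for the main() coroutine defined earlier.

```python
import argparse
import asyncio


def parse_args(argv=None):
    # Hypothetical helper: reads the two flags used in the run command.
    parser = argparse.ArgumentParser(description="Run the Pipecat Daily bot")
    parser.add_argument("--url", required=True, help="Daily room URL")
    parser.add_argument("--token", required=True, help="Daily meeting token")
    return parser.parse_args(argv)


def run(entrypoint, argv=None):
    # `entrypoint` is an async function taking (room_url, token), such as main().
    args = parse_args(argv)
    return asyncio.run(entrypoint(args.url, args.token))


# At the bottom of bot.py this could be used as:
#
# if __name__ == "__main__":
#     run(main)
```

Keeping the argument parsing separate from main() makes it possible to exercise the bot from a test or another launcher without going through the command line.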
Pipecat's OpenAILLMContext supports the OpenAI tools schema directly. Pass tools=[...] to the context, and the LLM service emits FunctionCallInProgressFrame and FunctionCallResultFrame frames. Register a handler for each tool with llm.register_function("name", handler).
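As a rough illustration of that flow, the sketch below defines one tool in the OpenAI tools format and a matching async handler. The get_opening_hours tool and the HOURS data are made up for this example. The six-argument handler signature is an assumption based on older Pipecat releases; newer releases pass a single params object instead, so check the signature your installed version expects before copying this.

```python
import asyncio

# One tool described in the OpenAI "tools" schema (hypothetical example tool).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_opening_hours",
            "description": "Return the clinic's opening hours for a weekday.",
            "parameters": {
                "type": "object",
                "properties": {
                    "weekday": {"type": "string", "description": "e.g. monday"},
                },
                "required": ["weekday"],
            },
        },
    }
]

# Made-up data standing in for a real lookup.
HOURS = {"monday": "09:00-17:00", "saturday": "10:00-14:00"}


async def get_opening_hours(function_name, tool_call_id, args, llm, context, result_callback):
    # Assumed legacy handler signature; the result is handed back via the callback.
    weekday = str(args.get("weekday", "")).lower()
    await result_callback({"weekday": weekday, "hours": HOURS.get(weekday, "closed")})


# Wiring inside main(), using the objects from the earlier listing:
#
# ctx = OpenAILLMContext(messages, tools=tools)
# llm.register_function("get_opening_hours", get_opening_hours)
```

The handler never speaks directly. It returns structured data through the callback, and the LLM decides how to phrase the answer for the caller.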
SileroVADAnalyzer goes on the transport, not in the pipeline: frames must be VAD-tagged before they reach the aggregator.
agg.user() goes before the LLM and agg.assistant() after the TTS: reversing the order loses tool messages.
allow_interruptions is off by default in some templates: turn it on, or the agent talks over the user.
CallSphere runs 37 agents in 6 verticals with 90+ tools and 115+ DB tables. Pipecat powers the salon and behavioral health products at a steady ~680ms p50. $149/$499/$1,499 plans, 14-day trial, 22% affiliate.
Pipecat vs LiveKit Agents? Pipecat is lower-level — you control every frame. LiveKit Agents is higher-level with built-in dispatch.
Can I swap transports? Yes. DailyTransport, LiveKitTransport, WebsocketServerTransport, and FastAPIWebsocketTransport share the same interface, and Twilio calls plug in through the WebSocket transport with a Twilio frame serializer.
Is it production-ready? NVIDIA NIM ships Pipecat as their reference voice agent blueprint and AWS published a multi-part guide pairing it with Bedrock.
How do I observe it? Pipecat emits OpenTelemetry spans for every processor — point your collector at the runner.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.