By Sagar Shankaran, Founder of CallSphere
fly.io runs voice agents close to every user. Real working fly.toml, Pipecat in Docker, and fly-replay for sticky WebSocket sessions across 35 regions.
Key takeaways
TL;DR: fly.io is the simplest way to put a voice agent within 50ms of every user worldwide. Drop in a Dockerfile, set a primary region in fly.toml, run fly deploy, then add regions with fly scale count to put your Pipecat bot in each of them.
This guide builds a Pipecat-based voice agent, containerized with Docker and deployed across three Fly regions: iad, fra, and syd. Fly's Anycast routes each user to the nearest healthy machine; the fly-replay header keeps WebRTC sessions sticky to one region for the duration of a call.
- flyctl CLI installed and authenticated (fly auth login).
- Pipecat (pip install pipecat-ai).
- OPENAI_API_KEY and DAILY_API_KEY (Pipecat default transport) saved as Fly secrets.
- fly.toml and a Dockerfile.

flowchart LR
Ucs[US user] --> A[Anycast]
Ufr[EU user] --> A
Uau[AU user] --> A
A --> RIad[Machine in iad]
A --> RFra[Machine in fra]
A --> RSyd[Machine in syd]
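
As a mental model only, the routing in the diagram can be thought of as "pick the lowest-latency region that still has a healthy machine". The sketch below is a toy simulation of that idea, not how Fly's proxy or BGP Anycast is implemented; `pick_region` and the latency numbers in the usage note are hypothetical.

```python
def pick_region(rtt_ms, healthy):
    """Return the healthy region with the lowest round-trip time.

    rtt_ms:  dict mapping region code -> measured RTT in milliseconds
    healthy: set of region codes that currently have a healthy machine
    """
    candidates = {region: ms for region, ms in rtt_ms.items() if region in healthy}
    if not candidates:
        raise LookupError("no healthy region available")
    # min() over the dict keys, ordered by their RTT value.
    return min(candidates, key=candidates.get)
```

With made-up latencies such as `{"iad": 90, "fra": 15, "syd": 280}`, a user would land in fra while it is healthy and fall back to iad if fra drops out.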
bot.py:
```python
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main(room_url, token):
    # Daily transport: joins the room as "CallSphere" with audio in and out.
    transport = DailyTransport(
        room_url,
        token,
        "CallSphere",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True),
    )

    # Speech-to-text, LLM, and text-to-speech stages; keys come from Fly secrets.
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_KEY"))
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini")
    tts = CartesiaTTSService(api_key=os.getenv("CARTESIA_KEY"))

    # Audio in -> STT -> LLM -> TTS -> audio out.
    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main(os.environ["ROOM_URL"], os.environ["TOKEN"]))
```
server.py exposes a route that spawns a bot subprocess per call (Fly machines can fork):
```python
import os
import subprocess
import uuid

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/start")
async def start(req: Request):
    body = await req.json()
    # Pass the room credentials to the bot through its environment.
    env = os.environ | {"ROOM_URL": body["room_url"], "TOKEN": body["token"]}
    subprocess.Popen(["python", "bot.py"], env=env)
    return {"id": str(uuid.uuid4())}


@app.get("/healthz")
def healthz():
    return {"ok": True}
```
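
The /start route above returns a random id and then loses track of the child process. One possible refinement, sketched here under the assumption that a single machine handles a modest number of calls, is a small in-process registry that ties each id to its subprocess and can report exit codes. `BotRegistry` is a hypothetical helper, not part of Pipecat, FastAPI, or Fly.

```python
import subprocess
import sys
import uuid


class BotRegistry:
    """Tracks bot subprocesses by call id (illustrative helper)."""

    def __init__(self):
        self._procs = {}

    def spawn(self, args, env=None):
        """Start `python <args...>` and return a call id bound to that process."""
        call_id = str(uuid.uuid4())
        self._procs[call_id] = subprocess.Popen([sys.executable, *args], env=env)
        return call_id

    def wait(self, call_id, timeout=None):
        """Block until the bot exits, forget it, and return its exit code."""
        code = self._procs[call_id].wait(timeout=timeout)
        del self._procs[call_id]
        return code

    def reap(self):
        """Drop finished bots without blocking; return {call_id: exit_code}."""
        finished = {}
        for call_id, proc in self._procs.items():
            code = proc.poll()
            if code is not None:
                finished[call_id] = code
        for call_id in finished:
            del self._procs[call_id]
        return finished

    def active(self):
        """Number of bots still tracked."""
        return len(self._procs)
```

In server.py this could replace the bare `subprocess.Popen` call with something like `registry.spawn(["bot.py"], env=env)`, with `reap()` called periodically so finished calls do not accumulate.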
```dockerfile
FROM python:3.12-slim

RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "7860"]
```
```toml
app = "callsphere-voice-fly"
primary_region = "iad"

[build]
  dockerfile = "Dockerfile"

[http_service]
  internal_port = 7860
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]

[[vm]]
  cpu_kind = "shared"
  cpus = 2
  memory = "2gb"

[deploy]
  strategy = "rolling"
```
Add regions:
```bash
fly deploy
fly scale count 2 --region iad
fly scale count 1 --region fra
fly scale count 1 --region syd
```
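
If the region list grows, typing one `fly scale count` line per region gets error-prone. A small sketch that generates the same commands from a mapping is shown below; `scale_commands` is a hypothetical helper that only builds strings in the form used above and does not call flyctl itself.

```python
def scale_commands(region_counts):
    """Build one `fly scale count` command per region.

    region_counts: dict mapping region code -> desired machine count
    """
    commands = []
    for region, count in region_counts.items():
        if count < 0:
            raise ValueError(f"machine count for {region} must be >= 0")
        commands.append(f"fly scale count {count} --region {region}")
    return commands


if __name__ == "__main__":
    # Prints the three commands from the bash block above.
    for cmd in scale_commands({"iad": 2, "fra": 1, "syd": 1}):
        print(cmd)
```

Keeping the mapping in version control gives a reviewable record of the intended machine count per region.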
WebRTC SDP exchange must hit the same machine. If a request lands on iad but the session lives in syd, return the fly-replay header:
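```python
import os

from fastapi import Response


@app.post("/sdp")
async def sdp(req: Request):
    sid = req.headers.get("X-Session-Id")
    region = lookup_region(sid)  # your KV: maps session id -> owning region
    if region != os.environ["FLY_REGION"]:
        # Ask Fly's proxy to replay this request in the region that owns the session.
        return Response(status_code=204, headers={"fly-replay": f"region={region}"})
    return handle_sdp(await req.json())
```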
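
The replay decision itself can be separated from the web framework and tested in isolation. The sketch below assumes a plain dict as a stand-in for the KV store mentioned above (a real deployment would need a store shared across regions, since an in-memory dict is local to one machine). `remember_session`, `lookup_region`, and `replay_headers` are hypothetical names; unlike the route above, this version treats an unknown session as local instead of replaying to `region=None`.

```python
# In-memory stand-in for a shared KV store: session id -> region code.
SESSION_REGIONS = {}


def remember_session(session_id, region):
    """Record which region owns a session (call this when the session starts)."""
    SESSION_REGIONS[session_id] = region


def lookup_region(session_id):
    """Return the owning region, or None if the session is unknown."""
    return SESSION_REGIONS.get(session_id)


def replay_headers(session_id, current_region):
    """Return fly-replay headers if the session lives elsewhere, else None."""
    region = lookup_region(session_id)
    if region is None or region == current_region:
        # Unknown or local session: handle the request on this machine.
        return None
    return {"fly-replay": f"region={region}"}
```

In the /sdp route, a non-None result would be passed as the `headers` of the response, and `current_region` would come from the `FLY_REGION` environment variable.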
```bash
fly secrets set OPENAI_API_KEY=sk-...
fly secrets set DEEPGRAM_KEY=...
fly secrets set CARTESIA_KEY=...
fly secrets set DAILY_API_KEY=...
```
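
A missing secret usually surfaces only when the first call fails. A minimal startup check, assuming the four secret names set above, could look like this; `missing_secrets` and `REQUIRED_SECRETS` are hypothetical names for illustration.

```python
import os

# The four secrets configured with `fly secrets set` above.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "DEEPGRAM_KEY", "CARTESIA_KEY", "DAILY_API_KEY")


def missing_secrets(env=None, required=REQUIRED_SECRETS):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit(f"missing secrets: {', '.join(missing)}")
```

Running this at the top of server.py makes a misconfigured machine fail at boot, where the deploy's health checks can catch it, instead of in the middle of a call.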
- auto_stop_machines = stop: without it, idle machines cost money.
- min_machines_running: with no warm machine, the first call cold-starts in 8s.
- cpu_kind: performance, not shared.

CallSphere's voice plane runs on a dedicated 72.62.162.83 box (k3s) for predictable latency, but we ship our affiliate dashboards (/affiliate, 22% commission) on Fly across 4 regions for low-latency partner UX. 37 agents, 90+ tools, 6 verticals, with pricing at $149/$499/$1499 and a 14-day trial.
Why not just one region? EU users get about 200ms RTT to US-east, and voice falls apart once delay passes 250ms, which leaves almost no budget for STT, LLM, and TTS.
Cost for 3-region voice? ~$45/mo for the warm pool + outbound bandwidth.
Volume scaling? fly scale count per region, or auto_start_machines for traffic-driven.
Can I use LiveKit? Yes — Daily and LiveKit both work on Fly.
Logs? fly logs streams from all regions.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.