Deploy a Voice Agent on fly.io with Multi-Region Routing
fly.io runs voice agents close to every user. Real working fly.toml, Pipecat in Docker, and fly-replay for sticky WebSocket sessions across 35 regions.
TL;DR: fly.io is the simplest way to put a voice agent within 50ms of every user worldwide. Drop in a Dockerfile, declare regions in fly.toml, and fly deploy ships your Pipecat bot to all of them.
What you'll build
A Pipecat-based voice agent containerized with Docker, deployed across iad, fra, syd. Fly's Anycast routes each user to the nearest healthy machine; the fly-replay header keeps WebRTC sessions sticky to one region for the duration of a call.
Prerequisites
- flyctl CLI installed and fly auth login completed.
- Pipecat 0.0.50+ (pip install pipecat-ai).
- OPENAI_API_KEY and DAILY_API_KEY (Pipecat default transport) saved as Fly secrets.
- Docker for local builds.
- A fly.toml and a Dockerfile.
Architecture
flowchart LR
Ucs[US user] --> A[Anycast]
Ufr[EU user] --> A
Uau[AU user] --> A
A --> RIad[Machine in iad]
A --> RFra[Machine in fra]
A --> RSyd[Machine in syd]
Step 1 — Pipecat bot
bot.py:
```python
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.cartesia import CartesiaTTSService


async def main(room_url, token):
    # Join the Daily room as a participant named "CallSphere" with audio in and out.
    transport = DailyTransport(
        room_url,
        token,
        "CallSphere",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True),
    )
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_KEY"))
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini")
    # Depending on your Pipecat version, the Cartesia service may also expect
    # a voice_id argument; pass the voice ID from your Cartesia account.
    tts = CartesiaTTSService(api_key=os.getenv("CARTESIA_KEY"))

    # Audio in -> speech-to-text -> LLM -> text-to-speech -> audio out.
    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main(os.environ["ROOM_URL"], os.environ["TOKEN"]))
```
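Because bot.py reads every credential from the environment, a missing secret only surfaces once a call is already connecting. A minimal sketch of a fail-fast check, assuming the variable names used in bot.py above; missing_env is a hypothetical helper, not part of Pipecat:

```python
import os

# Names taken from bot.py above; adjust if your deployment uses others.
REQUIRED_VARS = ("DEEPGRAM_KEY", "OPENAI_API_KEY", "CARTESIA_KEY", "ROOM_URL", "TOKEN")


def missing_env(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling missing_env() at the top of main and exiting when the list is non-empty is one way to turn a silent misconfiguration into an immediate, readable error.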
Step 2 — Tiny HTTP front
server.py exposes a route that spawns a bot subprocess per call (Fly machines can fork):
```python
import os
import subprocess
import uuid

from fastapi import FastAPI, Request

app = FastAPI()
@app.post("/start")
async def start(req: Request):
    body = await req.json()
    # Pass the per-call room details to the bot through its environment.
    env = os.environ | {"ROOM_URL": body["room_url"], "TOKEN": body["token"]}
    subprocess.Popen(["python", "bot.py"], env=env)
    return {"id": str(uuid.uuid4())}


@app.get("/healthz")
def healthz():
    return {"ok": True}
```
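The /start handler above starts a process and then forgets it, so the id it returns cannot be mapped back to a running bot. One possible way to keep track is sketched below using only the standard library. The calls registry and the child_env, spawn, and reap helpers are hypothetical names introduced for illustration, and an in-memory dict only covers a single machine:

```python
import os
import subprocess
import sys
import uuid

# Hypothetical in-memory registry: call id -> running bot process.
calls = {}


def child_env(room_url, token, base=None):
    """Copy the parent environment and add the per-call values bot.py reads."""
    env = dict(os.environ if base is None else base)
    env.update({"ROOM_URL": room_url, "TOKEN": token})
    return env


def spawn(cmd, env):
    """Start one process per call and remember it under a fresh id."""
    call_id = str(uuid.uuid4())
    calls[call_id] = subprocess.Popen(cmd, env=env)
    return call_id


def reap():
    """Drop finished processes from the registry and return their ids."""
    done = [cid for cid, proc in calls.items() if proc.poll() is not None]
    for cid in done:
        del calls[cid]
    return done
```

In server.py this would look like spawn([sys.executable, "bot.py"], child_env(body["room_url"], body["token"])), with reap() called periodically so exited bots do not linger as zombie processes.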
Step 3 — Dockerfile
```dockerfile
FROM python:3.12-slim

RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "7860"]
```
Step 4 — fly.toml
```toml
app = "callsphere-voice-fly"
primary_region = "iad"

[build]
dockerfile = "Dockerfile"

[http_service]
internal_port = 7860
force_https = true
auto_stop_machines = "stop"
auto_start_machines = true
min_machines_running = 1
processes = ["app"]

[[vm]]
cpu_kind = "shared"
cpus = 2
memory = "2gb"
[deploy]
strategy = "rolling"
```
Add regions:
```bash
fly deploy
fly scale count 2 --region iad
fly scale count 1 --region fra
fly scale count 1 --region syd
```
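If the region list grows, the per-region commands above can be generated from a single mapping instead of being typed by hand. A small sketch, assuming the same fly scale count syntax shown above; REGION_COUNTS and scale_commands are illustrative names, not part of flyctl:

```python
# Desired machine count per region, mirroring the commands above.
REGION_COUNTS = {"iad": 2, "fra": 1, "syd": 1}


def scale_commands(counts):
    """Build one `fly scale count N --region R` argv list per region."""
    return [
        ["fly", "scale", "count", str(n), "--region", region]
        for region, n in counts.items()
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) after fly deploy has finished.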
Step 5 — Sticky sessions with fly-replay
WebRTC SDP exchange must hit the same machine. If a request lands on iad but the session lives in syd, return the fly-replay header:
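```python
from fastapi import Response  # add to the existing imports in server.py


@app.post("/sdp")
async def sdp(req: Request):
    sid = req.headers.get("X-Session-Id")
    region = lookup_region(sid)  # your KV
    if region != os.environ["FLY_REGION"]:
        # Ask Fly's proxy to replay this request in the region that owns the session.
        return Response(status_code=204, headers={"fly-replay": f"region={region}"})
    return handle_sdp(await req.json())
```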
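The handler above leaves lookup_region undefined and would emit region=None for a session it has never seen. A minimal sketch of the lookup and the replay decision follows, assuming an in-memory dict in place of a real shared store; SESSION_REGIONS, remember_session, and replay_headers are hypothetical names:

```python
# Hypothetical session table. In production this would be a shared store
# (for example Redis or a database) that machines in every region can read.
SESSION_REGIONS = {}


def remember_session(session_id, region):
    """Record which region owns a session when the call is first set up."""
    SESSION_REGIONS[session_id] = region


def lookup_region(session_id):
    """Return the owning region, or None for an unknown session."""
    return SESSION_REGIONS.get(session_id)


def replay_headers(session_id, current_region):
    """Return a fly-replay header if another region owns the session, else {}."""
    owner = lookup_region(session_id)
    if owner is None or owner == current_region:
        return {}
    return {"fly-replay": f"region={owner}"}
```

An empty result means the request should be handled locally. Treating unknown sessions as local is a design choice in this sketch; rejecting them with a 404 would be equally reasonable.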
Step 6 — Set secrets
```bash
fly secrets set OPENAI_API_KEY=sk-...
fly secrets set DEEPGRAM_KEY=...
fly secrets set CARTESIA_KEY=...
fly secrets set DAILY_API_KEY=...
```
Common pitfalls
- Forgetting auto_stop_machines = "stop": idle machines cost money.
- Deploying without min_machines_running: the first call cold-starts in 8s.
- No fly-replay: WebRTC reconnects fail on cross-region routing.
- CPU vs performance VM: voice with VAD wants performance, not shared.
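Three of these pitfalls can be caught before deploying by inspecting fly.toml. The sketch below assumes the key names used in the fly.toml from Step 4 and treats an unset auto_stop_machines as off, which is an assumption rather than documented behavior; load_fly_toml and fly_toml_warnings are hypothetical helpers:

```python
def load_fly_toml(path="fly.toml"):
    """Parse fly.toml into a dict (tomllib ships with Python 3.11+)."""
    import tomllib

    with open(path, "rb") as f:
        return tomllib.load(f)


def fly_toml_warnings(cfg):
    """Return one warning per pitfall found in a parsed fly.toml dict."""
    http = cfg.get("http_service", {})
    warnings = []
    # Assumption: a missing auto_stop_machines key behaves like "off".
    if http.get("auto_stop_machines") in (None, False, "off"):
        warnings.append("auto_stop_machines is off: idle machines keep billing")
    if http.get("min_machines_running", 0) < 1:
        warnings.append("min_machines_running < 1: first call may cold-start")
    if any(vm.get("cpu_kind") == "shared" for vm in cfg.get("vm", [])):
        warnings.append("cpu_kind is shared: consider performance for VAD workloads")
    return warnings
```

Running fly_toml_warnings(load_fly_toml()) against the Step 4 config would report only the shared CPU warning, since that file already sets auto_stop_machines and min_machines_running.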
How CallSphere does this in production
CallSphere's voice plane runs on a dedicated 72.62.162.83 box (k3s) for predictable latency, but we ship our affiliate dashboards (/affiliate, 22% commission) on Fly across 4 regions for low-latency partner UX. 37 agents, 90+ tools, 6 verticals — pricing $149/$499/$1499 with a 14-day trial.
FAQ
Why not just one region? EU users get 200ms RTT to US-east; voice falls apart over 250ms.
Cost for 3-region voice? ~$45/mo for the warm pool + outbound bandwidth.
Volume scaling? Use fly scale count per region, or auto_start_machines for traffic-driven scaling.
Can I use LiveKit? Yes — Daily and LiveKit both work on Fly.
Logs? fly logs streams from all regions.