---
title: "Deploy a Voice Agent on fly.io with Multi-Region Routing"
description: "fly.io runs voice agents close to every user. Real working fly.toml, Pipecat in Docker, and fly-replay for sticky WebSocket sessions across 35 regions."
canonical: https://callsphere.ai/blog/vw2h-deploy-voice-agent-fly-io-multi-region
category: "AI Infrastructure"
tags: ["Tutorial", "Build", "fly.io", "Multi-region", "Pipecat", "WebSocket"]
author: "CallSphere Team"
published: 2026-04-25T00:00:00.000Z
updated: 2026-05-07T09:27:42.428Z
---

# Deploy a Voice Agent on fly.io with Multi-Region Routing

> fly.io runs voice agents close to every user. Real working fly.toml, Pipecat in Docker, and fly-replay for sticky WebSocket sessions across 35 regions.

> **TL;DR** — fly.io is the simplest way to put a voice agent within 50ms of every user worldwide. Drop in a Dockerfile, set a primary region in `fly.toml`, run `fly deploy`, and add machines in the other regions with `fly scale count`.

## What you'll build

A Pipecat-based voice agent containerized with Docker and deployed across `iad`, `fra`, and `syd`. Fly's Anycast routes each user to the nearest healthy machine; the `fly-replay` header keeps WebRTC sessions sticky to one region for the duration of a call.

## Prerequisites

1. `flyctl` CLI installed and authenticated with `fly auth login`.
2. Pipecat 0.0.50+ (`pip install pipecat-ai`).
3. `OPENAI_API_KEY`, `DEEPGRAM_KEY`, `CARTESIA_KEY`, and `DAILY_API_KEY` (Daily is Pipecat's default transport) saved as Fly secrets.
4. Docker for local builds.
5. A `fly.toml` and a `Dockerfile`.

## Architecture

```mermaid
flowchart LR
  Ucs[US user] --> A[Anycast]
  Ufr[EU user] --> A
  Uau[AU user] --> A
  A --> RIad[Machine in iad]
  A --> RFra[Machine in fra]
  A --> RSyd[Machine in syd]
```

## Step 1 — Pipecat bot

`bot.py`:

```python
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.cartesia import CartesiaTTSService


async def main(room_url, token):
    # Join the Daily room as a participant named "CallSphere".
    transport = DailyTransport(room_url, token, "CallSphere",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True))

    # Speech-to-text -> LLM -> text-to-speech; keys come from Fly secrets.
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_KEY"))
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini")
    tts = CartesiaTTSService(api_key=os.getenv("CARTESIA_KEY"))

    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main(os.environ["ROOM_URL"], os.environ["TOKEN"]))
```

## Step 2 — Tiny HTTP front

`server.py` exposes a route that spawns a bot subprocess per call (Fly machines can fork):

```python
from fastapi import FastAPI, Request
import subprocess, uuid, os

app = FastAPI()

@app.post("/start")
async def start(req: Request):
    body = await req.json()
    env = os.environ | {"ROOM_URL": body["room_url"], "TOKEN": body["token"]}
    subprocess.Popen(["python", "bot.py"], env=env)
    return {"id": str(uuid.uuid4())}

@app.get("/healthz")
def healthz(): return {"ok": True}
```

## Step 3 — Dockerfile

```dockerfile
FROM python:3.12-slim

RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "7860"]
```

## Step 4 — fly.toml

```toml
app = "callsphere-voice-fly"
primary_region = "iad"

[build]
  dockerfile = "Dockerfile"

[http_service]
  internal_port = 7860
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]

[[vm]]
  cpu_kind = "shared"
  cpus = 2
  memory = "2gb"

[deploy]
  strategy = "rolling"
```

Deploy, then set the machine count for each region:

```bash
fly deploy
fly scale count 2 --region iad
fly scale count 1 --region fra
fly scale count 1 --region syd
```

## Step 5 — Sticky sessions with fly-replay

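The WebRTC SDP exchange for a call must reach the machine that holds the session. If a request lands in `iad` but the session lives in `syd`, return the `fly-replay` header and Fly's proxy replays the request in that region. `region=` targets a region rather than a machine, so where a region runs more than one machine (two in `iad` here), store the machine ID and replay with `instance=<machine_id>`. The region-based version looks like this: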

```python
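import os

from fastapi import Request, Response

# Continues server.py. lookup_region() and handle_sdp() are placeholders you supply.
@app.post("/sdp")
async def sdp(req: Request):
    sid = req.headers.get("X-Session-Id")
    region = lookup_region(sid)  # your KV: session id -> Fly region code
    # Session lives elsewhere: ask Fly's proxy to replay the request in that region.
    if region != os.environ["FLY_REGION"]:
        return Response(status_code=204, headers={"fly-replay": f"region={region}"})
    return handle_sdp(await req.json())
```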
```
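
The handler above leaves `lookup_region` undefined and would send `region=None` for a session it has never seen. The sketch below shows one possible shape for that lookup and for the replay decision. `SESSION_REGIONS`, `remember_session`, and `replay_header` are hypothetical names, and the plain dict stands in for a store that every region can read (an in-memory dict only works on a single machine).

```python
import os
from typing import Dict, Optional

# Hypothetical stand-in for a shared KV store: session id -> Fly region code.
SESSION_REGIONS: Dict[str, str] = {}


def remember_session(sid: str, region: Optional[str] = None) -> None:
    """Record where a session lives; defaults to this machine's FLY_REGION."""
    SESSION_REGIONS[sid] = region or os.environ.get("FLY_REGION", "unknown")


def lookup_region(sid: Optional[str]) -> Optional[str]:
    """Return the region that owns the session, or None if it is unknown."""
    if sid is None:
        return None
    return SESSION_REGIONS.get(sid)


def replay_header(sid: Optional[str], current_region: str) -> Optional[Dict[str, str]]:
    """Build the fly-replay header, or return None to handle the request here."""
    region = lookup_region(sid)
    if region is None or region == current_region:
        return None
    return {"fly-replay": f"region={region}"}
```

In the `/sdp` handler this becomes `headers = replay_header(sid, os.environ["FLY_REGION"])`: return the replay response when `headers` is not `None`, otherwise process the SDP locally. Treating unknown sessions as local is a design choice; rejecting them with a 404 is equally reasonable.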

## Step 6 — Set secrets

```bash
fly secrets set OPENAI_API_KEY=sk-...
fly secrets set DEEPGRAM_KEY=...
fly secrets set CARTESIA_KEY=...
fly secrets set DAILY_API_KEY=...
```
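
A missing secret otherwise surfaces only when the first call fails, so it can help to check at startup. The following is a small sketch, not part of the original code: `REQUIRED_SECRETS` and `missing_secrets` are hypothetical names, and the list simply mirrors the four secrets set above.

```python
import os
from typing import Iterable, List, Mapping

# The four secrets set with `fly secrets set` above.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "DEEPGRAM_KEY", "CARTESIA_KEY", "DAILY_API_KEY")


def missing_secrets(
    env: Mapping[str, str] = os.environ,
    required: Iterable[str] = REQUIRED_SECRETS,
) -> List[str]:
    """Return the names of required secrets that are unset or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit("Missing Fly secrets: " + ", ".join(missing))
    print("All required secrets are set.")
```

Running the same check at the top of `server.py` makes a misconfigured machine fail fast at boot instead of accepting calls it cannot serve.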

## Common pitfalls

- **Forgetting `auto_stop_machines = "stop"`** — idle machines cost money.
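- **Deploying without `min_machines_running`** — the first call after an idle period waits on a cold start of about 8s. The setting applies to the primary region only, so machines in `fra` and `syd` can still stop when idle.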
- **No fly-replay** — WebRTC reconnects fail on cross-region routing.
- **Shared vs performance CPU** — voice with VAD wants `cpu_kind = "performance"`, not the `shared` kind used in the sample `fly.toml` above.

## How CallSphere does this in production

CallSphere's voice plane runs on a dedicated 72.62.162.83 box (k3s) for predictable latency, but we ship our **affiliate dashboards** ([/affiliate](/affiliate), 22% commission) on Fly across 4 regions for low-latency partner UX. 37 agents, 90+ tools, 6 verticals — pricing $149/$499/$1499 with a 14-day [trial](/trial).

## FAQ

**Why not just one region?** EU users get 200ms RTT to US-east; voice falls apart over 250ms.

**Cost for 3-region voice?** ~$45/mo for the warm pool + outbound bandwidth.

**Volume scaling?** Use `fly scale count` per region for a fixed pool, or rely on `auto_start_machines` to start stopped machines as traffic arrives.

**Can I use LiveKit?** Yes — Daily and LiveKit both work on Fly.

**Logs?** `fly logs` streams from all regions.

## Sources

- [Pipecat Fly.io guide](https://docs.pipecat.ai/deployment/platforms/fly)
- [Fly multi-region docs](https://fly.io/docs/rails/advanced-guides/multi-region/)
- [Fly WebSockets blog](https://fly.io/blog/websockets-and-fly/)
- [Fly machine placement](https://fly.io/docs/machines/guides-examples/machine-placement/)
