---
title: "Build a Multi-Region Voice Agent on Fly.io for Sub-500ms Global Latency (2026)"
description: "Deploy a voice agent to Fly.io's anycast network across 6 regions: Tokyo, Frankfurt, São Paulo, Sydney, Virginia, Los Angeles. fly-replay routes traffic to the closest healthy region."
canonical: https://callsphere.ai/blog/vw5h-build-multi-region-voice-agent-flyio-low-latency-global
category: "AI Infrastructure"
tags: ["Fly.io", "Multi-Region", "Voice Agent", "Anycast", "Tutorial"]
author: "CallSphere Team"
published: 2026-04-23T00:00:00.000Z
updated: 2026-05-07T16:30:09.048Z
---

# Build a Multi-Region Voice Agent on Fly.io for Sub-500ms Global Latency (2026)

> Deploy a voice agent to Fly.io's anycast network across 6 regions: Tokyo, Frankfurt, São Paulo, Sydney, Virginia, Los Angeles. fly-replay routes traffic to the closest healthy region.

> **TL;DR:** Fly.io routes via anycast. One IP serves every region, and traffic lands in the nearest one. Deploy your FastAPI voice bridge to 6 regions with one `fly deploy` and `fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax`. Voice-to-voice latency stays under 500 ms.

```mermaid
flowchart LR
  AC[Fly Anycast IP]
  AC -->|nearest region| R1[NRT Tokyo]
  AC --> R2[FRA Frankfurt]
  AC --> R3[GRU São Paulo]
  AC --> R4[SYD Sydney]
  AC --> R5[IAD Virginia]
  AC --> R6[LAX Los Angeles]
  R1 -->|wss| OAI[OpenAI Realtime us-east-1]
  R2 -->|wss| OAI
  R5 -->|wss low-latency| OAI
```

## Step 1 — `fly.toml`

```toml
app = "voice-agent"
primary_region = "iad"

[build]
  dockerfile = "Dockerfile"

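[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = "off"    # keep the Machine in every region running
  auto_start_machines = true
  min_machines_running = 1      # only enforced in primary_region

  # One service definition is enough: http_service already exposes 80/443,
  # so a second [[services]] block on the same internal port is not needed.
  [[http_service.checks]]
    method = "GET"
    path = "/healthz"
    interval = "10s"
    timeout = "2s"

# No [env] entry is needed for the region: Fly sets FLY_REGION on every Machine
# at runtime, and fly.toml does not expand "$FLY_REGION".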
```

## Step 2 — Dockerfile

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
```

## Step 3 — Deploy and spread regions

```bash
fly launch --no-deploy
fly secrets set OPENAI_API_KEY=sk-...
fly deploy
fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax
```

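Each region runs one Machine. Twilio connects to the app's anycast address, and Fly's proxy hands the connection to the closest healthy region. Note that "closest" is measured from Twilio's media edge, not from the caller's phone.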

## Step 4 — Region-aware OpenAI routing

OpenAI's Realtime API has the lowest latency from `us-east-1` (Virginia) and `eu-west-1` (Ireland). Fly Machines in NRT, SYD, and GRU still make a long-haul hop to reach OpenAI (trans-Pacific for Tokyo and Sydney, north-south for São Paulo); budget +100-200 ms.

For tighter latency, route by region:

```python
import os

REGION = os.environ.get("FLY_REGION", "iad")
US_REGIONS = {"iad", "lax", "ord", "sea"}
OPENAI_REALTIME = "wss://api.openai.com/v1/realtime"

# OpenAI doesn't yet expose regional endpoints, so both branches resolve to the
# same host and the split is symbolic. The else branch marks where you can swap
# in Azure Voice Live (multi-region).
OAI_HOST = OPENAI_REALTIME if REGION in US_REGIONS else OPENAI_REALTIME
```

In FRA, swap to Azure Voice Live (West Europe) for best latency.
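
The region switch could be made concrete along these lines. This is a minimal sketch rather than a reference implementation: `AZURE_VOICE_LIVE_URL` is a hypothetical environment variable holding whatever Azure Voice Live WebSocket URL you provision, and `AZURE_REGIONS` contains only `fra` because that is the one region named above.

```python
import os
from typing import Mapping

OPENAI_REALTIME_URL = "wss://api.openai.com/v1/realtime"

# Assumption: only Frankfurt is switched to Azure, as suggested in the text.
AZURE_REGIONS = frozenset({"fra"})


def realtime_url(region: str, env: Mapping[str, str] = os.environ) -> str:
    """Pick the upstream realtime endpoint for the Fly region we run in."""
    # AZURE_VOICE_LIVE_URL is a hypothetical variable name, not a Fly or Azure default.
    azure_url = env.get("AZURE_VOICE_LIVE_URL")
    if region in AZURE_REGIONS and azure_url:
        return azure_url
    # Every other region, or a missing Azure URL, falls back to OpenAI.
    return OPENAI_REALTIME_URL


UPSTREAM_URL = realtime_url(os.environ.get("FLY_REGION", "iad"))
```

Falling back to OpenAI when the variable is unset means a region without Azure credentials still serves calls, just with the extra long-haul latency.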

## Step 5 — `fly-replay` for failover

When a region is unhealthy, return a `fly-replay: region=iad` response header. Fly's proxy intercepts it and replays the request in Virginia instead of passing the response back to the client:

```python
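import os
from fastapi import Request, Response

# `app` is the FastAPI instance; `openai_healthy` is an asyncio.Event that a
# background probe sets while OpenAI is reachable.
@app.middleware("http")
async def health_replay(req: Request, call_next):
    # Skip the replay in iad itself so the request cannot loop back here.
    if not openai_healthy.is_set() and os.environ.get("FLY_REGION") != "iad":
        return Response(headers={"fly-replay": "region=iad"}, status_code=503)
    return await call_next(req)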
```

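WebSockets are stickier: replay applies only to the initial handshake, and once upgraded the connection stays in that region. Starlette's `http` middleware also runs only for plain HTTP requests, so a WebSocket route needs its own health check before it accepts the connection.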
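
The middleware depends on `openai_healthy`, which has to be maintained somewhere. One way to do that is sketched below, under stated assumptions: `UpstreamHealth`, `watch`, and the three-failure threshold are illustrative names and values, and `probe` is a coroutine function you supply (for example, one that opens and closes a test connection to the upstream).

```python
import asyncio
from typing import Awaitable, Callable


class UpstreamHealth:
    """Tracks probe results and exposes them as an asyncio.Event."""

    def __init__(self, failures_to_trip: int = 3) -> None:
        self.event = asyncio.Event()
        self.event.set()  # start optimistic so the first requests are not replayed
        self._failures = 0
        self._failures_to_trip = failures_to_trip

    def record(self, ok: bool) -> None:
        if ok:
            self._failures = 0
            self.event.set()
            return
        self._failures += 1
        # Only mark the upstream unhealthy after several consecutive failures.
        if self._failures >= self._failures_to_trip:
            self.event.clear()


async def watch(health: UpstreamHealth,
                probe: Callable[[], Awaitable[bool]],
                interval_s: float = 5.0) -> None:
    """Run `probe` forever; `probe` is a caller-supplied coroutine function."""
    while True:
        try:
            ok = await probe()
        except Exception:
            ok = False  # a probe that raises counts as a failure
        health.record(ok)
        await asyncio.sleep(interval_s)
```

In the app, `openai_healthy` would be `health.event`, with `watch(...)` started as a background task at startup via `asyncio.create_task`. Requiring consecutive failures avoids replaying traffic to Virginia over a single dropped probe.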

## Step 6 — Multi-region Postgres (Fly Postgres or Turso)

Fly Postgres can run with a primary in `iad` and read replicas in every voice region. For voice-turn writes, use Turso (libSQL) — multi-region writeable replicas with conflict resolution.

```python
import os
import libsql_experimental as libsql
db = libsql.connect("voice.db", sync_url=os.environ["TURSO_URL"], auth_token=os.environ["TURSO_TOKEN"])
db.sync()  # pull the latest remote state into the local replica
```
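
A voice-turn write might look like the sketch below. The table name `voice_turns`, its columns, and `record_turn` are hypothetical. The example runs against the standard-library `sqlite3` module as a stand-in, on the assumption that the libSQL connection accepts the same `execute` and `commit` calls; verify that against the client version you use.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS voice_turns (
    call_id TEXT NOT NULL,
    turn_no INTEGER NOT NULL,
    role TEXT NOT NULL,
    transcript TEXT NOT NULL,
    region TEXT NOT NULL,
    created_at REAL NOT NULL,
    PRIMARY KEY (call_id, turn_no)
)
"""


def record_turn(conn, call_id: str, turn_no: int, role: str,
                transcript: str, region: str) -> None:
    """Insert one turn; the (call_id, turn_no) key makes retries idempotent."""
    conn.execute(
        "INSERT OR IGNORE INTO voice_turns VALUES (?, ?, ?, ?, ?, ?)",
        (call_id, turn_no, role, transcript, region, time.time()),
    )
    conn.commit()


# Stand-in connection for the sketch; swap in the libsql connection in the app.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_turn(conn, "call-1", 0, "caller", "Hi, I need to reschedule.", "fra")
record_turn(conn, "call-1", 0, "caller", "Hi, I need to reschedule.", "fra")  # retry, ignored
print(conn.execute("SELECT COUNT(*) FROM voice_turns").fetchone()[0])  # prints 1
```

Storing the region with each turn also makes it possible to see later which Machine handled which part of a call.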

## Step 7 — Observability

`fly logs` shows logs across all regions; Grafana Cloud + the Fly metrics integration gives latency by region. Tune by moving Machines closer to OpenAI's nearest POP.
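
To get latency by region out of the logs, each Machine could emit one structured line per measured stage. This is a minimal sketch, assuming JSON log lines and hypothetical field names (`stage`, `ms`, `region`); shipping the lines to Grafana is left out.

```python
import json
import os
import time
from contextlib import contextmanager

REGION = os.environ.get("FLY_REGION", "local")


def latency_record(stage: str, elapsed_s: float, region: str = REGION) -> str:
    """Format one measurement as a JSON log line."""
    return json.dumps({"stage": stage, "ms": round(elapsed_s * 1000, 1), "region": region})


@contextmanager
def timed(stage: str, emit=print):
    """Time the wrapped block and emit a log line even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        emit(latency_record(stage, time.perf_counter() - start))


with timed("upstream_connect"):
    time.sleep(0.01)  # placeholder for opening the realtime WebSocket
```

Wrapping the upstream connect and the first audio response separately would show whether a slow region is losing time on the network hop or on the model.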

## Pitfalls

- **OpenAI doesn't have regional Realtime endpoints** as of May 2026 — distant regions add latency. Use Azure Voice Live for EU presence.
- **PSTN signaling**: Twilio's signaling lives in US/EU; PSTN audio still has carrier-side hops.
- **`fly-replay` doesn't work mid-WebSocket** — only for the upgrade request. Build reconnect on the client.
- **Cost**: 6 small machines (`shared-cpu-1x:512mb`) ≈ $30/mo + bandwidth. Add Postgres replicas: ~$60/mo.
- **Region pinning** for compliance: keeping EU calls inside EU regions requires per-tenant routing. Fly's anycast doesn't enforce that, so gate at TwiML.
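
Because a dropped WebSocket cannot be replayed, the client has to reconnect on its own. The delay schedule for that could follow exponential backoff with full jitter, as sketched here. `reconnect_delays` and its default values are illustrative assumptions, not tuned numbers from a real deployment.

```python
import random
from typing import Iterator, Optional


def reconnect_delays(base_s: float = 0.25, cap_s: float = 8.0,
                     max_attempts: int = 6,
                     rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one sleep duration per reconnect attempt (full jitter)."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        # The ceiling doubles each attempt until it reaches cap_s.
        ceiling = min(cap_s, base_s * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling] so clients do not reconnect in lockstep.
        yield rng.uniform(0.0, ceiling)


for attempt, delay in enumerate(reconnect_delays(rng=random.Random(7)), start=1):
    print(f"attempt {attempt}: wait {delay:.2f}s, then reopen the WebSocket")
```

Each reconnect is a fresh handshake through the anycast address, so it can land in a different region and is again eligible for `fly-replay`.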

## How CallSphere does this in production

CallSphere runs on bare k3s in two Hetzner regions (US + EU) at this stage; we'll move to Fly.io if/when we add APAC enterprise tenants. The Pion Go + NATS layer in our OneRoof multi-family stack is region-aware. The platform covers 37 agents, 90+ tools, 115+ DB tables, and 6 verticals. Pricing is $149/$499/$1499 with a 14-day trial and a 22% affiliate program. The patterns above are exactly what we'd ship for global expansion.

## FAQ

**Q: How many regions does Fly support?**
35+ as of May 2026. For voice, 6-8 covers >95% of human population within 100ms.

**Q: Can I run a single shared Postgres?**
Yes for low write volume; no for voice agents at scale. Use read replicas + Turso/CockroachDB for writes.

**Q: Twilio + Fly latency?**
Best case (us-east + Twilio US): ~120ms WebSocket round-trip. Worst case (gru + OpenAI us-east): ~250ms.

**Q: Can I do active-active across clouds?**
Yes — split Twilio number traffic 50/50 between Fly and a Render fallback via Twilio's region routing.

**Q: Cost at 10k call-min/day?**
Compute $30/mo, Postgres $60/mo, bandwidth $20/mo, OpenAI Realtime ~$300/day (about $9,000/mo). Infra is <2% of model cost.
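
The "under 2%" figure can be checked with a few lines of arithmetic. The sketch assumes the three infrastructure figures are monthly, the OpenAI figure is daily, and a month has 30 days.

```python
# Figures from the answer above; monthly units and a 30-day month are assumptions.
infra_monthly_usd = {"compute": 30, "postgres": 60, "bandwidth": 20}
openai_daily_usd = 300
days_per_month = 30

infra_total = sum(infra_monthly_usd.values())
model_total = openai_daily_usd * days_per_month
infra_share = infra_total / model_total

# prints: infra $110/mo, model $9000/mo, share 1.2%
print(f"infra ${infra_total}/mo, model ${model_total}/mo, share {infra_share:.1%}")
```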

## Sources

- [Fly.io Regions documentation](https://fly.io/docs/reference/regions/)
- [Multi-region databases and fly-replay — Fly docs](https://fly.io/docs/blueprints/multi-region-fly-replay/)
- [Fly.io + Turso — multi-region low-latency app — DEV](https://dev.to/mpiorowski/breaking-the-myth-scalable-multi-region-low-latency-app-exists-and-will-not-cost-you-a-kidney-537a)
- [Fly.io vs Render — Northflank](https://northflank.com/blog/flyio-vs-render)
- [10 Fly.io Alternatives for Global App Deployment in 2026 — DigitalOcean](https://www.digitalocean.com/resources/articles/flyio-alternative)

