Build a Multi-Region Voice Agent on Fly.io for Sub-500ms Global Latency (2026)
Deploy a voice agent to Fly.io's anycast network across 6 regions: Tokyo, Frankfurt, São Paulo, Sydney, Virginia, Los Angeles. fly-replay routes traffic to the closest healthy region.
TL;DR: Fly.io routes via anycast: a single IP, and traffic hits the nearest region. Deploy your FastAPI voice bridge to 6 regions with one `fly deploy` and `fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax`. Voice-to-voice latency stays <500ms for 90% of the world.
What you'll build
A Fly Machine running the FastAPI voice agent in 6 regions, each with a local OpenAI Realtime connection. Fly's edge routes Twilio's WebSocket to the closest region; if a region is unhealthy, fly-replay reroutes to the next.
Prerequisites
- Fly.io account (`fly auth login`).
- Existing FastAPI voice bridge.
- Twilio number, OpenAI API key.
- `flyctl` CLI.
Architecture
flowchart TD
C[Caller worldwide] --> AC[Fly Anycast IP]
AC -->|nearest region| R1[NRT Tokyo]
AC --> R2[FRA Frankfurt]
AC --> R3[GRU São Paulo]
AC --> R4[SYD Sydney]
AC --> R5[IAD Virginia]
AC --> R6[LAX Los Angeles]
R1 -->|wss| OAI[OpenAI Realtime us-east-1]
R2 -->|wss| OAI
R5 -->|wss low-latency| OAI
Step 1 — fly.toml
```toml
app = "voice-agent"
primary_region = "iad"

[build]
  dockerfile = "Dockerfile"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1

  [[http_service.checks]]
    method = "GET"
    path = "/healthz"
    interval = "10s"
    timeout = "2s"

# No [env] entry is needed for the region: Fly sets FLY_REGION on every
# Machine at runtime, and [env] values are literal strings, so
# "$FLY_REGION" would not be expanded here.
```
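The health check in `fly.toml` polls `/healthz`, which the bridge has to serve. Here is a minimal sketch of that logic, assuming the bridge tracks upstream health in some flag; the function name `healthz_result` is hypothetical, and the FastAPI wiring is shown only in comments because it depends on how your existing app is laid out.

```python
import os
from typing import Optional, Tuple


def healthz_result(openai_ok: bool, region: Optional[str] = None) -> Tuple[int, dict]:
    """Return (status_code, body) for the /healthz check.

    A non-2xx status tells the health check that this Machine should not
    receive traffic, so report 503 while the upstream connection is down.
    """
    region = region or os.environ.get("FLY_REGION", "unknown")
    status = 200 if openai_ok else 503
    return status, {"ok": openai_ok, "region": region}


# Possible wiring in the existing FastAPI app (assumed to be named `app`,
# with `openai_healthy` being an asyncio.Event owned by the bridge):
#
#   @app.get("/healthz")
#   def healthz(response: Response):
#       status, body = healthz_result(openai_healthy.is_set())
#       response.status_code = status
#       return body
```

Keeping the decision in a plain function makes it easy to unit-test without starting the server.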
Step 2 — Dockerfile
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
```
Step 3 — Deploy and spread regions
```bash
fly launch --no-deploy
fly secrets set OPENAI_API_KEY=sk-...
fly deploy
fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax
```
Each region runs one Machine. Twilio dials Fly's anycast; the closest region accepts.
Step 4 — Region-aware OpenAI routing
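OpenAI's Realtime API has the lowest latency from us-east-1 (Virginia) and eu-west-1 (Ireland). Fly Machines in NRT, SYD, and GRU still make a long-haul hop to reach OpenAI (trans-Pacific from Tokyo and Sydney, intercontinental from São Paulo); budget an extra 100-200ms for those regions.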
For tighter latency, route by region:
```python
import os

REGION = os.environ.get("FLY_REGION", "iad")
NEAR_OPENAI = ("iad", "lax", "ord", "sea")

# OpenAI doesn't yet expose regional endpoints, so both branches resolve to
# the same host and the route choice is symbolic. The else branch is where
# you can swap to Azure Voice Live (multi-region).
OAI_HOST = (
    "wss://api.openai.com/v1/realtime"
    if REGION in NEAR_OPENAI
    else "wss://api.openai.com/v1/realtime"
)
```
In FRA, swap to Azure Voice Live (West Europe) for best latency.
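The per-region guidance in this step (extra latency for NRT/SYD/GRU, Azure Voice Live in FRA) can be captured in one lookup. This is a sketch under stated assumptions: `plan_for_region` and the backend labels are hypothetical names, no real endpoint URLs are implied, and the region sets contain only what the text above states.

```python
import os

# Region sets taken from the text above; the labels are illustrative only.
FAR_FROM_OPENAI = frozenset({"nrt", "syd", "gru"})  # budget +100-200 ms
AZURE_REGIONS = frozenset({"fra"})                  # Azure Voice Live, West Europe


def plan_for_region(region: str) -> dict:
    """Return the backend label and extra latency budget (ms) for a Fly region."""
    region = region.lower()
    backend = "azure-voice-live" if region in AZURE_REGIONS else "openai-realtime"
    extra_ms = (100, 200) if region in FAR_FROM_OPENAI else (0, 0)
    return {"region": region, "backend": backend, "extra_ms": extra_ms}


if __name__ == "__main__":
    # FLY_REGION is set by Fly at runtime; default to iad for local runs.
    print(plan_for_region(os.environ.get("FLY_REGION", "iad")))
```

Extend the sets only after measuring from each region; the numbers here are budgets, not measurements.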
Step 5 — fly-replay for failover
When a region is unhealthy, return a `fly-replay: region=iad` header to immediately reroute the request to Virginia:
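```python
from fastapi import Request, Response

# `app` and `openai_healthy` (an asyncio.Event) are assumed to come from the existing bridge.
@app.middleware("http")
async def health_replay(req: Request, call_next):
    if not openai_healthy.is_set():
        # Ask Fly's proxy to replay this request in Virginia.
        return Response(headers={"fly-replay": "region=iad"}, status_code=503)
    return await call_next(req)
```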
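WebSockets are stickier: replay applies only to the initial handshake, and once the connection is upgraded it stays in that region. Starlette's `http` middleware does not run for WebSocket upgrade requests, so the WebSocket route needs its own health check before it accepts.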
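One gap in always replaying to `iad` is that an unhealthy `iad` Machine would replay to its own region. A small helper can pick the target instead. This is a sketch, not Fly's API: `replay_headers`, `FALLBACKS`, and the choice of `lax` as the backup for `iad` are assumptions for illustration; only the `fly-replay: region=...` header format comes from the step above.

```python
from typing import Optional

# Hypothetical fallback order. The article only specifies replaying to iad;
# the iad -> lax entry is an assumption to avoid replaying iad to itself.
FALLBACKS = {"iad": "lax"}
DEFAULT_TARGET = "iad"


def replay_headers(current_region: str, healthy: bool) -> Optional[dict]:
    """Return fly-replay headers for an unhealthy region, or None to serve locally."""
    if healthy:
        return None
    target = FALLBACKS.get(current_region, DEFAULT_TARGET)
    return {"fly-replay": f"region={target}"}
```

In the middleware, the returned dict can be passed straight to `Response(headers=...)` when it is not `None`.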
Step 6 — Multi-region Postgres (Fly Postgres or Turso)
Fly Postgres can run with a primary in iad and read replicas in every voice region. For voice-turn writes, consider Turso (libSQL): each Machine keeps a local embedded replica for reads and syncs with a remote primary.
```python
import os

import libsql_experimental as libsql

# Local embedded replica file that syncs with the remote Turso database.
db = libsql.connect(
    "voice.db",
    sync_url=os.environ["TURSO_URL"],
    auth_token=os.environ["TURSO_TOKEN"],
)
```
Step 7 — Observability
fly logs shows logs across all regions; Grafana Cloud + the Fly metrics integration gives latency by region. Tune by moving Machines closer to OpenAI's nearest POP.
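If you also log latency from inside the app, a per-region percentile is a quick way to check the numbers before reaching for dashboards. The sketch below assumes you collect `(region, latency_ms)` pairs yourself; `p90_by_region` is a hypothetical helper, and it uses the nearest-rank definition of the 90th percentile.

```python
from collections import defaultdict


def p90_by_region(samples):
    """Return {region: p90 latency in ms} using the nearest-rank method.

    `samples` is an iterable of (region, latency_ms) pairs.
    """
    buckets = defaultdict(list)
    for region, latency_ms in samples:
        buckets[region].append(latency_ms)

    result = {}
    for region, values in buckets.items():
        values.sort()
        # Nearest rank: ceil(0.9 * n), computed with integers to avoid float error.
        rank = (9 * len(values) + 9) // 10
        result[region] = values[rank - 1]
    return result
```

Regions whose p90 sits well above the others are the first candidates for moving or for a different backend.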
Pitfalls
- OpenAI doesn't have regional Realtime endpoints as of May 2026 — distant regions add latency. Use Azure Voice Live for EU presence.
- PSTN signaling: Twilio's signaling lives in US/EU; PSTN audio still has carrier-side hops.
- `fly-replay` doesn't work mid-WebSocket; it applies only to the upgrade request. Build reconnect on the client.
- Cost: 6 small machines (`shared-cpu-1x:512mb`) ≈ $30/mo + bandwidth. Add Postgres replicas: ~$60/mo.
- Region pinning for compliance: keeping EU calls in EU-only regions requires per-tenant routing; Fly's anycast doesn't enforce that, so gate at TwiML.
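For the client-side reconnect mentioned in the pitfalls, exponential backoff with jitter is a common starting point. This is a generic sketch rather than anything Fly or Twilio prescribes: `backoff_delays` and its default values are assumptions, and the `rng` parameter exists only so the schedule can be tested deterministically.

```python
import random


def backoff_delays(base=0.25, cap=8.0, attempts=6, rng=random.random):
    """Yield reconnect delays in seconds: exponential growth with full jitter.

    Each delay is a random fraction of min(cap, base * 2**attempt).
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


if __name__ == "__main__":
    for delay in backoff_delays():
        print(f"retry in {delay:.2f}s")
```

Whichever client opens the WebSocket would sleep for each yielded delay before retrying the handshake, which gives Fly's proxy the chance to route the new upgrade request to a healthy region.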
How CallSphere does this in production
CallSphere runs on bare k3s in two Hetzner regions (US + EU) at this stage; we'll move to Fly.io if/when we add APAC enterprise tenants. The Pion Go + NATS layer in our OneRoof multi-family stack is region-aware. 37 agents, 90+ tools, 115+ DB tables, 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. The patterns above are exactly what we'd ship for global expansion.
FAQ
Q: How many regions does Fly support? 35+ as of May 2026. For voice, 6-8 covers >95% of human population within 100ms.
Q: Can I run a single shared Postgres? Yes for low write volume; no for voice agents at scale. Use read replicas + Turso/CockroachDB for writes.
Q: Twilio + Fly latency? Best case (us-east + Twilio US): ~120ms WebSocket round-trip. Worst case (gru + OpenAI us-east): ~250ms.
Q: Can I do active-active across clouds? Yes — split Twilio number traffic 50/50 between Fly and a Render fallback via Twilio's region routing.
Q: Cost at 10k call-min/day? Compute ~$30/mo, Postgres ~$60/mo, bandwidth ~$20/mo, OpenAI Realtime ~$300/day (roughly $9,000/mo). Infra is <2% of model cost.
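The cost comparison in the last answer can be checked with a few lines of arithmetic. The sketch assumes the compute, Postgres, and bandwidth figures are monthly and that a month has 30 days; `infra_share` is a hypothetical helper.

```python
# Figures from the FAQ answer above; infra values are assumed to be per month.
MONTHLY_INFRA = {"compute": 30, "postgres": 60, "bandwidth": 20}
OPENAI_PER_DAY = 300
DAYS_PER_MONTH = 30


def infra_share(monthly_infra=MONTHLY_INFRA, model_per_day=OPENAI_PER_DAY, days=DAYS_PER_MONTH):
    """Return monthly infrastructure cost as a fraction of monthly model cost."""
    infra = sum(monthly_infra.values())
    model = model_per_day * days
    return infra / model


if __name__ == "__main__":
    print(f"infra is {infra_share():.1%} of model cost")
```

With these inputs the infrastructure total is $110 against $9,000 of model spend, which is consistent with the "<2%" claim.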