
Build a Multi-Region Voice Agent on Fly.io for Sub-500ms Global Latency (2026)

Deploy a voice agent to Fly.io's anycast network across 6 regions: Tokyo, Frankfurt, São Paulo, Sydney, Virginia, Los Angeles. fly-replay routes traffic to the closest healthy region.

TL;DR — Fly.io routes via Anycast: a single IP, traffic hits the nearest region. Deploy your FastAPI voice bridge to 6 regions with one fly deploy and fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax. Voice-to-voice latency stays <500ms for 90% of the world.

What you'll build

One Fly Machine in each of 6 regions runs the FastAPI voice agent with its own OpenAI Realtime connection. Fly's edge routes Twilio's WebSocket to the closest region; if that region is unhealthy, a fly-replay response header reroutes the request to a healthy one.

Prerequisites

  1. Fly.io account (fly auth login).
  2. Existing FastAPI voice bridge.
  3. Twilio number, OpenAI API key.
  4. flyctl CLI.

Architecture

```mermaid
flowchart TD
  C[Caller worldwide] --> AC[Fly Anycast IP]
  AC -->|nearest region| R1[NRT Tokyo]
  AC --> R2[FRA Frankfurt]
  AC --> R3[GRU São Paulo]
  AC --> R4[SYD Sydney]
  AC --> R5[IAD Virginia]
  AC --> R6[LAX Los Angeles]
  R1 -->|wss| OAI[OpenAI Realtime us-east-1]
  R2 -->|wss| OAI
  R5 -->|wss low-latency| OAI
```

Step 1 — fly.toml

```toml
app = "voice-agent"
primary_region = "iad"

[build]
  dockerfile = "Dockerfile"

# [http_service] already exposes ports 80/443 with the http and tls handlers,
# so a separate [[services]] block on the same internal port is not needed.
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1

  [[http_service.checks]]
    method = "GET"
    path = "/healthz"
    interval = "10s"
    timeout = "2s"

# Fly injects FLY_REGION into every Machine at runtime. fly.toml does not
# expand "$FLY_REGION", so read FLY_REGION directly instead of adding an [env] alias.
```
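The health check polls /healthz, which the existing bridge is assumed to expose. As a minimal sketch of the decision behind that route, the function below returns a status code and body; `healthz_response` and the `openai_ok` flag are hypothetical names, and the bridge would have to maintain that flag itself.

```python
from __future__ import annotations

import os


def healthz_response(openai_ok: bool, region: str | None = None) -> tuple[int, dict]:
    """Return (status_code, body) for GET /healthz.

    A non-2xx status fails the check, which is the signal Fly uses to
    stop treating this Machine as healthy.
    """
    # FLY_REGION is set by Fly at runtime; "local" is a fallback for dev runs.
    region = region or os.environ.get("FLY_REGION", "local")
    if openai_ok:
        return 200, {"status": "ok", "region": region}
    return 503, {"status": "openai_unreachable", "region": region}
```

In a FastAPI route you would return the body with the matching status code. Including the region in the body makes it easy to see which Machine answered a given probe.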

Step 2 — Dockerfile

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
```

Step 3 — Deploy and spread regions

```bash
fly launch --no-deploy
fly secrets set OPENAI_API_KEY=sk-...
fly deploy
fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax
```

Each region runs one Machine. Twilio opens its WebSocket to Fly's anycast IP, and the closest healthy region accepts the connection.
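After scaling, it is worth confirming that every region really has exactly one running Machine. The sketch below checks a JSON machine list for gaps and duplicates. It assumes the list is an array of objects with `region` and `state` keys (the shape I would expect from `fly machines list --json`, but verify against your own output); `region_coverage` is a hypothetical helper name.

```python
import json

EXPECTED_REGIONS = {"nrt", "fra", "gru", "syd", "iad", "lax"}


def region_coverage(machines_json, expected=EXPECTED_REGIONS):
    """Return (missing_regions, regions_with_more_than_one_machine)."""
    counts = {}
    for machine in json.loads(machines_json):
        # Only count Machines that are actually running (assumed state name).
        if machine.get("state") != "started":
            continue
        region = machine["region"]
        counts[region] = counts.get(region, 0) + 1
    missing = sorted(set(expected) - set(counts))
    crowded = sorted(r for r, n in counts.items() if n > 1)
    return missing, crowded
```

An empty `missing` list and an empty `crowded` list mean the deployment matches the one-Machine-per-region layout described above.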

Step 4 — Region-aware OpenAI routing

OpenAI's Realtime API has lowest latency from us-east-1 (Virginia) and eu-west-1 (Ireland). Fly Machines in NRT, SYD, and GRU still make a long-haul hop to reach OpenAI (trans-Pacific from Tokyo and Sydney, north-south from São Paulo); budget +100-200ms.

For tighter latency, route by region:

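```python
import os

# Fly sets FLY_REGION on every Machine; fall back to "iad" for local runs.
REGION = os.environ.get("FLY_REGION", "iad")

# Both branches resolve to the same host today. The conditional only marks
# the place where a regional endpoint would be swapped in.
OAI_HOST = (
    "wss://api.openai.com/v1/realtime"
    if REGION in ("iad", "lax", "ord", "sea")
    else "wss://api.openai.com/v1/realtime"
)
```

OpenAI doesn't yet expose regional endpoints, so the route choice is symbolic, but this is the place to swap in Azure Voice Live (multi-region).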

In FRA, swap to Azure Voice Live (West Europe) for best latency.
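One way to express that swap is a per-region override table. This is a sketch only: `realtime_url` and `REGION_OVERRIDES` are hypothetical names, and the Azure URL is a placeholder that must be replaced with your own resource's endpoint. Authentication and any differences in message format between providers are not covered here.

```python
import os

OPENAI_REALTIME = "wss://api.openai.com/v1/realtime"

# Hypothetical placeholder: replace with the endpoint of your own Azure resource.
AZURE_WEST_EUROPE = "wss://YOUR-RESOURCE.example/voice-live"

# Regions listed here use a non-default provider; everything else uses OpenAI.
REGION_OVERRIDES = {"fra": AZURE_WEST_EUROPE}


def realtime_url(region: str) -> str:
    """Pick the realtime endpoint for a Fly region code such as "fra"."""
    return REGION_OVERRIDES.get(region.lower(), OPENAI_REALTIME)


# FLY_REGION is set by Fly at runtime; default to "iad" for local runs.
OAI_HOST = realtime_url(os.environ.get("FLY_REGION", "iad"))
```

Keeping the mapping in a dict means adding a second override (say, for another EU region) is a one-line change rather than another branch in a conditional.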

Step 5 — fly-replay for failover

When a region is unhealthy, return a fly-replay: region=iad response header so Fly's proxy immediately replays the request in Virginia:

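```python
import os

from fastapi import Response


@app.middleware("http")
async def health_replay(req, call_next):
    # If this region cannot reach OpenAI, ask Fly's proxy to replay the request in iad.
    # Skip the replay when already in iad so the request is not bounced back to itself.
    if not openai_healthy.is_set() and os.environ.get("FLY_REGION") != "iad":
        return Response(headers={"fly-replay": "region=iad"}, status_code=503)
    return await call_next(req)
```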

WebSockets are stickier — replay applies to the initial handshake; once upgraded, the WS stays in that region.
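Because the replay decision only happens on the handshake, it helps to keep it in one small, testable function. The sketch below picks a fallback region per source region and refuses to replay a request that has already been replayed. The `FALLBACK` table is a hypothetical ordering, not something Fly provides, and the loop guard relies on the fly-replay-src header that Fly's proxy adds to replayed requests.

```python
from __future__ import annotations

# Hypothetical fallback order: where each region sends traffic when unhealthy.
FALLBACK = {
    "nrt": "lax",
    "syd": "lax",
    "gru": "iad",
    "fra": "iad",
    "lax": "iad",
    "iad": "lax",
}


def replay_header(region: str, healthy: bool, request_headers: dict) -> dict | None:
    """Return {"fly-replay": ...} to hand the request off, or None to handle it here."""
    if healthy:
        return None
    # A request that already carries fly-replay-src has been replayed once.
    # Handling it locally (even with an error) avoids bouncing it between
    # two unhealthy regions.
    if any(name.lower() == "fly-replay-src" for name in request_headers):
        return None
    target = FALLBACK.get(region, "iad")
    return {"fly-replay": f"region={target}"}
```

Inside the middleware you would call `replay_header(os.environ["FLY_REGION"], openai_healthy.is_set(), req.headers)` and return the result as response headers when it is not None.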

Step 6 — Multi-region Postgres (Fly Postgres or Turso)

Fly Postgres can run with a primary in iad and read replicas in every voice region. For voice-turn writes, consider Turso (libSQL): each Machine keeps a local embedded replica for fast reads, while writes go to the remote primary and sync back to the replicas.

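```python
import os

import libsql_experimental as libsql

# "voice.db" is the local embedded replica; sync_url points at the remote Turso database.
db = libsql.connect(
    "voice.db",
    sync_url=os.environ["TURSO_URL"],
    auth_token=os.environ["TURSO_TOKEN"],
)
db.sync()  # pull the latest state from the remote before serving calls
```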
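What a voice-turn write might look like is sketched below. The `turns` schema and the `record_turn` helper are hypothetical, and the example runs against the standard-library sqlite3 module as a local stand-in; it assumes the Turso connection accepts the same sqlite3-style `execute` and `commit` calls.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS turns (
    call_sid TEXT NOT NULL,
    turn_no  INTEGER NOT NULL,
    role     TEXT NOT NULL,
    text     TEXT NOT NULL,
    region   TEXT NOT NULL,
    ts       REAL NOT NULL,
    PRIMARY KEY (call_sid, turn_no)
)
"""


def record_turn(conn, call_sid, turn_no, role, text, region):
    """Insert one conversation turn; conn is any sqlite3-style connection."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO turns (call_sid, turn_no, role, text, region, ts) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (call_sid, turn_no, role, text, region, time.time()),
    )
    conn.commit()


# Local stand-in for the Turso connection.
conn = sqlite3.connect(":memory:")
record_turn(conn, "CA123", 1, "caller", "Hi, I need to reschedule.", "fra")
```

Storing the region with each turn makes it possible to check later whether a call stayed in the region it started in.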

Step 7 — Observability

fly logs shows logs across all regions; Grafana Cloud + the Fly metrics integration gives latency by region. Tune by moving Machines closer to OpenAI's nearest POP.
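If you log one latency sample per voice turn, tagged with FLY_REGION, a per-region p95 is a few lines of Python. This is a sketch for ad hoc analysis rather than a replacement for the Grafana dashboards; `p95_by_region` is a hypothetical helper, and it uses the nearest-rank percentile definition.

```python
from collections import defaultdict


def p95_by_region(samples):
    """samples: iterable of (region, latency_ms). Returns {region: p95_ms}."""
    by_region = defaultdict(list)
    for region, latency_ms in samples:
        by_region[region].append(latency_ms)
    result = {}
    for region, values in by_region.items():
        values.sort()
        # Nearest-rank percentile: rank = ceil(0.95 * n), computed in integers
        # to avoid floating-point rounding.
        rank = (95 * len(values) + 99) // 100
        result[region] = values[rank - 1]
    return result
```

Comparing the p95 of a distant region such as gru against iad shows how much of the budget the long-haul hop to OpenAI is actually consuming.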

Pitfalls

  • OpenAI doesn't have regional Realtime endpoints as of May 2026 — distant regions add latency. Use Azure Voice Live for EU presence.
  • PSTN signaling: Twilio's signaling lives in US/EU; PSTN audio still has carrier-side hops.
  • fly-replay doesn't work mid-WebSocket — only for the upgrade request. Build reconnect on the client.
  • Cost: 6 small machines (shared-cpu-1x:512mb) ≈ $30/mo + bandwidth. Add Postgres replicas: ~$60/mo.
  • Region pinning for compliance: keeping EU calls in EU-only regions requires per-tenant routing. Fly's anycast doesn't enforce that, so gate at TwiML.
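For the reconnect pitfall above, whichever client opens the WebSocket needs a retry schedule that does not hammer a recovering region. A minimal sketch of exponential backoff with full jitter follows; the function name and default values are assumptions chosen for illustration, not tuned numbers.

```python
import random


def reconnect_delays(max_attempts=5, base=0.25, cap=4.0, rng=random.random):
    """Yield wait times in seconds: exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: scale the ceiling by a random factor so clients
        # do not all reconnect at the same moment.
        yield rng() * ceiling
```

A client loop would sleep for each yielded delay before retrying the handshake, and give up (or end the call gracefully) once the generator is exhausted.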

How CallSphere does this in production

CallSphere runs on bare k3s in two Hetzner regions (US + EU) at this stage; we'll move to Fly.io if/when we add APAC enterprise tenants. The Pion Go + NATS layer in our OneRoof multi-family stack is region-aware. 37 agents, 90+ tools, 115+ DB tables, 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. The patterns above are exactly what we'd ship for global expansion.

FAQ

Q: How many regions does Fly support? 35+ as of May 2026. For voice, 6-8 covers >95% of human population within 100ms.

Q: Can I run a single shared Postgres? Yes for low write volume; no for voice agents at scale. Use read replicas + Turso/CockroachDB for writes.

Q: Twilio + Fly latency? Best case (us-east + Twilio US): ~120ms WebSocket round-trip. Worst case (gru + OpenAI us-east): ~250ms.

Q: Can I do active-active across clouds? Yes — split Twilio number traffic 50/50 between Fly and a Render fallback via Twilio's region routing.

Q: Cost at 10k call-min/day? Compute ~$30/mo, Postgres ~$60/mo, bandwidth ~$20/mo, OpenAI Realtime ~$300/day (about $9,000/mo). Infra is <2% of model cost.
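The arithmetic behind that last answer, as a quick sketch. It assumes the compute, Postgres, and bandwidth figures are monthly (matching the per-month numbers in the Pitfalls section) and uses a 30-day month as a round-number assumption.

```python
MONTHLY_INFRA_USD = {"compute": 30, "postgres": 60, "bandwidth": 20}
OPENAI_USD_PER_DAY = 300
DAYS_PER_MONTH = 30  # assumption for a round-number estimate

infra = sum(MONTHLY_INFRA_USD.values())      # total infrastructure per month
model = OPENAI_USD_PER_DAY * DAYS_PER_MONTH  # model spend per month
infra_share = infra / model

print(f"infra ${infra}/mo, model ${model}/mo, share {infra_share:.1%}")
```

Under those assumptions infrastructure comes to roughly 1.2% of model spend, so shaving a Machine or a replica matters far less than reducing Realtime minutes.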
