By Sagar Shankaran, Founder of CallSphere
Deploy a voice agent to Fly.io's anycast network across 6 regions: Tokyo, Frankfurt, São Paulo, Sydney, Virginia, Los Angeles. fly-replay routes traffic to the closest healthy region.
Key takeaways
TL;DR: Fly.io routes via anycast. A single IP sends traffic to the nearest region. Deploy your FastAPI voice bridge to 6 regions with one `fly deploy` and `fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax`. Voice-to-voice latency stays under 500 ms for 90% of the world.
A Fly Machine running the FastAPI voice agent in 6 regions, each with a local OpenAI Realtime connection. Fly's edge routes Twilio's WebSocket to the closest region; if a region is unhealthy, fly-replay reroutes to the next.
Prerequisites: a Fly.io account (`fly auth login`) and the flyctl CLI.
flowchart TD
C[Caller worldwide] --> AC[Fly Anycast IP]
AC -->|nearest region| R1[NRT Tokyo]
AC --> R2[FRA Frankfurt]
AC --> R3[GRU São Paulo]
AC --> R4[SYD Sydney]
AC --> R5[IAD Virginia]
AC --> R6[LAX Los Angeles]
R1 -->|wss| OAI[OpenAI Realtime us-east-1]
R2 -->|wss| OAI
R5 -->|wss low-latency| OAI
fly.toml
```toml
app = "voice-agent"
primary_region = "iad"

[build]
  dockerfile = "Dockerfile"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1

# Note: [http_service] already exposes 80/443 for this port; in practice keep
# either it or the explicit [[services]] block below, not both.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.ports]]
    handlers = ["http", "tls"]
    port = 443

  [[services.http_checks]]
    method = "get"
    path = "/healthz"
    interval = "10s"
    timeout = "2s"

[env]
  # fly.toml values are literal strings, so this is not expanded.
  # Fly already injects FLY_REGION into each Machine at runtime.
  AI_REGION_HINT = "$FLY_REGION"
```
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
```
```bash
fly launch --no-deploy
fly secrets set OPENAI_API_KEY=sk-...
fly deploy
fly scale count 6 --max-per-region 1 --region nrt,fra,gru,syd,iad,lax
```
Each region runs one Machine. Twilio connects to Fly's anycast IP, and the closest region accepts the connection.
OpenAI's Realtime API has the lowest latency from us-east-1 (Virginia) and eu-west-1 (Ireland). Fly Machines in NRT, SYD, and GRU still have a long intercontinental hop to reach OpenAI; budget an extra 100-200 ms.
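As a rough way to reason about that budget, the sketch below adds the extra hop to a per-region baseline and compares the result with the 500 ms target from the TL;DR. The baseline figures and the helper names (`estimated_latency_ms`, `within_budget`) are illustrative assumptions, not measurements.

```python
# Illustrative latency budget check; every number here is a placeholder.
TARGET_MS = 500  # voice-to-voice target quoted in the TL;DR

# Regions the article lists as far from OpenAI's lowest-latency locations.
FAR_REGIONS = {"nrt", "syd", "gru"}


def estimated_latency_ms(region: str, baseline_ms: int, penalty_ms: int = 200) -> int:
    """Baseline latency plus the worst-case extra hop for far regions."""
    return baseline_ms + (penalty_ms if region in FAR_REGIONS else 0)


def within_budget(region: str, baseline_ms: int, penalty_ms: int = 200) -> bool:
    """True when the estimate stays under the voice-to-voice target."""
    return estimated_latency_ms(region, baseline_ms, penalty_ms) < TARGET_MS


if __name__ == "__main__":
    for region in ("iad", "nrt"):
        print(region, estimated_latency_ms(region, baseline_ms=250))
```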
For tighter latency, route by region:
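```python
import os

# Fly sets FLY_REGION at runtime; fall back to iad for local runs.
REGION = os.environ.get("FLY_REGION", "iad")
# Both branches point at the same public endpoint today; swap the second for a regional one.
OAI_HOST = "wss://api.openai.com/v1/realtime" if REGION in ("iad", "lax", "ord", "sea") else "wss://api.openai.com/v1/realtime"
```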
In FRA, swap to Azure Voice Live (West Europe) for best latency.
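One way to keep that routing decision in a single place is a lookup table keyed by Fly region. In this sketch, `REGIONAL_ENDPOINTS` and `realtime_url` are hypothetical names, and the table is left empty because the article does not give a regional URL; fill it in only with an endpoint you have verified.

```python
import os
from typing import Dict, Optional

DEFAULT_URL = "wss://api.openai.com/v1/realtime"  # endpoint used in the article

# Hypothetical per-region overrides, e.g. {"fra": "<your verified regional URL>"}.
REGIONAL_ENDPOINTS: Dict[str, str] = {}


def realtime_url(region: Optional[str] = None) -> str:
    """Return the override for this Fly region, else the default endpoint."""
    region = region or os.environ.get("FLY_REGION", "iad")
    return REGIONAL_ENDPOINTS.get(region, DEFAULT_URL)
```

Regions without an entry fall through to the default, so adding a new Fly region never breaks the lookup.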
fly-replay for failover
When a region is unhealthy, return a `fly-replay: region=iad` response header to immediately reroute the request to Virginia:
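```python
import asyncio

from fastapi import FastAPI, Request, Response

app = FastAPI()
# Assumed to be set by a background probe while the OpenAI connection is healthy.
openai_healthy = asyncio.Event()


@app.middleware("http")
async def health_replay(req: Request, call_next):
    # Ask Fly's proxy to replay the request in iad when this region is unhealthy.
    if not openai_healthy.is_set():
        return Response(headers={"fly-replay": "region=iad"}, status_code=503)
    return await call_next(req)
```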
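One edge case in the middleware above: if the unhealthy Machine is itself in iad, a replay to iad may send the request straight back to the same region. A minimal sketch of a guard, assuming a hypothetical `replay_headers` helper and the same `region=` header value used above:

```python
import os
from typing import Dict, Optional

FALLBACK_REGION = "iad"


def replay_headers(healthy: bool, current_region: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Return fly-replay headers, or None when no replay should be issued."""
    current_region = current_region or os.environ.get("FLY_REGION", FALLBACK_REGION)
    if healthy or current_region == FALLBACK_REGION:
        # Healthy, or already in the fallback region: there is nowhere else to replay to.
        return None
    return {"fly-replay": f"region={FALLBACK_REGION}"}
```

When the helper returns `None` for an unhealthy Machine in the fallback region, the caller still has to decide what to send, for example a plain 503.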
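WebSockets are stickier: replay applies only to the initial handshake, and once the connection is upgraded it stays in that region. Note that FastAPI's `http` middleware does not run for WebSocket connections, so the health check has to be repeated in the WebSocket route itself.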
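Because an upgraded WebSocket stays pinned to its region, recovering from a regional failure means reconnecting so that the new handshake can be routed again. Below is a minimal sketch of a capped exponential backoff schedule with jitter for a client's reconnect loop; `backoff_delays` and its defaults are illustrative assumptions.

```python
import random
from typing import Iterator, Optional


def backoff_delays(base: float = 0.25, cap: float = 8.0, attempts: int = 6,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one sleep duration (seconds) per reconnect attempt, using full jitter."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        # Exponential ceiling, capped so late retries do not wait too long.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling] to avoid reconnect stampedes.
        yield rng.uniform(0, ceiling)
```

A client would sleep for each yielded delay before retrying the WebSocket handshake, and stop as soon as a connection succeeds.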
Fly Postgres can run with a primary in iad and read replicas in every voice region. For voice-turn writes, use Turso (libSQL) — multi-region writeable replicas with conflict resolution.
```python
import os

import libsql_experimental as libsql

# Local embedded replica that syncs with the remote Turso database.
db = libsql.connect(
    "voice.db",
    sync_url=os.environ["TURSO_URL"],
    auth_token=os.environ["TURSO_TOKEN"],
)
```
fly logs shows logs across all regions; Grafana Cloud + the Fly metrics integration gives latency by region. Tune by moving Machines closer to OpenAI's nearest POP.
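If you export per-call latency samples yourself, the per-region view can be approximated with the standard library. The `p90_by_region` helper is a hypothetical name, and any input you feed it in testing is illustrative; real numbers would come from your own metrics.

```python
from collections import defaultdict
from statistics import quantiles


def p90_by_region(samples):
    """samples: iterable of (region, latency_ms). Returns {region: p90 latency}."""
    grouped = defaultdict(list)
    for region, latency_ms in samples:
        grouped[region].append(latency_ms)
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    return {
        region: quantiles(values, n=10)[-1]
        for region, values in grouped.items()
        if len(values) >= 2  # quantiles() needs at least two data points
    }
```

Regions with fewer than two samples are skipped, since a percentile over a single call is not meaningful.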
fly-replay doesn't work mid-WebSocket, only for the upgrade request. Build reconnect on the client.
Compute (shared-cpu-1x:512mb) ≈ $30/mo + bandwidth. Add Postgres replicas: ~$60/mo.
CallSphere runs on bare k3s in two Hetzner regions (US + EU) at this stage; we'll move to Fly.io if/when we add APAC enterprise tenants. The Pion Go + NATS layer in our OneRoof multi-family stack is region-aware. 37 agents, 90+ tools, 115+ DB tables, 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. The patterns above are exactly what we'd ship for global expansion.
Q: How many regions does Fly support? 35+ as of May 2026. For voice, 6-8 covers >95% of human population within 100ms.
Q: Can I run a single shared Postgres? Yes for low write volume; no for voice agents at scale. Use read replicas + Turso/CockroachDB for writes.
Q: Twilio + Fly latency? Best case (us-east + Twilio US): ~120ms WebSocket round-trip. Worst case (gru + OpenAI us-east): ~250ms.
Q: Can I do active-active across clouds? Yes — split Twilio number traffic 50/50 between Fly and a Render fallback via Twilio's region routing.
Q: Cost at 10k call-min/day? Compute $30/mo, Postgres $60/mo, bandwidth $20/mo, OpenAI Realtime ~$300/day (about $9,000/mo). Infra is <2% of model cost.
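The ratio in the last answer can be checked with a few lines of arithmetic. This sketch assumes the infrastructure figures are monthly, the OpenAI figure is daily, and a 30-day month:

```python
INFRA_MONTHLY_USD = {"compute": 30, "postgres": 60, "bandwidth": 20}
OPENAI_DAILY_USD = 300
DAYS_PER_MONTH = 30  # assumption for the comparison

infra_total = sum(INFRA_MONTHLY_USD.values())    # 110
model_total = OPENAI_DAILY_USD * DAYS_PER_MONTH  # 9000
infra_share = infra_total / model_total

print(f"infra ${infra_total}/mo, model ${model_total}/mo, share {infra_share:.1%}")
```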
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.