AI Infrastructure

Redis Pub/Sub for Multi-Region WebSocket Fanout in 2026

How to fan out WebSocket events across multiple regions with Redis pub/sub: shard-aware topology, federation, and the latency math that decides where to put the broker.

Stretching one Redis cluster across continents is the most expensive way to learn that pub/sub is not a database. Run a cluster per region and federate the subjects you actually need.

Why does multi-region make pub/sub harder?

flowchart TD
  Client[Client] --> Edge[Cloudflare Worker]
  Edge -->|WS upgrade| DO[Durable Object]
  DO --> AI[(OpenAI Realtime WS)]
  AI --> DO
  DO --> Client
  DO -.hibernation.-> Storage[(Persisted state)]
CallSphere reference architecture

Because pub/sub assumes the broker is "near" the subscribers. A New York publish that has to round-trip to Sydney before fanning out adds 200 ms minimum. Multiply by every event in a voice conversation and the agent feels broken.

The 2026 best practice is co-located brokers per region plus a federation layer that bridges only the subjects that need cross-region delivery. Most subjects (per-session call audio, per-tenant dashboard) never leave their region. A few (global presence, system-wide announcements) are explicitly bridged.
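One way to make "most subjects never leave their region" enforceable is an explicit allowlist at the bridge. A minimal sketch, where the pattern list, channel names, and `isFederated` helper are illustrative assumptions, not part of any bridge product:

```typescript
// Hypothetical allowlist: only channels matching these patterns are
// bridged across regions; everything else stays region-local.
const FEDERATED_PATTERNS: RegExp[] = [
  /^presence\./,  // global presence
  /^announce\./,  // system-wide announcements
];

// The bridge consults this before replicating a message.
function isFederated(channel: string): boolean {
  return FEDERATED_PATTERNS.some((p) => p.test(channel));
}
```

With this shape, adding a cross-region subject is a deliberate, reviewable change rather than an accidental default.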

How does the topology actually work?

A standard pattern looks like:

  1. Per-region Redis cluster — each region has its own clustered Redis with sharded pub/sub, sized for the local WebSocket pod count.
  2. WebSocket pods are region-pinned — a user's connection always lands in the closest region. DNS-based geo routing handles this.
  3. Federation via change-data feed — a small bridge service (often NATS or Kafka in the middle, sometimes a thin Redis-to-Redis pump) replicates a subset of channels across regions.
  4. Idempotent message envelope — every replicated event has a globally unique ID so duplicates from federation loops are dropped.
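Step 4 can be sketched on the subscriber side. This assumes the envelope shape `{ id, ts, payload }` used later in the publish example; the `handleEnvelope` name and TTL bookkeeping are illustrative:

```typescript
// Subscriber-side dedupe: drop envelopes whose ID was already seen,
// so duplicates from federation loops are ignored.
const seen = new Map<string, number>(); // envelope id -> expiry time
const TTL_MS = 60_000; // keep IDs for 60 s, matching the build steps

function handleEnvelope(raw: string): object | null {
  const { id, payload } = JSON.parse(raw);
  const now = Date.now();
  // Evict expired IDs so the map stays bounded.
  for (const [k, exp] of seen) if (exp < now) seen.delete(k);
  if (seen.has(id)) return null; // duplicate — already delivered
  seen.set(id, now + TTL_MS);
  return payload;
}
```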

For voice agents specifically, you almost never want to replicate audio events. You want to replicate session metadata so a user who moves regions mid-call can resume.

CallSphere's implementation

CallSphere runs in two regions: us-east-1 (primary) and us-west-2 (failover and west-coast latency). The topology:

  • ElastiCache cluster per region with sharded pub/sub for the Sales Calling dashboard fan-out.
  • Cross-region federation only for tenant-level events — tenant settings updates, billing changes, affiliate commission events. None of the call audio crosses regions.
  • DynamoDB Global Tables for sticky session ownership: which region owns which call right now. Reads are local, writes replicate.
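The ownership table drives reconnect routing. A sketch of the decision, assuming a record shape and a `routeReconnect` helper that are illustrative rather than CallSphere's actual schema:

```typescript
// Hypothetical shape of the session-ownership record replicated via
// a globally consistent store: local reads, replicated writes.
interface SessionOwnership {
  sessionId: string;
  region: "us-east-1" | "us-west-2";
  updatedAt: number;
}

// On reconnect: resume locally if this region owns the session,
// otherwise redirect the client to the owning region.
function routeReconnect(
  record: SessionOwnership,
  localRegion: string,
): { resumeLocally: boolean; redirectTo?: string } {
  if (record.region === localRegion) return { resumeLocally: true };
  return { resumeLocally: false, redirectTo: record.region };
}
```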

The dashboard works whether the manager is on the east coast or the west because their session pins to the closer region; the data they see arrives via the federated subjects. Under a stress test spanning 115+ database tables at a peak load of 37 concurrent agents, cross-region tail latency settled around 80 ms.

Code: federated publish with idempotency

import { createClient } from "redis";
import { randomUUID } from "crypto";

// Cache one connected client per region; the endpoint URL below is a
// placeholder for your per-region Redis address.
const clients = new Map<string, ReturnType<typeof createClient>>();

async function getRegionalClient(region: string) {
  let client = clients.get(region);
  if (!client) {
    client = createClient({ url: `redis://redis.${region}.internal:6379` });
    await client.connect();
    clients.set(region, client);
  }
  return client;
}

async function federatedPublish(
  channel: string,
  payload: object,
  regions: string[],
) {
  // Envelope with a globally unique ID so subscribers can drop
  // duplicates produced by federation loops.
  const envelope = JSON.stringify({
    id: randomUUID(),
    ts: Date.now(),
    payload,
  });
  await Promise.all(
    regions.map(async (r) => {
      const client = await getRegionalClient(r);
      await client.publish(channel, envelope);
    }),
  );
}

Build steps

  1. Stand up one Redis cluster per region. Size for local WebSocket connections, not global.
  2. Pin WebSocket pods to a region; use Route 53 or Cloudflare geo routing on the upgrade URL.
  3. Pick a federation transport. We use NATS gateway because it handles flow control between regions.
  4. Wrap every cross-region publish in an envelope with UUID; subscribers dedupe via a 60-second LRU.
  5. Make session ownership explicit — store sessionId → region in a globally consistent store and route reconnects accordingly.
  6. Alert on cross-region replication lag at 5 s; investigate at 1 s.
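Step 6's thresholds fall out of the envelope timestamp for free. A minimal sketch, where `lagMs` and `lagSeverity` are assumed helper names:

```typescript
// Replication lag measured subscriber-side from the envelope's ts field.
function lagMs(envelopeTs: number, now: number = Date.now()): number {
  return now - envelopeTs;
}

// Map lag to the thresholds in step 6: investigate at 1 s, alert at 5 s.
function lagSeverity(lag: number): "ok" | "investigate" | "alert" {
  if (lag >= 5_000) return "alert";
  if (lag >= 1_000) return "investigate";
  return "ok";
}
```

Because the publisher stamps `ts`, this measures end-to-end federation delay, not just broker-internal latency; clock skew between regions adds noise, so treat the thresholds as coarse.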

FAQ

Can I just use a single AWS ElastiCache global datastore? It works for cache, not for pub/sub. Pub/sub messages are ephemeral and global datastore replication is async, so subscribers in the secondary region miss events.


What is the latency cost of federation? 60–120 ms inter-region for AWS within the US, 200–300 ms US ↔ EU. Plan accordingly.

Should I use Kafka for federation instead? Kafka is overkill for short-lived realtime events but excellent if you also need durable replay. NATS JetStream gives you both at lower ops cost.

How do I handle a region failure? Failover the WebSocket DNS, let connections drop, clients reconnect to the surviving region with their session ID. The session ownership table tells the new region "yes, replay this session."

Can I run hot-hot? Yes — every region accepts connections, federation keeps them in sync. Cost: double the broker capacity.

CallSphere is built for six verticals with multi-region failover — the realtime infrastructure is one piece. Start a 14-day trial at $149/$499/$1499 or book a demo.


