---
title: "Blue/Green Voice Agent Deploys with WebSocket Sticky Sessions (2026)"
description: "Blue/green deploy an AI voice agent without dropping calls. ALB stickiness, draining timeouts tuned for WebSockets, Redis-backed session state, and a clean cutover."
canonical: https://callsphere.ai/blog/vw6h-blue-green-voice-agent-websocket-sticky-sessions-2026
category: "AI Infrastructure"
tags: ["Blue Green", "WebSocket", "Voice AI", "Load Balancer", "Tutorial"]
author: "CallSphere Team"
published: 2026-04-13T00:00:00.000Z
updated: 2026-05-07T16:46:16.988Z
---

# Blue/Green Voice Agent Deploys with WebSocket Sticky Sessions (2026)

> Blue/green deploy an AI voice agent without dropping calls. ALB stickiness, draining timeouts tuned for WebSockets, Redis-backed session state, and a clean cutover.

> **TL;DR** — Blue/green for voice means: stand up green, drain blue with sticky sessions intact, cut over new calls only, keep state in Redis so reconnects work. Stickiness duration on the LB must match your call SLO, not the day-long default.

## What you'll set up

Two parallel ReplicaSets (`voice-agent-blue` and `voice-agent-green`) behind an Application Load Balancer with target-group stickiness, session state in Redis so a reconnecting call lands on the same color, and a cutover script that gates on "all blue calls drained".

## Architecture

```mermaid
flowchart TD
  CLIENT[Caller WS] --> ALB[ALB stickiness=5m]
  ALB --> BLUE[voice-agent-blue]
  ALB --> GREEN[voice-agent-green]
  BLUE --> REDIS[(Redis call state)]
  GREEN --> REDIS
  CUT[Cutover] -->|weight 0/100| ALB
  DRAIN[Drain blue] --> BLUE
```

## Step 1 — Two target groups, one ALB

```hcl
resource "aws_lb_target_group" "blue" {
  name             = "voice-blue"
  port             = 8080
  protocol         = "HTTP"
  protocol_version = "HTTP1"   # WebSocket upgrades ride on HTTP/1.1

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 300      # 5 minutes — matches our max call length
    enabled         = true
  }

  deregistration_delay = 600   # let active calls finish (10 min)

  health_check {
    path     = "/healthz/realtime"
    interval = 10
    timeout  = 3
  }
}

# "green" is identical except for the name
resource "aws_lb_target_group" "green" { ... }
```

`cookie_duration = 300` (5 min) is the critical knob. The default is 86400 seconds (a full day): a customer who reconnects hours after cutover still hits the old color, which keeps blue alive long after you wanted it gone.
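If you'd rather derive the knob from measurements than hard-code it, here is a minimal sketch; the 60-second reconnect grace and the function name are our assumptions, not part of the stack above:

```python
import math

def stickiness_seconds(p99_call_s, reconnect_grace_s=60):
    # The cookie must outlive the longest expected call plus a reconnect
    # grace, rounded up to a whole minute. Never a day-long default.
    return math.ceil((p99_call_s + reconnect_grace_s) / 60) * 60
```

With a 4-minute p99 call, `stickiness_seconds(240)` returns 300, the value used above.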

## Step 2 — Listener with weighted routing

```hcl
resource "aws_lb_listener_rule" "voice" {
  listener_arn = aws_lb_listener.https.arn

  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 100
      }
      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 0
      }
      stickiness {
        duration = 300
        enabled  = true
      }
    }
  }

  condition {
    path_pattern { values = ["/realtime/*"] }
  }
}
```

## Step 3 — Move call state to Redis

The voice agent must not keep call context in process memory. Otherwise a green pod can't resume a blue pod's call. Move to Redis:

```python
import json
import redis.asyncio as redis

r = redis.from_url("redis://voice-redis:6379")

async def get_call(call_id):
    return json.loads(await r.get(f"call:{call_id}") or "{}")

async def set_call(call_id, state):
    # 30-minute TTL: outlives any call plus reconnect grace
    await r.setex(f"call:{call_id}", 1800, json.dumps(state))
```

Now any pod (blue or green) can pick up state. The cookie still keeps sessions sticky, but a reconnect that lands on the other color is safe.

## Step 4 — Deploy green alongside blue

```bash
kubectl apply -f voice-agent-green.yaml
kubectl rollout status deploy/voice-agent-green --timeout=120s
```

Green is live and registered in its target group, but `weight = 0` means no traffic.
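Before touching weights, gate on green's target health. A sketch using boto3's `describe_target_health`; the `green_ready` helper name is ours, and it assumes AWS credentials at runtime:

```python
def all_targets_healthy(descriptions):
    # Every registered target must report 'healthy'; an empty target
    # group is NOT ready.
    states = [d["TargetHealth"]["State"] for d in descriptions]
    return bool(states) and all(s == "healthy" for s in states)

def green_ready(tg_arn):
    import boto3  # assumption: boto3 available with valid credentials
    resp = boto3.client("elbv2").describe_target_health(TargetGroupArn=tg_arn)
    return all_targets_healthy(resp["TargetHealthDescriptions"])
```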

## Step 5 — Cutover with a smoke test

```bash
# Promote green to 100, blue to 0.
# Heredoc so ${BLUE_TG}/${GREEN_TG} actually expand — they would not
# inside a single-quoted literal.
ACTIONS=$(cat <<EOF
[{"Type":"forward","ForwardConfig":{"TargetGroups":[
  {"TargetGroupArn":"${BLUE_TG}","Weight":0},
  {"TargetGroupArn":"${GREEN_TG}","Weight":100}],
  "TargetGroupStickinessConfig":{"Enabled":true,"DurationSeconds":300}}}]
EOF
)
aws elbv2 modify-rule --rule-arn "$RULE" --actions "$ACTIONS"

# New calls go green; existing sticky cookies still finish on blue
```

Then poll until blue drains:

```bash
while [ "$(aws cloudwatch get-metric-data ... blue active_calls)" != "0" ]; do sleep 30; done
```
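One way to implement that drain gate, as a sketch; the `VoiceAgent` namespace and the `color` dimension on the custom `active_calls` metric are assumptions, as is boto3 at runtime:

```python
import datetime

def drained(samples, zero_streak=5):
    # Drained = the newest `zero_streak` one-minute samples all read 0
    # (five minutes of zeros, the stragglers buffer from Step 6).
    tail = samples[-zero_streak:]
    return len(tail) == zero_streak and all(v == 0 for v in tail)

def blue_active_calls(namespace="VoiceAgent"):
    import boto3  # assumption: boto3 available with valid credentials
    cw = boto3.client("cloudwatch")
    now = datetime.datetime.now(datetime.timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace=namespace,
        MetricName="active_calls",
        Dimensions=[{"Name": "color", "Value": "blue"}],
        StartTime=now - datetime.timedelta(minutes=15),
        EndTime=now,
        Period=60,
        Statistics=["Maximum"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Maximum"] for p in points]
```

Requiring a streak of zeros, rather than a single zero reading, is what turns "blue looks idle" into "blue is actually drained".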

## Step 6 — Decommission blue

```bash
kubectl delete deploy voice-agent-blue
```

Run the delete only after blue's `active_calls` has read 0 for at least 5 minutes (a buffer for stragglers). The target group's `deregistration_delay` (10 min) covers any late connection drains.

## Step 7 — Auto-rollback if green smoke fails

```bash
# Quick smoke; on failure, restore blue=100/green=0 and bail.
# The braces matter: a bare `a || b && exit 1` would exit even on success.
python smoke/realtime_ping.py --url wss://agent.example.com/realtime --color green || {
  aws elbv2 modify-rule ...   # same call as Step 5, weights reversed
  exit 1
}
```
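A sketch of what `realtime_ping.py` might check; the ping/pong message shape and the `color` field are assumptions about the agent's protocol, and the async half needs the third-party `websockets` package:

```python
import asyncio
import json

def pong_ok(raw, expected_color="green"):
    # Pass only if the pod that answered reports the freshly promoted color.
    try:
        msg = json.loads(raw)
    except (ValueError, TypeError):
        return False
    return (isinstance(msg, dict)
            and msg.get("type") == "pong"
            and msg.get("color") == expected_color)

async def smoke(url):
    # Assumption: the agent answers {"type": "ping"} with
    # {"type": "pong", "color": "..."} on its /realtime socket.
    import websockets
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"type": "ping"}))
        return pong_ok(await asyncio.wait_for(ws.recv(), timeout=5))
```

Checking the color in the pong, not just liveness, is the point: a pong from a lingering blue pod must fail the smoke.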

## Pitfalls

- **Stickiness cookie domain mismatch** — set cookie domain explicitly on the listener; otherwise multi-subdomain setups lose stickiness across reconnects.
- **deregistration_delay too short** — the 300 s default (let alone an aggressive 60 s) will kill long in-flight calls. Set it ≥ your p99 call length.
- **TLS termination at ALB** with mTLS to upstream needs the right `ssl_policy`. Default ELBSecurityPolicy-2016-08 lacks TLS 1.3.
- **Connection multiplexing** — HTTP/2 means many calls share a connection. Stickiness on connection != stickiness on call. Test with a real client.
- **Forgetting Redis backups** — call state in Redis is great for blue/green; lose Redis and lose every active call. Run with replicas + AOF.

## How CallSphere does this in production

CallSphere does blue/green for major voice-agent versions and canary (Argo Rollouts) for prompt changes. Stickiness is 300s; call state lives in Redis Sentinel; deregistration delay 10 min. Across our k3s + Cloudflare Tunnel stack we cut over ~12 times a month with zero call drops on the blue/green path. 37 agents, 90+ tools, 115+ DB tables, $149/$499/$1499, 14-day [trial](/trial), 22% [affiliate](/affiliate).

## FAQ

**Q: Blue/green vs canary for voice?**
Blue/green for *infrastructure* swaps (LB config, runtime version). Canary for *agent* changes (prompts, models). Use both.

**Q: Why not WebRTC instead of WebSocket signaling?**
WebRTC media is point-to-point and doesn't go through the ALB. Only the signaling WebSocket needs sticky handling.

**Q: Redis instead of in-memory state — what's the latency cost?**
~1-2 ms per turn. Negligible vs voice round-trip.

**Q: Can I do this with k8s Services only, no ALB?**
Yes — use `sessionAffinity: ClientIP` and two Services with weighted Ingress. Less elegant than ALB target weights, but works.

## Sources

- [Fine-tuning blue/green deployments on Application Load Balancer — AWS](https://aws.amazon.com/blogs/devops/blue-green-deployments-with-application-load-balancer/)
- [Top 10 Blue/green Deployment Best Practices — Octopus Deploy](https://octopus.com/devops/software-deployments/blue-green-deployment-best-practices/)
- [WebApp Blue/Green Deployment Without Breaking Sessions — DZone](https://dzone.com/articles/webapp-bluegreen-deployment)
- [OpenAI Realtime API Production Voice Agents 2026 — ForaSoft](https://www.forasoft.com/blog/article/openai-realtime-api-voice-agent-production-guide-2026)
- [Switching AI voice agent from WebSocket to WebRTC — DEV](https://dev.to/aws-builders/switching-my-ai-voice-agent-from-websocket-to-webrtc-what-broke-and-what-i-learned-3dkn)

---

Source: https://callsphere.ai/blog/vw6h-blue-green-voice-agent-websocket-sticky-sessions-2026
