Blue/Green Voice Agent Deploys with WebSocket Sticky Sessions (2026)
Blue/green deploy an AI voice agent without dropping calls. ALB stickiness, draining timeouts tuned for WebSockets, Redis-backed session state, and a clean cutover.
TL;DR — Blue/green for voice means: stand up green, route only new calls to it, let blue drain with sticky sessions intact, and keep call state in Redis so reconnects work. Stickiness duration on the LB must match your call SLO, not the load balancer's much longer default.
What you'll set up
Two parallel Deployments (voice-agent-blue and voice-agent-green) behind an Application Load Balancer with target-group stickiness, session state in Redis so a reconnecting call can resume on either color, and a cutover script that gates on "all blue calls drained".
Architecture
```mermaid
flowchart TD
    CLIENT[Caller WS] --> ALB[ALB stickiness=5m]
    ALB --> BLUE[voice-agent-blue]
    ALB --> GREEN[voice-agent-green]
    BLUE --> REDIS[(Redis call state)]
    GREEN --> REDIS
    CUT[Cutover] -->|weight 0/100| ALB
    DRAIN[Drain blue] --> BLUE
```
Step 1 — Two target groups, one ALB
```hcl
resource "aws_lb_target_group" "blue" {
  name             = "voice-blue"
  port             = 8080
  protocol         = "HTTP"
  protocol_version = "HTTP1" # WebSocket

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 300 # 5 minutes, matches our max call length
    enabled         = true
  }

  deregistration_delay = 600 # let active calls finish (10 min)

  health_check {
    path     = "/healthz/realtime"
    interval = 10
    timeout  = 3
  }
}

resource "aws_lb_target_group" "green" {
  # ... identical with name = "voice-green"
}
```
cookie_duration = 300 (5 min) is the critical knob. The default is 1 day (86400 seconds), which means a customer who reconnects many hours later still hits the old color and keeps blue alive long after cutover.
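One way to pick these two numbers is to derive them from observed call durations instead of guessing. The sketch below is only an illustration: the helper names `percentile` and `suggest_timeouts`, the nearest-rank percentile, and the 20% headroom are assumptions, not values from AWS or from this deployment.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def suggest_timeouts(call_seconds, headroom_pct=20):
    """Suggest cookie_duration and deregistration_delay from call lengths.

    Hypothetical helper: stickiness covers the p99 call and the
    deregistration delay covers the longest observed call, each padded
    by headroom_pct percent.
    """
    factor = 100 + headroom_pct
    cookie = math.ceil(percentile(call_seconds, 99) * factor / 100)
    # ALB deregistration delay cannot exceed 3600 seconds.
    dereg = min(3600, math.ceil(max(call_seconds) * factor / 100))
    return {"cookie_duration": cookie, "deregistration_delay": dereg}
```

Feeding it a sample of recent call lengths in seconds gives a starting point to round up to whatever your call SLO requires.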
Step 2 — Listener with weighted routing
```hcl
resource "aws_lb_listener_rule" "voice" {
  listener_arn = aws_lb_listener.https.arn

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 100
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 0
      }

      stickiness {
        duration = 300
        enabled  = true
      }
    }
  }

  condition {
    path_pattern {
      values = ["/realtime/*"]
    }
  }
}
```
Step 3 — Move call state to Redis
The voice agent must not keep call context in process memory. Otherwise a green pod can't resume a blue pod's call. Move to Redis:
```python
import json

import redis.asyncio as redis

r = redis.from_url("redis://voice-redis:6379")

async def get_call(call_id):
    # Return the stored state, or an empty dict if the call is unknown.
    return json.loads(await r.get(f"call:{call_id}") or "{}")

async def set_call(call_id, state):
    # Persist state with a 30-minute TTL so abandoned calls expire.
    await r.setex(f"call:{call_id}", 1800, json.dumps(state))
```
Now any pod (blue or green) can pick up state. The cookie still keeps a call on its original color, but a reconnect that lands on the other color is safe.
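To make the reconnect path concrete, here is a minimal sketch of a handler that resumes a call from shared state. It is an assumption-laden illustration: `resume_or_start`, the `turn`, `history`, `resumed`, and `color` fields, and the `DictStore` class are all hypothetical. `DictStore` only mimics the `get` and `setex` calls of redis.asyncio so the example runs without a Redis server.

```python
import asyncio
import json

class DictStore:
    """In-memory stand-in exposing the two redis.asyncio calls used here."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        return self._data.get(key)

    async def setex(self, key, ttl_seconds, value):
        self._data[key] = value  # the TTL is ignored in this stand-in

async def resume_or_start(store, call_id, color):
    """Load existing call state, or create it, and record the serving color."""
    raw = await store.get(f"call:{call_id}")
    state = json.loads(raw) if raw else {"turn": 0, "history": []}
    state["resumed"] = raw is not None
    state["color"] = color
    await store.setex(f"call:{call_id}", 1800, json.dumps(state))
    return state

async def demo():
    store = DictStore()
    first = await resume_or_start(store, "abc", "blue")    # new call on blue
    second = await resume_or_start(store, "abc", "green")  # reconnect on green
    return first, second
```

In a real service the `store` argument would be the Redis client from Step 3; passing it in rather than using a module-level global also makes the handler easy to test.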
Step 4 — Deploy green alongside blue
```bash
kubectl apply -f voice-agent-green.yaml
kubectl rollout status deploy/voice-agent-green --timeout=120s
```
Green is live and registered in its target group, but weight = 0 means no traffic.
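Before shifting any weight, it may be worth confirming that every green target actually reports healthy. The sketch below inspects a dictionary in the shape that `aws elbv2 describe-target-health` prints as JSON. The function name `all_targets_healthy`, the `minimum` parameter, and the sample data in the usage note are assumptions for illustration.

```python
def all_targets_healthy(response, minimum=1):
    """Return True if at least `minimum` targets exist and all are healthy.

    `response` is assumed to be the parsed JSON output of
    `aws elbv2 describe-target-health` for the green target group.
    """
    descriptions = response.get("TargetHealthDescriptions", [])
    states = [d["TargetHealth"]["State"] for d in descriptions]
    return len(states) >= minimum and all(s == "healthy" for s in states)
```

For example, a response with one target in state `healthy` and one in state `initial` returns False, so a cutover script using this check would keep waiting.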
Step 5 — Cutover with a smoke test
```bash
# Promote green to 100, blue to 0.
# The JSON is double-quoted so that ${BLUE_TG} and ${GREEN_TG} expand.
aws elbv2 modify-rule --rule-arn "$RULE" \
  --actions "[{\"Type\":\"forward\",\"ForwardConfig\":{\"TargetGroups\":[
    {\"TargetGroupArn\":\"${BLUE_TG}\",\"Weight\":0},
    {\"TargetGroupArn\":\"${GREEN_TG}\",\"Weight\":100}],
    \"TargetGroupStickinessConfig\":{\"Enabled\":true,\"DurationSeconds\":300}}}]"
# New calls go green; existing sticky cookies still finish on blue
```
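Hand-quoting that JSON in shell is easy to get wrong, so one option is to generate it. The sketch below builds the value for `--actions` with `json.dumps`. The function name `forward_actions` and the convention that the two weights always sum to 100 are assumptions of this example, not ALB requirements.

```python
import json

def forward_actions(blue_tg, green_tg, green_weight, stickiness_seconds=300):
    """Build the JSON string for `aws elbv2 modify-rule --actions`.

    Hypothetical helper: blue receives whatever share green does not.
    """
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    action = {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_tg, "Weight": green_weight},
            ],
            "TargetGroupStickinessConfig": {
                "Enabled": True,
                "DurationSeconds": stickiness_seconds,
            },
        },
    }
    return json.dumps([action])
```

Calling it with `green_weight=100` yields the cutover payload, and `green_weight=0` yields the rollback payload, so both directions share one code path.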
Then poll until blue drains:
```bash
while [ "$(aws cloudwatch get-metric-data ... blue active_calls)" != "0" ]; do
  sleep 30
done
```
Step 6 — Decommission blue
```bash
kubectl delete deploy voice-agent-blue
```
Delete blue only after its active_calls has stayed at 0 for at least 5 minutes, which gives stragglers a buffer. The ALB target deregistration_delay (10 min) covers connections that are still draining.
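The drain poll from Step 5 and the 5-minute buffer can be combined into one gate. The following is a minimal sketch under stated assumptions: `wait_for_drain` is a hypothetical name, `get_active_calls` is any callable you supply that returns the current blue call count (for instance, one that reads your CloudWatch metric), and the injectable `clock` and `sleep` exist only to make the loop testable.

```python
import time

def wait_for_drain(get_active_calls, quiet_seconds=300, poll_seconds=30,
                   timeout_seconds=3600, clock=time.monotonic, sleep=time.sleep):
    """Return True once active calls have stayed at 0 for quiet_seconds.

    Returns False if that does not happen before timeout_seconds elapse.
    """
    deadline = clock() + timeout_seconds
    zero_since = None  # time of the first zero reading in the current streak
    while clock() < deadline:
        now = clock()
        if get_active_calls() == 0:
            if zero_since is None:
                zero_since = now
            if now - zero_since >= quiet_seconds:
                return True
        else:
            zero_since = None  # a new call appeared; restart the quiet period
        sleep(poll_seconds)
    return False
```

A decommission script would run `kubectl delete deploy voice-agent-blue` only when this returns True, and alert a human when it returns False.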
Step 7 — Auto-rollback if green smoke fails
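```bash
# Quick smoke test against green. On failure, restore blue=100/green=0 and exit non-zero.
# (An `a || b && exit 1` chain would also exit 1 when the smoke test passes.)
if ! python smoke/realtime_ping.py --url wss://agent.example.com/realtime --color green; then
  aws elbv2 modify-rule --rule-arn "$RULE" \
    --actions "[{\"Type\":\"forward\",\"ForwardConfig\":{\"TargetGroups\":[
      {\"TargetGroupArn\":\"${BLUE_TG}\",\"Weight\":100},
      {\"TargetGroupArn\":\"${GREEN_TG}\",\"Weight\":0}],
      \"TargetGroupStickinessConfig\":{\"Enabled\":true,\"DurationSeconds\":300}}}]"
  exit 1
fi
```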
Pitfalls
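- Stickiness cookie domain mismatch — the ALB-generated cookie is scoped to the hostname that set it, and its domain is not configurable, so make clients reconnect to the same hostname; otherwise multi-subdomain setups lose stickiness across reconnects.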
- deregistration_delay too short — the 300s default, or anything lower, will cut off calls that outlast it. Set it ≥ your p99 call length.
- TLS termination at ALB with mTLS to upstream needs the right ssl_policy. Default ELBSecurityPolicy-2016-08 lacks TLS 1.3.
- Connection multiplexing — HTTP/2 means many calls share a connection. Stickiness on connection != stickiness on call. Test with a real client.
- Forgetting Redis backups — call state in Redis is great for blue/green; lose Redis and lose every active call. Run with replicas + AOF.
How CallSphere does this in production
CallSphere does blue/green for major voice-agent versions and canary (Argo Rollouts) for prompt changes. Stickiness is 300s; call state lives in Redis Sentinel; deregistration delay is 10 min. Across our k3s + Cloudflare Tunnel stack we cut over ~12 times a month with zero call drops on the blue/green path.
FAQ
Q: Blue/green vs canary for voice?
Blue/green for infrastructure swaps (LB config, runtime version). Canary for agent changes (prompts, models). Use both.
Q: Why not WebRTC instead of WebSocket signaling?
WebRTC media is point-to-point and doesn't go through the ALB. Only the signaling WebSocket needs sticky handling.
Q: Redis instead of in-memory state — what's the latency cost?
~1-2 ms per turn. Negligible vs voice round-trip.
Q: Can I do this with k8s Services only, no ALB?
Yes — use sessionAffinity: ClientIP and two Services with weighted Ingress. Less elegant than ALB target weights, but works.