---
title: "Canary AI Agent Versions with Argo Rollouts + Metric AI Plugin (2026)"
description: "Run a 5% → 25% → 50% → 100% canary on a voice agent with Argo Rollouts, AnalysisTemplate against eval pass-rate, and the new Metric AI plugin from ArgoCon 2026."
canonical: https://callsphere.ai/blog/vw6h-argo-rollouts-canary-ai-agent-metric-ai-plugin-2026
category: "AI Engineering"
tags: ["Argo Rollouts", "Canary", "Voice AI", "Progressive Delivery", "Tutorial"]
author: "CallSphere Team"
published: 2026-04-07T00:00:00.000Z
updated: 2026-05-07T16:46:16.236Z
---

# Canary AI Agent Versions with Argo Rollouts + Metric AI Plugin (2026)

> Run a 5% → 25% → 50% → 100% canary on a voice agent with Argo Rollouts, AnalysisTemplate against eval pass-rate, and the new Metric AI plugin from ArgoCon 2026.

> **TL;DR** — Argo Rollouts replaces a Deployment with a Rollout CRD. AnalysisTemplates measure live metrics (eval pass-rate, p95 first-token latency, error budget burn) at each step. The Metric AI plugin from ArgoCon 2026 lets the rollout controller reason about *why* a metric moved, not just whether it crossed a threshold.

## What you'll set up

An Argo Rollouts Rollout for the voice agent: 5% → 25% → 50% → 100% with AnalysisTemplates that hit our internal eval harness and Prometheus, plus the Metric AI plugin to root-cause regressions and auto-rollback.

## Architecture

```mermaid
flowchart LR
  GIT[deploy repo] --> ROLL[Rollout CRD]
  ROLL -->|5%| C1[canary v2]
  C1 --> ANA[AnalysisTemplate]
  ANA -->|prom + evals| METRIC[Metric AI plugin]
  METRIC --> DECIDE{Pass?}
  DECIDE -->|yes| C2[25% canary]
  DECIDE -->|no| RB[Auto rollback]
  C2 --> C3[50%]
  C3 --> FULL[100%]
```

## Step 1 — Install Argo Rollouts + the Metric AI plugin

```bash
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
kubectl apply -f https://github.com/argoproj-labs/rollouts-metricai-plugin/releases/latest/download/install.yaml
```

The Metric AI plugin runs as a sidecar to the rollouts controller; it can run `kubectl logs`, query Prometheus, and ask Claude to assess "is this canary healthy".

## Step 2 — Replace Deployment with Rollout

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata: { name: voice-agent }
spec:
  replicas: 10
  selector:
    matchLabels: { app: voice-agent }
  template:
    metadata: { labels: { app: voice-agent }}
    spec:
      containers:
        - name: agent
          image: ghcr.io/acme/voice-agent:v1
  strategy:
    canary:
      canaryService: voice-agent-canary
      stableService: voice-agent-stable
      trafficRouting:
        istio:
          virtualService: { name: voice-vs, routes: [primary] }
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - analysis:
            templates: [{ templateName: voice-canary }]
        - setWeight: 25
        - pause: { duration: 10m }
        - analysis:
            templates: [{ templateName: voice-canary }]
        - setWeight: 50
        - pause: { duration: 15m }
        - setWeight: 100
```

## Step 3 — AnalysisTemplate: eval pass-rate + p95 latency

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata: { name: voice-canary }
spec:
  metrics:
    - name: eval-pass-rate
      successCondition: result >= 0.92
      failureLimit: 0
      provider:
        web:
          url: "https://evals.internal/api/run?suite=voice&version={{args.canary-hash}}"
          jsonPath: "{$.passRate}"
    - name: p95-first-token
      successCondition: result[0] …
```

- … > query window.
- **Canary at `setWeight: 1`** can have zero pods on a 10-replica fleet (1% of 10 rounds to 0). Set `canary.dynamicStableScale: true` or use `maxSurge: 1`.
- **Istio VirtualService routes named wrong** in `trafficRouting` → silent no-op. Verify with `istioctl proxy-config routes`.
- **AI plugin cost runaway**: the AI judge calls Claude on every analysis tick. Cap it with `maxTokensPerAnalysis: 2000` in the plugin config.
- **`failureLimit: 0`** means a single failed measurement aborts the rollout. For noisy metrics, raise `failureLimit` to tolerate occasional failed measurements, and set `consecutiveErrorLimit: 3` to bound consecutive measurement errors such as provider timeouts.
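
The tolerance knobs from the last bullet can be sketched as a metric fragment. The `interval`, `count`, and limit values below are illustrative assumptions, not the values used in production. The metric name, URL, and threshold are copied from Step 3.

```yaml
# Fragment of spec.metrics in an AnalysisTemplate (sketch; values are illustrative).
- name: eval-pass-rate
  interval: 1m               # assumption: take one measurement per minute
  count: 5                   # assumption: five measurements per analysis run
  failureLimit: 1            # one failed measurement is tolerated; a second fails the metric
  consecutiveErrorLimit: 3   # up to three back-to-back measurement errors are tolerated
  successCondition: result >= 0.92   # scalar jsonPath result from the web provider
  provider:
    web:
      url: "https://evals.internal/api/run?suite=voice&version={{args.canary-hash}}"
      jsonPath: "{$.passRate}"
```

A failed measurement is one where the success condition evaluates to false. An errored measurement is one where the provider call itself did not complete. The two limits are counted separately.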

## How CallSphere does this in production

CallSphere canary-rolls every voice-agent change behind Argo Rollouts: 5% → 25% → 50% → 100%, gated on eval pass-rate ≥ 0.92 and first-token p95 ≤ 800 ms. Two weeks ago the Metric AI plugin caught a regression in our healthcare agent: a prompt edit raised the tool-call rate by 30% without changing latency, so pure threshold gates would have shipped it. CallSphere runs 37 agents, 90+ tools, and 115+ DB tables, priced at $149/$499/$1499, with a 14-day [trial](/trial), a 22% [affiliate](/affiliate) program, and a [demo](/demo).

## FAQ

**Q: Argo Rollouts vs Flagger?**
Argo Rollouts if you're already on Argo CD. Flagger if you want zero manifest changes (it works with standard Deployments). Both are at feature parity for canary.

**Q: How do I avoid alert fatigue on noisy evals?**
Run the eval suite on a fixed seed, embed it as a frozen test file in the agent repo, version it, and gate canary on regression vs stable — not absolute pass-rate.

**Q: What about WebRTC sticky sessions during canary?**
Use `sessionAffinity: ClientIP` on the stable Service; canary picks up new sessions only. In-flight calls finish on stable.
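
A minimal sketch of that Service, assuming the `voice-agent-stable` name and `app: voice-agent` label from Step 2. The port numbers are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Service
metadata: { name: voice-agent-stable }
spec:
  selector: { app: voice-agent }
  sessionAffinity: ClientIP          # pin each client IP to one backend pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800          # Kubernetes default (3 hours); tune to call length
  ports:
    - name: signaling                # hypothetical port name and numbers
      port: 80
      targetPort: 8080
```

`sessionAffinity` is applied by kube-proxy. When traffic passes through a mesh sidecar, as with the Istio routing in Step 2, affinity may need to be configured in the mesh instead, so verify the behavior in your own cluster.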

**Q: Cost of the AI judge?**
~$0.01 per analysis tick with Claude Haiku. Cheap insurance.

## Sources

- [Canary Argo Rollouts docs](https://argo-rollouts.readthedocs.io/en/stable/features/canary/)
- [ArgoCon Europe 2026: Argo Rollouts AI integration — Carlos Sanchez & Kevin Dubois](https://tldrecap.tech/posts/2026/argocon-europe/argo-rollouts-ai-integration/)
- [Progressive Delivery: Canary Deployments with Argo Rollouts and Flagger — Calmops](https://calmops.com/architecture/progressive-delivery-canary-argo-rollouts-flagger/)
- [Canary deployment strategy with Argo Rollouts — Red Hat Developer](https://developers.redhat.com/articles/2024/05/01/canary-deployment-strategy-argo-rollouts)
- [A/B Testing and Canary Deployments for Models — APXML](https://apxml.com/courses/advanced-ai-infrastructure-design-optimization/chapter-4-high-performance-model-inference/ab-testing-canary-deployments-models)
