
ArgoCD GitOps for AI Agent Rollouts: Sync Waves + Progressive Sync (2026)

Order schema migrations, model warm-up, and traffic cutover for an AI voice agent using ArgoCD sync waves and ApplicationSet progressive syncs. Real YAML and gotchas.

TL;DR — Sync waves order resources within an Application; progressive syncs order Applications within an ApplicationSet. AI agents need both: DB migration first, model preload second, traffic cutover last.

What you'll set up

An ArgoCD ApplicationSet that fans out the same voice-agent chart to three k3s clusters (dev → staging → prod) with progressive sync, plus an Application that uses hooks and sync waves so Postgres migrations run before the agent Deployment is applied, and the Deployment is Ready before a PostSync warm-up Job hits the model.

Architecture

```mermaid
flowchart TD
  GIT[deploy repo] --> APPSET[ApplicationSet progressive]
  APPSET --> DEV[App dev]
  APPSET --> STG[App staging]
  APPSET --> PRD[App prod]
  PRD --> W0[PreSync hook: PG migrations]
  W0 --> W1[Wave 1: Secrets + ConfigMap]
  W1 --> W2[Wave 2: Deployment]
  W2 --> W3[Wave 3: Service + Ingress]
  W3 --> W4[PostSync hook: warm-up Job]
```

Step 1 — Install ArgoCD with progressive syncs enabled

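```bash
helm repo add argo https://argoproj.github.io/argo-helm
# configs.params feeds the argocd-cmd-params-cm ConfigMap; the dots in the key must be escaped
helm install argocd argo/argo-cd -n argocd --create-namespace \
  --set 'configs.params.applicationsetcontroller\.enable\.progressive\.syncs=true'
```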

The setting ends up in the argocd-cmd-params-cm ConfigMap and is off by default; without it, the strategy: block on an ApplicationSet is silently ignored.
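To confirm the setting actually landed, a small Python helper can read argocd-cmd-params-cm through kubectl. This is a sketch that assumes kubectl is on the PATH with a kubeconfig pointing at the ArgoCD cluster. The applicationset controller picks the value up as an environment variable at startup, so restart argocd-applicationset-controller after changing it.

```python
"""Check whether ApplicationSet progressive syncs are enabled (sketch)."""
from __future__ import annotations

import json
import subprocess

KEY = "applicationsetcontroller.enable.progressive.syncs"


def progressive_syncs_enabled(configmap: dict) -> bool:
    """True if argocd-cmd-params-cm sets the progressive-syncs key to "true"."""
    value = (configmap.get("data") or {}).get(KEY, "false")
    return str(value).strip().lower() == "true"


def read_params_configmap(namespace: str = "argocd") -> dict:
    """Fetch argocd-cmd-params-cm as JSON via kubectl (assumes a working kubeconfig)."""
    result = subprocess.run(
        ["kubectl", "get", "configmap", "argocd-cmd-params-cm", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True, timeout=30,
    )
    return json.loads(result.stdout)


def main() -> None:
    try:
        cm = read_params_configmap()
    except (OSError, subprocess.SubprocessError) as err:
        # kubectl missing, cluster unreachable, or the ConfigMap does not exist
        print(f"could not read argocd-cmd-params-cm: {err}")
        return
    print("progressive syncs enabled:", progressive_syncs_enabled(cm))


if __name__ == "__main__":
    main()
```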

Step 2 — Author the Application with sync waves

```yaml
# Templates in the voice-agent Helm chart; image.tag is an assumed values key
apiVersion: batch/v1
kind: Job
metadata:
  name: pg-migrate
  annotations:
    # PreSync finishes before any Sync-phase wave starts, so no sync-wave is needed
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "ghcr.io/acme/voice-agent:{{ .Values.image.tag }}"
          command: ["alembic", "upgrade", "head"]
---
apiVersion: v1
kind: Secret
metadata:
  name: voice-agent-env
  annotations: { argocd.argoproj.io/sync-wave: "1" }
# data omitted for brevity
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voice-agent
  annotations: { argocd.argoproj.io/sync-wave: "2" }
# spec omitted for brevity
---
apiVersion: v1
kind: Service
metadata:
  name: voice-agent
  annotations: { argocd.argoproj.io/sync-wave: "3" }
# spec omitted for brevity
---
apiVersion: batch/v1
kind: Job
metadata:
  name: warmup
  annotations:
    # PostSync runs only after every Sync-phase wave is Healthy
    argocd.argoproj.io/hook: PostSync
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: warm
          image: "ghcr.io/acme/voice-agent:{{ .Values.image.tag }}"
          command: ["python", "scripts/warmup.py"]
```


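ArgoCD runs the PreSync phase first, then applies the Sync-phase resources one wave at a time, waiting until every resource in wave N is Healthy before starting wave N+1; PostSync hooks run only after the whole Sync phase is Healthy. So the Deployment is never applied until the migration succeeds, and Service traffic only lights up after the Deployment is Ready.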
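To make that ordering concrete, here is a small Python model that groups manifests the same way: phase first, then wave, then kind and name for ties. It is a sketch, not ArgoCD's implementation. The Resource class is a hypothetical stand-in for parsed manifests, the kind table is a simplified subset covering only the kinds in this article, and only single-valued hook annotations are handled. Each group it prints marks a point where ArgoCD waits for health before moving on.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Phase order is fixed: PreSync, then Sync, then PostSync.
PHASE_ORDER = {"PreSync": 0, "Sync": 1, "PostSync": 2}

# Simplified subset of the kind order used for resources that share a wave
# (assumption: only the kinds in this article; unknown kinds sort last).
KIND_ORDER = {"Namespace": 0, "Secret": 1, "ConfigMap": 2, "Service": 3,
              "Deployment": 4, "Job": 5, "Ingress": 6}

HOOK = "argocd.argoproj.io/hook"
WAVE = "argocd.argoproj.io/sync-wave"


@dataclass
class Resource:
    kind: str
    name: str
    annotations: dict[str, str] = field(default_factory=dict)

    @property
    def phase(self) -> str:
        # Resources without a hook annotation belong to the Sync phase.
        return self.annotations.get(HOOK, "Sync")

    @property
    def wave(self) -> int:
        # Resources and hooks without a sync-wave annotation default to wave 0.
        return int(self.annotations.get(WAVE, "0"))


def sync_plan(resources: list[Resource]) -> list[tuple[tuple[str, int], list[str]]]:
    """Group resources into the (phase, wave) steps that ArgoCD waits between."""
    ordered = sorted(
        resources,
        key=lambda r: (PHASE_ORDER[r.phase], r.wave, KIND_ORDER.get(r.kind, 99), r.name),
    )
    plan: list[tuple[tuple[str, int], list[str]]] = []
    for r in ordered:
        step = (r.phase, r.wave)
        if not plan or plan[-1][0] != step:
            plan.append((step, []))  # a new step: earlier steps must be Healthy first
        plan[-1][1].append(f"{r.kind}/{r.name}")
    return plan


if __name__ == "__main__":
    manifests = [
        Resource("Job", "pg-migrate", {HOOK: "PreSync"}),
        Resource("Service", "voice-agent", {WAVE: "3"}),
        Resource("Deployment", "voice-agent", {WAVE: "2"}),
        Resource("Secret", "voice-agent-env", {WAVE: "1"}),
        Resource("Job", "warmup", {HOOK: "PostSync"}),
    ]
    for (phase, wave), names in sync_plan(manifests):
        print(f"{phase:8} wave {wave}: {', '.join(names)}")
```

Sorting on a tuple keeps the rule readable: anything that shares both phase and wave lands in one step and is applied together, which is why dependent resources need distinct waves.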

Step 3 — ApplicationSet with progressive sync across clusters

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: voice-agent
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: dev
          - cluster: staging
          - cluster: prod
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - { key: cluster, operator: In, values: [dev] }
        - matchExpressions:
            - { key: cluster, operator: In, values: [staging] }
          maxUpdate: "100%"
        - matchExpressions:
            - { key: cluster, operator: In, values: [prod] }
          maxUpdate: "20%"
  template:
    metadata:
      name: 'voice-{{cluster}}'
      labels:
        cluster: '{{cluster}}'   # RollingSync steps select Applications by these labels
    spec:
      project: default
      source:
        repoURL: https://gitlab.com/acme/deploy.git
        targetRevision: main
        path: voice
        helm:
          valueFiles: ['values.yaml', 'values-{{cluster}}.yaml']
      destination:
        name: '{{cluster}}'      # assumes clusters registered as dev, staging, prod (argocd cluster add --name)
        namespace: voice
      syncPolicy:
        automated: { prune: true, selfHeal: true }
```

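maxUpdate caps how many of the Applications matched by a step sync at once; percentages round down but never below one. With a single prod Application that is simply one. If the prod step matched five Applications (one per prod cluster), maxUpdate: 20% would roll one cluster at a time and wait for each to go Healthy, so bad model versions get caught on cluster #1, not #5.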
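The batching arithmetic can be sketched in Python. The rounding follows the behavior described in the ArgoCD Progressive Syncs docs (percentages round down with a floor of one Application above 0%, and 0 means manual syncs only); the unset-means-100% default is an assumption worth re-checking against your version. The model treats maxUpdate as fixed batches, while the controller actually caps how many Applications are progressing at once; for a cap of one the two are the same. The voice-prod-N names are hypothetical.

```python
from __future__ import annotations

import math


def max_update_count(max_update: int | str | None, matched: int) -> int:
    """How many Applications in one RollingSync step may sync at the same time."""
    if max_update is None:
        return matched  # assumption: an unset maxUpdate behaves like "100%"
    if isinstance(max_update, str) and max_update.endswith("%"):
        pct = float(max_update[:-1])
        if pct <= 0:
            return 0
        # Percentages round down, but any value above 0% allows at least one app.
        return min(matched, max(1, math.floor(matched * pct / 100)))
    # Plain integers (or integer strings) are an absolute cap.
    return min(matched, max(0, int(max_update)))


def batches(apps: list[str], max_update: int | str | None) -> list[list[str]]:
    """Split a step's Applications into groups that sync one after another."""
    size = max_update_count(max_update, len(apps))
    if size == 0:
        return []  # maxUpdate 0: nothing syncs until someone syncs it manually
    return [apps[i:i + size] for i in range(0, len(apps), size)]


if __name__ == "__main__":
    prod = [f"voice-prod-{i}" for i in range(1, 6)]  # hypothetical five-cluster prod fleet
    for n, batch in enumerate(batches(prod, "20%"), start=1):
        print(n, batch)
```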

Step 4 — Health checks that actually probe the agent

ArgoCD's built-in Deployment health check only looks at the rollout: it reports Healthy once the updated replicas are available. For an AI agent that's not enough; the model could return 500 on every request. Add a Lua health check:

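```yaml
# argocd-cm ConfigMap (argocd namespace)
data:
  resource.customizations.health.apps_Deployment: |
    -- Replaces the built-in check for EVERY Deployment: keep a rollout test for all, gate only the agent
    hs = {}
    local st = obj.status or {}
    local labels = obj.metadata.labels or {}
    local ann = obj.metadata.annotations or {}
    local want = obj.spec.replicas or 1
    local rolled = (st.observedGeneration or 0) >= (obj.metadata.generation or 0)
      and (st.updatedReplicas or 0) >= want
      and (st.availableReplicas or 0) >= want
    if not rolled then
      hs.status = "Progressing"; hs.message = "Waiting for rollout"
    elseif labels["app.kubernetes.io/name"] ~= "voice-agent" then  -- assumed agent label
      hs.status = "Healthy"; hs.message = "Rollout complete"
    elseif ann["voice.agent/health"] == "ok" then
      hs.status = "Healthy"; hs.message = "Voice probe ok"
    else
      hs.status = "Progressing"; hs.message = "Awaiting voice probe"
    end
    return hs
```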

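Have a separate CronJob that pings /realtime and PATCHes the voice.agent/health annotation on the Deployment. The patch only touches metadata, so it does not trigger a new rollout. ArgoCD now refuses to roll waves forward until a real WebRTC handshake works. Have the probe overwrite the value whenever it fails; a stale ok left over from the previous release would otherwise pass the new one.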
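A minimal sketch of that probe in Python, meant to be the CronJob container's command. It assumes the kubernetes Python client is installed in the image, the Job's ServiceAccount may patch Deployments in the voice namespace, and the agent answers at a hypothetical in-cluster URL. A plain HTTP GET stands in for the full WebRTC handshake. Writing "failing" on every unsuccessful probe narrows, but does not remove, the window in which a stale ok can pass a new release.

```python
"""Hypothetical voice-agent probe: GET /realtime, then record the result on the Deployment."""
from __future__ import annotations

import os
import urllib.request

ANNOTATION = "voice.agent/health"                    # the annotation the Lua check reads
PROBE_URL = "http://voice-agent.voice.svc/realtime"  # hypothetical in-cluster URL


def probe(url: str, timeout: float = 5.0) -> bool:
    """True if the endpoint answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses; ValueError covers bad URLs.
        return False


def health_patch(ok: bool) -> dict:
    """Patch body for the Deployment's metadata; the pod template is left untouched."""
    return {"metadata": {"annotations": {ANNOTATION: "ok" if ok else "failing"}}}


def main() -> None:
    # Imported here so the helpers above can be tested without the kubernetes package.
    from kubernetes import client, config

    config.load_incluster_config()
    client.AppsV1Api().patch_namespaced_deployment(
        name="voice-agent", namespace="voice", body=health_patch(probe(PROBE_URL))
    )


# KUBERNETES_SERVICE_HOST is injected into every pod, so this only runs in-cluster.
if __name__ == "__main__" and os.environ.get("KUBERNETES_SERVICE_HOST"):
    main()
```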

Step 5 — Notifications on rollout boundaries

```yaml
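# argocd-notifications-cm
# Also needs a service.slack entry and a notifications.argoproj.io/subscribe.on-degraded.slack
# annotation on each Application to actually deliver messages.
data:
  trigger.on-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [slack-degraded]
  template.slack-degraded: |
    message: |
      :rotating_light: {{.app.metadata.name}} degraded on {{.app.spec.destination.name}}
      Sync: {{.app.status.sync.revision}}
```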

Step 6 — Rollback drill

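```bash
argocd app history voice-prod                 # list deployment IDs and their Git revisions
argocd app rollback voice-prod <HISTORY_ID>   # omit the ID to roll back to the previous deployment
```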

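ArgoCD refuses to roll back an Application that has automated sync enabled, so switch auto-sync off first. Or, in pure GitOps style, git revert the commit in the deploy repo; ArgoCD picks it up on its next poll (every 3 minutes by default) or immediately if a Git webhook is configured.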

Pitfalls

  • A PreSync hook plus sync-wave "0" is redundant: the PreSync phase already finishes before any Sync-phase wave starts, so pick one. Mixing them sometimes causes the migration to run twice.
  • automated: selfHeal: true in prod can re-create resources you intentionally deleted. Disable on prod ApplicationSets, leave on dev.
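  • Wave numbers do not need to be unique per resource, but a shared wave gives no ordering guarantee you can rely on. Resources in the same wave are applied together, sorted by kind (Namespaces first, then other built-in kinds, then custom resources) and then by name, and ArgoCD does not wait for one to become Healthy before applying the next. If one resource depends on another, give it a later wave.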
  • Progressive sync needs the feature flag; without it, the strategy block is parsed but ignored. Verify with kubectl get configmap argocd-cmd-params-cm -n argocd -o yaml | grep progressive, and restart the applicationset controller after changing the value.
  • Custom health Lua reload requires controller pod restart on older ArgoCD versions; test before relying on it.

How CallSphere does this in production

CallSphere runs ArgoCD on each tenant's k3s for the behavioral-health and healthcare HIPAA SKUs. We use sync waves to migrate 115+ Postgres tables before any of the 37 voice agents come up, and progressive sync to cut over from one model version to the next 20% at a time across our edge fleet. The Lua health check gates on a probe that hits the actual OpenAI Realtime endpoint per agent, so a partially broken release fails the wave instead of silently degrading.

FAQ

Q: Sync waves vs Argo Rollouts? Sync waves order resources within an Application; Argo Rollouts shifts traffic between the old and new ReplicaSets of a Rollout (its replacement for a Deployment) using canary or blue-green strategies. Use both.

Q: How do I gate prod on staging soak time? Set maxUpdate: 0 on the prod step so prod Applications only sync when someone syncs them manually after staging has soaked, or add a deny sync window (spec.syncWindows with a cron schedule and duration) to the prod AppProject that blocks syncs for the soak period.

Q: Can ApplicationSets target external clusters? Yes: register them with argocd cluster add. Each gets a Secret in the argocd namespace, and the controller loads it as a destination.

Q: What about secrets? Pair ArgoCD with External Secrets Operator or Sealed Secrets — never commit plain Secrets, even with sync waves.
