AI Engineering · 12 min read

GitHub Actions Pipeline for AI Voice Agents: Build, Sign, Deploy (2026)

End-to-end GitHub Actions workflow for an OpenAI Realtime + LiveKit voice agent: matrix build, eval gate, cosign + SLSA provenance, and a kubectl rollout to k3s. Real YAML included.

TL;DR — A solid voice-agent pipeline has five gates: lint, unit tests, an LLM eval suite, a signed container build with SLSA provenance, and a kubectl-driven progressive rollout. GitHub Actions does all five in one workflow with actions/attest-build-provenance@v2, sigstore/cosign-installer, and a self-hosted ARC runner.

What you'll set up

A GitHub Actions workflow (.github/workflows/voice-agent.yml) that runs on every PR and on main: it lints, unit-tests, runs an OpenAI-Evals based regression suite, builds a multi-arch image, signs it with cosign keyless OIDC, attests SLSA build provenance, and deploys to a k3s cluster via Cloudflare Tunnel.

Architecture

```mermaid
flowchart LR
  PR[PR push] --> LINT[lint + ruff + mypy]
  LINT --> UNIT[pytest unit]
  UNIT --> EVAL[LLM eval gate]
  EVAL --> BUILD[buildx multi-arch]
  BUILD --> SIGN[cosign keyless]
  SIGN --> PROV[SLSA provenance]
  PROV --> PUSH[ghcr.io push]
  PUSH --> ROLL[kubectl rollout]
  ROLL --> K3S[k3s edge cluster]
```

Step 1 — Define the matrix and concurrency guards

```yaml
name: voice-agent

on:
  pull_request:
  push: { branches: [main] }

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  contents: read
  id-token: write      # OIDC for cosign keyless
  packages: write      # ghcr push
  attestations: write

jobs:
  test:
    runs-on: ubuntu-24.04
    strategy:
      matrix: { python: ["3.11", "3.12"] }
```

The id-token: write permission is non-negotiable for keyless cosign signing; without it, the workflow can't obtain an OIDC token and the signing request is rejected.

Step 2 — Lint, type-check, and run an LLM eval gate

```yaml
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
        with:
          python-version: ${{ matrix.python }}
      - run: uv sync --frozen
      - run: uv run ruff check .
      - run: uv run mypy src/
      - run: uv run pytest -q tests/unit
      - name: LLM eval regression
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          uv run python evals/run.py --suite voice-regression \
            --threshold 0.92 --max-cost-usd 1.50
```


The eval gate runs ~30 prompts against a frozen test set of (input, expected_intent, expected_tool_calls) tuples and fails the build if the pass rate drops below 92%. We cap spend at $1.50 per CI run via the --max-cost-usd flag in our eval harness; past that, the run aborts.
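A cost-capped eval loop of this shape can be sketched as follows. This is illustrative only; the actual evals/run.py isn't shown in this post, so run_case() and the case format are assumptions:

```python
# Minimal cost-capped eval loop (illustrative only; the post's evals/run.py
# is not shown, so run_case() and the case format are assumptions).
import sys

def run_suite(cases, run_case, max_cost_usd=1.50):
    """Return (pass_rate, total_cost_usd); abort hard once the cap is hit."""
    passed, total_cost = 0, 0.0
    for case in cases:
        ok, cost_usd = run_case(case)   # real impl: one model call + grading
        total_cost += cost_usd
        if total_cost > max_cost_usd:
            # Hard abort: one bad prompt template must not run up the bill.
            raise RuntimeError(f"eval cost cap exceeded: ${total_cost:.2f}")
        passed += int(ok)
    return passed / len(cases), total_cost

if __name__ == "__main__":
    # Stubbed cases and runner so the sketch runs without an API key.
    cases = [{"input": f"prompt {i}"} for i in range(30)]
    rate, cost = run_suite(cases, lambda c: (True, 0.01))
    if rate < 0.92:                     # mirror of --threshold 0.92
        sys.exit(f"eval gate failed: pass rate {rate:.2%}")
    print(f"pass rate {rate:.2%}, cost ${cost:.2f}")
```

The key design choice is that the cost check runs inside the loop, before the next model call, so a runaway suite stops within one case of the cap rather than at the end.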

Step 3 — Build, sign, and attest the image

```yaml
  build:
    needs: test
    runs-on: ubuntu-24.04
    outputs:
      digest: ${{ steps.push.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: push
        uses: docker/build-push-action@v6
        with:
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - uses: sigstore/cosign-installer@v3
      - run: cosign sign --yes ghcr.io/${{ github.repository }}@${{ steps.push.outputs.digest }}
      - uses: actions/attest-build-provenance@v2
        with:
          subject-name: ghcr.io/${{ github.repository }}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true
```

actions/attest-build-provenance@v2 writes SLSA v1.0 provenance to the Sigstore transparency log and pushes it to the registry as an OCI 1.1 referrer artifact, so anyone can verify it with cosign verify-attestation --type slsaprovenance ....

Step 4 — Deploy to k3s via Cloudflare Tunnel

```yaml
  deploy:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-24.04
    steps:
      - uses: cloudflare/cloudflared-action@v1
        with:
          tunnel-token: ${{ secrets.CF_TUNNEL_TOKEN }}
      - run: |
          echo "${{ secrets.K3S_KUBECONFIG_B64 }}" | base64 -d > kubeconfig
          export KUBECONFIG=$PWD/kubeconfig
          kubectl set image deploy/voice-agent \
            agent=ghcr.io/${{ github.repository }}@${{ needs.build.outputs.digest }} -n voice
          kubectl rollout status deploy/voice-agent -n voice --timeout=180s
```

Pinning by digest (not tag) means the image cosign signed is exactly the one running. kubectl rollout status blocks until the new ReplicaSet is healthy or fails, so the workflow turns red on bad rollouts.
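To make that guarantee explicit, the deploy job can verify the signature before touching the cluster. A sketch of such a step follows; the identity regexp is an assumption and must match your repo's workflow identity:

```yaml
      # Refuse to deploy anything not signed by this repo's workflow OIDC identity.
      - uses: sigstore/cosign-installer@v3
      - run: |
          cosign verify \
            --certificate-oidc-issuer https://token.actions.githubusercontent.com \
            --certificate-identity-regexp "^https://github.com/${{ github.repository }}/" \
            ghcr.io/${{ github.repository }}@${{ needs.build.outputs.digest }}
```

If verification fails, the job fails before kubectl runs, so an unsigned or tampered image never reaches the cluster.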

Step 5 — Add a smoke test that hits the live agent

```yaml
      - name: Voice smoke test
        run: |
          uv run python smoke/realtime_ping.py \
            --url wss://agent.example.com/realtime \
            --prompt "What's your name?" \
            --expect-keyword "voice agent"
```

The smoke test opens a real WebRTC session (we use aiortc), speaks a synthesized prompt, and asserts that the agent's transcribed reply contains an expected keyword. It catches DNS, certificate, and Realtime-key rotation issues that unit tests can't.
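The keyword assertion at the heart of that script can be sketched like this. It's simplified: a real run would pipe synthesized audio over aiortc and collect the transcript, so get_transcript here is a stubbed, assumed hook, not a real API:

```python
# Simplified smoke-test core: normalize the STT transcript and assert the
# expected keyword appears. Transport (WebRTC via aiortc) is stubbed out.
import re
import sys

def contains_keyword(transcript: str, keyword: str) -> bool:
    """Case- and punctuation-insensitive keyword match against STT output."""
    norm = re.sub(r"[^a-z0-9 ]", " ", transcript.lower())
    norm = re.sub(r"\s+", " ", norm).strip()
    return keyword.lower() in norm

def smoke(get_transcript, prompt: str, expect_keyword: str) -> None:
    transcript = get_transcript(prompt)       # real impl: aiortc session
    if not contains_keyword(transcript, expect_keyword):
        sys.exit(f"smoke failed: {expect_keyword!r} not in {transcript!r}")

if __name__ == "__main__":
    # Stubbed transport so the sketch runs without a live agent.
    fake = lambda prompt: "Hi! I'm the CallSphere voice agent, how can I help?"
    smoke(fake, "What's your name?", "voice agent")
    print("smoke ok")
```

Normalizing punctuation matters because STT output is unpredictable about commas and apostrophes; an exact substring match against raw text would flake.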


Step 6 — Branch protection rules

In repo settings: require the test, build, and deploy checks; require code-owner review on evals/; require signed commits. Combined with the cosign attestation, main becomes provably "this image came from this commit reviewed by these humans".
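The code-owner requirement on evals/ maps to a one-line CODEOWNERS entry; the team handle below is an assumption:

```
/evals/  @your-org/voice-platform
```

With that in place, GitHub blocks merges touching evals/ until someone on the owning team approves.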

Step 7 — Self-hosted runners for speed

For voice work, GitHub-hosted runners hit egress limits and have ~3-min cold start on Buildx. We run actions/actions-runner-controller (ARC) on the same k3s cluster — runners spin up in ~5s, share the buildx cache, and never hit GitHub's bandwidth meter.

Pitfalls

  • OIDC token without id-token: write — cosign will fail with a cryptic "no token found" error. Set the permission per-job, not just at workflow level.
  • Concurrency cancel-in-progress: true kills running deploys. Use a separate deploy concurrency group that doesn't cancel.
  • Eval cost runaway — never let an LLM eval suite call the API in a loop without a hard --max-cost-usd cap; one bad prompt template can rack up $50 in 5 minutes.
  • Cache poisoning on PRs from forks — cache-to: type=gha,mode=max will not write from fork PRs (good), but you must verify the cache was actually used on main builds, not just rebuilt blindly.
  • kubectl rollout status timeout — default is 0 (forever). Always set --timeout or your workflow hangs for hours.
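The deploy-cancellation pitfall above is usually fixed by giving the deploy job its own non-cancelling concurrency group, sketched here as a job-level override:

```yaml
  deploy:
    concurrency:
      group: deploy-${{ github.ref }}
      cancel-in-progress: false   # queue deploys instead of killing them
```

In-flight deploys then finish, and newer ones wait their turn instead of leaving the cluster mid-rollout.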

How CallSphere does this in production

CallSphere ships 37 voice agents across 6 verticals through a single GitHub Actions monorepo workflow. An OpenAI-Evals based gate runs over a frozen 200-prompt test suite per vertical (healthcare, salon, behavioral health, multi-family, contractors, dental). Images are signed with cosign keyless, pushed to GHCR, and deployed to a k3s edge cluster fronted by Cloudflare Tunnel, so the Postgres at 72.62.162.83 has no public ingress. 90+ tools and 115+ DB tables are migrated by a separate db-migrate job that runs before deploy, so schema drift never reaches production. Pricing is $149 / $499 / $1499 with a 14-day trial and a 22% lifetime affiliate program; try a demo or read the healthcare build.

FAQ

Q: Should the eval suite block PRs or only main? Block PRs. A bad prompt change shouldn't sit in main for 30 minutes before someone notices in Datadog.

Q: Why cosign keyless instead of a KMS key? Keyless ties signatures to the GitHub OIDC identity (workflow + repo + ref). Rotating a KMS key is painful; rotating an OIDC identity is automatic.

Q: How do I keep secrets out of logs? Use secrets.OPENAI_API_KEY (auto-masked), and set ACTIONS_STEP_DEBUG only on private repos. Never echo a secret.

Q: ARC runners or GitHub-hosted? ARC for monorepos with frequent buildx work; GitHub-hosted for everything else. The crossover point is around 50 builds/day.
