By Sagar Shankaran, Founder of CallSphere
End-to-end GitHub Actions workflow for an OpenAI Realtime + LiveKit voice agent: matrix build, eval gate, cosign + SLSA provenance, and a kubectl rollout to k3s. Real YAML included.
Key takeaways
TL;DR: A solid voice-agent pipeline has five gates: lint, unit tests, an LLM eval suite, a signed container build with SLSA provenance, and a kubectl-driven progressive rollout. GitHub Actions does all five in one workflow with actions/attest-build-provenance@v2, sigstore/cosign-installer, and a self-hosted ARC runner.
What we're building: a GitHub Actions workflow (.github/workflows/voice-agent.yml) that runs on every PR and on every push to main. It lints, unit-tests, runs an OpenAI-Evals-based regression suite, builds a multi-arch image, signs it with cosign keyless (OIDC), attests SLSA build provenance, and, on main only, deploys to a k3s cluster through a Cloudflare Tunnel.
```mermaid
flowchart LR
  PR[PR push] --> LINT[lint: ruff + mypy]
  LINT --> UNIT[pytest unit]
  UNIT --> EVAL[LLM eval gate]
  EVAL --> BUILD[buildx multi-arch]
  BUILD --> PUSH[ghcr.io push]
  PUSH --> SIGN[cosign keyless]
  SIGN --> PROV[SLSA provenance]
  PROV --> ROLL[kubectl rollout]
  ROLL --> K3S[k3s edge cluster]
```
```yaml
name: voice-agent

on:
  pull_request:
  push: { branches: [main] }

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  contents: read
  id-token: write      # OIDC for cosign keyless
  packages: write      # ghcr push
  attestations: write

jobs:
  test:
    runs-on: ubuntu-24.04
    strategy:
      matrix: { python: ["3.11", "3.12"] }
```
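The id-token: write permission is non-negotiable for keyless cosign signing. Without it, GitHub never exposes an OIDC token to the job, so cosign has no identity to exchange with Sigstore's Fulcio CA for a short-lived signing certificate. actions/attest-build-provenance needs it too, alongside attestations: write.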
```yaml
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
        with:
          python-version: ${{ matrix.python }}
      - run: uv sync --frozen
      - run: uv run ruff check .
      - run: uv run mypy src/
      - run: uv run pytest -q tests/unit
      - name: LLM eval regression
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          uv run python evals/run.py --suite voice-regression \
            --threshold 0.92 --max-cost-usd 1.50
```
The eval gate runs ~30 prompts from a frozen test set of (input, expected_intent, expected_tool_calls) triples and fails the build if the pass rate drops below 92%. The --max-cost-usd flag in our eval harness caps spend at $1.50 per CI run, and the run aborts once the cap is hit. Because the step runs once per matrix leg, a single workflow run can spend up to $3.00.
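The gate logic itself is small. Below is a minimal sketch of what an evals/run.py-style harness could do. It assumes each case is an (input, expected_intent, expected_tool_calls) triple and that the agent wrapper reports its own dollar cost per call. EvalCase, AgentReply, run_gate and CostCapExceeded are hypothetical names. They do not come from the CallSphere harness or the OpenAI Evals API.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    input: str
    expected_intent: str
    expected_tool_calls: tuple[str, ...]


@dataclass(frozen=True)
class AgentReply:
    intent: str
    tool_calls: tuple[str, ...]
    cost_usd: float  # assumption: the agent wrapper reports per-call spend


class CostCapExceeded(RuntimeError):
    """Raised once cumulative spend passes the per-run budget."""


def run_gate(cases: Sequence[EvalCase],
             agent: Callable[[str], AgentReply],
             threshold: float = 0.92,
             max_cost_usd: float = 1.50) -> tuple[float, bool]:
    """Return (pass_rate, passed_gate), aborting early if the budget is blown."""
    if not cases:
        raise ValueError("empty eval suite")
    spent = 0.0
    passed = 0
    for case in cases:
        reply = agent(case.input)
        spent += reply.cost_usd
        if spent > max_cost_usd:
            raise CostCapExceeded(f"spent ${spent:.2f} > cap ${max_cost_usd:.2f}")
        # A case passes only if the intent and the exact tool-call sequence both match.
        if (reply.intent == case.expected_intent
                and reply.tool_calls == case.expected_tool_calls):
            passed += 1
    pass_rate = passed / len(cases)
    return pass_rate, pass_rate >= threshold
```

In CI, a non-zero exit when passed_gate is False, or an uncaught CostCapExceeded, is what turns the step red.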
```yaml
  build:
    needs: test
    runs-on: ubuntu-24.04
    outputs:
      digest: ${{ steps.push.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3   # arm64 emulation for the multi-arch build
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: push
        uses: docker/build-push-action@v6
        with:
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - uses: sigstore/cosign-installer@v3
      - run: cosign sign --yes ghcr.io/${{ github.repository }}@${{ steps.push.outputs.digest }}
      - uses: actions/attest-build-provenance@v2
        with:
          subject-name: ghcr.io/${{ github.repository }}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true
```
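actions/attest-build-provenance@v2 generates SLSA v1.0 build provenance and signs it through Sigstore. For public repositories the signature is recorded in the public Rekor transparency log; private repositories use GitHub's own Sigstore instance. The action stores the attestation with GitHub's attestations API and, because push-to-registry: true is set, also pushes it to GHCR as an OCI referrer of the image. The most direct way to verify it is gh attestation verify oci://ghcr.io/<owner>/<repo>@<digest> --owner <owner>.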
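To make that check repeatable, for example in the deploy job before kubectl set image, a small wrapper can build the gh attestation verify command for a digest-pinned image. This is a sketch. It assumes the gh CLI is installed and authenticated. provenance_verify_cmd and verify_provenance are hypothetical helpers. The run parameter exists so the subprocess call can be swapped out in tests.

```python
import subprocess
from collections.abc import Callable


def provenance_verify_cmd(image: str, digest: str, owner: str) -> list[str]:
    """Build the `gh attestation verify` argv for an image pinned by digest."""
    if not digest.startswith("sha256:"):
        raise ValueError(f"expected a sha256 digest, got {digest!r}")
    return ["gh", "attestation", "verify", f"oci://{image}@{digest}", "--owner", owner]


def verify_provenance(image: str, digest: str, owner: str,
                      run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if gh exits 0, i.e. it found a valid attestation for that digest."""
    result = run(provenance_verify_cmd(image, digest, owner), capture_output=True, text=True)
    return result.returncode == 0
```

Using --repo <owner>/<repo> instead of --owner narrows acceptance to attestations produced by that single repository.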
```yaml
  deploy:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-24.04
    steps:
      - uses: cloudflare/cloudflared-action@v1
        with:
          tunnel-token: ${{ secrets.CF_TUNNEL_TOKEN }}
      - env:
          # Pass the secret through the environment instead of splicing it into the script text.
          K3S_KUBECONFIG_B64: ${{ secrets.K3S_KUBECONFIG_B64 }}
        run: |
          printf '%s' "$K3S_KUBECONFIG_B64" | base64 -d > kubeconfig
          export KUBECONFIG=$PWD/kubeconfig
          kubectl set image deploy/voice-agent \
            agent=ghcr.io/${{ github.repository }}@${{ needs.build.outputs.digest }} -n voice
          kubectl rollout status deploy/voice-agent -n voice --timeout=180s
```
Pinning by digest rather than by tag guarantees that the image cosign signed is exactly the one running. kubectl rollout status blocks until the new ReplicaSet's pods are available and exits non-zero if the rollout fails or the 180 s timeout expires. A bad rollout therefore turns the workflow red.
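A cheap guard in the deploy step is to refuse any image reference that is not pinned by digest, because a tag can be re-pointed after signing. Here is a minimal sketch. It assumes references of the form registry/path@sha256:<64 hex chars>. parse_pinned_ref is a hypothetical helper, and the regex is deliberately stricter than the full OCI reference grammar.

```python
import re
from typing import NamedTuple

# registry[:port]/path@sha256:<64 lowercase hex>; stricter than the OCI grammar on purpose.
_PINNED = re.compile(
    r"^(?P<repo>[a-z0-9.\-]+(?::[0-9]+)?/[a-z0-9._/\-]+)"
    r"@(?P<digest>sha256:[0-9a-f]{64})$"
)


class PinnedRef(NamedTuple):
    repo: str
    digest: str


def parse_pinned_ref(ref: str) -> PinnedRef:
    """Split a digest-pinned image ref, rejecting tags and malformed digests."""
    match = _PINNED.match(ref)
    if match is None:
        raise ValueError(f"image ref is not digest-pinned: {ref!r}")
    return PinnedRef(match["repo"], match["digest"])
```

Running it on the reference assembled from needs.build.outputs.digest before kubectl set image catches an empty or missing output, which would otherwise produce a broken image string.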
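```yaml
      # The deploy job has no checkout or uv setup yet; add these before the smoke test.
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Voice smoke test
        run: |
          uv run python smoke/realtime_ping.py \
            --url wss://agent.example.com/realtime \
            --prompt "What's your name?" \
            --expect-keyword "voice agent"
```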
The smoke test opens a real WebRTC session (we use aiortc), speaks a synthesized prompt, and asserts that the speech-to-text transcript of the agent's reply contains the expected keyword. It catches DNS, TLS certificate, and Realtime API key rotation issues that unit tests can't.
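The pass/fail logic around the audio plumbing is worth keeping separate and boring. The sketch below assumes a hypothetical ask_agent(prompt) callable that wraps the aiortc session and STT and returns the transcript text. The retry budget is there because a pod that just went Ready may still be warming up. None of these names come from smoke/realtime_ping.py.

```python
import re
import time
from collections.abc import Callable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace: STT output is noisy."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def run_smoke(ask_agent: Callable[[str], str], prompt: str, keyword: str,
              attempts: int = 3, backoff_s: float = 2.0,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Return the matching transcript, or raise AssertionError after `attempts` tries."""
    last = ""
    for attempt in range(1, attempts + 1):
        try:
            last = ask_agent(prompt)
            if normalize(keyword) in normalize(last):
                return last
        except OSError as err:  # covers ConnectionError and TimeoutError
            last = f"<error: {err}>"
        if attempt < attempts:
            sleep(backoff_s * attempt)  # linear backoff between attempts
    raise AssertionError(f"keyword {keyword!r} not heard after {attempts} tries; last: {last!r}")
```

Normalizing both sides means "Voice-Agent." in a transcript still matches the keyword "voice agent".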
In repo settings: require the test, build, and deploy checks; require code-owner review on evals/; require signed commits. The cosign signature and provenance bind each image digest to the commit and workflow that built it. Combined with these rules, any image running from main traces back to a specific commit that branch protection forced through human review.
For voice work, GitHub-hosted runners hit egress limits and have ~3-min cold start on Buildx. We run actions/actions-runner-controller (ARC) on the same k3s cluster — runners spin up in ~5s, share the buildx cache, and never hit GitHub's bandwidth meter.
Common pitfalls:

- Missing id-token: write: cosign fails with a cryptic "no token found" error. Workflow-level permissions do apply to every job, but a job that declares its own permissions: block replaces them entirely, so re-declare id-token: write there.
- Workflow-level cancel-in-progress: true cancels the entire in-progress run, deploy included, as soon as a newer commit lands on main. Restrict cancellation to PRs with cancel-in-progress: ${{ github.event_name == 'pull_request' }}. Then give the deploy job its own concurrency group with cancel-in-progress: false, so deploys queue instead of dying mid-rollout.
- No --max-cost-usd cap: one bad prompt template can rack up $50 in 5 minutes.
- cache-to: type=gha,mode=max will not write from fork PRs (good), but verify that main builds actually hit the cache rather than silently rebuilding every layer.
- kubectl rollout status without --timeout: the job can sit waiting far longer than any deploy should take. Always set an explicit timeout.

CallSphere ships 37 voice agents across 6 verticals through a single GitHub Actions monorepo workflow. We run an OpenAI-Evals-based gate over a frozen 200-prompt test suite per vertical (healthcare, salon, behavioral health, multi-family, contractors, dental). Images are pushed to GHCR and signed with cosign keyless, then deployed to a k3s edge cluster fronted by Cloudflare Tunnel, with no public ingress on the Postgres at 72.62.162.83. The schema behind 90+ tools and 115+ DB tables is migrated by a separate db-migrate job that runs before deploy, so schema drift never reaches production. Pricing is $149 / $499 / $1499 with a 14-day trial and a 22% lifetime affiliate program.
Q: Should the eval suite block PRs or only main?
Block PRs. A bad prompt change shouldn't sit in main for 30 minutes before someone notices in Datadog.
Q: Why cosign keyless instead of a KMS key?
Keyless ties each signature to the GitHub OIDC identity (repo + workflow + ref). It also leaves no long-lived key to rotate: every signing uses an ephemeral key pair and a short-lived Fulcio certificate, whereas rotating a KMS key is painful.
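That identity is also what you pin at verification time. For an image signed by a GitHub Actions workflow, the certificate identity is the workflow file's URL at the ref that ran it, and the issuer is GitHub's token service. A verifier can therefore refuse signatures from any other workflow, repo, or branch. Here is a sketch that assembles the cosign verify command, assuming the workflow path .github/workflows/voice-agent.yml and a signing ref of refs/heads/main. keyless_verify_cmd is a hypothetical helper.

```python
GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"


def keyless_verify_cmd(image_ref: str, repo: str,
                       workflow_path: str = ".github/workflows/voice-agent.yml",
                       ref: str = "refs/heads/main") -> list[str]:
    """Build a `cosign verify` argv that only accepts one workflow on one ref."""
    if "@sha256:" not in image_ref:
        raise ValueError("verify a digest, not a mutable tag")
    # Keyless certs from GitHub Actions carry the workflow URL at the ref as identity.
    identity = f"https://github.com/{repo}/{workflow_path}@{ref}"
    return [
        "cosign", "verify",
        "--certificate-identity", identity,
        "--certificate-oidc-issuer", GITHUB_OIDC_ISSUER,
        image_ref,
    ]
```

Passing the list to subprocess.run(cmd, check=True) makes any signature from a PR build or a forked workflow fail verification.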
Q: How do I keep secrets out of logs?
Use secrets.OPENAI_API_KEY (auto-masked), and set ACTIONS_STEP_DEBUG only on private repos. Never echo a secret.
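Masking only covers the secret's literal value. Anything derived from it, such as the base64-decoded kubeconfig or a token parsed out of it, prints in the clear unless you register it with the ::add-mask:: workflow command. Here is a minimal Python helper, assuming it runs inside a GitHub Actions step, where the runner reads workflow commands from stdout. mask_value is a hypothetical name.

```python
import sys
from typing import TextIO


def mask_value(value: str, out: TextIO = sys.stdout) -> None:
    """Ask the runner to redact `value` from all later log output.

    Multi-line values are registered one line at a time, a common
    precaution so each line is redacted even when logged on its own.
    """
    for line in value.splitlines():
        if not line.strip():
            continue  # masking blank lines would be pointless
        # Workflow-command data is percent-escaped, so '%' must become '%25'.
        out.write(f"::add-mask::{line.replace('%', '%25')}\n")
    out.flush()
```

Call it right after decoding and before anything that might print the value, including tracebacks.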
Q: ARC runners or GitHub-hosted?
ARC for monorepos with frequent buildx work; GitHub-hosted for everything else. The crossover point is around 50 builds/day.
© 2026 CallSphere LLC. All rights reserved.