AI Engineering · 10 min read

Voice Agent Error Budgets: Defining Acceptable Failure

An error budget is the unreliability you allow yourself in exchange for shipping. For voice agents, the budget is dollars and minutes — not just nines. Here's how CallSphere computes one.

TL;DR: Stop treating 99.9% availability as your only error budget. Add a model-regression (quality) budget, a cost-burn budget, and a user-perceived-latency budget. When any of them runs low, the deploys that depend on it are blocked.

What goes wrong

flowchart TD
  Client[Client] --> Edge[Cloudflare Worker]
  Edge -->|WS upgrade| DO[Durable Object]
  DO --> AI[(OpenAI Realtime WS)]
  AI --> DO
  DO --> Client
  DO -.hibernation.-> Storage[(Persisted state)]
CallSphere reference architecture

The classic Google SRE error budget — 1 minus your SLO target — was designed for stateless services where failure is binary. A voice agent fails in shades of gray. The call connected but the agent stalled for 4 seconds. The agent answered correctly but quoted last month's price. Token cost blew past forecast. None of these violate "availability" but all of them are expensive.

If you track only availability, you'll burn through your real budget without a single alarm firing. Then you'll ship a model swap that drops accuracy from 95% to 91% and not notice for two weeks.

How to monitor

Run multiple parallel error budgets, each with its own burn-rate alert:

  1. Availability budget — 100% minus your audio-uptime SLO. Standard.
  2. Conversational success budget — 100% minus your conv-success SLO. Burns when too many calls fail to complete.
  3. Latency budget — fraction of turns above the FTL threshold. Burns when speed degrades.
  4. Quality budget — fraction of turns where intent accuracy fell below threshold (sampled + LLM-as-judge). Burns on prompt or model regressions.
  5. Cost budget — dollars spent on tokens above a forecast band. Burns on token-burn outliers.

The decision rule: if less than 25% of any budget remains for the rolling 7-day window, deploys gated by that budget are blocked. Platform deploys block on (1)–(3); prompt and model deploys block on (4); finance-sensitive changes block on (5). Use multi-window multi-burn-rate alerts (1h fast burn + 6h slow burn).
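The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not CallSphere's implementation: the budget names, the deploy-kind labels, and the function names are all assumptions chosen for the example. Only the 25% threshold and the budget-to-deploy mapping come from the rule above.

```python
from typing import Dict, Set

# Threshold from the decision rule: block when less than 25% of a budget remains.
BLOCK_THRESHOLD = 0.25

# Which budgets gate which kind of deploy, following the rule above.
# Keys and budget names are illustrative labels, not real identifiers.
GATES: Dict[str, Set[str]] = {
    "platform": {"availability", "conv_success", "latency"},
    "prompt_or_model": {"quality"},
    "finance_sensitive": {"cost"},
}


def remaining_fraction(slo_target: float, observed: float) -> float:
    """Share of the error budget left: 1.0 untouched, 0.0 spent, negative overspent."""
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0:
        raise ValueError("an SLO target of 100% leaves no budget to spend")
    return (observed - slo_target) / allowed_failure


def blocked_deploy_kinds(remaining: Dict[str, float]) -> Set[str]:
    """Deploy kinds with at least one gating budget below the threshold."""
    return {
        kind
        for kind, budgets in GATES.items()
        if any(remaining[name] < BLOCK_THRESHOLD for name in budgets)
    }
```

For example, a 95% conversational-success SLO with 96% observed success leaves roughly 20% of the budget. That is under the 25% line, so platform deploys would be blocked while prompt and finance-sensitive changes could still ship.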

CallSphere stack

CallSphere computes five budgets per vertical, daily, in Postgres. Each is a row in error_budgets with target, current burn, and remaining. A k3s admission webhook reads the table at deploy time and refuses pods if the relevant budget is exhausted.

  • Healthcare FastAPI :8084 — quality budget is the tight constraint; we've burned it twice in 12 months and both times caught a prompt regression early.
  • Real Estate — latency budget is tight because of the 6-container NATS pod's tail latency.
  • Sales WebSocket / PM2 — availability is the tight one (eight workers, restart storms).
  • After-hours Bull/Redis — cost budget binds because long-running batch calls are token-heavy.

We expose remaining budget on /admin/sre and via API. Customers on $499 and above get a per-tenant budget view. Try it on the 14-day trial.

Implementation

  1. Compute budgets daily.
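-- One row per run: failure rate and remaining budget over the trailing 7 days.
INSERT INTO error_budgets (vertical, name, target, burn, remaining, computed_at)
SELECT
  'healthcare',
  'conv_success',
  0.95,
  1 - s.success_rate,                    -- burn: observed failure rate
  (s.success_rate - 0.95) / (1 - 0.95),  -- remaining: 1 = untouched, 0 = spent, < 0 = overspent
  NOW()
FROM (
  -- NULLIF keeps an empty window from dividing by zero; the rate is then NULL.
  SELECT (COUNT(*) FILTER (WHERE ok))::float / NULLIF(COUNT(*), 0) AS success_rate
  FROM calls
  WHERE created_at > NOW() - INTERVAL '7 days'
  -- If calls holds more than one vertical, also filter on it here (column name depends on your schema).
) AS s;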
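  2. Multi-burn-rate alerts.
- alert: ConvSuccessBudgetBurn
  # burn_rate_<window> are placeholder recording rules: failure rate over the window / (1 - SLO)
  expr: >
    (burn_rate_1h > 14.4 and burn_rate_5m > 14.4)
    or (burn_rate_6h > 6 and burn_rate_30m > 6)
  labels: { severity: page, alert_type: model }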
  3. Block deploys via OPA. The admission webhook checks the remaining budget for the relevant SLO before accepting pod manifests.

  4. Show the team. Put a Grafana panel per budget on the SRE dashboard. Engineers will only respect what they see.

  5. Forgive intentionally. A planned drill or migration consumes budget on purpose. Log a "planned burn" event so the post-mortem doesn't blame anyone.
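It helps to check what the burn-rate thresholds in the alert step mean for the budget window you actually use. The sketch below assumes the usual definition of burn rate (observed failure rate divided by the failure rate the SLO allows). The function names are illustrative. The 14.4 and 6 multipliers are the values commonly derived for a 30-day SLO period, so the same numbers consume a larger share of a 7-day budget.

```python
def burn_rate(failure_rate: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    return failure_rate / (1.0 - slo_target)


def budget_consumed(rate: float, window_hours: float, period_hours: float) -> float:
    """Fraction of the whole period's budget spent if `rate` holds for `window_hours`."""
    return rate * window_hours / period_hours


SEVEN_DAYS = 7 * 24    # rolling window used for the deploy-blocking rule
THIRTY_DAYS = 30 * 24  # period the 14.4 / 6 thresholds are usually derived for

# Compare how much budget each alert tolerates before paging, per period length.
for label, rate, window in [("fast (1h)", 14.4, 1), ("slow (6h)", 6.0, 6)]:
    over_30d = budget_consumed(rate, window, THIRTY_DAYS)
    over_7d = budget_consumed(rate, window, SEVEN_DAYS)
    print(f"{label}: {over_30d:.1%} of a 30-day budget, {over_7d:.1%} of a 7-day budget")
```

Under these assumptions the fast alert fires after about 2% of a 30-day budget is gone but closer to 9% of a 7-day budget, and the slow alert after about 5% versus about 21%. If the 25%-remaining deploy gate is computed on 7 days, lower thresholds may be worth considering so that alerts fire well before the gate closes.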


FAQ

Q: How tight should the cost budget be? A: ±15% of a 14-day rolling baseline is a sensible default. Tighter than that fires too often; looser misses spikes.

Q: What if I exhaust the budget mid-week? A: Stop shipping risky changes. Use the rest of the week for reliability work. That's the entire point.

Q: Should error budgets affect compensation? A: Indirectly — through the team's deploy velocity. Don't tie individual bonuses to a budget; you'll get gaming.

Q: How do I forecast? A: Time-series forecasting on the burn rate. Even a simple Holt-Winters from Postgres + cron beats not forecasting.

Q: Can I auto-escalate when a budget is < 10%? A: Yes. Page the engineering manager. We treat this as a SEV2 and auto-create a Slack channel for it.
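The ±15% cost band from the first answer can be sketched as follows. This is a rough illustration under stated assumptions: the baseline is taken as the plain mean of the trailing 14 days of spend, and the function names are hypothetical. A production version might use a forecast instead of a flat mean.

```python
from statistics import mean
from typing import Sequence, Tuple


def cost_band(daily_costs: Sequence[float], tolerance: float = 0.15) -> Tuple[float, float]:
    """Return (low, high) bounds around the mean of the trailing 14 days of spend."""
    window = list(daily_costs)[-14:]
    if not window:
        raise ValueError("need at least one day of cost history")
    baseline = mean(window)
    return baseline * (1 - tolerance), baseline * (1 + tolerance)


def overspend(today: float, daily_costs: Sequence[float], tolerance: float = 0.15) -> float:
    """Dollars spent above the band today; 0.0 when inside or below it."""
    _, high = cost_band(daily_costs, tolerance)
    return max(0.0, today - high)
```

With a flat $100/day history, the band runs from about $85 to $115, so a $130 day would count roughly $15 against the cost budget and a $110 day would count nothing.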
