By Sagar Shankaran, Founder of CallSphere
An error budget is the unreliability you allow yourself in exchange for shipping. For voice agents, the budget is dollars and minutes — not just nines. Here's how CallSphere computes one.
Key takeaways
TL;DR — Stop using 99.9% availability as your only error budget. Add a "model-regression budget," a "cost burn budget," and a "user-perceived-latency budget." Burn any of them and the next deploy is blocked.
flowchart TD
Client[Client] --> Edge[Cloudflare Worker]
Edge -->|WS upgrade| DO[Durable Object]
DO --> AI[(OpenAI Realtime WS)]
AI --> DO
DO --> Client
DO -.hibernation.-> Storage[(Persisted state)]
The classic Google SRE error budget (1 minus your SLO target) was designed for stateless services where failure is binary. A voice agent fails in shades of gray. The call connected but the agent stalled for 4 seconds. The agent answered correctly but quoted last month's price. Token cost blew past forecast. None of these violate "availability," but all of them are expensive.
If you track only availability, you'll burn through your real budget without a single alarm firing. Then you'll ship a model swap that drops accuracy from 95% to 91% and not notice for two weeks.
Run multiple parallel error budgets, each with its own burn-rate alert:
The decision rule: if any budget is < 25% remaining for the rolling 7-day window, deploys requiring that budget are blocked. Platform deploys block on (1)–(3); prompt and model deploys block on (4); finance-sensitive changes block on (5). Use multi-window multi-burn-rate alerts (1h fast burn + 6h slow burn).
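A minimal Python sketch of the decision rule above, under stated assumptions: budgets are keyed by the numbers (1) through (5) used in the rule, the change-type names (`platform`, `prompt_or_model`, `finance_sensitive`) are illustrative labels, and `remaining` holds the fraction of each budget left over the rolling 7-day window. It is not CallSphere's actual gate, only one way the rule could be expressed.

```python
BLOCK_THRESHOLD = 0.25  # block when less than 25% of a budget remains

# Which numbered budgets gate which kind of change (names are hypothetical).
BUDGETS_BY_CHANGE = {
    "platform": (1, 2, 3),
    "prompt_or_model": (4,),
    "finance_sensitive": (5,),
}


def blocking_budgets(change_type, remaining):
    """Return the budget numbers that block this change type.

    remaining maps budget number -> fraction of budget left (1.0 = untouched).
    """
    return [
        number
        for number in BUDGETS_BY_CHANGE[change_type]
        if remaining[number] < BLOCK_THRESHOLD
    ]


def deploy_allowed(change_type, remaining):
    """A deploy is allowed only when none of its gating budgets is below threshold."""
    return not blocking_budgets(change_type, remaining)


if __name__ == "__main__":
    remaining = {1: 0.80, 2: 0.60, 3: 0.30, 4: 0.10, 5: 0.50}
    for change in BUDGETS_BY_CHANGE:
        print(change, deploy_allowed(change, remaining), blocking_budgets(change, remaining))
```

Keeping the mapping in one table means a prompt change is never blocked by a platform budget it cannot affect, which is the point of scoping blocks by deploy type.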
CallSphere computes five budgets per vertical, daily, in Postgres. Each is a row in error_budgets with target, current burn, and remaining. A k3s admission webhook reads the table at deploy time and refuses pods if the relevant budget is exhausted.
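To make the webhook step concrete, here is a hedged sketch of the response a validating admission webhook could return. The `AdmissionReview` envelope (`apiVersion`, `kind`, `response.uid`, `response.allowed`, `response.status`) follows the Kubernetes `admission.k8s.io/v1` format; everything else is an assumption: the function name, the 25% threshold default, and the idea that `remaining` has already been read from the `error_budgets` table by some other code.

```python
def admission_response(review, remaining, threshold=0.25):
    """Build an AdmissionReview response that denies the pod when budget is low.

    review:    the incoming AdmissionReview request body, parsed as a dict.
    remaining: fraction of the relevant budget left (assumed to be looked up
               from the error_budgets table elsewhere).
    """
    allowed = remaining >= threshold
    response = {
        "uid": review["request"]["uid"],  # must echo the request uid
        "allowed": allowed,
    }
    if not allowed:
        response["status"] = {
            "code": 403,
            "message": (
                f"error budget at {remaining:.0%}, "
                f"below {threshold:.0%}: deploy blocked"
            ),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    request = {"request": {"uid": "example-uid"}}
    print(admission_response(request, remaining=0.10))
```

The denial message is what an engineer sees when the deploy is rejected, so it is worth stating which budget is low and by how much.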
:8084: quality budget is the tight constraint; we've burned it twice in 12 months and both times caught a prompt regression early.
We expose remaining budget on /admin/sre and via API. Customers on $499 and above get a per-tenant budget view. Try it on the 14-day trial.
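-- Conversation-success budget for the healthcare vertical, rolling 7 days.
INSERT INTO error_budgets (vertical, name, target, burn, remaining, computed_at)
SELECT
  'healthcare',
  'conv_success',
  0.95,
  -- burn: observed failure rate over the window (NULL if there were no calls)
  1 - AVG(CASE WHEN ok THEN 1.0 ELSE 0.0 END),
  -- remaining: share of the allowed failures still unspent (negative = overspent)
  (AVG(CASE WHEN ok THEN 1.0 ELSE 0.0 END) - 0.95) / (1 - 0.95),
  NOW()
FROM calls
WHERE created_at > NOW() - INTERVAL '7 days';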
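The arithmetic in that query can be checked by hand. The sketch below mirrors the same two expressions in Python; the function name and the sample call counts are made up for illustration.

```python
def budget_row(ok_calls, total_calls, target=0.95):
    """Mirror the SQL: return (burn, remaining) for one window of calls.

    burn      = observed failure rate
    remaining = (success_rate - target) / (1 - target)
                1.0 means no failures, 0.0 means exactly at target,
                negative means the budget is overspent.
    """
    success_rate = ok_calls / total_calls
    burn = 1 - success_rate
    remaining = (success_rate - target) / (1 - target)
    return burn, remaining


if __name__ == "__main__":
    # 970 successful calls out of 1,000: 3% failed against 5% allowed.
    print(budget_row(970, 1000))
    # 940 out of 1,000: 6% failed, so the budget is overspent.
    print(budget_row(940, 1000))
```

With 970 of 1,000 calls succeeding, 3 of the 5 allowed failure points are used, so roughly 40% of the budget remains. At 940 of 1,000 the result goes negative, about 20% over budget.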
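- alert: ConvSuccessBudgetBurn
  # Page on a fast burn (1h, confirmed by 5m) or a slow burn (6h, confirmed by 30m).
  # burn_rate_* are placeholder names for recording rules; PromQL metric names cannot start with a digit.
  expr: (burn_rate_1h > 14.4 and burn_rate_5m > 14.4) or (burn_rate_6h > 6 and burn_rate_30m > 6)
  labels: { severity: page, alert_type: model }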
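For readers who want to see the alert logic outside Prometheus, here is a small sketch. It assumes the usual definition of burn rate (observed error rate divided by the allowed error rate, 1 minus the SLO target) and that each window's error rate is already available; the function names and the sample numbers are illustrative.

```python
def burn_rate(error_rate, slo_target=0.95):
    """How many times faster than 'exactly on budget' errors are arriving."""
    return error_rate / (1 - slo_target)


def should_page(error_rates, slo_target=0.95):
    """Evaluate the multi-window, multi-burn-rate condition.

    error_rates maps window name ("5m", "30m", "1h", "6h") to the failure
    rate observed in that window. The short window confirms that the burn
    is still happening, which cuts pages for spikes that already ended.
    """
    burn = {w: burn_rate(r, slo_target) for w, r in error_rates.items()}
    fast = burn["1h"] > 14.4 and burn["5m"] > 14.4
    slow = burn["6h"] > 6 and burn["30m"] > 6
    return fast or slow


if __name__ == "__main__":
    outage = {"5m": 0.80, "30m": 0.80, "1h": 0.75, "6h": 0.20}
    brief_spike = {"5m": 0.90, "30m": 0.20, "1h": 0.10, "6h": 0.05}
    print(should_page(outage), should_page(brief_spike))
```

Note that with a 95% target the burn rate tops out at 20 (100% failures), so a threshold of 14.4 corresponds to roughly 72% of calls failing; tighter targets make the same thresholds trip at much lower error rates.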
Block deploys via OPA. Admission webhook checks remaining budget on the relevant SLO before accepting pod manifests.
Show the team. A Grafana panel per budget on the SRE dashboard. Engineers will only respect what they see.
Forgive intentionally. A planned drill or migration consumes budget on purpose — log a "planned burn" event so the post-mortem doesn't blame anyone.
Q: How tight should the cost budget be? A: ±15% of a 14-day rolling baseline is a sensible default. Tighter than that fires too often; looser misses spikes.
Q: What if I exhaust the budget mid-week? A: Stop shipping risky changes. Use the rest of the week for reliability work. That's the entire point.
Q: Should error budgets affect compensation? A: Indirectly — through the team's deploy velocity. Don't tie individual bonuses to a budget; you'll get gaming.
Q: How do I forecast? A: Time-series forecasting on the burn rate. Even a simple Holt-Winters from Postgres + cron beats not forecasting.
Q: Can I auto-escalate when a budget is < 10%? A: Yes — page the engineering manager. We do this at SEV2 with a Slack channel auto-created.
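The ±15% cost band from the first answer can be sketched in a few lines. This assumes the "14-day rolling baseline" is the mean of the last 14 daily cost totals, which is one reasonable reading rather than something the answer specifies; the function name and sample figures are hypothetical.

```python
from statistics import mean


def cost_within_budget(daily_costs, today_cost, band=0.15, window=14):
    """Compare today's cost with the rolling baseline.

    daily_costs: past daily totals, oldest first (today not included).
    Returns (within_band, deviation), where deviation is the signed
    fractional difference from the baseline (0.10 = 10% above).
    """
    baseline = mean(daily_costs[-window:])
    deviation = (today_cost - baseline) / baseline
    return abs(deviation) <= band, deviation


if __name__ == "__main__":
    history = [100.0] * 14
    print(cost_within_budget(history, 110.0))  # 10% above baseline
    print(cost_within_budget(history, 120.0))  # 20% above baseline
```

The band is symmetric on purpose: a sudden drop in cost can signal calls failing early just as a spike can signal runaway token usage.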
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.