By Sagar Shankaran, Founder of CallSphere
Real users generate noise. Synthetic checks generate signal. Here's how to run a fake voice call against your agent every minute and catch regressions before customers do.
Key takeaways
TL;DR — Real-traffic SLOs detect regressions late. A 1-minute synthetic call detects them in 60 seconds. Combine both.
flowchart LR
Twilio["Twilio Media Streams"] -- "WS · μlaw 8kHz" --> Bridge["FastAPI Bridge :8084"]
Bridge -- "PCM16 24kHz" --> OAI["OpenAI Realtime"]
OAI --> Bridge
Bridge --> Twilio
Bridge --> Logs[(structured logs · OTel)]
Synthetic monitoring is well understood for HTTP: Datadog Synthetics and Checkly let you run a Playwright script every minute and alert on failure. The same idea applied to voice is rarer, because nobody ships an "audio Playwright." A real synthetic voice check has to place a phone call (or open a WebRTC peer), play a pre-recorded utterance, score the agent's response, and report metrics.
Without it, your first signal of a regression is a real customer call — at which point the bad experience is already shipped.
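The four steps of a check (place the call, play the utterance, score the response, report) fit a small orchestration function. What follows is a minimal sketch, not CallSphere's harness: `run_check`, `CheckResult`, and the four callables are hypothetical names, and the latency recorded is the whole call round trip rather than time to first response. Each step is injected so it can be swapped for a real telephony client, STT service, or judge model.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    vertical: str
    latency_ms: float  # whole-call round trip, not first-response latency
    intent_ok: bool


def run_check(
    vertical: str,
    fixture_path: str,
    place_call: Callable[[str], bytes],     # plays the fixture, returns agent audio
    transcribe: Callable[[bytes], str],     # speech-to-text
    judge: Callable[[str], bool],           # did the reply match the expected intent?
    report: Callable[[CheckResult], None],  # persist or export the result
) -> CheckResult:
    started = time.monotonic()
    agent_audio = place_call(fixture_path)
    latency_ms = (time.monotonic() - started) * 1000.0
    text = transcribe(agent_audio)
    result = CheckResult(vertical, latency_ms, judge(text))
    report(result)
    return result


# Usage with stand-in steps; replace each lambda with a real integration.
results = []
demo = run_check(
    "healthcare",
    "fixtures/insurance_q.opus",
    place_call=lambda path: b"fake-audio",
    transcribe=lambda audio: "Sure, I can verify your insurance.",
    judge=lambda text: "insurance" in text.lower(),
    report=results.append,
)
```

Keeping the steps as plain callables means the same function can run against PSTN, SIP, or a WebRTC peer, and it can be unit-tested without placing a call.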
A synthetic voice check should test:
Run one synthetic per vertical every minute. Run a longer transactional check every 15 minutes. Page on three consecutive failures.
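The "three consecutive failures" rule comes down to one streak counter per check. Here is a sketch under that assumption; `ConsecutiveFailurePager` is a hypothetical name, and in production the same logic usually lives in the alerting system rather than in application code.

```python
from collections import defaultdict


class ConsecutiveFailurePager:
    """Tracks failure streaks per check and signals when to page."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._streaks = defaultdict(int)

    def record(self, check_name: str, ok: bool) -> bool:
        """Record one result; return True only on the run that reaches the threshold."""
        if ok:
            self._streaks[check_name] = 0  # any success resets the streak
            return False
        self._streaks[check_name] += 1
        return self._streaks[check_name] == self.threshold


pager = ConsecutiveFailurePager(threshold=3)
outcomes = [False, False, True, False, False, False, False]
pages = [pager.record("healthcare", ok) for ok in outcomes]
```

Two failures followed by a success never page. The page fires once, on the third failure in a row, and does not fire again on the fourth.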
CallSphere built its own synthetic harness because off-the-shelf doesn't do voice well in 2026. Architecture:
synthetic_results table; metrics scraped by Prometheus.
We run six synthetics every minute (one per vertical) plus three transactional flows every 15 minutes:
:8084: synthetic calls 555-0100, says "I need to verify my insurance," expects intent insurance_verification.property_search and a successful tool call to the listings DB.
Costs: ~$3.20/day per vertical for STT + gpt-4o-mini judging. Cheap enough to run forever.
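Per-vertical checks like the ones above are easier to maintain as data than as code. This is a sketch of one way to declare them: `SyntheticSpec`, `Observed`, and `passes` are hypothetical names, the "healthcare" label on the insurance check is an assumption, and the real-estate entry uses placeholder values (including the tool name `listings_lookup`) because the source does not give them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SyntheticSpec:
    vertical: str
    dial: str
    utterance: str
    expected_intent: str
    expected_tool: Optional[str] = None  # None means no tool call is required


@dataclass(frozen=True)
class Observed:
    intent: str
    tools_called: Tuple[str, ...] = ()


def passes(spec: SyntheticSpec, seen: Observed) -> bool:
    """A check passes when the intent matches and any required tool was called."""
    if seen.intent != spec.expected_intent:
        return False
    return spec.expected_tool is None or spec.expected_tool in seen.tools_called


healthcare = SyntheticSpec(
    vertical="healthcare",  # assumed label for the insurance check
    dial="555-0100",
    utterance="I need to verify my insurance",
    expected_intent="insurance_verification",
)

real_estate = SyntheticSpec(
    vertical="real_estate",
    dial="<test number>",            # placeholder, not from the source
    utterance="<recorded question>",  # placeholder, not from the source
    expected_intent="property_search",
    expected_tool="listings_lookup",  # hypothetical tool name
)
```

With specs as frozen dataclasses, adding a vertical is a data change, and the pass/fail rule stays in one place.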
We expose the synthetic dashboard publicly at status.callsphere.ai. $1499 enterprise tier gets per-tenant synthetics. Try the 14-day trial.
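// Place the call: open a WebRTC peer and stream the recorded utterance (calls match pion/webrtc).
pc, err := webrtc.NewPeerConnection(cfg)
if err != nil {
	log.Fatalf("new peer connection: %v", err)
}
audioTrack, _ := webrtc.NewTrackLocalStaticSample(...) // check this error in real code
pc.AddTrack(audioTrack)
go playOpus(audioTrack, "fixtures/insurance_q.opus") // playOpus: your helper that writes Opus samples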
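# Score the response: transcribe the agent's audio, then let a cheap model judge it.
text = deepgram.transcribe(agent_audio)  # stand-in for your STT client call
verdict = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Does this response answer 'insurance verification'? Reply yes or no.\n\n{text}"}],
)
# Reduce the free-text verdict to the boolean stored as intent_ok.
intent_ok = verdict.choices[0].message.content.strip().lower().startswith("yes")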
INSERT INTO synthetic_results (vertical, ftl_ms, intent_ok, ts)
VALUES ('healthcare', 720, true, NOW());
Alertmanager alerts on 3 consecutive failures or FTL p95 > 1200ms.
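The latency half of that alert is a percentile check over a window of recent results. In production it would normally be a Prometheus rule; the sketch below only shows the arithmetic, using the nearest-rank definition of p95 (one of several common definitions). `p95` and `latency_alert` are hypothetical helper names.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math, 1-based
    return ordered[rank - 1]


def latency_alert(ftl_ms_window, limit_ms=1200):
    """True when the window's p95 latency exceeds the limit."""
    return p95(ftl_ms_window) > limit_ms


one_slow = [700] * 19 + [1500]       # a single outlier in 20 samples
two_slow = [700] * 18 + [1500] * 2   # two outliers in 20 samples
```

With 20 samples, one slow call does not move the p95 but two do. That makes the alert tolerant of a single blip and sensitive to a repeated one.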
Replay on regression. Every failed synthetic auto-creates a Linear ticket with the audio and the trace.
Q: Can I use Datadog Synthetics for voice?
A: Their browser test can hit a WebRTC page, but it is not a clean fit for SIP/PSTN. We use Datadog Synthetics for our HTTP APIs and a homemade harness for voice.
Q: How realistic should the test utterance be?
A: Use real recorded voices, not TTS. TTS hits the model differently and gives misleadingly high scores.
Q: Won't synthetics inflate my OpenAI bill?
A: We see ~$0.15/check on gpt-4o-realtime. Six verticals × 1440 checks/day × $0.15 = ~$1300/day across all. Worth it.
Q: How do I keep synthetics out of business metrics?
A: Tag every synthetic call with x-synthetic: true on the SIP INVITE; filter from analytics rollups.
Q: What about Checkly?
A: Great for HTTP/Playwright API checks (we use it for our /api/admin/* routes). Not voice.
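On keeping synthetics out of business metrics: once every synthetic call carries the x-synthetic: true header, the analytics side only needs a filter. The sketch assumes call records are dicts with a "headers" mapping. That shape, and the names `is_synthetic` and `business_calls`, are illustrative and not taken from CallSphere's schema.

```python
def is_synthetic(headers: dict) -> bool:
    """True when the call carries x-synthetic: true (SIP header names are case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "x-synthetic":
            return str(value).strip().lower() == "true"
    return False


def business_calls(calls):
    """Drop synthetic calls before computing analytics rollups."""
    return [c for c in calls if not is_synthetic(c.get("headers", {}))]


calls = [
    {"id": "a1", "headers": {"X-Synthetic": "true"}},
    {"id": "b2", "headers": {"User-Agent": "softphone"}},
    {"id": "c3", "headers": {}},
]
real = business_calls(calls)
```

Filtering at rollup time, rather than dropping the records, keeps synthetic calls available for debugging while leaving them out of conversion and volume numbers.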
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.