By Sagar Shankaran, Founder of CallSphere
When your AI voice agent gets one-way audio, missed DTMF, or codec mismatch, sngrep and Wireshark are still the fastest path to root cause in 2026. Here is the playbook.
The AI agent answered, the customer talked, and silence came back. No log line on the bridge, no error from OpenAI, no Twilio webhook telling you anything went wrong. This is when you reach for sngrep on the SBC and Wireshark on the bridge - in 2026 those two tools still beat any GUI dashboard for finding voice path bugs.
flowchart LR
UA[SIP UA] -- REGISTER --> Reg[Registrar]
UA -- INVITE --> Proxy[SIP Proxy]
Proxy --> Dispatcher[Kamailio dispatcher]
Dispatcher --> Worker1[FreeSWITCH worker]
Dispatcher --> Worker2[FreeSWITCH worker]
Worker1 --> AI[(AI agent)]
Worker2 --> AI
sngrep is a terminal SIP message viewer; it captures live SIP traffic on a chosen interface and renders call flow ladders that read like a whiteboard. It can replay captures from PCAP, write captures to PCAP, and it understands HEP (the Homer encapsulation protocol) for distributed tracing. Wireshark is the general packet analyzer; for VoIP you use the Telephony menu's "VoIP Calls" view to extract a SIP dialog and play back its RTP audio.
For AI voice debugging in 2026 you typically need both. sngrep on the SIP edge tells you whether the INVITE/200/ACK three-way handshake completed cleanly, what codec was negotiated, and which side sent the BYE. Wireshark on the bridge tells you whether RTP actually flowed in both directions, whether DTMF events got through, and whether SRTP encryption was engaged.
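When checking whether DTMF events got through, it helps to know what Wireshark is decoding. The sketch below parses the 4-byte telephone-event payload defined in RFC 4733, assuming the RTP header has already been stripped. The payload type number is negotiated in the SDP (often 101, but that is an assumption about your setup), and the names `DtmfEvent` and `parse_telephone_event` are illustrative.

```python
import struct
from dataclasses import dataclass

# RFC 4733 event codes 0-15 map onto the DTMF keypad.
DTMF_DIGITS = "0123456789*#ABCD"


@dataclass
class DtmfEvent:
    digit: str     # keypad symbol, or "?" for a non-DTMF event code
    end: bool      # E bit: set on the packets that close the event
    volume: int    # tone power, expressed as 0..63 (-dBm0)
    duration: int  # duration so far, in RTP timestamp units


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode the 4-byte RFC 4733 telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    digit = DTMF_DIGITS[event] if event < len(DTMF_DIGITS) else "?"
    return DtmfEvent(
        digit=digit,
        end=bool(flags & 0x80),  # top bit is the End flag
        volume=flags & 0x3F,     # low six bits are the volume
        duration=duration,
    )


# Digit 5, End bit set, volume 10, duration 800 units (100 ms at 8 kHz)
print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
```

If a capture shows the event packets but never one with the End bit set, the receiving side may never register the key press, which is one way "missed DTMF" appears on the wire.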
A typical sngrep session for an AI voice incident:
# Capture SIP on 5060, SIP over TLS on 5061, and WSS on 7443
# The BPF filter is passed as the trailing argument
sudo sngrep -d eth0 \
"(port 5060 or port 5061 or port 7443) and host edge.callsphere.ai"
# Inside sngrep: F7 to filter, Enter to view the ladder for the highlighted call
# Space to select dialogs, then F2 to save them to PCAP
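Once a dialog is saved, the SDP bodies in the INVITE and the 200 OK can also be compared in a script rather than by eye. The following is a minimal sketch, assuming you have already extracted the two SDP bodies as text. The function names are hypothetical, and it only reads the first `c=` line and the `m=audio` line. It checks RFC 1918 ranges explicitly because Python's `is_private` also flags documentation ranges such as 203.0.113.0/24.

```python
import ipaddress

# RFC 1918 private ranges, listed explicitly (see note above on is_private).
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def sdp_summary(sdp: str) -> dict:
    """Pull the connection address and audio payload types out of an SDP body."""
    address, payload_types = None, []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=") and address is None:
            # c=<nettype> <addrtype> <address>, e.g. "c=IN IP4 10.0.0.5"
            address = line.split()[-1].split("/")[0]
        elif line.startswith("m=audio"):
            # m=audio <port> <proto> <fmt> <fmt> ...
            payload_types = line.split()[3:]
    return {"address": address, "payload_types": payload_types}


def is_rfc1918(address) -> bool:
    """True when the address is a literal IP inside an RFC 1918 range."""
    try:
        ip = ipaddress.ip_address(address)
    except (ValueError, TypeError):
        return False  # hostname or missing c= line: cannot classify
    return any(ip in net for net in RFC1918)


def one_side_private(offer_sdp: str, answer_sdp: str) -> bool:
    """Flag dialogs where exactly one side advertises a private media address."""
    offer = is_rfc1918(sdp_summary(offer_sdp)["address"])
    answer = is_rfc1918(sdp_summary(answer_sdp)["address"])
    return offer != answer


offer = "v=0\r\nc=IN IP4 10.0.0.5\r\nm=audio 4000 RTP/AVP 0 101\r\n"
answer = "v=0\r\nc=IN IP4 203.0.113.42\r\nm=audio 12000 RTP/AVP 0 101\r\n"
print(sdp_summary(offer), one_side_private(offer, answer))
```

A flagged dialog is not automatically broken, since an SBC or media relay may rewrite the address later in the path, but it is a cheap first filter before opening the ladder.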
The ladder view shows you exactly which side sent BYE first, what 4xx/5xx codes appeared, and whether the SDP offer/answer matches. Common one-way-audio causes that show up in the ladder:
An SDP connection-address mismatch: the offer carries c=10.0.0.5 (private IP) and the answer c=203.0.113.42 (public). One side is sending RTP into a black hole.
For the RTP side, in Wireshark:
# Wireshark filter for the call's RTP
ip.addr == 203.0.113.42 and udp.port in {10000..20000}
# Telephony > RTP > RTP Streams shows packet count, jitter, loss
# Telephony > VoIP Calls plays the audio
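If the inbound RTP stream shows a packet count of 0 while the 200 OK SDP advertises an open RTP port, the caller's NAT or your SBC is dropping or misrouting media. If RTP arrives but the payload is pure silence (PCMU bytes of 0xFF, the μ-law digital-silence pattern rather than true comfort noise), something upstream, typically the carrier, is muting that direction.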
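The "RTP arrives but is silence" case can be checked in code as well as by ear. The sketch below parses the RTP header per RFC 3550 and reports whether a PCMU packet carries only digital-silence bytes. It assumes you already have raw RTP packets as bytes (for example, UDP payloads exported from a capture), the helper names are illustrative, and treating 0x7F as silence alongside 0xFF is an added assumption, since both decode to zero amplitude in μ-law.

```python
from typing import Iterable, Tuple

# mu-law bytes that decode to zero amplitude (+0 and -0).
PCMU_SILENCE = {0xFF, 0x7F}


def rtp_payload(packet: bytes) -> Tuple[int, bytes]:
    """Return (payload_type, payload) for one RTP version 2 packet."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = packet[0] & 0x0F
    has_extension = bool(packet[0] & 0x10)
    has_padding = bool(packet[0] & 0x20)
    payload_type = packet[1] & 0x7F
    offset = 12 + 4 * csrc_count
    if has_extension:
        # Extension header: 16-bit profile id, then length in 32-bit words
        ext_words = int.from_bytes(packet[offset + 2:offset + 4], "big")
        offset += 4 + 4 * ext_words
    # With the padding bit set, the last byte holds the padding length
    end = len(packet) - packet[-1] if has_padding else len(packet)
    return payload_type, packet[offset:end]


def is_pcmu_silence(packet: bytes) -> bool:
    """True when a PCMU (payload type 0) packet holds only silence bytes."""
    payload_type, payload = rtp_payload(packet)
    return payload_type == 0 and len(payload) > 0 and set(payload) <= PCMU_SILENCE


def silent_fraction(packets: Iterable[bytes]) -> float:
    """Share of packets in a stream that are pure PCMU silence."""
    packets = list(packets)
    if not packets:
        return 0.0
    return sum(is_pcmu_silence(p) for p in packets) / len(packets)
```

A stream with a silent fraction near 1.0 for the whole call points at muting upstream of the capture point. Short silent stretches are normal between utterances.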
For AI bridges specifically, capture the WebSocket side too:
# Wireshark filter for Twilio Media Streams
tcp.port == 443 and websocket
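The WebSocket frames are TLS-encrypted on the wire, so the websocket filter only matches once Wireshark can decrypt them. Have the process that terminates TLS write its session keys (set the SSLKEYLOGFILE environment variable, if its TLS library honors it), then load that file in Wireshark under Preferences > Protocols > TLS > (Pre)-Master-Secret log filename. Alternatively, run mitmproxy in front of the bridge, in dev environments only.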
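For a bridge written in Python, key logging can be enabled on the TLS context directly. This is a minimal sketch, assuming Python 3.8+ built against OpenSSL 1.1.1 or newer and assuming the Python process (not a load balancer in front of it) is the TLS endpoint. The function name and the `server` flag are illustrative.

```python
import ssl


def tls_context_with_keylog(keylog_path: str, server: bool = False) -> ssl.SSLContext:
    """Build a TLS context that appends session secrets to keylog_path."""
    purpose = ssl.Purpose.CLIENT_AUTH if server else ssl.Purpose.SERVER_AUTH
    context = ssl.create_default_context(purpose)
    # Secrets are written in the NSS key log format that Wireshark reads.
    # Anyone holding this file can decrypt the captured sessions, so keep
    # it to dev or tightly controlled incident captures.
    context.keylog_filename = keylog_path
    return context
```

The returned context can be passed wherever the bridge opens or accepts TLS connections. Note that `ssl.create_default_context()` also enables key logging on its own when SSLKEYLOGFILE is set in the environment, so the explicit assignment is mainly useful when you want the path controlled in code.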
CallSphere uses Twilio Programmable Voice across all six verticals. For incident response, sngrep runs on a jump host with HEP capture from our bridge nodes; Wireshark runs locally for deeper RTP analysis when needed. Healthcare AI on FastAPI :8084 writes every WebSocket event to structured logs, so we often resolve incidents without packet capture, but for codec or jitter complaints sngrep is the first tool. Sales Calling AI's 5 concurrent outbound calls per tenant occasionally surface NAT-related one-way audio on cell-phone destinations; reproducing the call under sngrep and Wireshark is how we triage. After-Hours AI's simultaneous call and SMS with a 120-second timeout has a different failure mode (call not answered vs. call answered with no audio); we instrument both. Across 37 agents, 90+ tools, and 115+ DB tables (HIPAA + SOC 2; $149/$499/$1499 pricing, 14-day trial, 22% affiliate), the SOP for any "audio missing" report is: pull the last 30 minutes of HEP, run sngrep on the affected dialog, and escalate to Wireshark if RTP shows zeros.
Does Twilio give me SIP traces? The Voice Insights product gives you a useful approximation - jitter, packet loss, MOS. For raw SIP you need the underlying signaling, which Twilio does not expose to customers; you debug from your bridge side.
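Can sngrep handle TLS-encrypted SIP? Yes, if you give it the SBC's private key (the -k option), which generally only works when the session does not use a forward-secret key exchange; otherwise you see encrypted bytes. HEP sent from inside the SBC carries already-decrypted SIP.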
Are there AI-assisted SIP debuggers in 2026? A few startups (Sipfront, Cekura) ship LLM-driven incident summaries on top of SIP traces. They lower the barrier for new on-call engineers but do not replace direct trace reading for hard cases.
What about WebRTC debugging? Use chrome://webrtc-internals for browser-side issues; on server-side captures, use Wireshark's STUN, DTLS, and RTP dissectors.
Do I need SIP stack-level expertise on the team? For a high-volume AI voice product, yes. SIP gotchas account for the bulk of P1 voice incidents; one engineer who can read ladders in their sleep saves the team many hours.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.