Tracing OpenAI Realtime Calls End-to-End
OpenAI Realtime traces look great in the OpenAI dashboard but vanish when the call leaves their servers. Here's how to stitch SIP, WebRTC, your tools, and Realtime into one trace.
TL;DR — OpenAI's Traces dashboard ends at OpenAI. To trace a real voice call you need to inject your own `traceparent` and join SIP, WebRTC media, model events, and tools into one root.
What goes wrong
```mermaid
flowchart TD
  Client[Client] --> Edge[Cloudflare Worker]
  Edge -->|WS upgrade| DO[Durable Object]
  DO --> AI[(OpenAI Realtime WS)]
  AI --> DO
  DO --> Client
  DO -.hibernation.-> Storage[(Persisted state)]
```

The OpenAI Agents SDK emits beautiful traces — model calls, tool calls, handoffs — into OpenAI's dashboard. The Realtime API does too, via session-level traces. Both stop at OpenAI's edge. Your phone-system layer (Twilio, Telnyx, your SIP trunk), your media transport (WebRTC), and your tool executors (databases, CRM, calendars) sit outside their view. When a call goes wrong you're flipping between three dashboards and a Postgres query, manually correlating timestamps.
The fix is to make your trace the parent and have OpenAI's traces become children. Inject a `traceparent` header on the WebSocket upgrade or HTTPS POST that opens the Realtime session, and propagate that ID through your tool calls, RAG lookups, and SIP signaling.
How to monitor
Build a single root span per call:
1. Root: `callsphere.call` (one per phone number ringing in)
2. Child: `sip.invite` (Twilio webhook → your gateway)
3. Child: `webrtc.peer_connection` (media negotiation)
4. Child: `gen_ai.realtime.session` (the OpenAI session — they emit nested spans inside)
5. Children of (4): `gen_ai.tool.execute` per tool, `gen_ai.client` per model turn
Use OTel context propagation. The Realtime API doesn't accept a `traceparent` header directly, but you can stash your trace ID in the session metadata and re-attach it on the model side.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
CallSphere stack
CallSphere runs Realtime for the Healthcare and Real Estate verticals. The Healthcare FastAPI service on :8084 answers Twilio webhooks, mints a Realtime ephemeral key, and proxies the SDP through our edge. We open a root `callsphere.call` span when Twilio fires the inbound webhook and store the trace ID in the Realtime session metadata. Tool calls (insurance verification, EHR lookup) reuse the same trace context via OTel's HTTP propagator.
Real Estate's six-container NATS pod is harder: the trace context crosses six microservices over NATS. We wrote a custom NATS header propagator (NATS messages don't carry HTTP headers natively), so the trace ID survives every hop. The Sales WebSocket layer (PM2 + 8 workers) and the after-hours Bull/Redis queue use the same propagator pattern. The result: one click in Honeycomb shows the entire call, including the OpenAI-internal spans we pull from their trace export.
We see ~480 ms first-token-out on Realtime calls; the trace tells us exactly how that 480 ms splits between us and OpenAI. The $1,499 enterprise tier (see /pricing) includes per-call trace links in the call-recording UI.
Implementation
- Mint the trace ID at call ingress:

```python
from fastapi import FastAPI, Request
from opentelemetry import trace

app = FastAPI()
tracer = trace.get_tracer("callsphere")

@app.post("/twilio/inbound")
async def inbound(request: Request):
    # Root span for the whole call; everything else parents under it.
    with tracer.start_as_current_span("callsphere.call") as root:
        trace_id = root.get_span_context().trace_id
        # Stash the 32-hex-char trace ID in Realtime session metadata.
        ephemeral = await mint_realtime_key(metadata={"trace_id": format(trace_id, "032x")})
        return twiml_with_session(ephemeral)
```
- Read OpenAI's trace export (their Traces API supports webhook export as of Q1 2026) and graft their spans under your root using the metadata `trace_id`.
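A sketch of that grafting step, assuming a webhook payload with a `spans` array carrying the `trace_id` we stashed in session metadata — the payload shape here is illustrative, not OpenAI's documented schema:

```python
def graft_openai_spans(export_payload, roots_by_trace_id):
    """Re-parent spans from OpenAI's export under our root spans.

    Field names in `export_payload` are assumptions; adapt them to
    the actual export format your account receives.
    """
    grafted = []
    for span in export_payload.get("spans", []):
        our_trace_id = (span.get("metadata") or {}).get("trace_id")
        root = roots_by_trace_id.get(our_trace_id)
        if root is None:
            continue  # not one of our calls; skip it
        grafted.append({
            "trace_id": our_trace_id,           # adopt our trace ID
            "parent_span_id": root["span_id"],  # hang under our root
            "name": span.get("name", "gen_ai.realtime.span"),
            "start": span.get("start"),
            "end": span.get("end"),
        })
    return grafted
```

Re-emit the grafted spans through your own exporter so they land in the same backend as the rest of the call.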
- Propagate over NATS with a custom header carrier:

```python
from opentelemetry.propagate import inject

async def publish_with_trace(nats, subject, payload):
    # inject() writes W3C traceparent/tracestate into the dict,
    # which nats-py sends as NATS message headers.
    headers = {}
    inject(headers)
    await nats.publish(subject, payload, headers=headers)
```
- Tag tool spans with `gen_ai.tool.name` and `gen_ai.tool.call.id` so they line up under the model turn that requested them.

Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
- Persist the call_id ↔ trace_id map in Postgres (we use the `calls` table) so support engineers can paste a phone number and get the trace.
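A minimal sketch of that lookup path, using sqlite3 as a stand-in for Postgres (column names are illustrative; only the `calls` table name comes from our schema):

```python
import sqlite3

def record_call(conn, call_sid, phone, trace_id):
    # Written at ingress, right after the root span is opened.
    conn.execute(
        "INSERT INTO calls (call_sid, phone, trace_id) VALUES (?, ?, ?)",
        (call_sid, phone, trace_id),
    )

def latest_trace_for_phone(conn, phone):
    # What support runs: phone number in, trace ID out.
    row = conn.execute(
        "SELECT trace_id FROM calls WHERE phone = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (phone,),
    ).fetchone()
    return row[0] if row else None
```

In Postgres you'd swap the `?` placeholders for `%s` and order by an inserted-at timestamp instead of `rowid`.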
FAQ
Q: Does the Realtime API natively emit OTel spans?
A: As of Q1 2026, no — it emits OpenAI-format traces accessible via the dashboard and an export webhook. You graft them under your root.
Q: How do I trace TURN/STUN delays?
A: We instrument the WebRTC client with timing events (`onicegatheringstatechange`, etc.) and emit them as span events on `webrtc.peer_connection`.
Q: Can I trace barge-in events?
A: Yes — emit a span event `gen_ai.audio.barge_in` with `audio.elapsed_ms` so you can see how often users interrupt.
Q: Does sampling break voice traces?
A: Tail-sample at the collector and always keep traces with errors or first-token latency > 1500 ms. Head-sampling will drop the calls you most need.
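As an OTel Collector config sketch, that tail-sampling policy might look like the following (the `gen_ai.realtime.first_token_ms` attribute name is our own, not a standard semantic convention):

```yaml
processors:
  tail_sampling:
    decision_wait: 15s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow-first-token
        type: numeric_attribute
        numeric_attribute:
          key: gen_ai.realtime.first_token_ms
          min_value: 1500
      - name: sample-the-rest
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```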
Q: Is this worth it for a 5-call/day startup?
A: No. Use the OpenAI dashboard until you're past 1k calls/day. Try the 14-day trial first.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.