AI Infrastructure

Tracing OpenAI Realtime Calls End-to-End

OpenAI Realtime traces look great in the OpenAI dashboard but vanish when the call leaves their servers. Here's how to stitch SIP, WebRTC, your tools, and Realtime into one trace.

TL;DR — OpenAI's Traces dashboard ends at OpenAI. To trace a real voice call you need to inject your own traceparent and join SIP, WebRTC media, model events, and tools into one root.

What goes wrong

```mermaid
flowchart TD
  Client[Client] --> Edge[Cloudflare Worker]
  Edge -->|WS upgrade| DO[Durable Object]
  DO --> AI[(OpenAI Realtime WS)]
  AI --> DO
  DO --> Client
  DO -.hibernation.-> Storage[(Persisted state)]
```

CallSphere reference architecture

The OpenAI Agents SDK emits beautiful traces — model calls, tool calls, handoffs — into OpenAI's dashboard. The Realtime API does too, via session-level traces. Both stop at OpenAI's edge. Your phone-system layer (Twilio, Telnyx, your SIP trunk), your media transport (WebRTC), and your tool executors (databases, CRM, calendars) sit outside their view. When a call goes wrong you're flipping between three dashboards and a Postgres query, manually correlating timestamps.

The fix is to make your trace the parent and have OpenAI's traces become children. Inject a traceparent header on the WebSocket upgrade or HTTPS POST that opens the Realtime session, and propagate that ID through your tool calls, RAG lookups, and SIP signaling.
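In practice OTel's `inject()` writes the header for you, but the W3C traceparent format is worth internalizing, since it's the ID you'll be grepping for across systems. A minimal sketch — `make_traceparent` is a hand-rolled illustration, and the WebSocket usage in the comments assumes names (`REALTIME_URL`, `additional_headers`) that are not from OpenAI's docs:

```python
def make_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Format a W3C traceparent header: version-traceid-spanid-flags."""
    return f"00-{trace_id:032x}-{span_id:016x}-{'01' if sampled else '00'}"

# Hypothetical usage when opening the Realtime WebSocket:
# headers = {"Authorization": f"Bearer {key}",
#            "traceparent": make_traceparent(root_trace_id, root_span_id)}
# ws = await websockets.connect(REALTIME_URL, additional_headers=headers)
```

Even if the far side ignores the header, every hop you control (edge, gateway, tool executors) can extract it and parent its spans correctly.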

How to monitor

Build a single root span per call:

  1. Root: callsphere.call (one per phone number ringing in)
  2. Child: sip.invite (Twilio webhook → your gateway)
  3. Child: webrtc.peer_connection (media negotiation)
  4. Child: gen_ai.realtime.session (the OpenAI session — they emit nested spans inside)
  5. Children of (4): gen_ai.tool.execute per tool, gen_ai.client per model turn

Use OTel context propagation. The Realtime API doesn't accept traceparent directly, but you can stash your trace ID in the session metadata and re-attach on the model side.
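A sketch of the round-trip, assuming our own convention of a `trace_id` key in session metadata (this is not an OpenAI field — it's whatever you stashed at session creation):

```python
from typing import Optional

def restore_trace_id(metadata: dict) -> Optional[int]:
    """Recover the 128-bit trace ID we stashed in Realtime session metadata."""
    raw = metadata.get("trace_id")
    return int(raw, 16) if raw else None

# On the consumer side you'd rebuild a remote OTel SpanContext from it, e.g.:
# ctx = SpanContext(trace_id=restore_trace_id(meta), span_id=new_span_id,
#                   is_remote=True, trace_flags=TraceFlags(0x01))
```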


CallSphere stack

CallSphere runs Realtime for the Healthcare and Real Estate verticals. The Healthcare FastAPI service on :8084 answers Twilio webhooks, mints a Realtime ephemeral key, and proxies the SDP through our edge. We open a root callsphere.call span when Twilio fires the inbound webhook and stash the trace ID in the Realtime session metadata. Tool calls (insurance verification, EHR lookup) reuse the same trace context via OTel's HTTP propagator.

Real Estate's 6-container NATS pod is harder: the trace context crosses six microservices over NATS. We wrote a custom NATS header propagator (NATS doesn't carry HTTP-style headers natively) so the trace ID survives each hop. The Sales WebSocket layer (PM2, 8 workers) and the after-hours Bull/Redis queue use the same propagator pattern. The result: one click in Honeycomb shows the entire call, including the OpenAI-internal spans we pull from their trace export.

We see ~480 ms first-token-out on Realtime calls; the trace tells us exactly which portion of that 480 ms came from us versus OpenAI. The $1,499 enterprise tier (see /pricing) includes per-call trace links in the call-recording UI.

Implementation

  1. Mint the trace ID at call ingress:

```python
from fastapi import FastAPI, Request
from opentelemetry import trace

app = FastAPI()
tracer = trace.get_tracer("callsphere")

@app.post("/twilio/inbound")
async def inbound(request: Request):
    with tracer.start_as_current_span("callsphere.call") as root:
        # Hex-encode the 128-bit trace ID so it survives as a string in session metadata
        trace_id = format(root.get_span_context().trace_id, "032x")
        ephemeral = await mint_realtime_key(metadata={"trace_id": trace_id})
        return twiml_with_session(ephemeral)
```
  2. Read OpenAI's trace export (their Traces API supports webhook export as of Q1 2026) and graft their spans under your root using the metadata trace_id.
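A sketch of the grafting step. The export payload shape here (a `spans` list whose entries carry the session `metadata`) is an assumption for illustration, not a documented OpenAI schema:

```python
def graft_exported_spans(export: dict, calls_by_trace: dict) -> list:
    """Attach exported spans to our root traces via the stashed trace_id."""
    grafted = []
    for span in export.get("spans", []):
        trace_id = span.get("metadata", {}).get("trace_id")
        if trace_id in calls_by_trace:
            # Re-parent under our root before forwarding to the collector
            span["parent_trace_id"] = trace_id
            grafted.append(span)
    return grafted
```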

  3. Propagate over NATS with a custom header carrier:

```python
from opentelemetry.propagate import inject

async def publish_with_trace(nc, subject: str, payload: bytes):
    headers = {}
    inject(headers)  # writes traceparent/tracestate from the current span context
    await nc.publish(subject, payload, headers=headers)
```
  4. Tag tool spans with gen_ai.tool.name and gen_ai.tool.call.id so they line up under the model turn that requested them.
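A small helper keeps the attribute names consistent across tool executors; `tool_span_attributes` is our own convention, applied with `span.set_attributes(...)`:

```python
def tool_span_attributes(tool_name: str, call_id: str) -> dict:
    """OTel gen_ai semantic-convention attributes for a tool-execution span."""
    return {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": tool_name,
        "gen_ai.tool.call.id": call_id,
    }
```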

  5. Persist the call_id ↔ trace_id map in Postgres (we use the calls table) so support engineers can paste a phone number and get the trace.

FAQ

Q: Does the Realtime API natively emit OTel spans? A: As of Q1 2026, no — it emits OpenAI-format traces accessible via the dashboard and an export webhook. You graft them under your root.

Q: How do I trace TURN/STUN delays? A: We instrument the WebRTC client with timing events (onicegatheringstatechange, etc.) and emit them as span events on webrtc.peer_connection.

Q: Can I trace barge-in events? A: Yes — emit a span event gen_ai.audio.barge_in with audio.elapsed_ms so you can see how often users interrupt.

Q: Does sampling break voice traces? A: Tail-sample at the collector and always keep traces with errors or first-token latency > 1500 ms. Head-sampling will drop exactly the calls you most need.
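A tail-sampling policy along these lines works with the OTel Collector contrib `tail_sampling` processor; the first-token attribute key below is our own naming, not a standard:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow-first-token
        type: numeric_attribute
        numeric_attribute:
          key: gen_ai.server.time_to_first_token   # span attribute set by our code
          min_value: 1500
```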

Q: Is this worth it for a 5-call/day startup? A: No. Use the OpenAI dashboard until you're past 1k calls/day. Try the 14-day trial first.


