OpenTelemetry GenAI Conventions for AI Agents in 2026
The OTel GenAI semantic conventions exited experimental for client spans in early 2026. Here's how CallSphere instruments 37 voice and chat agents with gen_ai.* attributes that work across Datadog, Honeycomb, and Grafana.
TL;DR: In 2026 you don't write custom span attributes for "model name" anymore. You use `gen_ai.request.model` and your traces work in every backend that supports OTel.
What goes wrong
For two years every team rolled its own LLM-tracing schema. `model`, `llm.model`, `openai.model`, `anthropic.model` all meant the same thing, and none queried the same way. A platform team that wanted to chart "tokens spent per model per service" had to write a per-vendor adapter for every framework. By late 2025, the OTel GenAI SIG stabilized client spans and metrics, and most agent frameworks (OpenAI Agents SDK, LangChain, LlamaIndex, AutoGen) shipped emitters by Q1 2026.
The trap is that the agent span spec is still experimental, and most production deployments are multi-step agents, not single LLM calls. If you only instrument the chat-completions span, you miss the tool-call planning, the handoffs between sub-agents, and the agent loop itself. You end up with a trace that looks fast and an experience that feels slow.
How to monitor
Use three layers of OTel GenAI conventions:
- `gen_ai.client` spans (stable): one per LLM round-trip. Attributes: `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons`.
- `gen_ai.agent` spans (experimental): one per agent invocation. Attributes: `gen_ai.agent.name`, `gen_ai.agent.id`, `gen_ai.agent.description`.
- `gen_ai.tool.*` events: attached to agent spans. They capture every tool call the agent makes and its result.
Standard metrics in 2026: `gen_ai.client.token.usage` (histogram) and `gen_ai.client.operation.duration` (histogram). Datadog, Honeycomb, Grafana, and OpenObserve all auto-detect these.
CallSphere stack
We run 37 agents across six verticals on k3s with Cloudflare Tunnel. Every agent emits OTel GenAI spans through an OpenTelemetry Collector deployed as a DaemonSet. The collector tail-samples to 5% (100% for errors and slow turns) and forwards to two backends:
- Honeycomb for tracing (developer ergonomics on agent traces)
- Prometheus + Grafana for SLO dashboards
The Healthcare FastAPI service on `:8084` decorates each route with our `@trace_genai_agent` decorator, which auto-emits a parent agent span and child client spans. The Real Estate 6-container pod sends spans across NATS subjects and reuses the trace context header, so a single call shows as one trace across all six containers. Sales WebSocket workers (PM2) batch-export every 5 seconds. The After-hours Bull/Redis queue worker emits one trace per job; Bull's job ID becomes the trace ID prefix.
Plans on /pricing include trace export to your own OTel collector at the $499 tier; $1499 enterprise gets a dedicated tenant in our Honeycomb. Try it on the 14-day trial.
Implementation
- Install the OTel SDK for your framework. For Python:

```shell
pip install opentelemetry-distro \
  opentelemetry-instrumentation-openai \
  opentelemetry-exporter-otlp
```
- Wrap your agent loop with explicit agent spans:

```python
from opentelemetry import trace

tracer = trace.get_tracer("callsphere.healthcare")

def run_agent(user_input: str):
    with tracer.start_as_current_span(
        "gen_ai.agent.invoke",
        attributes={
            "gen_ai.agent.name": "healthcare_intake",
            "gen_ai.agent.id": "hc-intake-v3",
            "gen_ai.system": "openai",
        },
    ) as span:
        # Tool calls and LLM calls happen inside this span;
        # auto-instrumentation adds the child gen_ai.client spans.
        result = agent_loop(user_input)
        span.set_attribute("gen_ai.completion.text", result.text[:512])
        return result
```
- Configure the collector to validate semconv:

```yaml
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - keep_keys(attributes, ["gen_ai.request.model", "gen_ai.system"])
```
- Build dashboards on the standard names. A "tokens per model per route" panel that uses `gen_ai.request.model` works for OpenAI, Anthropic, and Cohere with no code changes.
- Tail-sample. Keep 100% of error traces and 100% of traces with first-token latency (FTL) over 1500 ms; sample 5% of everything else. Tail-sampling at the collector saves 95% of storage cost.
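The tail-sampling policy above reduces to a small decision function. A sketch in plain Python (the collector implements this in its `tail_sampling` processor; the function and parameter names here are illustrative):

```python
import random

def keep_trace(has_error: bool, ftl_ms: float,
               base_rate: float = 0.05, rng=random.random) -> bool:
    """Mirror the collector tail-sampling policy: keep all errors,
    keep all slow first-token traces, sample the rest at base_rate."""
    if has_error:
        return True            # 100% of error traces
    if ftl_ms > 1500:
        return True            # 100% of slow-first-token traces
    return rng() < base_rate   # 5% of everything else
```

At a steady state where errors and slow turns are rare, this keeps roughly 5% of traces, which is where the ~95% storage saving comes from.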
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
FAQ
Q: Are GenAI agent spans stable yet? A: Client spans and metrics are stable. Agent and framework spans are experimental but have been very stable in practice through Q1 2026.
Q: Do I need a vendor SDK on top of OTel? A: No. OTel + auto-instrumentation covers 80% of needs. Add a vendor SDK (Langfuse, LangSmith) if you want their UI on top — they all consume OTel.
Q: How do I keep PII out of the spans?
A: Use the collector's redaction processor or run Microsoft Presidio in a sidecar before export. Our /industries/healthcare build does this in the collector.
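If you scrub before export in application code instead of the collector, the shape of the transform is simple. A minimal, regex-based sketch (not Presidio; the patterns are deliberately crude examples and would need tuning for real PII coverage):

```python
import re

# Example patterns only: SSN first so it wins over the looser phone pattern.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
    re.compile(r"\b\+?\d[\d\s().-]{8,}\d\b"),    # phone-number-ish digit runs
]

def redact_attributes(attrs: dict) -> dict:
    """Return a copy of span attributes with string values scrubbed."""
    out = {}
    for key, value in attrs.items():
        if isinstance(value, str):
            for pattern in PII_PATTERNS:
                value = pattern.sub("[REDACTED]", value)
        out[key] = value
    return out
```

Hook this into a span processor's `on_end` (or keep it in the collector, as we do) so prompts and transcripts never leave the cluster unscrubbed.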
Q: Will my Datadog APM see this? A: Yes. Datadog LLM Observability natively maps OTel GenAI semconv to its product UI as of late 2025.
Q: What about voice-specific attributes?
A: We add callsphere.audio.first_token_ms and callsphere.audio.barge_in_count as custom attributes — namespaced so they don't collide with future OTel additions.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.