How to Add Voice Agent Observability with Langfuse and OpenTelemetry
Trace every turn, every tool call, every LLM round-trip with OpenTelemetry shipped to Langfuse. Find latency outliers, debug hallucinations, and watch p95 stay under 800ms.
TL;DR — Voice agents fail in three places: STT, LLM, TTS. Without per-component tracing you'll never know which one slowed a call. OpenTelemetry → Langfuse gives you span-level visibility in 30 lines of init code.
What you'll build
An instrumented voice bridge that emits OpenTelemetry spans for every turn (stt → llm → tts), tags them with the call ID, and ships them to Langfuse. You'll be able to open a single call in the Langfuse UI and see every span timing, every token count, and every prompt/response — including which turn pushed p95 latency over 800ms.
Prerequisites
- Langfuse Cloud account or self-hosted (open-source).
- Node 20+ or Python 3.11+.
- `npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http`.
- A working voice agent (any of posts 1–4).
- `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`.
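Before wiring anything up, it helps to fail fast when those keys are missing rather than silently exporting unauthenticated traces. A minimal sketch (the `requireEnv` helper is ours, not part of any SDK):

```typescript
// Hypothetical helper: throw at startup if a required env var is unset.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

// Usage at the top of otel.ts:
// const publicKey = requireEnv("LANGFUSE_PUBLIC_KEY");
// const secretKey = requireEnv("LANGFUSE_SECRET_KEY");
```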
Architecture
```mermaid
flowchart LR
  AGT[Voice Agent] -- spans --> SDK[OTel SDK]
  SDK -- OTLP HTTP --> LF[Langfuse /api/public/otel]
  LF --> UI[Langfuse UI]
  LF --> EV[Eval suite]
```
Step 1 — OTel init shipping to Langfuse
```ts
// otel.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { resourceFromAttributes } from "@opentelemetry/resources";

const auth = Buffer.from(
  `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
).toString("base64");

const sdk = new NodeSDK({
  resource: resourceFromAttributes({ "service.name": "voice-bridge" }),
  traceExporter: new OTLPTraceExporter({
    url: "https://cloud.langfuse.com/api/public/otel/v1/traces",
    headers: { Authorization: `Basic ${auth}` },
  }),
});

sdk.start();
```
Import this at the top of your entrypoint before any other imports.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 2 — Wrap each turn in a span
```ts
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("voice-bridge");

async function handleTurn(callId: string, userText: string) {
  return tracer.startActiveSpan("turn", { attributes: { callId } }, async (turnSpan) => {
    try {
      // STT: the transcript is already in hand here; record it on its own span.
      const stt = await tracer.startActiveSpan("stt", async (s) => {
        const text = userText;
        s.setAttribute("text", text);
        s.end();
        return text;
      });
      // LLM: annotate with gen_ai.* so Langfuse renders model, tokens, and response.
      const llm = await tracer.startActiveSpan("llm", async (s) => {
        const r = await openai.chat.completions.create({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: stt }],
        });
        s.setAttributes({
          "gen_ai.system": "openai",
          "gen_ai.request.model": "gpt-4o-mini",
          "gen_ai.usage.prompt_tokens": r.usage?.prompt_tokens ?? 0,
          "gen_ai.usage.completion_tokens": r.usage?.completion_tokens ?? 0,
          "gen_ai.response.text": r.choices[0].message.content ?? "",
        });
        s.end();
        return r.choices[0].message.content!;
      });
      // TTS: record output size so you can correlate audio length with latency.
      const tts = await tracer.startActiveSpan("tts", async (s) => {
        const audio = await synthesize(llm);
        s.setAttribute("audio.bytes", audio.length);
        s.end();
        return audio;
      });
      turnSpan.end();
      return tts;
    } catch (err) {
      turnSpan.recordException(err as Error);
      turnSpan.end();
      throw err;
    }
  });
}
```
Step 3 — Use Langfuse semantic conventions
Use the gen_ai.* attribute namespace so Langfuse renders prompts, responses, and token counts in its UI without custom mapping. Important keys:
- `gen_ai.system` (`openai` | `anthropic` | `elevenlabs`)
- `gen_ai.request.model`
- `gen_ai.request.temperature`
- `gen_ai.usage.prompt_tokens`
- `gen_ai.usage.completion_tokens`
- `gen_ai.response.text`
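To avoid retyping these keys on every LLM span, you can centralize them in a tiny builder. A sketch assuming an OpenAI-style `usage` object (the `genAiAttributes` helper is ours, not a Langfuse API):

```typescript
// Hypothetical helper: map an OpenAI-style usage object onto the
// gen_ai.* keys that Langfuse renders without custom mapping.
function genAiAttributes(
  model: string,
  usage?: { prompt_tokens?: number; completion_tokens?: number }
) {
  return {
    "gen_ai.system": "openai",
    "gen_ai.request.model": model,
    "gen_ai.usage.prompt_tokens": usage?.prompt_tokens ?? 0,
    "gen_ai.usage.completion_tokens": usage?.completion_tokens ?? 0,
  };
}
```

Inside an LLM span this collapses to `s.setAttributes(genAiAttributes("gpt-4o-mini", r.usage))`.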
Step 4 — Tag tool calls
```ts
async function callTool(name: string, args: object) {
  return tracer.startActiveSpan(`tool.${name}`, async (s) => {
    s.setAttributes({
      "tool.name": name,
      "tool.args": JSON.stringify(args),
    });
    const t0 = Date.now();
    try {
      const result = await registry[name](args);
      s.setAttribute("tool.latency_ms", Date.now() - t0);
      // Cap the recorded result so one verbose tool can't bloat the span.
      s.setAttribute("tool.result", JSON.stringify(result).slice(0, 1000));
      return result;
    } catch (e) {
      s.recordException(e as Error);
      throw e;
    } finally {
      s.end();
    }
  });
}
```
Step 5 — Group spans into one Langfuse "trace" per call
Set the trace ID at call start so every span (turn 1, turn 2, ..., post-call analytics) joins one trace:
```ts
import { context, trace } from "@opentelemetry/api";

const callTrace = tracer.startSpan("call", { attributes: { callId, agentId } });
const ctx = trace.setSpan(context.active(), callTrace);
await context.with(ctx, async () => {
  // run the whole call here; every span started inside joins this trace
});
callTrace.end();
```
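If the context propagation feels like magic, it is ordinary Node machinery underneath: on Node, OTel's `context.with` is backed by `AsyncLocalStorage`. A stripped-down sketch of the same idea (the `withCall` / `currentCallId` names are ours):

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Miniature version of what context.with does: code running inside
// withCall() can read the active call's ID without parameter threading.
const store = new AsyncLocalStorage<{ callId: string }>();

function withCall<T>(callId: string, fn: () => T): T {
  return store.run({ callId }, fn);
}

function currentCallId(): string | undefined {
  return store.getStore()?.callId;
}
```

This is why child spans "just know" their parent: the SDK stores the active span in the same kind of async-local slot.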
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 6 — Build alerts on p95 latency
In Langfuse Dashboards:
- Filter `span.name = turn`, last 24h.
- Plot p50/p95/p99 of `duration`.
- Alert if p95 > 1200ms for 5 minutes.
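If you want to sanity-check the dashboard numbers, or alert from your own code, a nearest-rank percentile is a few lines. A sketch only; this is not necessarily the exact estimator Langfuse uses:

```typescript
// Nearest-rank percentile over a list of turn durations in ms.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("empty input");
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}
```

Feed it the last 24h of `turn` span durations and compare against the 1200ms threshold before paging anyone.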
Step 7 — Eval datasets from real calls
Click "Add to dataset" in Langfuse on any failed call to build a regression dataset. Run nightly evals and gate prompt PRs on quality (see post 13).
Common pitfalls
- No flush on shutdown: spans are buffered. Call `sdk.shutdown()` on SIGTERM.
- Spans not nesting: forgetting `startActiveSpan` (which sets context) vs `startSpan` (which doesn't).
- PHI in span attributes: redact transcripts before `s.setAttribute("text", ...)` if you're under HIPAA.
- Cardinality explosion: don't set `callId` as a metric label — use it as a span attribute only.
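For the PHI pitfall, a first line of defense is a regex pass over the transcript before it touches a span attribute. A starting-point sketch only; the patterns are illustrative and nowhere near a complete HIPAA control:

```typescript
// Illustrative redaction pass: masks obvious SSN, email, and long-digit
// patterns. Real PHI handling needs a reviewed, vertical-specific policy.
function redactTranscript(text: string): string {
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]")
    .replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, "[EMAIL]")
    .replace(/\b\d{10,}\b/g, "[NUMBER]");
}
```

Then write `s.setAttribute("text", redactTranscript(text))` instead of attaching the raw transcript.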
How CallSphere does this in production
CallSphere ships every span — turn, STT, LLM, TTS, tool — to a self-hosted Langfuse via OpenTelemetry. Healthcare runs PHI redaction in a span processor before export. The eval dashboard surfaces p95 latency per vertical and alerts when any agent crosses 1.5s. Real-time observability is part of the platform; trial it.
FAQ
Langfuse vs LangSmith vs Phoenix? All three ingest OTel traces; pick based on price and self-hosting needs. Langfuse is open-source and OTel-native.
Cost? Cloud free tier: 50k observations/mo. Self-host: just your DB.
Can I trace WebRTC sessions? Yes — instrument server-side handlers; client-side, use the OTel browser SDK and a CORS-enabled OTLP collector.
Sampling? Sample call traces at 100% for the first month, then drop to 10% with always-on for failures.
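That sampling policy (always keep failures, sample the rest) is easy to express as a single decision function. A sketch; the function and its shape are ours, not an OTel sampler API:

```typescript
// Keep every failed call; keep a ratio of healthy calls.
// rnd is injectable so the decision stays testable.
function shouldExport(
  call: { hasError: boolean },
  ratio = 0.1,
  rnd: () => number = Math.random
): boolean {
  return call.hasError || rnd() < ratio;
}
```

In a real deployment you would wire this into a tail-sampling processor so the error status is known before the keep/drop decision.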
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.