TL;DR — Voice agents fail in three places: STT, LLM, TTS. Without per-component tracing you'll never know which one slowed a call. OpenTelemetry → Langfuse gives you span-level visibility in 30 lines of init code.

What you'll build

An instrumented voice bridge that emits OpenTelemetry spans for every turn (stt → llm → tts), tags them with the call ID, and ships them to Langfuse. You'll be able to open a single call in the Langfuse UI and see every span timing, every token count, and every prompt/response — including which turn pushed p95 latency over 800ms.

Prerequisites

Langfuse Cloud account or self-hosted (open-source).
Node 20+ or Python 3.11+.
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/resources.
Working voice agent (any of posts 1–4).
LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY.

Architecture

flowchart LR
  AGT[Voice Agent] -- spans --> SDK[OTel SDK]
  SDK -- OTLP HTTP --> LF[Langfuse /api/public/otel]
  LF --> UI[Langfuse UI]
  LF --> EV[Eval suite]

Step 1 — OTel init shipping to Langfuse

```ts
// otel.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { resourceFromAttributes } from "@opentelemetry/resources";

const auth = Buffer.from(
  `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
).toString("base64");

const sdk = new NodeSDK({
  resource: resourceFromAttributes({ "service.name": "voice-bridge" }),
  traceExporter: new OTLPTraceExporter({
    url: "https://cloud.langfuse.com/api/public/otel/v1/traces",
    headers: { Authorization: `Basic ${auth}` },
  }),
});
sdk.start();
```

Import this at the top of your entrypoint before any other imports.
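
If either key is unset, the template string in Step 1 encodes the literal text "undefined" and the exporter fails only at request time. A minimal sketch of a fail-fast helper follows; the name `langfuseOtlpOptions` and the `baseUrl` parameter are assumptions for illustration, not part of any Langfuse or OpenTelemetry API.

```ts
// Hypothetical helper: builds the exporter options used in otel.ts and
// throws early when a key is missing. `baseUrl` is an assumption; point it
// at your own host if you self-host Langfuse.
type Env = Record<string, string | undefined>;

interface OtlpOptions {
  url: string;
  headers: { Authorization: string };
}

function langfuseOtlpOptions(
  env: Env,
  baseUrl = "https://cloud.langfuse.com",
): OtlpOptions {
  const publicKey = env.LANGFUSE_PUBLIC_KEY;
  const secretKey = env.LANGFUSE_SECRET_KEY;
  if (!publicKey || !secretKey) {
    throw new Error("LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY must both be set");
  }
  // Basic auth is base64("public:secret"). btoa is a global in Node 16+.
  const auth = btoa(`${publicKey}:${secretKey}`);
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/public/otel/v1/traces`,
    headers: { Authorization: `Basic ${auth}` },
  };
}
```

In otel.ts this would be passed straight to the exporter, for example `new OTLPTraceExporter(langfuseOtlpOptions(process.env))`.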


Step 2 — Wrap each turn in a span

```ts
import { trace, SpanStatusCode } from "@opentelemetry/api";
import OpenAI from "openai";

const tracer = trace.getTracer("voice-bridge");
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// synthesize(text) is your existing TTS call; it is assumed to return the audio bytes.

async function handleTurn(callId: string, userText: string) {
  return tracer.startActiveSpan("turn", { attributes: { callId } }, async (turnSpan) => {
    try {
      // userText is already transcribed here; wrap your real STT call in this span.
      const stt = await tracer.startActiveSpan("stt", async (s) => {
        try {
          const text = userText;
          s.setAttribute("text", text);
          return text;
        } finally {
          s.end(); // end the span even if the stage throws
        }
      });

      const llm = await tracer.startActiveSpan("llm", async (s) => {
        try {
          const r = await openai.chat.completions.create({
            model: "gpt-4o-mini",
            messages: [{ role: "user", content: stt }],
          });
          // content can be null, and span attributes do not accept null
          const reply = r.choices[0].message.content ?? "";
          s.setAttributes({
            "gen_ai.system": "openai",
            "gen_ai.request.model": "gpt-4o-mini",
            "gen_ai.usage.prompt_tokens": r.usage?.prompt_tokens ?? 0,
            "gen_ai.usage.completion_tokens": r.usage?.completion_tokens ?? 0,
            "gen_ai.response.text": reply,
          });
          return reply;
        } finally {
          s.end();
        }
      });

      const tts = await tracer.startActiveSpan("tts", async (s) => {
        try {
          const audio = await synthesize(llm);
          s.setAttribute("audio.bytes", audio.length);
          return audio;
        } finally {
          s.end();
        }
      });

      return tts;
    } catch (err) {
      turnSpan.recordException(err as Error);
      turnSpan.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      turnSpan.end();
    }
  });
}
```
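
Each stage span has to be ended even when the stage throws, otherwise the span never reaches Langfuse. One way to guarantee that is a small wrapper. The sketch below is typed against minimal local interfaces (`SpanLike`, `TracerLike`, both hypothetical) so it runs without the OTel packages; in a real project you would likely swap in `Tracer` and `Span` from @opentelemetry/api.

```ts
// Minimal structural stand-ins for the OTel Span and Tracer types.
interface SpanLike {
  recordException(err: Error): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

const STATUS_ERROR = 2; // numeric value of SpanStatusCode.ERROR

// Hypothetical helper: runs `fn` inside a span, records failures,
// and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      const e = err instanceof Error ? err : new Error(String(err));
      span.recordException(e);
      span.setStatus({ code: STATUS_ERROR, message: e.message });
      throw err; // rethrow so the caller still sees the failure
    } finally {
      span.end();
    }
  });
}
```

With a helper like this, a stage shrinks to something like `const audio = await withSpan(tracer, "tts", () => synthesize(llm));`.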

Step 3 — Use Langfuse semantic conventions

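Use the gen_ai.* attribute namespace, which comes from the OpenTelemetry GenAI semantic conventions, so Langfuse can render prompts, responses, and token counts in its UI without custom mapping. Newer versions of those conventions rename the token keys to gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, so check which names your Langfuse version maps. Important keys: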

gen_ai.system (openai | anthropic | elevenlabs)
gen_ai.request.model
gen_ai.request.temperature
gen_ai.usage.prompt_tokens
gen_ai.usage.completion_tokens
gen_ai.response.text
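
Keeping the key names in one function makes it harder for different call sites to drift apart. The sketch below builds the attribute object from the keys listed above; `LlmCall`, `ChatUsage`, and `genAiAttributes` are hypothetical local names, not types from the openai or OpenTelemetry packages.

```ts
// Local stand-ins that mirror the fields used in Step 2.
interface ChatUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
}
interface LlmCall {
  system: string;
  model: string;
  temperature?: number;
  usage?: ChatUsage;
  responseText: string | null;
}

type Attributes = Record<string, string | number>;

// Hypothetical helper: the single place that decides which gen_ai.* keys are set.
function genAiAttributes(call: LlmCall): Attributes {
  const attrs: Attributes = {
    "gen_ai.system": call.system,
    "gen_ai.request.model": call.model,
    "gen_ai.usage.prompt_tokens": call.usage?.prompt_tokens ?? 0,
    "gen_ai.usage.completion_tokens": call.usage?.completion_tokens ?? 0,
    // Span attributes cannot be null, so fall back to an empty string.
    "gen_ai.response.text": call.responseText ?? "",
  };
  // Only set temperature when the request actually specified one.
  if (call.temperature !== undefined) {
    attrs["gen_ai.request.temperature"] = call.temperature;
  }
  return attrs;
}
```

The result can be passed to `s.setAttributes(...)` inside the llm span.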

Step 4 — Tag tool calls

```ts
// `registry` is assumed to map tool names to async handler functions.
async function callTool(name: string, args: object) {
  return tracer.startActiveSpan(`tool.${name}`, async (s) => {
    s.setAttributes({
      "tool.name": name,
      "tool.args": JSON.stringify(args),
    });
    const t0 = Date.now();
    try {
      const result = await registry[name](args);
      s.setAttribute("tool.latency_ms", Date.now() - t0);
      // JSON.stringify returns undefined for an undefined result, so fall back to "".
      s.setAttribute("tool.result", (JSON.stringify(result) ?? "").slice(0, 1000));
      return result;
    } catch (e) {
      s.recordException(e as Error);
      throw e;
    } finally {
      s.end();
    }
  });
}
```
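
Tool arguments and results are arbitrary values, and `JSON.stringify` can return `undefined` or throw on some of them. A minimal sketch of a serializer that always yields a bounded string follows; `toSpanAttribute` is a hypothetical name and the 1000-character default simply mirrors the slice used above.

```ts
// Hypothetical helper: serialize any value into a bounded string attribute.
function toSpanAttribute(value: unknown, maxLen = 1000): string {
  let text: string;
  try {
    // JSON.stringify returns undefined for undefined, functions, and symbols.
    text = JSON.stringify(value) ?? String(value);
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    text = "[unserializable]";
  }
  // Mark truncated values so readers of the trace know data was cut.
  return text.length > maxLen ? text.slice(0, maxLen) + "..." : text;
}
```

Inside the tool span this would replace the inline calls, for example `s.setAttribute("tool.result", toSpanAttribute(result))`.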

Step 5 — Group spans into one Langfuse "trace" per call

Start one root span when the call begins and run the whole call inside its context, so every span created there (turn 1, turn 2, ..., post-call analytics) shares its trace ID and joins one trace:

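```ts
import { context, trace } from "@opentelemetry/api";

const callTrace = tracer.startSpan("call", { attributes: { callId, agentId } });
const ctx = trace.setSpan(context.active(), callTrace);
await context.with(ctx, async () => {
  try {
    // run the whole call here
  } finally {
    callTrace.end(); // end the root span even if the call fails
  }
});
```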


Step 6 — Build alerts on p95 latency

In Langfuse Dashboards:

Filter span.name = turn, last 24h.
Plot p50/p95/p99 of duration.
Alert if p95 > 1200ms for 5 minutes.
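
Langfuse computes the percentiles for you, but it helps to know what the number means when you sanity-check an alert against exported durations. The sketch below uses the nearest-rank definition, which is one of several common percentile definitions and may differ slightly from what the dashboard uses; `percentile` and `breachesP95` are hypothetical names.

```ts
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank free of float rounding.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical alert check using the 1200 ms threshold from this step.
function breachesP95(turnDurationsMs: number[], thresholdMs = 1200): boolean {
  return percentile(turnDurationsMs, 95) > thresholdMs;
}
```

Note that a single slow turn out of twenty is enough to move p95, which is why the alert also requires the breach to last 5 minutes.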

Step 7 — Eval datasets from real calls

Click "Add to dataset" in Langfuse on any failed call to build a regression dataset. Run nightly evals and gate prompt PRs on quality (see post 13).
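
Gating a PR comes down to turning a set of eval scores into a pass or fail decision. A minimal sketch follows, assuming each dataset item yields a score between 0 and 1; the `EvalResult` shape, the `gate` function, and both thresholds are illustrative assumptions rather than anything Langfuse prescribes.

```ts
// Hypothetical shape of one eval result; score is assumed to be in [0, 1].
interface EvalResult {
  itemId: string;
  score: number;
}

interface GateDecision {
  pass: boolean;
  mean: number;
  failing: string[]; // item IDs below the per-item floor
}

// Pass only if the mean is high enough and no single item regressed badly.
function gate(results: EvalResult[], minMean = 0.8, minItem = 0.5): GateDecision {
  if (results.length === 0) throw new Error("no eval results");
  const mean = results.reduce((sum, r) => sum + r.score, 0) / results.length;
  const failing = results.filter((r) => r.score < minItem).map((r) => r.itemId);
  return { pass: mean >= minMean && failing.length === 0, mean, failing };
}
```

A CI job could exit non-zero when `pass` is false and print `failing` so the author knows which calls regressed.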

Common pitfalls

No flush on shutdown: spans are buffered, so call sdk.shutdown() on SIGTERM or the last batch is lost.
Spans not nesting: startActiveSpan sets the active context, startSpan does not. Children created after a plain startSpan attach to whatever span was active before it.
PHI in span attributes: redact transcripts before s.setAttribute("text", ...) if you're under HIPAA.
Cardinality explosion: don't set callId as a metric label; use it as a span attribute only.
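
For the PHI pitfall, the redaction step can be a plain function applied to the transcript before it is set as an attribute. The sketch below is only an illustration: the three patterns (email, US-style SSN, US-style phone) and the `redact` name are assumptions, and regexes alone are not sufficient for HIPAA compliance.

```ts
// Illustrative patterns only; real PHI detection needs far broader coverage.
const PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b/g, "[PHONE]"],
];

// Hypothetical helper: replace each match with a fixed placeholder.
function redact(text: string): string {
  return PATTERNS.reduce((out, [re, label]) => out.replace(re, label), text);
}
```

In Step 2 this would wrap the transcript, for example `s.setAttribute("text", redact(text))`.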

How CallSphere does this in production

CallSphere ships every span — turn, STT, LLM, TTS, tool — to a self-hosted Langfuse via OpenTelemetry. Healthcare runs PHI redaction in a span processor before export. The eval dashboard surfaces p95 latency per vertical and alerts when any agent crosses 1.5s. Real-time observability is part of the platform; trial it.

FAQ

Langfuse vs LangSmith vs Phoenix? All three can ingest OpenTelemetry traces; pick on price and self-host needs. Langfuse is open-source and OTel-native.

Cost? Cloud free tier: 50k observations/mo. Self-host: just your DB.

Can I trace WebRTC sessions? Yes — instrument server-side handlers; client-side, use the OTel browser SDK and a CORS-enabled OTLP collector.

Sampling? Sample call traces at 100% for the first month, then drop to 10% with always-on for failures.
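
Keeping every failure while sampling the rest cannot be decided when a span starts, because the error has not happened yet. The decision has to be made after the trace finishes, for example with tail sampling in an OpenTelemetry Collector. The sketch below shows only the decision logic; `keepTrace` is a hypothetical function and is not the algorithm the OTel SDK samplers use.

```ts
// Hypothetical keep/drop decision made after a call trace has finished.
// traceId is the 32-character hex trace ID; ratio is the share of
// successful traces to keep (0.1 keeps roughly 10%).
function keepTrace(traceId: string, hasError: boolean, ratio: number): boolean {
  if (hasError) return true; // always keep failures
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Map the first 8 hex characters to a number in [0, 1). Deriving the
  // bucket from the trace ID makes the decision repeatable for a given trace.
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < ratio;
}
```

Because the bucket is derived from the trace ID rather than a random draw, every component that sees the same trace reaches the same decision.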

