---
title: "How to Add Voice Agent Observability with Langfuse and OpenTelemetry"
description: "Trace every turn, every tool call, every LLM round-trip with OpenTelemetry shipped to Langfuse. Find latency outliers, debug hallucinations, and watch p95 stay under 800ms."
canonical: https://callsphere.ai/blog/vw1h-build-voice-agent-observability-langfuse-opentelemetry
category: "AI Infrastructure"
tags: ["Tutorial", "Build", "Langfuse", "OpenTelemetry", "Observability"]
author: "CallSphere Team"
published: 2026-04-21T00:00:00.000Z
updated: 2026-05-07T06:45:03.081Z
---

# How to Add Voice Agent Observability with Langfuse and OpenTelemetry

> Trace every turn, every tool call, every LLM round-trip with OpenTelemetry shipped to Langfuse. Find latency outliers, debug hallucinations, and watch p95 stay under 800ms.

> **TL;DR** — Voice agents fail in three places: STT, LLM, TTS. Without per-component tracing you'll never know which one slowed a call. OpenTelemetry → Langfuse gives you span-level visibility in 30 lines of init code.

## What you'll build

An instrumented voice bridge that emits OpenTelemetry spans for every turn (`stt → llm → tts`), tags them with the call ID, and ships them to Langfuse. You'll be able to open a single call in the Langfuse UI and see every span timing, every token count, and every prompt/response — including which turn pushed p95 latency over 800ms.

## Prerequisites

1. Langfuse Cloud account or self-hosted (open-source).
2. Node 20+ or Python 3.11+.
3. `npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http`.
4. Working voice agent (any of posts 1–4).
5. `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`.
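Before wiring anything up, it's worth failing fast if the Langfuse keys are missing — a silent misconfiguration just drops spans. A minimal sketch; `assertEnv` is our own helper, not part of any SDK:

```typescript
// Hypothetical helper: throw at startup if required env vars are unset,
// instead of exporting spans into the void with a bad Authorization header.
function assertEnv(
  keys: string[],
  env: Record<string, string | undefined> = process.env
): void {
  for (const key of keys) {
    if (!env[key]) throw new Error(`Missing env var: ${key}`);
  }
}

// e.g. at the top of your entrypoint:
// assertEnv(["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"]);
```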

## Architecture

```mermaid
flowchart LR
  AGT[Voice Agent] -- spans --> SDK[OTel SDK]
  SDK -- OTLP HTTP --> LF[Langfuse /api/public/otel]
  LF --> UI[Langfuse UI]
  LF --> EV[Eval suite]
```

## Step 1 — OTel init shipping to Langfuse

```ts
// otel.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { resourceFromAttributes } from "@opentelemetry/resources";

const auth = Buffer.from(
  `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
).toString("base64");

const sdk = new NodeSDK({
  resource: resourceFromAttributes({ "service.name": "voice-bridge" }),
  traceExporter: new OTLPTraceExporter({
    url: "https://cloud.langfuse.com/api/public/otel/v1/traces",
    headers: { Authorization: `Basic ${auth}` },
  }),
});
sdk.start();
```

Import this module first in your entrypoint, before anything that creates spans or that auto-instrumentation needs to patch.
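For example, a hypothetical entrypoint (file and module names here are ours, not from the snippet above):

```typescript
// index.ts — hypothetical entrypoint. otel.ts must load before anything
// that creates spans, so the SDK is started before the first turn runs.
import "./otel";
import { startBridge } from "./bridge"; // hypothetical module

startBridge();
```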

## Step 2 — Wrap each turn in a span

```ts
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("voice-bridge");

// `openai` and `synthesize` come from your existing agent code (posts 1–4).
async function handleTurn(callId: string, userText: string) {
  return tracer.startActiveSpan("turn", { attributes: { callId } }, async (turnSpan) => {
    try {
      const stt = await tracer.startActiveSpan("stt", async (s) => {
        const text = userText;
        s.setAttribute("text", text);
        s.end();
        return text;
      });

      const llm = await tracer.startActiveSpan("llm", async (s) => {
        const r = await openai.chat.completions.create({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: stt }],
        });
        s.setAttributes({
          "gen_ai.system": "openai",
          "gen_ai.request.model": "gpt-4o-mini",
          "gen_ai.usage.prompt_tokens": r.usage?.prompt_tokens ?? 0,
          "gen_ai.usage.completion_tokens": r.usage?.completion_tokens ?? 0,
          "gen_ai.response.text": r.choices[0].message.content ?? "",
        });
        s.end();
        return r.choices[0].message.content!;
      });

      const tts = await tracer.startActiveSpan("tts", async (s) => {
        const audio = await synthesize(llm);
        s.setAttribute("audio.bytes", audio.length);
        s.end();
        return audio;
      });

      turnSpan.end();
      return tts;
    } catch (err) {
      turnSpan.recordException(err as Error);
      turnSpan.end();
      throw err;
    }
  });
}
```

## Step 3 — Use Langfuse semantic conventions

Use the `gen_ai.*` attribute namespace so Langfuse renders prompts, responses, and token counts in its UI without custom mapping. Important keys:

- `gen_ai.system` (openai | anthropic | elevenlabs)
- `gen_ai.request.model`
- `gen_ai.request.temperature`
- `gen_ai.usage.prompt_tokens`
- `gen_ai.usage.completion_tokens`
- `gen_ai.response.text`
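String-literal attribute keys scattered across call sites drift easily; a small helper keeps the naming consistent. A sketch with hand-rolled types standing in for the OpenAI SDK's response shape — the function name and types are ours:

```typescript
// Hypothetical helper: map an OpenAI-style completion onto the gen_ai.*
// attribute keys Langfuse understands. Types mimic the SDK response shape.
type Usage = { prompt_tokens?: number; completion_tokens?: number };
type Completion = {
  usage?: Usage;
  choices: { message: { content: string | null } }[];
};

function genAiAttributes(
  model: string,
  r: Completion
): Record<string, string | number> {
  return {
    "gen_ai.system": "openai",
    "gen_ai.request.model": model,
    "gen_ai.usage.prompt_tokens": r.usage?.prompt_tokens ?? 0,
    "gen_ai.usage.completion_tokens": r.usage?.completion_tokens ?? 0,
    "gen_ai.response.text": r.choices[0]?.message.content ?? "",
  };
}

// usage inside the llm span: s.setAttributes(genAiAttributes("gpt-4o-mini", r));
```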

## Step 4 — Tag tool calls

```ts
async function callTool(name: string, args: object) {
  return tracer.startActiveSpan(`tool.${name}`, async (s) => {
    s.setAttributes({
      "tool.name": name,
      "tool.args": JSON.stringify(args),
    });
    const t0 = Date.now();
    try {
      const result = await registry[name](args);
      s.setAttribute("tool.latency_ms", Date.now() - t0);
      s.setAttribute("tool.result", JSON.stringify(result).slice(0, 1000));
      return result;
    } catch (e) {
      s.recordException(e as Error);
      throw e;
    } finally {
      s.end();
    }
  });
}
```
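The `registry` referenced above is assumed to be a plain map from tool name to async handler — here's a minimal sketch (tool names and return shapes are hypothetical):

```typescript
// Hypothetical tool registry: tool name → async handler. `callTool` above
// looks handlers up here by name.
type ToolHandler = (args: object) => Promise<unknown>;

const registry: Record<string, ToolHandler> = {
  lookup_order: async (args) => ({ found: true, args }),
  transfer_call: async () => ({ transferred: true }),
};
```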

## Step 5 — Group spans into one Langfuse "trace" per call

Start a root span when the call begins and run everything inside its context, so every child span (turn 1, turn 2, ..., post-call analytics) lands in one trace:

```ts
import { context, trace } from "@opentelemetry/api";

const callTrace = tracer.startSpan("call", { attributes: { callId, agentId } });
const ctx = trace.setSpan(context.active(), callTrace);
await context.with(ctx, async () => {
  // run the whole call here
});
callTrace.end();
```

## Step 6 — Build alerts on p95 latency

In Langfuse Dashboards:

- Filter `span.name = turn`, last 24h.
- Plot p50/p95/p99 of `duration`.
- Alert if p95 > 1200ms for 5 minutes.
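If you export span durations and want to sanity-check the dashboard numbers yourself, a nearest-rank percentile is enough. This helper is our own, not a Langfuse API:

```typescript
// Nearest-rank percentile over a list of durations (ms).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no values");
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

// e.g. percentile(turnDurationsMs, 95) > 1200 → fire the alert
```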

## Step 7 — Eval datasets from real calls

Click "Add to dataset" in Langfuse on any failed call to build a regression dataset. Run nightly evals and gate prompt PRs on quality (see post 13).

## Common pitfalls

- **No flush on shutdown**: spans are buffered. Call `sdk.shutdown()` on SIGTERM.
- **Spans not nesting**: using `startSpan` (which doesn't set the active context) where you meant `startActiveSpan` (which does).
- **PHI in span attributes**: redact transcripts before `s.setAttribute("text", ...)` if you're under HIPAA.
- **Cardinality explosion**: don't set `callId` as a metric label — use as span attribute only.
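For the flush-on-shutdown pitfall, a sketch of the signal wiring. It assumes the `sdk` instance from Step 1 is exported from `otel.ts` (the snippet there would need an `export` added); the `Flushable` interface is our own:

```typescript
// Flush buffered spans before exit: NodeSDK.shutdown() flushes the
// exporter and stops the SDK.
type Flushable = { shutdown(): Promise<void> };

function wireShutdown(sdk: Flushable, proc: NodeJS.Process = process): void {
  for (const sig of ["SIGTERM", "SIGINT"] as const) {
    proc.once(sig, () => {
      sdk
        .shutdown()
        .catch(() => {}) // don't block exit on a failed flush
        .finally(() => proc.exit(0));
    });
  }
}

// wireShutdown(sdk);
```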

## How CallSphere does this in production

CallSphere ships every span — turn, STT, LLM, TTS, tool — to a self-hosted Langfuse via OpenTelemetry. Healthcare runs PHI redaction in a span processor before export. The eval dashboard surfaces p95 latency per vertical and alerts when any agent crosses 1.5s. [Real-time observability is part of the platform](/pricing); [trial it](/trial).

## FAQ

**Langfuse vs LangSmith vs Phoenix?** All emit OTel; pick on price and self-host needs. Langfuse is open-source and OTel-native.

**Cost?** Cloud free tier: 50k observations/mo. Self-host: just your DB.

**Can I trace WebRTC sessions?** Yes — instrument server-side handlers; client-side, use the OTel browser SDK and a CORS-enabled OTLP collector.

**Sampling?** Sample call traces at 100% for the first month, then drop to 10% with always-on for failures.
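The drop-to-10% step can be wired through the NodeSDK's `sampler` option. A sketch — note that head sampling like this cannot express "always-on for failures"; that requires tail-based sampling, e.g. an OTel Collector policy:

```typescript
import { ParentBasedSampler, TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";

// Keep ~10% of traces once volume settles; child spans follow the root's
// sampling decision so whole calls are kept or dropped together.
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});
// Pass `sampler` into the NodeSDK options from Step 1.
```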

## Sources

- [Langfuse OpenTelemetry integration](https://langfuse.com/integrations/native/opentelemetry)
- [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
- [Langfuse self-hosting](https://langfuse.com/self-hosting)
- [OTel Node SDK](https://opentelemetry.io/docs/languages/js/getting-started/nodejs/)

---

Source: https://callsphere.ai/blog/vw1h-build-voice-agent-observability-langfuse-opentelemetry
