TL;DR — Wrap every LLM call with an OpenTelemetry span carrying model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, and call_id. Sink to ClickHouse or Langfuse. Build per-tenant cost caps and a daily founder digest. CallSphere uses this to track $/call across 6 verticals.

Why this pipeline

If you can't say "what did we spend on GPT-4o-mini today?" in three seconds, you don't have an LLM observability story. The 2026 standard is OpenTelemetry: every LLM call becomes a span, every span carries token + cost attributes, and you sink to a backend that aggregates by tenant, model, route, and time.

This pipeline pays for itself the first time you catch a runaway prompt loop or a misconfigured max_tokens blowing $400 in an hour.
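
To make the three-second question concrete, here is a minimal sketch of the aggregation a backend would run. The `LlmSpanRow` shape and the `spendToday` helper are hypothetical names for illustration, and in practice this would be a query against the span store rather than an in-memory filter.

```typescript
// Hypothetical shape of one exported LLM span row; field names are assumptions.
interface LlmSpanRow {
  ts: Date; // span end time
  model: string;
  tenantId: string;
  costUsd: number;
}

// Sum cost for one model over the UTC day that contains `now`.
function spendToday(rows: LlmSpanRow[], model: string, now: Date): number {
  const dayStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const dayEnd = dayStart + 24 * 60 * 60 * 1000;
  return rows
    .filter((r) => r.model === model)
    .filter((r) => r.ts.getTime() >= dayStart && r.ts.getTime() < dayEnd)
    .reduce((sum, r) => sum + r.costUsd, 0);
}

const rows: LlmSpanRow[] = [
  { ts: new Date("2026-01-15T09:00:00Z"), model: "gpt-4o-mini", tenantId: "t1", costUsd: 0.25 },
  { ts: new Date("2026-01-15T17:30:00Z"), model: "gpt-4o-mini", tenantId: "t2", costUsd: 0.5 },
  { ts: new Date("2026-01-15T18:00:00Z"), model: "gpt-4o", tenantId: "t1", costUsd: 2 },
  { ts: new Date("2026-01-14T23:59:00Z"), model: "gpt-4o-mini", tenantId: "t1", costUsd: 4 },
];

// Only the two gpt-4o-mini rows dated 2026-01-15 are counted.
console.log(spendToday(rows, "gpt-4o-mini", new Date("2026-01-15T20:00:00Z")));
```

The same filter extended with `tenantId` or a route field gives the per-tenant and per-route views.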

Architecture

flowchart LR
  Code[App / agent code] -->|OpenAI SDK| OAI[(OpenAI / Anthropic)]
  Code -. OTel span .-> Col[OTel collector]
  Col --> CH[(ClickHouse<br/>llm_spans)]
  Col --> LF[Langfuse]
  CH --> Cap[Cost cap worker]
  Cap -->|429 if over| Code
  CH --> Dig[Daily founder digest]

Cost cap worker enforces hard per-tenant limits at the gateway layer.
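
One way the gateway check could look, as a sketch only: the `SpendEvent` shape, the `spendLastHour` and `gateStatus` helpers, and the cap value are assumptions for illustration. A real worker would read the rolling sum from ClickHouse instead of scanning an array.

```typescript
// Hypothetical spend record, as the cap worker might read it back from llm_spans.
interface SpendEvent {
  tenantId: string;
  tsMs: number; // epoch milliseconds
  costUsd: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Total spend for one tenant in the hour ending at nowMs.
function spendLastHour(events: SpendEvent[], tenantId: string, nowMs: number): number {
  let total = 0;
  for (const e of events) {
    if (e.tenantId === tenantId && e.tsMs > nowMs - HOUR_MS && e.tsMs <= nowMs) {
      total += e.costUsd;
    }
  }
  return total;
}

// HTTP status the gateway would return before forwarding an LLM call.
function gateStatus(events: SpendEvent[], tenantId: string, capUsd: number, nowMs: number): 200 | 429 {
  return spendLastHour(events, tenantId, nowMs) >= capUsd ? 429 : 200;
}

const now = Date.UTC(2026, 0, 15, 12, 0, 0);
const events: SpendEvent[] = [
  { tenantId: "t1", tsMs: now - 10 * 60 * 1000, costUsd: 3 },
  { tenantId: "t1", tsMs: now - 30 * 60 * 1000, costUsd: 2.5 },
  { tenantId: "t1", tsMs: now - 2 * HOUR_MS, costUsd: 10 }, // outside the window
  { tenantId: "t2", tsMs: now - 5 * 60 * 1000, costUsd: 1 },
];

console.log(gateStatus(events, "t1", 5, now)); // t1 spent 5.5 in the last hour
console.log(gateStatus(events, "t2", 5, now)); // t2 spent 1 in the last hour
```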

CallSphere implementation

CallSphere — 37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149 / $499 / $1499 at /pricing. 14-day trial, 22% affiliate. Every LLM call across all verticals (Healthcare at /industries/healthcare, Real Estate, Salon, Sales, After-Hours, IT) emits an OTel span; ClickHouse aggregates cost_usd_per_call and a Slack digest lands every morning at 9 ET. Try the dashboard at /demo.

Build steps with code

Install OTel in your AI service (@opentelemetry/api + @opentelemetry/sdk-node).
Wrap LLM calls with a span; attach model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, call_id.
Compute cost from a pricing table (model → $/1M tokens).
Sink to ClickHouse via OTel collector exporter/clickhouse, OR send to Langfuse.
Build per-tenant caps — a worker watches SUM(cost_usd) WHERE ts > now() - 1h and 429s if over.
Daily digest — top tenants, top models, top routes, anomalies vs. 7-day baseline.
Alert on any single call > $1.

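import OpenAI from "openai";
import { trace, SpanStatusCode } from "@opentelemetry/api";

// OpenAI client; reads OPENAI_API_KEY from the environment by default.
const ai = new OpenAI();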

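// USD per 1M tokens. Pin this table to a known date and refresh it when providers change prices.
const PRICING: Record<string, { in: number; out: number; cached: number }> = {
  "gpt-4o-mini":          { in: 0.15,  out: 0.60,  cached: 0.075 },
  "gpt-4o":               { in: 2.50,  out: 10.00, cached: 1.25  },
  "claude-haiku-4-5":     { in: 1.00,  out: 5.00,  cached: 0.10  },
};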

async function callLlm(model: string, msgs: any[], ctx: { tenantId: string; callId: string }) {
  const tracer = trace.getTracer("callsphere-ai");
  return await tracer.startActiveSpan("llm.call", async (span) => {
    span.setAttribute("llm.model", model);
    span.setAttribute("tenant.id", ctx.tenantId);
    span.setAttribute("call.id", ctx.callId);
    try {
      const r = await ai.chat.completions.create({ model, messages: msgs });
      const u = r.usage;
      const p = PRICING[model as keyof typeof PRICING];
      if (u) {
        // prompt_tokens includes cached tokens, so bill the two portions at their own rates.
        const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
        span.setAttribute("llm.prompt_tokens", u.prompt_tokens);
        span.setAttribute("llm.completion_tokens", u.completion_tokens);
        span.setAttribute("llm.cached_tokens", cached);
        if (p) {
          // Skip cost (but keep token counts) when the model has no pricing entry.
          const cost =
            ((u.prompt_tokens - cached) * p.in + cached * p.cached + u.completion_tokens * p.out) / 1_000_000;
          span.setAttribute("llm.cost_usd", cost);
        }
      }
      return r;
    } catch (e: any) {
      span.recordException(e);
      span.setStatus({ code: SpanStatusCode.ERROR, message: e?.message });
      throw e;
    } finally {
      span.end();
    }
  });
}
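
The last two build steps (the baseline anomaly check for the digest and the single-call alert) can be sketched as below. The function names and the 2x multiplier over the 7-day mean are assumptions; the article does not say how the anomaly threshold is defined.

```typescript
// Flag today's spend when it exceeds the trailing 7-day mean by `factor`.
// The default factor of 2 is an illustrative assumption, not a recommended value.
function isAnomalous(todayUsd: number, last7DaysUsd: number[], factor = 2): boolean {
  if (last7DaysUsd.length === 0) return false; // no baseline yet
  const baseline = last7DaysUsd.reduce((a, b) => a + b, 0) / last7DaysUsd.length;
  return todayUsd > baseline * factor;
}

// Alert threshold for a single call, in USD (from the build steps).
const SINGLE_CALL_ALERT_USD = 1;

function shouldAlertOnCall(costUsd: number): boolean {
  return costUsd > SINGLE_CALL_ALERT_USD;
}

const lastWeek = [10, 12, 9, 11, 10, 8, 10]; // mean is 10
console.log(isAnomalous(25, lastWeek)); // 25 > 20
console.log(isAnomalous(15, lastWeek)); // 15 <= 20
console.log(shouldAlertOnCall(1.2));
```

A mean-times-factor rule is the simplest option; a median or per-weekday baseline would be less sensitive to one unusual day.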

Pitfalls

Per-call attributes only, no aggregation: you'll have data but no answers; build the rollup in ClickHouse.
Forgetting cached tokens: OpenAI bills cached input tokens at a discount (half the input rate for the gpt-4o models in the pricing table), so track them separately.
Pricing table out of date: automate refresh or pin to a known date.
No tenant_id: you can't bill or cap.
Treating Langfuse and ClickHouse as either/or: the gold standard is both, Langfuse for traces and ClickHouse for cost ops.
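
The cached-token pitfall is easy to quantify. The sketch below uses the gpt-4o rates from the pricing table and assumes, as with OpenAI usage reporting, that the prompt token count already includes the cached tokens. The token counts and helper names are made up for illustration.

```typescript
// USD per 1M tokens.
interface Price {
  in: number;
  out: number;
  cached: number;
}

const GPT_4O: Price = { in: 2.5, out: 10.0, cached: 1.25 };

// Ignores caching: every prompt token is billed at the full input rate.
function naiveCost(p: Price, promptTokens: number, completionTokens: number): number {
  return (promptTokens * p.in + completionTokens * p.out) / 1_000_000;
}

// Bills the cached portion of the prompt at the cached rate.
function cacheAwareCost(p: Price, promptTokens: number, cachedTokens: number, completionTokens: number): number {
  const uncached = promptTokens - cachedTokens;
  return (uncached * p.in + cachedTokens * p.cached + completionTokens * p.out) / 1_000_000;
}

// 10,000 prompt tokens, 8,000 of them cached, 500 completion tokens.
const naive = naiveCost(GPT_4O, 10_000, 500);
const aware = cacheAwareCost(GPT_4O, 10_000, 8_000, 500);
console.log(naive.toFixed(4), aware.toFixed(4)); // prints "0.0300 0.0200"
```

In this example the naive formula overstates the call's cost by half, which compounds quickly on agents that resend a long system prompt.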

FAQ

Langfuse vs. Datadog LLM Observability vs. Braintrust? Langfuse if open-source matters; Datadog if you already pay them; Braintrust if you also want eval.

OTel auto-instrumentation? @opentelemetry/instrumentation-openai works for the OpenAI SDK; for Anthropic, use the community package opentelemetry-instrumentation-anthropic.

Per-tenant caps: hard or soft? A soft cap warns; a hard cap returns 429. Most SaaS products warn and email at the soft cap, then enforce a hard cap at 2x the soft limit.
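
A minimal sketch of that soft-then-hard policy, assuming the hard cap is defined as a multiple of the soft cap. `capDecision` and its return values are hypothetical names.

```typescript
type CapDecision = "ok" | "warn" | "block";

// "warn" maps to an email or in-app notice; "block" maps to a 429 at the gateway.
function capDecision(spendUsd: number, softCapUsd: number, hardMultiplier = 2): CapDecision {
  if (spendUsd >= softCapUsd * hardMultiplier) return "block";
  if (spendUsd >= softCapUsd) return "warn";
  return "ok";
}

// With a $50 soft cap, the hard cap lands at $100.
console.log(capDecision(30, 50));  // prints "ok"
console.log(capDecision(60, 50));  // prints "warn"
console.log(capDecision(100, 50)); // prints "block"
```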

How granular? One span per LLM call and one trace per user request. Sample at 100% for cost; evals can be downsampled.
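
Downsampling evals while keeping every cost span could be done with a deterministic decision keyed on the trace ID, so all spans in a trace get the same answer. This is a sketch under that assumption: the FNV-1a hash and the `sampledForEval` helper are illustrative choices, not something the OTel SDK provides under these names.

```typescript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// True when this trace falls inside the eval sample; `rate` is between 0 and 1.
// Cost attributes are recorded for every call regardless of this decision.
function sampledForEval(traceId: string, rate: number): boolean {
  return fnv1a(traceId) / 0x100000000 < rate;
}

const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
console.log(sampledForEval(traceId, 1)); // rate 1 keeps every trace
console.log(sampledForEval(traceId, 0)); // rate 0 keeps none
```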

Cost dashboard latency? ClickHouse rollup updates every 5 min — good enough.
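
The 5-minute rollup amounts to flooring each span timestamp to a bucket boundary and summing cost per tenant and bucket. The sketch below shows that logic in memory with hypothetical names (`CostRow`, `bucketStart`, `rollup`); in ClickHouse it would be an aggregation over the span table instead.

```typescript
interface CostRow {
  tsMs: number; // epoch milliseconds
  tenantId: string;
  costUsd: number;
}

const BUCKET_MS = 5 * 60 * 1000;

// Floor a timestamp to the start of its 5-minute bucket.
function bucketStart(tsMs: number): number {
  return Math.floor(tsMs / BUCKET_MS) * BUCKET_MS;
}

// Sum cost per "tenant|bucketStart" key.
function rollup(rows: CostRow[]): Map<string, number> {
  const out = new Map<string, number>();
  for (const r of rows) {
    const key = `${r.tenantId}|${bucketStart(r.tsMs)}`;
    out.set(key, (out.get(key) ?? 0) + r.costUsd);
  }
  return out;
}

const base = Date.UTC(2026, 0, 15, 9, 0, 0);
const costRows: CostRow[] = [
  { tsMs: base + 60_000, tenantId: "t1", costUsd: 0.5 },
  { tsMs: base + 240_000, tenantId: "t1", costUsd: 0.25 },
  { tsMs: base + 300_000, tenantId: "t1", costUsd: 1 }, // next bucket
];

console.log(rollup(costRows)); // two buckets for t1: 0.75 and 1
```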

Realtime LLM Cost and Token Monitoring Pipeline With OpenTelemetry (2026)
