---
title: "Realtime LLM Cost and Token Monitoring Pipeline With OpenTelemetry (2026)"
description: "Every LLM call should emit a span with model, input tokens, output tokens, and cost — collected via OTel and aggregated in ClickHouse or Langfuse. We show the schema, the per-tenant cost cap pattern, and the daily founder digest."
canonical: https://callsphere.ai/blog/vw5c-realtime-llm-cost-token-monitoring-pipeline-otel-2026
category: "AI Engineering"
tags: ["LLM Observability", "OpenTelemetry", "Token Cost", "Langfuse", "FinOps"]
author: "CallSphere Team"
published: 2026-04-18T00:00:00.000Z
updated: 2026-05-07T16:29:41.789Z
---

# Realtime LLM Cost and Token Monitoring Pipeline With OpenTelemetry (2026)

> Every LLM call should emit a span with model, input tokens, output tokens, and cost — collected via OTel and aggregated in ClickHouse or Langfuse. We show the schema, the per-tenant cost cap pattern, and the daily founder digest.

> **TL;DR** — Wrap every LLM call with an OpenTelemetry span carrying `model`, `prompt_tokens`, `completion_tokens`, `cached_tokens`, `cost_usd`, `tenant_id`, and `call_id`. Sink to ClickHouse or Langfuse. Build per-tenant cost caps and a daily founder digest. CallSphere uses this to track $/call across 6 verticals.

## Why this pipeline

If you can't say "what did we spend on GPT-4o-mini today?" in three seconds, you don't have an LLM observability story. The 2026 standard is OpenTelemetry: every LLM call becomes a span, every span carries token + cost attributes, and you sink to a backend that aggregates by tenant, model, route, and time.

This pipeline pays for itself the first time you catch a runaway prompt loop or a misconfigured `max_tokens` blowing $400 in an hour.

## Architecture

```mermaid
flowchart LR
  Code[App / agent code] -->|OpenAI SDK| OAI[(OpenAI / Anthropic)]
  Code -. OTel span .-> Col[OTel collector]
  Col --> CH[(ClickHouse llm_spans)]
  Col --> LF[Langfuse]
  CH --> Cap[Cost cap worker]
  Cap -->|429 if over| Code
  CH --> Dig[Daily founder digest]
```

The cost cap worker enforces hard per-tenant limits at the gateway layer, rejecting requests before they ever reach the model.
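A minimal sketch of that enforcement point, assuming an Express-style gateway; `getHourlySpend` is a hypothetical helper backed by the ClickHouse cap query shown later, and the cap values are placeholders:

```typescript
import type { Request, Response, NextFunction } from "express";

// Hypothetical per-tenant caps; in practice these come from the tenant's plan.
const HOURLY_CAP_USD: Record<string, number> = { default: 5 };

// Assumed to read SUM(cost_usd) over the last hour for one tenant
// from the ClickHouse rollup (see the query sketch in the build steps).
declare function getHourlySpend(tenantId: string): Promise<number>;

export async function costCap(req: Request, res: Response, next: NextFunction) {
  const tenantId = req.header("x-tenant-id") ?? "default";
  const cap = HOURLY_CAP_USD[tenantId] ?? HOURLY_CAP_USD.default;
  const spend = await getHourlySpend(tenantId);
  if (spend >= cap) {
    // Hard cap: reject before any tokens are spent.
    res.status(429).json({ error: "tenant_cost_cap_exceeded", cap, spend });
    return;
  }
  next();
}
```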

## CallSphere implementation

CallSphere — **37 agents · 90+ tools · 115+ DB tables · 6 verticals**. **$149 / $499 / $1499** at [/pricing](/pricing). [14-day trial](/trial), [22% affiliate](/affiliate). Every LLM call across all verticals (Healthcare at [/industries/healthcare](/industries/healthcare), Real Estate, Salon, Sales, After-Hours, IT) emits an OTel span; ClickHouse aggregates `cost_usd_per_call`, and a Slack digest lands every morning at 9:00 a.m. ET. Try the dashboard at [/demo](/demo).

## Build steps with code

1. **Install OTel** in your AI service (`@opentelemetry/api` + `@opentelemetry/sdk-node`).
2. **Wrap LLM calls** with a span; attach `model`, `prompt_tokens`, `completion_tokens`, `cached_tokens`, `cost_usd`, `tenant_id`, `call_id`.
3. **Compute cost** from a pricing table (model → $/1M tokens).
4. **Sink** to ClickHouse via the OTel collector's `clickhouse` exporter, OR send to Langfuse (table sketch after the wrapper code below).
5. **Build per-tenant caps** — a worker watches `SUM(cost_usd) WHERE ts > now() - 1h` and returns 429 if a tenant is over (query sketched below).
6. **Daily digest** — top tenants, top models, top routes, anomalies vs. a 7-day baseline.
7. **Alert** on any single call costing more than $1.
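A minimal step-1 bootstrap, as a sketch — the service name and collector endpoint are placeholders, not fixed values:

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

// Start the SDK once at process boot, before any LLM client is created.
const sdk = new NodeSDK({
  serviceName: "ai-service", // placeholder
  traceExporter: new OTLPTraceExporter({
    url: "http://otel-collector:4318/v1/traces", // placeholder collector endpoint
  }),
});
sdk.start();
```

With the SDK running, steps 2 and 3 become a wrapper around the chat call — note the cached-token handling: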

```typescript
import OpenAI from "openai";
import { trace, SpanStatusCode } from "@opentelemetry/api";

const ai = new OpenAI();

// $ per 1M tokens — keep this table fresh (see Pitfalls).
const PRICING: Record<string, { in: number; out: number; cached: number }> = {
  "gpt-4o-mini":      { in: 0.15, out: 0.60,  cached: 0.075 },
  "gpt-4o":           { in: 2.50, out: 10.00, cached: 1.25  },
  "claude-haiku-4-5": { in: 0.80, out: 4.00,  cached: 0.08  },
};

interface CallCtx { tenantId: string; callId: string }

async function callLlm(
  model: string,
  msgs: OpenAI.Chat.ChatCompletionMessageParam[],
  ctx: CallCtx,
) {
  const tracer = trace.getTracer("callsphere-ai");
  return await tracer.startActiveSpan("llm.call", async (span) => {
    span.setAttribute("llm.model", model);
    span.setAttribute("tenant.id", ctx.tenantId);
    span.setAttribute("call.id", ctx.callId);
    try {
      const r = await ai.chat.completions.create({ model, messages: msgs });
      const u = r.usage!;
      const p = PRICING[model];
      if (!p) throw new Error(`no pricing entry for ${model}`);
      // Cached prompt tokens bill at the discounted rate (see Pitfalls).
      const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
      const cost =
        ((u.prompt_tokens - cached) * p.in +
          cached * p.cached +
          u.completion_tokens * p.out) / 1_000_000;
      span.setAttribute("llm.prompt_tokens", u.prompt_tokens);
      span.setAttribute("llm.completion_tokens", u.completion_tokens);
      span.setAttribute("llm.cached_tokens", cached);
      span.setAttribute("llm.cost_usd", cost);
      // Step 7: flag any single call over $1.
      if (cost > 1) console.warn(`llm.call ${ctx.callId} cost $${cost.toFixed(2)}`);
      return r;
    } catch (e: any) {
      span.recordException(e);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw e;
    } finally {
      span.end();
    }
  });
}
```
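On the ClickHouse side (steps 4–6): if you land spans in a table of your own rather than relying on the collector exporter's default schema, a minimal sketch might look like this — the table name `llm_spans` matches the diagram, but every column, codec, and threshold here is our choice, not a standard:

```typescript
import { createClient } from "@clickhouse/client";

const ch = createClient({ url: process.env.CLICKHOUSE_URL ?? "http://localhost:8123" });

// Step 4: a flat span table; tune codecs/TTL for your volume.
await ch.command({
  query: `
    CREATE TABLE IF NOT EXISTS llm_spans (
      ts                DateTime64(3),
      tenant_id         LowCardinality(String),
      call_id           String,
      model             LowCardinality(String),
      prompt_tokens     UInt32,
      completion_tokens UInt32,
      cached_tokens     UInt32,
      cost_usd          Float64
    ) ENGINE = MergeTree ORDER BY (tenant_id, ts)`,
});

// Step 5: hourly spend per tenant — the cap worker polls this and
// flips the gateway to 429 for any tenant over its cap.
const overCap = await ch.query({
  query: `
    SELECT tenant_id, sum(cost_usd) AS spend_1h
    FROM llm_spans
    WHERE ts > now() - INTERVAL 1 HOUR
    GROUP BY tenant_id
    HAVING spend_1h > {cap: Float64}`,
  query_params: { cap: 5.0 }, // example cap, not a recommendation
  format: "JSONEachRow",
});

// Step 6: daily digest input — last 24h spend vs. a 7-day baseline.
const digest = await ch.query({
  query: `
    SELECT tenant_id,
           sumIf(cost_usd, ts > now() - INTERVAL 1 DAY) AS spend_24h,
           sumIf(cost_usd, ts <= now() - INTERVAL 1 DAY
                       AND ts > now() - INTERVAL 8 DAY) / 7 AS daily_baseline
    FROM llm_spans
    GROUP BY tenant_id
    ORDER BY spend_24h DESC
    LIMIT 10`,
  format: "JSONEachRow",
});
```

The digest worker formats these rows into the morning Slack message; an anomaly is simply `spend_24h` well above `daily_baseline`.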

## Pitfalls

- **Per-call attributes only, no aggregation** — you'll have data but no answers; build the rollup in ClickHouse.
- **Forgetting cached tokens** — OpenAI prompt caching halves input cost; track separately.
- **Pricing table out of date** — automate refresh or pin to a known date.
- **No `tenant_id`** — can't bill or cap.
- **Picking Langfuse OR ClickHouse** — the gold standard is both: Langfuse for traces, ClickHouse for cost ops.

## FAQ

**Langfuse vs. Datadog LLM Observability vs. Braintrust?** Langfuse if open-source matters; Datadog if you already pay them; Braintrust if you also want evals.

**OTel auto-instrumentation?** `@opentelemetry/instrumentation-openai` works for OpenAI SDK; for Anthropic use `opentelemetry-instrumentation-anthropic` (community).
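Registration follows the usual instrumentation pattern — a sketch, with the export name assumed from the contrib package's convention (check your installed version):

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OpenAIInstrumentation } from "@opentelemetry/instrumentation-openai";

// Auto-instruments OpenAI SDK calls so each completion emits a span with
// token usage attached, without the manual wrapper shown earlier.
const sdk = new NodeSDK({
  instrumentations: [new OpenAIInstrumentation()],
});
sdk.start();
```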

**Per-tenant caps — hard or soft?** Soft warns; hard 429s. Most SaaS does soft + email + hard at 2x.

**How granular?** One span per LLM call; one trace per user request. Sample cost spans at 100%; eval traces can be downsampled.

**Cost dashboard latency?** ClickHouse rollup updates every 5 min — good enough.
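One way to get that 5-minute rollup, as a sketch reusing the `ch` client from the sink example — the view name and grain are our assumptions:

```typescript
// Pre-aggregates spend into 5-minute buckets so dashboards read a tiny
// summary table instead of scanning raw spans.
await ch.command({
  query: `
    CREATE MATERIALIZED VIEW IF NOT EXISTS llm_cost_5m
    ENGINE = SummingMergeTree
    ORDER BY (tenant_id, model, bucket)
    AS SELECT
      toStartOfFiveMinutes(ts)   AS bucket,
      tenant_id,
      model,
      sum(cost_usd)              AS cost_usd,
      sum(prompt_tokens)         AS prompt_tokens,
      sum(completion_tokens)     AS completion_tokens
    FROM llm_spans
    GROUP BY bucket, tenant_id, model`,
});
```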

## Sources

- [Langfuse Token & Cost Tracking](https://langfuse.com/docs/observability/features/token-and-cost-tracking)
- [Best LLM Observability Tools 2026 (Firecrawl)](https://www.firecrawl.dev/blog/best-llm-observability-tools)
- [LLM Token Usage and Cost (Traceloop)](https://www.traceloop.com/blog/from-bills-to-budgets-how-to-track-llm-token-usage-and-cost-per-user)
- [Datadog LLM Observability Cost](https://docs.datadoghq.com/llm_observability/monitoring/cost/)
- [LLM Cost Monitoring (OpenObserve)](https://openobserve.ai/blog/llm-cost-monitoring/)

