AI Engineering

Realtime LLM Cost and Token Monitoring Pipeline With OpenTelemetry (2026)

Every LLM call should emit a span with model, input tokens, output tokens, and cost — collected via OTel and aggregated in ClickHouse or Langfuse. We show the schema, the per-tenant cost cap pattern, and the daily founder digest.

TL;DR — Wrap every LLM call with an OpenTelemetry span carrying model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, and call_id. Sink to ClickHouse or Langfuse. Build per-tenant cost caps and a daily founder digest. CallSphere uses this to track $/call across 6 verticals.

Why this pipeline

If you can't say "what did we spend on GPT-4o-mini today?" in three seconds, you don't have an LLM observability story. The 2026 standard is OpenTelemetry: every LLM call becomes a span, every span carries token + cost attributes, and you sink to a backend that aggregates by tenant, model, route, and time.

This pipeline pays for itself the first time you catch a runaway prompt loop or a misconfigured max_tokens blowing $400 in an hour.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Architecture

flowchart LR
  Code[App / agent code] -->|OpenAI SDK| OAI[(OpenAI / Anthropic)]
  Code -. OTel span .-> Col[OTel collector]
  Col --> CH[(ClickHouse<br/>llm_spans)]
  Col --> LF[Langfuse]
  CH --> Cap[Cost cap worker]
  Cap -->|429 if over| Code
  CH --> Dig[Daily founder digest]

Cost cap worker enforces hard per-tenant limits at the gateway layer.
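The llm_spans sink can start as a plain ClickHouse table. A sketch of the DDL — column names follow the span attributes used throughout; the types and ORDER BY key are assumptions, adjust for your query patterns:

```sql
-- Assumed ClickHouse schema for the llm_spans sink.
CREATE TABLE llm_spans (
  ts                DateTime64(3),
  tenant_id         LowCardinality(String),
  call_id           String,
  model             LowCardinality(String),
  route             LowCardinality(String),
  prompt_tokens     UInt32,
  completion_tokens UInt32,
  cached_tokens     UInt32,
  cost_usd          Float64
)
ENGINE = MergeTree
ORDER BY (tenant_id, ts);
```

Sorting by (tenant_id, ts) makes the per-tenant hourly sums the cap worker runs cheap.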

CallSphere implementation

CallSphere — 37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149 / $499 / $1499 at /pricing. 14-day trial, 22% affiliate. Every LLM call across all verticals (Healthcare at /industries/healthcare, Real Estate, Salon, Sales, After-Hours, IT) emits an OTel span; ClickHouse aggregates cost_usd_per_call and a Slack digest lands every morning at 9 ET. Try the dashboard at /demo.

Build steps with code

  1. Install OTel in your AI service (@opentelemetry/api + @opentelemetry/sdk-node).
  2. Wrap LLM calls with a span; attach model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, call_id.
  3. Compute cost from a pricing table (model → $/1M tokens).
  4. Sink to ClickHouse via OTel collector exporter/clickhouse, OR send to Langfuse.
  5. Build per-tenant caps — a worker watches SUM(cost_usd) WHERE ts > now() - 1h and 429s if over.
  6. Daily digest — top tenants, top models, top routes, anomalies vs. 7-day baseline.
  7. Alert on any single call > $1.
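Steps 1 and 4 can be wired with the Node SDK plus an OTLP exporter pointed at your collector, which fans out to ClickHouse and/or Langfuse. A minimal bootstrap sketch — the service name and collector endpoint are assumptions:

```typescript
// OTel bootstrap sketch (steps 1 and 4). Run this before any LLM calls.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  serviceName: "callsphere-ai", // assumed service name
  traceExporter: new OTLPTraceExporter({
    url: "http://otel-collector:4318/v1/traces", // assumed collector address
  }),
});
sdk.start();
```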
The span wrapper for steps 2 and 3:

import OpenAI from "openai";
import { trace, SpanStatusCode } from "@opentelemetry/api";

const ai = new OpenAI();

// Pricing in $ per 1M tokens (pin to a date; see step 3).
const PRICING: Record<string, { in: number; out: number; cached: number }> = {
  "gpt-4o-mini":      { in: 0.15, out: 0.60,  cached: 0.075 },
  "gpt-4o":           { in: 2.50, out: 10.00, cached: 1.25  },
  "claude-haiku-4-5": { in: 0.80, out: 4.00,  cached: 0.08  },
};

async function callLlm(model: string, msgs: any[], ctx: { tenantId: string; callId: string }) {
  const tracer = trace.getTracer("callsphere-ai");
  return await tracer.startActiveSpan("llm.call", async (span) => {
    span.setAttribute("llm.model", model);
    span.setAttribute("tenant.id", ctx.tenantId);
    span.setAttribute("call.id", ctx.callId);
    try {
      const r = await ai.chat.completions.create({ model, messages: msgs });
      const u = r.usage!;
      const p = PRICING[model];
      if (!p) throw new Error(`no pricing entry for ${model}`);
      // Cached prompt tokens are billed at the discounted rate (pitfall #2).
      const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
      const cost =
        ((u.prompt_tokens - cached) * p.in + cached * p.cached + u.completion_tokens * p.out) /
        1_000_000;
      span.setAttribute("llm.prompt_tokens", u.prompt_tokens);
      span.setAttribute("llm.completion_tokens", u.completion_tokens);
      span.setAttribute("llm.cached_tokens", cached);
      span.setAttribute("llm.cost_usd", cost);
      return r;
    } catch (e: any) {
      span.recordException(e);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw e;
    } finally {
      span.end();
    }
  });
}
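The cap decision for step 5 can be kept as pure logic so the gateway and tests share it. A minimal sketch — it assumes a helper elsewhere that fetches SUM(cost_usd) for the last hour from ClickHouse; the soft-warn plus hard-block-at-2x split is one common policy, not the only one:

```typescript
// Per-tenant cost cap (step 5): pure decision logic, a sketch.
// `hourlySpendUsd` would come from the ClickHouse hourly sum; caps are illustrative.
type CapDecision = "ok" | "warn" | "block";

function checkCap(hourlySpendUsd: number, softCapUsd: number): CapDecision {
  if (hourlySpendUsd >= softCapUsd * 2) return "block"; // gateway answers 429
  if (hourlySpendUsd >= softCapUsd) return "warn";      // soft: email the tenant
  return "ok";
}
```

Keeping the rule out of SQL means the same function backs the gateway check, the digest's "near cap" flag, and unit tests.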

Pitfalls

  • Per-call attributes only, no aggregation — you'll have data but no answers; build the rollup in ClickHouse.
  • Forgetting cached tokens — OpenAI prompt caching halves input cost; track separately.
  • Pricing table out of date — automate refresh or pin to a known date.
  • No tenant_id — can't bill or cap.
  • Picking Langfuse OR ClickHouse when you need both — the gold standard is Langfuse for traces + ClickHouse for cost ops.
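The rollup called out in the first pitfall can start as a single query. A sketch of the daily-digest aggregation — it assumes an llm_spans table with ts, tenant_id, model, and cost_usd columns; the anomaly signal is today's spend against a 7-day daily average:

```sql
-- Daily digest sketch: today's spend per tenant/model vs. a 7-day baseline.
SELECT
  tenant_id,
  model,
  sumIf(cost_usd, ts >= today()) AS spend_today,
  sumIf(cost_usd, ts >= today() - 7 AND ts < today()) / 7 AS baseline_daily,
  spend_today / nullIf(baseline_daily, 0) AS ratio
FROM llm_spans
WHERE ts >= today() - 7
GROUP BY tenant_id, model
ORDER BY spend_today DESC
LIMIT 20;
```

Rows with a high ratio are the anomalies worth surfacing at the top of the digest.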

FAQ

Langfuse vs. Datadog LLM Observability vs. Braintrust? Langfuse if open-source matters; Datadog if you already pay them; Braintrust if you also want eval.

OTel auto-instrumentation? @opentelemetry/instrumentation-openai works for OpenAI SDK; for Anthropic use opentelemetry-instrumentation-anthropic (community).

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Per-tenant caps — hard or soft? Soft warns; hard 429s. Most SaaS does soft + email + hard at 2x.

How granular? One span per LLM call; one trace per user request; sample at 100% for cost, can downsample evals.

Cost dashboard latency? ClickHouse rollup updates every 5 min — good enough.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available — no signup required.
