Realtime LLM Cost and Token Monitoring Pipeline With OpenTelemetry (2026)
Every LLM call should emit a span with model, input tokens, output tokens, and cost — collected via OTel and aggregated in ClickHouse or Langfuse. We show the schema, the per-tenant cost cap pattern, and the daily founder digest.
TL;DR — Wrap every LLM call with an OpenTelemetry span carrying model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, and call_id. Sink to ClickHouse or Langfuse. Build per-tenant cost caps and a daily founder digest. CallSphere uses this to track $/call across 6 verticals.
Why this pipeline
If you can't say "what did we spend on GPT-4o-mini today?" in three seconds, you don't have an LLM observability story. The 2026 standard is OpenTelemetry: every LLM call becomes a span, every span carries token + cost attributes, and you sink to a backend that aggregates by tenant, model, route, and time.
This pipeline pays for itself the first time you catch a runaway prompt loop or a misconfigured max_tokens blowing $400 in an hour.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Architecture
flowchart LR
Code[App / agent code] -->|OpenAI SDK| OAI[(OpenAI / Anthropic)]
Code -.OTel span.- Col[OTel collector]
Col --> CH[(ClickHouse<br/>llm_spans)]
Col --> LF[Langfuse]
CH --> Cap[Cost cap worker]
Cap -->|429 if over| Code
CH --> Dig[Daily founder digest]
Cost cap worker enforces hard per-tenant limits at the gateway layer.
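The llm_spans table at the center of the diagram can be sketched as a row type. Column names here are illustrative, not a fixed CallSphere schema; the mapper flattens the span attributes used throughout this article into one insertable row per LLM call.

```typescript
// Assumed llm_spans row shape in ClickHouse (illustrative column names):
// one row per LLM call span, keyed by tenant and call.
interface LlmSpanRow {
  ts: string; // DateTime64 in ClickHouse
  tenant_id: string;
  call_id: string;
  model: string;
  prompt_tokens: number;
  completion_tokens: number;
  cached_tokens: number;
  cost_usd: number;
}

// Flatten OTel span attributes into a row for insertion.
function toRow(attrs: Record<string, string | number>, ts: string): LlmSpanRow {
  return {
    ts,
    tenant_id: String(attrs["tenant.id"]),
    call_id: String(attrs["call.id"]),
    model: String(attrs["llm.model"]),
    prompt_tokens: Number(attrs["llm.prompt_tokens"]),
    completion_tokens: Number(attrs["llm.completion_tokens"]),
    cached_tokens: Number(attrs["llm.cached_tokens"] ?? 0),
    cost_usd: Number(attrs["llm.cost_usd"]),
  };
}
```

In practice the OTel collector's ClickHouse exporter writes spans for you; a mapper like this only matters if you insert rows directly from app code.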
CallSphere implementation
CallSphere — 37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149 / $499 / $1499 at /pricing. 14-day trial, 22% affiliate. Every LLM call across all verticals (Healthcare at /industries/healthcare, Real Estate, Salon, Sales, After-Hours, IT) emits an OTel span; ClickHouse aggregates cost_usd_per_call and a Slack digest lands every morning at 9 ET. Try the dashboard at /demo.
Build steps with code
- Install OTel in your AI service (@opentelemetry/api + @opentelemetry/sdk-node).
- Wrap LLM calls with a span; attach model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, call_id.
- Compute cost from a pricing table (model → $/1M tokens).
- Sink to ClickHouse via the OTel collector's ClickHouse exporter, OR send to Langfuse.
- Build per-tenant caps — a worker watches SUM(cost_usd) WHERE ts > now() - 1h and 429s if over.
- Daily digest — top tenants, top models, top routes, anomalies vs. 7-day baseline.
- Alert on any single call > $1.
import { trace, SpanStatusCode } from "@opentelemetry/api";
import OpenAI from "openai";

const ai = new OpenAI();

// $/1M tokens — keep in sync with provider pricing pages
const PRICING: Record<string, { in: number; out: number; cached: number }> = {
  "gpt-4o-mini": { in: 0.15, out: 0.6, cached: 0.075 },
  "gpt-4o": { in: 2.5, out: 10.0, cached: 1.25 },
  "claude-haiku-4-5": { in: 0.8, out: 4.0, cached: 0.08 },
};

async function callLlm(model: string, msgs: any[], ctx: { tenantId: string; callId: string }) {
  const tracer = trace.getTracer("callsphere-ai");
  return await tracer.startActiveSpan("llm.call", async (span) => {
    span.setAttribute("llm.model", model);
    span.setAttribute("tenant.id", ctx.tenantId);
    span.setAttribute("call.id", ctx.callId);
    try {
      const r = await ai.chat.completions.create({ model, messages: msgs });
      const u = r.usage!;
      const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
      const p = PRICING[model];
      // Cached input tokens bill at the discounted cached rate
      const cost =
        ((u.prompt_tokens - cached) * p.in +
          cached * p.cached +
          u.completion_tokens * p.out) /
        1_000_000;
      span.setAttribute("llm.prompt_tokens", u.prompt_tokens);
      span.setAttribute("llm.completion_tokens", u.completion_tokens);
      span.setAttribute("llm.cached_tokens", cached);
      span.setAttribute("llm.cost_usd", cost);
      return r;
    } catch (e: any) {
      span.recordException(e);
      span.setStatus({ code: SpanStatusCode.ERROR, message: e?.message });
      throw e;
    } finally {
      span.end(); // always end the span, success or failure
    }
  });
}
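The per-tenant cap step can be sketched the same way. The query string, table name, and cap values below are illustrative assumptions; the decision logic is pure so the hot path never has to query ClickHouse per request.

```typescript
// Per-tenant hourly cap check (sketch — table/column names and cap values
// are illustrative). A worker runs SPEND_QUERY on a schedule and the
// gateway consults the cached decision before forwarding a call.
const HOURLY_CAP_USD: Record<string, number> = { "tenant-abc": 25.0 };

// Rolling 1-hour spend per tenant, straight from the raw span table.
const SPEND_QUERY = `
  SELECT tenant_id, sum(cost_usd) AS spent
  FROM llm_spans
  WHERE ts > now() - INTERVAL 1 HOUR
  GROUP BY tenant_id
`;

// Pure decision: hard cap returns "reject" (gateway answers 429);
// soft warning fires at 80% of the cap.
function capDecision(tenantId: string, spentUsd: number): "ok" | "warn" | "reject" {
  const cap = HOURLY_CAP_USD[tenantId];
  if (cap === undefined) return "ok"; // no cap configured for this tenant
  if (spentUsd >= cap) return "reject";
  if (spentUsd >= cap * 0.8) return "warn";
  return "ok";
}
```

The worker would execute SPEND_QUERY via a ClickHouse client (e.g. @clickhouse/client) and cache decisions in memory or Redis, so enforcement adds no per-request database round trip.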
Pitfalls
- Per-call attributes only, no aggregation — you'll have data but no answers; build the rollup in ClickHouse.
- Forgetting cached tokens — OpenAI prompt caching halves input cost; track separately.
- Pricing table out of date — automate refresh or pin to a known date.
- No tenant_id — can't bill or cap.
- Langfuse OR ClickHouse, not both — the gold standard is Langfuse for traces + ClickHouse for cost ops.
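The cached-token pitfall is worth making concrete: cached input tokens bill at a discounted rate, so cost must split prompt tokens rather than pricing them all at the full input rate. A minimal helper, using the gpt-4o-mini figures from the pricing table above:

```typescript
// Cost with cached tokens split out (rates in $/1M tokens).
function costUsd(
  promptTokens: number,
  completionTokens: number,
  cachedTokens: number,
  rates: { in: number; out: number; cached: number },
): number {
  const uncached = promptTokens - cachedTokens;
  return (
    (uncached * rates.in + cachedTokens * rates.cached + completionTokens * rates.out) /
    1_000_000
  );
}

// gpt-4o-mini rates from the pricing table in this article
const mini = { in: 0.15, out: 0.6, cached: 0.075 };
```

With 10k prompt tokens (8k cached) and 1k completion tokens, the cached-aware figure is noticeably lower than the naive one — ignore cached tokens and you systematically overstate spend on cache-heavy routes.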
FAQ
Langfuse vs. Datadog LLM Observability vs. Braintrust? Langfuse if open-source matters; Datadog if you already pay them; Braintrust if you also want eval.
OTel auto-instrumentation? @opentelemetry/instrumentation-openai works for OpenAI SDK; for Anthropic use opentelemetry-instrumentation-anthropic (community).
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Per-tenant caps — hard or soft? Soft warns; hard 429s. Most SaaS does soft + email + hard at 2x.
How granular? One span per LLM call; one trace per user request; sample at 100% for cost, can downsample evals.
Cost dashboard latency? ClickHouse rollup updates every 5 min — good enough.
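The digest and anomaly logic mentioned above can be sketched too. The query assumes the illustrative llm_spans schema; "anomaly" here is a simple 2x-of-7-day-baseline rule, which is an assumption, not a CallSphere-specific threshold.

```typescript
// Daily digest query sketch (illustrative schema): yesterday's spend per
// tenant vs. its trailing 7-day daily average.
const DIGEST_QUERY = `
  WITH baseline AS (
    SELECT tenant_id, sum(cost_usd) / 7 AS avg_daily
    FROM llm_spans
    WHERE ts >= today() - 8 AND ts < today() - 1
    GROUP BY tenant_id
  )
  SELECT s.tenant_id,
         sum(s.cost_usd) AS yesterday_usd,
         b.avg_daily
  FROM llm_spans s
  JOIN baseline b USING (tenant_id)
  WHERE s.ts >= today() - 1 AND s.ts < today()
  GROUP BY s.tenant_id, b.avg_daily
  ORDER BY yesterday_usd DESC
  LIMIT 10
`;

// Pure anomaly rule applied when formatting the Slack digest:
// flag tenants whose spend yesterday exceeded 2x their 7-day baseline.
function isAnomaly(yesterdayUsd: number, avgDailyUsd: number): boolean {
  return avgDailyUsd > 0 && yesterdayUsd > 2 * avgDailyUsd;
}
```

Running this once a day and posting the top rows plus any flagged tenants is the whole "founder digest" — no dashboard required.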
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.