By Sagar Shankaran, Founder of CallSphere
Every LLM call should emit a span with model, input tokens, output tokens, and cost — collected via OTel and aggregated in ClickHouse or Langfuse. We show the schema, the per-tenant cost cap pattern, and the daily founder digest.
Key takeaways
TL;DR: Wrap every LLM call with an OpenTelemetry span carrying model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, and call_id. Sink to ClickHouse or Langfuse. Build per-tenant cost caps and a daily founder digest. CallSphere uses this to track $/call across 6 verticals.
If you can't say "what did we spend on GPT-4o-mini today?" in three seconds, you don't have an LLM observability story. The 2026 standard is OpenTelemetry: every LLM call becomes a span, every span carries token + cost attributes, and you sink to a backend that aggregates by tenant, model, route, and time.
This pipeline pays for itself the first time you catch a runaway prompt loop or a misconfigured max_tokens blowing $400 in an hour.
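To make the "aggregate by tenant, model, route, and time" idea concrete, here is a minimal sketch in plain TypeScript. The `LlmSpanRow` shape mirrors the attributes named in the TL;DR plus a route and a timestamp; the field names, the route values, and the sample rows are illustrative assumptions, not CallSphere's actual schema. In a real deployment this grouping would typically run as a query in ClickHouse or Langfuse rather than in application code.

```typescript
// Hypothetical row shape for one LLM span. Field names are assumptions
// derived from the attributes listed in this article.
interface LlmSpanRow {
  ts: Date;
  tenantId: string;
  callId: string;
  route: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  cachedTokens: number;
  costUsd: number;
}

// Sum cost per "tenant|model" key for spans recorded at or after `since`.
function costByTenantAndModel(rows: LlmSpanRow[], since: Date): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (row.ts.getTime() < since.getTime()) continue; // outside the time window
    const key = `${row.tenantId}|${row.model}`;
    totals.set(key, (totals.get(key) ?? 0) + row.costUsd);
  }
  return totals;
}

// Illustrative data only.
const sampleRows: LlmSpanRow[] = [
  { ts: new Date("2026-01-15T09:00:00Z"), tenantId: "tenant-a", callId: "c1", route: "/chat", model: "gpt-4o-mini", promptTokens: 1000, completionTokens: 200, cachedTokens: 0, costUsd: 0.25 },
  { ts: new Date("2026-01-15T10:00:00Z"), tenantId: "tenant-a", callId: "c2", route: "/chat", model: "gpt-4o-mini", promptTokens: 2000, completionTokens: 400, cachedTokens: 500, costUsd: 0.5 },
  { ts: new Date("2026-01-15T11:00:00Z"), tenantId: "tenant-b", callId: "c3", route: "/voice", model: "gpt-4o", promptTokens: 3000, completionTokens: 600, cachedTokens: 0, costUsd: 1.0 },
  { ts: new Date("2026-01-14T23:00:00Z"), tenantId: "tenant-a", callId: "c0", route: "/chat", model: "gpt-4o-mini", promptTokens: 9000, completionTokens: 900, cachedTokens: 0, costUsd: 2.0 },
];

// "What did we spend on gpt-4o-mini today?" for tenant-a:
const todayTotals = costByTenantAndModel(sampleRows, new Date("2026-01-15T00:00:00Z"));
console.log(todayTotals.get("tenant-a|gpt-4o-mini"));
```

Keying on a composite string keeps the sketch short; adding `route` or an hourly bucket to the key extends it to the other dimensions.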
flowchart LR
Code[App / agent code] -->|OpenAI SDK| OAI[(OpenAI / Anthropic)]
Code -.OTel span.- Col[OTel collector]
Col --> CH[(ClickHouse<br/>llm_spans)]
Col --> LF[Langfuse]
CH --> Cap[Cost cap worker]
Cap -->|429 if over| Code
CH --> Dig[Daily founder digest]
Cost cap worker enforces hard per-tenant limits at the gateway layer.
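One way the hard limit could be evaluated is sketched below: sum a tenant's spend over the trailing hour and answer 429 once it reaches the cap. The `SpendEvent` type, the function names, and the in-memory array are assumptions for illustration; in the pipeline shown above the sum would come from the llm_spans table rather than from process memory.

```typescript
// Hypothetical spend record; stands in for a row read from llm_spans.
interface SpendEvent {
  tenantId: string;
  tsMs: number; // epoch milliseconds
  costUsd: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Total spend for one tenant in the window (nowMs - 1h, nowMs].
function spendLastHour(events: SpendEvent[], tenantId: string, nowMs: number): number {
  let total = 0;
  for (const e of events) {
    if (e.tenantId === tenantId && e.tsMs > nowMs - HOUR_MS && e.tsMs <= nowMs) {
      total += e.costUsd;
    }
  }
  return total;
}

// HTTP status the gateway should return before forwarding an LLM call.
function gatewayStatus(
  events: SpendEvent[],
  tenantId: string,
  nowMs: number,
  hourlyCapUsd: number,
): 200 | 429 {
  return spendLastHour(events, tenantId, nowMs) >= hourlyCapUsd ? 429 : 200;
}

// Illustrative data only.
const nowMs = Date.UTC(2026, 0, 15, 12, 0, 0);
const spendEvents: SpendEvent[] = [
  { tenantId: "tenant-a", tsMs: nowMs - 10 * 60 * 1000, costUsd: 0.5 },
  { tenantId: "tenant-a", tsMs: nowMs - 30 * 60 * 1000, costUsd: 0.25 },
  { tenantId: "tenant-a", tsMs: nowMs - 2 * HOUR_MS, costUsd: 4.0 }, // too old to count
  { tenantId: "tenant-b", tsMs: nowMs - 5 * 60 * 1000, costUsd: 8.0 },
];
console.log(gatewayStatus(spendEvents, "tenant-b", nowMs, 5));
```

A trailing window rather than a calendar hour avoids a burst of spend right after each reset.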
CallSphere — 37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149 / $499 / $1499 at /pricing. 14-day trial, 22% affiliate. Every LLM call across all verticals (Healthcare at /industries/healthcare, Real Estate, Salon, Sales, After-Hours, IT) emits an OTel span; ClickHouse aggregates cost_usd_per_call and a Slack digest lands every morning at 9 ET. Try the dashboard at /demo.
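The morning digest can be pictured as a small formatting step over the previous day's spans. The sketch below groups cost by vertical and reports dollars per call; the `CallCost` type, the message layout, and the sample values are assumptions, and the step that posts the text to Slack is left out.

```typescript
// Hypothetical input: one entry per LLM span, tagged with its call and vertical.
interface CallCost {
  callId: string;
  vertical: string;
  costUsd: number;
}

// Build a plain-text digest: one line per vertical, highest spend first.
function buildDigest(day: string, spans: CallCost[]): string {
  const perVertical = new Map<string, { cost: number; calls: Set<string> }>();
  for (const s of spans) {
    const entry = perVertical.get(s.vertical) ?? { cost: 0, calls: new Set<string>() };
    entry.cost += s.costUsd;
    entry.calls.add(s.callId); // several spans can belong to the same call
    perVertical.set(s.vertical, entry);
  }

  const lines = [`LLM spend digest for ${day}`];
  const sorted = Array.from(perVertical.entries()).sort((a, b) => b[1].cost - a[1].cost);
  sorted.forEach(([vertical, entry]) => {
    const perCall = entry.cost / entry.calls.size;
    lines.push(
      `${vertical}: $${entry.cost.toFixed(2)} over ${entry.calls.size} call(s), $${perCall.toFixed(4)}/call`,
    );
  });
  return lines.join("\n");
}

// Illustrative data only.
const digest = buildDigest("2026-01-15", [
  { callId: "c1", vertical: "healthcare", costUsd: 0.5 },
  { callId: "c1", vertical: "healthcare", costUsd: 0.25 },
  { callId: "c2", vertical: "healthcare", costUsd: 0.25 },
  { callId: "c3", vertical: "salon", costUsd: 0.5 },
]);
console.log(digest);
```

Counting distinct call IDs rather than spans matters here because one call usually produces several LLM round-trips.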
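Install the OTel SDK (@opentelemetry/api + @opentelemetry/sdk-node).
Set these span attributes on every LLM call: model, prompt_tokens, completion_tokens, cached_tokens, cost_usd, tenant_id, call_id.
Export through the collector's exporter/clickhouse, OR send to Langfuse.
Run a cost cap worker that checks SUM(cost_usd) WHERE ts > now() - 1h and returns 429s if over.
import { trace, SpanStatusCode } from "@opentelemetry/api";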
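import OpenAI from "openai";

// Assumption: `ai` is an OpenAI SDK client that reads OPENAI_API_KEY from the environment.
const ai = new OpenAI();

// USD per 1M tokens. Check these against each provider's current price list.
const PRICING: Record<string, { in: number; out: number; cached: number }> = {
  "gpt-4o-mini": { in: 0.15, out: 0.60, cached: 0.075 },
  "gpt-4o": { in: 2.50, out: 10.00, cached: 1.25 },
  "claude-haiku-4-5": { in: 0.80, out: 4.00, cached: 0.08 },
};

async function callLlm(model: string, msgs: any[], ctx: any) {
  // Fail fast on an unpriced model instead of crashing after the call is paid for.
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing configured for model: ${model}`);

  const tracer = trace.getTracer("callsphere-ai");
  return await tracer.startActiveSpan("llm.call", async (span) => {
    span.setAttribute("llm.model", model);
    span.setAttribute("tenant.id", ctx.tenantId);
    span.setAttribute("call.id", ctx.callId);
    try {
      const r = await ai.chat.completions.create({ model, messages: msgs });
      const u = r.usage!;
      // prompt_tokens includes cached tokens, so bill the cached share at the cached rate.
      const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
      const cost =
        ((u.prompt_tokens - cached) * p.in + cached * p.cached + u.completion_tokens * p.out) /
        1_000_000;
      span.setAttribute("llm.prompt_tokens", u.prompt_tokens);
      span.setAttribute("llm.completion_tokens", u.completion_tokens);
      span.setAttribute("llm.cached_tokens", cached);
      span.setAttribute("llm.cost_usd", cost);
      return r;
    } catch (e: any) {
      span.recordException(e);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw e;
    } finally {
      span.end();
    }
  });
}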
Missing tenant_id: can't bill or cap.
Langfuse vs. Datadog LLM Observability vs. Braintrust? Langfuse if open-source matters; Datadog if you already pay them; Braintrust if you also want eval.
OTel auto-instrumentation? @opentelemetry/instrumentation-openai works for OpenAI SDK; for Anthropic use opentelemetry-instrumentation-anthropic (community).
Per-tenant caps: hard or soft? A soft cap warns; a hard cap returns 429. Most SaaS products combine a soft cap with an email alert, then a hard cap at 2x the soft limit.
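The soft-then-hard pattern reduces to a small decision function. This is a sketch under assumed names (`decideCap`, `CapDecision`, `hardMultiplier`); the 2x multiplier is the default mentioned above, and actually sending the email is left to the caller.

```typescript
// Possible outcomes of a cap check. Names are illustrative assumptions.
type CapDecision =
  | { action: "allow" }
  | { action: "warn"; notify: "email" }
  | { action: "block"; status: 429 };

// Soft cap triggers a warning; spend at or above softCap * hardMultiplier is blocked.
function decideCap(spendUsd: number, softCapUsd: number, hardMultiplier = 2): CapDecision {
  if (spendUsd >= softCapUsd * hardMultiplier) return { action: "block", status: 429 };
  if (spendUsd >= softCapUsd) return { action: "warn", notify: "email" };
  return { action: "allow" };
}

// Example: a tenant with a $10 soft cap.
console.log(decideCap(5, 10).action);
console.log(decideCap(12, 10).action);
console.log(decideCap(20, 10).action);
```

Checking the hard threshold first keeps the branches from overlapping when spend exceeds both limits.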
How granular? One span per LLM call; one trace per user request; sample at 100% for cost, can downsample evals.
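The split between full cost sampling and downsampled evals can be sketched as a deterministic, hash-based decision keyed on the trace ID, so every span in a trace gets the same answer. The function names and the choice of an FNV-1a hash are assumptions for illustration; OpenTelemetry SDKs ship their own ratio-based samplers, which this does not replace.

```typescript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Cost spans are always kept; eval work is kept for a fixed fraction of traces.
function shouldKeep(kind: "cost" | "eval", traceId: string, evalRate: number): boolean {
  if (kind === "cost") return true; // cost accounting needs 100% of spans
  return fnv1a(traceId) / 0x100000000 < evalRate;
}

// Example: keep roughly 10% of traces for evals, but every cost span.
console.log(shouldKeep("cost", "trace-123", 0.1));
console.log(shouldKeep("eval", "trace-123", 0.1));
```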
Cost dashboard latency? ClickHouse rollup updates every 5 min — good enough.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.