
Helicone OSS vs Cloud in 2026: When to Self-Host Your AI Gateway

Helicone has processed 2B+ LLM calls and ships both as a managed cloud and as fully self-hostable open source. Here is the actual decision tree for 2026.

TL;DR: Helicone is a one-line proxy that gives you logging, caching, cost tracking, and a dashboard for any LLM API call. The OSS version is feature-equal on the core observability surface; the Cloud adds managed infrastructure you don't have to operate, at roughly 50-80ms of added proxy latency. Pick based on how much infra pain you can absorb.

What Helicone is, in one sentence

[Diagram: CallSphere reference architecture. GitHub repo → GitHub Actions → agent eval suite (PromptFoo) → deploy on pass / block PR on fail → production agent → LangSmith trace, which feeds back into the eval suite.]

Helicone is an AI gateway: your app sends LLM requests through it instead of straight to the provider, and Helicone logs the request, the response, the token usage, the latency, and the cost. A one-line code change (swap api.openai.com for oai.helicone.ai) and you have observability.
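In the OpenAI Node SDK that change looks like this (minimal sketch; HELICONE_API_KEY is your Helicone key, and the fuller tagging example appears later in the article):

import OpenAI from "openai";

// Swap the base URL and every request is proxied and logged by Helicone.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",   // was https://api.openai.com/v1
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});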

The OSS version (Apache 2.0, on GitHub) and the Cloud version (helicone.ai) share the same backend architecture: Cloudflare Workers + ClickHouse + Kafka. Helicone's pitch is that they've processed 2 billion+ LLM interactions with 50-80ms average added latency.

Cloud vs Self-Hosted

| Dimension | Helicone Cloud | Helicone OSS (self-hosted) |
| --- | --- | --- |
| Setup time | 5 minutes | A weekend (Docker, Kubernetes, or manual) |
| Free tier | 10k requests/month | Unlimited (your infra cost) |
| Paid plans | Starts $79/month | $0 software cost; pay for hosting |
| Data residency | Helicone's infra | Your VPC, your country |
| Infrastructure ownership | Helicone runs CF Workers + ClickHouse + Kafka | You run them |
| Updates | Automatic | You pull and redeploy |
| Compliance posture | SOC 2, ISO 27001 | Whatever you certify |

For early-stage and growth-stage teams, Cloud is the right default. For regulated industries (healthcare, defense, finance) where data residency matters more than ops cost, self-host.

When to self-host

Real reasons to take on the operational cost:

  1. Regulatory data residency — EU AI Act, HIPAA, FedRAMP. Logs and prompts can't leave your VPC.
  2. PII volume — you're logging requests that contain regulated data and the legal review takes 6 months for any third-party processor.
  3. Sovereign cloud requirement — government or defense workloads with no exceptions.
  4. Massive volume — at 100M+ requests/month, the per-request math eventually favors operating ClickHouse yourself (rough break-even sketch below).
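
A back-of-envelope way to frame that break-even. Every number below is a hypothetical placeholder, not Helicone's pricing; substitute your own quote and infra estimate:

// Rough monthly break-even between Helicone Cloud and self-hosted OSS.
// All inputs are assumptions; replace them with your real numbers.
const monthlyRequests = 100_000_000;           // your request volume
const cloudCostPerMillionReqs = 5;             // hypothetical managed price per 1M requests (USD)
const selfHostInfraPerMonth = 1_500;           // ClickHouse + Kafka + workers + storage (USD)
const selfHostOpsTimePerMonth = 0.25 * 15_000; // fraction of a loaded engineer-month (USD)

const cloudTotal = (monthlyRequests / 1_000_000) * cloudCostPerMillionReqs;
const ossTotal = selfHostInfraPerMonth + selfHostOpsTimePerMonth;

console.log({ cloudTotal, ossTotal, selfHostingWins: ossTotal < cloudTotal });

The point is the crossover, not the specific inputs: below it, the managed plan plus zero ops time usually wins.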

If none of those apply, Cloud is cheaper end-to-end when you account for engineering time.


How CallSphere uses it

CallSphere runs Helicone Cloud for all non-voice LLM traffic — content generation, scrapers, after-hours summarization, GTM automation. Voice runtime traces go to LangSmith and our internal Postgres-backed observability layer because the Helicone proxy hop is incompatible with WebRTC's session-based auth.

For our healthcare and behavioral-health verticals, where prompts may contain protected health information, we self-host the OSS version inside our HIPAA-eligible AWS account. Same Helicone UI, same ClickHouse, just under our BAA. The OSS path is genuinely production-grade — we audit-log every prompt and response with no third-party processor in the path.


Build steps — Helicone Cloud in 5 minutes

  1. Sign up at helicone.ai. Grab the proxy URL and an API key.
  2. Swap your OpenAI base URL: baseURL: "https://oai.helicone.ai/v1".
  3. Add the auth header: "Helicone-Auth": "Bearer hl-...".
  4. Tag requests with Helicone-Property-User and Helicone-Property-Workflow so you can slice in the dashboard.
  5. Enable caching for deterministic prompts: "Helicone-Cache-Enabled": "true".
  6. Set a per-user budget alert.
  7. Wire the dashboard to your on-call Slack.

Build steps — Helicone OSS self-hosted

  1. Clone Helicone/helicone and read docker-compose.yml.
  2. Stand up Postgres, ClickHouse, MinIO, and the proxy worker.
  3. Configure your S3 bucket for request bodies (audit-grade).
  4. Point your application at the self-hosted proxy URL.
  5. Configure SSO (Cognito, Okta, Google Workspace) for the dashboard.
  6. Set up daily ClickHouse backups; logs are your audit trail.
  7. Subscribe to the Helicone GitHub releases; review and apply.

Code: tag every request

import OpenAI from "openai";

// userId and tenantId come from your request context.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",  // or your self-hosted URL
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Property-Workflow": "post-call-summary", // custom properties become dashboard filters
    "Helicone-Property-User": userId,
    "Helicone-Property-Tenant": tenantId,
    "Helicone-Cache-Enabled": "true",                   // serve repeat prompts from cache
    "Helicone-Cache-Bucket-Max-Size": "20",             // max cached responses kept per bucket
  },
});

Caching strategies that pay for themselves

Helicone's caching is the underrated cost-saver. Three patterns we use:

  1. Deterministic prompt cache. For prompts where the same input always produces the same output (classifiers, parsers, embeddings), enable the cache with a long TTL. Hit rates over 30% on our SEO content classifier dropped that workload's cost by 28%.
  2. User-bucketed cache. For per-user assistants, scope the cache per user with a per-user Helicone-Cache-Seed (with Helicone-Cache-Bucket-Max-Size capping entries per bucket) so cache entries are isolated per user. No cross-user contamination. See the header sketch after this list.
  3. Burst protection. When a popular content URL spikes, Helicone's cache absorbs the burst before it hits the model provider. Saves us from rate-limit pain.
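
What those patterns look like as request headers. A sketch only: Helicone-Cache-Enabled and Helicone-Cache-Bucket-Max-Size appear in the client config above, while the Cache-Control TTL and Helicone-Cache-Seed headers are our reading of Helicone's caching docs and worth verifying before relying on them.

// Pattern 1: deterministic prompt cache with a long TTL (classifier / parser workloads).
const deterministicHeaders = {
  "Helicone-Cache-Enabled": "true",
  "Cache-Control": "max-age=604800",        // 7-day TTL (assumed TTL mechanism)
};

// Pattern 2: per-user cache isolation for assistants.
const perUserHeaders = (userId: string) => ({
  "Helicone-Cache-Enabled": "true",
  "Helicone-Cache-Seed": userId,            // assumed per-user cache scoping header
  "Helicone-Cache-Bucket-Max-Size": "20",
});

// Pattern 3: burst protection is pattern 1 with a short TTL in front of spiky traffic.
const burstHeaders = {
  "Helicone-Cache-Enabled": "true",
  "Cache-Control": "max-age=300",           // 5 minutes is enough to absorb a spike
};

Pass whichever object fits the workload via the OpenAI SDK's per-request options, e.g. openai.chat.completions.create(body, { headers: perUserHeaders(userId) }).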

Cost analytics — slicing the bill

The Helicone dashboard slices cost by user, model, prompt, workflow, or any custom property you tagged. Two views we check daily:

  • Cost per workflow. Surfaces the agent that quietly tripled in tokens after a prompt change.
  • Cost per user. Surfaces the customer whose AI usage is exceeding their plan budget. Useful for quota enforcement and upsell.

For our Sales product, we tag every request with the customer tenant and the agent name. Within minutes of deploying we have an exact bill per tenant per agent — that fed directly into the per-tenant pricing page math.
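
You don't need a separate client per tenant for that attribution; the OpenAI Node SDK accepts headers per request. A sketch, using the client configured earlier (the Tenant and Agent property names are our own convention, not required by Helicone; transcript, tenantId, and agentName come from your request context):

// Tag a single request with tenant + agent so the dashboard can slice
// cost per tenant per agent. Properties are arbitrary key/value tags.
const summary = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: transcript }],
  },
  {
    headers: {
      "Helicone-Property-Tenant": tenantId,
      "Helicone-Property-Agent": agentName,
      "Helicone-Property-Workflow": "post-call-summary",
    },
  },
);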

Failover and rate-limit smoothing

Helicone's gateway can also act as a provider failover layer. If OpenAI rate-limits or returns 5xx, Helicone retries against a fallback (Anthropic, Azure OpenAI, Bedrock) without the application knowing. Configure it once in the headers; the application code never changes.

"Helicone-Fallbacks": JSON.stringify([
  { "target-url": "https://api.openai.com", "weight": 0.7 },
  { "target-url": "https://api.anthropic.com", "weight": 0.3 },
]),

We use this for non-voice batch workloads where the model can be substituted; we do not use it for voice (the model behavior differences are too pronounced for live calls).

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

FAQ

How much latency does Helicone add? Average 50-80ms in our measurements; usually below the variance of the LLM call itself.

Does Helicone work with Anthropic and Gemini? Yes — distinct proxy URLs per provider.

Can I switch from Cloud to OSS later? Yes. The data model is the same; you can re-import or just start fresh.

What does it not do? It doesn't run agent-level evals. Pair Helicone (gateway observability) with Phoenix or Promptfoo (agent evals).

Where do I see this on CallSphere? Book a demo and we'll show the dashboard for our SEO content engine.

Can I run multiple Helicone instances in parallel? Yes — different Helicone-Auth keys, different dashboards. Useful when you want to isolate environments (staging vs prod) or business units.

How does Helicone compare to OpenLLMetry? OpenLLMetry is a pure OTel instrumentation library (no proxy hop). Helicone is a proxy-based gateway with a UI and caching. Different abstractions; not mutually exclusive.

Does Helicone support custom models? Yes — any OpenAI-compatible endpoint works. We've routed Together AI, Groq, and our own vLLM-hosted models through Helicone with no extra work.
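
A sketch of what that looks like for a self-hosted vLLM endpoint. We route these through Helicone's generic gateway; the gateway.helicone.ai base URL and the Helicone-Target-Url header are from our reading of Helicone's docs, so treat them as assumptions to verify, and vllm.internal.example.com is a placeholder for your own endpoint:

// OpenAI-compatible client pointed at Helicone's generic gateway, which then
// forwards to our own vLLM server (assumed base URL and header; verify against docs).
const vllm = new OpenAI({
  apiKey: "unused-for-self-hosted-models",
  baseURL: "https://gateway.helicone.ai",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Target-Url": "https://vllm.internal.example.com",  // placeholder endpoint
  },
});

If the target speaks the OpenAI API, nothing else in the application changes; Helicone still logs the request, response, and latency the same way.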

