Self-Service Analytics With Claude: A Real Walkthrough
A real end-to-end walkthrough of shipping self-service analytics with Claude: spec, curated views, MCP tools, skills, evals, and a trusted rollout.
Abstract advice about self-service analytics is easy to nod along to and hard to act on. So let's build something concrete. Imagine a mid-sized e-commerce company where the head of merchandising keeps asking the analytics team the same kind of question — "which categories are losing margin and why" — and keeps waiting two days for an answer because the three analysts are buried. The merchandising lead is not going to learn SQL. The analysts are not going to get less busy. This is the exact gap self-service analytics with Claude is meant to close, and this post walks the whole journey from that stuck question to a shipped, trusted pipeline.
We will move through it in the order a real team would: framing the problem, curating the data, wiring the tools, teaching Claude the rules, verifying the answers, and rolling out without losing trust. Along the way I'll point out the decisions that look minor but determine whether the thing works.
Step one: turn the recurring question into a specification
The project does not start with technology. It starts with pinning down what "which categories are losing margin and why" actually means. Working with the merchandising lead, the team writes it down precisely: margin is net revenue minus landed cost, at the category-week grain, excluding returns and internal test orders, compared to the trailing eight-week baseline. The "why" decomposes into a handful of known drivers — price changes, cost increases, mix shifts, and promotion depth.
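To make that definition concrete, here is a minimal sketch of the calculation. The field names (`net_revenue`, `landed_cost`), the sequential week index, and the choice to take the baseline as the mean of the eight prior weekly margins are all assumptions for illustration; the spec only fixes the definition in words. Returns and test orders are assumed to be excluded upstream.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CategoryWeek:
    category: str
    week: int            # hypothetical sequential week index
    net_revenue: float   # assumed to already exclude returns and test orders
    landed_cost: float

    @property
    def margin(self) -> float:
        # Margin as defined in the spec: net revenue minus landed cost.
        return self.net_revenue - self.landed_cost


def margin_vs_baseline(rows, category, week, baseline_weeks=8):
    """Return (margin, baseline, delta) for one category-week.

    The baseline is the mean margin of the preceding `baseline_weeks` weeks
    (an assumed reading of "trailing eight-week baseline").
    """
    by_week = {r.week: r.margin for r in rows if r.category == category}
    prior = [by_week[w] for w in range(week - baseline_weeks, week) if w in by_week]
    if len(prior) < baseline_weeks:
        raise ValueError("not enough history for the baseline")
    baseline = mean(prior)
    current = by_week[week]
    return current, baseline, current - baseline
```

Writing the definition as code forces the small choices (mean or median, what to do with missing weeks) into the open, which is exactly the conversation to have with the merchandising lead before anything is built.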
This specification is the real product. It defines the questions the system must answer well and the definitions it must use. Crucially, it is bounded: the first version handles margin questions for the merchandising team, not every question for every department. Scoping tightly is what lets the team ship in weeks instead of stalling for a year trying to model the entire business. A narrow, excellent system earns the trust that funds the next expansion.
Step two: curate the semantic layer the model will read
With the spec in hand, the semantic-layer owner builds the governed surface Claude will query against. They create curated views — not raw tables — that already encode the hard decisions: a category-week margin view with returns and test orders filtered out, cost joined at the right grain, and clearly named columns. They write a metric glossary in plain language: what "margin," "baseline," and "mix shift" mean, including the edge cases that trip people up.
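A curated view of this kind might look roughly like the sketch below. It uses an in-memory SQLite database only so the example runs anywhere; the table, column, and view names are hypothetical, and a real warehouse would use its own dialect and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (          -- hypothetical raw table
    order_id    INTEGER PRIMARY KEY,
    category    TEXT,
    order_week  TEXT,
    net_revenue REAL,
    landed_cost REAL,
    is_return   INTEGER,
    is_test     INTEGER
);
INSERT INTO raw_orders VALUES
    (1, 'Footwear', '2024-W10', 120.0, 80.0, 0, 0),
    (2, 'Footwear', '2024-W10',  60.0, 45.0, 0, 0),
    (3, 'Footwear', '2024-W10', 999.0,  1.0, 0, 1),  -- internal test order
    (4, 'Footwear', '2024-W10',  50.0, 30.0, 1, 0);  -- return

-- The governed view: filtering and grain are decided once, here.
CREATE VIEW category_week_margin AS
SELECT category,
       order_week,
       SUM(net_revenue)                    AS net_revenue,
       SUM(landed_cost)                    AS landed_cost,
       SUM(net_revenue) - SUM(landed_cost) AS margin,
       COUNT(*)                            AS order_count
FROM raw_orders
WHERE is_return = 0 AND is_test = 0
GROUP BY category, order_week;
""")

rows = conn.execute(
    "SELECT category, order_week, margin, order_count FROM category_week_margin"
).fetchall()
print(rows)
```

The test order and the return never reach the view, so the single row it returns reflects only the two legitimate orders. Anything that queries the view inherits that decision without having to know about it.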
flowchart TD
A["Merchandising question"] --> B["Claude loads margin Skill & glossary"]
B --> C["Claude plans: which governed view & filters"]
C --> D["MCP server runs read-only query"]
D --> E["Sanity checks: row count, sign, reconcile"]
E -->|Fail| F["Decline & escalate to analyst"]
E -->|Pass| G["Claude explains drivers + shows query"]
G --> H["User feedback logged for evals"]

This curated layer is where most of the project's correctness lives. The diagram shows why: every path the model can take runs through governed views and ends with checks and provenance. The model never sees the messy raw event stream where the test orders and double-counted returns lurk. By the time Claude is in the picture, the dangerous ambiguities have already been resolved in data, not left for the model to guess at.
Step three: wire safe tools and teach Claude the rules
The MCP toolsmith exposes the curated views through a Model Context Protocol server — read-only, scoped to the merchandising schema, row-capped, and running under a role that matches the merchandising team's permissions. Model Context Protocol is an open standard that connects Claude to external systems through servers, and here it is the controlled doorway between the model and the warehouse. The model can ask for margin data; it cannot write, cannot reach finance's tables, and cannot scan unbounded.
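The guardrails described above can be sketched as the function an MCP tool would wrap. This is plain Python rather than a real MCP server, and the allow-list, row cap, and function name are hypothetical; SQLite's `PRAGMA query_only` stands in for the read-only warehouse role.

```python
import sqlite3

ALLOWED_VIEWS = {"category_week_margin"}  # hypothetical allow-list of governed views
MAX_ROWS = 500                            # hypothetical row cap


def run_governed_query(conn, view, category=None, limit=MAX_ROWS):
    """Run a capped, read-only query against an allow-listed view."""
    if view not in ALLOWED_VIEWS:
        raise PermissionError(f"view not exposed: {view}")
    limit = max(1, min(int(limit), MAX_ROWS))  # never exceed the cap
    # `view` is safe to interpolate only because it passed the allow-list check.
    sql = f"SELECT * FROM {view}"
    params = []
    if category is not None:
        sql += " WHERE category = ?"
        params.append(category)
    sql += " LIMIT ?"
    params.append(limit)
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    # Return the query text with the rows so the answer can show provenance.
    return {"query": sql, "rows": [dict(zip(cols, r)) for r in cur.fetchall()]}


# Demo data in SQLite; a plain table stands in for the governed view.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category_week_margin (category TEXT, order_week TEXT, margin REAL)"
)
conn.executemany(
    "INSERT INTO category_week_margin VALUES (?, ?, ?)",
    [("Footwear", "2024-W09", 61.0),
     ("Footwear", "2024-W10", 55.0),
     ("Apparel", "2024-W10", 40.0)],
)
conn.commit()
conn.execute("PRAGMA query_only = ON")  # block writes on this connection

result = run_governed_query(conn, "category_week_margin", category="Footwear", limit=10_000)
```

The model supplies a view name and filters, never free-form SQL, so the scoping rules are enforced in code rather than requested in a prompt.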
The prompt and skills engineer then packages the institutional knowledge into an Agent Skill — a folder of instructions Claude loads when a margin question arrives. The skill carries the glossary, the rule "always use the category-week view, never the raw orders table," the standard driver decomposition, and worked examples of good answers. When the merchandising lead asks their question, Claude loads this skill, plans which view and filters to use, calls the MCP tool, and assembles an explanation that names the drivers rather than just dumping a number.
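As a rough illustration of what that folder might hold, the sketch below scaffolds a skill directory with a single `SKILL.md`. It assumes the Agent Skills convention of a `SKILL.md` file whose frontmatter carries a `name` and `description`; the skill name and the wording of the instructions are hypothetical, drawn from the rules in this post.

```python
import tempfile
from pathlib import Path

# Hypothetical skill content; the rules mirror the spec and glossary above.
SKILL_MD = """\
---
name: margin-analysis
description: Answer merchandising margin questions using the governed category-week margin view.
---

# Margin analysis

## Rules
- Always use the category-week view, never the raw orders table.
- Margin is net revenue minus landed cost, excluding returns and internal test orders.
- Compare against the trailing eight-week baseline.

## Driver decomposition
Explain changes in terms of price changes, cost increases, mix shifts, and promotion depth.

## Output
Name the drivers and show the query beneath the answer.
"""


def write_skill(root: Path) -> Path:
    """Create <root>/margin-analysis/SKILL.md and return its path."""
    skill_dir = root / "margin-analysis"
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(SKILL_MD, encoding="utf-8")
    return path


skill_path = write_skill(Path(tempfile.mkdtemp()))
```

A real skill would also carry the full glossary and the worked examples of good answers; they are left out here to keep the sketch short.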
Step four: verify before anyone trusts it
Before a single real user touches the system, the eval engineer builds a test suite from the merchandising lead's actual past questions, each paired with the answer the analysts produced by hand. The suite runs the questions through the pipeline and checks that the numbers match and the driver explanations are sound. The first run is humbling: it surfaces a case where the model used a calendar-week boundary instead of an order-week boundary. That gap gets fixed in the semantic layer, not patched in the prompt.
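An offline eval loop of this shape can be quite small. In the sketch below, `EvalCase`, `run_evals`, the tolerance, and the idea that the pipeline returns a `(margin, drivers)` pair are all assumptions for illustration. "Sound" driver explanations are reduced to a crude set comparison; a real suite would likely involve analyst review or a richer grader.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected_margin: float        # the analysts' hand-computed answer
    expected_drivers: frozenset   # drivers the analysts named


def run_evals(cases, pipeline, rel_tol=0.005):
    """Run each question through `pipeline` and collect failures.

    `pipeline` is any callable returning (margin, drivers) for a question.
    """
    failures = []
    for case in cases:
        margin, drivers = pipeline(case.question)
        if not math.isclose(margin, case.expected_margin, rel_tol=rel_tol):
            failures.append((case.question, "margin mismatch"))
        elif not case.expected_drivers <= set(drivers):
            failures.append((case.question, "missing drivers"))
    return failures


# A stand-in pipeline with canned answers, for demonstration only.
def fake_pipeline(question):
    canned = {
        "Footwear W10?": (80.2, ["cost increases", "promotion depth"]),
        "Apparel W10?": (90.0, ["mix shifts"]),
    }
    return canned[question]


cases = [
    EvalCase("Footwear W10?", 80.0, frozenset({"cost increases"})),
    EvalCase("Apparel W10?", 100.0, frozenset({"mix shifts"})),
]
failures = run_evals(cases, fake_pipeline)
```

Here the first case passes within tolerance and the second is reported as a margin mismatch, which is the kind of finding that sends the team back to the semantic layer.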
Alongside the offline evals, the live pipeline runs automatic sanity checks on every answer: does the margin reconcile to the known company-wide total, is the row count plausible, are the signs correct. When a check fails, the system declines to answer confidently and routes the question to a human analyst with the context attached. This is the moment the project stops being a demo and becomes something a busy executive can rely on, because it fails safely instead of silently.
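The runtime checks can be sketched in the same spirit. The thresholds, field names, and the rule that revenue and cost must be non-negative are assumptions; reconciling against a control total also assumes the rows cover the same scope as that total.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    passed: bool
    reasons: list = field(default_factory=list)


def sanity_check(rows, control_total, min_rows=1, max_rows=500, rel_tol=0.01):
    """Row-count, sign, and reconciliation checks on a result set."""
    reasons = []
    if not (min_rows <= len(rows) <= max_rows):
        reasons.append("implausible row count")
    # Margin may legitimately be negative; revenue and cost are assumed not to be.
    if any(r["net_revenue"] < 0 or r["landed_cost"] < 0 for r in rows):
        reasons.append("unexpected sign")
    total = sum(r["net_revenue"] - r["landed_cost"] for r in rows)
    if abs(total - control_total) > rel_tol * abs(control_total):
        reasons.append("does not reconcile")
    return CheckResult(passed=not reasons, reasons=reasons)


def answer_or_escalate(rows, control_total):
    """Answer only when every check passes; otherwise hand off with context."""
    result = sanity_check(rows, control_total)
    if result.passed:
        return {"status": "answer", "rows": rows}
    return {"status": "escalate", "reasons": result.reasons, "context": rows}
```

The important design choice is the second function: a failed check produces an explicit escalation carrying the reasons and the rows, not a hedged answer.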
Step five: roll out, measure, and expand
The rollout is deliberately small: the merchandising lead and two of their managers, with one of the analysts acting as translator — sitting with them for the first week, coaching them toward well-formed questions, and reviewing every flagged answer. Usage data accumulates: which questions get asked, which answers get a thumbs-down, which ones the translator had to correct. Each correction feeds back into the glossary, the skill, or the eval suite.
Within a few weeks the merchandising team is self-serving the margin questions that used to take two days, and the analysts have their time back for the genuinely hard, novel work the model can't do. The shipped outcome is not "we deployed an AI"; it is "a non-technical leader now answers their own recurring questions correctly, in seconds, with the query shown beneath each answer." That credibility is what justifies expanding to the next team and the next question domain — one bounded, verified surface at a time.
Frequently asked questions
How long does a first self-service analytics pipeline take to ship?
A tightly scoped first surface — one team, one question domain, curated views, a skill, and an eval suite — is typically a matter of weeks, not quarters. The long pole is curating the semantic layer and writing definitions, not the model integration.
Why curate views instead of letting Claude query raw tables?
Because correctness should live in data, not be re-derived by the model on every question. Curated views resolve grain, returns, and test-order filtering once, so the model cannot reintroduce those errors, and it has far fewer tables and joins to get wrong.
What makes the answer trustworthy to a skeptical executive?
Provenance and verification. Every answer shows the query, the metric definitions, and the row count, and high-stakes results are reconciled against a known control total. A leader can trust a number whose derivation they can inspect in seconds.
How do you know when to expand to the next team?
When the first surface is stable: the eval suite passes consistently, flagged-answer rates fall, and the translator's corrections drop near zero. Earned trust on a narrow surface is the prerequisite for widening scope without re-spending your credibility.