Skip to content
Agentic AI
Agentic AI7 min read0 views

A Claude Finance Walkthrough: Close to Board Deck

An end-to-end use case of a finance team using Claude to turn a monthly close into board-ready narrative — setup, the errors caught, and what shipped.

Abstract advice about using Claude in finance only gets you so far. What earns trust is watching a real workflow run from a concrete problem to a shipped result. So this post follows a single, realistic end-to-end use case: a mid-sized company's FP&A team turning the raw output of a monthly close into the polished narrative that goes in front of the board — with Claude doing the first draft and a controller signing off. We will walk the whole path: the problem, the setup, the run, the failures that surfaced, and what shipped.

The problem, stated plainly

Every month after the books close, an FP&A analyst spends the better part of two days writing commentary. The numbers are already in the BI tool; the work is explaining them — why revenue came in where it did, which costs moved and why, what the variances against budget mean, and how to say it all in the company's established voice. It is high-judgment writing wrapped around low-judgment lookups, and the deadline is always tight because the close finishes late and the board meeting does not move.

The team's goal was not to remove the analyst. It was to compress the lookup-and-first-draft portion so the analyst spends their two days on judgment and review instead of assembly. Claude was a good fit precisely because the inputs are structured and the output has a consistent shape every month.

Setting up the workflow

The team built three things before the first real run. First, an Agent Skill encoding house style: how the company refers to its segments, the materiality thresholds that decide which variances get called out, and a bank of prior commentary so the tone matched. Second, a context-assembly template specifying exactly which extracts go into Claude and in what order — current-period actuals, budget, two prior periods, and the variance bridge. Third, an eval suite that ties every number in the draft back to the BI layer and flags any variance over threshold that the narrative failed to mention.

flowchart TD
  A["Close finalized in ERP"] --> B["MCP pulls actuals, budget, priors"]
  B --> C["Context packet assembled"]
  C --> D["Claude drafts narrative w/ Skill"]
  D --> E{"Eval: numbers tie & variances covered?"}
  E -->|Fail| F["Flag cells, return to analyst"]
  E -->|Pass| G["Analyst edits judgment calls"]
  G --> H["Controller signs"]
  H --> I["Narrative -> board deck"]
  F --> D

Connecting Claude to the ERP and BI tool through Model Context Protocol meant the model read the actual closed figures rather than a pasted spreadsheet. Model Context Protocol is an open standard that lets Claude reach external systems through a server interface, and here it gave the workflow a single source of truth with read-only access. That one decision eliminated an entire class of stale-data errors before they could start.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

The first real run, including what broke

The first live close was instructive precisely because it was not clean. Claude produced a fluent draft in under a minute. The eval immediately caught two problems. One number — a year-over-year revenue growth rate — did not tie, because the model had computed it against the wrong prior period. And the draft confidently attributed a margin improvement to "operational efficiencies" when the underlying data showed the gain came almost entirely from a one-time vendor credit.

Both failures were exactly the kind the team had built the system to catch. The numeric check blocked the draft and flagged the revenue cell. The team had also added a rule that any causal claim must be traceable to a provided analysis, and the efficiencies claim had no such support, so it was flagged for the analyst. Neither error reached a human reviewer as a finished sentence to be trusted; they reached the analyst as flags to resolve.

From flagged draft to shipped narrative

The analyst fixed the prior-period reference in the context template — a one-line correction that would prevent the same error every future month — and rewrote the margin sentence to correctly credit the vendor credit, noting it as non-recurring. They re-ran. The eval passed. Then came the genuinely human part: the analyst sharpened the framing of two variances, added a forward-looking sentence about the next quarter that no model could responsibly generate from historical data alone, and adjusted emphasis to match what they knew the board cared about that month.

The controller reviewed the result against the source figures, acknowledged each flagged variance, and signed. The narrative went into the board deck. End to end, the assembly-and-first-draft work that used to consume most of two days was done in an afternoon, and crucially the analyst's time went almost entirely to judgment rather than lookups.

What made it work — and what would have made it fail

Three things made this succeed. The eval suite turned the model's two errors into harmless flags instead of board-deck mistakes. The MCP connection meant the data was real and current. And the Skill meant the output already spoke in the company's voice, so the analyst edited rather than rewrote. Remove any one and the value collapses: without evals you cannot trust it, without live data you reintroduce stale-paste errors, and without the Skill you spend your saved time fixing tone.

It is equally worth naming what would have sunk the project. Skipping the human review to capture more speed would have shipped the efficiencies error in an earlier version. Granting write access to the ERP for convenience would have turned a drafting tool into an operational risk. And treating Claude's fluent first draft as finished — rather than as a junior analyst's work to be checked — would have inverted the whole trust model. The win came from using the model for what it is genuinely good at and wrapping it in controls for what it is not.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

The shipped outcome in numbers the team tracks

The team did not measure success as "Claude wrote the narrative." They measured it as analyst hours shifted from assembly to judgment, eval pass rate over successive closes, and zero numeric errors reaching the board deck. After three cycles, the first-draft time was consistently short, the eval suite had grown to catch new categories as they appeared, and the context template had absorbed several one-line fixes that quietly removed recurring errors. That compounding — each close making the next one more reliable — is the real product, not any single draft.

Frequently asked questions

How long did the workflow take to build?

The core pieces — the house-style Skill, the context template, and the first evals — came together in a few days of focused work, with one technical owner wiring the MCP connection. The system then improved incrementally over the next two to three closes as new error categories surfaced and were added to the eval suite.

What did Claude actually get wrong on the first run?

Two things: a year-over-year growth rate computed against the wrong prior period, and an unsupported causal claim crediting a margin gain to efficiencies when it was a one-time vendor credit. Both were caught by automated checks before any human treated them as finished, which is exactly why the checks exist.

Did the analyst's job shrink?

No — it shifted. The lookup-and-assembly portion compressed dramatically, and that time moved to judgment work: framing variances, adding forward-looking context, and reviewing. The analyst remained fully accountable for the final narrative; Claude produced a checkable first draft, not a finished product.

Why read data through MCP instead of pasting it?

Because pasting reintroduces stale-data risk and manual error. Reading current, closed figures through a controlled, read-only MCP interface gives the workflow a single source of truth and eliminates the class of mistakes that come from the model reasoning over an out-of-date extract.

Bringing agentic AI to your phone lines

This same shape — real data through tools, a model drafting, controls catching errors, a human accountable — is how agents earn trust on live calls. CallSphere builds voice and chat assistants that pull from your systems mid-conversation and book real work. See it live at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.