Build a Claude Finance Narrative Agent: A Walkthrough
A step-by-step walkthrough for engineers building a Claude agent that drafts grounded month-end financial commentary, from ledger ingestion to verified sign-off.
You've decided your finance team should stop hand-writing the same variance commentary every month and let Claude draft it instead. Good instinct — but a vague "let's use AI for the MD&A" turns into a stalled prototype fast. This walkthrough is the version that ships: an engineer can follow it start to finish and end up with a narrative agent that produces grounded, checkable commentary on real close data. We'll build it in stages, each one runnable on its own.
Step 1 — Get the numbers into a clean, canonical shape
Start with the data, never the prompt. Pull the general ledger for the current and prior period plus the budget, either from a warehouse table or a CSV export from your ERP. Your first job is normalization: map every raw account code to a canonical account. Build a mapping table (account_code → canonical_account → statement_line) and treat any unmapped code as a hard error. If "7250-Cloud-Hosting" appears and isn't in the map, the run should stop and tell you, not bucket it into "Other."
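A minimal sketch of that hard-error rule, assuming ledger rows arrive as dictionaries with an account_code key. The map entries, the MappedAccount type, and the UnmappedAccountError name are all hypothetical; a real map would be loaded from a maintained file or table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedAccount:
    canonical_account: str
    statement_line: str


class UnmappedAccountError(Exception):
    """Raised when the ledger contains codes missing from the mapping table."""


# Hypothetical mapping table, hard-coded here only to keep the sketch runnable.
ACCOUNT_MAP = {
    "6100-Salaries": MappedAccount("compensation", "Operating expenses"),
    "4000-Subscriptions": MappedAccount("subscription_revenue", "Revenue"),
}


def normalize(rows, account_map=ACCOUNT_MAP):
    """Attach canonical fields to each ledger row, failing on any unmapped code."""
    unmapped = sorted(
        {r["account_code"] for r in rows if r["account_code"] not in account_map}
    )
    if unmapped:
        # Stop the run and name every offender; never fall back to "Other".
        raise UnmappedAccountError("Unmapped account codes: " + ", ".join(unmapped))
    return [
        {
            **r,
            "canonical_account": account_map[r["account_code"]].canonical_account,
            "statement_line": account_map[r["account_code"]].statement_line,
        }
        for r in rows
    ]
```

Collecting every unmapped code before raising means one failed run reports the full list, so the map can be fixed in a single pass.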
Once mapped, compute the deltas in code. For each canonical account you want: actual, prior, budget, the absolute and percentage variance versus each, and a materiality flag. Store these as a list of structured fact objects. This table is the contract between your deterministic layer and Claude — the model will only ever be allowed to reference numbers that live here.
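One possible shape for a fact object, sketched under assumptions: the field names, the rounding to one decimal, and the materiality rule (a variance must clear both an absolute and a percentage threshold) are illustrative choices, not something the walkthrough prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    canonical_account: str
    actual: float
    prior: float
    budget: float
    var_vs_prior: float
    var_vs_prior_pct: Optional[float]
    var_vs_budget: float
    var_vs_budget_pct: Optional[float]
    material: bool


def pct_change(delta, base):
    """Percentage change rounded to one decimal; None when the base is zero."""
    return None if base == 0 else round(100.0 * delta / abs(base), 1)


def is_material(delta, pct, abs_threshold, pct_threshold):
    """Assumed rule: large in absolute terms and, where defined, in percentage terms."""
    return abs(delta) >= abs_threshold and (pct is None or abs(pct) >= pct_threshold)


def build_fact(account, actual, prior, budget, abs_threshold=10_000.0, pct_threshold=5.0):
    """Compute both variances in code so the model never does arithmetic."""
    d_prior, d_budget = actual - prior, actual - budget
    p_prior, p_budget = pct_change(d_prior, prior), pct_change(d_budget, budget)
    material = is_material(d_prior, p_prior, abs_threshold, pct_threshold) or is_material(
        d_budget, p_budget, abs_threshold, pct_threshold
    )
    return Fact(account, actual, prior, budget, d_prior, p_prior, d_budget, p_budget, material)
```

For example, build_fact("cloud_hosting", 114000.0, 100000.0, 105000.0) yields a variance of 14000.0 (14.0%) versus prior and is flagged material under the default thresholds.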
Step 2 — Stand up the tools as MCP servers
Claude reaches your systems through MCP. An MCP server is a small program that exposes tools and data to Claude over the Model Context Protocol, so the model can query a warehouse or fetch a document without you hand-coding a one-off integration. For this build you need two at minimum: a read-only SQL server over the facts table and prior commentary, and a document server that can fetch the relevant budget memo or board note.
Define each tool with a tight schema. The SQL tool takes a canonical account and a period and returns rows; it must be read-only and parameterized so the model cannot issue arbitrary destructive queries. The document tool takes an account key and returns the most recent matching commentary. Keep tool surfaces small — every extra tool is another thing the model can misuse and another line in your context budget.
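To make "tight schema, read-only, parameterized" concrete, here is a sketch that uses SQLite from the standard library as a stand-in for the warehouse. The tool name, the table layout, the sample row, and the schema envelope are assumptions; the exact field names a tool definition needs depend on the MCP SDK you use.

```python
import sqlite3

# Hypothetical tool description in JSON Schema style; names are illustrative.
GET_FACTS_TOOL = {
    "name": "get_facts",
    "description": "Return fact rows for one canonical account and period.",
    "input_schema": {
        "type": "object",
        "properties": {
            "canonical_account": {"type": "string"},
            "period": {"type": "string", "pattern": r"^\d{4}-\d{2}$"},
        },
        "required": ["canonical_account", "period"],
        "additionalProperties": False,
    },
}

# The only SQL this tool can ever run; the model supplies values, not query text.
FACTS_QUERY = (
    "SELECT canonical_account, period, actual, prior, budget "
    "FROM facts WHERE canonical_account = ? AND period = ?"
)


def open_demo_db():
    """In-memory stand-in for the warehouse, switched to read-only after loading."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (canonical_account TEXT, period TEXT, "
        "actual REAL, prior REAL, budget REAL)"
    )
    # Made-up sample row for the demo only.
    conn.execute("INSERT INTO facts VALUES ('cloud_hosting', '2024-06', 114000, 100000, 105000)")
    conn.commit()
    conn.execute("PRAGMA query_only = ON")  # SQLite rejects writes on this connection
    return conn


def get_facts(conn, canonical_account, period):
    """Bind inputs as parameters so they are treated as data, never as SQL."""
    cur = conn.execute(FACTS_QUERY, (canonical_account, period))
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]
```

Two independent guards are at work here: the fixed query text means the model cannot shape the SQL, and the read-only connection means that even a bug in the handler cannot modify data. On a real warehouse the second guard would more likely be a read-only database role.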
flowchart TD
A["GL + budget export"] --> B["Normalize & compute variances"]
B --> C["Facts table (the contract)"]
C --> D["Filter to material lines"]
D --> E["Claude drafts per-line commentary"]
E --> F["Verifier extracts & checks numbers"]
F -->|Mismatch| E
F -->|Clean| G["Assemble full narrative"]
G --> H["Analyst review"]
Step 3 — Write the per-line drafting prompt
Resist the urge to ask Claude for the whole MD&A in one shot. Instead, loop over your material lines and ask for commentary on one account at a time. Each call's context contains: the fact object for that account, the two or three most relevant prior-period notes, the budget assumption, and a short instruction. The instruction is precise: explain the variance in two to three sentences, reference only the numbers provided, attribute causes only to the supplied context, and hedge if the cause is unknown.
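The per-call context described above could be assembled along these lines. The section labels, the instruction wording, and the build_line_prompt name are assumptions for illustration; the fact is passed as a plain dictionary.

```python
import json

# Assumed wording; tune it against your own reviewers' expectations.
INSTRUCTION = (
    "Explain the variance for this account in two to three sentences. "
    "Reference only the numbers in FACT. Attribute causes only to PRIOR NOTES "
    "or BUDGET ASSUMPTION. If the cause is not stated there, say it is unknown."
)


def build_line_prompt(fact, prior_notes, budget_assumption, max_notes=3):
    """Build the small, single-account context for one drafting call."""
    notes = prior_notes[:max_notes]  # keep only the few most relevant notes
    notes_text = "\n".join("- " + n for n in notes) if notes else "(none supplied)"
    sections = [
        "FACT:\n" + json.dumps(fact, sort_keys=True, indent=2),
        "PRIOR NOTES:\n" + notes_text,
        "BUDGET ASSUMPTION:\n" + budget_assumption,
        "INSTRUCTION:\n" + INSTRUCTION,
    ]
    return "\n\n".join(sections)
```

Serializing the fact with sorted keys keeps the prompt byte-identical across reruns on the same numbers, which helps when you later cache or replay calls.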
This per-line structure pays off three ways. The context per call is tiny, so the model stays focused and cheap. Failures are isolated — bad compensation commentary doesn't poison the revenue section. And you can parallelize: with the Agent SDK you can run several line-level drafting calls concurrently, which collapses wall-clock time on a large statement.
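The concurrency and failure isolation can be sketched with plain asyncio rather than the Agent SDK itself. Here draft_line is a hypothetical async callable that wraps whatever client you use to call Claude, and the concurrency cap is an assumed default.

```python
import asyncio


async def draft_all(facts, draft_line, max_concurrency=4):
    """Draft every line concurrently; one failed line never sinks the others."""
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight calls

    async def guarded(fact):
        async with sem:
            return await draft_line(fact)

    # return_exceptions=True hands back errors as values instead of raising.
    results = await asyncio.gather(
        *(guarded(f) for f in facts), return_exceptions=True
    )

    drafts, failures = {}, {}
    for fact, result in zip(facts, results):
        key = fact["canonical_account"]
        if isinstance(result, Exception):
            failures[key] = result
        else:
            drafts[key] = result
    return drafts, failures
```

A caller would run asyncio.run(draft_all(facts, draft_line)) and retry or escalate only the accounts that appear in failures.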
Step 4 — Add the verifier pass
Now build the guard that makes this trustworthy. After each draft, run a verification step that extracts every number and percentage from the generated prose and checks it against the fact object. A clean approach: ask a cheap model like Haiku to return a structured list of every numeric claim and its asserted value, then compare those values to the facts table in code. If the draft says 14% and the table says 11%, that's a mismatch — send it back for a redraft with the discrepancy noted.
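The comparison half of the verifier is ordinary code. In the sketch below a regular expression stands in for the model-based extraction step, purely so the example runs offline; the key names and the tolerance are assumptions, and signs are ignored so that "fell 14%" matches a stored -14.0.

```python
import re

# Matches plain or comma-grouped numbers, with optional "$" prefix and "%" suffix.
NUMBER_RE = re.compile(r"(?<![\w.])\$?((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)(%?)")

AMOUNT_KEYS = ("actual", "prior", "budget", "var_vs_prior", "var_vs_budget")
PCT_KEYS = ("var_vs_prior_pct", "var_vs_budget_pct")


def extract_claims(text):
    """Return (value, is_percentage) for every number found in the draft."""
    return [
        (float(m.group(1).replace(",", "")), m.group(2) == "%")
        for m in NUMBER_RE.finditer(text)
    ]


def verify_numbers(draft, fact, tol=0.05):
    """List every numeric claim in the draft that the fact object does not support."""
    amounts = {abs(fact[k]) for k in AMOUNT_KEYS}
    pcts = {abs(fact[k]) for k in PCT_KEYS if fact[k] is not None}
    mismatches = []
    for value, is_pct in extract_claims(draft):
        allowed = pcts if is_pct else amounts
        if not any(abs(value - a) <= tol for a in allowed):
            mismatches.append(f"{value:g}" + ("%" if is_pct else ""))
    return mismatches
```

An empty list means the draft is numerically clean; anything else goes back with the redraft request. This sketch is deliberately strict: years, abbreviated amounts such as "14K", and other numbers outside the fact object are all reported as mismatches, so a real version would need an allow-list for those.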
Numeric verification is mechanical; causal verification is judgment. For causes, enforce a convention: any sentence containing "because," "driven by," or "due to" must trace to a retrieved document. The verifier flags ungrounded causal claims for human attention rather than blocking the run. This two-tier check — hard-stop on numbers, soft-flag on causes — matches how finance reviewers actually think.
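One way to make "must trace to a retrieved document" checkable is a citation-tag convention. The sketch assumes the drafting prompt asks the model to end each causal sentence with a tag such as [doc:memo-q2]; that tag format and the function name are hypothetical, not part of any Claude or MCP API.

```python
import re

CAUSAL_RE = re.compile(r"\b(because|driven by|due to)\b", re.IGNORECASE)
CITATION_RE = re.compile(r"\[doc:([\w-]+)\]")  # assumed tag convention
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def flag_ungrounded_causes(draft, retrieved_ids):
    """Return causal sentences that cite no document, or one that was never retrieved."""
    known = set(retrieved_ids)
    flagged = []
    for sentence in SENTENCE_SPLIT_RE.split(draft.strip()):
        if not CAUSAL_RE.search(sentence):
            continue  # no causal claim, nothing to ground
        cited = set(CITATION_RE.findall(sentence))
        if not cited or not cited <= known:
            flagged.append(sentence)
    return flagged
```

The function returns sentences rather than raising, which matches the soft-flag tier: the run continues and the flagged sentences are surfaced to the reviewer.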
Step 5 — Assemble, summarize, and present for sign-off
With clean per-line commentary in hand, make one final Claude call to assemble them into a coherent narrative: an executive summary up top, then sections by statement line. This synthesis call is where Opus 4.8 earns its cost, because it has to weigh which drivers matter most and write a summary that a CFO can read in thirty seconds. Pass it the verified line commentaries and the top materiality-ranked facts; ask for a summary that names the two or three real drivers and nothing trivial.
Finally, present the output with its receipts. Every paragraph should link back to the fact object and source document it drew from, so the reviewing analyst can click a claim and see the underlying number and the prior note that justified the cause. Sign-off becomes verification, not rewriting — which is the entire point of the build.
Step 6 — Harden it for the monthly cadence
A prototype that works once is not a close-cycle tool. Add idempotency: rerunning the agent on the same period must produce the same facts and should reuse cached drafts unless the numbers changed. Add logging: capture the exact context sent to each Claude call so that if a reviewer disputes a sentence, you can reproduce it. And add a dry-run mode that surfaces unmapped accounts and materiality decisions before any model call, so data problems get caught in seconds rather than after a full generation.
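Draft reuse can hinge on a content hash of the fact object. In this sketch the cache is a plain dictionary and draft_fn is a hypothetical function that calls the model; including a prompt version in the key is an added assumption so that prompt changes also invalidate old drafts.

```python
import hashlib
import json


def fact_key(fact, prompt_version="v1"):
    """Stable hash of the numbers plus prompt version; changes only when they do."""
    payload = json.dumps(
        {"fact": fact, "prompt_version": prompt_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def draft_with_cache(fact, cache, draft_fn, prompt_version="v1"):
    """Reuse the stored draft unless the fact or the prompt version changed."""
    key = fact_key(fact, prompt_version)
    if key not in cache:
        cache[key] = draft_fn(fact)  # the only place a model call happens
    return cache[key]
```

Storing the same key alongside the logged context gives reviewers a direct handle for reproducing any disputed sentence.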
Over a few cycles you'll tune the materiality thresholds and the prior-note retrieval until the narrative reads like your best analyst on a good day. The architecture won't change — ingestion, facts, per-line drafting, verification, synthesis — but the thresholds and prompts mature into something your team trusts on a deadline.
Frequently asked questions
Why draft one line at a time instead of the whole statement at once?
Per-line drafting keeps each context window small and focused, isolates failures to a single account, and lets you parallelize calls with the Agent SDK. A single mega-prompt is harder to debug and more prone to drifting onto numbers it wasn't given.
What's the minimum tooling I need to start?
A normalization map, a deterministic variance computation, and one read-only MCP server over your facts and prior notes. The document server and parallelism are valuable but can come in a second iteration.
How do I stop the agent from inventing numbers?
Compute all numbers in code, pass them as a facts table, and run a verifier that extracts every numeric claim from the draft and compares it to that table. Mismatches trigger an automatic redraft before a human sees the output.
How long does a full run take?
With precomputed math and parallel per-line drafting, a mid-sized statement typically completes in a few minutes, leaving most of the close window for review rather than writing.
Bringing agentic AI to your phone lines
This grounded, step-by-step pattern isn't limited to financial reports. CallSphere uses the same building blocks — tools over MCP, tight context, verification — to run voice and chat agents that answer every call, look up real data live, and book work 24/7. See it in action at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.