
Skills Finance Teams Need to Run Claude Agents

The hiring and upskilling shifts finance teams need to run Claude for narrative work — prompt design, eval ownership, MCP data plumbing, and who to train.

When a finance team first asks Claude to draft the commentary that accompanies a monthly close, the bottleneck is never the model. It is the people. A controller who has spent fifteen years reconciling ledgers does not automatically know how to write an eval, structure a Skill, or reason about token budgets. And the analyst who can wire up an MCP server may have no instinct for why a variance of three hundred basis points in gross margin is the thing the CFO actually cares about. Making Claude useful in finance is, more than anything, a problem of new competencies layered onto deep domain knowledge.

This post is about that layering: what people on a finance team need to learn for Claude-driven narrative work to actually function, who you hire versus who you train, and the specific skills that turn a curious experiment into a dependable part of the close calendar.

Why the skill gap is the real blocker

Finance organizations are unusually good at process discipline and unusually cautious about new tooling. Those two traits cut both ways. The discipline means that once a Claude workflow is trusted, it gets run the same way every period, which is exactly what you want. The caution means that without people who understand how the model actually behaves, the team will either over-trust a fluent-sounding answer or reject the whole approach after one hallucinated number.

The skills that close this gap fall into three buckets. The first is prompt and context engineering: knowing how to give Claude the right ledger extracts, the prior-period commentary, and the house style so it produces narrative a CFO would sign. The second is evaluation: building the checks that catch a wrong number before it reaches a board deck. The third is plumbing: connecting Claude to the systems of record through Model Context Protocol so it reads live data instead of a stale spreadsheet someone pasted in.

The new core competencies, mapped to roles

You do not need to turn every analyst into an AI engineer. You need a small number of people who own each competency and a broader group who can use the resulting workflows safely. Below is how the responsibilities tend to distribute once a team has run this for a couple of quarters.

flowchart TD
  A["Finance domain expert"] --> B["Writes Skill: house style & close rules"]
  C["Analyst / power user"] --> D["Designs prompts & context packets"]
  E["AI-fluent analyst"] --> F["Builds evals & variance checks"]
  G["Data engineer"] --> H["Wires MCP to ledger & BI"]
  B --> I{"Claude drafts narrative"}
  D --> I
  F --> I
  H --> I
  I --> J["Controller reviews & signs"]

The domain expert — usually a controller or senior FP&A lead — does not write code, but they author the Agent Skill that encodes how your company talks about its numbers. Agent Skills are folders of instructions, scripts, and resources that Claude loads dynamically when a task matches, so the controller's knowledge becomes a reusable asset rather than tribal memory. The analyst designs the context packets. The AI-fluent analyst owns evals. The data engineer owns the connections. None of these is a full-time hire on day one; most start as ten-percent slices of existing roles.


Prompt and context engineering for financial narrative

Prompt engineering in finance is less about clever phrasing and more about disciplined context assembly. Claude has a very large context window, but a larger context is not automatically a better one. The skill is deciding what belongs: the current-period trial balance, the two prior periods for trend, the budget, the prior commentary so tone is consistent, and an explicit instruction set about what must never be asserted without a supporting figure.

Teams that do this well treat the context packet like a working paper. They version it, they keep a template that says exactly which extracts go in which order, and they train new analysts to assemble it the same way every month. The payoff is consistency: when the input is structured the same way each period, Claude's output varies for the right reasons — because the business changed — not because someone pasted the data differently.
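To make the working-paper idea concrete, here is a minimal sketch of a context-assembly template in Python. The section names in PACKET_ORDER, the ContextPacket type, and the assemble_packet helper are all hypothetical, chosen to mirror the extracts listed above; a real team would substitute its own labels and export formats.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical section order; the labels mirror the extracts named in the
# text, but the exact names are assumptions made for this sketch.
PACKET_ORDER = [
    "instructions",
    "current_trial_balance",
    "prior_period_1",
    "prior_period_2",
    "budget",
    "prior_commentary",
]


@dataclass(frozen=True)
class ContextPacket:
    text: str     # assembled context, ready to pass to the model as input
    version: str  # short content hash so each period's packet is traceable


def assemble_packet(extracts: dict[str, str]) -> ContextPacket:
    """Assemble extracts in the fixed template order, failing on any gap."""
    missing = [name for name in PACKET_ORDER if name not in extracts]
    if missing:
        raise ValueError(f"missing extracts: {missing}")
    unexpected = sorted(set(extracts) - set(PACKET_ORDER))
    if unexpected:
        raise ValueError(f"unexpected extracts: {unexpected}")

    # The order comes from the template, never from how the dict was built.
    sections = [f"## {name}\n{extracts[name].strip()}" for name in PACKET_ORDER]
    text = "\n\n".join(sections)
    version = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return ContextPacket(text=text, version=version)
```

Because the order is fixed by the template rather than by whoever pasted the data, two analysts who supply the same extracts get the same packet and the same version hash, which is what makes period-over-period output comparable.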

Evaluation is the skill nobody hires for yet

An eval is a repeatable test that scores whether the model's output meets a defined standard. In finance, the standard is brutally concrete: did every number in the narrative tie to the source? Did the commentary flag the variances that exceed materiality thresholds? Did it avoid making a causal claim the data cannot support? Someone on the team has to be able to write those checks and run them automatically before any draft is read by a human.
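One of those checks, whether the commentary mentions every variance above materiality, might look roughly like the sketch below. The LineItem type, both helper names, and the threshold values in the usage note are illustrative assumptions, and matching a line item by its name appearing in the text is a deliberately crude stand-in for whatever matching rule a team actually adopts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    actual: float
    budget: float


def material_variances(items, abs_threshold, pct_threshold):
    """Return items whose variance to budget exceeds either threshold."""
    flagged = []
    for item in items:
        variance = item.actual - item.budget
        if item.budget:
            pct = abs(variance) / abs(item.budget)
        else:
            # No budget to compare against: any nonzero actual is material.
            pct = float("inf") if variance else 0.0
        if abs(variance) > abs_threshold or pct > pct_threshold:
            flagged.append(item)
    return flagged


def unflagged_material_items(narrative, items, abs_threshold, pct_threshold):
    """Names of material line items that the narrative never mentions."""
    text = narrative.lower()
    return [
        item.name
        for item in material_variances(items, abs_threshold, pct_threshold)
        if item.name.lower() not in text
    ]
```

With hypothetical thresholds of 250,000 in absolute terms or 10 percent of budget, a draft that discusses gross margin but skips a marketing line running 30 percent over budget would come back with Marketing in the list, and the eval would fail before anyone reads the draft.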

This is the competency most teams underestimate. It blends light scripting — extracting the numbers Claude cited and comparing them to the ledger — with finance judgment about what counts as a failure. The person who owns evals becomes the quality gatekeeper for the entire workflow, and over time their test suite is the most valuable artifact the team builds, because it is what lets you trust the system at scale.
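The light scripting described here can start very small. The sketch below pulls every figure out of a draft and reports the ones that do not match any value in a ledger extract. The function names and the regular expression are assumptions for illustration: the pattern only recognizes plain figures such as 4,200,000 or 41.5, not abbreviations like 4.2M, and it will also pick up years and other incidental numbers, so a real suite would need an allowlist or a stricter citation format.

```python
import re
from decimal import Decimal

# Matches comma-grouped figures (4,200,000) or plain ones (41.5). The
# lookbehind skips digits embedded in tokens such as "Q3". This pattern is
# an assumption about how numbers appear in the draft.
NUMBER_PATTERN = re.compile(
    r"(?<![\w.])\d{1,3}(?:,\d{3})+(?:\.\d+)?|(?<![\w.])\d+(?:\.\d+)?"
)


def cited_numbers(narrative: str) -> list[Decimal]:
    """Extract every figure the narrative cites, in order of appearance."""
    return [
        Decimal(match.replace(",", ""))
        for match in NUMBER_PATTERN.findall(narrative)
    ]


def untied_numbers(narrative, ledger_values, tolerance=Decimal("0")):
    """Return cited figures that match no ledger value within tolerance."""
    ledger = [Decimal(str(value)) for value in ledger_values]
    return [
        number
        for number in cited_numbers(narrative)
        if not any(abs(number - value) <= tolerance for value in ledger)
    ]
```

An empty result means every cited figure ties to the source; anything else is a list of numbers for the eval owner to investigate. Decimal is used instead of float so that exact comparisons against ledger values do not fail on binary rounding.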

Hire for judgment, train for tooling

Given the choice, hire people with strong finance judgment and teach them the tools, rather than hiring tool experts and hoping they absorb the judgment. The reason is asymmetry: a controller can learn to write a Skill and read an eval report in a few weeks, but teaching an external engineer the difference between a one-time restructuring charge and a recurring cost takes far longer and is far riskier to get wrong in a board deck.

That said, you do want one genuinely AI-fluent person — someone comfortable with the Claude Agent SDK, MCP servers, and multi-agent patterns — to architect the workflow and keep it healthy. This is often a single data-engineering or analytics hire who partners with the finance domain experts. The pairing of one builder and several trained domain owners is the most common shape of a working team.

A realistic upskilling path over one quarter

The teams that succeed sequence the learning. In the first few weeks, the domain experts draft the house-style Skill and the analysts learn to assemble context packets, using Claude in a sandbox against a closed prior period so mistakes are harmless. In the next phase, the AI-fluent analyst stands up the first evals and the data engineer connects Claude to a read-only view of the ledger. By the end of the quarter, the team runs the narrative draft as part of a real close, with the eval suite catching errors before the draft reaches a reviewer and a human still reading every word.


What you are really building is a new muscle: the ability to treat Claude as a junior analyst whose work must be checked, whose context you control, and whose mistakes you have systematic ways to catch. The skills above are the ones that make that possible.

Frequently asked questions

Do finance analysts need to learn to code?

Not deeply. They need enough scripting literacy to read an eval report and understand a context-assembly template, but the heavy engineering — MCP connections, the Agent SDK, multi-agent orchestration — is best owned by one technical hire who partners with them. Most of the value comes from finance judgment plus disciplined context engineering, neither of which requires being a software engineer.

What is an Agent Skill in this context?

An Agent Skill is a folder of instructions, scripts, and reference material that Claude loads automatically when a task matches its description. For a finance team it is where you encode house style, materiality thresholds, and close rules, turning a controller's expertise into a reusable component that every narrative draft inherits.

How long before a team is self-sufficient?

Most teams reach a dependable, human-reviewed workflow within a single quarter if they sequence it: Skills and context first, then evals, then live data connections. Full comfort, where new analysts onboard onto the workflow easily, usually takes a further two to three close cycles.

Who should own evals on a finance team?

The most AI-fluent analyst, working closely with a controller who defines what counts as a failure. Evals sit at the intersection of light scripting and finance judgment, so they cannot be fully delegated to engineering or fully owned by someone without technical comfort.



Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
