Claude in Lending: An End-to-End Agent Walkthrough

Most agentic-AI articles describe capabilities in the abstract. This one follows a single, realistic deployment all the way through: a mid-size lender drowning in loan-application exceptions decides to build a Claude agent to triage them. We will walk from the messy starting problem to a shipped, monitored workflow, naming the real decisions made at each step. The numbers are illustrative, but the sequence is exactly how these builds go.

The problem: a backlog of exceptions

The lender's automated underwriting passes clean applications and rejects clear declines, but a large middle band gets kicked out as "exceptions": a mismatched address, an income document that doesn't tie out, a thin credit file, a self-employment quirk. A team of analysts works these by hand. Each exception takes time, the queue grows during busy periods, and good applicants drop off because the wait is long. The goal is not to auto-approve loans; it is to have a Claude agent do the gathering, cross-checking, and drafting so an analyst can decide in minutes instead of an hour.

The crucial framing decision, made on day one, is that the agent assists and the human decides. This keeps the blast radius small and the regulatory posture clean, and it shapes every later choice.

Step one: map the analyst's real workflow

Before any prompt is written, the team sits with three senior analysts and traces exactly what they do on a hard case. The agent will replicate this path with tools.

flowchart TD
  A["Exception enters queue"] --> B["Claude pulls application & docs"]
  B --> C["Cross-check income vs documents"]
  C --> D{"Discrepancy found?"}
  D -->|No| E["Draft clear-to-proceed summary"]
  D -->|Yes| F["Draft issue list & questions"]
  E --> G["Analyst reviews & decides"]
  F --> G
  G --> H["Decision logged with citations"]

This map becomes the spec. Each box that touches data becomes an MCP tool: one to fetch the application record, one to retrieve uploaded documents, one to pull the credit summary, one to write the draft back into the case-management system. The branch in the middle is the agent's actual reasoning, and it is the part the evals will scrutinize most.

Step two: build the tools before the prompt

The integration engineer builds the MCP server first, because a brilliant prompt with no reliable data access is useless. Each tool returns structured data and is scoped tightly: the document tool can read only the documents attached to the specific application in context, and that limit is enforced server-side. The write tool can only create a draft in "pending review" state and cannot mark a case decided. That last constraint is a deliberate guardrail: however confident the agent is, it has no tool that can finalize a lending decision.
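
The scoping described above can be sketched in plain Python. This is a minimal illustration, not the lender's MCP server: `CaseStore`, `ScopedTools`, `ScopeError`, and the field names are all hypothetical, and an in-memory dictionary stands in for the real document store and case-management system. The point it shows is that the scope check lives in the tool layer, and that no finalize operation exists for the agent to call.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PENDING_REVIEW = "pending_review"  # the only state the agent's write tool can produce


class ScopeError(PermissionError):
    """Raised when a tool call reaches outside the application in context."""


@dataclass
class CaseStore:
    """Hypothetical in-memory stand-in for the document store and case system."""
    documents: dict[str, dict[str, str]]  # application_id -> {doc_id: text}
    drafts: dict[str, dict] = field(default_factory=dict)


@dataclass
class ScopedTools:
    """Tools bound to one application; the scope is fixed when the session starts."""
    store: CaseStore
    application_id: str

    def read_document(self, application_id: str, doc_id: str) -> str:
        # Server-side check: the agent's requested ID must match the bound scope.
        if application_id != self.application_id:
            raise ScopeError(f"{application_id} is outside the current case")
        docs = self.store.documents.get(self.application_id, {})
        if doc_id not in docs:
            raise ScopeError(f"{doc_id} is not attached to {self.application_id}")
        return docs[doc_id]

    def write_draft(self, summary: str, citations: list[str]) -> dict:
        # The state is set here, not by the caller, so a draft can never
        # arrive in the case system as a decided case.
        draft = {
            "application_id": self.application_id,
            "summary": summary,
            "citations": list(citations),
            "state": PENDING_REVIEW,
        }
        self.store.drafts[self.application_id] = draft
        return draft

    # Deliberately no finalize_decision(): that action belongs to the analyst.
```

Because the agent never supplies the state value, a prompt injection or an overconfident draft has nothing to exploit; the guardrail is a missing capability rather than an instruction the model is asked to follow.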

The team also adds a Skill that teaches Claude how to read this lender's specific income documents, because pay stubs and tax forms have lender-specific quirks the base model shouldn't guess at. The Skill loads only when the agent is working an income discrepancy, keeping the context focused.

Step three: write the prompt and the first evals together

The agent engineer and the SME analysts work in parallel. The engineer drafts a system prompt that encodes the lender's exception policy, requires every factual statement to cite a retrieved document, and mandates an explicit "need more information" output when data is missing. Simultaneously, the analysts assemble a golden set of forty real, anonymized exceptions with known-good outcomes. These become the eval suite.
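
Two of the requirements above, citations on every factual statement and scoring against a golden set, lend themselves to simple checks outside the model. The sketch below assumes a hypothetical draft shape (`Draft`, `Statement`) and three hypothetical outcome labels; the article does not specify the lender's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical outcome labels; "need_more_information" mirrors the explicit
# missing-data output the system prompt mandates.
ALLOWED_OUTCOMES = {"clear_to_proceed", "issues_found", "need_more_information"}


@dataclass
class Statement:
    text: str
    citations: list[str]  # IDs of documents the agent retrieved


@dataclass
class Draft:
    outcome: str
    statements: list[Statement]


def validate_draft(draft: Draft, retrieved_doc_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    if draft.outcome not in ALLOWED_OUTCOMES:
        problems.append(f"unknown outcome: {draft.outcome}")
    for statement in draft.statements:
        if not statement.citations:
            problems.append(f"uncited statement: {statement.text}")
        for doc_id in statement.citations:
            if doc_id not in retrieved_doc_ids:
                problems.append(f"citation to unretrieved document: {doc_id}")
    return problems


def agreement_rate(golden: dict[str, str], predicted: dict[str, str]) -> float:
    """Share of golden cases where the agent's outcome matches the analysts'."""
    if not golden:
        raise ValueError("golden set is empty")
    matches = sum(1 for case_id, expected in golden.items()
                  if predicted.get(case_id) == expected)
    return matches / len(golden)
```

Running `validate_draft` on every draft before it is scored keeps format failures (missing citations) separate from judgment failures (wrong outcome), which makes the first eval run easier to read.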

The first eval run is humbling, as it always is. The agent catches obvious income mismatches well but over-flags thin-file cases that experienced analysts would clear. The fix is not a cleverer model; it is adding several thin-file examples to the Skill and tightening the prompt's guidance on when a thin file is acceptable. Two iterations later, the agent's drafts agree with senior analysts on most of the golden set, and the disagreements are genuinely judgment calls.

Step four: model routing and cost

The team uses Sonnet for the main reasoning because it handles the document cross-checks well at reasonable cost. For the simplest exceptions, a cheap Haiku pre-classifier decides whether the case even needs the full agent or can be routed straight back as a simple fix. For a rare class of genuinely tangled self-employment cases, the workflow escalates to Opus. This three-tier routing keeps the per-case cost low while preserving capability where it matters, and the routing logic itself is a small, testable piece of code, not a model guess.
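
As a rough sketch of what "a small, testable piece of code" could look like, the function below takes the pre-classifier's label as an input and applies fixed rules to pick a tier. The label values, the `exception_type` strings, and the `income_sources` threshold for a "tangled" self-employment case are assumptions made for illustration; the article does not state the lender's actual criteria.

```python
from enum import Enum


class Route(Enum):
    SIMPLE_FIX = "simple_fix"  # routed straight back, full agent not invoked
    SONNET = "sonnet"          # default tier for document cross-checks
    OPUS = "opus"              # escalation tier for tangled cases


def route_case(preclass_label: str, exception_type: str, income_sources: int) -> Route:
    """Pick a tier from the pre-classifier label and simple case features.

    preclass_label is assumed to come from the Haiku pre-classifier and to be
    either "simple_fix" or "needs_agent". The rules themselves are ordinary
    code, so they can be unit-tested without calling any model.
    """
    if preclass_label == "simple_fix":
        return Route.SIMPLE_FIX
    # Hypothetical definition of "tangled": self-employment with three or
    # more distinct income sources.
    if exception_type == "self_employment" and income_sources >= 3:
        return Route.OPUS
    return Route.SONNET
```

Keeping the model's role limited to producing the label means a routing change is a code review, not a prompt experiment.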

Step five: pilot, monitor, and expand

The agent ships to a single analyst pod first, in shadow-then-assist mode. For the first two weeks the agent drafts in the background while the analysts work each case as usual, and the team compares the agent's drafts against the analysts' own work. Once the drafts prove trustworthy, the analysts start working from them directly. The team watches three things: how much faster each exception clears, how often the analyst overrides the draft, and the continuous eval score against the golden set. When the override rate is low and stable and the eval score holds, the workflow expands to the rest of the team.
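
The expansion rule in the paragraph above ("low and stable" override rate, eval score holding) can be written down as an explicit gate. The sketch below is one possible reading; `WeekStats`, `ready_to_expand`, and every threshold are hypothetical values chosen for illustration, not figures from the deployment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WeekStats:
    cases: int             # drafts the analysts reviewed this week
    overrides: int         # drafts the analysts overrode
    median_minutes: float  # median time to clear an exception
    eval_score: float      # agreement with the golden set, 0.0 to 1.0


def override_rate(week: WeekStats) -> float:
    return week.overrides / week.cases


def ready_to_expand(weeks: list[WeekStats],
                    max_override: float = 0.10,  # assumed ceiling for "low"
                    max_drift: float = 0.03,     # assumed tolerance for "stable"
                    min_eval: float = 0.85,      # assumed floor for the eval score
                    min_weeks: int = 2) -> bool:
    """True when the most recent weeks show a low, stable override rate
    and an eval score that has not slipped below the floor."""
    if len(weeks) < min_weeks:
        return False
    recent = weeks[-min_weeks:]
    if any(week.cases == 0 for week in recent):
        return False  # no evidence is not the same as good evidence
    rates = [override_rate(week) for week in recent]
    low = all(rate <= max_override for rate in rates)
    stable = max(rates) - min(rates) <= max_drift
    eval_holds = all(week.eval_score >= min_eval for week in recent)
    return low and stable and eval_holds
```

Clearance time is recorded in `median_minutes` but left out of the gate here, since the text treats speed as the benefit being measured rather than a condition for rollout.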

The shipped outcome is not "AI underwrites loans." It is that the same analysts clear far more exceptions per day, the backlog shrinks, applicants get answers faster, and every decision carries a citation trail an auditor can follow. The agent did the gathering and drafting; the humans kept the judgment and the accountability. That division is exactly why the deployment survived contact with the compliance team.

Frequently asked questions

Why build the tools before writing the prompt?

Because an agent is only as good as its access to real, structured data. A polished prompt with flaky or unscoped tools produces confident answers built on nothing. Building the MCP tools first, with tight server-side permissions, gives the agent a reliable foundation and bakes in the safety limits before behavior tuning begins.

How long does a realistic deployment like this take?

For a single, well-scoped workflow, a team with the right skills can reach a monitored pilot in roughly a quarter. The first build is the slow one because you are creating the eval harness, the MCP pattern, and the control template. Subsequent workflows reuse all three and move much faster.

What stops the agent from approving loans on its own?

The write tool is scoped so it can only create drafts in a pending-review state; it has no ability to finalize a decision. This is enforced server-side, so even a confident or confused agent cannot cross that line. The analyst makes and owns every actual lending decision.

How do you know the agent is good enough to trust?

You measure it against a golden set of real, expert-labeled cases and watch the analyst override rate during a shadow pilot. When the agent agrees with senior analysts on the golden set and analysts rarely override its live drafts, and both signals stay stable, it has earned wider rollout.

Bringing agentic AI to your phone lines

CallSphere brings this same build pattern to voice and chat — agents that gather data through tools mid-call, draft outcomes, and hand off to a person when judgment is required. See an end-to-end agent in action at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.