Claude in Lending: An End-to-End Agent Walkthrough
A realistic step-by-step build of a Claude exception-triage agent for lenders, from messy problem to shipped, audited, monitored outcome.
Most agentic-AI articles describe capabilities in the abstract. This one follows a single, realistic deployment all the way through: a mid-size lender drowning in loan-application exceptions decides to build a Claude agent to triage them. We will walk from the messy starting problem to a shipped, monitored workflow, naming the real decisions made at each step. The numbers are illustrative, but the sequence is exactly how these builds go.
The problem: a backlog of exceptions
The lender's automated underwriting passes clean applications and rejects clear declines, but a large middle band gets kicked out as "exceptions": a mismatched address, an income document that doesn't tie out, a thin credit file, a self-employment quirk. A team of analysts works these by hand. Each exception takes time, the queue grows during busy periods, and good applicants drop off because the wait is long. The goal is not to auto-approve loans; it is to have a Claude agent do the gathering, cross-checking, and drafting so an analyst can decide in minutes instead of an hour.
The crucial framing decision, made on day one, is that the agent assists and the human decides. This keeps the blast radius small and the regulatory posture clean, and it shapes every later choice.
Step one: map the analyst's real workflow
Before any prompt is written, the team sits with three senior analysts and traces exactly what they do on a hard case. The agent will replicate this path with tools.
flowchart TD
A["Exception enters queue"] --> B["Claude pulls application & docs"]
B --> C["Cross-check income vs documents"]
C --> D{"Discrepancy found?"}
D -->|No| E["Draft clear-to-proceed summary"]
D -->|Yes| F["Draft issue list & questions"]
E --> G["Analyst reviews & decides"]
F --> G
G --> H["Decision logged with citations"]

This map becomes the spec. Each box that touches data becomes an MCP tool: one to fetch the application record, one to retrieve uploaded documents, one to pull the credit summary, one to write the draft back into the case-management system. The branch in the middle is the agent's actual reasoning, and it is the part the evals will scrutinize most.
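As a rough illustration of turning the map into a tool list, the sketch below records the four tools as plain data. The tool names and the `access` field are hypothetical, chosen for this example; they are not part of the MCP specification or of any particular lender's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str         # hypothetical tool name, for illustration only
    description: str  # what the tool does, in the analyst's terms
    access: str       # "read" or "write"; an assumption of this sketch


# One entry per data-touching box in the workflow map.
TOOLS = [
    ToolSpec("get_application", "Fetch the application record for the case in context", "read"),
    ToolSpec("get_documents", "Retrieve documents uploaded to that application", "read"),
    ToolSpec("get_credit_summary", "Pull the credit summary for the applicant", "read"),
    ToolSpec("write_draft", "Write a draft back into the case-management system", "write"),
]


def write_tools(tools):
    """Return the names of tools that can change state, for extra review."""
    return [tool.name for tool in tools if tool.access == "write"]
```

Listing the write-capable tools separately makes it easy to see that only one tool can change anything, which is where the guardrails need the most attention.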
Step two: build the tools before the prompt
The integration engineer builds the MCP server first, because a brilliant prompt with no reliable data access is useless. Each tool returns structured data and is scoped tightly: the document tool can read only the documents attached to the specific application in context, enforced server-side. The write tool can only create a draft in "pending review" state and cannot mark a case decided. That last constraint is a deliberate guardrail: even if the agent decides it is confident, it physically cannot finalize a lending decision.
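A minimal sketch of that server-side scoping, using in-memory dictionaries in place of a real document store and case-management system. The class, method, and field names are hypothetical; the point is only that the checks live in the tool code, not in the prompt.

```python
class ScopeError(Exception):
    """Raised when a tool call reaches outside the application in context."""


class CaseTools:
    """Tools bound to one application; every check runs on the server side."""

    def __init__(self, application_id, documents, drafts):
        self._application_id = application_id
        self._documents = documents  # {doc_id: {"application_id": ..., "text": ...}}
        self._drafts = drafts        # list standing in for the case-management system

    def read_document(self, doc_id):
        """Return document text only if it is attached to this application."""
        doc = self._documents.get(doc_id)
        if doc is None or doc["application_id"] != self._application_id:
            raise ScopeError(f"document {doc_id!r} is not attached to this application")
        return doc["text"]

    def write_draft(self, summary):
        """Create a draft. The status is fixed here; callers cannot choose it."""
        draft = {
            "application_id": self._application_id,
            "summary": summary,
            "status": "pending_review",
        }
        self._drafts.append(draft)
        return draft
```

Note that `write_draft` takes no status argument at all. In this sketch the agent has no parameter it could set to mark a case decided, which is a stronger guarantee than validating a status it supplies.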
The team also adds a Skill that teaches Claude how to read this lender's specific income documents, because pay stubs and tax forms have lender-specific quirks the base model shouldn't guess at. The Skill loads only when the agent is working an income discrepancy, keeping the context focused.
Step three: write the prompt and the first evals together
The agent engineer and the SME analysts work in parallel. The engineer drafts a system prompt that encodes the lender's exception policy, requires every factual statement to cite a retrieved document, and mandates an explicit "need more information" output when data is missing. Simultaneously, the analysts assemble a golden set of forty real, anonymized exceptions with known-good outcomes. These become the eval suite.
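The two output rules in the prompt can also be checked mechanically after each run. The sketch below assumes a hypothetical draft format, a dictionary with `status`, `claims`, and `missing` keys; the real output schema would be whatever the team defines.

```python
def validate_draft(draft, retrieved_doc_ids):
    """Return a list of policy problems; an empty list means the draft passes."""
    problems = []
    for i, claim in enumerate(draft.get("claims", [])):
        citations = claim.get("citations", [])
        if not citations:
            # Every factual statement must cite a retrieved document.
            problems.append(f"claim {i} has no citation")
        for doc_id in citations:
            if doc_id not in retrieved_doc_ids:
                # A citation only counts if the document was actually retrieved.
                problems.append(f"claim {i} cites unretrieved document {doc_id!r}")
    if draft.get("status") == "need_more_information" and not draft.get("missing"):
        # An explicit "need more information" output must say what is missing.
        problems.append("need_more_information draft lists no missing items")
    return problems
```

A check like this can run inside the eval suite and in production, so a draft with an uncited claim is caught before an analyst ever sees it.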
The first eval run is humbling, as it always is. The agent catches obvious income mismatches well but over-flags thin-file cases that experienced analysts would clear. The fix is not a cleverer model; it is adding several thin-file examples to the Skill and tightening the prompt's guidance on when a thin file is acceptable. Two iterations later, the agent's drafts agree with senior analysts on most of the golden set, and the disagreements are genuinely judgment calls.
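Spotting a pattern like the thin-file over-flagging is easier when agreement is broken down by exception type rather than reported as one number. A minimal sketch, assuming each golden case carries hypothetical `id`, `category`, and `expected` fields:

```python
from collections import defaultdict


def agreement_by_category(golden_cases, agent_outcomes):
    """Fraction of golden cases per category where the agent matched the experts.

    golden_cases: list of {"id", "category", "expected"} dicts (assumed format).
    agent_outcomes: {case_id: outcome} produced by an eval run.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for case in golden_cases:
        category = case["category"]
        totals[category] += 1
        if agent_outcomes.get(case["id"]) == case["expected"]:
            matches[category] += 1
    return {category: matches[category] / totals[category] for category in totals}
```

A low score in a single category points at a narrow fix, such as more examples in the Skill, rather than a change of model.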
Step four: model routing and cost
The team uses Sonnet for the main reasoning because it handles the document cross-checks well at reasonable cost. For the simplest exceptions, a cheap Haiku pre-classifier decides whether the case even needs the full agent or can be routed straight back as a simple fix. For a rare class of genuinely tangled self-employment cases, the workflow escalates to Opus. This three-tier routing keeps the per-case cost low while preserving capability where it matters, and the routing logic itself is a small, testable piece of code, not a model guess.
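The routing step might look something like the sketch below. The route names mirror the three tiers described above, but the inputs and the "tangled" threshold are assumptions made for illustration; a real deployment would derive them from the lender's own case data.

```python
from enum import Enum


class Route(Enum):
    SIMPLE_FIX = "simple_fix"  # sent straight back, no full agent run
    SONNET = "sonnet"          # default tier for the main reasoning
    OPUS = "opus"              # escalation for tangled self-employment cases


# Hypothetical threshold: how many income sources make a case "tangled".
TANGLED_INCOME_SOURCES = 3


def route(precheck_label, self_employed, income_source_count):
    """Pick a tier from the pre-classifier label and plain case attributes."""
    if precheck_label == "simple_fix":
        return Route.SIMPLE_FIX
    if self_employed and income_source_count >= TANGLED_INCOME_SOURCES:
        return Route.OPUS
    return Route.SONNET
```

Because the function is deterministic and has no model call inside it, it can be covered by ordinary unit tests, for example `route("needs_agent", True, 4)` returning `Route.OPUS`.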
Step five: pilot, monitor, and expand
The agent ships to a single analyst pod first, in shadow-then-assist mode. For two weeks the agent drafts in the background while the analyst works each case as usual and then compares the two results. Once the drafts prove trustworthy, the analysts start working from them directly. The team watches three things: how much faster each exception clears, how often the analyst overrides the draft, and the continuous eval score against the golden set. When the override rate is low and stable and the eval score holds, the workflow expands to the rest of the team.
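The three signals and the expansion rule can be expressed as a small check. This is a sketch under stated assumptions: the field names are hypothetical, and the thresholds are placeholders for values the team would set itself.

```python
from statistics import median


def pilot_metrics(cases):
    """Summarize pilot cases: median clearing time and analyst override rate.

    cases: list of {"minutes_to_clear", "overridden"} dicts (assumed format).
    """
    return {
        "median_minutes": median(c["minutes_to_clear"] for c in cases),
        "override_rate": sum(c["overridden"] for c in cases) / len(cases),
    }


def ready_to_expand(weekly_override_rates, eval_scores,
                    max_override=0.10, max_drift=0.03, min_eval=0.85):
    """True when override rates are low and stable and the eval score holds.

    All three thresholds are illustrative placeholders, not recommendations.
    """
    if not weekly_override_rates or not eval_scores:
        return False
    low = max(weekly_override_rates) <= max_override
    stable = max(weekly_override_rates) - min(weekly_override_rates) <= max_drift
    holding = min(eval_scores) >= min_eval
    return low and stable and holding
```

Writing the gate down as code forces the team to agree in advance on what "low and stable" means, instead of deciding after the numbers come in.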
The shipped outcome is not "AI underwrites loans." It is that the same analysts clear far more exceptions per day, the backlog shrinks, applicants get answers faster, and every decision carries a citation trail an auditor can follow. The agent did the gathering and drafting; the humans kept the judgment and the accountability. That division is exactly why the deployment survived contact with the compliance team.
Frequently asked questions
Why build the tools before writing the prompt?
Because an agent is only as good as its access to real, structured data. A polished prompt with flaky or unscoped tools produces confident answers built on nothing. Building the MCP tools first, with tight server-side permissions, gives the agent a reliable foundation and bakes in the safety limits before behavior tuning begins.
How long does a realistic deployment like this take?
For a single, well-scoped workflow, a team with the right skills can reach a monitored pilot in roughly a quarter. The first build is the slow one because you are creating the eval harness, the MCP pattern, and the control template. Subsequent workflows reuse all three and move much faster.
What stops the agent from approving loans on its own?
The write tool is scoped so it can only create drafts in a pending-review state; it has no ability to finalize a decision. This is enforced server-side, so even a confident or confused agent cannot cross that line. The analyst makes and owns every actual lending decision.
How do you know the agent is good enough to trust?
You measure it against a golden set of real, expert-labeled cases and watch the analyst override rate during a shadow pilot. When the agent agrees with senior analysts on the golden set and analysts rarely override its live drafts, and both signals stay stable, it has earned wider rollout.
Bringing agentic AI to your phone lines
CallSphere brings this same build pattern to voice and chat — agents that gather data through tools mid-call, draft outcomes, and hand off to a person when judgment is required. See an end-to-end agent in action at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.