Claude Legal Agent Patterns: Prompts, Tools, Context
Reusable patterns for Claude legal agents: structuring prompts, designing domain tools, shaping context, and emitting structured findings.
Once your first Claude legal agent works, the second one teaches you a harder lesson: the parts you wrote ad hoc the first time become liabilities the moment you reuse them. A prompt that worked for NDAs misfires on employment agreements. A tool that returned clauses fine for one matter type chokes on another's structure. The difference between a one-off demo and a platform your firm can extend is a set of patterns — reusable ways to structure prompts, tools, and context that hold up across document types and use cases.
This post is a pattern catalog drawn from building Claude agents for legal work. None of it is exotic; all of it is the kind of structure that, once adopted, stops you from re-solving the same problems. Treat these as defaults you reach for, then deviate from deliberately.
Pattern: layer the system prompt into stable and dynamic halves
The most reusable prompt structure splits cleanly in two. The stable half is the agent's identity and rules — its role as an assistant to licensed attorneys, its citation discipline, its hard prohibitions. This half never changes between matters and rarely changes at all. The dynamic half is everything matter-specific — the document type in play, the relevant playbook sections, the user's question. Assemble the dynamic half fresh each turn and append it to the stable half.
This split pays off immediately. When you add a new document type, you write a new dynamic block, not a new agent. When legal updates a standard, they edit a retrieved playbook, not your code. And because the stable half is identical across every agent you build, rules like 'always cite' and 'never give advice to an end client' are stated once and applied the same way firm-wide. A common anti-pattern is to grow one giant prompt that tries to handle every document type with conditionals; it becomes unmaintainable, and the model's instruction-following tends to degrade as the prompt sprawls.
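The split can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the prompt wording, the `TurnContext` shape, and the `build_system_prompt` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List

# Stable half: identical for every agent and every matter.
# The wording here is illustrative only.
STABLE_PROMPT = (
    "You assist licensed attorneys. Cite a source for every finding. "
    "Never give advice directly to an end client."
)


@dataclass(frozen=True)
class TurnContext:
    """Matter-specific inputs gathered fresh for each turn (hypothetical shape)."""
    doc_type: str
    playbook_sections: List[str]
    question: str


def build_system_prompt(turn: TurnContext) -> str:
    """Append a freshly assembled dynamic block to the unchanged stable half."""
    sections = "\n".join(f"- {s}" for s in turn.playbook_sections)
    dynamic = (
        f"Document type: {turn.doc_type}\n"
        f"Relevant playbook sections:\n{sections}\n"
        f"Question: {turn.question}"
    )
    return f"{STABLE_PROMPT}\n\n{dynamic}"
```

Supporting a new document type then means constructing a different `TurnContext`; `STABLE_PROMPT` and the assembly function stay untouched.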
Pattern: design tools around legal nouns, not database verbs
Engineers instinctively expose tools that mirror the database: query_table, run_sql, fetch_row. Claude works far better with tools named for the legal concepts it reasons about. Expose find_comparable_clauses, get_playbook_standard, compare_against_standard, lookup_matter_history. Each tool's name and schema should describe a thing a lawyer does, because that is the vocabulary Claude reasons in when the system prompt frames it as a legal assistant.
The schema matters as much as the name. Make inputs specific and typed (a clause category as an enum, a matter ID as a required string) so that a nonsensical value is rejected at the tool boundary and a malformed call fails fast with a clear message. Make outputs structured and provenance-bearing: every returned clause carries its source reference. The pattern is that tools encode the firm's domain model, and Claude orchestrates them. This keeps the model's job conceptual and the tool's job mechanical, which is the division of labor that scales.
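As a rough sketch of a typed, provenance-bearing tool, the definition below uses the name / description / input_schema (JSON Schema) shape for tool definitions. The category list, the `check_tool_input` helper, and the stubbed return value are assumptions made for illustration; the checker covers only the few rules this example needs and is not a general JSON Schema validator.

```python
from typing import List, Optional

# Hypothetical categories; a real list would come from the firm's playbook.
CLAUSE_CATEGORIES = ["confidentiality", "indemnification", "termination"]

# Tool definition: a legal noun for the name, typed inputs in JSON Schema.
FIND_COMPARABLE_CLAUSES = {
    "name": "find_comparable_clauses",
    "description": "Find clauses from prior matters comparable to a clause category.",
    "input_schema": {
        "type": "object",
        "properties": {
            "matter_id": {"type": "string"},
            "clause_category": {"type": "string", "enum": CLAUSE_CATEGORIES},
        },
        "required": ["matter_id", "clause_category"],
    },
}


def check_tool_input(tool: dict, args: dict) -> Optional[str]:
    """Return the first problem found, or None if the call is well-formed."""
    schema = tool["input_schema"]
    for field in schema["required"]:
        if field not in args:
            return f"{tool['name']}: missing required field '{field}'"
    for field, value in args.items():
        spec = schema["properties"].get(field)
        if spec is None:
            return f"{tool['name']}: unknown field '{field}'"
        if not isinstance(value, str):
            return f"{tool['name']}: '{field}' must be a string"
        if "enum" in spec and value not in spec["enum"]:
            return f"{tool['name']}: '{field}' must be one of {spec['enum']}"
    return None


def find_comparable_clauses(matter_id: str, clause_category: str) -> List[dict]:
    """Stub with sample data: every returned clause carries its source reference."""
    return [
        {
            "category": clause_category,
            "text": "Sample clause text.",
            "source_ref": f"{matter_id}/sample-agreement/section-4",
        }
    ]
```

Returning the problem as a message, rather than raising deep inside the tool, lets the caller hand a clear error back to the model so it can correct the call.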
flowchart TD
A["Stable system prompt (role + rules)"] --> D["Assembled context"]
B["Dynamic block (doc type, question)"] --> D
C["Retrieved playbook + clauses"] --> D
D --> E{"Claude reasons"}
E -->|Legal noun tool| F["find_comparable_clauses"]
E -->|Legal noun tool| G["compare_against_standard"]
F --> D
G --> H["Structured findings + cites"]
Pattern: shape context as a working set, not a document dump
The instinct to stuff the whole contract and the whole playbook into the context window is wrong even when the window is large. Long context dilutes attention and raises cost, and most of those tokens are irrelevant to the specific question. The reusable pattern is to treat context as a working set: assemble only the clauses and standards relevant to the current decision, let Claude pull more through tools if it discovers it needs them, and keep the window focused.
This is counterintuitive because a 1M-token window invites you to fill it. Resist. A focused context of the five relevant clauses and the three applicable playbook positions produces sharper, more accurate analysis than the same question buried under the full hundred-page agreement. The tool loop is what makes this safe — if Claude needs a clause you did not include, it asks for it. Build your context assembly to start lean and let the agent expand it on demand, rather than front-loading everything and hoping the model finds the needle.
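A lean-first assembly might look like the sketch below. The `Clause` shape, the `initial_working_set` and `get_clause` helpers, and the idea that relevant categories are already known (for example from an earlier classification step) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Clause:
    ref: str        # provenance, e.g. a section number
    category: str
    text: str


def initial_working_set(
    clauses: Iterable[Clause], categories: Set[str], limit: int = 5
) -> List[Clause]:
    """Start lean: keep only clauses in the categories relevant to the question."""
    relevant = [c for c in clauses if c.category in categories]
    return relevant[:limit]


def get_clause(clauses: Iterable[Clause], ref: str) -> Optional[Clause]:
    """Tool the agent can call to pull a clause that was not front-loaded."""
    for clause in clauses:
        if clause.ref == ref:
            return clause
    return None
```

The full agreement stays available behind `get_clause`, so leaving a clause out of the initial set costs one tool call rather than a wrong answer.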
Pattern: make the agent emit structured findings
Free-prose answers are convenient for chat and dangerous for legal work. The reusable pattern is to have Claude return its analysis as structured output — an array of findings, each with the clause reference, the standard it was compared against, the nature of the deviation, and a severity. The prose summary, if you want one, is generated from the structure, not instead of it.
Structured output unlocks three things you reuse everywhere. It makes the citation check mechanical — you verify each finding's reference against retrieved sources programmatically. It makes the output renderable — the same findings drive a UI table, a redline export, or a summary email without re-prompting. And it makes the agent testable — your evaluation set compares structured findings against a lawyer's marked-up answer field by field, instead of trying to grade an essay. Once you adopt structured findings, you stop writing bespoke parsing for every new agent.
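One possible shape for such findings, with the mechanical citation check and a summary derived from the structure, is sketched here. The field names, the severity scale, and the helper functions are illustrative assumptions, not a fixed format.

```python
import json
from dataclasses import dataclass
from typing import List, Set

# Hypothetical severity scale.
SEVERITIES = ("low", "medium", "high")


@dataclass(frozen=True)
class Finding:
    clause_ref: str
    standard_ref: str
    deviation: str
    severity: str


def parse_findings(raw_json: str) -> List[Finding]:
    """Parse the model's JSON array into typed findings; reject unknown severities."""
    findings = [Finding(**item) for item in json.loads(raw_json)]
    for finding in findings:
        if finding.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {finding.severity!r}")
    return findings


def uncited(findings: List[Finding], retrieved_refs: Set[str]) -> List[Finding]:
    """Findings that cite a clause or standard not among the retrieved sources."""
    return [
        f for f in findings
        if f.clause_ref not in retrieved_refs or f.standard_ref not in retrieved_refs
    ]


def summarize(findings: List[Finding]) -> str:
    """Prose-style summary generated from the structure, one line per finding."""
    return "\n".join(
        f"[{f.severity}] {f.clause_ref} vs {f.standard_ref}: {f.deviation}"
        for f in findings
    )
```

The same `Finding` list can feed a UI table or a field-by-field comparison against a lawyer's marked-up answer without another model call.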
Pattern: separate routing from reasoning
A reusable agent does two distinct jobs: it decides what kind of request this is, and it does the deep legal reasoning. Mixing these into one giant prompt makes both worse. The pattern is to route first — a fast classification step that determines document type and intent — then dispatch to a reasoning configuration tuned for that type. Routing is cheap and benefits from a quick model; reasoning is expensive and deserves the most capable one.
This separation also gives you a clean extension point. Adding support for a new matter type means adding a route and its dynamic prompt block, not rewriting the reasoning core. A legal agent platform is, in pattern terms, a stable reasoning engine plus a growing library of routes and domain tools. The teams that scale fastest are the ones that recognized early that the routing layer and the reasoning layer have different cost profiles and different rates of change, and structured their code to keep them independent.
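The route-then-dispatch structure can be sketched as follows. Everything named here is hypothetical: the `classify` function uses keyword rules only to keep the example runnable, standing in for a call to a fast model, and the tier labels are placeholders rather than real model identifiers.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReasoningConfig:
    model_tier: str      # placeholder label, not a real model identifier
    dynamic_block: str   # per-type prompt block appended to the stable prompt


# Growing library of routes; the reasoning core does not change when one is added.
REASONING_CONFIGS = {
    "nda": ReasoningConfig("most_capable", "Document type: nda. Apply the NDA playbook."),
    "employment": ReasoningConfig(
        "most_capable", "Document type: employment. Apply the employment playbook."
    ),
}


def classify(request_text: str) -> Tuple[str, str]:
    """Stand-in for the fast classification step: returns (doc_type, intent)."""
    words = set(re.findall(r"[a-z]+", request_text.lower()))
    if "employment" in words:
        doc_type = "employment"
    elif "nda" in words:
        doc_type = "nda"
    else:
        doc_type = "unknown"
    intent = "review" if "review" in words else "question"
    return doc_type, intent


def dispatch(request_text: str) -> Optional[ReasoningConfig]:
    """Route first, then pick the reasoning configuration for that document type."""
    doc_type, _intent = classify(request_text)
    return REASONING_CONFIGS.get(doc_type)  # None means no route exists yet
```

Adding a matter type is one new entry in `REASONING_CONFIGS` plus a classification rule, which keeps the fast-changing routing layer separate from the reasoning core.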
Pattern: encode the firm's standard as data, comparison as a tool
The final reusable pattern ties the others together. Never bake the firm's positions into prompts or code. Store them as data, a structured playbook keyed by clause category, and expose a comparison tool that takes a contract clause and the relevant standard and returns the deviation. Claude orchestrates: it identifies the clause category, fetches the standard, calls compare, and explains.
That is what a reusable legal agent pattern amounts to: a structure for prompts, tools, and context that holds constant across document types, so adding a new use case means adding data and tools, not rewriting the agent. When your standards live as editable data and your comparisons live as tools, the lawyers maintain the legal logic and the engineers maintain the engine, a durable division that lets both move fast.
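A deliberately simplified sketch of standards-as-data follows. It assumes clauses have already been reduced to comparable numeric terms, which real clauses often are not; the playbook entries, reference IDs, and limits are invented for illustration.

```python
# Playbook stored as data, keyed by clause category.
# In practice this would live in an editable store maintained by the lawyers.
PLAYBOOK = {
    "confidentiality": {"standard_ref": "PB-CONF-1", "limits": {"term_years": 3}},
    "non_compete": {"standard_ref": "PB-NC-1", "limits": {"duration_months": 12}},
}


def get_playbook_standard(category: str) -> dict:
    """Fetch the firm's standard for a clause category."""
    return PLAYBOOK[category]


def compare_against_standard(clause: dict, standard: dict) -> dict:
    """Report each limit in the standard that the clause exceeds."""
    deviations = {}
    for key, limit in standard["limits"].items():
        actual = clause["terms"].get(key)
        if actual is not None and actual > limit:
            deviations[key] = {"standard": limit, "actual": actual}
    return {
        "clause_ref": clause["ref"],
        "standard_ref": standard["standard_ref"],
        "deviations": deviations,
    }
```

Changing the firm's position on confidentiality terms is then an edit to `PLAYBOOK`, with no change to the comparison tool or the prompts.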
Frequently asked questions
Why not just use one big prompt for everything?
Because instruction-following degrades as prompts sprawl, and one giant prompt couples every document type together so a change for one risks breaking another. Splitting into a stable rules half and a dynamic per-turn half keeps behavior consistent and makes adding document types additive rather than risky.
Should tools mirror my database schema?
No. Name and shape tools around legal concepts the model reasons about — comparable clauses, playbook standards, matter history — rather than raw database operations. Claude orchestrates conceptual tools far more reliably than generic query interfaces, and domain-named tools keep the reasoning aligned with the lawyer's mental model.
Does a large context window mean I can skip retrieval?
No. A focused working set of relevant clauses outperforms dumping the full document, even with a huge window — attention dilutes and cost climbs. Start lean and let the tool loop pull more on demand; the agent asks for what it is missing rather than wading through everything up front.
From legal patterns to live conversations
These structures — split prompts, domain tools, lean context, structured output — are how any agentic system stays reliable as it grows. CallSphere applies the same discipline to voice and chat, fielding every call and message with assistants that use tools mid-conversation and book work nonstop. See the patterns in action at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.