---
title: "Claude in Legal Discovery: A Full Use-Case Walkthrough"
description: "A complete legal AI use case — from a discovery deadline to a defensible privilege log built with Claude, MCP, and Agent Skills, step by step."
canonical: https://callsphere.ai/blog/claude-in-legal-discovery-a-full-use-case-walkthrough
category: "Agentic AI"
tags: ["agentic ai", "claude", "legal ai", "discovery", "mcp", "agent skills", "litigation"]
author: "CallSphere Team"
published: 2026-05-15T17:46:22.000Z
updated: 2026-06-06T21:47:42.358Z
---

# Claude in Legal Discovery: A Full Use-Case Walkthrough

> A complete legal AI use case — from a discovery deadline to a defensible privilege log built with Claude, MCP, and Agent Skills, step by step.

Abstract architecture diagrams are easy to draw. What teams actually want to see is a complete journey: a real legal problem, the messy middle, the decisions that mattered, and a result that shipped and held up. So this post walks through one end-to-end deployment of Claude on a genuinely hard legal task, first-pass document review and privilege logging in discovery, from the moment the problem lands to the moment a defensible work product is delivered. The numbers and parties are illustrative, but the workflow reflects how these systems actually get built.

The setup: a mid-size litigation matter, a document production deadline four weeks out, and roughly 180,000 documents collected from custodians. The traditional path is a small army of contract reviewers billing for weeks, with quality that varies by reviewer fatigue. The goal is not to remove the lawyers — it is to let a handful of attorneys supervise a Claude-driven first pass that is faster, more consistent, and fully auditable.

## Stage one: framing the problem as an agent workflow

The first real work is not technical. It is decomposition. Discovery review is not one task; it is several, and conflating them is the classic mistake. We split it into **relevance classification** (is this document responsive to the requests?), **privilege identification** (is it attorney-client privileged or work product?), **issue tagging** (which of the case's key issues does it touch?), and **privilege-log drafting** (writing the defensible description for each withheld document).

Each of these has a different risk profile. Relevance errors are recoverable: an over-inclusive set just means more review. Privilege errors are not, because producing a privileged document can waive privilege. So the architecture treats privilege with far more caution: multiple checks, a lower bar for escalating to a human, and no automated production of any document near a privilege boundary. Framing the problem this way, before writing a line of code, is what separates a deployment that holds up from one that gets a firm sanctioned.
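One way to make those risk profiles explicit is to write them down as data before any pipeline code exists. The sketch below is illustrative only: the `RiskProfile` fields, the threshold values, and the `needs_human` helper are assumptions made for this example, not settings from the matter described here.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    RELEVANCE = "relevance_classification"
    PRIVILEGE = "privilege_identification"
    ISSUE_TAGGING = "issue_tagging"
    PRIVILEGE_LOG = "privilege_log_drafting"


@dataclass(frozen=True)
class RiskProfile:
    reversible: bool       # can an error be corrected after the fact?
    escalate_below: float  # confidence under which a human must decide


# Hypothetical values; a real matter would set these with the supervising attorneys.
RISK_PROFILES = {
    Task.RELEVANCE: RiskProfile(reversible=True, escalate_below=0.60),
    Task.PRIVILEGE: RiskProfile(reversible=False, escalate_below=0.95),
    Task.ISSUE_TAGGING: RiskProfile(reversible=True, escalate_below=0.50),
    Task.PRIVILEGE_LOG: RiskProfile(reversible=True, escalate_below=0.80),
}


def needs_human(task: Task, confidence: float) -> bool:
    """Return True when the task's risk profile requires an attorney to decide."""
    return confidence < RISK_PROFILES[task].escalate_below
```

With these example values, a call made at 0.90 confidence passes for relevance but is escalated for privilege, which is the asymmetry the design depends on.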

## Stage two: building the pipeline with Skills and MCP

With the tasks decomposed, the build is mostly assembly. We connect Claude to the document review platform through an MCP server, so it can pull documents and write classifications back. We author an Agent Skill for each task — a folder containing the firm's relevance criteria for this matter, examples of privileged versus non-privileged communications, and the exact format the privilege log must follow. Skills are how the firm's specific judgment gets encoded once and applied 180,000 times.
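To make the "folder" idea concrete, here is a minimal sketch that scaffolds one such Skill on disk. A `SKILL.md` file with `name` and `description` front matter is the standard entry point for an Agent Skill. Everything else here is hypothetical: the skill name, the supporting file names (`criteria.md`, `log-format.md`, `examples/`), and the instruction text are invented for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical Skill content; a real one would carry the firm's actual criteria.
SKILL_MD = """\
---
name: privilege-identification
description: Identify attorney-client privilege and work product for this matter and escalate ambiguous documents.
---

# Privilege identification

1. Read `criteria.md` for this matter's privilege criteria.
2. Compare the document against the examples in `examples/`.
3. If the call is not clear-cut, escalate instead of deciding.
"""


def write_skill(root: Path) -> Path:
    """Create the skill folder under `root` and return its path."""
    skill_dir = root / "privilege-identification"
    (skill_dir / "examples").mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(SKILL_MD, encoding="utf-8")
    # Placeholder supporting files, referenced from SKILL.md.
    (skill_dir / "criteria.md").write_text("# Matter-specific privilege criteria\n", encoding="utf-8")
    (skill_dir / "log-format.md").write_text("# Required privilege-log format\n", encoding="utf-8")
    return skill_dir
```

Calling `write_skill(Path(tempfile.mkdtemp()))` produces a folder that can be inspected or version-controlled like any other matter artifact.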

```mermaid
flowchart TD
  A["Documents in review platform"] --> B["MCP server pulls batch"]
  B --> C["Claude: relevance classification"]
  C --> D{"Responsive?"}
  D -->|No| E["Mark non-responsive, log"]
  D -->|Yes| F["Claude: privilege & issue tagging"]
  F --> G{"Near privilege boundary?"}
  G -->|Yes| H["Escalate to attorney"]
  G -->|No| I["Draft privilege-log entry"]
  H --> I
  I --> J["Attorney QC sample & sign-off"]
```

The pipeline processes documents in batches. For each document, Claude first classifies relevance. Responsive documents move to privilege and issue tagging. Anything Claude flags as near a privilege boundary, even if it leans toward not-privileged, is escalated to a human, because a false negative on privilege can be catastrophic while a false positive costs only an extra minute of attorney time. This asymmetry drives the whole design: **route conservatively wherever the downside is irreversible.**
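The routing rule is small enough to sketch directly. This is a simplified illustration, not the production logic: the `PrivilegeCall` type, the numeric confidence score, and the `BOUNDARY_CONFIDENCE` value are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    NON_RESPONSIVE = "mark_non_responsive"
    ESCALATE = "escalate_to_attorney"
    CONTINUE = "continue_pipeline"


@dataclass(frozen=True)
class PrivilegeCall:
    privileged: bool
    confidence: float  # hypothetical 0..1 score reported by the privilege pass


BOUNDARY_CONFIDENCE = 0.95  # illustrative; anything below counts as "near the boundary"


def route(responsive: bool, call: Optional[PrivilegeCall]) -> Route:
    """Route one document, escalating whenever the privilege call is uncertain."""
    if not responsive:
        return Route.NON_RESPONSIVE
    # A missing or low-confidence call is escalated in either direction,
    # including when the model leans toward not-privileged.
    if call is None or call.confidence < BOUNDARY_CONFIDENCE:
        return Route.ESCALATE
    return Route.CONTINUE
```

Note that the direction of the model's lean never enters the escalation check. Only its certainty does, which is what keeps a confident-sounding but wrong "not privileged" call from being the sole safeguard.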

A subtle but important choice: we run privilege identification as a separate pass with its own Skill and its own model invocation, rather than asking Claude to do everything in one shot. Single-shot prompts that try to classify, tag, and log simultaneously perform worse on each subtask and are far harder to debug. Decomposed passes let us measure and tune each step independently.
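Measuring each pass independently can look something like the sketch below. The two stub functions stand in for separate model invocations, and their keyword rules are placeholders invented for this example. The point is only that each pass has its own attorney-labeled set and its own score.

```python
from typing import Callable, Dict, List, Tuple

Document = str
Pass = Callable[[Document], str]
LabeledSet = List[Tuple[Document, str]]


def relevance_stub(doc: Document) -> str:
    """Placeholder for the relevance pass (a real pass would call the model)."""
    return "responsive" if "contract" in doc.lower() else "non_responsive"


def privilege_stub(doc: Document) -> str:
    """Placeholder for the privilege pass (a separate model invocation)."""
    return "privileged" if "counsel" in doc.lower() else "not_privileged"


def accuracy(run_pass: Pass, labeled: LabeledSet) -> float:
    """Fraction of attorney-labeled documents the pass gets right."""
    if not labeled:
        raise ValueError("labeled set must not be empty")
    correct = sum(1 for doc, expected in labeled if run_pass(doc) == expected)
    return correct / len(labeled)


def evaluate(passes: Dict[str, Pass], gold: Dict[str, LabeledSet]) -> Dict[str, float]:
    """Score every pass against its own labeled set, independently of the others."""
    return {name: accuracy(fn, gold[name]) for name, fn in passes.items()}
```

Because the scores are kept separate, a regression in one pass shows up as a drop in that pass's number instead of being averaged away across the whole pipeline.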

## Stage three: the human-in-the-loop reality

The deployment is not autonomous, and pretending it could be would be the fastest way to undermine its defensibility. Attorneys review every escalated document and a statistically meaningful random sample of the documents Claude handled without escalation. That sample is the quality control backbone: if it shows Claude's privilege calls are reliable, the firm can defend the process; if it reveals a systematic error, the team can correct it before anything is produced.
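"Statistically meaningful" can be made concrete with a reproducible random draw and a confidence interval on the observed error rate. The sketch below uses the Wilson score interval, which is one reasonable choice and an assumption of this example, not necessarily the method a given firm would adopt. The function names and the fixed seed are also illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def draw_qc_sample(doc_ids: Sequence[str], sample_size: int, seed: int) -> List[str]:
    """Draw a reproducible random sample of non-escalated documents for attorney QC."""
    rng = random.Random(seed)  # a recorded seed lets the draw be re-created later
    return rng.sample(list(doc_ids), min(sample_size, len(doc_ids)))


def wilson_interval(errors: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an error rate; z=1.96 is roughly 95% confidence."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= errors <= n:
        raise ValueError("errors must be between 0 and n")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

As a worked example with made-up numbers: if attorneys find zero errors in a 400-document sample, the upper end of the interval is about 0.95%, so the sample supports a claim that the true error rate is below roughly one percent, not that it is zero.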

In the first days, the sample review surfaced a real problem. Claude was under-flagging communications that copied an attorney but were primarily business discussions — a genuinely hard call that even human reviewers disagree on. We responded not by arguing with the model but by improving the Skill: adding examples of exactly this ambiguous pattern and instructing Claude to escalate rather than decide when an attorney was merely copied. The escalation rate rose slightly; the dangerous error rate dropped to near zero. This loop — sample, find the failure pattern, encode the fix into the Skill, re-measure — is the actual work of a legal AI deployment.
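The "re-measure" step of that loop amounts to computing the same two rates on the QC sample before and after a Skill change. A minimal sketch, assuming a hypothetical `ReviewedDoc` record that pairs the pipeline's behavior with the attorney's determination:

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ReviewedDoc:
    model_privileged: bool     # the pipeline's own privilege call
    escalated: bool            # did the pipeline route this document to an attorney?
    attorney_privileged: bool  # the attorney's determination on review


def measure(sample: Iterable[ReviewedDoc]) -> Dict[str, float]:
    """Compute the escalation rate and the rate of missed privileged documents."""
    docs = list(sample)
    if not docs:
        raise ValueError("sample must not be empty")
    escalated = sum(1 for d in docs if d.escalated)
    # A miss is a privileged document the pipeline neither flagged nor escalated.
    missed = sum(
        1 for d in docs
        if d.attorney_privileged and not d.model_privileged and not d.escalated
    )
    return {
        "escalation_rate": escalated / len(docs),
        "missed_privilege_rate": missed / len(docs),
    }
```

Running `measure` on the sample drawn before the Skill change and again on a sample drawn after it gives the pair of numbers that justify the trade: somewhat more escalations in exchange for fewer misses.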

## Stage four: shipping a defensible work product

By the deadline, the matter had a complete first-pass review, a privilege log drafted for every withheld document, and — crucially — a documented, repeatable process. The firm could explain exactly how each determination was made, show the attorney oversight at every irreversible step, and produce the audit log of every classification, model version, and human approval. That defensibility is as valuable as the speed. A faster review that cannot be defended is worthless in litigation.
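An audit log of this kind can be sketched as an append-only list in which each entry carries the hash of the one before it, so any later alteration is detectable. Hash chaining is an assumption of this example, as are the field names; the walkthrough itself only specifies what gets recorded. A real log would also carry timestamps, omitted here to keep the sketch deterministic.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict) -> str:
    """Hash an entry body using a stable JSON serialization."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], doc_id: str, task: str, decision: str,
                 model_version: str, approved_by: Optional[str]) -> Dict:
    """Append one classification record, chained to the previous entry's hash."""
    body = {
        "doc_id": doc_id,
        "task": task,
        "decision": decision,
        "model_version": model_version,
        "approved_by": approved_by,  # None until an attorney signs off
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Return True only if no entry has been altered, removed, or reordered."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

If someone edits an earlier decision after the fact, `verify` returns `False`, which is the property that lets the firm show the record has not changed since each determination was made.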

The honest accounting: this was faster and more consistent than pure manual review, but it was not free of human effort, and the upfront design work was substantial. The payoff is that the Skills and pipeline built for this matter are now reusable. The second matter starts with an asset, not a blank page, and the marginal cost of each subsequent deployment falls sharply. That compounding is the real return — not any single matter, but a capability the firm now owns.

## Frequently asked questions

### Can Claude do legal document review without attorney oversight?

No responsible deployment removes attorneys from irreversible decisions. Claude performs a fast, consistent first pass, but humans review every document near a privilege boundary and a random sample of the rest. The model accelerates and standardizes the work; the lawyers remain accountable for what gets produced.

### Why decompose discovery into separate Skills instead of one prompt?

Each subtask — relevance, privilege, issue tagging, log drafting — has a different risk profile and accuracy bar. Separate Skills and model passes perform better on each subtask, are far easier to debug and measure, and let you apply conservative escalation only where the downside is irreversible, such as privilege.

### How does this approach stay defensible in litigation?

Defensibility comes from documented process, attorney oversight at every irreversible step, and an immutable audit log of every classification, model version, and human approval. The firm can reconstruct and explain exactly how each determination was made, which is what courts and opposing counsel require.

### What makes the second matter cheaper than the first?

The Skills, MCP integration, eval samples, and escalation logic built for the first matter are reusable assets. Later matters start from that foundation and only need matter-specific tuning, so the heavy upfront design cost is amortized across every future deployment.

## Bringing agentic AI to your phone lines

The same decompose-escalate-verify pattern shown here powers CallSphere's **voice and chat** agents — they handle calls and messages end to end, pull data through tools mid-conversation, and escalate the hard cases to humans. See a live walkthrough at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

