Context Design for Claude Clinical Abstraction Agents

Two teams can build the same Claude abstraction agent with the same tools and the same model, and one will be reliably accurate while the other quietly hallucinates. The difference is almost always context design — the discipline of deciding what information Claude sees on each turn, in what shape, and what gets deliberately withheld. For clinical abstraction, where a single misread sentence becomes a wrong diagnosis, getting this right is not optional.

This post is about context engineering specifically for the abstractor task. We will cover what belongs in context, what to keep out, how to shape it so the model attributes its answers, and why more context is frequently worse. The governing idea is that context is a finite attention budget you are spending on the model's behalf — spend it on signal.

The principle: smallest sufficient context

The instinct to dump the whole chart into the window is understandable and wrong. Context engineering is the practice of curating exactly the information a model needs for a task — and excluding the rest — so its reasoning stays focused and grounded. For abstraction, the smallest sufficient context for any element is the set of document sections that could plausibly contain it, the relevant slice of the abstraction rulebook, and the strict output schema. Nothing else earns a place.

Why so strict? Irrelevant text is not neutral; it competes for attention and offers the model tempting but wrong material to draw on. A social-history paragraph mentioning a family member's cancer is exactly the kind of distractor that produces a fabricated patient diagnosis. By withholding it when extracting the principal diagnosis, you remove the temptation entirely. Less context, more accuracy.

What to put in context

Four things belong, and they earn their place by directly supporting the current element. First, the targeted document sections — chosen by element type, not the whole chart. Second, the specific rulebook excerpt that defines this element and its tie-breaking rules. Third, the output schema that forces evidence-bound structure. Fourth, just enough prior state for the model to stay coherent across a multi-element run, such as which encounter is in scope.

Notice what is common to all four: each one is load-bearing for the decision at hand. If you cannot articulate how a piece of context helps the model get this element right, it does not go in. This is the test that keeps context lean as the system grows and the temptation to "just include a bit more" accumulates.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

flowchart TD
  A["Element to abstract"] --> B["Select relevant sections"]
  A --> C["Pull rulebook excerpt"]
  A --> D["Attach output schema"]
  B --> E["Assemble minimal context"]
  C --> E
  D --> E
  E --> F{"Anything not load-bearing?"}
  F -->|Yes| G["Drop it"]
  F -->|No| H["Send to Claude"]

What to leave out, and why

Leave out the chart sections irrelevant to the current element — they are distractors, not context. Leave out other encounters when the element is encounter-scoped; mixing timeframes is how a discontinued medication gets recorded as active. Leave out the full coding standard when a single definition suffices; the model does not need the entire manual to classify one comorbidity, and the bulk only dilutes attention.

Also leave out raw conversational history that has gone stale. In a multi-element run, the model does not need a verbatim transcript of every prior extraction — it needs a compact summary of committed state. Carrying full history forward inflates tokens and, worse, lets earlier mistakes contaminate later reasoning. Summarize and prune aggressively; the context window is a working surface, not an archive.

Shaping context for attribution

How you format context determines whether the model can cite its evidence cleanly. Label every section with a stable id and keep those ids visible, so when Claude returns an evidence quote it references a location you can resolve and verify. Unlabeled, run-together text gives the model nothing concrete to point at, and vague attribution is the same as no attribution.

Order matters too. Place the rules and the schema where they frame the task, then the source sections the model must reason over. Keep the source text clearly delimited from instructions so the model never confuses a quote in the chart for a directive to itself — a real failure mode when patient notes contain imperative language. Clean delimiters and visible ids are small touches that pay off directly in attribution quality.

Why bigger context windows do not change the rule

Claude's context window is large, and it is tempting to conclude that curation no longer matters. It still does, for two reasons. First, attention is finite even across a million tokens; signal buried in noise is signal the model may underweight. Second, every irrelevant token is a token you pay for, on every call, multiplied across thousands of charts. A big window is a convenience for fitting genuinely necessary material, not a license to stop curating.

The practical stance: use the large window to comfortably include all the relevant context — full operative notes, the complete rulebook excerpt — without truncation anxiety, while still excluding the irrelevant. Capacity removes a constraint; it does not remove the discipline. The teams that treat a bigger window as permission to dump everything are the ones whose accuracy quietly erodes.

Iterating on context with evals

You do not have to guess what belongs in context — measure it. With an eval set of human-abstracted charts, you can run ablations: add a section type and see if accuracy moves; remove the rulebook excerpt and watch errors rise; trim history and confirm nothing breaks. Context decisions become empirical rather than superstitious, and you build a context-builder that is tuned, not merely plausible.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Treat context as a living part of the system. As you encounter new failure modes — a distractor that fools the model, an element that needs one more section — fold the fix into the context-builder and re-run the eval. Over time the mapping from element type to ideal context sharpens, and that tuned mapping is one of the most valuable assets in the whole agent.

Frequently asked questions

If Claude has a 1M-token window, why not include the whole chart?

Because irrelevant text competes for attention and invites errors, and you pay for every token on every call. A large window lets you include all the relevant material without truncation, but curation still drives accuracy and cost.

How do I decide which sections an element needs?

Start from clinical knowledge of where the element is documented, then validate empirically with ablations on a labeled set. Encode the result in a context-builder keyed by element type so the decision is explicit and tunable.

What is the biggest context mistake teams make?

Including too much, not too little. Distractor text — other encounters, irrelevant sections, stale history — produces confident, plausible, wrong values. The smallest-sufficient-context rule guards against exactly this.

How should I carry state across a multi-element run?

As a compact summary of committed state — current encounter, elements already abstracted — not a verbatim transcript. Summarizing keeps tokens down and prevents earlier mistakes from contaminating later reasoning.

Bringing agentic reasoning to your phone lines

Disciplined context design is what separates an agent that sounds smart from one you can trust to act. CallSphere applies the same context engineering to voice and chat — assistants that bring exactly the right information into each conversation, use tools mid-call, and book work 24/7. See it live at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Context Design for Claude Clinical Abstraction Agents

The principle: smallest sufficient context

What to put in context

What to leave out, and why

Shaping context for attribution

Why bigger context windows do not change the rule

Iterating on context with evals

Frequently asked questions

If Claude has a 1M-token window, why not include the whole chart?

How do I decide which sections an element needs?

What is the biggest context mistake teams make?

How should I carry state across a multi-element run?

Bringing agentic reasoning to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Code GTM engineering is heading next

Where Claude Cowork is heading and how to prepare

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild