Claude Context Design: What to Include and Leave Out (Skills For Organizations)

The hardest skill in building Claude agents isn't writing prompts — it's deciding what not to put in context. Every token you add competes for the model's attention and your budget, and the instinct to "give it everything just in case" is exactly what produces slow, expensive, distractible agents. This post is a practical guide to context design: a framework for what belongs in context, what to deliberately leave out, and the reasoning behind each call.

Key takeaways

Context is a scarce resource — more is not better; relevant is better.
Separate always-on context (small, stable) from on-demand context (loaded per task) and keep the always-on tier tiny.
Push detail behind progressive disclosure so the model pays for it only when a task needs it.
Leave out raw data the model would only summarize — compute it with a tool and pass the result.
Irrelevant context doesn't just cost tokens; it actively degrades attention and accuracy.

Why more context hurts

It's tempting to treat a large context window as a reason to stuff everything in. In reality, every extra token dilutes the model's attention across more material and raises cost and latency on every turn. An agent given ten relevant facts and a tight instruction outperforms one given those same facts buried in a thousand lines of marginally related background. Context design is about signal-to-noise, and noise has a real, measurable cost.

This is why the discipline matters even when the window is huge. A million-token window is a tool for occasionally reaching for large material, not a license to keep it all resident. The goal is to have exactly what the current task needs in context and nothing it doesn't — and the architecture of skills exists largely to make that achievable.

The three tiers of context

A useful mental model splits context into three tiers. The always-on tier is what's present every turn — system instructions and the skill index. It must stay small because everything in it is paid for constantly. The on-demand tier is loaded when a task triggers it — a skill body, a reference file. The ephemeral tier is tool output: it enters context, gets used, and ideally doesn't linger.

flowchart TD
  A["Information to expose"] --> B{"Needed every turn?"}
  B -->|Yes| C["Always-on: keep tiny"]
  B -->|No| D{"Needed for some tasks?"}
  D -->|Yes| E["On-demand: load via skill/reference"]
  D -->|No| F{"Computable?"}
  F -->|Yes| G["Compute with a tool, pass result"]
  F -->|No| H["Leave it out"]

The discipline is to ask, for every piece of information, which tier it belongs in — and to default toward the cheaper tiers. Most things teams want in the always-on tier actually belong on-demand. Most raw data they want on-demand should instead be computed by a tool and passed as a small result. Walking the decision tree above for each item is the entire practice of context design.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

What to put in context

Include the things the model genuinely needs to reason and that it cannot derive or fetch itself. That means the current task's instructions, the specific facts relevant to this request, the schema of any structured output you require, and the immediate results of tools it has called. These earn their place because the model's answer depends directly on them and they can't be reconstructed from elsewhere.

Be especially deliberate about instructions. A short, precise instruction with a clear role and stop condition is worth more than paragraphs of hedging. The same goes for output schemas: if you need structured output, putting the exact shape in context is high-value because it removes guesswork. Context spent on what-to-do and what-shape-to-return almost always pays off.

What to leave out

Leave out anything the model would only summarize or filter — that's a job for a tool. If you have a 5,000-row export and you need the top five categories, don't paste the rows; run a script and pass the five categories. Leave out background that's nice-to-know but not decision-relevant for this task. Leave out detail that belongs to other sub-cases, which is exactly what progressive disclosure and reference files are for.

Also leave out stale tool output once it's been used. Carrying forward the full text of a document the model already extracted what it needed from is pure noise on subsequent turns. The general principle: if removing a piece of context wouldn't change a correct answer, it shouldn't be there. Apply that test ruthlessly and your agents get faster, cheaper, and sharper.

There is a failure mode worth naming here that teams discover the hard way: distraction by adjacency. When you include a block of context that is related to the task but not actually needed for it, the model often latches onto it and steers the answer toward that material — answering a slightly different question than the one asked. Irrelevant-but-plausible context is more dangerous than obviously-irrelevant context precisely because the model takes it seriously. The cure is the same removal test, applied with a bias toward cutting anything you're not sure earns its place.

Progressive disclosure as a context strategy

Progressive disclosure is the mechanism that makes leaving things out practical. Instead of choosing once between "in" and "out," you structure information so the model loads each piece exactly when it reaches the subtask that needs it. A skill body lists what's available and links the detail; the detail enters context only when its branch is taken. This turns context design from a binary choice into a lazy-loading tree.

The practical upshot is that you can support rich, detailed capabilities with a small resident footprint. A skill covering many sub-cases keeps a lean body and pulls in one sub-case's reference per request. Designing for this is mostly about where you put boundaries — one concern per reference file, clear links from the body, and nothing loaded speculatively. The model walks the tree; you just lay it out well.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

This reframes the whole exercise. Context design is not a one-time act of choosing the perfect prompt; it is the design of a structure the model navigates differently for every task. Two requests to the same agent should pull in different context because they need different things, and a well-laid-out tree makes that automatic rather than something you hand-tune per request. When you get this right, the same agent stays lean on a simple question and reaches deep only when a complex one demands it — without any change to how you invoke it.

Common pitfalls

Everything in the system prompt. The always-on tier is paid for every turn — keep it tiny and move task-specific material on-demand.
Pasting raw data to summarize. If a tool can compute the answer, pass the answer, not the data.
Speculative context. "Just in case" material dilutes attention and costs tokens on every turn for no benefit.
Carrying stale tool output. Once the model has used a result, lingering raw text is pure noise.
Vague instructions padded with hedging. A short precise instruction with a stop condition beats paragraphs of caveats.

Design your context in 5 steps

List every piece of information you're tempted to include.
For each, walk the tier decision: always-on, on-demand, computed by tool, or left out.
Move anything detailed or sub-case-specific into reference files behind progressive disclosure.
Replace pasted raw data with a tool that computes the result you actually need.
Apply the removal test — if cutting it wouldn't change a correct answer, cut it.

Information	Decision	Why
Task instructions	Always-on / on-demand	Answer depends on it directly
Output schema	In context	Removes structural guesswork
Large raw dataset	Compute with tool	Model would only summarize it
Sub-case detail	Reference file	Load only when that branch runs
Nice-to-know background	Leave out	Not decision-relevant; pure noise

Frequently asked questions

What is context design for an LLM agent?

Context design is the practice of deciding which information enters the model's working context for a given task — including instructions, relevant facts, and tool output — and deliberately excluding everything that wouldn't change a correct answer.

If the context window is huge, why ration it?

Because every token costs latency and money on each turn, and irrelevant material dilutes attention and lowers accuracy. A large window is for occasionally handling big inputs, not for keeping everything resident.

When should I compute instead of include?

Whenever the model would only summarize, filter, or do math over raw data. Run a tool, pass the small result, and keep the bulk data out of context entirely.

How does progressive disclosure help context design?

It lets you structure information as a lazy-loaded tree, so each detail enters context only when the model reaches the subtask that needs it — supporting rich capabilities with a small resident footprint.

Bringing agentic AI to your phone lines

CallSphere applies this same context discipline to voice and chat agents — loading only what each call needs, computing the rest with tools — so responses stay fast, accurate, and on-script. Hear it for yourself at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Claude Context Design: What to Include and Leave Out (Skills For Organizations)

Key takeaways

Why more context hurts

The three tiers of context

What to put in context

What to leave out

Progressive disclosure as a context strategy

Common pitfalls

Design your context in 5 steps

Frequently asked questions

What is context design for an LLM agent?

If the context window is huge, why ration it?

When should I compute instead of include?

How does progressive disclosure help context design?

Bringing agentic AI to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Cowork is heading and how to prepare

Where Claude Code GTM engineering is heading next

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild