Claude Context Design: What to Include and Leave Out (Skills For Organizations)
Prompt and context design for Claude agents: the three tiers of context, what to include, what to compute with tools, what to leave out, and why.
The hardest skill in building Claude agents isn't writing prompts — it's deciding what not to put in context. Every token you add competes for the model's attention and your budget, and the instinct to "give it everything just in case" is exactly what produces slow, expensive, distractible agents. This post is a practical guide to context design: a framework for what belongs in context, what to deliberately leave out, and the reasoning behind each call.
Key takeaways
- Context is a scarce resource — more is not better; relevant is better.
- Separate always-on context (small, stable) from on-demand context (loaded per task) and keep the always-on tier tiny.
- Push detail behind progressive disclosure so the model pays for it only when a task needs it.
- Leave out raw data the model would only summarize — compute it with a tool and pass the result.
- Irrelevant context doesn't just cost tokens; it actively degrades attention and accuracy.
Why more context hurts
It's tempting to treat a large context window as a reason to stuff everything in. In reality, every extra token dilutes the model's attention across more material and raises cost and latency on every turn. An agent given ten relevant facts and a tight instruction will typically outperform one given those same facts buried in a thousand lines of marginally related background. Context design is about signal-to-noise, and noise has a real, measurable cost.
This is why the discipline matters even when the window is huge. A million-token window is a tool for occasionally reaching for large material, not a license to keep it all resident. The goal is to have exactly what the current task needs in context and nothing it doesn't — and the architecture of skills exists largely to make that achievable.
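The "paid on every turn" point is simple arithmetic: anything resident is re-sent with each request, so its cost scales with the number of turns. The sketch below is a back-of-the-envelope model under stated assumptions. The price constant is a placeholder, not a quoted rate for any Claude model, and the function name `input_cost` is hypothetical.

```python
# A back-of-the-envelope model of input-token cost across a conversation.
# The price below is a placeholder assumption, not a quoted rate for any model.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # assumed USD per 1M input tokens


def input_cost(resident_tokens: int, per_turn_tokens: list[int]) -> float:
    """Input cost when `resident_tokens` are re-sent on every turn and
    per_turn_tokens[i] extra tokens are added for turn i.

    Simplifications: ignores prompt caching (which can discount a stable prefix)
    and ignores the growth of conversation history from turn to turn.
    """
    total_tokens = sum(resident_tokens + extra for extra in per_turn_tokens)
    return total_tokens * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


# Same 20-turn conversation, two different always-on tiers.
lean = input_cost(2_000, [500] * 20)
bloated = input_cost(50_000, [500] * 20)
print(f"lean ${lean:.2f} vs bloated ${bloated:.2f}")  # prints: lean $0.15 vs bloated $3.03
```

The per-turn work is identical in both runs; the roughly 20x difference comes entirely from what was kept resident.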
The three tiers of context
A useful mental model splits context into three tiers. The always-on tier is what's present every turn — system instructions and the skill index. It must stay small because everything in it is paid for constantly. The on-demand tier is loaded when a task triggers it — a skill body, a reference file. The ephemeral tier is tool output: it enters context, gets used, and ideally doesn't linger.
```mermaid
flowchart TD
A["Information to expose"] --> B{"Needed every turn?"}
B -->|Yes| C["Always-on: keep tiny"]
B -->|No| D{"Needed for some tasks?"}
D -->|Yes| E["On-demand: load via skill/reference"]
D -->|No| F{"Computable?"}
F -->|Yes| G["Compute with a tool, pass result"]
F -->|No| H["Leave it out"]
```
The discipline is to ask, for every piece of information, which tier it belongs in — and to default toward the cheaper tiers. Most things teams want in the always-on tier actually belong on-demand. Most raw data they want on-demand should instead be computed by a tool and passed as a small result. Walking the decision tree above for each item is the entire practice of context design.
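As a sketch, the decision tree above translates literally into a function that asks the three questions in order and stops at the first "yes". The `Tier`, `Item`, and `place` names and the sample inventory are hypothetical, made up for illustration; in practice the answers to each question come from your own judgment about the task, not from code.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ALWAYS_ON = "always-on: keep tiny"
    ON_DEMAND = "on-demand: load via skill/reference"
    COMPUTE = "compute with a tool, pass result"
    LEAVE_OUT = "leave it out"


@dataclass
class Item:
    name: str
    needed_every_turn: bool
    needed_for_some_tasks: bool
    computable: bool


def place(item: Item) -> Tier:
    # Mirrors the flowchart: ask each question in order, stop at the first "yes".
    if item.needed_every_turn:
        return Tier.ALWAYS_ON
    if item.needed_for_some_tasks:
        return Tier.ON_DEMAND
    if item.computable:
        return Tier.COMPUTE
    return Tier.LEAVE_OUT


# Hypothetical inventory for a support agent.
inventory = [
    Item("role and stop condition", True, True, False),
    Item("refund-policy details", False, True, False),
    # The raw rows are never needed verbatim, only a summary computed from them.
    Item("raw ticket export", False, False, True),
    Item("company history", False, False, False),
]
for item in inventory:
    print(f"{item.name}: {place(item).value}")
```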
What to put in context
Include the things the model genuinely needs to reason and that it cannot derive or fetch itself. That means the current task's instructions, the specific facts relevant to this request, the schema of any structured output you require, and the immediate results of tools it has called. These earn their place because the model's answer depends directly on them and they can't be reconstructed from elsewhere.
Be especially deliberate about instructions. A short, precise instruction with a clear role and stop condition is worth more than paragraphs of hedging. The same goes for output schemas: if you need structured output, putting the exact shape in context is high-value because it removes guesswork. Context spent on what-to-do and what-shape-to-return almost always pays off.
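Here is a minimal sketch of such an instruction block: a role, the task, an explicit stop condition, and the exact output shape expressed with standard JSON Schema keywords. The triage task, `TRIAGE_SCHEMA`, and `build_instructions` are hypothetical examples, not a prescribed Claude prompt format.

```python
import json

# Hypothetical output schema for a ticket-triage task (illustrative only).
TRIAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "other"]},
        "priority": {"type": "integer", "minimum": 1, "maximum": 3},
        "summary": {"type": "string"},
    },
    "required": ["category", "priority", "summary"],
}


def build_instructions(role: str, task: str, stop_condition: str, schema: dict) -> str:
    """Assemble a short instruction block: role, task, stop condition, exact output shape."""
    return "\n".join([
        f"You are {role}.",
        f"Task: {task}",
        f"Stop when: {stop_condition}",
        "Return only JSON matching this schema:",
        json.dumps(schema, indent=2),
    ])


prompt = build_instructions(
    role="a support-ticket triage assistant",
    task="Classify the ticket below.",
    stop_condition="you have returned exactly one JSON object.",
    schema=TRIAGE_SCHEMA,
)
print(prompt)
```

Five short lines plus the schema carry everything the model needs about what to do and what shape to return; there is no hedging to wade through.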
What to leave out
Leave out anything the model would only summarize or filter — that's a job for a tool. If you have a 5,000-row export and you need the top five categories, don't paste the rows; run a script and pass the five categories. Leave out background that's nice-to-know but not decision-relevant for this task. Leave out detail that belongs to other sub-cases, which is exactly what progressive disclosure and reference files are for.
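Concretely, the "compute, then pass the result" pattern for the export example could be a small tool like the one below. It is a sketch assuming the export is CSV with a `category` column; the function names and sample data are hypothetical. Only the one-line string from `as_context` would enter the model's context.

```python
import csv
import io
from collections import Counter


def top_categories(csv_text: str, column: str = "category", n: int = 5) -> list[tuple[str, int]]:
    """Reduce a raw CSV export to the n most common values in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    return counts.most_common(n)


def as_context(top: list[tuple[str, int]]) -> str:
    # The only thing that enters the model's context: one short line.
    return "Top categories: " + ", ".join(f"{name} ({count})" for name, count in top)


# Tiny stand-in for a large export; a real one might have thousands of rows.
export = "\n".join(["category"] + ["billing"] * 3 + ["shipping"] * 2 + ["returns"])
print(as_context(top_categories(export, n=2)))  # prints: Top categories: billing (3), shipping (2)
```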
Also leave out stale tool output once it's been used. Carrying forward the full text of a document the model already extracted what it needed from is pure noise on subsequent turns. The general principle: if removing a piece of context wouldn't change a correct answer, it shouldn't be there. Apply that test ruthlessly and your agents get faster, cheaper, and sharper.
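One way to drop stale tool output is to replace long tool results from earlier turns with a short stub before the next request, on the assumption that whatever the model extracted already lives in its own reply. The `Message` shape and `prune_stale_tool_output` below are simplified and hypothetical. Real message APIs, including Anthropic's Messages API, pair each tool result with the tool call that produced it, so a production version should keep that pairing intact and shorten only the result text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    role: str     # "user", "assistant", or "tool" in this simplified, hypothetical shape
    content: str
    turn: int     # the turn in which the message was added


def prune_stale_tool_output(history: list[Message], current_turn: int,
                            max_chars: int = 200) -> list[Message]:
    """Return a copy of `history` where long tool results from earlier turns are
    replaced by a short stub. Results from the current turn are left intact."""
    pruned = []
    for msg in history:
        is_stale = msg.role == "tool" and msg.turn < current_turn
        if is_stale and len(msg.content) > max_chars:
            stub = f"[tool output from turn {msg.turn} elided: {len(msg.content)} chars]"
            pruned.append(replace(msg, content=stub))
        else:
            pruned.append(msg)
    return pruned


# Made-up conversation: the model already pulled what it needed from the document.
history = [
    Message("user", "What is the notice period in this contract?", 0),
    Message("tool", "FULL CONTRACT TEXT " * 500, 0),
    Message("assistant", "The notice period is stated in section 12.", 0),
    Message("user", "Draft a reminder email about it.", 1),
]
for msg in prune_stale_tool_output(history, current_turn=1):
    print(msg.role, "->", msg.content[:60])
```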
There is a failure mode worth naming here that teams discover the hard way: distraction by adjacency. When you include a block of context that is related to the task but not actually needed for it, the model often latches onto it and steers the answer toward that material — answering a slightly different question than the one asked. Irrelevant-but-plausible context is more dangerous than obviously-irrelevant context precisely because the model takes it seriously. The cure is the same removal test, applied with a bias toward cutting anything you're not sure earns its place.
Progressive disclosure as a context strategy
Progressive disclosure is the mechanism that makes leaving things out practical. Instead of choosing once between "in" and "out," you structure information so the model loads each piece exactly when it reaches the subtask that needs it. A skill body lists what's available and links the detail; the detail enters context only when its branch is taken. This turns context design from a binary choice into a lazy-loading tree.
The practical upshot is that you can support rich, detailed capabilities with a small resident footprint. A skill covering many sub-cases keeps a lean body and pulls in one sub-case's reference per request. Designing for this is mostly about where you put boundaries — one concern per reference file, clear links from the body, and nothing loaded speculatively. The model walks the tree; you just lay it out well.
This reframes the whole exercise. Context design is not a one-time act of choosing the perfect prompt; it is the design of a structure the model navigates differently for every task. Two requests to the same agent should pull in different context because they need different things, and a well-laid-out tree makes that automatic rather than something you hand-tune per request. When you get this right, the same agent stays lean on a simple question and reaches deep only when a complex one demands it — without any change to how you invoke it.
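The lazy-loading tree behind progressive disclosure can be sketched as a lean body that lists reference names up front and reads each reference file only when a request reaches that branch. The `Skill` class, its methods, and the file layout are hypothetical and chosen for illustration; they are not a documented skill format or API.

```python
import tempfile
from pathlib import Path


class Skill:
    """A lean skill body plus reference files that are read only on demand.

    Hypothetical structure for illustration; not a documented skill format or API.
    """

    def __init__(self, body: str, references: dict[str, Path]):
        self.body = body                  # resident once the skill is triggered
        self._references = references    # sub-case name -> file path, not yet read
        self._loaded: dict[str, str] = {}

    def resident_context(self) -> str:
        # Up front the model sees the body and only the names of available detail.
        index = "\n".join(f"- {name}" for name in sorted(self._references))
        return f"{self.body}\nReferences:\n{index}"

    def load(self, name: str) -> str:
        # Read one reference the first time a request reaches that sub-case.
        if name not in self._loaded:
            self._loaded[name] = self._references[name].read_text(encoding="utf-8")
        return self._loaded[name]

    def loaded(self) -> list[str]:
        return sorted(self._loaded)


with tempfile.TemporaryDirectory() as tmp:
    refunds = Path(tmp, "refunds.md")
    refunds.write_text("Refund rules: ...", encoding="utf-8")
    shipping = Path(tmp, "shipping.md")
    shipping.write_text("Shipping rules: ...", encoding="utf-8")

    skill = Skill("Answer order questions. Load a reference only when needed.",
                  {"refunds": refunds, "shipping": shipping})
    print(skill.resident_context())
    skill.load("refunds")  # a refund question takes only the refunds branch
    print(skill.loaded())  # prints: ['refunds']
```

A shipping question against the same skill would load `shipping.md` instead, which is the "different requests pull different context" behavior without any change to how the skill is invoked.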
Common pitfalls
- Everything in the system prompt. The always-on tier is paid for every turn — keep it tiny and move task-specific material on-demand.
- Pasting raw data to summarize. If a tool can compute the answer, pass the answer, not the data.
- Speculative context. "Just in case" material dilutes attention and costs tokens on every turn for no benefit.
- Carrying stale tool output. Once the model has used a result, lingering raw text is pure noise.
- Vague instructions padded with hedging. A short precise instruction with a stop condition beats paragraphs of caveats.
Design your context in 5 steps
- List every piece of information you're tempted to include.
- For each, walk the tier decision: always-on, on-demand, computed by tool, or left out.
- Move anything detailed or sub-case-specific into reference files behind progressive disclosure.
- Replace pasted raw data with a tool that computes the result you actually need.
- Apply the removal test — if cutting it wouldn't change a correct answer, cut it.
| Information | Decision | Why |
|---|---|---|
| Task instructions | Always-on / on-demand | Answer depends on it directly |
| Output schema | In context | Removes structural guesswork |
| Large raw dataset | Compute with tool | Model would only summarize it |
| Sub-case detail | Reference file | Load only when that branch runs |
| Nice-to-know background | Leave out | Not decision-relevant; pure noise |
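The removal test in step 5 can be approximated as a leave-one-out check: drop each item, re-ask, and flag the items whose absence leaves the answer unchanged. In this sketch `answer_fn` is a hypothetical stand-in for a model call, and `toy_answer` is a deterministic toy so the example runs. With a real model you would compare answers with a grader or across several samples rather than exact string equality, and you would first confirm the baseline answer is correct. Note that leave-one-out misses redundancy: two items that carry the same fact will each look removable on their own.

```python
from typing import Callable, Dict, List, Optional


def removal_test(items: Dict[str, str], task: str,
                 answer_fn: Callable[[str, str], str]) -> List[str]:
    """Leave-one-out check: names of items whose removal leaves the answer unchanged.

    `answer_fn(context, task)` is a hypothetical stand-in for a model call.
    Assumes the baseline answer (all items present) has already been checked as correct.
    """
    def context_without(skip: Optional[str]) -> str:
        return "\n\n".join(text for name, text in items.items() if name != skip)

    baseline = answer_fn(context_without(None), task)
    return [name for name in items if answer_fn(context_without(name), task) == baseline]


def toy_answer(context: str, task: str) -> str:
    # Deterministic toy "model": returns the first context line mentioning the task keyword.
    for line in context.splitlines():
        if task in line:
            return line
    return "unknown"


items = {
    "policy": "The refund window is 30 days.",
    "history": "The company was founded in a garage.",
    "tone": "Be friendly and concise.",
}
print(removal_test(items, "refund", toy_answer))  # prints: ['history', 'tone']
```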
Frequently asked questions
What is context design for an LLM agent?
Context design is the practice of deciding which information enters the model's working context for a given task — including instructions, relevant facts, and tool output — and deliberately excluding everything that wouldn't change a correct answer.
If the context window is huge, why ration it?
Because every token costs latency and money on each turn, and irrelevant material dilutes attention and lowers accuracy. A large window is for occasionally handling big inputs, not for keeping everything resident.
When should I compute instead of include?
Whenever the model would only summarize, filter, or do math over raw data. Run a tool, pass the small result, and keep the bulk data out of context entirely.
How does progressive disclosure help context design?
It lets you structure information as a lazy-loaded tree, so each detail enters context only when the model reaches the subtask that needs it — supporting rich capabilities with a small resident footprint.
Bringing agentic AI to your phone lines
CallSphere applies this same context discipline to voice and chat agents — loading only what each call needs, computing the rest with tools — so responses stay fast, accurate, and on-script. Hear it for yourself at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.