Claude Context Design: What to Include and Omit

There is a seductive but wrong instinct in agent building: when an agent gets something wrong, add more to its context. More instructions, more examples, more retrieved documents. It feels like you are helping. Often you are making it worse. A context window stuffed with marginally-relevant material does not make Claude smarter; it dilutes the signal, raises cost, and increases the odds the model anchors on the wrong detail. Good context design is as much about what you leave out as what you put in.

This post is a practical guide to deciding what belongs in Claude's context on any given turn. It is the discipline that separates agents that stay sharp at scale from ones that slowly degrade as their prompts accrete. The principles apply whether you are building a coding agent, a support bot, or a research assistant.

Key takeaways

Context is a budget, not a bucket — every token you add competes for the model's attention.
Include only what this turn needs: the task, the few relevant facts, and the tools to act.
Leave out stale history, redundant docs, and "just in case" instructions — they cost accuracy and money.
Use retrieval to pull facts on demand instead of pre-loading everything.
Put durable knowledge in Skills and durable facts in memory, so context stays lean per turn.

Why more context can hurt

It is tempting to treat the context window as free space — a 1M-token window invites you to fill it. But the model has to attend across everything you include, and relevance is not uniform. When the genuinely important instruction sits between paragraphs of tangential background, it competes for attention with all of it. Teams routinely find that trimming an over-stuffed prompt improves answer quality, because the signal-to-noise ratio went up. Bigger windows raise the ceiling on what you can include; they do not change the principle that you should include what is relevant.

Cost and latency compound the case. Every token in context is paid for on every turn and adds to time-to-first-token. An agent that re-sends 40,000 tokens of history it never references is burning money and patience for no accuracy benefit. The right question on each turn is not "what could possibly help?" but "what does this specific step need?"

A working definition of context engineering

Context engineering is the deliberate practice of curating what enters a model's context window on each turn — selecting the relevant instructions, facts, and tools, ordering them effectively, and excluding everything that does not earn its place. It treats the context window as a scarce resource to be allocated, not a scratchpad to be filled. Framed this way, the job becomes a series of include/exclude decisions you can reason about explicitly.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

flowchart TD
  A["New turn"] --> B{"Needed for THIS step?"}
  B -->|Core task| C["Include: instruction + goal"]
  B -->|Relevant fact| D["Retrieve & include top-k"]
  B -->|Occasional how-to| E["Load Skill on demand"]
  B -->|Stale or tangential| F["Exclude"]
  C --> G["Assemble lean context"]
  D --> G
  E --> G
  G --> H["Claude reasons & acts"]

What to include

Three things almost always earn their place. First, the task and constraints — what the agent should do this turn and the hard rules it must respect, stated concisely. Second, the relevant facts — the specific records, code, or documents this step touches, retrieved on demand rather than pre-loaded. Third, the tools the agent might need, defined clearly so it can act. Keep each of these tight; a focused instruction beats a verbose one, and three precise documents beat twenty loosely-related chunks.

For retrieval specifically, rank and trim aggressively. If your retriever returns ten chunks, including the top three by relevance usually produces better answers than including all ten, because the bottom of the list is mostly noise that competes for attention. Quality of selection beats quantity of inclusion almost every time.

There is a positional dimension too. Information placed at the very start or very end of a long context tends to be attended to more reliably than material buried in the middle. So when you do include several pieces, order them with the most important first rather than scattering the critical instruction halfway down a long block. The practical rule: decide what the single most important thing for this turn is, put it where the model will see it clearly, and let everything else earn a place beneath it or not appear at all.

What to leave out

Several things masquerade as helpful but mostly hurt. Stale conversation history: once a sub-task is resolved, its back-and-forth rarely needs to ride along on every future turn — summarize it and drop the transcript. Redundant documents: three sources saying the same thing add tokens, not information. "Just in case" instructions: rules for situations this agent never encounters dilute the ones that matter. Whole files when a function will do: for coding agents, pulling an entire module when the task touches one function buries the relevant code.

Item	Include?	Better alternative
This turn's task & constraints	Yes	Keep it concise
Top-k relevant retrieved facts	Yes	Rank and trim hard
Full conversation history	No	Summarize resolved threads
Every related document	No	Dedupe to the few that matter
Occasional procedures	No, in prompt	Load as a Skill on demand

Patterns that keep context lean

Three patterns do most of the heavy lifting. Summarize and compact: when a long sub-task finishes, replace its transcript with a short summary of the outcome so future turns carry the conclusion, not the deliberation. Retrieve on demand: instead of pre-loading a knowledge base, let the agent fetch the specific record it needs via a tool, so context contains only what was actually used. Externalize the durable: put recurring procedures in Skills and persistent facts in a memory store, both pulled in only when relevant. Together these keep each turn's context proportional to the work the turn is doing.

Compaction is especially powerful for long-running agents. A coding agent that has been working for an hour may have accumulated dozens of tool results and dead-ends; carrying all of that forward both costs tokens and risks the model re-anchoring on an abandoned approach. Periodically compacting the history into "here is what we have established and what remains" keeps the agent oriented on the current state of the work rather than re-living every step that got it there. The summary becomes the working memory; the raw transcript can be dropped or archived for audit without riding along in context.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Tune your context in 5 steps

Instrument token counts per turn and find the requests carrying the most context.
For the biggest ones, audit what the model actually referenced versus what you sent.
Cut stale history and redundant docs; replace resolved threads with summaries.
Move occasional procedures into Skills and durable facts into a memory store.
Switch pre-loaded knowledge to on-demand retrieval with aggressive top-k trimming, then re-run your evals to confirm quality held or improved.

Common pitfalls

Treating the big window as free. A 1M-token window is a ceiling, not a target. Relevance still rules.
Never compacting history. Letting transcripts accumulate degrades both cost and reasoning. Summarize resolved threads.
Including all retrieval hits. The long tail of a retrieval result is mostly noise. Trim to the top few.
Stuffing every rule into the prompt. Rules for situations that never occur dilute the ones that do. Scope instructions to the agent's real job.
Sending whole files to coding agents. Provide the relevant function or section, not the module, so the signal stays on top.

Frequently asked questions

Doesn't a 1M-token context window mean I can stop worrying about this?

No. A larger window raises what you can include, but the model still attends across everything you put in, and irrelevant material still competes for attention and costs tokens. Curate regardless of window size.

How do I decide between retrieval and pre-loading?

Pre-load only small, stable, always-needed material (it also caches well). Use retrieval for anything large or situational, so context contains the specific facts a turn used rather than the whole corpus.

What's the difference between context, Skills, and memory?

Context is what is in the window this turn. Skills are procedures Claude loads on demand when relevant. Memory is a persistent store of facts the harness queries and selectively injects. Skills and memory exist so context can stay lean.

How do I know if I'm including too much?

Trim and measure. If cutting stale history or redundant documents leaves eval scores flat or improved while lowering cost and latency, that material was noise. Make trimming an ongoing habit, not a one-time cleanup.

Bringing agentic AI to your phone lines

CallSphere uses disciplined context design to keep voice and chat agents fast and accurate on every call — pulling exactly the customer facts a moment needs and nothing more. Experience lean, sharp agents at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Claude Context Design: What to Include and Omit

Key takeaways

Why more context can hurt

A working definition of context engineering

What to include

What to leave out

Patterns that keep context lean

Tune your context in 5 steps

Common pitfalls

Frequently asked questions

Doesn't a 1M-token context window mean I can stop worrying about this?

How do I decide between retrieval and pre-loading?

What's the difference between context, Skills, and memory?

How do I know if I'm including too much?

Bringing agentic AI to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Cowork is heading and how to prepare

Where Claude Code GTM engineering is heading next

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild