Context Design for Claude Agent SDK: What to Include
Design context for Claude Agent SDK agents: what to include, what to prune, just-in-time retrieval, and using subagents to keep windows lean.
Every engineer who builds with the Claude Agent SDK eventually learns the same hard lesson: more context is not better context. The instinct is to give the agent everything — the full file, the entire history, all the docs — on the theory that more information can only help. In practice, a window stuffed with marginally relevant text makes the model slower, more expensive, and measurably worse at picking out the few facts that actually matter. Context design is the discipline of deciding what the agent sees on each turn, and it is one of the highest-leverage skills in agent engineering.
Context engineering is the practice of curating exactly the information a model needs for the current step — instructions, relevant data, tool results, and memory — while deliberately excluding the rest, so the model reasons over a clean, high-signal window. Done well, it is the difference between an agent that stays sharp across forty turns and one that loses the thread after ten.
The window is a working set, not an archive
Think of the context window as a program's working set in memory rather than its hard drive. Its job is to hold what the current computation needs, not to store everything the agent has ever seen. Each turn, you assemble a fresh working set: the system prompt, the current sub-goal, the few tool results that bear on the next decision, and a compact summary of what came before. Everything else lives outside the window, in files, a database, or a vector store, where it can be retrieved on demand without occupying precious attention by default.
This reframing changes how you build. Instead of asking "how do I fit everything in," you ask "what is the minimum the agent needs to take the next correct step?" That question has a small answer surprisingly often, and the small answer almost always outperforms the large one.
What earns a place in the window
Four kinds of content reliably earn their seat. First, durable instructions: the system prompt's identity, rules, and definition of done — these stay resident every turn. Second, the active goal: what the agent is trying to accomplish right now, stated plainly. Third, decision-relevant results: the specific tool outputs that inform the next move, trimmed to their meaningful fields. Fourth, working memory: a compact running summary of progress, often kept in a scratchpad the agent reads back rather than a growing transcript.
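Trimming a tool result to its decision-relevant fields can be as simple as whitelisting keys. The sketch below is a minimal, framework-agnostic illustration in plain Python; `trim_result`, the sample payload, and its field names are hypothetical and not part of the Claude Agent SDK.

```python
from typing import Any, Dict, List


def trim_result(result: Dict[str, Any], keep: List[str], max_chars: int = 200) -> Dict[str, Any]:
    """Keep only the whitelisted fields of a tool result and cap long string values."""
    trimmed: Dict[str, Any] = {}
    for key in keep:
        if key not in result:
            continue  # a missing field is simply omitted, not invented
        value = result[key]
        if isinstance(value, str) and len(value) > max_chars:
            value = value[:max_chars] + "..."
        trimmed[key] = value
    return trimmed


# Hypothetical raw response from an order-lookup tool.
raw = {
    "status": 200,
    "order_id": "A-1042",
    "state": "shipped",
    "headers": {"content-type": "application/json"},
    "raw_body": "x" * 5000,
}
print(trim_result(raw, keep=["order_id", "state"]))
# {'order_id': 'A-1042', 'state': 'shipped'}
```

The whitelist is chosen per tool by you rather than by the model, which keeps verbose parts of a response (headers, raw bodies) from ever reaching the window.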
```mermaid
flowchart TD
A["New turn begins"] --> B["Pin durable instructions"]
B --> C["Add active goal"]
C --> D{"Need external data?"}
D -->|Yes| E["Retrieve only relevant items"]
D -->|No| F["Skip retrieval"]
E --> G["Trim results to meaningful fields"]
F --> G
G --> H["Attach compact progress summary"]
H --> I["Send curated window to model"]
```
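The per-turn assembly in the flowchart above can be sketched as a single function. This is a minimal illustration under stated assumptions: `build_window`, the bracketed section labels, and the `lookup_payment_log` retriever are hypothetical, and the result is a plain string rather than any particular SDK's message format.

```python
from typing import Callable, List


def build_window(
    system_prompt: str,
    goal: str,
    progress: str,
    needs_data: bool,
    retrieve: Callable[[str], List[str]],
    trim: Callable[[str], str],
) -> str:
    """Assemble a fresh, curated context for one turn, following the flowchart's steps."""
    sections = [f"[instructions]\n{system_prompt}", f"[goal]\n{goal}"]  # pinned rules + active goal
    if needs_data:
        # Retrieve only items relevant to the current goal, then trim each one.
        results = [trim(item) for item in retrieve(goal)]
        if results:
            sections.append("[results]\n" + "\n".join(results))
    sections.append(f"[progress]\n{progress}")  # compact summary, not the full transcript
    return "\n\n".join(sections)


def lookup_payment_log(goal: str) -> List[str]:
    # Hypothetical retriever; a real one would query your log store using the goal.
    return ["payment log: gateway retried the charge after a 30s timeout"]


window = build_window(
    system_prompt="You are a billing agent. Never issue refunds over $500.",
    goal="Find out why invoice 881 was charged twice.",
    progress="Confirmed two identical charges on the customer's card.",
    needs_data=True,
    retrieve=lookup_payment_log,
    trim=lambda item: item[:120],
)
print(window)
```

Passing `retrieve` and `trim` in as functions keeps the assembly logic independent of where the data lives, and when `needs_data` is false the retriever is never called at all.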
Notice what is not on that list: stale tool output from ten turns ago, full file contents when a summary would do, entire conversation history verbatim, or speculative "might be useful" documentation. Each of those costs attention and budget while rarely changing the next decision. The act of leaving them out is not negligence — it is the design.
Why cramming actively hurts
It is tempting to believe extra context is harmless padding, but it is not. Models exhibit degraded retrieval when the relevant fact is buried among large volumes of irrelevant text — the signal gets diluted. Long windows also cost more per turn and run slower, and in an agent that takes many turns those costs compound. Worst of all, stale context misleads: an old tool result the agent forgot to discard can get treated as current, sending the loop down a wrong path. Cramming does not just waste resources; it degrades correctness.
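A back-of-the-envelope calculation shows how the costs compound. Assume, purely for illustration, a 2,000-token system prompt, 1,500 new tokens per turn, and a 300-token progress summary. If every turn resends the full transcript, the input at turn k is the base prompt plus k turns of history, so the total grows quadratically with the number of turns; a curated window stays roughly constant per turn. The sketch counts input tokens only and ignores output tokens and prompt caching.

```python
def full_transcript_tokens(turns: int, base: int, per_turn: int) -> int:
    """Total input tokens if turn k resends the base prompt plus all k turns so far."""
    return sum(base + k * per_turn for k in range(1, turns + 1))


def curated_tokens(turns: int, base: int, per_turn: int, summary: int) -> int:
    """Total input tokens if each turn sends only base prompt + summary + the current turn."""
    return turns * (base + summary + per_turn)


# Illustrative numbers only: 40 turns, 2,000-token prompt, 1,500 tokens per turn, 300-token summary.
print(full_transcript_tokens(40, 2000, 1500))  # 1310000
print(curated_tokens(40, 2000, 1500, 300))     # 152000
```

With these assumed numbers, forty turns of full-transcript resending cost roughly 8.6 times the input tokens of the curated approach, and the gap widens with every additional turn.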
This is why summarization and pruning are not optional cleanup steps but core to the loop. After a sub-task completes, collapse its detailed tool exchanges into a one-line outcome. When a file has been analyzed, keep the conclusion and drop the raw text. The agent's window should reflect the current state of the work, not the full diary of how it got there.
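Collapsing finished sub-tasks can be sketched as a rendering step over a task history: completed tasks contribute only their one-line outcome, and only the active task keeps its raw exchanges. `Subtask`, `prune`, and the sample tool strings are hypothetical names chosen for illustration, not an SDK API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subtask:
    name: str
    exchanges: List[str] = field(default_factory=list)  # raw tool calls and results
    outcome: Optional[str] = None  # one-line conclusion, set when the subtask completes


def prune(history: List[Subtask]) -> List[str]:
    """Render the working window: finished subtasks collapse to one line each."""
    lines: List[str] = []
    for task in history:
        if task.outcome is not None:
            lines.append(f"- {task.name}: {task.outcome}")  # keep the conclusion, drop the raw text
        else:
            lines.extend(task.exchanges)  # the active subtask keeps its detail
    return lines


history = [
    Subtask(
        "read config",
        ["tool: read_file('config.yaml')", "result: <2,000 lines of YAML>"],
        outcome="timeout is 30s; retries are disabled",
    ),
    Subtask("check logs", ["tool: grep('timeout', 'app.log')", "result: 14 matches between 02:00 and 02:05"]),
]
print("\n".join(prune(history)))
```

In practice the one-line outcome would come from the agent itself, for example by asking it to state its conclusion when a sub-task ends; here it is filled in by hand to keep the example self-contained.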
Retrieval as just-in-time context
The complement to pruning is retrieval: pulling information into the window precisely when it is needed and not before. Rather than front-loading a knowledge base into the system prompt, give the agent a tool to look things up and let it fetch on demand. This keeps the default window lean and lets the agent's own reasoning decide what is relevant for the task at hand. It also scales — your knowledge base can be enormous because only the handful of relevant chunks ever enter the window.
The engineering nuance is making retrieval precise. A retrieval step that dumps twenty loosely related documents into context recreates the cramming problem. Tune it to return few, highly relevant items, and have the agent summarize what it learned before moving on, so the window holds the insight rather than the raw source.
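One way to keep retrieval precise is to cap the number of results and also require a minimum relevance score, so loosely related documents are dropped even when fewer than `k` remain. In the sketch below, the word-overlap `score` function is a toy stand-in for embedding similarity, and `retrieve`, `k`, and `min_score` are illustrative names and defaults, not a specific library's API.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, doc: str) -> float:
    """Toy relevance: share of query words that appear in the doc (stand-in for embeddings)."""
    q = tokenize(query)
    if not q:
        return 0.0
    return len(q & tokenize(doc)) / len(q)


def retrieve(query: str, corpus: List[str], k: int = 3, min_score: float = 0.5) -> List[str]:
    """Return at most k documents, and only those that clear the relevance threshold."""
    scored = [(score(query, doc), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc for s, doc in scored[:k] if s >= min_score]


corpus = [
    "Refund policy: refunds over $500 need manager approval.",
    "Shipping policy: orders ship within two business days.",
    "Refund requests must include the order number.",
    "Company history: founded as a hardware store.",
]
print(retrieve("refund approval policy", corpus, k=2))
# ['Refund policy: refunds over $500 need manager approval.']
```

Only the refund policy survives here: the shipping policy makes the top two but matches just one of the three query words, so the 0.5 threshold drops it rather than letting it dilute the window.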
Designing context for subagents
Subagents are a context-design tool in disguise. When you spawn a child agent for a focused task, you hand it a clean, purpose-built context — just the instruction and the data that subtask needs — and it returns a condensed result rather than its entire working history. This is how multi-agent systems stay coherent: each agent operates in a tight window, and only distilled conclusions cross between them. The parent never has to carry the child's intermediate mess. Used this way, delegation is as much about managing context as about parallelism.
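The parent/child boundary can be sketched without any SDK: the child gets a fresh window holding only its instruction and data, and the parent records a single line from it. `Agent`, `delegate`, and `fake_model` are hypothetical; `fake_model` stands in for a real model call so the example runs offline, and this is not how the Claude Agent SDK's own subagent feature is configured.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface: takes the context items, returns a reply string.
ModelFn = Callable[[List[str]], str]


@dataclass
class Agent:
    instructions: str
    model: ModelFn
    window: List[str] = field(default_factory=list)

    def step(self, message: str) -> str:
        """Append a message, call the model on instructions + window, record the reply."""
        self.window.append(message)
        reply = self.model([self.instructions, *self.window])
        self.window.append(reply)
        return reply


def delegate(parent: Agent, task: str, data: str, model: ModelFn) -> str:
    """Run a subtask in a fresh child window and hand back only its conclusion."""
    child = Agent(instructions=f"Complete only this task: {task}", model=model)
    child.step(f"Data:\n{data}")  # intermediate work stays in child.window
    conclusion = child.step("Reply with a one-line conclusion.")
    parent.window.append(f"[subagent: {task}] {conclusion}")  # only the distilled result crosses over
    return conclusion


def fake_model(context: List[str]) -> str:
    # Offline stand-in for a real model call; reports how much context it saw.
    return f"conclusion after reading {len(context)} context items"


parent = Agent(instructions="You coordinate a code audit.", model=fake_model, window=["user: audit deploy.sh"])
result = delegate(parent, "scan deploy.sh for hardcoded secrets", "#!/bin/sh\n" * 500, fake_model)
print(result)              # conclusion after reading 4 context items
print(len(parent.window))  # 2
```

After `delegate` returns, the raw data and the child's intermediate replies exist only in the child's window, which goes out of scope; the parent's window grows by exactly one line.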
Frequently asked questions
Isn't a bigger context window the simple fix?
A larger window raises the ceiling but does not remove the need to curate. Burying the relevant fact among irrelevant text degrades the model's retrieval, raises cost, and slows each turn. Even with a very large window, a lean, high-signal context outperforms a stuffed one.
What should I prune first when context grows?
Stale tool results and raw source text whose conclusions you have already captured. Collapse completed sub-tasks into one-line outcomes and replace full file contents with summaries. Keep durable instructions, the active goal, and a compact progress note resident.
How does retrieval fit into context design?
It supplies just-in-time context: instead of front-loading a knowledge base, the agent fetches the few relevant items when it needs them. Tune retrieval to return few, highly relevant results, and have the agent summarize them so the window holds insight rather than raw documents.
Why do subagents help with context?
Each subagent works in its own clean window scoped to one subtask and returns only a distilled result. The parent stays free of the child's intermediate detail, so the overall system holds more total work without any single window becoming cluttered.
Bringing agentic AI to your phone lines
CallSphere designs context the same way for voice and chat — pinning the rules, fetching account details just in time, and summarizing as the conversation moves — so its agents stay sharp through long calls and book work without losing the thread. Try it at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.