MCP Context Design: What to Put In, What to Leave Out
Design prompt and context for MCP-powered Claude agents — curate tools, retrieve just-in-time, compact history, and keep context lean for sharper reasoning.
Most teams building on MCP optimize the wrong thing. They obsess over adding capabilities — more servers, more tools, more resources pulled into context — and the agent gets worse, not better. The hard skill in agentic engineering is not putting things into context; it is deciding what to leave out. A model's context window is a scarce, shared resource, and every token you spend on a tool the task does not need is a token stolen from reasoning. This post is about context design as an act of subtraction.
The stakes are concrete. Crowd the context with forty tool descriptions and the model spends reasoning on tool selection and sometimes picks wrong. Dump a 4,000-row resource in and the actual question gets buried. Good context design is what separates an agent that stays sharp across a long task from one that degrades as the window fills.
Context is a budget, and you are the one spending it
Effective context engineering means curating the smallest set of high-signal tokens — tool descriptions, resources, instructions, and conversation history — that lets Claude complete the task well. Think of the window as a budget you allocate, not a bucket you fill. Every tool you connect adds its description. Every resource you attach adds its content. Every turn of conversation accumulates. Left unmanaged, these costs compound until the model is reasoning over mostly noise.
The discipline is to ask, for each thing you might add: does the model need this to take the next action? If a tool will not plausibly be called for this task, do not connect its server. If a resource is reference material the model rarely consults, make it pullable on demand rather than always-present. The goal is high signal density — a context where almost everything present is relevant — because that is the condition under which models reason best.
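The question above can be turned into a small filter. What follows is a minimal sketch, not a definitive implementation: `ContextItem`, the `needed_for_next_action` flag, and the four-characters-per-token estimate are all assumptions made for illustration, and a real agent would use its model's tokenizer and its own relevance signal.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    needed_for_next_action: bool  # hypothetical flag, set by the developer


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def select_context(items, budget_tokens):
    """Keep items needed for the next action, in order, while they fit the budget."""
    kept, deferred, used = [], [], 0
    for item in items:
        cost = estimate_tokens(item.text)
        if item.needed_for_next_action and used + cost <= budget_tokens:
            kept.append(item)
            used += cost
        else:
            # Deferred items are candidates for on-demand retrieval.
            deferred.append(item)
    return kept, deferred, used
```

Anything that lands in `deferred` is not thrown away. It is simply a candidate for being fetched on demand rather than kept resident.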
What belongs in context, and what should be retrievable
Some things earn a permanent place. The system instructions that define the agent's role and constraints belong in context always — they are short and they govern everything. The tools for the current task belong in context. The immediately relevant data — the specific order being discussed, the file being edited — belongs in context. These are high-signal and the model uses them every turn.
Other things should be retrievable rather than resident. A full product catalog, a large knowledge base, historical records the model occasionally needs — these are better exposed as resources or behind a search tool the model calls when the moment arrives. This is the just-in-time pattern: rather than pre-loading everything the model might conceivably want, give it the means to fetch what it needs when it needs it. The model pulls a document into context for the one turn it is relevant, then moves on, keeping the window lean the rest of the time.
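A just-in-time lookup might look roughly like the sketch below. It assumes a hypothetical in-memory document store and a naive word-overlap search. In a real MCP setup the lookup would sit behind a server's search tool or resource, and the function names here are illustrative only.

```python
def search_docs(query: str, docs: dict[str, str], limit: int = 1) -> list[str]:
    """Return ids of the documents that share the most words with the query."""
    words = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


def build_turn_context(base: list[str], question: str, docs: dict[str, str]) -> list[str]:
    """Resident context plus only the documents retrieved for this turn."""
    hits = search_docs(question, docs)
    return base + [docs[doc_id] for doc_id in hits] + [question]
```

The store itself never enters the window. Only the one document that matches the current question does, and only for that turn.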
flowchart TD
A["New task"] --> B{"Needed every turn?"}
B -->|Yes| C["Keep in context: role, active tools, current data"]
B -->|No| D{"Large or rarely used?"}
D -->|Yes| E["Expose as resource / search tool"]
D -->|No| F["Include only if signal is high"]
E --> G["Model retrieves just-in-time"]
C --> H["Lean, high-signal context"]
G --> H

Curate tools the way you curate a team
The number of connected tools is one of the most underappreciated levers on agent quality. Each tool description is prompt text the model reads and reasons over when choosing what to call. A focused set of ten well-described tools tends to produce sharper selection than fifty, even if those fifty technically cover more. The model is choosing under uncertainty, and fewer, clearer options make for better choices.
Curate at two levels. Across servers, connect only those a task actually needs — a coding task does not need the calendar server attached. Within a server, expose only the tools that earn their place, and write their descriptions to disambiguate from each other. If two tools sound similar to the model, it will sometimes pick the wrong one, so the fix is either clearer descriptions or fewer tools. The instinct to connect everything "just in case" is exactly the instinct context design must override.
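Both levels of curation can be sketched in a few lines. The server names, tool names, and the 0.6 similarity threshold below are hypothetical, and word-set (Jaccard) overlap is only a crude stand-in for how confusable two descriptions are to a model.

```python
from itertools import combinations


def tools_for_task(servers: dict[str, list[str]], needed_servers: set[str]) -> list[str]:
    """Level 1: expose only the tools of servers the task actually needs."""
    return [
        tool
        for name in sorted(needed_servers)
        if name in servers
        for tool in servers[name]
    ]


def similar_descriptions(descriptions: dict[str, str], threshold: float = 0.6):
    """Level 2: flag tool pairs whose descriptions share most of their words."""
    flagged = []
    for (a, desc_a), (b, desc_b) in combinations(sorted(descriptions.items()), 2):
        words_a, words_b = set(desc_a.lower().split()), set(desc_b.lower().split())
        union = words_a | words_b
        if not union:
            continue
        # Jaccard similarity: shared words divided by all distinct words.
        if len(words_a & words_b) / len(union) >= threshold:
            flagged.append((a, b))
    return flagged
```

A flagged pair is a prompt for a human decision: rewrite one description so the two are clearly distinct, or drop one of the tools.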
Manage the conversation, not just the prompt
Context design does not stop at the opening prompt — it continues across a long agent run as history accumulates. Tool results, intermediate reasoning, and prior turns pile up, and on a long task they can crowd out the room the model needs to keep working. The pattern is active management: summarize or compact earlier turns once they are no longer needed in full, keep the high-signal conclusions, and drop the verbose intermediate steps that led to them.
This is especially important for tool outputs. A tool that returned a large payload three turns ago has usually served its purpose — the model extracted what it needed and acted. Keeping the full payload resident wastes budget. Compacting it to a short summary of what was learned preserves the signal at a fraction of the cost. On very long tasks, the difference between an agent that finishes coherently and one that loses the thread is often just disciplined compaction of stale context.
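Compaction of stale tool outputs could be sketched as follows. The `Message` shape, the `summary` field, and the thresholds are assumptions for illustration. In practice the summary would often be written by the model itself, which this sketch leaves out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    role: str          # "user", "assistant", or "tool"
    content: str
    summary: str = ""  # hypothetical: short note on what was learned


def compact_history(history, keep_recent=2, max_chars=200):
    """Swap large tool outputs older than the last `keep_recent` messages for summaries."""
    cutoff = len(history) - keep_recent
    compacted = []
    for index, msg in enumerate(history):
        stale = index < cutoff and msg.role == "tool" and len(msg.content) > max_chars
        if stale:
            # Fall back to a truncated payload when no summary was recorded.
            note = msg.summary or msg.content[:max_chars]
            compacted.append(replace(msg, content=f"[compacted] {note}"))
        else:
            compacted.append(msg)
    return compacted
```

User and assistant turns pass through untouched. Only old, oversized tool payloads are replaced, so the conclusions drawn from them remain in the history.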
Instructions: specific enough to guide, short enough to read
System instructions are the highest-leverage tokens in your context because they shape every turn. The mistake is treating them as a place to dump every edge case and caveat, which produces a wall of text the model weights unevenly. Better instructions are specific where specificity changes behavior and silent where it does not. State the role, the non-negotiable constraints, and the few rules that genuinely guide action, then stop.
Skills change this calculus. An Agent Skill is a folder of instructions and resources Claude loads dynamically when relevant, which means you do not have to keep every procedure resident in the base prompt. The detailed playbook for a rare task lives in a skill the model loads only when that task arises, keeping the always-on instructions lean while the depth is available on demand. This pairs naturally with MCP: tools provide the capability, skills teach the procedure, and neither has to occupy context until it is needed.
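The load-on-demand idea can be illustrated with a toy registry. This is a sketch of the general pattern only, not how Agent Skills are actually implemented: the `Skill` class, `load_body`, and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str              # one line, always resident
    load_body: Callable[[], str]  # reads the full playbook only when called


def build_instructions(base: str, skills: list[Skill], active: set[str]) -> str:
    """Base prompt plus one-line skill descriptions; full bodies only for active skills."""
    lines = [base, "Available skills:"]
    lines += [f"- {skill.name}: {skill.description}" for skill in skills]
    for skill in skills:
        if skill.name in active:
            # The detailed procedure enters context only now.
            lines.append(f"Skill: {skill.name}\n{skill.load_body()}")
    return "\n".join(lines)
```

With an empty `active` set the prompt carries one line per skill. The detailed procedure is read and appended only once the matching task comes up.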
The cost of getting it wrong
An over-stuffed context fails in ways that are easy to misdiagnose. The model picks the wrong tool — you blame the model, but the cause was fifty ambiguous options. The agent loses the thread on a long task — you blame the task's difficulty, but the cause was a window full of stale tool outputs. Answers drift off-target — you add more instructions, making it worse. Nearly every one of these is a context-design problem, and the fix is subtraction, not addition. When an agent underperforms, audit what is in its context before you reach for a bigger model.
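An audit can start as a simple breakdown of where the window is going. The sketch below assumes you can group the context into named categories yourself, and it uses a rough four-characters-per-token estimate rather than a real tokenizer. The category names are hypothetical.

```python
def audit_context(parts: dict[str, list[str]]) -> list[tuple[str, int, float]]:
    """Return (category, estimated tokens, share of total), largest first."""
    # Rough heuristic: about 4 characters per token.
    sizes = {
        category: sum(len(text) // 4 for text in texts)
        for category, texts in parts.items()
    }
    total = sum(sizes.values()) or 1  # avoid dividing by zero on an empty context
    report = [(category, size, round(size / total, 2)) for category, size in sizes.items()]
    return sorted(report, key=lambda row: row[1], reverse=True)
```

If a single category such as old tool outputs dominates the report, that is usually the first place to subtract.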
Frequently asked questions
Why does adding more tools sometimes make an agent worse?
Because each tool's description is prompt text the model reasons over when choosing what to call, and more options mean more chances to pick wrong. A focused set of well-described tools tends to yield sharper selection than an exhaustive one. Connect what the task needs and curate the rest out, even when more tools would technically cover more cases.
What is the just-in-time context pattern?
Instead of pre-loading everything the model might want, you expose large or rarely used information behind resources or search tools and let the model retrieve it only when relevant. The model pulls a document into context for the turn it needs it, then moves on, keeping the window lean the rest of the time rather than carrying dead weight.
How do skills relate to context design?
Skills let detailed procedures live outside the always-on prompt and load dynamically when relevant. This keeps base instructions short while making depth available on demand — the rare-task playbook occupies context only when that task arises. Paired with MCP tools, skills teach the procedure while tools provide the capability, and neither crowds context until needed.
Should I keep full tool outputs in context?
Usually not for long. Once the model has extracted what it needs and acted, a large prior tool output is wasted budget. Compacting it to a short summary of what was learned preserves the signal at a fraction of the token cost, which is often what keeps a long-running agent coherent to the end.
Bringing agentic AI to your phone lines
Lean, well-curated context is exactly what keeps CallSphere's voice and chat agents fast and on-task while they use tools mid-call and book work 24/7. See disciplined context design in a live agent at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.