
Prompt and Context Design for Claude Agents in 2026

What to put in Claude's context and what to leave out — prompt and context engineering for agents with Skills and MCP tools, with concrete heuristics.

Every engineer building agents eventually learns the same hard lesson: the model's failures are usually not reasoning failures, they are context failures. The agent didn't pick the wrong tool because it's dumb; it picked the wrong tool because the context was a cluttered attic where the relevant instruction was buried under three irrelevant ones. With Claude's Skills and MCP architecture, you have unusually fine control over what enters the model's working memory. This post is about using that control well — what to put in context, what to deliberately leave out, and the reasoning behind each call.

Context engineering is the discipline of curating exactly the information a model needs for the task in front of it, and nothing more. It is the highest-leverage skill in agent building, and it is mostly about subtraction.

Why more context makes agents worse

The instinct to "give the model everything, just in case" is wrong, and it's worth understanding why. Every token you add is a token the model must attend to, and irrelevant tokens are active distractors: they raise the odds of the model latching onto the wrong instruction or tool. A bloated system prompt with fifty rules dilutes the five that matter for the current task. Long, stale tool outputs left in the transcript get re-read on every turn, nudging the model toward already-completed paths.

So the goal is not a complete context; it is a sufficient one. The Skills architecture supports this directly: dormant Skills cost only their names and short descriptions, and their bodies enter context only when triggered. Your job is to design so that, at any moment, the model sees the task, the few Skills relevant to it, and the tool results it actually needs, and very little else.
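
The "descriptions always, bodies on demand" idea can be sketched in a few lines. This is a toy model of the pattern, not how Claude loads Skills internally; the `Skill` class, `render_context` function, and the two example Skills are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short routing text, always visible
    body: str         # full procedure, rendered only when triggered


def render_context(skills: list[Skill], triggered: set[str]) -> str:
    """Render every description, but only the bodies of triggered Skills."""
    lines = ["Available Skills:"]
    for skill in skills:
        lines.append(f"- {skill.name}: {skill.description}")
    for skill in skills:
        if skill.name in triggered:
            lines.append(f"\n## {skill.name}\n{skill.body}")
    return "\n".join(lines)


# Hypothetical Skills, for illustration only.
skills = [
    Skill(
        "reorder",
        "Check stock and decide reorder quantities for low inventory.",
        "1. Query stock levels.\n2. Compare against thresholds.\n3. Propose quantities.",
    ),
    Skill(
        "refunds",
        "Process a refund request for a delivered order.",
        "1. Verify the order.\n2. Check eligibility.\n3. Issue the refund.",
    ),
]

dormant = render_context(skills, triggered=set())
active = render_context(skills, triggered={"reorder"})
```

With nothing triggered, `dormant` holds only the two description lines. Triggering `reorder` adds that one procedure and leaves the refund procedure out.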

The four things that belong in context

For an agentic turn, a well-curated context contains four ingredients. First, the task and its constraints, stated plainly. Second, the relevant guidance — the one or two Skills whose descriptions matched, now loaded with their procedures. Third, the tool results gathered so far, in structured form. Fourth, just enough history to maintain continuity. Everything else is a candidate for removal.

flowchart TD
  A["Incoming turn"] --> B["Task + explicit constraints"]
  A --> C["Matched Skill procedures only"]
  A --> D["Structured tool results so far"]
  A --> E["Minimal relevant history"]
  B --> F{"Anything else proposed?"}
  C --> F
  D --> F
  E --> F
  F -->|Not task-relevant| G["Leave it out"]
  F -->|Task-relevant| H["Include, then prune next turn"]

Notice the last branch: anything you include is reconsidered on the next turn, so inclusion is not permanent. A tool result that mattered three turns ago may be dead weight now. Treat context as something you continually garden, not a log you append to forever.
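
One way to make the four ingredients and the pruning step concrete is a small data structure with an explicit prune pass. This is a minimal sketch under stated assumptions: `TurnContext`, `ToolResult`, and `prune` are hypothetical names, and age in turns is used as a crude stand-in for relevance, which a real agent would judge more carefully.

```python
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    tool: str
    summary: str
    turn: int  # turn on which the result was produced


@dataclass
class TurnContext:
    task: str
    constraints: list[str]
    skill_procedures: list[str] = field(default_factory=list)
    tool_results: list[ToolResult] = field(default_factory=list)
    history: list[str] = field(default_factory=list)


def prune(ctx: TurnContext, current_turn: int,
          max_result_age: int = 2, max_history: int = 3) -> TurnContext:
    """Return a copy without stale tool results and with only recent history."""
    fresh = [r for r in ctx.tool_results
             if current_turn - r.turn <= max_result_age]
    recent = ctx.history[-max_history:] if max_history > 0 else []
    return TurnContext(
        task=ctx.task,
        constraints=list(ctx.constraints),
        skill_procedures=list(ctx.skill_procedures),
        tool_results=fresh,
        history=recent,
    )


# Illustrative values only.
ctx = TurnContext(
    task="Restock low-inventory items",
    constraints=["Stay under budget"],
    tool_results=[
        ToolResult("get_stock", "12 SKUs below threshold", turn=1),
        ToolResult("get_prices", "prices fetched for 12 SKUs", turn=4),
    ],
    history=["t1", "t2", "t3", "t4"],
)
pruned = prune(ctx, current_turn=5)
```

Here the turn-1 stock result is dropped at turn 5 while the turn-4 price result survives. The task and constraints are never pruned.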

What to deliberately leave out

The harder discipline is exclusion. Leave out tool schemas the current task won't touch; the disclosure ladder (summaries first, full detail only on demand) keeps unused tools summarized rather than fully expanded for exactly this reason. Leave out completed sub-task transcripts; once a subagent has reported its conclusion, the raw exploration that produced it is noise. Leave out generic etiquette and boilerplate that doesn't change behavior. And leave out large reference documents until a branch actually needs them: reference them from a Skill so they load on demand rather than squatting in context from turn one.

A useful test for any candidate token: "If I removed this, would the model behave worse on this turn?" If you can't articulate a concrete failure it prevents, it's probably clutter. This is uncomfortable because leaving things out feels risky, but in practice leaner contexts tend to produce sharper tool selection and fewer hallucinated steps.
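
The removal test can be run mechanically as a leave-one-out check, assuming you have some way to score the agent's behavior on a turn. In this sketch `find_clutter` is a hypothetical helper, and `toy_score` is a stand-in for a real evaluation (for example, running the agent against a small test set), which the sketch does not attempt.

```python
from typing import Callable, Sequence


def find_clutter(
    blocks: Sequence[str],
    score: Callable[[Sequence[str]], float],
    tolerance: float = 0.0,
) -> list[str]:
    """Return blocks whose removal lowers the score by no more than tolerance."""
    baseline = score(blocks)
    clutter = []
    for i, block in enumerate(blocks):
        without = [b for j, b in enumerate(blocks) if j != i]
        if score(without) >= baseline - tolerance:
            clutter.append(block)
    return clutter


# Toy scorer: the turn succeeds only if the one load-bearing rule is present.
RULE = "Refund only delivered orders."


def toy_score(context: Sequence[str]) -> float:
    return 1.0 if RULE in context else 0.0


blocks = [RULE, "Be polite and professional.", "Always greet the user by name."]
clutter = find_clutter(blocks, toy_score)
```

Leave-one-out has a known blind spot: two blocks that each cover for the other will both look removable, so it is safer to remove candidates one at a time and re-score after each.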

Designing Skill descriptions as routing signals

Because Claude decides which Skills to load by matching their descriptions against the task, those descriptions are the most important sentences you'll write. They are routing signals, not marketing copy. A vague description ("helps with orders") matches everything and nothing; a sharp one ("check stock and decide reorder quantities for low inventory") fires precisely when relevant and stays quiet otherwise.

Write descriptions in terms of the tasks they serve, using the words an engineer or user would actually use to phrase the request. Include the trigger conditions explicitly. The reward is an agent whose context stays clean because the right judgment loads at the right moment and the rest stays dormant. Bad descriptions are the most common reason an agent either ignores a Skill it should use or drags in three it shouldn't.
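
These guidelines can be turned into a crude lint that runs before a Skill ships. The checks below are illustrative heuristics, not rules from any official tooling: the vague-opener list, the trigger phrases, and the eight-word minimum are all assumptions you would tune for your own Skills.

```python
import re

# Assumed heuristics, chosen for illustration.
VAGUE_OPENERS = ("helps with", "handles", "assists with", "utility for")
TRIGGER_PATTERN = re.compile(
    r"\b(use when|use this when|use for|when the user)\b", re.IGNORECASE
)


def lint_description(description: str, min_words: int = 8) -> list[str]:
    """Return warnings for a Skill description; an empty list means it passed."""
    warnings = []
    text = description.strip()
    if len(text.split()) < min_words:
        warnings.append("too short to name a specific task")
    if text.lower().startswith(VAGUE_OPENERS):
        warnings.append("opens with a vague verb phrase")
    if not TRIGGER_PATTERN.search(text):
        warnings.append("no explicit trigger condition")
    return warnings


vague = lint_description("Helps with orders")
sharp = lint_description(
    "Check stock and decide reorder quantities for low inventory. "
    "Use when the user asks about restocking or low stock."
)
```

The vague description trips all three checks and the sharp one passes cleanly. A lint like this only catches obvious weaknesses; whether a description routes well still has to be tested against real requests.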

Pruning long-running agents with summarization

For agents that run many turns, context naturally accumulates, and the fix is deliberate compaction. Periodically replace a sprawling history with a tight summary of what was learned and decided, keeping the conclusions and dropping the exploration. A subagent pattern helps here too: a subagent burns its own context exploring, then returns only a distilled result to the orchestrator, so the main agent's context grows by a paragraph instead of a transcript. The principle is the same at every scale — carry forward conclusions, not the path that produced them.
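
A compaction pass along these lines might look like the following sketch. The `compact` function and its thresholds are hypothetical, and the summarizer is injected as a plain function: in a real agent it would typically be a model call, which is stubbed out here so the example stays self-contained.

```python
from typing import Callable


def compact(
    history: list[str],
    summarize: Callable[[list[str]], str],
    keep_recent: int = 4,
    max_turns: int = 10,
) -> list[str]:
    """Collapse older turns into one summary entry once history exceeds max_turns."""
    if keep_recent < 1 or max_turns <= keep_recent:
        raise ValueError("need 1 <= keep_recent < max_turns")
    if len(history) <= max_turns:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [f"[summary] {summarize(older)}"] + recent


def stub_summarizer(turns: list[str]) -> str:
    # Stand-in for a model-generated summary of conclusions and decisions.
    return f"{len(turns)} earlier turns condensed"


history = [f"turn {i}" for i in range(1, 13)]  # 12 turns
compacted = compact(history, stub_summarizer)
```

After compaction the twelve-entry history becomes one summary entry plus the four most recent turns, so the list stays bounded however long the run continues.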

Done well, a long-running Claude agent at turn fifty has a context not much larger than at turn five, because every turn both adds what's newly relevant and sheds what's gone stale. That steady-state leanness is what keeps quality from degrading as a task drags on.

Frequently asked questions

Isn't a 1M-token window large enough that I don't need to prune?

A large window changes the economics but not the physics. Even with room to spare, irrelevant tokens still act as distractors and still cost latency and money. Use the headroom for genuinely relevant material, not as an excuse to skip curation.

How do I decide what goes in the system prompt versus a Skill?

Put truly global, always-true rules in the system prompt and keep it short. Put task-specific procedures in Skills so they load only when relevant. If a rule applies to one kind of task, it belongs in that task's Skill, not in the always-on prompt.

What's the simplest way to tell if my context is too cluttered?

Watch for the model re-doing finished work, choosing tools unrelated to the task, or ignoring a clear instruction. Those symptoms almost always trace to distractor tokens. Prune the transcript and tighten Skill descriptions, then re-test.
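
The first symptom, re-doing finished work, is the easiest to check automatically. The sketch below assumes a hypothetical transcript format in which each tool call is a dict with `tool` and `args` keys; adapt the key names to whatever your agent framework actually logs.

```python
import json
from collections import Counter


def repeated_tool_calls(calls: list[dict]) -> dict[str, int]:
    """Count tool calls that occur more than once with identical arguments."""
    keys = [
        f'{call["tool"]}:{json.dumps(call["args"], sort_keys=True)}'
        for call in calls
    ]
    return {key: n for key, n in Counter(keys).items() if n > 1}


# Illustrative transcript: the stock lookup is repeated verbatim.
calls = [
    {"tool": "get_stock", "args": {"sku": "A1"}},
    {"tool": "get_prices", "args": {"sku": "A1"}},
    {"tool": "get_stock", "args": {"sku": "A1"}},
]
repeats = repeated_tool_calls(calls)
```

A non-empty result is a prompt to look closer, not proof of clutter: some repeated calls, such as polling for a status change, are legitimate.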



Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
