Context Engineering for Claude Agents: What to Include
Prompt and context design for Claude agents: what to include, what to leave out, how to summarize the past and fetch the present, and how to measure whether the context is right.
Every turn of an agent's loop, the Claude model sees a freshly assembled context window — and that window is the single biggest lever you have over its behavior. Get it right and the agent is sharp, cheap, and consistent. Get it wrong and the same model becomes forgetful, expensive, and prone to confidently wrong answers. This post is about the deliberate craft of deciding what goes into context, what stays out, and why those choices matter more than almost anything else in agent design.
Context engineering is the practice of deciding which instructions, facts, tool schemas, and history to place in a model's window each turn so it has exactly what it needs to act correctly — and nothing that distracts it. The mistake everyone makes early is treating context as a place to dump everything "just in case." A large window invites that, but more context is not more capability; past a point it's less.
Why more context hurts past a point
Even with a 1M-token window, an agent's attention is finite. Bury the one relevant order among two hundred lines of unrelated account history and the model is more likely to miss it, not less. Irrelevant material doesn't just waste tokens and money — it actively competes for the model's focus with the facts that matter. The best agents run on lean, high-signal context where almost everything in the window is load-bearing for the current step.
This reframes your job. You're not trying to give the model everything it might conceivably want; you're curating the smallest set of inputs that makes the next action obvious. Think like an editor, not a hoarder. Every line you can justify keeping earns its place; everything else is noise you're paying for in both dollars and accuracy.
The four things that belong in every window
Strip context down to its essentials and four categories survive. The stable policy: the system prompt — role, workflow, boundaries — which doesn't change per task. The relevant tools: schemas for the tools this step might need, not your entire catalog. The recent working state: what the agent has established and the results of its last few actions. And the task-specific facts: the actual ticket, the actual customer record, pulled in fresh rather than baked into the prompt.
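As a minimal sketch of those four categories, the window can be modeled as four separate slots that are filled fresh each turn. Everything here is hypothetical and for illustration only: the `ContextWindow` class, the `build_window` helper, the tool names, and the `keep_last` cutoff are assumptions, not part of any Claude SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    """The four kinds of input described above, kept as separate slots."""
    policy: str                                             # stable system prompt
    tools: list[dict] = field(default_factory=list)         # schemas for this step only
    working_state: list[str] = field(default_factory=list)  # recent findings and results
    task_facts: dict = field(default_factory=dict)          # fetched fresh for this task


def build_window(policy, catalog, needed_tools, state, facts, keep_last=3):
    """Assemble one turn's window from the four categories."""
    return ContextWindow(
        policy=policy,
        # Only the schemas this step might call, not the entire catalog.
        tools=[catalog[name] for name in needed_tools if name in catalog],
        # Only the last few working-state entries.
        working_state=list(state[-keep_last:]),
        task_facts=dict(facts),
    )


# Hypothetical tool catalog and run state.
catalog = {
    "lookup_order": {"name": "lookup_order", "description": "Fetch one order by id"},
    "issue_refund": {"name": "issue_refund", "description": "Refund an order"},
    "send_survey": {"name": "send_survey", "description": "Email a survey"},
}
window = build_window(
    policy="You are a support agent. Verify the order before refunding.",
    catalog=catalog,
    needed_tools=["lookup_order", "issue_refund"],
    state=["greeted customer", "verified identity", "found order 1042", "customer wants refund"],
    facts={"ticket_id": "T-1", "order_id": 1042},
)
```

Keeping the slots separate makes it easy to see which category is growing when a window starts to bloat.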
```mermaid
flowchart TD
    A["New turn begins"] --> B["Include stable policy prompt"]
    B --> C["Add only relevant tool schemas"]
    C --> D["Add recent working state"]
    D --> E{"Window getting full?"}
    E -->|Yes| F["Compact old turns, drop stale results"]
    E -->|No| G["Pull task-specific facts via tools"]
    F --> G
    G --> H["Model chooses next action"]
```
The compaction branch is where long-running agents live or die. Notice it triggers before facts are pulled, not after — you make room first, then add the fresh, high-value material. An agent that compacts reactively, after it's already overrun, has usually lost the thread by then. Manage the budget proactively and the recent, sharpest context always wins the room it needs.
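The ordering described here (make room first, then fetch) could be sketched roughly as follows. This is an illustration under stated assumptions: `estimate_tokens` uses a crude four-characters-per-token guess in place of a real tokenizer, and `summarize` and `fetch_facts` are hypothetical callables that in practice would be a model call and a tool call.

```python
def estimate_tokens(text):
    """Rough estimate of about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def compact(turns, summarize, budget, keep_recent=2):
    """If turns exceed the budget, fold all but the most recent into one summary."""
    total = sum(estimate_tokens(t) for t in turns)
    if total <= budget or len(turns) <= keep_recent:
        return list(turns)
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(old)] + list(recent)


def next_turn(turns, fetch_facts, summarize, budget):
    """Compact first, then append fresh facts, matching the order discussed above."""
    compacted = compact(turns, summarize, budget)
    return compacted + [fetch_facts()]


# Six verbose turns that together exceed a small budget.
turns = [f"turn {i}: " + "detail " * 50 for i in range(6)]
result = next_turn(
    turns,
    fetch_facts=lambda: "ticket T-1: refund request for order 1042",
    summarize=lambda old: f"Summary of {len(old)} earlier turns",
    budget=200,
)
```

With these inputs the four oldest turns collapse into one summary line, the two most recent turns survive verbatim, and the fresh fact is appended last.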
What to deliberately leave out
Just as important as what you include is what you exclude. Leave out raw tool dumps — summarize a search result to its top few relevant fields rather than pasting the full payload. Leave out resolved sub-tasks once their conclusion is captured; the model needs the answer, not the transcript that produced it. Leave out tool schemas for capabilities irrelevant to the current phase. And leave out volatile data that belongs behind a tool: today's inventory, current pricing, live account status — fetch these when needed rather than freezing a stale snapshot into the prompt.
The discipline is to ask, of every block, "does the model need this to choose its next action?" If the answer is "maybe later," the answer for now is no — pull it later when it becomes relevant. This is the progressive-disclosure principle applied to context: keep the default lean and load detail on demand, often through skills that activate only when their trigger fires.
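One way to apply the "no raw tool dumps" rule is to trim a tool result before it enters the window. The sketch below assumes the result is a list of dictionaries; `trim_results`, the field names, and the `top_n` cutoff are hypothetical choices for illustration.

```python
def trim_results(results, keep_fields, top_n=3):
    """Keep only the first top_n results and only the listed fields of each."""
    return [
        {key: item[key] for key in keep_fields if key in item}
        for item in results[:top_n]
    ]


# A hypothetical raw payload: ten results, each padded with fields the model does not need.
raw = [
    {"id": i, "title": f"Order {i}", "status": "shipped",
     "debug_blob": "x" * 500, "internal_rank": 0.9}
    for i in range(10)
]
trimmed = trim_results(raw, keep_fields=["id", "title", "status"], top_n=3)
```

The full payload can stay in a log or cache outside the window, so the agent can fetch more detail later if a step actually needs it.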
Summarize the past, fetch the present
A durable rule of thumb separates two kinds of information. The past (what the agent has already done and learned this run) should be summarized and carried forward compactly, because the conclusions matter and the step-by-step does not. The present (facts about the live world) should never be frozen into the prompt ahead of time; it should be fetched through tools at the moment it's needed, so it's always current.
Mixing these up causes two classic failures. Carrying full transcripts forward instead of summaries blows the budget and drowns the signal. Baking live facts into the prompt instead of fetching them makes the agent confidently quote stale data. Keep the rule clean — summarize the past, fetch the present — and a whole category of subtle production bugs simply doesn't occur.
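A toy illustration of the rule, assuming a hypothetical `get_price` tool backed by an in-memory table that stands in for a live pricing system: the prompt carries only stable policy and a compact summary of the run, while the price is read at the moment it is needed.

```python
POLICY = "Quote prices only from the get_price tool. Never quote a price from memory."


def get_price(sku, price_table):
    """Stand-in for a live pricing tool; price_table plays the role of the live system."""
    return price_table[sku]


def build_prompt(policy, run_summary):
    """The prompt holds stable policy plus a summary of the past, and no live values."""
    return f"{policy}\n\nProgress so far: {run_summary}"


live_prices = {"SKU-1": 19.99}
prompt = build_prompt(POLICY, "Customer identified; asking about SKU-1.")

# The price changes after the prompt was built.
live_prices["SKU-1"] = 24.99

# Because the value is fetched rather than baked in, the agent sees the current price.
answer = get_price("SKU-1", live_prices)
```

Had the price been written into the prompt when it was built, the agent would still be quoting the old value after the change.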
Designing context for multi-agent runs
When you delegate to a subagent, context design becomes a handoff problem. The subagent should receive a tight brief — its specific task and just the facts it needs — not the parent's entire window. And it should return a compact summary, not its full transcript, so the parent's context stays clean. This is precisely why subagents help with large tasks: each gets a fresh, focused window instead of inheriting the parent's accumulated clutter.
Design these briefs and summaries explicitly. A vague handoff ("continue the analysis") forces the subagent to reconstruct context it doesn't have; a precise one ("validate these five records against this rule, return pass/fail with reasons") gives it a clean window aimed at a clear deliverable. The quality of your inter-agent context contracts largely determines whether multi-agent runs are worth their extra token cost.
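An explicit handoff contract might look something like the sketch below. The `Brief` and `Result` types and the `validate_records` function are hypothetical; a real subagent would be a separate model run, and plain Python stands in for it here so the shape of the contract is visible.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """What the parent sends: the task, only the facts needed, and the deliverable."""
    task: str
    facts: dict
    deliverable: str


@dataclass
class Result:
    """What the subagent returns: a compact summary plus structured verdicts."""
    summary: str
    details: dict


def validate_records(brief):
    """Hypothetical subagent: sees only the brief and returns a compact result."""
    limit = brief.facts["max_amount"]
    verdicts = {}
    for rec in brief.facts["records"]:
        if rec["amount"] <= limit:
            verdicts[rec["id"]] = "pass"
        else:
            verdicts[rec["id"]] = f"fail: amount {rec['amount']} exceeds {limit}"
    failed = sum(1 for v in verdicts.values() if v != "pass")
    return Result(summary=f"{len(verdicts) - failed} passed, {failed} failed", details=verdicts)


brief = Brief(
    task="Validate these records against the amount rule",
    facts={
        "max_amount": 100,
        "records": [{"id": "r1", "amount": 40}, {"id": "r2", "amount": 250}],
    },
    deliverable="pass/fail per record with reasons",
)
result = validate_records(brief)
```

Only `result.summary`, and `result.details` where needed, goes back into the parent's window; the subagent's intermediate work never does.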
Measuring whether your context is right
You don't have to guess. Read the traces of real runs and watch for tells: the agent re-fetching something it already had means your working state isn't being carried; the agent ignoring a fact that was in the window means it was buried in noise; cost climbing across a long run means compaction isn't keeping up. Each symptom maps to a specific context fix. Over time, tuning context becomes the highest-leverage maintenance you do — a leaner window is usually both cheaper and more accurate, which is a rare two-for-one in engineering.
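Two of these tells can be checked mechanically. The sketch below assumes a hypothetical trace format, a list of steps that each record a `tool` name, an `args` string, and an `input_tokens` count; real trace schemas will differ, and the `factor` threshold is an arbitrary starting point.

```python
from collections import Counter


def find_refetches(trace):
    """Return (tool, args) pairs that were called more than once in a run."""
    calls = Counter((step["tool"], step["args"]) for step in trace if step.get("tool"))
    return [call for call, count in calls.items() if count > 1]


def cost_is_climbing(trace, factor=2.0):
    """True if the last turn's input tokens exceed the first turn's by `factor`."""
    tokens = [step["input_tokens"] for step in trace]
    return len(tokens) >= 2 and tokens[-1] > factor * tokens[0]


# Hypothetical trace of a three-step run.
trace = [
    {"tool": "lookup_order", "args": "order_id=1042", "input_tokens": 1200},
    {"tool": "lookup_policy", "args": "topic=refunds", "input_tokens": 1900},
    {"tool": "lookup_order", "args": "order_id=1042", "input_tokens": 2700},
]
refetches = find_refetches(trace)
climbing = cost_is_climbing(trace)
```

Here the repeated `lookup_order` call suggests working state is not being carried forward, and the more-than-doubled input size suggests compaction is not keeping up.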
Frequently asked questions
If the window is 1M tokens, why not just include everything?
Because attention is finite even when the window is large. Irrelevant context competes with relevant context for the model's focus, so padding the window tends to make the agent miss the fact that mattered. It also costs more every turn. Lean, high-signal context outperforms a stuffed window on both accuracy and price.
How do I decide what to summarize versus keep verbatim?
Keep verbatim what the model must reason over precisely right now — the current ticket, the exact policy in play. Summarize what's settled — earlier steps, resolved sub-tasks, old tool results — because the conclusion matters and the detail doesn't. As material ages from "active" to "established," move it from verbatim to summary.
What goes in the prompt versus fetched through a tool?
Stable behavior and policy go in the prompt; volatile facts come through tools. If a value can change between runs — pricing, inventory, account status — fetch it live so the agent never quotes stale data. If it's a rule that holds across tasks, bake it into the prompt where it's always available.
How is context engineering different from prompt engineering?
Prompt engineering is crafting the stable instructions; context engineering is managing the full, dynamic window every turn — instructions plus tools plus history plus fetched facts — including what to compact and drop. Prompt design is a subset. In long-running agents, the dynamic curation of context matters at least as much as the wording of the prompt.
Bringing agentic AI to your phone lines
CallSphere applies this same context discipline — lean windows, summarized history, live facts fetched mid-call — to voice and chat agents that answer every call and message, use tools in real time, and book work 24/7. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.