Context Design for Claude Managed Agents: What to Include
What to put in a Claude managed agent's context and what to leave out: ordering, just-in-time retrieval, compression, and untrusted-content safety.
Two managed agents can have identical tools and identical models and behave completely differently, because one was handed a clean, deliberate context and the other was handed a junk drawer. Context design is the most underrated lever in agent engineering: what you include shapes what the model attends to, what you exclude protects its focus, and both directly move your cost and reliability. This post is about the editorial decisions — what earns a place in the agent's context, what gets cut, and the reasoning behind each call.
Key takeaways
- Context is a budget the model re-reads every turn — include only what the next decision needs.
- Put stable instructions first for caching; put volatile observations last.
- Inject just-in-time, not just-in-case: pull reference material only when the task calls for it.
- Compress acted-on observations into one-line summaries to keep the goal in focus.
- Untrusted external content in context is an attack surface — fence it and never let it grant authority.
The mental model: context is re-read every turn
The core fact that should drive every context decision is that an agent re-processes its entire context on every single turn. A document you dropped in at step 2 is still being read at step 20. This reframes the question from "could this be useful?" to "is this worth paying for on every remaining turn?" Most things that pass the first test fail the second. The discipline of context design is asking the second question relentlessly.
This is also why bigger context windows do not eliminate the work. A million tokens of capacity does not make a million tokens of content free or focused — it just delays the point at which sprawl becomes painful. The agents that stay reliable over long runs are the ones whose context stays lean by design.
What earns a place in context
Three things almost always belong: the goal, the operating instructions, and the recent decision-relevant observations. The goal anchors everything. The instructions define role, tools, output format, and stop conditions. The recent observations are the evidence the next decision rests on. Beyond these, every inclusion should justify itself against the per-turn re-read cost.
flowchart TD
A["New turn"] --> B["Stable system instructions"]
B --> C["Task goal & constraints"]
C --> D{"Reference needed now?"}
D -->|Yes| E["Inject just-in-time doc"]
D -->|No| F["Skip it"]
E --> G["Recent observations only"]
F --> G
G --> H["Model decides next action"]
Notice the ordering: stable content first, volatile content last. This is not cosmetic. Putting the unchanging prefix first lets prompt caching reuse it across turns, while the changing tail at the end is the only part that must be reprocessed fresh. Order context by how often it changes, stable to volatile.
What to leave out, and why
The hardest discipline is excluding things that feel helpful. Whole API references, entire files when three functions matter, full conversation history once it is resolved, speculative "background" the task may never touch — all of it dilutes attention and inflates cost. The model's attention is a shared pool; every irrelevant token in context competes with the goal for that pool. Leaving things out is not neglect, it is focus.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
A useful heuristic: if you cannot name the specific decision a piece of context informs, cut it. "The agent might need to know our refund policy" is not a reason; "the agent is deciding whether this refund is allowed" is. Tie every inclusion to a decision, and the junk falls away on its own.
This bites hardest with files and transcripts, because they feel authoritative and complete. Pasting an entire 800-line module so the agent "has the full picture" usually means 780 lines competing with the 20 that matter. The same goes for replaying a full prior conversation: once a sub-task is resolved, its turn-by-turn detail is dead weight. Prefer a pointer the agent can follow on demand over the whole artifact sitting in context, and you trade a small, occasional fetch for a large, constant tax.
Just-in-time over just-in-case
The best context is often not in context at all until the moment it is needed. Rather than front-loading every document the agent might consult, give it a tool to fetch reference material and let it pull the relevant piece when the task actually requires it. This keeps the baseline context small and surfaces detail only at the point of decision, when it is most useful and least likely to be stale.
This mirrors how Agent Skills work — instructions and resources load dynamically when relevant rather than sitting in context permanently. The same principle applies to your own reference data: a retrieval tool plus a lean context beats a fat context every time, because it spends tokens only on what the current step truly needs.
Compress as you go
Long runs accumulate observations, and most lose relevance once acted on. Rather than let them pile up, compress them. When the agent has read a log, found the error, and fixed it, that 200-line log should become "checked deploy log; failure was a timeout on line 142, since resolved." The decision is preserved; the bulk is gone. Do this continuously and a 40-step run keeps a context the size of a 10-step one.
Where the model itself does long work, a periodic self-summary turn — "summarize progress and open questions, then continue" — is a clean way to fold history into a compact state. The agent keeps its bearings without dragging every prior observation forward.
Treat external content as untrusted
The moment your agent reads a web page, an email, or a user-supplied file, that content is in context — and it may contain instructions aiming to hijack the agent. Context design has a security dimension: fence untrusted content clearly, label it as data not instructions, and never let anything read from the outside world grant authority or change scope. The tenant scope and tool permissions discussed in safe tool design live server-side precisely so that no string in context can override them.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Practically, keep a hard line between trusted instructions (your system prompt) and untrusted observations (anything fetched). The model should treat fetched content as information to reason about, never as commands to obey. Designing context with that boundary in mind is what keeps a curious agent from being a compromised one.
A concrete tactic: wrap fetched content in an explicit delimiter and a one-line caption telling the model what it is and how to treat it — for example, marking a block as untrusted page text to be summarized, not followed. This does not make injection impossible, but it raises the bar and makes your intent legible to the model. The deeper defense, though, is architectural: because authority lives server-side in tool permissions and tenant scope, even an agent that gets talked into trying something harmful is stopped at the tool boundary. Context fencing reduces how often that happens; the boundary determines what it costs when it does.
Design your context in 5 steps
- Write the goal and stable instructions first; place them at the front for caching.
- For every other candidate, name the decision it informs — cut anything you cannot.
- Replace front-loaded reference dumps with a just-in-time retrieval tool.
- Add a compression rule that turns acted-on observations into one-line summaries.
- Fence all externally fetched content as untrusted data, never as instructions.
Include or exclude?
| Content | Default | Why |
|---|---|---|
| Goal & constraints | Include | Anchors every decision |
| Full API reference | Exclude | Fetch just-in-time instead |
| Resolved observations | Compress | Keep the decision, drop the bulk |
| Speculative background | Exclude | No named decision it serves |
| Fetched web/email content | Fence | Untrusted; data not instructions |
Frequently asked questions
What is context design for an agent, in one line?
Context design is the deliberate choice of what information the agent carries into each turn — the goal, instructions, and just enough evidence — so the model attends to the right things at the lowest cost.
Why does a large context window not solve this?
Because the agent re-reads its full context every turn, and irrelevant tokens compete with the goal for the model's attention. A big window delays sprawl but does not make included content free or focused.
When should I inject reference material?
Just in time, via a retrieval tool, at the moment the task needs it — not just in case up front. This keeps baseline context lean and surfaces detail exactly where the decision is made.
How do I keep fetched content from hijacking the agent?
Fence it as untrusted data, label it clearly as information rather than instructions, and keep all authority — tenant scope, tool permissions — server-side so no string in context can override it.
Bringing agentic AI to your phone lines
CallSphere brings disciplined context design to voice and chat agents that answer every call, pull the right information mid-conversation, and book work 24/7 without losing the thread. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.