
Context engineering for Claude Code agents

What to put in Claude Code context and what to leave out: durable vs disposable facts, just-in-time loading, fighting context rot, and subagent isolation.

Most failures I see in agentic workflows aren't reasoning failures — they're context failures. The model was given too much, too little, or the wrong thing at the wrong time. A 1M-token window tempts engineers to treat context as free, but context is the most precious resource in an agentic system, and how you curate it determines whether the agent is sharp or scattered. This post is about context engineering for Claude Code: the deliberate practice of deciding what enters the model's working memory, when, and why.

The framing that helps is to treat the context window like a working desk, not a filing cabinet. A desk holds what you need for the task in front of you; a cabinet holds everything you might ever need, filed away until called for. Confuse the two and the desk becomes unusable.

Durable versus disposable: the first cut

Candidate facts fall into two kinds. Durable facts are true across many tasks: the stack, the conventions, the protected directories, the test command. Disposable facts are relevant to one task only: this ticket's details, this file's contents, this query's result. The first discipline of context engineering is keeping the two in different places. Durable facts live in standing memory; disposable facts are loaded fresh per task and discarded afterward.

Context engineering is the practice of deciding what information enters a model's context window, in what form, and at what moment, so the model has exactly what it needs and little else. When durable and disposable mix, you get the worst of both: standing context bloats with stale task details, and task prompts get cluttered with facts that should have been ambient. Make the cut cleanly and everything downstream gets easier.
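
To make the cut concrete, here is a minimal sketch of a store that keeps the two kinds of fact apart. The `ContextStore` class, its method names, and the example facts are all hypothetical, invented for illustration; this is not how Claude Code stores memory internally.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Keeps standing (durable) facts apart from per-task (disposable) ones."""
    durable: dict[str, str] = field(default_factory=dict)
    disposable: dict[str, str] = field(default_factory=dict)

    def begin_task(self, task_facts: dict[str, str]) -> None:
        # Start each task from a clean slate: old task facts never carry over.
        self.disposable = dict(task_facts)

    def end_task(self) -> None:
        # Discard everything that was only relevant to the finished task.
        self.disposable.clear()

    def render(self) -> str:
        # Durable facts come first, then whatever the current task loaded.
        lines = [f"{k}: {v}" for k, v in self.durable.items()]
        lines += [f"{k}: {v}" for k, v in self.disposable.items()]
        return "\n".join(lines)


# Hypothetical example facts.
store = ContextStore(durable={"test_command": "pytest -q", "protected_dir": "migrations/"})
store.begin_task({"ticket": "fix rounding in invoice totals"})
during_task = store.render()
store.end_task()
after_task = store.render()
```

After `end_task()`, the rendered context holds only the durable entries, so the next task starts without the previous ticket's details.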

Just-in-time beats just-in-case

The dominant anti-pattern is just-in-case loading — pulling in documentation, schemas, and examples up front because the model might need them. It almost always costs more than it returns. The better default is just-in-time: load the minimal index, and let the model pull detail into context only when the task actually reaches for it.

flowchart TD
  A["Task starts"] --> B["Load durable memory only"]
  B --> C{"Need detail?"}
  C -->|No| D["Act with what's loaded"]
  C -->|Yes| E["Pull just-in-time: skill or file"]
  E --> F["Use it for this step"]
  F --> G{"Context getting heavy?"}
  G -->|Yes| H["Delegate to fresh-context subagent"]
  G -->|No| C
  H --> I["Receive distilled result"]

Skills are the mechanism that makes just-in-time practical. Their one-line descriptions are the index; their bodies are the detail loaded only on a relevant trigger. The same logic applies to files and data: prefer giving the model a tool to fetch the specific record over pasting a whole table into the prompt. The model fetching exactly what it needs, when it needs it, keeps the desk clear.
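
The index-versus-body split can be sketched in a few lines. Everything here is an assumption made for illustration: the `Skill` and `SkillIndex` classes, the skill names, and the placeholder bodies are hypothetical, and the sketch only imitates the loading pattern, not the actual skill mechanism in Claude Code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str              # one-line entry that is always in context
    load_body: Callable[[], str]  # detail, fetched only on demand


class SkillIndex:
    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {s.name: s for s in skills}
        self.loaded: list[str] = []  # records which bodies were actually pulled in

    def index_text(self) -> str:
        # The cheap part: one line per skill, loaded up front.
        return "\n".join(f"- {s.name}: {s.description}" for s in self._skills.values())

    def pull(self, name: str) -> str:
        # The expensive part: loaded only when the task reaches for it.
        self.loaded.append(name)
        return self._skills[name].load_body()


# Hypothetical skills with placeholder bodies.
index = SkillIndex([
    Skill("db-migrations", "How to write and run schema migrations",
          lambda: "MIGRATION GUIDE (placeholder body)"),
    Skill("release", "Steps for cutting a release",
          lambda: "RELEASE CHECKLIST (placeholder body)"),
])

context = [index.index_text()]         # just-in-time default: index only
context.append(index.pull("release"))  # detail enters only on a relevant trigger
```

The migration guide never enters `context` because nothing asked for it, while its one-line description stays visible in the index so the model knows it exists.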

What to leave out — and why it helps

Counterintuitively, removing information often improves results. Irrelevant context isn't neutral; it actively dilutes the model's attention and can pull reasoning toward tangents. If the task is to fix a payment bug, the marketing copy guidelines in your memory file aren't just wasted tokens — they're a small but real distraction the model has to filter past on every turn.

So the question for any candidate context isn't "could this ever be useful?" — almost anything could — but "is this useful for the class of task at hand, more often than not?" If the honest answer is no, leave it out and make it loadable on demand instead. This is the hardest discipline because it runs against the instinct to be thorough. Being thorough with context is precisely what degrades it.
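
One way to make the "more often than not" test less subjective is to count. The sketch below assumes you have rough tallies of how often each item was actually used across recent tasks of one class; the function name, the tallies, and the literal 50% threshold are illustrative assumptions, not a prescribed method.

```python
def split_by_usefulness(usage: dict[str, tuple[int, int]]) -> tuple[list[str], list[str]]:
    """Split items into standing context and on-demand material.

    usage maps item -> (tasks where it was used, tasks observed).
    """
    standing: list[str] = []
    on_demand: list[str] = []
    for item, (used, observed) in usage.items():
        # "More often than not" read literally: used in over half the tasks.
        if observed > 0 and used / observed > 0.5:
            standing.append(item)
        else:
            on_demand.append(item)
    return standing, on_demand


# Hypothetical tallies for a run of payment-bug tasks.
usage = {
    "test command": (9, 10),
    "payment module conventions": (7, 10),
    "marketing copy guidelines": (0, 10),
}
standing, on_demand = split_by_usefulness(usage)
```

With these numbers the marketing guidelines land in `on_demand`: still reachable, but no longer something the model has to filter past on every turn.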

Context rot and how to fight it

Long-running sessions accumulate cruft: stale tool outputs, abandoned approaches, files read for a step that's long finished. This is context rot — the gradual filling of the window with material that no longer serves the current goal, crowding out what does. Even within a large window, rot degrades focus well before you hit the token ceiling.

The defenses are structural. Use subagents to keep heavy, exploratory work out of the main transcript so the orchestrator never accumulates the noise of a big search. Summarize and checkpoint at natural boundaries, carrying forward conclusions rather than every intermediate step. And design tasks to be bounded where you can, so a session ends before rot sets in. Treat the main context as something to actively defend, not a place where everything piles up by default.
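
Checkpointing can be sketched as a function that collapses a transcript to its conclusions. The `Entry` type and its `kind` labels are hypothetical, as are the example entries. In practice the summary would usually be written by the model rather than by string joining; the sketch only shows what gets carried forward and what gets dropped.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str  # "conclusion", "tool_output", or "dead_end" (hypothetical labels)
    text: str


def checkpoint(transcript: list[Entry]) -> list[Entry]:
    """Collapse finished work into one summary entry that carries conclusions only."""
    conclusions = [e.text for e in transcript if e.kind == "conclusion"]
    summary = "Checkpoint: " + "; ".join(conclusions)
    return [Entry("conclusion", summary)]


# Hypothetical transcript from a finished investigation step.
transcript = [
    Entry("tool_output", "grep results: 412 lines"),
    Entry("dead_end", "tried patching the serializer, reverted"),
    Entry("conclusion", "bug is in currency rounding"),
    Entry("conclusion", "fix belongs in the billing totals module"),
]
compacted = checkpoint(transcript)
```

Four entries become one, and the stale grep output and the abandoned approach no longer occupy the window.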

Subagent isolation as a context tool

The most powerful context-engineering move is delegation, precisely because each subagent gets a fresh, clean window. When you hand a subagent a bounded brief, all of its reading, dead ends, and intermediate reasoning stay in its context, and only a distilled result comes back. The orchestrator's desk stays clear while real work happens elsewhere.

This is why subagents are a context strategy as much as a parallelism strategy. The cost discipline still applies: a multi-agent run typically uses several times more tokens than a single-agent run, so reserve delegation for work that's genuinely heavy or independent. But when a task threatens to flood the main context with material the orchestrator doesn't need to retain, isolating it in a subagent is often the cleanest fix available. You trade tokens for focus, and for the right tasks that trade is well worth making.
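
The isolation property is easy to see in a toy model. In the sketch below, `run_subagent` and `search_codebase` are hypothetical stand-ins: a real subagent would call a model and tools, whereas this one only simulates a pile of intermediate reads. The point it illustrates is that the private context is discarded and only the result returns.

```python
from typing import Callable


def run_subagent(brief: str, work: Callable[[list[str]], str]) -> str:
    """Run `work` against a fresh, private context and return only its result."""
    private_context: list[str] = [brief]  # starts clean: nothing inherited
    result = work(private_context)         # reading and dead ends accumulate here
    return result                          # private_context is dropped on return


def search_codebase(ctx: list[str]) -> str:
    # Stand-in for heavy exploratory work; a real subagent would call a model.
    ctx.extend(f"read file {i}" for i in range(200))
    return "Rounding happens in two places; only one uses Decimal."


orchestrator_context = ["Goal: fix invoice rounding bug"]
orchestrator_context.append(run_subagent("Find where totals are rounded", search_codebase))
```

The subagent touched 200 simulated files, yet the orchestrator's context grew by exactly one line. The token cost of those reads was still paid, which is why the trade only makes sense for work that would otherwise flood the main transcript.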

Frequently asked questions

If the context window is 1M tokens, why not just load everything?

Because attention and focus degrade well before the token ceiling. Irrelevant context dilutes the model's attention and invites tangents, and long sessions accumulate rot. A large window is room to maneuver, not a budget to fill; the goal is the least context that fully serves the task.

How do I know what counts as durable context?

Ask whether the fact is true across most tasks in the project. The stack, conventions, protected paths, and test commands usually are, so they belong in standing memory. Anything tied to a single ticket, file, or query is disposable and should be loaded per task and discarded after.

When should I summarize or hand off to a subagent?

Summarize at natural boundaries when the transcript is filling with finished work, carrying forward conclusions instead of every step. Hand off to a subagent when a sub-task is heavy or exploratory enough that its intermediate context would clog the orchestrator — its isolated window keeps your main context clean.

Bringing sharp context to your phone lines

CallSphere applies the same context discipline to voice and chat: agents that load just what each conversation needs, keep their focus across a call, and act without drowning in irrelevant detail. See it live at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
