---
title: "Context engineering for Claude: what to include and cut"
description: "Design context for Claude agents in 2026: what belongs in the window, what to leave out, just-in-time retrieval, and compaction for long, reliable runs."
canonical: https://callsphere.ai/blog/context-engineering-for-claude-what-to-include-and-cut
category: "Agentic AI"
tags: ["agentic ai", "claude", "context engineering", "prompt engineering", "rag", "agent design", "anthropic"]
author: "CallSphere Team"
published: 2026-05-26T09:32:44.000Z
updated: 2026-06-06T21:47:41.771Z
---

# Context engineering for Claude: what to include and cut

> Design context for Claude agents in 2026: what belongs in the window, what to leave out, just-in-time retrieval, and compaction for long, reliable runs.

Give engineers a million-token context window and the first instinct is to fill it. That instinct is wrong, and unlearning it is the single biggest upgrade most teams can make to their Claude agents. A model's attention is a finite resource regardless of how large the window is; the more irrelevant material you put in front of it, the harder it has to work to find what matters. Context engineering is the discipline of deciding what earns a place in the window and what is better left out, fetched on demand, or summarized away. It is prompt engineering grown up — less about clever wording, more about information architecture.

## Context is a budget, and attention is the cost

The mental shift that unlocks everything is to treat context as a budget you spend rather than a bucket you fill. Context engineering is the practice of curating exactly the right information for a model's working window so it can complete a task — and the operative word is curating. Every token you include competes for the model's attention with every other token. A focused window of two thousand relevant tokens routinely outperforms a sprawling one of fifty thousand mostly-relevant ones, because the model isn't burning effort separating signal from noise.

This is why "just add more context" so often makes agents worse. Pile in the entire knowledge base and the one fact that matters is now buried among a hundred that don't, and the model's job shifts from reasoning to searching. The large window is permission to be generous when generosity helps, not a mandate to dump everything you have. Discipline scales better than volume.

## What earns a place in the window

A few categories almost always belong in context. The task itself, stated clearly. The hard constraints the agent must respect. The specific data the current step needs — this customer's record, this file, this error. And a concise output contract. These share a property: they are relevant to the decision the model is about to make right now. Relevance to the immediate next action is the test, and most things that pass it are surprisingly compact.

```mermaid
flowchart TD
  A["Candidate information"] --> B{"Needed for the next decision?"}
  B -->|No| C["Leave out — fetch via tool if ever needed"]
  B -->|Yes| D{"Stable across the task?"}
  D -->|Yes| E["Put in system prompt"]
  D -->|No| F{"Large or noisy?"}
  F -->|Yes| G["Summarize, then include"]
  F -->|No| H["Include directly in context"]
```

The diagram is the whole method in one flow. Information that isn't needed for the next decision stays out and becomes a tool call if it ever matters. Stable, always-relevant material goes in the system prompt where it is read every turn. Large or noisy material gets summarized before inclusion so its conclusions land without its bulk. Everything else goes in directly. Running candidate information through this filter, rather than defaulting to include, is the core habit of context engineering.

## What to leave out, and why it helps

The harder discipline is exclusion. Leave out data the current step doesn't need, even when it might be relevant later — that is what tools are for, and pulling it on demand keeps the window clean until the moment it counts. Leave out verbose tool outputs once you've extracted the conclusion; the model needs "tests passed," not two hundred lines of test runner output. Leave out redundant restatements of rules the model has already internalized this session.

Leaving things out is not deprivation; it is focus. An agent that pulls exactly the file it needs when it needs it outperforms one carrying twenty files it might need, because the relevant file isn't competing with nineteen distractions. The win compounds over long sessions: a lean context keeps the model sharp on turn forty, where a bloated one has degraded into a fog the model has to wade through on every call.

## Just-in-time retrieval beats preloading

The pattern that operationalizes all of this is just-in-time retrieval: instead of front-loading data, give the agent tools to fetch it and trust it to fetch the right things at the right time. A code agent doesn't need the repository in context; it needs a read tool and the judgment to read the files the task points to. A support agent doesn't need every customer's history; it needs a lookup tool. The agent assembles its own working set dynamically, which is almost always tighter and more relevant than anything you could assemble in advance.

This also future-proofs the agent. Preloaded context goes stale the moment underlying data changes, but a tool fetches fresh data every time it's called. Just-in-time retrieval means the agent always reasons over current information, and it means adding a new data source is as simple as adding a tool rather than re-architecting what you cram into every prompt. Context becomes something the agent earns through action rather than something you guess at up front.

## Compaction: managing context over a long run

Long-running agents need a way to forget gracefully. Compaction is the technique: periodically summarize the conversation so far into a compact note and drop the verbose intermediate exchanges that produced it. The agent keeps the decisions and conclusions — "chose approach B, migration applied, two tests still failing" — while shedding the raw transcript that no longer earns its tokens. Done well, an agent can run for dozens of turns with its effective context staying lean and its reasoning staying crisp.

Compaction is a judgment call about what to preserve. Keep outcomes, open questions, and active constraints; discard the play-by-play of how you arrived at settled facts. A good compaction reads like solid meeting minutes — enough that someone joining now understands the state, without the full recording. Get this right and the length of a session stops being the enemy of the agent's reliability.

## Frequently asked questions

### If the context window is huge, why not just include everything?

Because attention is finite even when the window is large. Irrelevant tokens compete with relevant ones and force the model to search instead of reason. A focused, curated context consistently outperforms a bloated one on the same task.

### What is the simplest test for whether something belongs in context?

Ask whether it's needed for the next decision the model will make. If yes, include it; if it's stable, put it in the system prompt. If it's only maybe-relevant later, leave it out and expose a tool to fetch it when the moment comes.

### How is context engineering different from prompt engineering?

Prompt engineering optimizes the wording of a single instruction. Context engineering optimizes the entire information set the model sees across a whole task — what to include, fetch, summarize, or compact — and at agent scale it matters far more than phrasing.

### How do I keep context clean during long agent runs?

Use just-in-time retrieval so data enters context only when needed, and compact periodically — summarize settled work into a short note and drop the verbose intermediate results. Together they keep the effective window lean across dozens of turns.

## Bringing agentic AI to your phone lines

CallSphere applies rigorous context engineering to live conversations — its voice and chat agents pull just the right account detail at just the right moment and compact long calls so they stay sharp from hello to booking. Hear the difference at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

---

Source: https://callsphere.ai/blog/context-engineering-for-claude-what-to-include-and-cut
