---
title: "Context design for Claude Code: what to include, cut"
description: "Context engineering for Claude Code — what to put in the window, what to fetch on demand, how to fight staleness, and why structure beats volume."
canonical: https://callsphere.ai/blog/context-design-for-claude-code-what-to-include-cut
category: "Agentic AI"
tags: ["agentic ai", "claude", "claude code", "context engineering", "prompt engineering", "agent skills", "ai engineering"]
author: "CallSphere Team"
published: 2026-05-20T09:32:44.000Z
updated: 2026-06-06T21:47:42.218Z
---

# Context design for Claude Code: what to include, cut

> Context engineering for Claude Code — what to put in the window, what to fetch on demand, how to fight staleness, and why structure beats volume.

Every Claude Code session is a negotiation over a finite window. You can fit a lot — context windows now reach into the millions of tokens — but more is not the same as better. The agents that stay sharp are the ones whose context is curated, not crammed. Context design is the discipline of deciding, for each turn, exactly what the model should see and what it should not, and shaping it so the model spends its reasoning on the task rather than on sorting through clutter. It is the highest-leverage skill in agent building, and it is mostly about subtraction.

Context engineering is the practice of deliberately selecting, structuring, and pruning the information placed in a model's context window so that what's present is relevant, well-bounded, and ordered to support the current task. Hold that definition in mind; everything below is an application of it.

## Why more context quietly hurts

It is tempting to dump the whole repository, the full chat history, and every tool result into the window and let the model sort it out. In practice this degrades performance in two ways. First, relevant signal gets diluted by irrelevant text, and the model has to work harder to find what matters. Second, contradictory or stale material — an old version of a file, a superseded plan — actively misleads the model, which has no reliable way to know which copy is current.

The mental model that helps is attention budget. Every token in the window competes for the model's focus. A 2,000-token window of exactly the right material outperforms a 200,000-token window where the right material is buried. So the goal is not to maximize what you include; it is to maximize relevance density — the fraction of the window that bears on the current step.

## What belongs in the window

Four things almost always earn their place. The standing instructions (the agent's role and hard rules) belong on every turn because they govern all behavior. The current task or sub-task belongs, stated plainly. The working set — the specific files, records, or results the model needs for this exact step — belongs, freshly read so it is current. And a compact summary of progress so far belongs, so the agent remembers what it already did without re-reading every turn verbatim.

Notice what these have in common: each is directly load-bearing for the next decision. If you cannot say why a block helps the model's immediate next step, it probably should not be there. The map below shows the decision you make for every candidate piece of context.

```mermaid
flowchart TD
  A["Candidate context item"] --> B{"Needed for the next step?"}
  B -->|No| C["Leave it out"]
  B -->|Yes| D{"Stable or volatile?"}
  D -->|Stable| E["System prompt / memory"]
  D -->|Volatile| F{"Big & reusable?"}
  F -->|Yes| G["Fetch on demand via tool"]
  F -->|No| H["Inject as labeled block"]
  E --> I["Assembled, ordered context"]
  G --> I
  H --> I
```

## What to leave out — and fetch instead

The counterintuitive move is to keep large, reusable knowledge out of the window and let the agent pull it on demand. Documentation, schema references, long logs, big files — these do not need to sit in context speculatively. Give the model a tool to fetch them when a step actually requires it. This is retrieval as a deliberate act: the model decides it needs the auth module, reads it, uses it, and the window stays lean.

This is also the right home for Agent Skills. A skill is a folder of instructions and resources Claude loads only when it becomes relevant, rather than something living in the prompt full-time. The pattern generalizes: prefer just-in-time retrieval over just-in-case inclusion. The window should hold what the current step needs, and a clear map of how to fetch the rest.

## Structure beats volume

Two contexts with identical information but different structure produce different behavior. Wrap each piece in a named, bounded block. Order them predictably — instructions, task, knowledge, working set, recent results, next-step question. Keep that order stable across turns so the model learns where to look. A consistent, well-delimited layout lets the model orient in a few tokens instead of many, the same way clean markup lets a browser build a tree without guessing.

Stale data is the structural enemy. When you re-read a file the agent already edited, replace the old block rather than appending a second copy, so the window never holds two versions of the same thing. When a plan is superseded, drop it. Curating against staleness is as important as curating for relevance; a window with one current truth per fact is one the model can trust.

## Compaction as ongoing hygiene

No matter how disciplined you are, a long session accumulates history. The maintenance pattern is compaction: periodically fold completed work into a dense summary and drop the verbatim turns. Done early and routinely, this keeps relevance density high for the entire session. Done late and reluctantly, it leaves the model reasoning over a window that is mostly exhaust.

Think of context design as continuous gardening rather than a one-time setup. Before each turn you decide what to include; during the session you prune staleness and compact history; throughout, you favor fetching over hoarding. Get this rhythm right and a Claude Code agent stays as sharp on its fiftieth turn as its first.

## Frequently asked questions

### What is context engineering?

Context engineering is the practice of deliberately selecting, structuring, and pruning the information placed in a model's context window so that what's present is relevant, clearly bounded, and ordered to support the current task — maximizing relevance density rather than total volume.

### If the window is huge, why not include everything?

Because every token competes for the model's attention. Padding the window with irrelevant or stale text dilutes the signal and can actively mislead the model, so a small, curated context routinely outperforms a large, crammed one even when the large one technically contains the answer.

### What should I fetch on demand instead of pre-loading?

Large, reusable knowledge — documentation, schema references, long logs, big files, and skills. Give the agent tools to retrieve these when a step needs them, keeping the window lean and current rather than padded with material that may never be used this turn.

### How do I keep stale data out of the context?

Replace rather than append: when you re-read a file the agent changed, swap the old block for the new one so only one version is present, and drop superseded plans. Compact completed history into summaries regularly so the window holds one current truth per fact.

## Bringing agentic AI to your phone lines

Tight context design is what keeps CallSphere's **voice and chat** agents fast and accurate on live calls — pulling account details just in time, holding only what each moment needs, and booking work 24/7 without losing the thread. See it live at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

---

Source: https://callsphere.ai/blog/context-design-for-claude-code-what-to-include-cut