---
title: "Claude Code Context Design: What to Include and Cut"
description: "Prompt and context design in Claude Code: what to put in the 1M-token window, what to leave out, and why a lean window beats a stuffed one."
canonical: https://callsphere.ai/blog/claude-code-context-design-what-to-include-and-cut
category: "Agentic AI"
tags: ["agentic ai", "claude", "claude code", "context engineering", "prompt engineering", "context window", "anthropic"]
author: "CallSphere Team"
published: 2026-04-15T09:32:44.000Z
updated: 2026-06-06T21:47:43.478Z
---

# Claude Code Context Design: What to Include and Cut

> Prompt and context design in Claude Code: what to put in the 1M-token window, what to leave out, and why a lean window beats a stuffed one.

A 1M-token context window does not make context design easier; it makes the wrong instincts more expensive. The biggest mistake engineers make with Claude Code is treating the window as a place to dump everything they might conceivably need. More context is not more intelligence — past a point, it is noise that dilutes attention, inflates cost, and slows every turn. Good context design is the discipline of deciding what earns a place in the window and what stays out. This post is about making those calls well.

## Context is a curation problem, not a capacity problem

Context engineering is the practice of deciding what information to place in a model's window so it can reason effectively. The framing that matters: you are curating, not hoarding. Every token competes for the model's attention, and the most relevant facts compete against the least relevant ones on equal footing inside the prompt. Bury the one schema the task depends on under ten files it does not, and you have made the model's job harder, not easier.

The capacity of a 1M-token window changes the math but not the principle. It means you *can* hold an entire service in context — but "can" is not "should." The right question is never "will it fit?" It is "does this token improve the next decision?" Most of the repo, most of the time, does not. Curate for the task in front of you and let the rest stay on disk where the agent can fetch it if a step needs it.
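
As a rough sketch of "can is not should", the selection below drops low-relevance material even when it would fit. The `Item` type, the `relevance` scores, and the `min_relevance` threshold are all hypothetical; how you score relevance is up to you and is not something Claude Code exposes in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A candidate file or snippet (hypothetical type for illustration)."""
    path: str
    tokens: int
    relevance: float  # 0.0 to 1.0, assumed to come from your own scoring


def curate(items: list[Item], min_relevance: float = 0.5,
           budget_tokens: int = 1_000_000) -> list[Item]:
    """Pick the most relevant items first; relevance gates before capacity does."""
    chosen: list[Item] = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if item.relevance < min_relevance:
            break  # everything after this point is even less relevant
        if used + item.tokens > budget_tokens:
            continue  # too large for the remaining budget; try smaller items
        chosen.append(item)
        used += item.tokens
    return chosen


if __name__ == "__main__":
    candidates = [
        Item("schema.sql", 2_000, 0.95),
        Item("README.md", 5_000, 0.20),
        Item("orders.py", 8_000, 0.80),
    ]
    for item in curate(candidates):
        print(item.path)
```

Here `README.md` would fit many times over, yet it stays on disk because it does not clear the relevance bar.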

## What to put in

Include the things the agent cannot infer and the task genuinely depends on. Durable project facts come first: architecture, conventions, build and test commands, and the landmines specific to your codebase — these go in a stable memory file that survives compaction. Then the precise slice of code the task touches plus its immediate dependencies, loaded deliberately. Then a clear statement of the goal and the definition of done, so the agentic loop has a target to check against.

```mermaid
flowchart TD
  A["Candidate information"] --> B{"Can the agent infer it from code?"}
  B -->|Yes| C["Leave out: fetch on demand"]
  B -->|No| D{"Does the task depend on it?"}
  D -->|No| C
  D -->|Yes| E{"Durable or volatile?"}
  E -->|Durable| F["Stable memory prefix"]
  E -->|Volatile| G["Volatile suffix, prune later"]
  F --> H["Tight, high-signal window"]
  G --> H
```
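
The flowchart above can be read as a small decision function. This is a minimal sketch: the `Candidate` fields and the `Placement` names are invented here to mirror the diagram's questions and outcomes, and in practice each answer is a judgment call rather than a boolean you can look up.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    LEAVE_OUT = "leave out: fetch on demand"
    STABLE_PREFIX = "stable memory prefix"
    VOLATILE_SUFFIX = "volatile suffix, prune later"


@dataclass(frozen=True)
class Candidate:
    """One piece of information being considered (hypothetical type)."""
    name: str
    inferable_from_code: bool
    task_depends_on_it: bool
    durable: bool


def triage(c: Candidate) -> Placement:
    """Ask the flowchart's three questions in order."""
    if c.inferable_from_code:
        return Placement.LEAVE_OUT
    if not c.task_depends_on_it:
        return Placement.LEAVE_OUT
    return Placement.STABLE_PREFIX if c.durable else Placement.VOLATILE_SUFFIX


if __name__ == "__main__":
    examples = [
        Candidate("helper function signatures", True, True, True),
        Candidate("build and test commands", False, True, True),
        Candidate("failing test output", False, True, False),
    ]
    for c in examples:
        print(f"{c.name}: {triage(c).value}")
```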

Fresh, high-signal tool results also belong in the window, but only in distilled form: the top query matches, the failing test output, the one function that is misbehaving. Each of these earns its place because the next decision rests on it. The test for inclusion is simple and strict: would the agent make a worse decision without this token? If not, it stays out.
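
One way to picture "the distilled version" of a tool result is a filter that keeps only the failure lines from a test run. This is an illustrative sketch, not how Claude Code processes output internally: the `distill_test_output` helper is hypothetical, and the marker strings assume pytest-style output.

```python
def distill_test_output(raw: str, max_lines: int = 20) -> str:
    """Keep only lines that mention a failure, capped at max_lines."""
    # Marker strings are an assumption based on pytest-style output.
    markers = ("FAILED", "ERROR", "AssertionError")
    lines = raw.splitlines()
    kept = [line for line in lines if any(m in line for m in markers)][:max_lines]
    omitted = len(lines) - len(kept)
    if omitted > 0:
        # Leave a breadcrumb so the agent knows more detail is one rerun away.
        kept.append(f"[{omitted} other lines omitted; rerun the tests to see them]")
    return "\n".join(kept)


if __name__ == "__main__":
    raw = "\n".join([
        "tests/test_a.py::test_ok PASSED",
        "tests/test_b.py::test_total FAILED",
        "tests/test_c.py::test_ok PASSED",
        "E   AssertionError: expected 3, got 4",
    ])
    print(distill_test_output(raw))
```

The trailing breadcrumb matters: the full output stays reachable through the tool, so nothing is lost by leaving it out of the window.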

## What to leave out

Leave out anything the agent can rederive on demand. The full repository, when the task touches three files. Verbose logs the agent already extracted the answer from. Entire API responses when one field mattered. Old conversation turns whose conclusions are already captured in a summary. All of these are reachable through tools when needed, so keeping them resident is pure cost with no benefit.

Also leave out redundancy and restatement. If your memory file says tests run a certain way, do not repeat it in every prompt. If a file is already in context, do not paste it again. Redundant tokens are not free reinforcement: they crowd out signal and create a false sense that the agent is well-informed when it is actually wading through duplicate noise. Trim aggressively; a lean window almost always outperforms a stuffed one.
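
If you assemble prompts programmatically, a simple guard against pasting the same material twice might look like the sketch below. The `dedupe_blocks` helper is hypothetical, and it only catches copies that are identical after whitespace normalization; paraphrased restatements still need a human eye.

```python
import hashlib


def dedupe_blocks(blocks: list[str]) -> list[str]:
    """Drop repeated blocks, keeping the first occurrence of each."""
    seen: set[str] = set()
    unique: list[str] = []
    for block in blocks:
        # Collapse whitespace so trivially different copies hash alike.
        normalized = " ".join(block.split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(block)
    return unique


if __name__ == "__main__":
    blocks = [
        "Run tests with: make test",
        "def total(xs):\n    return sum(xs)",
        "Run tests with:   make test",  # same instruction, pasted again
    ]
    print(len(dedupe_blocks(blocks)))
```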

## Why less context often beats more

The reason is attention. A model attends across everything in the window, and relevant tokens have to win that attention against everything else present. A focused prompt concentrates attention on what matters; a bloated one spreads it thin. This is why a tightly curated 100,000-token prompt can outperform a sprawling 800,000-token one on the same task: the smaller prompt is almost all signal.

There are second-order wins too. Smaller prompts cost less and respond faster, which compounds over a long session with hundreds of turns. They are easier to cache, since a lean stable prefix changes less. And they are easier to debug: when something goes wrong, you can actually see what the agent was working from. Discipline in what you exclude pays dividends on cost, latency, quality, and your own sanity.
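
To make the compounding concrete, here is a back-of-the-envelope calculation. Every number in it is an assumption for illustration: the price of $1.00 per million input tokens is a placeholder rather than a real rate, the session length of 200 turns is arbitrary, and the model ignores caching discounts and output tokens by pretending the full prompt is re-sent at a constant size each turn.

```python
def session_input_cost(prompt_tokens: int, turns: int,
                       usd_per_million_tokens: float) -> float:
    """Input cost if the same-sized prompt is re-sent on every turn."""
    return prompt_tokens * turns / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    PRICE = 1.00  # hypothetical USD per million input tokens, not a real rate
    TURNS = 200   # assumed session length

    lean = session_input_cost(100_000, TURNS, PRICE)
    bloated = session_input_cost(800_000, TURNS, PRICE)
    print(f"lean:    ${lean:.2f}")
    print(f"bloated: ${bloated:.2f}")
    print(f"ratio:   {bloated / lean:.0f}x")
```

Under these assumptions the lean session costs $20.00 in input tokens and the bloated one $160.00. The absolute figures are invented, but the 8x ratio follows directly from the prompt sizes and holds at any per-token price.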

## Designing context for the long haul

Long sessions need a context strategy, not just a good first prompt. Plan for compaction from the start: write decisions and progress to an external notes file so the agent can recover detail a summary might lose. Keep the stable prefix genuinely stable so caching holds. Prune tool output at natural boundaries. And delegate separable exploration to subagents so their reading never lands in the main window. Context design, done for the long haul, is what keeps hour three of a session as sharp as minute one.
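
The external notes file can be as simple as an append-only log. The sketch below assumes a JSON Lines file and two hypothetical helpers, `record_decision` and `load_decisions`; the file name and entry fields are illustrative, and a plain Markdown notes file would serve the same purpose.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_decision(notes_path: Path, decision: str, reason: str) -> None:
    """Append one decision to the notes file so it survives compaction."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        "reason": reason,
    }
    with notes_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_decisions(notes_path: Path) -> list[dict]:
    """Read every recorded decision back, oldest first."""
    if not notes_path.exists():
        return []
    with notes_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # A temporary directory keeps this demo from touching a real project.
    notes = Path(tempfile.mkdtemp()) / "session-notes.jsonl"
    record_decision(notes, "use cursor pagination", "offset scans were slow")
    for entry in load_decisions(notes):
        print(entry["decision"], "because", entry["reason"])
```

Appending rather than rewriting keeps each write cheap and preserves the order in which decisions were made, which is often the detail a summary loses.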

## Frequently asked questions

### What is context engineering?

Context engineering is the practice of deciding what information to place in a model's context window so it can reason effectively. It treats the window as a curated, finite resource — choosing high-signal, task-relevant material and excluding what the model can infer or fetch on demand.

### If the window holds 1M tokens, why not fill it?

Because every token competes for the model's attention. Filling the window with material the task does not need dilutes attention, raises cost, and slows responses. "Will it fit?" is the wrong question; "does this token improve the next decision?" is the right one.

### What is the test for whether something belongs in context?

Ask whether the agent would make a worse decision without it. If the answer is no — because it is inferable, rederivable, redundant, or already summarized — leave it out and let the agent fetch it on demand if a step ever needs it.

### How does context design hold up over a long session?

By planning for compaction: externalize decisions to a notes file, keep the stable prefix stable so caching holds, prune tool output at natural boundaries, and delegate separable exploration to subagents. That keeps the window lean and the agent sharp deep into a session.

## Bringing agentic AI to your phone lines

CallSphere applies this same context discipline to **voice and chat** — agents that keep a tight, high-signal context through a whole conversation, fetch data only when a step needs it, and book work 24/7. See it live at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*
