Context Design for Claude Code in Large Codebases
What to put in Claude Code's context and what to leave out in large codebases: memory tiers, retrieve over paste, attention budgeting, and the dilution trap.
Every problem you have with an agent in a large codebase is, at bottom, a context problem. The agent edited the wrong file because the right one was never in context. It contradicted your conventions because they were not loaded. It got slow and expensive because someone pasted four thousand lines it did not need. Once you accept that context is the lever, the craft of working with Claude Code becomes the craft of deciding what the model sees on any given turn, and just as importantly, what it does not.
This article is about that craft. Not prompt wordsmithing, but context architecture: what belongs in always-on memory, what should be retrieved on demand, what should never be pasted at all, and why a fuller context window can produce worse answers than a leaner one. In a big repo this is the highest-skill, highest-payoff part of the job.
Context is a budget, not a bucket
The instinct when an agent gets something wrong is to give it more: more files, more explanation, more history. That instinct is often wrong. The context window is a finite attention budget, and every token you spend competes with every other token for the model's focus. A window crammed with marginally relevant code does not make the model smarter; it dilutes the signal that actually matters, and the edits get sloppier, not sharper.
Treat context the way you would treat a tight memory budget on an embedded device. The question is never "can I fit this in?" but "does this earn its place?" The right slice of three files beats the full contents of ten, because the model spends its attention on signal instead of sifting noise. The 1M-token window in Claude Code is generous, but generous is not the same as free, and frugality is a feature, not a limitation.
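One way to make "does this earn its place?" concrete is a greedy budget filter. The sketch below is an illustration, not how Claude Code selects context. The `Snippet` type, `fit_budget`, and every token count and relevance score are hypothetical. Ranking by relevance per token is a simple knapsack-style heuristic, not an optimal selection.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    name: str
    tokens: int       # estimated cost of including this snippet in context
    relevance: float  # hypothetical 0..1 score, e.g. from a search ranker


def fit_budget(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily keep the snippets with the most relevance per token until the
    budget is spent. Zero-relevance material never earns a slot."""
    ranked = sorted(
        snippets,
        key=lambda s: s.relevance / max(s.tokens, 1),
        reverse=True,
    )
    chosen: list[Snippet] = []
    spent = 0
    for s in ranked:
        if s.relevance <= 0:
            continue  # does not earn its place, however cheap it is
        if spent + s.tokens <= budget:
            chosen.append(s)
            spent += s.tokens
    return chosen


candidates = [
    Snippet("session.go: validation function", 400, 0.9),
    Snippet("session_test.go: failing test", 300, 0.8),
    Snippet("entire payments package", 40_000, 0.3),
    Snippet("CHANGELOG.md", 5_000, 0.0),
]
print([s.name for s in fit_budget(candidates, budget=2_000)])
# ['session_test.go: failing test', 'session.go: validation function']
```

The whole-package entry is not rejected for being irrelevant. It is rejected because its small amount of signal is spread across far more tokens than the budget allows.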
The three tiers of context
It helps to think in tiers. Tier one is always-on memory: the stable, repo-wide facts that belong in CLAUDE.md and load every session, such as build commands, directory layout, conventions, and do-not-touch zones. Tier two is retrieved on demand: the files and symbols the agent pulls in for a specific task via search and reads. Tier three is task-local pointers you supply in the prompt: the repro script path, the failing log line, the one paragraph of background the agent could not derive.
```mermaid
flowchart TD
A["Task arrives"] --> B["Tier 1: always-on memory loads"]
B --> C{"Enough to act?"}
C -->|No| D["Tier 2: retrieve files via search"]
D --> E["Read scoped line ranges"]
E --> F{"Signal sufficient?"}
F -->|No, add pointer| G["Tier 3: task-local pointer"]
G --> C
F -->|Yes| H["Act with lean context"]
C -->|Yes| H
```

The discipline is to push each fact to the lowest tier that works. Stable facts go in tier one so you never repeat them. Anything the agent can find itself stays in tier two, retrieved fresh, never pasted. Only the things the agent genuinely cannot derive, and that this specific task needs, go in tier three. Most context bloat comes from putting tier-two material (whole files) into tier-three prompts by pasting it.
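The placement rule can be written as a small decision function. This is a minimal sketch that assumes each fact can be tagged with three yes/no properties. `Tier`, `place`, and the sample facts are hypothetical names chosen for illustration.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    MEMORY = 1     # always-on memory, e.g. CLAUDE.md
    RETRIEVED = 2  # the agent finds and reads it itself
    POINTER = 3    # task-local pointer supplied in the prompt


def place(stable_repo_wide: bool, agent_can_find: bool, task_needs: bool) -> Optional[Tier]:
    """Return the lowest tier that works for a fact, or None to leave it out."""
    if stable_repo_wide:
        return Tier.MEMORY     # state it once, never repeat it in prompts
    if agent_can_find:
        return Tier.RETRIEVED  # point at it; do not paste it
    if task_needs:
        return Tier.POINTER    # only what the agent cannot derive
    return None                # not needed for this task: leave it out


facts = {
    "test command": (True, False, True),
    "session validation code": (False, True, True),
    "path to local repro script": (False, False, True),
    "history of an unrelated refactor": (False, False, False),
}
for fact, flags in facts.items():
    tier = place(*flags)
    print(f"{fact}: {tier.name if tier is not None else 'leave out'}")
```

The order of the checks encodes the rule. A fact only falls through to tier three after the cheaper tiers have been ruled out.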
Retrieve, do not paste
The strongest single habit in large-repo context design is to point rather than paste. The agent has read tools; it can pull services/auth/session.go itself, scoped to the relevant function, far more cheaply than you can paste the whole file. When you paste, you pay the full token cost whether or not the model needed all of it, and you anchor its attention on a fixed blob instead of letting it navigate to what matters.
Pointing also keeps context current. A pasted file is a snapshot that goes stale the moment the agent edits something; a pointer resolves to the live file on every read. So instead of "here is the auth code: [4000 lines]," say "the auth flow lives in services/auth; the session validation is in session.go." The agent reads exactly what it needs, when it needs it, and your context stays lean and accurate.
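The difference between a snapshot and a pointer is easy to demonstrate. The sketch assumes a pointer is just a file path plus a line range. The `Pointer` class and the tiny Go file are hypothetical. It shows a pasted copy going stale after an edit while the pointer still resolves to the live text.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Pointer:
    """A hypothetical reference to a line range; it re-reads the file every time."""
    path: Path
    start: int  # 1-indexed, inclusive
    end: int    # 1-indexed, inclusive

    def resolve(self) -> str:
        lines = self.path.read_text().splitlines()
        return "\n".join(lines[self.start - 1 : self.end])


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "session.go"
    src.write_text("package auth\n\nfunc valid() bool {\n\treturn false\n}\n")

    snapshot = src.read_text()          # what pasting captures: a frozen copy
    ptr = Pointer(src, start=3, end=5)  # what pointing captures: a location

    # The agent edits the file mid-session.
    src.write_text(src.read_text().replace("return false", "return true"))

    pointer_is_current = "return true" in ptr.resolve()
    snapshot_is_current = "return true" in snapshot

print(pointer_is_current, snapshot_is_current)  # True False
```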
What to deliberately leave out
Knowing what to exclude is as important as knowing what to include. Leave out whole-file pastes the agent can retrieve. Leave out giant command logs: surface the relevant fifty lines, not the ten-thousand-line build output. Leave out stale history once a subtask is done; a long-running session accumulates resolved detours that no longer earn their tokens. And always leave out secrets, because anything in context can surface in a summary or a subagent hand-off.
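For logs, surfacing the relevant fifty lines can be as simple as slicing a window around the first failure. This is a minimal sketch that assumes the failure line contains a recognizable marker such as `error`. The `excerpt` helper, its defaults, and the synthetic log are hypothetical.

```python
def excerpt(log: str, marker: str = "error", before: int = 25, cap: int = 50) -> str:
    """Return at most `cap` lines, starting `before` lines ahead of the first
    line containing `marker` (case-insensitive). With no match, return the tail."""
    lines = log.splitlines()
    hit = next(
        (i for i, line in enumerate(lines) if marker.lower() in line.lower()),
        None,
    )
    if hit is None:
        return "\n".join(lines[-cap:])
    start = max(0, hit - before)
    return "\n".join(lines[start : start + cap])


# A hypothetical 10,000-line build log with one failure buried in the middle.
build_log = [f"[{i:05d}] compiling module {i}" for i in range(10_000)]
build_log[7_000] = "[07000] ERROR: nil pointer dereference in handler.go"
snippet = excerpt("\n".join(build_log))
print(len(snippet.splitlines()), "ERROR" in snippet)  # 50 True
```

Falling back to the tail is a judgment call. Many build tools print the summary of a failure at the end, so the last lines are a reasonable default when no marker matches.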
Subagents are partly a context-exclusion mechanism, which is worth understanding. When the orchestrator delegates a search-heavy subtask, the subagent burns dozens of file reads in its own isolated window and returns only a compact summary. The orchestrator never sees the noise, just the conclusion. That is exactly the leave-it-out principle operating at the architecture level: keep the exploration mess out of the main context and pass forward only distilled signal.
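The isolation pattern can be sketched without any agent framework. The exploration runs inside one function scope, and only a short string comes back to the caller. This is an analogy, not Claude Code's subagent API. `explore`, its summary format, and the sample files are hypothetical.

```python
import tempfile
from pathlib import Path


def explore(root: str, needle: str, max_hits: int = 5) -> str:
    """Hypothetical subagent: reads every file under `root` in its own scope and
    hands back only a one-line summary. Raw file contents never leave here."""
    hits: list[str] = []
    files_read = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        files_read += 1
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append(f"{path.relative_to(root).as_posix()}:{lineno}")
    shown = ", ".join(hits[:max_hits])
    return f"read {files_read} files; {len(hits)} match(es) for {needle!r}: {shown}"


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "session.go").write_text("package auth\nfunc ValidateSession() {}\n")
    Path(repo, "token.go").write_text("package auth\nfunc Refresh() {}\n")
    summary = explore(repo, "ValidateSession")

print(summary)  # read 2 files; 1 match(es) for 'ValidateSession': session.go:2
```

The orchestrator in this analogy is whatever code calls `explore`. It holds one summary line, however many files were read to produce it.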
The dilution trap, with a concrete example
Consider an agent asked to fix a null-pointer bug in a payment handler. Give it the failing test, the handler file, and the memory file (three focused inputs), and it finds the unchecked optional and patches it cleanly. Now give it the same task plus the entire payments package (twelve files of tangentially related code), and it more often fixes the wrong null check or proposes a sprawling refactor, because the relevant signal is now one part in twenty. Same model, same bug, worse result, purely from context dilution.
This is the trap that catches teams who equate "more context" with "more help." The fix is to start lean and let the agent ask, via its own searches, for what it is missing, rather than front-loading everything you think might be relevant. If it needs another file, it will read one; that is cheaper and sharper than pre-loading ten. Lean-and-retrieve beats dump-and-hope in almost every large-repo task.
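Lean-and-retrieve is essentially a loop that starts from memory and grows only on request. The sketch below stubs the agent with a fixed list of requests. `lean_context`, the `max_reads` cap, and the file names are hypothetical. A real agent would decide what to read based on its own searches.

```python
from typing import Callable, Optional


def lean_context(
    memory: list[str],
    read_file: Callable[[str], str],
    next_request: Callable[[list[str]], Optional[str]],
    max_reads: int = 10,
) -> list[str]:
    """Start from always-on memory only and add a file only when the agent
    asks for one; `next_request` returning None means it has enough to act."""
    context = list(memory)
    for _ in range(max_reads):  # hard cap so the loop always terminates
        wanted = next_request(context)
        if wanted is None:
            break
        context.append(read_file(wanted))
    return context


# Hypothetical repo: the bug touches two files; twelve more are tangential.
repo = {"handler.go": "<handler source>", "handler_test.go": "<failing test>"}
repo.update({f"payments/file{i}.go": "<tangential code>" for i in range(12)})

# Stand-in for the agent's own searches: it asks for two files, then stops.
requests = iter(["handler_test.go", "handler.go"])
ctx = lean_context(["<CLAUDE.md>"], repo.__getitem__, lambda _ctx: next(requests, None))
print(f"{len(ctx)} of {len(repo) + 1} available items loaded")  # 3 of 15 available items loaded
```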
Frequently asked questions
Will giving Claude Code more context always improve results?
No. The context window is a finite attention budget, and irrelevant tokens dilute the signal that matters. A focused slice of the right files usually outperforms a large dump, and over-stuffing often produces sloppier edits along with higher cost and latency.
Should I paste files into the prompt or let the agent read them?
Let the agent read them. It can retrieve scoped line ranges itself more cheaply than you can paste whole files, and a pointer always resolves to the live file rather than a stale snapshot. Reserve pasting for small things the agent cannot find.
What belongs in the always-on memory file?
Stable, repo-wide facts: build and test commands, directory layout, conventions, and do-not-touch zones. These load every session so you never repeat them, while task-specific details stay in the prompt or get retrieved on demand.
How do subagents help with context design?
They isolate exploration. A subagent does the noisy, read-heavy work in its own window and returns only a compact summary, so the orchestrator's context stays lean and focused. It is the leave-it-out principle applied at the architecture level.
Lean context, live conversations
The same context discipline (load the stable facts, retrieve the rest, leave the noise out) is how CallSphere keeps its voice and chat agents fast and accurate while they answer every call, use tools mid-conversation, and book work 24/7. See lean-context agents live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.