Claude Code Context Patterns: Prompts, Tools, State
Reusable code-level patterns for prompts, tools, and context in Claude Code that keep long sessions and the 1M-token window fast, cheap, and coherent.
After enough hours in Claude Code, you stop thinking turn-by-turn and start thinking in patterns. The agents that stay coherent across a long session and a 1M-token window are not lucky — they are built on a handful of reusable structures for prompts, tools, and state. This post is a pattern catalog: each one is a concrete, code-level habit you can apply today, with the failure mode it prevents. Steal them, adapt them, and your long sessions will cost less and drift less.
Pattern: the stable prefix, the volatile suffix
Structure every prompt so the unchanging parts come first and the changing parts come last. System instructions, tool schemas, and durable project context belong at the front; the live conversation and fresh tool results belong at the end. The reason is mechanical: a stable prefix can be cached and reused across turns, cutting cost and latency, while a prefix that mutates every turn busts the cache and pays full price each time.
The anti-pattern is sprinkling volatile data — timestamps, request IDs, per-turn counters — into the system prompt. It feels harmless but it silently destroys cache reuse. Keep the front of your prompt boring and identical turn over turn, and push everything that changes to the tail. This single discipline is often the largest cost lever in a long session.
Pattern: just-in-time context, not just-in-case
Resist loading everything up front. The 1M-token window tempts you to dump the whole repo "in case" the agent needs it, but most of those tokens never earn their keep — they cost money, slow responses, and dilute the model's attention. Instead, give the agent the tools to fetch context when a step actually needs it: read this file now, grep for that symbol when the question arises, query the database on demand.
flowchart TD
A["Stable prefix: system + schemas + memory"] --> B["Volatile suffix: latest turn"]
B --> C{"Step needs more data?"}
C -->|Yes| D["Fetch just-in-time via tool"] --> E["Append & summarize result"]
C -->|No| F["Reason over current context"]
E --> G{"Context near budget?"}
F --> G
G -->|Yes| H["Compact: keep decisions, drop raw bytes"]
G -->|No| B
Just-in-time context keeps the window lean and the agent sharp. It also produces a natural audit trail: you can see exactly which file or query each decision rested on, instead of hoping the answer was somewhere in a giant initial dump. When you must preload — a tight refactor across known files — preload that precise slice and nothing more.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Pattern: tools that return summaries, not firehoses
Design or wrap your tools so they return distilled, structured results rather than raw firehoses. A tool that dumps a 50,000-token log into context burns budget and buries the signal. A better version filters, paginates, or summarizes server-side and returns the few hundred tokens that matter, with a way to drill deeper if needed. The agent reasons over a clean signal instead of wading through noise.
This is especially true for search and query tools. Return the top matches with enough context to act, not every row in the table. If the agent needs more, it can ask for the next page. Shaping tool output is one of the highest-leverage things you can do for long-session health, because tool results are usually the fastest-growing part of the transcript.
Pattern: externalize state instead of holding it in the window
For long, multi-step work, push durable state out of the conversation and into a place the agent can read and write — a notes file, a task list, a scratch document. The window is working memory and it gets compacted; an external file is long-term memory that persists verbatim. Have the agent record decisions and progress to that file as it goes, then read it back after a compaction to recover the full detail a summary might have lost.
This pattern turns the context window into a cache rather than a database. The agent keeps a small live set in context and treats the external file as the system of record. It is the same instinct that makes the stable-prefix pattern work: keep the volatile, expensive resource lean, and lean on durable storage for everything that must survive.
Pattern: delegate to subagents to protect the main window
When a task has a separable exploration — "figure out how the auth layer works" — hand it to a subagent with its own context window and ask for a condensed report back. The orchestrator's window never sees the dozen files the subagent read, only the few hundred tokens of conclusion. This keeps the main thread's context clean and lets independent investigations run in parallel. The cost is real — parallel agents use several times more tokens — so reserve delegation for genuinely separable work, not for tasks where the parent already has the context.
Pattern: explicit definition of done in the prompt
Every nontrivial task should carry its own acceptance criteria in the prompt: what "finished" means, which tests must pass, what must not change. Without it the agentic loop has no clean stopping condition and either quits early or wanders. With it, the loop has a target it can check against — run the tests, confirm the criteria, then stop. A crisp definition of done is the cheapest reliability upgrade you can give a long-running agent.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Frequently asked questions
What is the single biggest cost lever in a long Claude Code session?
Keeping a stable prompt prefix so it can be cached and reused across turns. System instructions, tool schemas, and durable context should never change turn-to-turn; pushing volatile data into the prefix busts the cache and pays full price every request.
Should I preload the whole repo into the 1M-token window?
Usually not. Prefer just-in-time context: give the agent tools to fetch files and run queries when a step needs them. Preloading everything wastes budget, slows responses, and dilutes attention; preload only the precise slice a known task touches.
How do I stop tool results from blowing up my context?
Shape the tools to return summaries, not firehoses — filter, paginate, or summarize server-side and return the few hundred tokens that matter, with a way to drill deeper. Tool output is the fastest-growing part of a transcript, so taming it pays off the most.
Where should long-running state live?
Externalize it to a notes or task file the agent reads and writes. The context window gets compacted and is working memory; an external file persists verbatim and serves as the system of record the agent can re-read after compaction.
Bringing agentic AI to your phone lines
CallSphere puts these very patterns to work on voice and chat — agents that keep a tight live context, fetch data just in time mid-conversation, and book work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.