Context Engineering for Claude Agents: What to Include (Building Effective AI Agents)

What to put in a Claude agent's context and what to leave out — tiered context, compaction, just-in-time retrieval — with a diagram, code, and pitfalls.

Prompt engineering taught us how to phrase a single request. Agents demand something more: deciding, on every turn, what the model should be looking at. That discipline is context engineering, and it is the skill that most separates agents that stay sharp over a long run from agents that drift, repeat themselves, and burn tokens. A context window of up to 1M tokens, which some Claude models offer, does not free you from this work. It raises the stakes, because now you can stuff everything in, and stuffing everything in is precisely the mistake.

This post is about curation. Not how to write a clever instruction, but how to choose, shape, and prune the working set the model sees each cycle. The mental model: context is the agent's short-term memory, and like any memory, it is most useful when it holds the right things and forgets the rest on purpose.

Key takeaways

  • Context engineering is deciding what the model sees each turn — include the goal, constraints, and decision-relevant facts; exclude noise.
  • More context is not better; irrelevant tokens dilute attention and degrade decisions well before the window fills.
  • Compress old tool outputs into summaries; keep only recent turns verbatim.
  • Pin the goal and hard constraints so they survive every eviction.
  • Let Skills and retrieval bring information in just-in-time rather than pre-loading everything up front.

What context engineering means

Context engineering is the practice of curating the exact set of tokens a model sees at each step so that the most decision-relevant information is present and everything else is absent. Where prompt engineering optimizes one message, context engineering optimizes a moving window across many turns. It is an ongoing operation, not a one-time setup, because what is relevant changes as the agent learns.

The reason it matters is mechanical. A model attends across everything in its window. Fill that window with stale tool dumps and the signal-to-noise ratio drops, and with it the quality of the next decision. Effective agents fight entropy actively — every turn, they ask what still earns its place.

What to include, what to leave out

A useful rule: include what changes the next decision, exclude what does not. The goal always changes the next decision, so it stays. Hard constraints stay. The most recent observations stay, because the agent is actively reasoning about them. But the raw 4,000-token output of a tool call from twelve turns ago, already summarized into one sentence of conclusion? That goes. Keeping it adds cost and dilutes attention while contributing nothing.

The hardest discipline is leaving out things that feel safe to keep. Full chat history feels safe. Every retrieved document feels safe. But an agent does not need the transcript of how it got here — it needs the current state and the goal. Aggressively summarizing the path while preserving the conclusions is the whole game.

The context lifecycle

It helps to see context as flowing through stages each turn rather than as a static blob. New information enters, gets evaluated for relevance, either joins the working set or gets compressed, and old material ages out. The diagram below traces that lifecycle for a single turn of a long-running agent.

flowchart TD
  A["New observation arrives"] --> B{"Decision-relevant now?"}
  B -->|Yes| C["Add to recent tier verbatim"]
  B -->|No| D["Compress to one-line summary"]
  C --> E{"Recent tier over budget?"}
  E -->|Yes| F["Fold oldest into rolling summary"]
  E -->|No| G["Render: pinned goal + summary + recent"]
  D --> G
  F --> G
  G --> H["Send curated context to Claude"]

The pinned goal sits outside this churn entirely — it is never a candidate for compression or eviction. Everything else is negotiable. This is the structure that lets an agent run for fifty turns and still answer as crisply as it did on turn three.
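
One way to read the diagram is as a small routing function. The sketch below is a minimal illustration under stated assumptions, not a reference implementation: `TurnState`, `run_turn`, `one_line`, and the word-count token estimate are hypothetical names and simplifications introduced here, and the `relevant` flag is assumed to be decided elsewhere (by a heuristic or a model call). Unlike the diagram, which folds once, this version keeps folding until the recent tier is back under budget.

```python
from dataclasses import dataclass, field


def rough_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def one_line(text, limit=12):
    """Placeholder compressor: keeps the first few words.
    A real agent would ask a model for a one-sentence conclusion instead."""
    return " ".join(text.split()[:limit])


@dataclass
class TurnState:
    goal: str                                     # pinned: never compressed or evicted
    summary: list = field(default_factory=list)   # rolling one-line summaries
    recent: list = field(default_factory=list)    # verbatim recent observations


def run_turn(state, observation, relevant, budget=40):
    """Route one observation through the lifecycle and return the rendered context."""
    if relevant:
        state.recent.append(observation)          # add to recent tier verbatim
        # Fold the oldest entries while over budget, always keeping the newest one.
        while len(state.recent) > 1 and sum(map(rough_tokens, state.recent)) > budget:
            state.summary.append(one_line(state.recent.pop(0)))
    else:
        state.summary.append(one_line(observation))   # compress immediately
    # Render order: pinned goal, then summary, then recent.
    return "\n".join([f"GOAL: {state.goal}", *state.summary, *state.recent])
```

Note that `goal` is only ever read, never appended to or popped from, which is the code-level meaning of "pinned".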

A practical compaction routine

Compaction is the engine of context engineering. A simple, effective routine: after each turn, if the verbatim recent tier exceeds a token threshold, take its oldest entries and replace them with a single summarizing sentence appended to a rolling summary. You can even use Claude itself for the summarization with a tight instruction. Here is the shape of the policy:

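def compact(ctx, max_recent=12000):
    """Fold the oldest recent entries into the rolling summary until under budget."""
    # ctx and summarize are assumed helpers; this shows the policy, not a full program.
    while ctx.recent_tokens() > max_recent:
        chunk = ctx.pop_oldest_recent()    # oldest verbatim entry
        line = summarize(chunk)            # one sentence, conclusions only
        ctx.rolling_summary.append(line)
    return ctx.render()                    # pinned + summary + recent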

The instruction you give summarize should demand outcomes, not narration: "State only the conclusion and any value the agent still needs, in one sentence." That keeps the summary tier dense with signal instead of becoming a second transcript.
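
To try the policy end to end, the helpers it relies on need some concrete shape. The following is one possible sketch, assuming a `Context` class with the three tiers and a summarizer passed in as a function. The `Context` fields, `approx_tokens` (a rough four-characters-per-token guess), and `fake_summarize` are all hypothetical stand-ins introduced here. In a real agent, `summarize` would send the instruction and the chunk to a model.

```python
from dataclasses import dataclass, field

SUMMARY_INSTRUCTION = (
    "State only the conclusion and any value the agent still needs, in one sentence."
)


def approx_tokens(text):
    """Rough estimate (about 4 characters per token); swap in a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class Context:
    pinned: str                                           # goal and hard constraints
    rolling_summary: list = field(default_factory=list)   # one-line conclusions
    recent: list = field(default_factory=list)            # verbatim recent entries

    def recent_tokens(self):
        return sum(approx_tokens(t) for t in self.recent)

    def pop_oldest_recent(self):
        return self.recent.pop(0)

    def render(self):
        # pinned + summary + recent, in that order
        return "\n\n".join(
            [self.pinned, "\n".join(self.rolling_summary), "\n".join(self.recent)]
        )


def compact(ctx, summarize, max_recent=12000):
    """Same policy as above, with the summarizer injected for easy testing."""
    while ctx.recent_tokens() > max_recent:
        chunk = ctx.pop_oldest_recent()
        ctx.rolling_summary.append(summarize(SUMMARY_INSTRUCTION, chunk))
    return ctx.render()


def fake_summarize(instruction, chunk):
    """Stand-in for a model call: treats the chunk's last line as its conclusion."""
    return chunk.strip().splitlines()[-1]
```

Injecting the summarizer keeps the compaction logic testable without a network call, and makes it easy to swap the instruction if summaries start reading like narration.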

Just-in-time over pre-loading

A common reflex is to front-load the context with everything the agent might conceivably need — the whole knowledge base, every API doc, all prior tickets. Resist it. The stronger pattern is just-in-time retrieval: keep the context lean, and let the agent pull specific information through tools or Skills exactly when a decision requires it. Skills are designed for this — they load instructions and resources dynamically only when relevant, so the base context stays small and sharp.
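
The just-in-time idea can be illustrated in plain Python, independent of any particular tool or Skills mechanism. In this sketch the base context carries only a one-line index of what exists, and the full text is loaded when a tool handler is called by name. The reference names, their contents, and the functions `base_context` and `fetch_reference` are hypothetical examples, not part of any real API.

```python
# Hypothetical reference sources. Each loader runs only when its name is requested.
LOADERS = {
    "refund_policy": lambda: "Refunds allowed within 30 days with receipt.",
    "order_lookup": lambda: "Look up an order by its ID to get status and total.",
}

# Short descriptions are the only part that lives in the base context.
DESCRIPTIONS = {
    "refund_policy": "Rules for when a refund may be issued.",
    "order_lookup": "How to find an order's current state.",
}


def base_context(goal):
    """Lean context: the goal plus a one-line index of what can be fetched."""
    index = "\n".join(f"- {name}: {desc}" for name, desc in DESCRIPTIONS.items())
    return f"GOAL: {goal}\nAVAILABLE REFERENCES (fetch by name):\n{index}"


def fetch_reference(name):
    """Tool handler: load one reference at the moment a decision needs it."""
    loader = LOADERS.get(name)
    if loader is None:
        return f"Unknown reference: {name}"
    return loader()
```

The design choice is that the index costs a handful of tokens per reference, while the bodies cost nothing until the agent asks for one.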

Ordering and placement matter

What you include is most of the battle, but where you place it matters too. Models attend unevenly across a long context, and information at the very start and very end tends to land harder than material buried in the middle. Put the goal and the most decision-critical constraints where they cannot be missed — typically pinned at the top — and keep the live, in-progress exchange at the end where the model is actively reasoning. The rolling summary sits between them. This ordering is not superstition; it is a practical response to how attention degrades over distance, and it costs nothing to get right.

One more placement lever: structure beats prose for reference material. When you do include a set of constraints or facts, a tight bulleted or labeled block is easier for the model to use than the same content written as flowing paragraphs. Structure gives the model anchors, and anchors survive the dilution that long context inevitably brings.
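
Both placement ideas, ordering and structure, can live in a single render function. The sketch below is one possible layout, assuming the goal and constraints go first as labeled bullet blocks, the rolling summary sits in the middle, and the live exchange comes last. The function name, section labels, and argument shapes are illustrative choices introduced here, not a required format.

```python
def render_context(goal, constraints, facts, summary_lines, recent_turns):
    """Assemble context as: pinned block, then summary, then recent turns."""
    # Pinned block at the top, written as labeled bullets rather than prose.
    pinned = ["GOAL", f"- {goal}", "", "CONSTRAINTS"]
    pinned += [f"- {c}" for c in constraints]
    if facts:
        pinned += ["", "FACTS"] + [f"- {k}: {v}" for k, v in facts.items()]

    # Rolling summary in the middle.
    middle = ["SUMMARY OF EARLIER WORK"] + [f"- {s}" for s in summary_lines]

    # Live exchange at the end, kept verbatim.
    tail = ["RECENT TURNS"] + list(recent_turns)

    return "\n".join(pinned + [""] + middle + [""] + tail)
```

A call such as `render_context("Book a flight", ["Budget under $400"], {"traveler": "A. Rivera"}, ["Searched 3 airlines"], ["user: any morning options?"])` yields a string that opens with the goal and ends with the latest user turn.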

Measuring whether your context is healthy

Context engineering is empirical, so instrument it. Track three numbers per run: total context tokens at each turn, the ratio of summary-tier to recent-tier tokens, and how often the agent re-retrieves something it had earlier discarded. Steadily climbing total tokens means your compaction is too gentle. A summary tier that dwarfs the recent tier suggests you are over-compressing and losing useful detail. Frequent re-retrieval of evicted facts means you are dropping things the task actually needs, so broaden what counts as decision-relevant or pin those facts outright. These three signals turn a vague sense that "the agent feels sloppy" into a tuning loop you can act on.
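
The three signals are simple enough to track with a small recorder. This is a minimal sketch, assuming you can report tier sizes each turn and can name evicted and retrieved facts by a key. The class `ContextHealth`, its method names, and the five-turn creep window are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextHealth:
    totals: list = field(default_factory=list)    # total context tokens per turn
    ratios: list = field(default_factory=list)    # summary-to-recent ratio per turn
    evicted: set = field(default_factory=set)     # keys of facts dropped so far
    re_retrievals: int = 0                        # fetches of previously evicted keys

    def record_turn(self, summary_tokens, recent_tokens, pinned_tokens=0):
        self.totals.append(pinned_tokens + summary_tokens + recent_tokens)
        ratio = summary_tokens / recent_tokens if recent_tokens else float("inf")
        self.ratios.append(ratio)

    def record_eviction(self, key):
        self.evicted.add(key)

    def record_retrieval(self, key):
        # Only count fetches of something the agent once held and then dropped.
        if key in self.evicted:
            self.re_retrievals += 1

    def is_creeping(self, window=5):
        """True if total tokens rose on every one of the last `window` turns."""
        tail = self.totals[-(window + 1):]
        return len(tail) == window + 1 and all(b > a for a, b in zip(tail, tail[1:]))
```

Calling `record_turn` once per cycle and checking `is_creeping()` at the end of a run gives a cheap first alarm. The ratio list and re-retrieval count then help decide whether to compress more or less.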

Common pitfalls

  • Equating window size with how much to use. A 1M-token window is a ceiling, not a target. Use the least context that supports a good decision.
  • Keeping raw tool dumps. Large outputs from old turns are pure dilution once their conclusions are captured. Summarize and drop them.
  • Letting the goal scroll out of view. If the goal isn't pinned, a long run can push it out of effective attention. Pin it explicitly.
  • Pre-loading everything. Stuffing all possible references up front wastes tokens and buries the relevant ones. Retrieve just-in-time.
  • Summarizing with narration. Summaries that recount the story rather than the conclusion grow without adding value. Demand outcome-only summaries.

Tune your context in five steps

  1. Split your context into pinned, summary, and recent tiers in code.
  2. Pin the goal and hard constraints so they are never evicted.
  3. Set a recent-tier token budget and add a compaction routine that folds old turns into the summary.
  4. Replace any pre-loaded reference material with just-in-time retrieval tools or Skills.
  5. Log per-turn context size and watch for creep; if it climbs steadily, your compaction is too gentle.

Include or exclude?

| Item | Verdict | Why |
|---|---|---|
| Current goal | Pin always | Drives every decision |
| Hard constraints | Pin always | Bounds all actions |
| Last 2-3 turns | Keep verbatim | Active reasoning surface |
| Old tool dumps | Compress out | Conclusions already captured |
| Whole knowledge base | Exclude, retrieve | Dilutes attention |

Frequently asked questions

If Claude has a 1M-token window, why prune at all?

Because attention quality, latency, and cost all degrade as you fill the window with low-signal tokens. The window is the maximum you can hold, not the amount you should. Lean, relevant context consistently produces better decisions than a stuffed one.

How aggressively should I summarize old turns?

Compress to conclusions only — one sentence capturing the outcome and any value still needed. If your summaries read like a story, they are too long; if the agent later asks for something you dropped, your retrieval tools should fetch it on demand.

What is the difference between prompt and context engineering?

Prompt engineering shapes a single instruction; context engineering decides, across many turns, what the model sees at all. For agents the second dominates, because the system prompt is fixed but the working context changes every cycle and is where most behavior is won or lost.

Sharp context, live on every call

CallSphere applies this same context discipline to voice and chat agents — they hold just the right state mid-conversation, retrieve account details on demand, and stay coherent across long calls. Hear it in action at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
