Prompt and context design for Claude Code agents (Built With Opus Hackathon)

The single best predictor of agent quality at a Built-with-Opus hackathon was not the cleverness of the idea or the number of tools — it was whether the team had disciplined opinions about what went into the model's context. Every other facet builds on this one. This post is specifically about prompt and context design: what to include, what to deliberately exclude, and the reasoning behind each call.

Context engineering is the practice of deciding what information an agent sees on each turn so the model's attention is spent on what currently matters. Treat the context as something you design, not as a dumping ground; that is what separates an agent that stays sharp over twenty turns from one that degrades into confused noise.

The core principle: relevance per token

Opus 4.8 has a very large context window, and the instinct is to use all of it. That instinct is wrong. The model's attention is finite even when the window is huge; every token you add competes with every other token. The principle we kept returning to was relevance per token: each thing in context should earn its place by being likely to matter for the next decision. A model working from a context packed with marginally relevant material reasons worse than one working from a lean context, even though the packed context nominally "knows more."

This reframes the job. You are not trying to give the model everything; you are trying to give it the smallest set of things that lets it act correctly right now. Anything else is noise that dilutes attention and, on long tasks, actively degrades output. Curation is the work.

What belongs in context

Four categories almost always earn their place. First, the goal — the user's actual objective, stated plainly and pinned so it never falls out of the window. Agents that lose the goal mid-task are agents whose goal wasn't pinned. Second, the active constraints — the small set of rules that govern the current decision. Third, the recent, relevant tool results — the data the model needs for the step it's on. Fourth, the current plan and current step, so the model has a backbone and knows where it is.

flowchart TD
  A["New turn begins"] --> B{"Is item relevant to next decision?"}
  B -->|Goal or active constraint| C["Pin: always include"]
  B -->|Recent tool result| D["Keep verbatim a few turns"]
  B -->|Old detail| E["Summarize to one line, archive"]
  B -->|Noise / boilerplate| F["Exclude entirely"]
  C --> G["Assemble lean context"]
  D --> G
  E --> G
  G --> H["Send to model, act"]

Notice that even the "keep" items have lifespans. A tool result that mattered three turns ago usually doesn't matter now; let it age out into a one-line summary. The context should reflect the present state of the task, not its entire history.
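
As a rough illustration, the triage in the flowchart could be written as a small routing function. This is a sketch under assumptions, not the method any team actually used: the `Item` and `Kind` types, the `triage` and `assemble` names, the three-turn lifespan, and the idea that each tool result carries a pre-written one-line summary are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    GOAL = "goal"
    CONSTRAINT = "constraint"
    TOOL_RESULT = "tool_result"
    NOISE = "noise"


@dataclass
class Item:
    kind: Kind
    text: str
    turn_added: int      # turn on which the item entered context
    summary: str = ""    # one-line digest, assumed to be written upstream


KEEP_VERBATIM_TURNS = 3  # assumed lifespan for tool results; tune per task


def triage(item: Item, current_turn: int) -> Optional[str]:
    """Return the text to include this turn, or None to exclude the item."""
    if item.kind in (Kind.GOAL, Kind.CONSTRAINT):
        return item.text                  # pinned: always included
    if item.kind is Kind.NOISE:
        return None                       # excluded entirely
    age = current_turn - item.turn_added
    if age <= KEEP_VERBATIM_TURNS:
        return item.text                  # recent tool result: kept verbatim
    return item.summary or None           # old detail: one-line summary only


def assemble(items: list[Item], current_turn: int) -> list[str]:
    """Build the lean context for this turn, preserving item order."""
    kept = (triage(item, current_turn) for item in items)
    return [text for text in kept if text is not None]
```

Calling `assemble` on every turn means an item's treatment changes as it ages, without anyone having to remember to demote it.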

What to leave out — and why

Several things tempt their way into context and should be kept out. Raw, unfiltered tool dumps — a thousand-line API response when the model needs three fields — should be parsed down before they ever enter the window; carry the three fields, drop the rest. Resolved history — the back-and-forth of a sub-task that's finished — should collapse to its conclusion. Static reference material that applies only sometimes belongs in a skill loaded on demand, not in the permanent prompt. And credentials and secrets must never be in context at all, for security and because they are pure noise to the model's reasoning.
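
A minimal sketch of "carry the three fields, drop the rest", assuming the tool returns JSON and that you know in advance which fields the current step needs. The `project` and `to_context` helpers, the dotted-path convention, and the field names in the usage note are illustrative assumptions, not part of any particular SDK.

```python
import json
from typing import Any

_MISSING = object()  # sentinel so a legitimate null value is not dropped


def project(payload: dict[str, Any], wanted: list[str]) -> dict[str, Any]:
    """Keep only the dotted-path fields listed in `wanted`."""
    out: dict[str, Any] = {}
    for path in wanted:
        node: Any = payload
        for key in path.split("."):
            node = node.get(key, _MISSING) if isinstance(node, dict) else _MISSING
            if node is _MISSING:
                break
        if node is not _MISSING:
            out[path] = node
    return out


def to_context(raw_response: str, wanted: list[str]) -> str:
    """Parse a raw tool response and return a compact string for the window."""
    fields = project(json.loads(raw_response), wanted)
    return json.dumps(fields, separators=(",", ":"), sort_keys=True)
```

For example, `to_context(raw, ["order.id", "order.status", "customer.email"])` would pass three values into the window and leave everything else, including any token that happened to be in the payload, outside it. An allowlist like this fails safe: a field nobody asked for never reaches the model.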

The why behind all of this is the same: each excluded item is attention reclaimed for the decision at hand. Excluding well is harder than including, because it requires a judgment about relevance — but that judgment is precisely the skill. When in doubt, the heuristic that served us was: if removing it wouldn't change the model's next action, remove it.
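
The removal heuristic can be expressed as an ablation check, sketched here under heavy assumptions. `next_action` stands in for whatever decides the agent's next step; in a real agent that would be a model call, which is neither free nor deterministic, so a check like this is more plausible as an offline evaluation over recorded turns than as something run live. The `prune` name and the toy policy in the usage note are hypothetical.

```python
from typing import Callable, Sequence


def prune(
    items: Sequence[str],
    next_action: Callable[[Sequence[str]], str],
) -> list[str]:
    """Drop each item whose removal leaves the next action unchanged."""
    kept = list(items)
    baseline = next_action(kept)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if next_action(trial) == baseline:
            kept = trial      # item was not load-bearing: remove it
        else:
            i += 1            # item changes the decision: keep it
    return kept
```

With a toy policy that needs both a goal line and an approval line to choose a refund, `prune` keeps those two lines and discards an unrelated one. Items are tested one at a time against the running result, so the outcome can depend on order when two items are redundant with each other.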

Designing the prompt layers

Context isn't flat; it has layers with different lifespans, and good design assigns each piece to the right one. The system layer holds identity and immutable rules, and stays small and stable. The task layer holds the pinned goal and plan, set once per task. The working layer holds recent results and churns every few turns. The skill layer is injected and removed as relevance shifts. Thinking in layers makes trimming straightforward: you know exactly what is allowed to age out (the working layer), what is swapped in and out on demand (the skill layer), and what must never move (the system and task layers).
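
One way to make the layers concrete is a small container that renders them in a fixed order and only ever trims the working layer. This is a sketch, assuming a plain-text prompt with bracketed section labels; the `LayeredContext` class, its method names, and the label format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredContext:
    system: str                                            # identity and immutable rules
    goal: str                                              # task layer: pinned goal
    plan: list[str] = field(default_factory=list)          # task layer: plan steps
    current_step: int = 0                                  # index into plan
    working: list[str] = field(default_factory=list)       # recent results, oldest first
    skills: dict[str, str] = field(default_factory=dict)   # skill name -> instructions

    def load_skill(self, name: str, instructions: str) -> None:
        self.skills[name] = instructions

    def unload_skill(self, name: str) -> None:
        self.skills.pop(name, None)

    def trim(self, max_working: int) -> None:
        """Age out the oldest working items; no other layer is touched."""
        excess = len(self.working) - max_working
        if excess > 0:
            del self.working[:excess]

    def render(self) -> str:
        """Assemble the prompt text, skipping layers that are empty."""
        plan = "\n".join(
            f"{'->' if i == self.current_step else '  '} {i + 1}. {step}"
            for i, step in enumerate(self.plan)
        )
        sections = [
            ("SYSTEM", self.system),
            ("GOAL", self.goal),
            ("PLAN", plan),
            ("SKILLS", "\n".join(self.skills.values())),
            ("WORKING", "\n".join(self.working)),
        ]
        return "\n\n".join(f"[{name}]\n{body}" for name, body in sections if body)
```

Because `trim` can only reach `working`, the goal and the rules cannot be evicted by accident, whatever the trimming policy does.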

This layering also makes debugging tractable. When an agent misbehaves, you can ask which layer failed — was the rule missing from the system layer, the goal lost from the task layer, or the wrong skill loaded? Each has a different fix, and the layer structure tells you where to look.

Compression as a first-class step

The highest-leverage piece of code in a context-engineered agent is the compression step that runs before each model call. It summarizes aging working-layer items into terse digests and drops anything past its useful life. The art is summarizing without losing the load-bearing detail — keep the fact the later steps will need, discard the narrative around it. A good compression step is the difference between an agent that holds coherence across a long task and one that, by turn fifteen, is reasoning over a soup of half-relevant fragments.
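
A compression step of this kind might look like the following sketch. The thresholds, the `WorkingItem` type, and the `compress` function are assumptions for illustration. The `summarize` argument is a placeholder for the real summarizer, which in practice would probably be a model call or a task-specific extractor; `first_sentence` is only a stand-in so the example runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkingItem:
    text: str
    turn_added: int
    compressed: bool = False  # True once text has been replaced by a digest


def compress(
    items: list[WorkingItem],
    current_turn: int,
    summarize: Callable[[str], str],
    digest_after: int = 3,   # assumed: summarize items older than this
    drop_after: int = 10,    # assumed: drop items older than this
) -> list[WorkingItem]:
    """Run before each model call: digest aging items, drop expired ones."""
    kept: list[WorkingItem] = []
    for item in items:
        age = current_turn - item.turn_added
        if age > drop_after:
            continue  # past its useful life
        if age > digest_after and not item.compressed:
            item = WorkingItem(summarize(item.text), item.turn_added, True)
        kept.append(item)
    return kept


def first_sentence(text: str) -> str:
    """Stand-in summarizer: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."
```

The `compressed` flag keeps the step idempotent: an item is summarized once, and later runs leave the digest alone instead of summarizing a summary.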

Importantly, compression should be lossy by design but recoverable in principle: if the model later needs a detail you archived, it can re-fetch the source through a tool. You're not destroying information, you're moving it out of the expensive live window and keeping a path back to it. That combination — aggressive compression plus on-demand retrieval — is what lets a lean context scale to genuinely long, complex tasks.
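
To sketch the "path back", an archive can hand the context a digest plus a short reference, and expose a fetch function that the agent calls as a tool when it needs the detail again. Everything here is an assumption for illustration: the `Archive` class, the `arch-N` reference format, and the idea of storing a callable that re-runs the original lookup.

```python
from typing import Callable


class Archive:
    """Holds a way to re-fetch each archived item; only digests stay in context."""

    def __init__(self) -> None:
        self._refetchers: dict[str, Callable[[], str]] = {}

    def put(self, digest: str, refetch: Callable[[], str]) -> str:
        """Archive an item and return the one-line entry that stays in context."""
        ref = f"arch-{len(self._refetchers) + 1}"
        self._refetchers[ref] = refetch
        return f"{digest} [ref: {ref}]"

    def fetch(self, ref: str) -> str:
        """Tool entry point: recover the full detail behind a reference."""
        refetch = self._refetchers.get(ref)
        if refetch is None:
            return f"no archived item with ref {ref!r}"
        return refetch()  # goes back to the source rather than a stale copy
```

Storing the lookup rather than the payload keeps the archive small and returns current data, at the cost of another tool call. If the source can change or disappear, storing the original payload instead would be the safer variant.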

Frequently asked questions

If Opus 4.8 has a huge context window, why curate at all?

Because window size and effective attention aren't the same. A model reasons best over a focused, relevant context; padding it with marginal material tends to degrade decisions, and the effect compounds on long tasks. The window is a ceiling, not a target.

What's the single most common context mistake?

Dumping raw tool output straight into the transcript. Parse it down to the fields that matter before it enters context — otherwise you fill the window with noise the model has to wade through on every subsequent turn.

How do skills relate to context design?

Skills are how you keep specialized instructions out of the permanent context until they're needed. Anything that applies only to some tasks belongs in a skill that loads on demand, keeping the base prompt lean and focused.

How aggressive should compression be?

Aggressive, as long as retrieval is available. Summarize aging results down to the load-bearing facts and drop the rest, trusting the agent to re-fetch from source if a detail turns out to matter. Lean-plus-retrievable beats comprehensive-but-bloated.

Bringing agentic AI to your phone lines

CallSphere applies this context discipline to voice and chat — keeping each conversational agent focused on the caller's goal while pulling exactly the data it needs, mid-call, and nothing it doesn't. Hear it at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
