Designing Context and Prompts for Claude Agent Skills

The hardest part of building a good agent skill isn't deciding what to include — it's deciding what to leave out. Context is a shared, finite resource, and every token you spend on instructions, references, or tool output is a token the model isn't spending on the actual task. The instinct to add "just one more rule" is how skills slowly degrade: each addition seems harmless, but the cumulative weight blunts the model's focus and dilutes the instructions that actually matter. This post is about context design as a discipline — the judgment of what earns a place in the window and what doesn't.

Context engineering is the practice of deliberately curating what an agent has in its working window at each step — instructions, tools, history, and retrieved data — so the model has exactly what it needs and not more. For skills, this is the difference between a capability that stays sharp across a long session and one that gets vaguer the more it's used.

Why more context makes agents worse

It's counterintuitive that adding relevant information can hurt, but it does. Models attend across everything in their window, and attention is a budget. Bury the three rules that matter under thirty rules that rarely apply, and the important three lose salience. Long, undifferentiated context also invites the model to latch onto the wrong detail — a stale instruction, an outdated example, a tool result from five steps ago that no longer holds.

The practical consequence is that a skill body should be ruthlessly edited. Every sentence should earn its place by changing what the model does on a meaningful fraction of runs. If a rule covers a once-in-a-hundred edge case, it doesn't belong in the body that loads every time — it belongs in a reference file the body opens only when that case appears. Treat the body like a function's hot path: keep it lean, push the rare branches elsewhere.
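As a rough illustration of the hot-path idea, the sketch below keeps a short body that is always loaded and appends a reference only when a run asks for it. The names `SKILL_BODY`, `REFERENCES`, and `build_context` are hypothetical, the refund scenario is invented for the example, and an in-memory dict stands in for real reference files.

```python
# Hypothetical always-loaded body: short, and it names the rare branch
# instead of spelling that branch out in full.
SKILL_BODY = (
    "Process the refund request step by step.\n"
    "If the order is international, consult international.md first."
)

# Stand-in for reference files on disk (an assumption made for this sketch).
REFERENCES = {
    "international.md": "International orders: confirm the customs form before refunding.",
}


def build_context(body: str, references: dict[str, str], needed: list[str]) -> str:
    """Return the body plus only the references this particular run needs."""
    parts = [body]
    parts.extend(references[name] for name in needed)
    return "\n\n".join(parts)


# The common run pays only for the body; the rare run pays for the extra detail.
common_run = build_context(SKILL_BODY, REFERENCES, needed=[])
rare_run = build_context(SKILL_BODY, REFERENCES, needed=["international.md"])
```

The point of the shape is that the cost of the rare case is paid only on runs that hit it.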

The three tiers of what goes in context

It helps to sort everything a skill might carry into tiers. Tier one is always-on: the core procedure and the handful of rules that apply to essentially every run. This is small by design and lives in the skill body. Tier two is on-demand: detailed references, lookup tables, long style guides — loaded only when the work reaches the branch that needs them. Tier three is per-task: the actual user input and tool results, which arrive fresh each run and should be trimmed of anything stale as the task progresses.

Designing a skill is largely the act of assigning content to the right tier. Engineers new to skills put everything in tier one and wonder why the agent feels unfocused. Experienced ones keep tier one tiny, lean hard on tier two's progressive disclosure, and actively manage tier three so old tool output doesn't pile up. The model performs best when each step sees a clean, relevant slice rather than the full accumulated history.

flowchart TD
  A["New step in task"] --> B{"Core rule for every run?"}
  B -->|Yes| C["Keep in skill body (tier 1)"]
  B -->|No| D{"Rare or detailed?"}
  D -->|Yes| E["Move to reference file (tier 2)"]
  D -->|No| F{"Stale tool output?"}
  F -->|Yes| G["Trim from context (tier 3)"]
  F -->|No| H["Keep fresh task data"]
  C --> I["Lean, focused window"]
  E --> I
  G --> I
  H --> I
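The same decision flow can be written as a small function, which is sometimes handy when auditing an existing skill item by item. This is only a sketch: the `Item` fields and the `Placement` names are hypothetical labels that mirror the flowchart, and in practice a person makes these judgments rather than reading them off boolean flags.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    SKILL_BODY = "keep in skill body (tier 1)"
    REFERENCE_FILE = "move to reference file (tier 2)"
    TRIM = "trim from context (tier 3)"
    KEEP_TASK_DATA = "keep fresh task data"


@dataclass(frozen=True)
class Item:
    text: str
    applies_every_run: bool = False
    rare_or_detailed: bool = False
    stale_tool_output: bool = False


def place(item: Item) -> Placement:
    """Walk the questions in the same order as the flowchart."""
    if item.applies_every_run:
        return Placement.SKILL_BODY
    if item.rare_or_detailed:
        return Placement.REFERENCE_FILE
    if item.stale_tool_output:
        return Placement.TRIM
    return Placement.KEEP_TASK_DATA


core_rule = Item("Never issue a refund above the order total.", applies_every_run=True)
style_guide = Item("Full 40-page tone guide.", rare_or_detailed=True)
old_output = Item("Directory listing from five steps ago.", stale_tool_output=True)
```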

Writing instructions the model can actually follow

What you keep, write well. The model follows specific instructions far better than abstract ones. "Be professional" is nearly useless; "Use complete sentences, no exclamation marks, and never use the customer's first name" is followable and checkable. Prefer concrete rules and a single sharp example over paragraphs of philosophy. One example of the exact output you want teaches more than three explanations of the principles behind it.
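One way to see why the concrete rule is "checkable" is that parts of it can be verified mechanically. The sketch below checks two of the three clauses. The function name `style_violations` is hypothetical, and the "complete sentences" clause is left out on purpose because it needs a human or a grader rather than a string test.

```python
import re


def style_violations(reply: str, customer_first_name: str) -> list[str]:
    """Return the mechanical rule violations found in a drafted reply."""
    violations = []
    if "!" in reply:
        violations.append("contains an exclamation mark")
    # Whole-word, case-insensitive match so "Dan" does not flag "Danish".
    if customer_first_name and re.search(
        rf"\b{re.escape(customer_first_name)}\b", reply, flags=re.IGNORECASE
    ):
        violations.append("uses the customer's first name")
    return violations


bad = style_violations("Thanks, Dana!", "Dana")
good = style_violations("Your order has shipped.", "Dana")
```

No comparable check can be written for "be professional", which is a fair test of whether an instruction is specific enough.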

Order matters too. Put the most important constraints early and state them once, clearly, rather than repeating them in slightly different words throughout — repetition with variation is a common way skill bodies bloat and contradict themselves. And be explicit about negatives. The failure modes you've actually seen in testing deserve a direct "do not" rule, because closing off a known wrong path is more valuable than describing the right one in more detail.

Managing context across a long agent run

A single prompt is easy; a long agentic session is where context design earns its keep. As the agent works, tool results and intermediate reasoning accumulate, and unmanaged, that history grows until it crowds out the task. The remedy is to treat context as something you actively prune. Summarize completed phases into a compact note and drop the raw detail. Once a validation step has passed, you don't need its full output anymore — a single line recording that it passed is enough.
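A minimal sketch of that pruning step, assuming the agent's history is held as a list of entries tagged by phase. The `Entry` type and `compact_phase` function are hypothetical, and the summary text is supplied by the caller here, whereas a real agent might ask the model to write it.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    phase: str
    kind: str  # "tool_output" or "note"
    text: str


def compact_phase(history: list[Entry], phase: str, summary: str) -> list[Entry]:
    """Replace every entry from a finished phase with one short note,
    placed where that phase's first entry used to be."""
    compacted: list[Entry] = []
    replaced = False
    for entry in history:
        if entry.phase != phase:
            compacted.append(entry)
        elif not replaced:
            compacted.append(Entry(phase=phase, kind="note", text=summary))
            replaced = True
    return compacted


history = [
    Entry("validate", "tool_output", "schema check: 212 lines of detail"),
    Entry("validate", "tool_output", "lint check: 87 lines of detail"),
    Entry("deploy", "tool_output", "starting rollout"),
]
pruned = compact_phase(history, "validate", "validation passed")
```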

For tasks that span many steps, lean on external memory rather than holding everything in the window. Write intermediate results to a file or a structured store, and pull back only what the current step needs. This keeps each step's context focused on the present, while the durable record lives outside the model. The agents that stay coherent over long, complex runs are the ones designed this way from the start — not the ones that try to keep their entire history in view and slowly lose the thread.
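External memory can be as plain as a JSON file. The sketch below is one possible shape, not a prescribed one: `ScratchStore` and its method names are hypothetical, and the file location is a temporary directory chosen only so the example cleans up after itself.

```python
import json
import tempfile
from pathlib import Path


class ScratchStore:
    """A tiny JSON-file store for intermediate results."""

    def __init__(self, path: Path) -> None:
        self.path = path
        if not path.exists():
            path.write_text("{}", encoding="utf-8")

    def save(self, key: str, value: object) -> None:
        data = json.loads(self.path.read_text(encoding="utf-8"))
        data[key] = value
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def load(self, keys: list[str]) -> dict[str, object]:
        """Pull back only the keys the current step needs."""
        data = json.loads(self.path.read_text(encoding="utf-8"))
        return {key: data[key] for key in keys if key in data}


with tempfile.TemporaryDirectory() as tmp:
    store = ScratchStore(Path(tmp) / "scratch.json")
    store.save("schema", {"columns": ["id", "email"]})
    store.save("row_count", 1200)
    # A later step that only needs the row count never sees the schema.
    step_context = store.load(["row_count"])
```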

Measuring whether a rule earns its place

Context discipline shouldn't rest on taste alone; you can test it. When you suspect a skill body has grown bloated, run its eval set with and without a candidate rule and compare the results. Model output varies from run to run, so compare graded scores over the whole set rather than eyeballing a single pair of transcripts. If removing the rule changes nothing across your representative tasks, the rule wasn't earning its tokens and belongs in a reference file or in the bin. This turns "should this stay in context" from an argument into an experiment, and it's the most honest way to keep a body lean as it ages.
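The with-and-without comparison might be sketched like this. Everything here is an assumption for illustration: `rule_changes_outputs` is a hypothetical helper, and `fake_run_skill` is a deterministic stub standing in for a real model call. With a real model you would compare graded scores rather than exact strings.

```python
from typing import Callable


def rule_changes_outputs(
    run_skill: Callable[[str, str], str],
    body: str,
    rule: str,
    tasks: list[str],
) -> list[str]:
    """Return the tasks whose output differs once `rule` is removed from `body`."""
    if rule not in body:
        raise ValueError("rule not found in body; nothing to ablate")
    ablated = body.replace(rule, "")
    return [task for task in tasks if run_skill(body, task) != run_skill(ablated, task)]


def fake_run_skill(body: str, task: str) -> str:
    # Stub model: only the uppercase rule has any effect on its output.
    reply = f"Handled: {task}"
    return reply.upper() if "Reply in uppercase." in body else reply


BODY = "Answer briefly.\nReply in uppercase.\nCite the order id.\n"
TASKS = ["refund", "exchange"]

matters = rule_changes_outputs(fake_run_skill, BODY, "Reply in uppercase.\n", TASKS)
inert = rule_changes_outputs(fake_run_skill, BODY, "Cite the order id.\n", TASKS)
```

An empty list for a rule is the signal that it is a candidate for a reference file or for deletion.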

The same measurement habit catches the subtler failure: a rule that helps one case but quietly degrades five others by crowding the window. You only see that trade-off when you measure across a spread of tasks rather than the single example that motivated the rule. Teams that instrument their skills this way tend to end up with tighter bodies than teams that edit by intuition, because every addition has had to prove it pulls its weight. Over a large library, that discipline compounds into agents that stay fast and focused where untested ones slowly bloat.
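To make the helps-one-hurts-five pattern visible, it can help to look at per-task changes instead of a single average. In this sketch the function name `net_effect` is hypothetical and the scores are made-up numbers used only to exercise the code, not measurements from any real skill.

```python
def net_effect(
    without_rule: dict[str, float], with_rule: dict[str, float]
) -> dict[str, list[str]]:
    """Split tasks into those a rule improved and those it regressed."""
    improved: list[str] = []
    regressed: list[str] = []
    for task, before in without_rule.items():
        after = with_rule[task]
        if after > before:
            improved.append(task)
        elif after < before:
            regressed.append(task)
    return {"improved": improved, "regressed": regressed}


# Invented scores: the rule fixes the case that motivated it and
# slightly degrades two unrelated tasks.
scores_without = {"edge_case": 0.2, "refund": 0.9, "exchange": 0.9, "status": 0.8}
scores_with = {"edge_case": 0.9, "refund": 0.8, "exchange": 0.7, "status": 0.8}

effect = net_effect(scores_without, scores_with)
```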

Knowing what to deliberately leave out

Some things should never go in context at all. Don't include raw credentials or secrets — those belong at the tool boundary, never in the prompt. Don't paste large reference documents wholesale when the model needs one section; retrieve the section. Don't carry the full conversation history into a subtask that only needs a clean instruction — a focused subagent with a minimal brief outperforms one drowning in irrelevant backstory.
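Keeping secrets at the tool boundary can look roughly like the sketch below: the model sees only a tool description and supplies arguments, while the credential is read from the environment when the request is built. The tool name `lookup_invoice`, the variable `BILLING_API_KEY`, and the request shape are all hypothetical, and nothing is sent over the network here.

```python
import os
from typing import Mapping

# This is all the prompt ever contains about the tool: no key, no token.
TOOL_DESCRIPTION = "lookup_invoice(invoice_id): returns the invoice status."


def build_request(
    model_args: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> dict[str, dict[str, str]]:
    """Combine model-supplied arguments with a credential the model never sees."""
    api_key = env["BILLING_API_KEY"]  # hypothetical variable name
    return {
        "headers": {"Authorization": f"Bearer {api_key}"},
        "params": {"invoice_id": model_args["invoice_id"]},
    }


request = build_request(
    {"invoice_id": "inv_001"},
    env={"BILLING_API_KEY": "example-key-for-demo"},
)
```

Passing `env` explicitly is a choice made for testability; the default still reads from the process environment.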

The overarching principle is that context is a design surface, not a dumping ground. Each thing you add should be a deliberate choice with a reason, and the default answer to "should this go in context" should be no until proven otherwise. Skills that follow this discipline stay sharp at their hundredth invocation and across hour-long sessions. Skills that don't degrade quietly, one well-meaning addition at a time, until no one remembers why the agent got vague. Design the context as carefully as you design the code, and the agent stays reliable under real use.

Frequently asked questions

Why can adding relevant information make an agent worse?

Because attention is a finite budget. The model attends across everything in its window, so burying the few rules that matter under many that rarely apply lowers the salience of the important ones and invites the model to latch onto stale or irrelevant detail.

What belongs in the skill body versus a reference file?

The body holds tier-one content — the core procedure and rules that apply to nearly every run — and stays small. Rare edge cases, long style guides, and lookup tables go in reference files loaded on demand, so they don't tax the context window when they aren't needed.

How do I keep a long agent run from losing the thread?

Actively prune context. Summarize completed phases into a short note and drop the raw output, and use external memory — files or a structured store — for intermediate results, pulling back only what the current step needs. This keeps each step focused on the present.

What should never go into an agent's context?

Raw credentials and secrets (those live at the tool boundary), large documents pasted wholesale when one section would do, and full conversation history forced into a subtask that only needs a clean instruction. Default to leaving things out until they prove their worth.

Bringing agentic AI to your phone lines

CallSphere applies disciplined context design to voice and chat agents that stay sharp across long calls — surfacing exactly the right playbook at the right moment and booking work without losing the thread. See it live at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
