
Claude Cowork context design: what to include, what to cut

Prompt and context design for Claude Cowork: what to include, what to leave out, and how compaction keeps long agentic knowledge-work runs sharp and reliable.

Ask an experienced agent builder what separates a Claude Cowork workflow that hums from one that flails, and few of them will say "a cleverer prompt." Most will say context — specifically, the discipline of deciding what the model sees on each turn and, just as importantly, what it does not. The context window is finite working memory, and how you fill it determines the quality of every decision the agent makes. This post is about that craft: what to put in context, what to deliberately leave out, and the reasoning behind each call.

It is the least visible part of building with Claude and the most consequential. Two teams can have identical skills and connectors and get wildly different reliability purely because one curates context and the other floods it.

Why more context is not better context

The intuition that the model performs better with more information is wrong past a point, and that point arrives sooner than people expect. Every token you add competes for the model's attention with every other token. Bury the one relevant invoice line in a 40-page document dump and the model is more likely to miss it, not less. The goal is not maximal information; it is maximal signal — the relevant facts, cleanly presented, with the noise removed. Treat the window as a curated briefing, not a filing cabinet.

This reframes the whole job. You are not trying to give the agent everything it could conceivably need; you are trying to give it exactly what this turn requires, and to trust the architecture to pull more in when a later turn requires it. Restraint is the skill.

What belongs in context on every turn

A few things earn their place reliably. The goal — what done looks like, restated compactly — anchors the agent so a long run does not drift. The relevant current state — a compacted summary of what has happened and where things stand — gives it footing without replaying every raw step. The schemas of the tools usable right now let it act, while tools irrelevant to this task stay out. And the active skill's instructions carry the specific know-how for the work at hand. That is usually enough, and its leanness is a feature.
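The four ingredients above can be sketched as a small per-turn assembly step. This is a minimal illustration, not Claude Cowork's actual API: `Tool`, `TurnContext`, `build_turn_context`, and the tag-based relevance check are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    schema: str            # schema text that would be shown to the model
    tags: frozenset        # task types this tool is useful for (assumed scheme)


@dataclass
class TurnContext:
    goal: str
    state_summary: str
    tool_schemas: list
    skill_instructions: str

    def render(self) -> str:
        # Fixed section order so every turn reads like the same briefing.
        parts = [
            "GOAL\n" + self.goal,
            "CURRENT STATE\n" + self.state_summary,
            "TOOLS\n" + "\n".join(self.tool_schemas),
            "SKILL\n" + self.skill_instructions,
        ]
        return "\n\n".join(parts)


def build_turn_context(goal, state_summary, tools, task_tag, skill_instructions):
    # Only tools tagged for the current task make it into the window.
    relevant = [t.schema for t in tools if task_tag in t.tags]
    return TurnContext(goal, state_summary, relevant, skill_instructions)


tools = [
    Tool("read_invoice", '{"name": "read_invoice"}', frozenset({"reconcile"})),
    Tool("send_email", '{"name": "send_email"}', frozenset({"outreach"})),
]
ctx = build_turn_context(
    goal="Reconcile Q3 invoices; done when every discrepancy is listed.",
    state_summary="3 of 5 vendors checked.",
    tools=tools,
    task_tag="reconcile",
    skill_instructions="Compare invoice totals against the ledger.",
)
print(ctx.render())
```

In this sketch the `send_email` schema never reaches the rendered text, because it is not tagged for the current task. How relevance is actually decided is left open here; tags are just one simple way to do it.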

flowchart TD
  A["New turn begins"] --> B["Include: goal & success criteria"]
  B --> C["Include: compacted state summary"]
  C --> D["Include: only relevant tool schemas"]
  D --> E["Include: active skill instructions"]
  E --> F{"Window getting full?"}
  F -->|Yes| G["Summarize old turns, drop raw blobs"]
  F -->|No| H["Send to model for next decision"]
  G --> H

The decision point in the diagram — "window getting full?" — is where long runs are won or lost. The right move is to summarize older turns into compact state and drop the raw tool outputs that have already been digested, preserving conclusions while reclaiming space.
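One way to sketch that decision point is shown below. It assumes a crude token estimate (about four characters per token) and a hypothetical `Turn` record that already carries a one-line conclusion; a real system would use a proper tokenizer and might ask the model to write the summary.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    conclusion: str   # one-line takeaway already extracted from this turn
    raw_output: str   # full tool output; bulky and usually disposable


def estimate_tokens(text: str) -> int:
    # Rough heuristic only (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def compact_if_needed(summary, turns, budget, keep_recent=2):
    """Return (summary, turns), folding old turns into the summary when over budget."""
    used = estimate_tokens(summary) + sum(
        estimate_tokens(t.conclusion + t.raw_output) for t in turns
    )
    if used <= budget or len(turns) <= keep_recent:
        return summary, turns
    split = len(turns) - keep_recent
    old, recent = turns[:split], turns[split:]
    # Keep the conclusions of old turns; their raw outputs are dropped.
    folded = "; ".join(t.conclusion for t in old)
    new_summary = f"{summary}; {folded}" if summary else folded
    return new_summary, recent


turns = [
    Turn("found 3 discrepancies", "x" * 4000),
    Turn("vendor A confirmed", "y" * 4000),
    Turn("awaiting vendor B", "z" * 40),
]
summary, kept = compact_if_needed("status so far", turns, budget=500, keep_recent=1)
print(summary)
```

The most recent turns stay verbatim because the next decision is most likely to depend on them; everything older survives only as its conclusion.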

What to deliberately leave out

The harder discipline is exclusion. Leave out raw tool outputs once you have extracted what matters from them: keep "three discrepancies found, listed below" and discard the 40-page ledger that produced it. Leave out tools the current task cannot use; their schemas are pure noise on this turn. Leave out skills that are not relevant; that is the entire point of dynamic loading. Leave out stale history that no longer affects the next decision. Each exclusion reclaims token budget for signal, and the cumulative effect on accuracy is large.

A practical test for any piece of context: "does this change what the model should do next?" If the answer is no, it does not belong in this turn. Detailed recaps of completed sub-tasks, exploratory dead-ends, and verbose successful results all tend to fail this test and can be compacted to a sentence or dropped entirely.
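The test above can be sketched as a simple filter. The judgment itself ("does this change the next step?") is not automated here; the hypothetical `affects_next_step` flag is assumed to be set by whatever produced the item, and `one_line` is an optional compact form to keep in place of the full text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextItem:
    text: str
    affects_next_step: bool           # answer to the test, supplied by the caller
    one_line: Optional[str] = None    # compact form to keep instead, if any


def prune(items):
    """Keep items that pass the test, compact the rest to a sentence or drop them."""
    kept = []
    for item in items:
        if item.affects_next_step:
            kept.append(item.text)
        elif item.one_line:
            kept.append(item.one_line)
        # Items with no bearing on the next step and no compact form are dropped.
    return kept


items = [
    ContextItem("Vendor B invoice total is still missing.", True),
    ContextItem("Full log of the vendor A check...", False, "Vendor A: reconciled."),
    ContextItem("Tried matching by PO number; no results.", False),
]
print(prune(items))
```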

Compaction: turning history into state

As a run lengthens, the architecture must convert a growing history into stable, compact state, and you can design your skills and tools to make that conversion clean. Return small, structured results so there is little to compact in the first place. Have skills periodically restate progress in a fixed format — "completed: X; pending: Y; decisions: Z" — so summarization has a clean anchor. The aim is that at any moment the agent's context reads like a sharp status memo a colleague could pick up, not a transcript no one would ever read. Well-compacted state is what lets a single agent carry a long, complex job without losing the thread.
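A fixed-format progress restatement might look like the following sketch. The `StatusMemo` class and its method names are illustrative assumptions; only the "completed / pending / decisions" layout comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class StatusMemo:
    completed: list = field(default_factory=list)
    pending: list = field(default_factory=list)
    decisions: list = field(default_factory=list)

    def finish(self, task: str, decision: str = "") -> None:
        # Move a task from pending to completed and record any decision made.
        if task in self.pending:
            self.pending.remove(task)
        self.completed.append(task)
        if decision:
            self.decisions.append(decision)

    def render(self) -> str:
        def fmt(entries):
            return ", ".join(entries) if entries else "none"

        return (
            f"completed: {fmt(self.completed)}; "
            f"pending: {fmt(self.pending)}; "
            f"decisions: {fmt(self.decisions)}"
        )


memo = StatusMemo(pending=["match invoices", "email vendor"])
memo.finish("match invoices", "treat rounding differences under $1 as a match")
print(memo.render())
```

Because the layout never changes, a later summarization step can replace the previous memo wholesale instead of re-reading the turns that produced it.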

Designing skills and tools to be context-friendly

Much of context quality is decided upstream, in how you build skills and tools rather than at runtime. A tool that returns a focused field instead of a whole record is doing context design. A skill that says "summarize your findings in three bullets before continuing" is doing context design. The patterns reinforce each other: narrow tools, small outputs, focused skills, and explicit progress restatements all conspire to keep the window clean. If you find yourself fighting context bloat at runtime, the fix usually lives one layer down, in a tool that returns too much or a skill that never summarizes.
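As a rough illustration of a narrow tool versus a broad one, the sketch below compares the payload each would add to the window. The record store, field names, and both functions are hypothetical stand-ins for a real system of record.

```python
import json

# Hypothetical in-memory store standing in for a real system of record.
_CUSTOMERS = {
    "c-1": {
        "name": "Acme Ltd",
        "balance_due": 120.5,
        "billing_address": "1 Example Street",
        "notes": "long free-text history " * 50,
    }
}


def get_customer_record(customer_id: str) -> dict:
    """Broad tool: returns the whole record, notes and all."""
    return dict(_CUSTOMERS[customer_id])


def get_customer_fields(customer_id: str, fields: list) -> dict:
    """Narrow tool: returns only the requested fields that exist."""
    record = _CUSTOMERS[customer_id]
    return {name: record[name] for name in fields if name in record}


full = json.dumps(get_customer_record("c-1"))
focused = json.dumps(get_customer_fields("c-1", ["balance_due"]))
print(len(full), len(focused))  # the focused payload is far smaller
```

The narrow tool does the compaction before the result ever reaches the model, so nothing has to be trimmed at runtime.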

What to watch for

Two anti-patterns dominate. The first is the kitchen-sink prompt — pasting every policy, every example, every tool into a permanent preamble "so the agent always has it." This degrades every turn and defeats dynamic loading; move that material into skills. The second is the never-compacted run — letting raw history pile up until the model loses the goal in the noise and starts repeating itself or contradicting earlier steps. Watching for these two and correcting them resolves the large majority of context-driven failures you will encounter.

Frequently asked questions

Why does adding more context sometimes hurt performance?

Because every token competes for the model's attention. Relevant facts buried in irrelevant bulk are easier to miss, not harder. The goal is maximal signal, not maximal information — a curated briefing rather than a filing cabinet.

What should always be in the context window?

The compacted goal and success criteria, a summary of current state, the schemas of tools usable right now, and the active skill's instructions. That lean set is usually enough; the architecture pulls in more only when a later turn requires it.

How do I keep long runs from losing the thread?

Compact history into state. Summarize older turns into a fixed-format status memo, drop raw tool outputs you have already digested, and have skills restate progress periodically. Designing tools to return small results makes this compaction nearly free.

Where do most context problems actually originate?

Usually one layer down — a tool that returns whole records instead of fields, or a skill that never summarizes. If you are fighting context bloat at runtime, fix the tool output or add a progress-summary step rather than trimming by hand each run.

Sharp context, on every call

CallSphere brings the same context discipline (lean windows, compacted state, only the relevant tools) to voice and chat agents that stay on track through long conversations, use tools when needed, and book work around the clock. See it live at callsphere.ai.
