Context Design for Claude Cowork in the Enterprise
Prompt and context design for enterprise Claude Cowork: what to put in context, what to retrieve on demand, and how to cache the stable prefix.
Every enterprise Claude Cowork deployment eventually hits the same wall: the agent has access to too much, knows too much, and is asked to hold all of it in its head at once. Accuracy drops, cost climbs, and behavior gets erratic. The cure is not a bigger model or a longer prompt. It is deliberate context design — being intentional about what enters the model's working window, what stays out, and why. This is the skill that most separates teams whose agents are reliable from teams whose agents are merely impressive in demos.
This post is about context as a designed resource. I will be concrete about what belongs in context, what should be retrieved on demand instead, and the failure modes you get when you confuse the two.
Key takeaways
- Context is a budget, not a bucket — every token you add competes for the model's attention.
- Put stable, decision-shaping material in context: policies, the current task, and the procedure. Leave bulk data for on-demand retrieval.
- Lazy-loaded skills and tool schemas keep the window lean and the planner sharp.
- Too much context degrades reasoning as surely as too little — irrelevant text is a distractor.
- Design context in layers by stability so you can cache the stable prefix and pay for it once.
Why more context makes agents worse, not better
It is tempting to treat a large context window as a reason to dump everything in: all the policies, the whole knowledge base, every past transcript. The instinct is wrong. A model's attention is finite even when its window is large, and every irrelevant passage is a distractor that competes with the signal. Dump the entire HR handbook into a reconciliation task and you have not helped the model; you have buried the three lines that mattered under thousands that did not.
This effect is easy to underestimate. As irrelevant context grows, the model spends more of its effective attention separating signal from noise, and accuracy on the actual task degrades: not catastrophically, but enough to turn a reliable workflow into an occasionally wrong one. Add the cost of re-sending every extra token on every turn, which grows linearly with both the amount of extra text and the number of turns, and overfilling context becomes the rare decision that is worse for quality and worse for the bill at the same time. Lean context is not a compromise between accuracy and cost; it improves both at once.
Context design is the practice of deciding what an agent should hold in its working window versus what it should retrieve or compute on demand. The right answer is almost always less than you think. The art is in choosing the small set of material that actually shapes the next decision and trusting tools and retrieval for the rest.
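To make the linear-cost point concrete, here is a small arithmetic sketch. The function name, the token counts, and the price per million input tokens are all illustrative assumptions rather than figures from any real price list, and the calculation assumes no caching.

```python
def resend_cost(extra_tokens: int, turns: int, usd_per_million_input_tokens: float) -> float:
    """Cost of re-sending unused context on every turn (assumes no caching)."""
    total_tokens = extra_tokens * turns  # the same text is billed again each turn
    return total_tokens / 1_000_000 * usd_per_million_input_tokens


# Assumed numbers: 50,000 tokens of irrelevant text, a 20-turn run,
# and a hypothetical input price of $3 per million tokens.
cost = resend_cost(extra_tokens=50_000, turns=20, usd_per_million_input_tokens=3.0)
print(f"${cost:.2f} of input spend per run on text the task never used")
```

Doubling either the extra text or the number of turns doubles the figure, which is why trimming context pays off most on long, multi-turn runs.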
What belongs in context
Three kinds of material earn a place in the working window. First, the current task: the specific request and the data it directly operates on. Second, the procedure: the skill instructions for how to do this task, loaded when the task matches. Third, the governing policy: the rules the agent must not violate, such as spend thresholds or tone requirements. These three shape every decision the agent makes, so they pay for their tokens.
flowchart TD
A["Incoming request"] --> B{"Does it shape the next decision?"}
B -->|Yes: task, policy, procedure| C["Load into context"]
B -->|No: bulk data, archives| D["Leave out, retrieve on demand"]
C --> E["Agent plans with lean context"]
D --> F["Tool/retrieval fetches just-in-time"]
E --> G{"Need a fact not in context?"}
G -->|Yes| F
G -->|No| H["Answer or act"]
F --> H
The decision rule in the diagram is the whole discipline: if a piece of information shapes the next decision, it goes in context; if it is bulk reference the agent might occasionally need, it stays out and gets retrieved on demand. A skill's trigger description and a tool's schema are themselves examples of just-in-time context: they enter the window only when relevant.
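The decision rule can be sketched as a simple partition. This is a minimal illustration, not a real Claude Cowork API: the `ContextItem` type, the `shapes_next_decision` label, and the token counts are all hypothetical, and in practice the label comes from a human auditing the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    name: str
    shapes_next_decision: bool  # hypothetical label assigned during a prompt audit
    approx_tokens: int          # made-up sizes, for illustration only


def partition(items):
    """Split items into (load into context, retrieve on demand)."""
    in_context = [i for i in items if i.shapes_next_decision]
    on_demand = [i for i in items if not i.shapes_next_decision]
    return in_context, on_demand


items = [
    ContextItem("current request", True, 400),
    ContextItem("reconciliation procedure", True, 1_200),
    ContextItem("spend policy", True, 600),
    ContextItem("full customer database", False, 2_000_000),
    ContextItem("last quarter's transcripts", False, 350_000),
]

in_context, on_demand = partition(items)
print("Load:", [i.name for i in in_context])
print("Retrieve on demand:", [i.name for i in on_demand])
print(sum(i.approx_tokens for i in in_context), "tokens in the working window")
```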
What to leave out
Leave out anything large, anything rarely relevant, and anything the agent can fetch precisely when needed. The full customer database does not belong in context; a tool that looks up one customer does. Last quarter's hundred transcripts do not belong in context; a retrieval step that surfaces the two relevant ones does. Reference documents the agent reads once in a while belong behind a tool, not in the prompt.
The general principle: prefer a tool call that fetches the exact slice over preloading the whole corpus. This keeps the window small, keeps the model focused, and as a bonus keeps cost down because you are not re-sending megabytes of unused text on every turn.
| Material | In context? | Why |
|---|---|---|
| Current request & its data | Yes | Shapes every step |
| Skill procedure | Yes (lazy) | Loads when task matches |
| Spend / policy rules | Yes | Guards every action |
| Full database | No | Fetch one record via tool |
| Document archive | No | Retrieve the relevant slice |
| Old transcripts | No | Retrieve the relevant few, or summarize |
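The "fetch one record via tool" row can be illustrated with a toy lookup. Everything here is an assumption made for the sketch: the `CUSTOMERS` dictionary stands in for a real database, and `lookup_customer` is a hypothetical tool body, not part of any actual tool catalog.

```python
import json

# Hypothetical stand-in for a customer database with 1,000 records.
CUSTOMERS = {
    f"C-{n}": {"name": f"Customer {n}", "balance": float(n), "status": "active"}
    for n in range(1000, 2000)
}


def lookup_customer(customer_id: str) -> str:
    """Tool body: return exactly one record as JSON, or a small error object."""
    record = CUSTOMERS.get(customer_id)
    if record is None:
        return json.dumps({"error": f"unknown customer {customer_id}"})
    return json.dumps({"id": customer_id, **record})


preloaded = json.dumps(CUSTOMERS)      # what "dump it all in" would send every turn
fetched = lookup_customer("C-1002")    # what the tool sends when asked

print(len(preloaded), "characters if the whole table is preloaded")
print(len(fetched), "characters for the one record the task needs")
```

The design choice is that the tool's result is the only part that ever enters the window, so the size of the underlying store stops mattering to the prompt.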
Layer context by stability for caching
Beyond what goes in, how you order it matters for cost. Arrange context so the stable parts come first (system instructions, governing policy, the procedure) and the volatile per-run data comes last. Because the stable prefix does not change between runs, it can be served from the prompt cache: you pay full price to write it once, and later calls that reuse it while the cache entry is still live read it at a reduced rate instead of paying full price again. Put a timestamp or the task data at the front and the prefix differs on every call, so nothing after it can be reused and you pay full freight repeatedly.
This is a concrete lever: a finance team running the same reconciliation skill fifty times a month with a large stable policy prefix can cut a real fraction of token cost simply by ordering context stable-first and letting the cache do its job. The saving depends on timing, because cache entries expire after a period of inactivity; runs batched close together benefit far more than runs spread evenly across the month.
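A rough way to see why ordering matters is to compare how much of the prompt two runs share from the start. The sketch below is only an illustration: the layer strings and function names are invented, and real prompt caches match on provider-specific token or block boundaries rather than on characters.

```python
SYSTEM = "You are a reconciliation assistant."                          # stable
POLICY = "Never approve spend above the configured threshold."          # stable
PROCEDURE = "1. Load ledger. 2. Match entries. 3. Report mismatches."   # stable
STABLE_LAYERS = [SYSTEM, POLICY, PROCEDURE]


def build_prompt(task_data: str, stable_first: bool = True) -> str:
    """Assemble the prompt with the volatile task data last (or first, for contrast)."""
    layers = STABLE_LAYERS + [task_data] if stable_first else [task_data] + STABLE_LAYERS
    return "\n\n".join(layers)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


task_a = "Run 2024-05-01: ledger batch 17"
task_b = "Run 2024-05-02: ledger batch 18"
stable_len = len("\n\n".join(STABLE_LAYERS))

good = shared_prefix_len(build_prompt(task_a), build_prompt(task_b))
bad = shared_prefix_len(build_prompt(task_a, False), build_prompt(task_b, False))

print(f"stable-first: {good} shared leading characters (stable block is {stable_len})")
print(f"volatile-first: {bad} shared leading characters")
```

With stable-first ordering the whole stable block is a common prefix across runs; with volatile-first ordering the prompts diverge inside the date, before any stable text appears.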
Compaction: managing context that grows mid-run
Long tasks accumulate context — tool results, intermediate drafts, sub-agent reports — and can fill the window. The pattern is compaction: periodically replace verbose intermediate material with a tight summary that preserves the decisions and drops the noise. A sub-agent returning a one-paragraph summary instead of its full transcript is compaction in action, which is one reason isolated sub-agents keep long runs coherent.
The judgment call in compaction is what counts as a decision worth keeping. The rule is to preserve anything that constrains future steps — the exceptions found, the choices already made, the values the rest of the task depends on — and discard the working material that produced them. The raw query that found four mismatches can go; the fact that there are four specific mismatches must stay. Done well, compaction lets a forty-step task run in roughly the context footprint of a five-step one, because the window holds conclusions rather than the path to them. Done badly, it silently drops a constraint and the agent contradicts a decision it made twenty steps earlier, which is why summaries should be explicit about what was decided, not just what happened.
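The keep-the-decisions rule can be sketched as follows. This is a deliberately simple illustration under stated assumptions: steps are pre-labeled as `"decision"` or `"working"` (in a real run a model or a human would make that call), the invoice numbers are made up, and the summary is plain string concatenation rather than a model-written summary.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str  # "decision" constrains later steps; "working" is scratch material
    text: str


def compact(history, keep_recent=2):
    """Replace older steps with one summary that states every decision explicitly."""
    cut = len(history) - keep_recent
    if cut <= 0:
        return list(history)  # nothing old enough to compact
    older, recent = history[:cut], history[cut:]
    decisions = [s.text for s in older if s.kind == "decision"]
    body = "; ".join(decisions) if decisions else "(no decisions yet)"
    return [Step("decision", "Decided so far: " + body)] + recent


history = [
    Step("working", "Raw query joining ledger and bank feed"),
    Step("working", "Thousands of rows of query output"),
    Step("decision", "Four mismatches found: invoices 1041, 1077, 1102, 1130"),
    Step("working", "Draft escalation email v1"),
    Step("decision", "Escalate invoice 1102 to the finance lead"),
    Step("working", "Draft escalation email v2"),
]

for step in compact(history):
    print(f"[{step.kind}] {step.text}")
```

The raw query and its output disappear, while the four specific mismatches survive in the summary, which is itself marked as a decision so a later compaction pass does not drop it.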
Design your context in five steps
- List everything currently in your agent's prompt and label each item: shapes-decision or bulk-reference.
- Move every bulk-reference item behind a tool or retrieval step.
- Reorder what remains stable-first, volatile-last, so the prefix is cacheable.
- Convert always-on reference text into lazy-loaded skills triggered by task descriptions.
- Add compaction for long runs — summarize intermediate results and isolate large subtasks into sub-agents.
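Step four, converting always-on reference text into lazy-loaded skills, can be sketched like this. The `SKILLS` registry, its trigger words, and the keyword matching are all assumptions for illustration; in a real deployment the model typically chooses a skill from its description rather than from a word overlap.

```python
# Hypothetical skill registry: short trigger sets are cheap to keep around,
# the full instructions are only loaded when a task matches.
SKILLS = {
    "reconciliation": {
        "trigger": {"reconcile", "ledger", "mismatch"},
        "instructions": "Reconciliation procedure: load ledger, match entries, report mismatches.",
    },
    "expense_report": {
        "trigger": {"expense", "receipt", "reimbursement"},
        "instructions": "Expense procedure: collect receipts, check limits, file the report.",
    },
}


def select_skills(task: str):
    """Return names of skills whose trigger words appear in the task."""
    words = set(task.lower().split())
    return [name for name, skill in SKILLS.items() if words & skill["trigger"]]


def build_context(task: str):
    """Load only matching skill instructions, then the task itself (volatile-last)."""
    loaded = select_skills(task)
    parts = [SKILLS[name]["instructions"] for name in loaded] + [task]
    return loaded, "\n\n".join(parts)


loaded, context = build_context("Reconcile the March ledger against the bank feed")
print("Loaded skills:", loaded)
print(context)
```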
Common pitfalls
- Treating the window as storage. A large context window invites dumping. Curate; irrelevant text actively degrades reasoning.
- Preloading the knowledge base. Whole corpora belong behind retrieval. Load the exact slice the task needs, not everything.
- Volatile-first ordering. Putting timestamps or task data before stable instructions breaks caching and inflates cost.
- Always-on skills. Skills meant to trigger lazily, left always loaded, crowd the window and dull the planner.
- Never compacting. Long runs that never summarize intermediate output fill the window and lose the thread. Summarize and isolate.
Frequently asked questions
Does a 1M-token window mean I can stop worrying about context design?
No. A large window raises the ceiling but not the rule: attention is finite, irrelevant text distracts, and you pay for every token. Big windows make context design more important, not less, because the temptation to overfill is greater.
How do I decide if something belongs in context or behind a tool?
Ask whether it shapes the next decision. The current task, the procedure, and the governing policy do, so they go in context. Bulk data the agent only sometimes needs goes behind a tool that fetches the exact slice on demand.
What is prompt caching's role here?
Prompt caching lets you pay full price once for a stable context prefix and a reduced rate when later runs reuse it before the cache entry expires. Order your context stable-first so policies and procedures sit in the cacheable prefix, and keep volatile task data at the end where it cannot invalidate the cache.
How does context design relate to sub-agents?
Sub-agents are a context-design tool: they run a large subtask in an isolated window and return a compact summary, which keeps the orchestrator's context clean. Use them when a subtask would otherwise flood the main window with intermediate noise.
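A toy version of that isolation looks like the sketch below. The `run_subagent` function, the documents, and the keyword check are hypothetical stand-ins for a real sub-agent doing model-driven work; the point is only that the intermediate material stays in the sub-agent's local list and the orchestrator receives one line.

```python
def run_subagent(task: str, documents: dict):
    """Hypothetical sub-agent: reads everything locally, returns only a summary."""
    local_context = [task]
    hits = []
    for doc_id, text in documents.items():
        local_context.append(text)  # intermediate material never leaves this function
        if "overdue" in text:
            hits.append(doc_id)
    summary = (
        f"{len(hits)} of {len(documents)} documents mention overdue items: "
        + ", ".join(sorted(hits))
    )
    return summary, len(local_context)


documents = {
    "inv-1": "Invoice paid in full",
    "inv-2": "Invoice overdue by 30 days",
    "inv-3": "Invoice pending approval",
    "inv-4": "Invoice overdue by 5 days",
}

orchestrator_context = ["Policy: escalate overdue invoices.", "Task: review open invoices."]
summary, subagent_items = run_subagent("Find overdue invoices", documents)
orchestrator_context.append(summary)

print(summary)
print(f"sub-agent held {subagent_items} items; orchestrator holds {len(orchestrator_context)}")
```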
Bringing agentic AI to your phone lines
CallSphere applies this same context discipline to voice and chat — agents that carry only what the call needs, retrieve account details on demand, and book work without losing the thread. See lean, on-policy agents live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.