Prompt and context design for Claude Cowork agents

There is a counterintuitive truth at the heart of building good Claude Cowork agents: the failures rarely come from the model not knowing enough. They come from the model being shown too much, or the wrong things, at the moment it needs to decide. An agent buried in stale tool output, half-relevant instructions, and a transcript of everything it has ever done tends to reason worse than the same model handed a clean, curated window. Getting that window right is the discipline of context engineering, and it is arguably the highest-leverage skill in agentic work.

Context engineering is the practice of deciding, at each step of an agent's loop, exactly what information enters the model's limited context window and what stays out. This post is about doing it well in Cowork — what belongs in context, what to deliberately exclude, and the reasoning behind each call. The model is sharp; your job is to keep its attention pointed at the task.

The window is a stage, not a warehouse

The first mental shift is to stop treating context as storage. It is not a database you dump everything into for safekeeping; it is a stage, and only the actors needed for the current scene belong on it. Everything on the stage competes for the model's attention, and attention is finite. A fact that is technically present but surrounded by noise is a fact the model may overlook.

This reframes the whole design problem. Instead of asking "what might the model need?", which leads to hoarding, you ask "what does this exact step require?" and put only that on stage. The competitor list belongs there while the agent is gathering updates; the raw HTML of every page it already read does not. Curating ruthlessly is not throwing away information; it is making the relevant information findable. A lean window tends to produce better decisions than a crammed one, and the gap usually widens as tasks get longer, since every extra turn adds more material competing for the same attention.
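As a rough illustration of asking "what does this exact step require?", here is a minimal Python sketch in which each step names the state keys it needs and only those entries are copied into context. The state layout, the key names, and `context_for_step` are hypothetical and are not part of any Cowork API.

```python
# Hypothetical sketch (not a Cowork API): each step declares the state keys it
# needs, and only those entries are copied into the model's context.
from typing import Any


def context_for_step(state: dict[str, Any], required: list[str]) -> dict[str, Any]:
    """Return only the entries this step needs; everything else stays off stage."""
    return {key: state[key] for key in required if key in state}


# Example state for a hypothetical competitor-monitoring run.
state = {
    "user_goal": "Summarize this week's competitor pricing changes",
    "competitor_list": ["Acme", "Globex"],
    "raw_html_pages": ["<html>...</html>"] * 50,  # already read; not needed now
}

step_context = context_for_step(state, required=["user_goal", "competitor_list"])
print(sorted(step_context))  # ['competitor_list', 'user_goal']
```

Returning a new dict rather than trimming the state itself keeps the full record available outside the window while the model sees only the slice for this step.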

What belongs in context

Four things earn a place on stage. First, the durable instructions: the agent's role, the rules it must always follow, the output contract. These are short, stable, and define correct behavior, so they stay loaded throughout. Second, the live task state: what the user actually asked, the current goal, and any decisions made so far that constrain what comes next.


Third, the results of recent actions — but trimmed. The output of the last tool call is highly relevant; include the fields that matter and drop the rest. Fourth, the skill currently in play. Skills are loaded dynamically when a task matches, which means the procedure for this task is on stage while the procedures for unrelated tasks are not. That lazy loading is itself a context-design decision: it keeps deep, specific know-how available without paying for it when it is irrelevant.
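Trimming a tool result and loading only the matching skill can both be sketched in a few lines. This is a minimal Python sketch, assuming a tool result arrives as a dict and that skills sit in a hypothetical registry keyed by trigger words; it is not how Cowork's own skill loader works, and `trim_result`, `select_skill`, and `SKILLS` are illustrative names.

```python
# Minimal sketch, assuming a tool result arrives as a dict and skills live in a
# hypothetical registry of trigger words. Not the actual Cowork skill loader.
import re
from typing import Any, Optional


def trim_result(result: dict[str, Any], keep: set[str]) -> dict[str, Any]:
    """Keep only the fields the next decision depends on."""
    return {k: v for k, v in result.items() if k in keep}


# Hypothetical registry: skill name -> (trigger words, procedure text).
SKILLS: dict[str, tuple[set[str], str]] = {
    "invoice_reconciliation": ({"invoice", "reconcile"}, "Procedure: match invoices to payments..."),
    "competitor_digest": ({"competitor", "pricing"}, "Procedure: build the weekly competitor digest..."),
}


def select_skill(task: str, skills: dict[str, tuple[set[str], str]] = SKILLS) -> Optional[str]:
    """Return the procedure of the first skill whose triggers appear in the task, else None."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    for triggers, procedure in skills.values():
        if triggers & words:
            return procedure
    return None  # no match: no procedure is loaded at all


raw = {"status": 200, "headers": {"content-type": "text/html"}, "price": 49.0, "currency": "USD"}
print(trim_result(raw, {"price", "currency"}))  # {'price': 49.0, 'currency': 'USD'}
print(select_skill("Summarize competitor pricing changes."))
```

Keyword matching is only a stand-in for whatever relevance test a real loader uses; the point is that unmatched procedures never reach the window.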

flowchart TD
  A["New turn begins"] --> B["Keep: role, rules, output contract"]
  B --> C["Keep: current goal & live state"]
  C --> D["Add: trimmed recent tool result"]
  D --> E{"Old history relevant?"}
  E -->|No| F["Replace with running summary"]
  E -->|Yes| G["Keep concise excerpt"]
  F --> H["Load only the matching skill"]
  G --> H
  H --> I["Model decides next action"]
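The per-turn flow in the diagram can be sketched as a single assembly function. In this minimal Python sketch every name is hypothetical, trimming and skill matching are assumed to have happened upstream, and the "Old history relevant?" decision is simply a flag the caller supplies.

```python
# Minimal sketch of the per-turn flow in the diagram. All names are hypothetical;
# trimming and skill matching are assumed to have happened upstream, and the
# "history relevant?" decision (node E) is a flag supplied by the caller.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnInputs:
    durable_instructions: str    # node B: role, rules, output contract
    goal_and_state: str          # node C: current goal and live state
    trimmed_result: str          # node D: trimmed recent tool result
    history_excerpt: str         # node G: concise excerpt of older turns
    running_summary: str         # node F: running summary of older turns
    history_relevant: bool       # node E: does old history still matter?
    skill: Optional[str] = None  # node H: the matching skill, if any


def build_context(t: TurnInputs) -> list[str]:
    """Assemble the ordered context sections the model receives at node I."""
    sections = [t.durable_instructions, t.goal_and_state, t.trimmed_result]
    # Node E: keep an excerpt only if old history still matters; otherwise summarize.
    sections.append(t.history_excerpt if t.history_relevant else t.running_summary)
    if t.skill is not None:
        sections.append(t.skill)
    return sections


ctx = build_context(TurnInputs(
    durable_instructions="You are a research agent. Cite sources. Reply in JSON.",
    goal_and_state="Goal: weekly competitor digest. Decided: cover Acme and Globex only.",
    trimmed_result="Last result: Acme Pro tier is 49 USD.",
    history_excerpt="Turn 3: Globex pricing page returned 404; retry scheduled.",
    running_summary="Summary: both competitors checked; no blockers.",
    history_relevant=False,
))
print(len(ctx))  # 4
```

Because no skill matched in this example, nothing is appended for node H, and the verbose turn-3 excerpt never enters the context.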

What to leave out, and why

The exclusions matter as much as the inclusions. Leave out raw, unprocessed tool dumps — the thousand-row table, the full document, the verbose API response. Summarize or extract first. The model rarely needs every row; it needs the digest, and feeding it the raw dump both wastes the window and invites the model to fixate on irrelevant detail.
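One simple way to "summarize or extract first" is to aggregate a large table into a digest before it reaches the model. This is a minimal Python sketch, assuming the tool returns rows as dicts; the `status` column and the `digest_rows` helper are hypothetical examples, and a real digest would pick whatever columns the next decision depends on.

```python
# Minimal sketch: reduce a large tabular tool result to a one-line digest before
# it enters context. The row schema ("status") is a hypothetical example.
from collections import Counter
from typing import Any


def digest_rows(rows: list[dict[str, Any]], group_by: str, top_n: int = 3) -> str:
    """Summarize rows as a total count plus the most common values of one column."""
    counts = Counter(row[group_by] for row in rows)
    top = ", ".join(f"{value}={n}" for value, n in counts.most_common(top_n))
    return f"{len(rows)} rows; top {group_by} values: {top}"


rows = [{"status": "open"}] * 700 + [{"status": "closed"}] * 250 + [{"status": "blocked"}] * 50
print(digest_rows(rows, "status"))
# 1000 rows; top status values: open=700, closed=250, blocked=50
```

A thousand rows collapse into one line, which is usually all the model needs to decide what to do next.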

Leave out stale history. Once a sub-task is finished and its conclusion recorded, the blow-by-blow of how it got there is dead weight. Replace it with a one-line summary of the outcome. Leave out instructions for tasks you are not doing — that is what lazy skill loading handles. And leave out anything sensitive that does not need to be there: credentials and secrets belong in connector configuration, never in the prompt, both for security and because they are pure noise to the reasoning. The guiding question for every exclusion is the same: does the model need this to decide the next action? If not, it is off the stage.
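Keeping credentials out of the prompt is primarily a matter of configuration, but a cheap backstop check before each model call can catch mistakes. The following is a minimal Python sketch; the regular expressions are illustrative assumptions, not a complete secret scanner, and `assert_no_secrets` is a hypothetical helper.

```python
# Minimal sketch: refuse to send a prompt that looks like it contains a
# credential. The patterns are illustrative assumptions, not a complete secret
# scanner; real credentials should live in connector configuration.
import re

SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\b(?:api[_-]?key|password|secret|token)\s*[:=]\s*\S+", re.IGNORECASE),
]


def assert_no_secrets(prompt: str) -> str:
    """Return the prompt unchanged, or raise ValueError if a secret-like pattern matches."""
    for pattern in SECRET_PATTERNS:
        if pattern.search(prompt):
            raise ValueError(f"possible secret in prompt (pattern: {pattern.pattern})")
    return prompt


print(assert_no_secrets("Draft the weekly digest for Acme and Globex."))
try:
    assert_no_secrets("Log in with password: hunter2")
except ValueError as err:
    print("blocked:", err)
```

A guard like this only catches obvious leaks; it complements, rather than replaces, keeping secrets in connector configuration.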

Managing context over a long run

Short tasks rarely strain the window; long ones usually do. The technique that keeps a multi-hour workflow coherent is progressive summarization. As the run proceeds, periodically compress what is behind you into a compact state object: the decisions made, the facts established, the open questions remaining. Then carry that summary forward and let the verbose history fall away.
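Progressive summarization can be sketched as folding a buffer of recent events into a compact state object every few turns and then clearing the buffer. This minimal Python sketch assumes events arrive already tagged by kind (`decision`, `fact`, `question`, `answered`, or anything else as noise); in a real agent that tagging and summarizing step would likely be a model call. `RunState`, `compress`, and `Run` are hypothetical names.

```python
# Minimal sketch of progressive summarization. Events are assumed to arrive
# pre-tagged by kind; in practice the summarizing step would likely be a model
# call. All class and function names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class RunState:
    decisions: list[str] = field(default_factory=list)
    facts: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def compress(state: RunState, history: list[dict[str, str]]) -> RunState:
    """Fold tagged events into the state; any other kind (e.g. raw logs) is dropped."""
    for event in history:
        kind, text = event["kind"], event["text"]
        if kind == "decision":
            state.decisions.append(text)
        elif kind == "fact":
            state.facts.append(text)
        elif kind == "question":
            state.open_questions.append(text)
        elif kind == "answered" and text in state.open_questions:
            state.open_questions.remove(text)
    return state


class Run:
    def __init__(self, compress_every: int = 5):
        self.state = RunState()
        self.history: list[dict[str, str]] = []
        self.compress_every = compress_every

    def record(self, kind: str, text: str) -> None:
        self.history.append({"kind": kind, "text": text})
        if len(self.history) >= self.compress_every:
            compress(self.state, self.history)
            self.history.clear()  # the verbose path falls away


run = Run(compress_every=3)
run.record("question", "Does Globex publish prices?")
run.record("log", "Fetched Globex pricing page (312 KB)")
run.record("fact", "Globex Pro tier is 59 USD")
print(run.state.facts, run.history)  # ['Globex Pro tier is 59 USD'] []
```

The state object keeps the conclusions and the still-open question, while the fetch log, which no later step needs, is gone.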

Done well, progressive summarization is invisible: the agent behaves as if it remembers everything important because the summary does capture everything important, just without the bulk. The complementary move is offloading. When a sub-task is large and noisy, hand it to a sub-agent with its own fresh context, let it churn through the detail in isolation, and return only the conclusion. The main thread never sees the mess. Between summarization and offloading, a workflow can run far longer than its raw context budget would suggest, because at no single moment is everything loaded at once.
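Offloading can be sketched as a function that builds a fresh context for the sub-task, runs a worker on it, and hands back only the conclusion. In this minimal Python sketch the `worker` callable stands in for a hypothetical sub-agent call; `offload` and `count_price_mentions` are illustrative names, not a Cowork API.

```python
# Minimal sketch of offloading. `worker` stands in for a hypothetical sub-agent
# call: any function from a context (list of strings) to a conclusion string.
from typing import Callable


def offload(task: str, material: list[str], worker: Callable[[list[str]], str]) -> str:
    """Run `worker` on a fresh context holding only this task's material."""
    sub_context = [f"Task: {task}", *material]  # fresh: nothing from the main thread
    return worker(sub_context)                  # only the conclusion comes back


def count_price_mentions(ctx: list[str]) -> str:
    # Stand-in worker: scans the noisy material and returns a one-line conclusion.
    hits = sum("price" in line.lower() for line in ctx[1:])
    return f"Conclusion: {hits} of {len(ctx) - 1} pages mention price"


main_context = ["Goal: weekly competitor digest"]
pages = ["...price increased...", "...new logo...", "...Price cut on Pro tier..."]
main_context.append(offload("Scan competitor pages", pages, count_price_mentions))
print(main_context[-1])  # Conclusion: 2 of 3 pages mention price
```

However large `pages` grows, the main context gains exactly one line; the sub-task's working context is discarded when `offload` returns.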


Designing context for the next decision, not the archive

The unifying principle behind all of this is to design context for the next decision, not for an archive a human might later want to read. Logging and auditability are real needs, but they belong outside the model's window — in a run trace, a log, a database — not crammed into the prompt. The prompt exists for exactly one purpose: to give the model what it needs to choose its next action well.

When you internalize that, the rules stop feeling like a checklist and start feeling obvious. Include the goal and the rules because they define correct action. Include the last result because it determines the next step. Summarize the past because the conclusions matter and the details do not. Load the skill that fits and skip the ones that do not. Every choice traces back to the same question, and asking it relentlessly is what turns a model into a dependable agent.
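The split between the audit record and the model's window can be sketched as one call that writes the full event to an external trace and returns only the slice the next decision needs. This is a minimal Python sketch; the event schema and field names are hypothetical, and `io.StringIO` stands in for a real log file, trace store, or database.

```python
# Minimal sketch: the full event goes to an external trace (JSON Lines), and the
# model receives only the fields the next decision needs. The event schema and
# field names are hypothetical; io.StringIO stands in for a real log or store.
import io
import json
from typing import Any, TextIO


def record_and_project(event: dict[str, Any], trace: TextIO, prompt_fields: list[str]) -> dict[str, Any]:
    """Append the whole event to the trace; return only the prompt-relevant slice."""
    trace.write(json.dumps(event) + "\n")  # audit copy: full detail, outside the window
    return {k: event[k] for k in prompt_fields if k in event}


trace = io.StringIO()
event = {
    "tool": "fetch_page",
    "url": "https://example.com/pricing",
    "status": 200,
    "bytes": 312_000,
    "latency_ms": 840,
    "extracted_price": "59 USD",
}
prompt_slice = record_and_project(event, trace, ["tool", "extracted_price"])
print(prompt_slice)  # {'tool': 'fetch_page', 'extracted_price': '59 USD'}
```

Auditors still get every field from the trace, while the prompt carries two.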

Frequently asked questions

Won't summarizing history lose information the agent needs later?

Only if the summary is sloppy. A good running summary captures the decisions, facts, and open questions — everything that constrains future steps — while dropping the verbose path that produced them. The art is summarizing outcomes, not erasing them.

How is context engineering different from prompt engineering?

Prompt engineering crafts a single instruction well; context engineering manages what the model sees across an entire multi-turn loop, turn after turn. In agentic work the second matters more, because the agent's behavior emerges from a sequence of contexts, not one prompt.

Where should logs and audit trails live if not in context?

Outside the window — in run traces, application logs, or a database. The model's context is for deciding the next action; audit needs are served better and more cheaply by external storage that does not consume attention.

Bringing agentic AI to your phone lines

A voice agent lives or dies on what it holds in mind mid-call — the caller's intent, the last system response, the rule it must follow. CallSphere applies this context discipline to voice and chat, with agents that answer every call, use tools as they talk, and book work around the clock. Try it at callsphere.ai.
