---
title: "Context design for multi-agent Claude: what to include"
description: "What to include and leave out of each agent's context in multi-agent Claude systems: orchestrator deltas, need-to-know subagents, compaction, and shared stores."
canonical: https://callsphere.ai/blog/context-design-for-multi-agent-claude-what-to-include
category: "Agentic AI"
tags: ["agentic ai", "claude", "context engineering", "multi-agent systems", "prompt engineering", "context window"]
author: "CallSphere Team"
published: 2026-04-10T09:32:44.000Z
updated: 2026-06-06T21:47:43.623Z
---

# Context design for multi-agent Claude: what to include

> What to include and leave out of each agent's context in multi-agent Claude systems: orchestrator deltas, need-to-know subagents, compaction, and shared stores.

The hardest resource to manage in a multi-agent system isn't tokens or money — it's attention. Every agent has a finite context window, and what you put in it directly determines the quality of what comes out. In a single-agent app you can be sloppy and let the window fill; in a multi-agent system, where an orchestrator and a dozen subagents each have their own window and pass material between them, sloppy context compounds into noise, drift, and cost. This post is about the discipline of context design: deciding, per agent, what belongs in the window, what to deliberately leave out, and why the leaving-out is the harder and more valuable skill.

Context engineering is the practice of curating exactly what an agent sees — its instructions, the task, the relevant facts, and prior results — so that its limited attention is spent on what matters and not diluted by what doesn't. In multi-agent systems this becomes the central design problem, because every handoff is an opportunity to either keep context clean or pollute the next agent's window.

## The orchestrator's context: plan, not transcript

The orchestrator should hold a summary of state, never the full history. The instinct to give it everything (every subagent's complete output, the whole running log) is exactly wrong: the model attends to everything in its window, so its planning degrades as the window fills with stale detail. What the orchestrator needs each round is a tight delta: the goal, the plan, what's done, what's open, and the compact findings so far. That's enough to decide the next move and nothing more.

Concretely, compress after each round. Replace "here are five complete subagent transcripts" with "five findings, one line each; two are low-confidence." The orchestrator reasons over the compressed state and stays sharp across many rounds. A system that re-feeds raw history every round doesn't just cost more; it gets *worse* as it goes, because the signal-to-noise ratio in the orchestrator's window keeps dropping.
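A minimal sketch of that compressed delta, assuming the orchestrator's state is kept as plain Python data between rounds. The `Finding` and `OrchestratorState` types, their field names, the `0.5` low-confidence threshold, and the sample values are all illustrative assumptions, not an Anthropic API. What it shows is that the per-round prompt is rendered from compact state, never from transcripts.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    summary: str       # one-line conclusion returned by a subagent
    confidence: float  # 0.0 to 1.0, as reported by the subagent


@dataclass
class OrchestratorState:
    goal: str
    plan: list[str]
    done: list[str] = field(default_factory=list)
    open_items: list[str] = field(default_factory=list)
    findings: list[Finding] = field(default_factory=list)

    def render_delta(self, low_confidence: float = 0.5) -> str:
        """Render the compact per-round context; no transcripts are included."""
        low = sum(1 for f in self.findings if f.confidence < low_confidence)
        lines = [
            f"GOAL: {self.goal}",
            "PLAN: " + " -> ".join(self.plan),
            "DONE: " + ("; ".join(self.done) or "nothing yet"),
            "OPEN: " + ("; ".join(self.open_items) or "nothing"),
            f"FINDINGS ({len(self.findings)}, {low} low-confidence):",
        ]
        for f in self.findings:
            flag = " [low confidence]" if f.confidence < low_confidence else ""
            lines.append(f"- {f.summary}{flag}")
        return "\n".join(lines)


# Made-up sample values for illustration only.
state = OrchestratorState(
    goal="Assess vendor X for the Q3 contract",
    plan=["gather pricing", "verify claims", "recommend"],
    done=["gather pricing"],
    open_items=["verify claims"],
    findings=[
        Finding("List price is $40/seat/month", 0.9),
        Finding("Discount above 100 seats is 15%", 0.4),
    ],
)
print(state.render_delta())
```

Whatever the subagents produced, the orchestrator sees a handful of lines per round, and the low-confidence count tells it where to spend the next round.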

## The subagent's context: need-to-know only

Each subagent should receive the minimum context to do its one job. A verification subagent gets the claim and its source — not the user's original question, not the sibling claims, not the orchestrator's plan. This isn't only about saving tokens. Extra context actively harms a subagent, because irrelevant material invites it to expand its scope, second-guess its brief, or blend in concerns that aren't its job. Tight context keeps a subagent on-task; loose context makes it wander.

```mermaid
flowchart TD
  A["Full system state"] --> B{"Building context for which agent?"}
  B -->|Orchestrator| C["Goal + plan + compact findings"]
  B -->|Subagent| D["One task + need-to-know facts"]
  C --> E["Run & produce next decision"]
  D --> F["Run in isolation"]
  F --> G["Compact the result"]
  G --> H["Merge summary back into state"]
  E --> H
```
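The flow in the diagram can be sketched as one round of an orchestration loop. This is a minimal illustration under stated assumptions: state is a plain dict, and `run_subagent` and `compact` are hypothetical stand-ins for a real model call and a real summarizer. The point is the shape: each subagent sees only its task plus the facts named for it, and only compacted results are merged back.

```python
from typing import Callable

# Hypothetical signatures: a subagent takes (task, facts) and returns raw output;
# compact() reduces raw output to a short summary.
Subagent = Callable[[str, dict], str]


def run_round(
    state: dict,
    assignments: list[tuple[str, list[str]]],
    run_subagent: Subagent,
    compact: Callable[[str], str],
) -> dict:
    """Run one round: isolated subagents, compact each result, merge summaries."""
    for task, fact_keys in assignments:
        # Need-to-know: copy only the facts this task names, not the whole state.
        facts = {key: state["facts"][key] for key in fact_keys}
        raw = run_subagent(task, facts)
        # Only the compacted summary crosses back into orchestrator state.
        state["findings"].append(f"{task}: {compact(raw)}")
        if task in state["open"]:
            state["open"].remove(task)
        state["done"].append(task)
    return state


def fake_subagent(task: str, facts: dict) -> str:
    # Stand-in for a model call; a real subagent would produce far more text.
    return f"Checked {task} against {sorted(facts)}. CONCLUSION: meets target"


state = {
    "facts": {"price": "$40/seat", "sla": "99.9%", "notes": "internal notes"},
    "findings": [],
    "open": ["check sla"],
    "done": [],
}
run_round(
    state,
    [("check sla", ["sla"])],
    fake_subagent,
    lambda raw: raw.split("CONCLUSION:")[-1].strip(),
)
print(state["findings"])
```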

Build subagent context from an explicit allow-list, not by inheriting the orchestrator's window. Your spawn function should assemble a context object deliberately for each role. When you catch yourself wanting to pass "everything just in case," treat it as a signal that the subagent's task isn't crisp enough — a well-defined task tells you precisely what it needs, and that's a short list.
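A minimal sketch of an allow-list spawn helper, assuming each role's needs can be written down as a fixed set of keys. The role names, keys, sample values, and the `build_context` helper are hypothetical. Anything not on a role's list is never copied, and a missing required key fails loudly instead of being papered over with extra context.

```python
# Hypothetical roles and the only fields each one may see.
ALLOW_LISTS: dict[str, frozenset[str]] = {
    "verifier": frozenset({"claim", "source"}),
    "summarizer": frozenset({"document", "length_limit"}),
}


def build_context(role: str, available: dict) -> dict:
    """Assemble a subagent's context from its allow-list only."""
    allowed = ALLOW_LISTS[role]  # an unknown role raises KeyError on purpose
    missing = allowed - set(available)
    if missing:
        raise ValueError(f"{role} is missing required context: {sorted(missing)}")
    return {key: available[key] for key in sorted(allowed)}


everything = {
    "user_question": "Is vendor X a good fit?",
    "plan": ["gather pricing", "verify claims", "recommend"],
    "sibling_claims": ["claim B", "claim C"],
    "claim": "Vendor X is SOC 2 Type II certified",
    "source": "https://example.com/vendor-x/trust",
}
print(build_context("verifier", everything))
# {'claim': 'Vendor X is SOC 2 Type II certified', 'source': 'https://example.com/vendor-x/trust'}
```

The verifier gets the claim and its source; the user's question, the plan, and the sibling claims stay behind even though they were available.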

## What to deliberately leave out

The valuable skill is subtraction. Several categories almost always belong outside an agent's window. Leave out **raw tool output** once it's been distilled — keep the finding, drop the 5,000-token API dump. Leave out **resolved history** — completed subtasks become a one-line "done" entry, not a replayed transcript. Leave out **sibling state** a subagent doesn't act on — it only invites scope creep. Leave out **credentials and secrets** entirely; they belong in the tool layer, never in context where they can leak into a log or handoff.
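The same list can be enforced mechanically at a handoff. Below is a minimal sketch, assuming context travels as a dict and that history entries carry `status` and `task` fields; the key names and the name-based secret check are illustrative assumptions. A name check is only a backstop for keeping credentials in the tool layer, not a substitute for it.

```python
# Hypothetical key names for categories that should not cross a handoff.
DROP_KEYS = {"raw_tool_output", "sibling_state"}
# Heuristic only: key names that suggest a credential slipped into context.
SECRET_HINTS = ("api_key", "access_token", "password", "secret")


def prune_for_handoff(context: dict) -> dict:
    """Return a copy of context with the leave-out categories removed."""
    pruned = {}
    for key, value in context.items():
        lowered = key.lower()
        if any(hint in lowered for hint in SECRET_HINTS):
            # Credentials should never have reached context; fail loudly.
            raise ValueError(f"secret-like key in context: {key!r}")
        if key in DROP_KEYS:
            continue
        if key == "history":
            # Resolved subtasks collapse to one line; open ones are kept whole.
            value = [
                f"done: {item['task']}" if item["status"] == "done" else item
                for item in value
            ]
        pruned[key] = value
    return pruned


ctx = {
    "task": "Verify claim A",
    "raw_tool_output": "<5,000 tokens of JSON>",
    "sibling_state": {"claim B": "in progress"},
    "history": [
        {"status": "done", "task": "fetch pricing page", "transcript": "long transcript"},
        {"status": "open", "task": "verify claim A"},
    ],
}
print(prune_for_handoff(ctx))
```

Raising on a secret-like key, rather than silently dropping it, surfaces the upstream bug that let a credential into context in the first place.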

There's a counterintuitive corollary: more context does not mean better answers. Past a point, additional material crowds out the signal and the agent's quality drops even though its window isn't technically full. The teams that ship reliable multi-agent systems are ruthless about removal — they treat every token in an agent's context as something that has to earn its place, and they delete aggressively at every handoff boundary.

## Compaction at the handoff boundary

The handoff between a subagent and the orchestrator is the most important compaction point in the system, the `G` step in the diagram. A subagent might consume a large context window doing its work, but it must return a small, distilled result; that's the context-firewall property that makes a multi-agent design worthwhile. Enforce it twice: instruct the subagent to summarize in its prompt ("return at most 200 words plus sources"), and validate the size in code, re-asking for a tighter summary if it overflows.
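The code half of that enforcement can be a small wrapper around whatever function actually calls the model. In this sketch, `ask` is a hypothetical callable that sends a prompt to the subagent and returns its text, `bounded_summary` is an illustrative name, the 200-word budget comes from the example prompt above, and counting words with `str.split` is a rough stand-in for a real token count.

```python
from typing import Callable

MAX_WORDS = 200


def bounded_summary(ask: Callable[[str], str], brief: str, max_retries: int = 2) -> str:
    """Ask for a summary and re-ask with a tighter instruction if it overflows."""
    prompt = f"{brief}\n\nReturn at most {MAX_WORDS} words plus sources."
    reply = ask(prompt)
    for _ in range(max_retries):
        if len(reply.split()) <= MAX_WORDS:
            return reply
        # Over budget: ask again, naming the overflow and what to keep.
        prompt = (
            f"Your previous answer was {len(reply.split())} words. "
            f"Rewrite it in at most {MAX_WORDS} words, keeping only the "
            f"conclusion and its sources.\n\n{reply}"
        )
        reply = ask(prompt)
    if len(reply.split()) > MAX_WORDS:
        raise ValueError("subagent summary still over budget after retries")
    return reply


# Usage with a fake subagent: the first reply overflows, the retry fits.
replies = iter(["word " * 350, "Claim holds. Source: vendor trust page."])
print(bounded_summary(lambda prompt: next(replies), "Verify claim A"))
```

Raising after the retries, rather than silently truncating, keeps an oversized handoff from quietly reaching the orchestrator; whether to raise, truncate, or fall back is a judgment call for each system.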

Good compaction is lossy on purpose. The subagent throws away its reasoning steps, its dead ends, its raw sources, and keeps the conclusion and the evidence that supports it. The skill is deciding what the orchestrator actually needs to act — usually a claim, a confidence, and a citation — and discarding everything that merely explains how the subagent got there. If your handoffs preserve reasoning instead of conclusions, your orchestrator's window fills with other agents' thinking and the whole system bloats.
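One way to make that lossiness explicit is to give the subagent's full work record and its handoff different types, so that a single projection function is the only path across the boundary. The field names below are assumptions chosen to mirror the paragraph above (reasoning, dead ends, raw sources, claim, confidence, citation), and the sample values are made up; this is a sketch, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class WorkRecord:
    """Everything a subagent accumulated while working (stays in its window)."""
    reasoning_steps: list[str]
    dead_ends: list[str]
    raw_sources: list[str]
    claim: str
    confidence: float
    citation: str


@dataclass(frozen=True)
class Handoff:
    """The only thing that crosses back to the orchestrator."""
    claim: str
    confidence: float
    citation: str


def compact(record: WorkRecord) -> Handoff:
    # Deliberately lossy: reasoning, dead ends, and raw sources are dropped.
    return Handoff(record.claim, record.confidence, record.citation)


record = WorkRecord(
    reasoning_steps=["read trust page", "cross-checked auditor listing"],
    dead_ends=["press release was undated"],
    raw_sources=["<full HTML of trust page>", "<auditor PDF text>"],
    claim="Vendor X holds a current SOC 2 Type II report",
    confidence=0.8,
    citation="https://example.com/vendor-x/trust",
)
print(compact(record))
```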

## Persistent context: the shared store

Some context outlives any single window — a large document set, accumulated findings, a project's state across many rounds. That belongs in a shared store (a file, a database, a vector index), not in any agent's context. Agents pull from it on demand through a tool call and pull only the slice they need. This keeps windows small while making large, durable context available, and it's the backbone of long-running and blackboard-style multi-agent patterns where you can't predict in advance what each agent will need.
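A minimal sketch of a shared store, using Python's built-in `sqlite3` with an in-memory database as a stand-in for whatever file, database, or index a real system would use. The `findings` table, the sample rows, and the `write_finding` and `read_findings` helpers are hypothetical; `read_findings` is the kind of handler an agent would reach through a tool call. The store holds everything, while each call returns one topic's capped slice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (topic TEXT, summary TEXT, source TEXT)")


def write_finding(topic: str, summary: str, source: str) -> None:
    """Persist a compact finding so it outlives any single context window."""
    conn.execute("INSERT INTO findings VALUES (?, ?, ?)", (topic, summary, source))
    conn.commit()


def read_findings(topic: str, limit: int = 5) -> list[tuple[str, str]]:
    """Tool-call handler: return only the slice for one topic, capped in size."""
    rows = conn.execute(
        "SELECT summary, source FROM findings WHERE topic = ? ORDER BY rowid LIMIT ?",
        (topic, limit),
    )
    return rows.fetchall()


write_finding("pricing", "List price is per seat, billed monthly", "pricing page")
write_finding("security", "SOC 2 Type II report available on request", "trust page")
write_finding("pricing", "Volume discount starts at 100 seats", "sales deck")

print(read_findings("security"))
```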

The discipline that makes a shared store work is retrieval over inclusion: agents fetch the specific record relevant to their task rather than loading the whole store into context. A retrieval step that pulls the three relevant passages beats dumping a hundred passages and hoping the agent finds the right ones — and it keeps the window clean for the actual reasoning. Treat the shared store as the system's memory and each agent's context as a small, freshly-assembled workspace drawn from it.
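Retrieval over inclusion can be sketched with a deliberately simple scorer. This example ranks passages by word overlap with the task (ignoring a small stop-word list) and keeps at most the top three that match at all. That scoring rule is an assumption standing in for a real retriever such as a vector index, and `retrieve`, `tokenize`, and the sample passages are hypothetical.

```python
import re

# Small illustrative stop-word list so filler words don't count as matches.
STOPWORDS = {"a", "an", "the", "is", "are", "for", "of", "to", "what", "on"}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(task: str, passages: list[str], k: int = 3) -> list[str]:
    """Return up to k passages sharing the most words with the task, best first."""
    task_words = tokenize(task)
    scored = [(len(task_words & tokenize(p)), i, p) for i, p in enumerate(passages)]
    # Highest overlap first; the original index breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [p for score, _, p in scored[:k] if score > 0]


store = [
    "Refund policy: refunds are issued within 30 days of purchase.",
    "Office hours are 9am to 5pm on weekdays.",
    "Refunds for annual plans are prorated.",
    "The API rate limit is 100 requests per minute.",
    "Purchase orders require a signed quote.",
]
workspace = retrieve("What is the refund policy for annual plans?", store)
print(workspace)
```

Only the two refund passages reach the agent's workspace; the unrelated passages never enter its window, even though they sit in the same store.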

## Frequently asked questions

### What is context engineering in a multi-agent system?

Context engineering is the practice of curating exactly what each agent sees — instructions, task, relevant facts, prior results — so its limited attention is spent on what matters. In multi-agent systems it's the central design problem, because every handoff either keeps the next agent's context clean or pollutes it.

### Does giving an agent more context improve its answers?

No — past a point it hurts. Extra material crowds out the signal, dilutes attention, and invites subagents to wander off their brief, so quality can drop even when the window isn't technically full. Ruthless subtraction at every handoff is what keeps quality high.

### What should I leave out of a subagent's context?

Raw tool output once distilled, resolved history that can become a one-line "done," sibling state the subagent doesn't act on, and all credentials. Give it only the one task and the need-to-know facts, assembled from an explicit allow-list rather than inherited from the orchestrator.

### Where should context that outlives a single window go?

In a shared store — a file, database, or vector index — that agents query on demand, pulling only the slice they need. This keeps every agent's window small while making large, durable context available, and it underpins long-running and blackboard coordination patterns.

## Bringing agentic AI to your phone lines

Tight context and clean handoffs are what keep a live agent fast and on-script. CallSphere applies the same context-engineering discipline to **voice and chat** — agents that answer every call, use tools mid-conversation, and book work 24/7 without losing the thread. See it at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*
