Multi-agent coordination patterns: the architecture inside
How multi-agent systems are wired with Claude: orchestrators, subagents as context firewalls, state transport, and the five coordination patterns that matter.
The first time you split a hard task across several Claude agents, something surprising happens: the system gets harder to reason about, not easier. You traded one model call you could read top-to-bottom for a swarm of agents passing messages you cannot see. The win (parallelism, specialization, larger effective working memory) is real, but it only shows up when the underlying architecture is deliberate. This post is a tour of that architecture: the moving parts, how they fit together end to end, and the five coordination patterns you choose between when you design a system on Claude Code, the Claude Agent SDK, or your own loop over the Anthropic API.
A multi-agent system is a set of LLM-driven agents — each with its own context window, tools, and instructions — that cooperate on a task by exchanging messages or results under some coordination scheme. The architecture is the answer to three questions: who decides what runs, how state moves between agents, and where the loop terminates. Get those three right and the rest is plumbing.
The pieces, and what each one owns
Every multi-agent build on Claude has the same skeleton, even when the framework hides it. At the top there is an orchestrator: a Claude instance whose job is to decompose the task, decide which subagents to spawn, and synthesize their outputs. Below it are subagents, each a fresh Claude context with a narrow brief — "research this sub-question", "write this file", "verify this claim". Around both sits the tool layer (functions, MCP servers, file I/O) and a context store that holds anything that must outlive a single agent's window.
The reason this separation matters is the context window. Even with Claude's large windows, an agent that does everything eventually fills its context with tool output, half-finished reasoning, and stale plans, and its quality degrades. Subagents exist primarily to contain context: a research subagent can burn 80,000 tokens chasing a question and hand back a 300-token summary, and the orchestrator never pays for the 80,000. This is the single most important architectural property of the pattern — subagents are context firewalls as much as they are workers.
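To make the firewall idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `run_research_subagent` and `SubagentResult` are hypothetical names, the model call is replaced by a plain string search, and word count stands in for tokens. The only point is the shape of the exchange, where a large private working set goes in and a short summary comes out.

```python
from dataclasses import dataclass


@dataclass
class SubagentResult:
    summary: str      # the only text that reaches the orchestrator
    tokens_used: int  # spent inside the subagent's own context


def run_research_subagent(question: str, documents: list) -> SubagentResult:
    """Hypothetical subagent: reads everything, returns a distilled finding."""
    # Stand-in for a real model call. Word count is a crude proxy for tokens.
    tokens_used = sum(len(doc.split()) for doc in documents)
    relevant = [doc for doc in documents if question.lower() in doc.lower()]
    summary = f"{len(relevant)} of {len(documents)} documents mention '{question}'."
    return SubagentResult(summary=summary, tokens_used=tokens_used)


# The orchestrator's context only ever receives the summary.
orchestrator_context = []
docs = ["Retry logic lives in client.py " * 50, "Unrelated notes " * 50]
result = run_research_subagent("retry logic", docs)
orchestrator_context.append(result.summary)
```

The raw documents never enter `orchestrator_context`, which is the property the paragraph above describes.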
The orchestrator owns the plan and the termination condition. Subagents own their slice and nothing else. The context store owns durable facts. When teams get this blurry — when subagents start re-planning, or the orchestrator starts doing the work itself — coordination breaks down and token cost explodes.
How a request flows end to end
Trace a single request through an orchestrator–subagent system and the architecture becomes concrete. The orchestrator receives the goal, produces a plan, and dispatches subagents, often in parallel where the runtime allows it (Claude Code, for example, can run subagents concurrently). Each subagent works in isolation, calls its tools, and returns a compact result. The orchestrator collects results, decides whether the goal is met, and either synthesizes a final answer or spawns another round.
flowchart TD
A["Goal arrives"] --> B["Orchestrator: decompose & plan"]
B --> C{"Subtasks independent?"}
C -->|Yes| D["Spawn subagents in parallel"]
C -->|No| E["Run subagents in sequence"]
D --> F["Subagent: own context + tools"]
E --> F
F --> G["Return compact result"]
G --> H{"Goal satisfied?"}
H -->|No| B
H -->|Yes| I["Orchestrator synthesizes answer"]
The loop back from H to B is where most of the engineering lives. A naive system re-plans from scratch every round and thrashes; a good one feeds the orchestrator a tight delta — what's done, what failed, what's left — so each round narrows the problem. The compaction step at G is equally load-bearing: if subagents return raw transcripts instead of distilled findings, you lose the context-firewall benefit entirely.
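One way to sketch that loop, under stated assumptions: `run_subagent` is a hypothetical stand-in for a real agent call, `Delta` is an invented name for the done/failed/left record, and threads stand in for whatever concurrency your runtime provides. The sketch shows parallel dispatch, a compact return value per subtask, and a next round that only sees what is still open.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Delta:
    """What the orchestrator carries between rounds instead of a full re-plan."""
    done: dict = field(default_factory=dict)        # subtask -> compact result
    failed: dict = field(default_factory=dict)      # subtask -> last error
    remaining: list = field(default_factory=list)   # subtasks still open


def run_subagent(subtask: str, round_no: int) -> str:
    """Hypothetical subagent; 'flaky' fails in round 0 to exercise the retry path."""
    if subtask == "flaky" and round_no == 0:
        raise RuntimeError("tool timeout")
    return f"compact result for {subtask}"


def orchestrate(subtasks: list, max_rounds: int = 3) -> Delta:
    delta = Delta(remaining=list(subtasks))
    for round_no in range(max_rounds):
        if not delta.remaining:  # goal satisfied: nothing left to do
            break
        # Independent subtasks are dispatched in parallel; leaving the
        # with-block waits for all of them to finish.
        with ThreadPoolExecutor(max_workers=len(delta.remaining)) as pool:
            futures = {s: pool.submit(run_subagent, s, round_no)
                       for s in delta.remaining}
        still_open = []
        for subtask, future in futures.items():
            try:
                delta.done[subtask] = future.result()
                delta.failed.pop(subtask, None)  # a retry succeeded
            except Exception as exc:
                delta.failed[subtask] = str(exc)
                still_open.append(subtask)
        delta.remaining = still_open  # the next round only sees what is left
    return delta


delta = orchestrate(["a", "flaky", "b"])
```

In a real system the `Delta` would be rendered into the orchestrator's next prompt; here it is only a data structure.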
The five coordination patterns
Architecture is not one shape. There are five coordination patterns worth knowing, and real systems mix them. Orchestrator–worker is the default: one planner fans work out to stateless workers and merges results. Use it when subtasks are independent and the hard part is breadth — research, multi-file refactors, gathering evidence from many sources.
Sequential pipeline chains agents where each consumes the previous one's output: extract, then transform, then validate. Use it when stages have a strict dependency order and each stage is a different kind of work. Hierarchical nests orchestrators — a top planner spawns mid-level managers that spawn their own workers — and earns its complexity only on genuinely large tasks where a single planner would itself overflow.
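The sequential pipeline is the easiest of these to sketch. In the example below, the three stage functions are hypothetical placeholders for agents (a real stage would call a model), and the input format is invented for illustration. What it shows is the dependency order: each stage consumes exactly what the previous one produced.

```python
def extract(text: str) -> list:
    """Stage 1 (hypothetical): pull raw price strings out of free text."""
    return [line.split(":", 1)[1].strip()
            for line in text.splitlines()
            if line.startswith("price:")]


def transform(values: list) -> list:
    """Stage 2 (hypothetical): turn the raw strings into numbers."""
    return [float(v) for v in values]


def validate(prices: list) -> list:
    """Stage 3 (hypothetical): reject output that later consumers cannot trust."""
    bad = [p for p in prices if p < 0]
    if bad:
        raise ValueError(f"negative prices: {bad}")
    return prices


def run_pipeline(payload, stages):
    # Strict order: each stage receives the previous stage's output.
    for stage in stages:
        payload = stage(payload)
    return payload


prices = run_pipeline("price: 9.50\nnote: ignore me\nprice: 12",
                      [extract, transform, validate])
```

Because each stage has one input and one output, a failure can be traced to a single stage rather than to the whole run.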
Peer debate / critic runs two or more agents against each other: a generator proposes, a critic finds flaws, they iterate. This pattern buys accuracy on subjective or error-prone work — code review, claim verification, design critique — at the cost of more tokens and latency. Blackboard is the most decoupled: agents read and write a shared store and react to its state rather than to direct messages, which suits long-running, open-ended problems where you cannot predict the call graph in advance.
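A generator and critic loop can be sketched in a few lines. Both `generate` and `critique` here are hypothetical stubs with hard-coded behavior, standing in for two separately prompted agents, and the iteration cap is an assumed default. The sketch shows the control flow: propose, critique, feed the issues back, and stop when the critic has nothing left or the cap is hit.

```python
def generate(task: str, feedback: list) -> str:
    """Hypothetical generator: revises its draft when the critic objects."""
    if "missing docstring" in feedback:
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b): return a + b"


def critique(draft: str) -> list:
    """Hypothetical critic: returns a list of issues, empty when satisfied."""
    issues = []
    if '"""' not in draft:
        issues.append("missing docstring")
    return issues


def debate(task: str, max_iterations: int = 3):
    feedback = []
    draft = ""
    for iteration in range(1, max_iterations + 1):
        draft = generate(task, feedback)
        issues = critique(draft)
        if not issues:            # critic is satisfied: stop early
            return draft, iteration
        feedback = issues         # next draft must address these
    return draft, max_iterations  # cap reached: return the best effort


final_draft, iterations = debate("write an add function")
```

Every extra iteration is another pair of model calls, which is where the token and latency cost mentioned above comes from.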
State: the part frameworks hide and you still must own
The trickiest architectural decision is how state moves. There are three transports and you will use all of them. Message passing — the orchestrator's prompt to a subagent, and the subagent's return value — is the cleanest and what you reach for first. Shared store — a file, a key-value table, a vector index — is for state too large or too long-lived to pass inline, and for the blackboard pattern. Tool results are implicit state: a subagent that writes to a database has changed the world for every later agent.
The failure mode is hidden coupling. When two subagents both write the same file, or one agent's tool call invalidates another's assumptions, you get nondeterministic, hard-to-reproduce bugs. The architectural defense is to make every subagent's inputs and outputs explicit, keep writes idempotent, and treat the shared store as the only sanctioned channel for cross-agent state. If a subagent needs something, it should receive it in its brief or read it from the store — never assume it inherited the orchestrator's full context, because it did not.
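As an illustration of that defense, here is a small in-memory store. The `SharedStore` class and its single-writer-per-key rule are assumptions made for this sketch, one possible policy rather than a prescribed one. Repeating a write leaves the same end state, and a second agent trying to write the same key fails loudly instead of silently clobbering it.

```python
class SharedStore:
    """In-memory stand-in for a file, key-value table, or similar store."""

    def __init__(self):
        self._data = {}
        self._owners = {}

    def put(self, key: str, value: str, writer: str) -> None:
        # The first writer claims the key; later writes must come from it.
        owner = self._owners.setdefault(key, writer)
        if owner != writer:
            raise PermissionError(
                f"{writer} cannot overwrite {key!r}, owned by {owner}")
        # Idempotent: writing the same key and value again changes nothing.
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


store = SharedStore()
store.put("findings/auth", "uses JWT", writer="researcher")
store.put("findings/auth", "uses JWT", writer="researcher")  # safe to retry

try:
    store.put("findings/auth", "uses sessions", writer="reviewer")
    conflict_detected = False
except PermissionError:
    conflict_detected = True
```

A subagent's brief would then name the keys it may read and the keys it owns, which keeps its inputs and outputs explicit.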
Where the cost and the failures concentrate
Multi-agent runs typically consume several times the tokens of a single-agent run doing the same work — each subagent re-reads its instructions, and the orchestrator pays for every round of synthesis. The architecture should earn that cost. Reach for multiple agents when the task is wide (many independent sub-questions), when you need isolation (untrusted or risky sub-steps), or when accuracy justifies a critic loop. For a task one capable agent can hold in a single window, a single agent with good tools is faster, cheaper, and easier to debug.
Failures cluster in three places: planning (the orchestrator decomposes badly), handoff (results returned too raw or too lossy), and termination (the loop never converges). Instrument all three. Log every subagent brief and return value, cap the number of rounds, and give the orchestrator an explicit "give up and report" branch. The systems that survive production are the ones where a human can read the orchestration log and understand exactly why each agent ran.
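A minimal sketch of those three safeguards, using only the standard library. The `run_subagent` stub is hypothetical and deliberately never succeeds, so the run exercises the give-up branch. The record fields and the `MAX_ROUNDS` value are likewise assumptions for illustration.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("orchestration")

MAX_ROUNDS = 3  # assumed cap; tune per task


def run_subagent(brief: str) -> dict:
    """Hypothetical subagent that never finishes, to exercise the give-up path."""
    return {"status": "incomplete", "notes": f"no answer yet for: {brief}"}


def orchestrate(goal: str, max_rounds: int = MAX_ROUNDS) -> dict:
    trail = []
    for round_no in range(1, max_rounds + 1):
        brief = f"[round {round_no}] {goal}"
        result = run_subagent(brief)
        # Log every brief and return value as one structured line.
        record = {"round": round_no, "brief": brief, "result": result}
        trail.append(record)
        log.info(json.dumps(record))
        if result["status"] == "done":
            return {"outcome": "answered", "rounds": round_no, "trail": trail}
    # Explicit "give up and report" branch: the loop always terminates.
    return {"outcome": "gave_up", "rounds": max_rounds, "trail": trail}


report = orchestrate("find the flaky test")
```

The `trail` is the orchestration log in miniature: reading it top to bottom shows what each agent was asked and what it returned.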
Frequently asked questions
What is a multi-agent system in the context of Claude?
A multi-agent system is a coordinated set of Claude instances, each with its own context window, tools, and instructions, that cooperate on a task — typically an orchestrator that decomposes the goal and subagents that each handle a slice and return compact results. Its defining benefit is that subagents act as context firewalls, doing expensive work in isolation and returning only distilled findings.
When should I not use multiple agents?
When a single Claude agent can hold the whole task in one context window without degrading. Multi-agent systems cost several times more tokens and add coordination failure modes, so for narrow tasks a single well-equipped agent is faster, cheaper, and far easier to debug.
How is the orchestrator different from a subagent?
The orchestrator owns the plan, the dispatch decisions, and the termination condition; in a well-factored system it does not do the leaf-level work itself. Subagents own one narrow brief and return a compact result. Keeping that boundary sharp is what stops token cost from exploding and keeps the system debuggable.
Which coordination pattern is the safest default?
Orchestrator–worker. It handles the common case — independent subtasks that need breadth — with the least coordination machinery, and you can layer a critic or pipeline on top of it once a specific stage demands more.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.