Multi-Agent Architecture with Claude: How It Fits Together
How multi-agent systems fit together on Claude: orchestrators, isolated subagent contexts, message passing, and where the token cost actually goes.
The first time you watch a Claude orchestrator fan out work to four subagents and stitch their answers back together, it feels like magic. The second time, when one subagent silently hallucinates a file path and poisons the final report, it feels like debugging a distributed system blindfolded. That gap — between the demo and the dependable production run — lives almost entirely in the architecture. If you understand how the pieces actually connect, the failures stop being mysterious.
This post walks the full machine end to end: what an orchestrator really is, how subagents get their own context windows, how results flow back, and where the token cost hides. The goal is not a toy diagram but a working mental model you can reason about when something breaks at 2 a.m.
What a multi-agent system actually is
A multi-agent system is a setup where a coordinating agent decomposes a task, delegates sub-tasks to one or more separate agent instances that each run with their own context window and tool access, and then synthesizes their outputs into a single result. The key word is separate: each subagent is a fresh Claude invocation with its own conversation history, not a function call inside one long transcript. That isolation is the entire point — and also the source of every coordination headache.
Contrast this with a single-agent loop, where one Claude instance reads a prompt, calls tools, observes results, and keeps going until it's done. The single agent can see everything it did, because it all lives in one growing context (at least until that context fills up). A multi-agent system trades that shared memory for parallelism and focus. The orchestrator never sees the raw 50 tool calls a research subagent made; it sees only the distilled summary the subagent chose to return.
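To make the isolation concrete, here is a minimal sketch with no real model calls. `AgentContext` and `run_subagent` are hypothetical names invented for illustration; the point is only that the 50 raw tool results stay in the subagent's history while the orchestrator's history gains a single summary entry.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """One agent's private conversation history (hypothetical structure)."""
    name: str
    messages: list = field(default_factory=list)

    def record(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


def run_subagent(task: str) -> str:
    """Stand-in for a fresh model invocation that owns its own context."""
    ctx = AgentContext(name="research-subagent")
    ctx.record("user", task)
    # Pretend the subagent made 50 tool calls; the results stay in ITS context.
    for i in range(50):
        ctx.record("tool", f"raw result of tool call {i}")
    tool_results = sum(1 for m in ctx.messages if m["role"] == "tool")
    # Only this distilled string crosses back to the orchestrator.
    return f"Summary for '{task}': distilled from {tool_results} tool results"


orchestrator = AgentContext(name="orchestrator")
orchestrator.record("user", "Compare three caching libraries")
summary = run_subagent("Survey library A")
orchestrator.record("subagent_summary", summary)

print(len(orchestrator.messages))  # the orchestrator holds 2 entries, not 52
```

In a real system the stand-in would be an actual model invocation, but the shape of the boundary is the same: one string in, one string out.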
The orchestrator: planner, router, and synthesizer
The orchestrator is the brain, and it plays three distinct roles that are worth separating in your head. As a planner, it reads the user's goal and breaks it into sub-tasks — ideally ones that are independent enough to run in parallel. As a router, it decides which subagent gets which task and what context to hand each one. As a synthesizer, it collects the returns and composes the final answer, resolving conflicts when two subagents disagree.
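One way to keep the three roles separate in your head is to keep them separate in code. The sketch below is an illustration under assumptions, not how Claude Code or the Agent SDK implement anything: `SubTask`, `plan`, `route`, and `synthesize` are hypothetical names, and the "planner" here simply splits work by module instead of asking a model.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    scope: str        # the slice of the problem this subagent owns
    deliverable: str  # what it must hand back


def plan(goal: str, modules: list[str]) -> list[SubTask]:
    """Planner: one independent sub-task per module."""
    return [SubTask(scope=m, deliverable=f"findings in {m} relevant to: {goal}")
            for m in modules]


def route(tasks: list[SubTask]) -> dict[str, SubTask]:
    """Router: assign each sub-task to a named subagent, rejecting overlaps."""
    scopes = [t.scope for t in tasks]
    if len(scopes) != len(set(scopes)):
        raise ValueError("overlapping scopes: two sub-tasks own the same module")
    return {f"subagent-{i}": t for i, t in enumerate(tasks)}


def synthesize(returns: dict[str, str]) -> str:
    """Synthesizer: merge the returns in a stable, attributable order."""
    return "\n".join(f"[{name}] {text}" for name, text in sorted(returns.items()))


assignments = route(plan("find slow queries", ["billing", "search"]))
# Hypothetical subagent returns, keyed by the subagent that produced them.
returns = {name: f"no issues in {task.scope}" for name, task in assignments.items()}
report = synthesize(returns)
print(report)
```

Splitting the roles this way gives each one a place to fail loudly: the router refuses overlapping scopes instead of quietly spawning redundant work.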
The most common architectural mistake is collapsing these roles. Teams give the orchestrator a vague system prompt like "coordinate the agents" and wonder why it spawns redundant work or hands subagents overlapping scopes. A good orchestrator prompt is explicit about decomposition rules: how many subagents to use, what a clean task boundary looks like, and how to merge results. In Claude Code, the orchestrator is the main agent and subagents are spawned via the Task tool; in the Claude Agent SDK you wire this loop yourself, which gives you tighter control over how plans are formed.
One subtle internal detail: the orchestrator's context fills up with subagent summaries, not their work. So a five-subagent run might leave the orchestrator with five dense paragraphs plus its own planning notes. That's compact — but it also means the orchestrator is trusting summaries it cannot verify. Design your subagent return format so those summaries are structured and checkable.
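As a sketch of what "structured and checkable" could mean, suppose each subagent is asked to return a JSON list of findings with `claim`, `file`, and `line` keys. That schema and the `check_findings` helper are assumptions made up for this example. The orchestrator-side code can then at least confirm that every cited file exists, which would catch the hallucinated file path from the introduction.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {"claim", "file", "line"}  # hypothetical return schema


def check_findings(raw: str, repo_root: str) -> tuple[list, list]:
    """Split a subagent's JSON return into verified findings and problems."""
    try:
        findings = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"return is not valid JSON: {exc}"]
    if not isinstance(findings, list):
        return [], ["return is not a JSON list of findings"]

    verified, problems = [], []
    for item in findings:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            problems.append(f"malformed finding: {item!r}")
        elif not os.path.isfile(os.path.join(repo_root, item["file"])):
            problems.append(f"cited file does not exist: {item['file']}")
        else:
            verified.append(item)
    return verified, problems


# Demo against a throwaway directory that contains exactly one real file.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "db.py"), "w").close()
    raw = json.dumps([
        {"claim": "N+1 query", "file": "db.py", "line": 12},
        {"claim": "unused index", "file": "models/ghost.py", "line": 3},
    ])
    verified, problems = check_findings(raw, root)

print(verified)
print(problems)
```

A check like this cannot tell whether a claim is true, only whether it points at something real. That is still far more than a free-text paragraph lets you verify.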
flowchart TD
U["User goal"] --> O["Orchestrator: plan & decompose"]
O --> T{"Independent sub-tasks?"}
T -->|Yes| P["Spawn subagents in parallel"]
T -->|No| S["Spawn sequentially, pass state"]
P --> A1["Subagent A: own context + tools"]
P --> A2["Subagent B: own context + tools"]
A1 --> R["Return structured summary"]
A2 --> R
S --> R
R --> M["Orchestrator synthesizes & resolves conflicts"]
M --> U

Subagents and the isolated context window
Each subagent receives a curated context: a system prompt scoping its job, the specific task from the orchestrator, and whatever reference material it needs — but crucially not the orchestrator's full history or the other subagents' transcripts. This isolation is what makes multi-agent systems scale to large problems. A codebase audit can run six subagents each examining a different module, and none of them drowns in the others' file dumps.
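The six-module audit can be sketched as a parallel fan-out. Everything here is illustrative: `build_brief` and `run_subagent` are hypothetical stand-ins (no model is called), and the module names are invented. What the sketch does show is that each brief contains only its own module's material, and that the orchestrator gets back one summary per subagent.

```python
from concurrent.futures import ThreadPoolExecutor


def build_brief(module: str) -> dict:
    """Curated context for one subagent: its scope, its task, its material."""
    return {
        "system": "You audit exactly one module and report issues.",
        "task": f"Audit the '{module}' module.",
        "reference": f"<contents of {module} only>",
    }


def run_subagent(brief: dict) -> str:
    """Hypothetical stand-in for a fresh model invocation."""
    return f"done: {brief['task']}"


modules = ["auth", "billing", "search", "export", "admin", "api"]
briefs = [build_brief(m) for m in modules]

# Threads suit this shape because real subagent calls would be I/O-bound.
with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
    summaries = list(pool.map(run_subagent, briefs))  # map preserves input order

for line in summaries:
    print(line)
```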
The flip side is that subagents are blind to each other. If subagent A discovers that the database schema changed, subagent B working on the migration won't know unless the orchestrator explicitly relays it. This is why the orchestrator's routing decisions matter so much: any shared fact must be either pushed into each subagent's brief up front or surfaced through a return-and-re-dispatch cycle. Architectures that ignore this end up with subagents confidently producing mutually incompatible outputs.
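Here is a small sketch of the return-and-re-dispatch cycle, using the schema-change scenario. `make_brief` and `fake_subagent` are hypothetical, and the "discovery" is hard-coded so the example runs without a model. The only thing that changes between the correct and the stale migration is whether the orchestrator relayed the fact into the second brief.

```python
def make_brief(task: str, shared_facts: list) -> str:
    """Build a subagent brief with every shared fact pushed in up front."""
    facts = "\n".join(f"- {fact}" for fact in shared_facts) or "- (none)"
    return f"Task: {task}\nKnown facts:\n{facts}"


def fake_subagent(brief: str) -> dict:
    """Hypothetical stand-in: one subagent discovers a fact, the other needs it."""
    if brief.startswith("Task: inspect schema"):
        return {"summary": "schema inspected",
                "new_facts": ["users.email renamed to users.email_address"]}
    column = "email_address" if "email_address" in brief else "email"
    return {"summary": f"migration targets {column}", "new_facts": []}


shared_facts = []
first = fake_subagent(make_brief("inspect schema", shared_facts))
shared_facts.extend(first["new_facts"])  # the orchestrator relays the discovery

informed = fake_subagent(make_brief("write migration", shared_facts))
stale = fake_subagent(make_brief("write migration", []))  # relay skipped

print(informed["summary"])
print(stale["summary"])
```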
Subagents also need clear termination conditions. A subagent without a crisp definition of done will keep calling tools, burning tokens, and chasing tangents. Give each one an explicit deliverable, such as "return a list of the three slowest queries with line numbers", and an explicit budget, for example a cap on tool calls. Claude is good at stopping when the goal is unambiguous and bad at stopping when it isn't.
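A definition of done plus a hard cap can be sketched as a loop with two exits. `run_with_budget`, `next_step`, and `is_done` are hypothetical names for this example, and the lambdas stand in for real tool calls; the cap of 10 is arbitrary.

```python
def run_with_budget(next_step, is_done, max_tool_calls: int) -> dict:
    """Run tool calls until the deliverable is met or the budget runs out."""
    observations = []
    for calls_made in range(1, max_tool_calls + 1):
        observations.append(next_step(calls_made))
        if is_done(observations):
            return {"status": "done", "tool_calls": calls_made,
                    "observations": observations}
    # No explicit stop was reached, so the cap is the only thing ending the run.
    return {"status": "budget_exhausted", "tool_calls": len(observations),
            "observations": observations}


# Crisp deliverable: stop as soon as three slow queries have been found.
focused = run_with_budget(
    next_step=lambda n: f"slow query #{n}",
    is_done=lambda obs: len(obs) >= 3,
    max_tool_calls=10,
)

# No definition of done: the subagent would wander until the cap stops it.
runaway = run_with_budget(
    next_step=lambda n: f"tangent #{n}",
    is_done=lambda obs: False,
    max_tool_calls=10,
)

print(focused["status"], focused["tool_calls"])
print(runaway["status"], runaway["tool_calls"])
```

Returning a distinct `budget_exhausted` status matters: it lets the orchestrator tell a finished result apart from one that was merely cut off.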
Where the tokens go
Multi-agent runs typically consume several times more tokens than a single-agent run on the same task, and the architecture explains why. Every subagent re-pays the cost of its system prompt and any shared context the orchestrator hands it. Five subagents each carrying a 4,000-token brief is 20,000 tokens before any of them does real work. Then each runs its own tool-call loop, accumulating observations. Finally the orchestrator pays to read all the returns and write the synthesis.
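The arithmetic above is easy to turn into a back-of-envelope estimator. This is a deliberately coarse model: the five subagents and the 4,000-token brief come from the paragraph, while the per-subagent tool-loop size, return size, and synthesis size are made-up placeholders, not measurements. It also ignores details such as context being re-sent on every turn of a tool loop.

```python
def estimate_tokens(n_subagents: int, brief_tokens: int,
                    loop_tokens_per_subagent: int, return_tokens: int,
                    synthesis_tokens: int) -> dict:
    """Rough token budget for one orchestrator-subagent run."""
    parts = {
        "briefs": n_subagents * brief_tokens,                  # paid once per subagent
        "tool_loops": n_subagents * loop_tokens_per_subagent,  # each subagent's own work
        "orchestrator_reads": n_subagents * return_tokens,     # reading every return
        "synthesis": synthesis_tokens,                         # writing the final answer
    }
    parts["total"] = sum(parts.values())
    return parts


estimate = estimate_tokens(
    n_subagents=5,
    brief_tokens=4_000,
    loop_tokens_per_subagent=15_000,  # placeholder assumption
    return_tokens=500,                # placeholder assumption
    synthesis_tokens=1_500,           # placeholder assumption
)

for name, tokens in estimate.items():
    print(f"{name:>20}: {tokens:>7,}")
```

With these placeholder numbers the briefs alone cost 20,000 tokens, matching the figure in the text, and the total is dominated by the subagents' tool loops.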
This is not a reason to avoid multi-agent systems — it's a reason to use them where the parallelism or context-isolation actually buys you something. Breadth-first work (research across many sources, auditing many files, exploring several solution branches) pays back the token premium because the subagents genuinely work in parallel and keep each other's noise out. Narrow, sequential tasks usually don't; a single agent with a clean prompt will be cheaper and just as good.
Coordination patterns beyond orchestrator-subagent
The orchestrator-subagent star is the default, but it isn't the only shape. A pipeline chains agents where each one's output is the next one's input — useful when stages are genuinely sequential, like draft, critique, revise. A blackboard pattern has agents read and write to a shared store rather than reporting to a central coordinator, which suits long-running collaborative tasks. A debate pattern runs two agents arguing opposite positions with a judge, useful for high-stakes decisions where you want adversarial pressure.
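Of these shapes, the pipeline is the simplest to sketch: each stage takes the previous stage's output as its only input. The three stage functions below are hypothetical stand-ins that tag the text instead of calling a model, so the flow of data is visible in the result.

```python
from functools import reduce


def draft(topic: str) -> str:
    return f"DRAFT({topic})"


def critique(text: str) -> str:
    return f"{text} + CRITIQUE"


def revise(text: str) -> str:
    return f"REVISED[{text}]"


def run_pipeline(stages, initial):
    """Feed each stage's output into the next stage, in order."""
    return reduce(lambda value, stage: stage(value), stages, initial)


result = run_pipeline([draft, critique, revise], "caching strategy")
print(result)  # REVISED[DRAFT(caching strategy) + CRITIQUE]
```

There is no central coordinator here and no parallelism, which is the point: the sequential dependency is written directly into the structure.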
In practice most teams start with orchestrator-subagent because it maps cleanly onto how Claude Code and the Agent SDK already work, and because the failure modes are the easiest to reason about. Reach for the fancier topologies only when you've hit a concrete wall. For example, when sequential dependencies make parallelism impossible, a pipeline states that dependency outright, whereas a star orchestrator hides it behind a fan-out that cannot actually run in parallel.
Frequently asked questions
When should I use multi-agent instead of a single agent?
Use multiple agents when the work is genuinely parallelizable or when context isolation prevents one task's noise from polluting another. Research across many independent sources, auditing many files, and exploring several solution branches all fit. For narrow, sequential, or cheap tasks, a single well-prompted agent is simpler and far less token-hungry.
How do subagents share information if they have separate contexts?
They don't share directly. The orchestrator is the only channel: it either bakes shared facts into each subagent's brief before dispatch, or it collects returns and re-dispatches with the new information. If a fact must be known by every subagent, push it into all their briefs up front rather than hoping it propagates.
Why do multi-agent systems cost so much more?
Each subagent re-pays its system prompt and shared context, runs its own tool loop, and the orchestrator pays again to read and synthesize every return. The total often lands at several times a single agent's usage, which is why you deploy the pattern deliberately rather than by default.
Does Claude handle the orchestration automatically?
In Claude Code, the main agent can spawn subagents via the Task tool with built-in coordination. With the Claude Agent SDK you control the loop yourself, deciding how plans form, how subagents are dispatched, and how returns are merged — more work, but more control over the architecture.
Bringing agentic AI to your phone lines
CallSphere puts these same orchestration patterns to work on voice and chat — assistants that route calls, pull live data with tools mid-conversation, and hand off cleanly between specialized agents around the clock. See the architecture running in production at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.