Skip to content
Agentic AI
Agentic AI7 min read0 views

Build a multi-agent system on Claude: a walkthrough

Step-by-step guide to building an orchestrator–subagent system on Claude: decomposition, parallel dispatch, compact handoff, looping, and synthesis.

Reading about orchestrator–worker architectures is one thing; getting one to actually converge on a real task is another. The gap is full of small decisions — how to phrase the decomposition prompt, where to put the parallelism, what a subagent is allowed to return, when to stop — that no diagram tells you. This walkthrough builds a concrete multi-agent system end to end on Claude, the kind you could ship: a research assistant that answers a complex question by fanning out subagents, each chasing one sub-question, and synthesizing a cited answer. Follow the steps in order; each one fixes a problem the previous one created.

I'll keep the framework choice light. The same nine steps apply whether you orchestrate with the Claude Agent SDK, with Claude Code subagents, or with your own loop over the Anthropic Messages API. What matters is the shape of the loop, not the SDK.

Step 1–2: define the contract and the decomposition

Start by writing the data contract, not the prompt. Decide exactly what a subagent receives and what it returns. A clean contract is: input is a single self-contained sub-question plus any context it needs (never "figure out what to do"); output is a structured object — a finding string, a confidence flag, and a list of sources. Pin this in your code as a schema before you write a line of prompt, because every later step depends on it.

Now the decomposition. The orchestrator's first job is to turn the user's goal into a list of independent sub-questions. The prompt that does this should demand independence explicitly: "Break this into 3–6 sub-questions that can each be answered without knowing the others' answers." If sub-questions are entangled, parallelism is a lie and you'll get duplicated or contradictory work. Have the orchestrator emit the list as structured output so your code can iterate it directly.

Step 3–4: dispatch in parallel and isolate context

With a list of sub-questions in hand, spawn one subagent per question. The critical implementation detail: each subagent gets a fresh context — system prompt, its one sub-question, its tools — and nothing from sibling subagents. This isolation is the whole point. A subagent can spend a large context budget reading sources and still hand back a tiny result, and neither the orchestrator nor the siblings pay for that spend.

flowchart TD
  A["User question"] --> B["Orchestrator: decompose"]
  B --> C["List of sub-questions"]
  C --> D["Spawn N subagents (parallel)"]
  D --> E["Each: search + read in own context"]
  E --> F["Return finding + sources + confidence"]
  F --> G{"Gaps or low confidence?"}
  G -->|Yes| H["Orchestrator queues follow-ups"] --> D
  G -->|No| I["Synthesize cited answer"]

Run the subagent calls concurrently. On Claude Code this is native; with the SDK or raw API, fire the requests with your language's async primitives and gather the results. Parallelism is where multi-agent earns back some of its token premium in wall-clock time — five subagents finishing in the time of one is the payoff.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Step 5: enforce compact handoff

This is the step teams skip and regret. A subagent that returns its entire transcript destroys the context-firewall benefit — now the orchestrator inherits all the noise you were trying to contain. Enforce compaction in the subagent's own instructions: "Return at most 200 words of findings plus your sources. Do not include your reasoning steps." Then validate it in code; if a return blows the budget, re-ask the subagent to summarize rather than passing the bloat upstream.

The handoff format is where structure pays off. Because step 1 defined a schema — finding, confidence, sources — the orchestrator receives uniform objects it can sort, dedupe, and reason over. Free-text returns force the orchestrator to re-parse prose every round, which is slow and error-prone. Structure at the boundary is the cheapest reliability win in the whole system.

Step 6–7: synthesize and decide whether to loop

Now the orchestrator has a list of structured findings. Its second prompt synthesizes them into an answer — but before it writes the final answer, it should evaluate coverage. Give it an explicit decision: are there sub-questions that came back low-confidence or contradictory? If yes, it queues follow-up sub-questions and the loop repeats; if no, it writes the answer. This is the G branch in the diagram, and it's what separates a system that converges from one that either stops too early or never stops.

Implement the loop with a hard cap — three or four rounds, not unbounded. Each round, pass the orchestrator a tight delta: the questions answered, the findings, and the open gaps. Do not re-feed it the entire history; that defeats the firewall and inflates cost round over round. The orchestrator should always be reasoning over a summary of state, never a raw log.

Step 8: synthesis prompt and citation discipline

The final synthesis prompt deserves real care because it's the only output the user sees. Instruct the orchestrator to write the answer strictly from the collected findings and to attach the sources each finding carried. A useful guardrail: "If a claim is not supported by a returned finding, omit it." This keeps the orchestrator from filling gaps with its own pre-training knowledge, which is exactly the kind of unsourced confidence you built a research system to avoid.

Test the synthesis in isolation by feeding it hand-written findings and checking the output. If synthesis is weak even with perfect inputs, no amount of better subagent work will save the system — so debug it independently before blaming the workers.

Step 9: instrument, cap, and harden

Before this touches real traffic, wire in observability. Log every subagent's brief, its token spend, and its return. Log the orchestrator's plan and each loop decision. When a run goes wrong — and they do — you want to read the orchestration trace and see immediately whether the failure was bad decomposition, a lossy handoff, or a loop that wouldn't terminate. Add a circuit breaker: a global token budget and a max-rounds cap that forces the orchestrator into a "report what you have" branch rather than spinning.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

One last hardening step: make subagent tool use idempotent and side-effect-aware. In a research agent the tools are read-only and this is easy, but the moment a subagent can write — to a file, a ticket, a database — you must ensure a retried or duplicated subagent doesn't double-apply its action. Build that discipline in from day one; retrofitting it after a duplicate-write incident is far more painful.

Frequently asked questions

How many subagents should the orchestrator spawn?

Enough to cover independent sub-questions and no more — typically three to six for a research task. Too few and you lose the parallelism benefit; too many and coordination overhead and token cost dominate. Let the decomposition step decide the count from the task, with a cap so a runaway plan can't spawn dozens.

Where does the implementation usually break first?

At the handoff. Subagents that return raw transcripts instead of compact structured findings flood the orchestrator's context and erase the whole benefit of splitting the work. Enforce a word cap and a return schema, and validate both in code.

Do I need the Claude Agent SDK to build this?

No. The SDK gives you orchestration primitives that save boilerplate, but the same nine steps work with Claude Code subagents or a plain async loop over the Messages API. The architecture — decompose, dispatch in parallel, compact handoff, loop with a cap, synthesize — is what matters.

How do I keep the loop from running forever?

Give it a hard max-rounds cap and a global token budget, and add an explicit termination branch where the orchestrator reports its best current answer instead of looping again. Convergence should be a designed exit, not something you hope emerges.

Bringing agentic AI to your phone lines

This decompose-dispatch-synthesize loop isn't just for research bots. CallSphere runs the same implementation discipline on voice and chat — agents that answer every call, fan out to tools mid-conversation, and book work 24/7. See the live build at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.