Skip to content
Agentic AI
Agentic AI7 min read0 views

Build a Multi-Agent System with Claude: Full Walkthrough

Step-by-step guide to building a working multi-agent system with Claude — decomposition, subagent dispatch, return contracts, validation, and synthesis.

Reading about orchestrators and subagents is one thing; actually getting a multi-agent run to do useful work without spiraling into a token bonfire is another. This walkthrough takes you from an empty file to a working orchestrator-subagent system, in the order you'd really build it. We'll use a concrete example throughout — a research assistant that answers a question by spawning subagents to investigate different angles — and call out the decision you face at each step.

I'm going to assume you're building on the Claude Agent SDK or a similar loop where you control dispatch, because that's where the mechanics are visible. If you're in Claude Code, the same stages happen, just with the Task tool handling some plumbing for you.

Step 1: Decide if you even need multiple agents

Before writing a line, sanity-check the premise. A multi-agent system earns its keep when the work splits into independent chunks or when one chunk's context would drown another. Our research example qualifies: "Compare three database options for our use case" naturally splits into three parallel investigations that don't need to see each other's notes. If your task is a single linear chain — fetch, transform, write — stop here and build one agent. You'll thank yourself.

Write the test down explicitly. Ask: can I describe two or more sub-tasks that could run at the same time without needing each other's intermediate results? If yes, proceed. If the only honest answer is "they kind of depend on each other," you want a pipeline or a single agent, not a fan-out.

Step 2: Write the orchestrator's planning prompt

The orchestrator's system prompt is the most important code you'll write, even though it isn't code. It must teach Claude how to decompose this specific class of task. Vague instructions produce vague plans. For the research assistant, the prompt spells out: identify the distinct angles the question requires, create one subagent task per angle, keep tasks non-overlapping, and never spawn more than a sensible ceiling of subagents.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Give the planner a concrete output contract. I have the orchestrator emit a structured plan — a list of subagent briefs, each with a title, a precise objective, and the exact deliverable expected back — before any subagent is spawned. This makes the plan inspectable. You can log it, eyeball it during development, and catch a bad decomposition before it costs you five subagent runs.

flowchart TD
  Q["Incoming question"] --> D["Step 2: Orchestrator drafts plan"]
  D --> V{"Plan valid & non-overlapping?"}
  V -->|No| D
  V -->|Yes| F["Step 3: Dispatch subagents"]
  F --> S1["Subagent 1 runs tool loop"]
  F --> S2["Subagent 2 runs tool loop"]
  S1 --> C["Step 4: Collect structured returns"]
  S2 --> C
  C --> Y["Step 5: Synthesize final answer"]
  Y --> G{"Gaps or conflicts?"}
  G -->|Yes| F
  G -->|No| OUT["Return result"]

Step 3: Dispatch the subagents

Now you spin up each subagent as its own Claude invocation. Each gets a fresh system prompt scoping its narrow job, the specific brief from the plan, and only the reference material it needs — not the whole conversation. In the SDK this is a separate message thread; in Claude Code it's a Task call. The critical implementation detail is the subagent's return contract: tell it exactly what shape to hand back. "Return a JSON object with findings, sources, and a confidence note" beats "summarize what you found" every time, because the orchestrator has to parse these mechanically.

Run independent subagents concurrently. If your three database investigations don't depend on each other, fire all three and await them together rather than serially. This is the entire performance argument for multi-agent systems — squander it by running sequentially and you've paid the token premium for none of the speed. Make sure each subagent has a hard cap on tool calls so a confused one can't loop forever.

Step 4: Collect and validate returns

When subagents come back, don't blindly trust them. Each return is a summary the subagent chose to write, and subagents can be confidently wrong. Validate the shape first — did it return the contracted fields? — and then sanity-check the content where you can. If a research subagent claims a fact, having it return the source lets the orchestrator weigh it. If two subagents contradict each other, you want to detect that here, not paper over it in the synthesis.

This is also where you decide whether the plan is actually complete. Sometimes a subagent returns "I couldn't determine X." A good orchestrator notices the gap and either re-dispatches a focused follow-up subagent or flags the limitation in the final answer. Building that gap-detection step early saves you from shipping confident-sounding answers built on a missing leg.

Step 5: Synthesize the final answer

The synthesis prompt is a second distinct orchestrator skill. It reads all the validated returns and composes one coherent answer, explicitly reconciling disagreements rather than averaging them. For the research assistant, the synthesis prompt says: present each option's findings, call out where sources conflicted, and end with a recommendation grounded in the briefs — never invent facts not present in any return.

That last constraint matters. The single most common failure in the synthesis stage is the orchestrator smoothing over gaps by hallucinating connective tissue. Constrain it to the returns it was given. If something is missing, the honest output is "the subagents didn't cover X," which is far more useful than a fabricated bridge. Once this loop runs cleanly end to end, you have a real multi-agent system — and from here it's tuning, not architecture.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Step 6: Add observability before you scale

Before you turn this loose on real traffic, instrument it. Log the plan, every subagent brief, every return, and the final synthesis with token counts attached. Multi-agent systems fail quietly — a subagent half-completes, the orchestrator papers over it, and the answer looks fine until someone checks. With per-stage logs you can replay any run, see which subagent went sideways, and tune the prompt that caused it. Without them you're guessing.

Frequently asked questions

How many subagents should one orchestrator spawn?

Match the number to the genuinely independent angles in the task, and cap it. Three to five is a common sweet spot. Too few and you might as well use a single agent; too many and the orchestrator struggles to synthesize the flood of returns, and your token bill balloons without proportional benefit.

Should subagents run in parallel or one at a time?

Run independent subagents in parallel — that's the whole point of the pattern. Only run sequentially when one subagent genuinely needs another's output, in which case you're really building a pipeline and should be honest about the dependency in your plan.

What's the most common bug in a first multi-agent build?

An unstructured return contract. If subagents return free-form prose, the orchestrator can't reliably parse or validate them, and synthesis becomes guesswork. Define an explicit return shape per subagent from day one.

How do I keep the token cost from exploding?

Hand each subagent only the context it needs, cap its tool calls, and use multi-agent only where parallelism or isolation pays off. Log token usage per stage so you can see exactly which subagent is expensive and tighten its brief.

Putting agentic AI on the phone

CallSphere runs this exact build pattern for voice and chat — a coordinating agent that dispatches specialized helpers to look up accounts, check availability, and book work while the caller is still on the line. Watch the walkthrough come alive at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.