Build a Claude Agent Orchestrator: Step by Step
A hands-on walkthrough to build a Claude multi-agent orchestrator with the Agent SDK: plan as a tool, parallel waves, shared state, verification, and recovery.
Reading about orchestration architecture is one thing; getting a working system running on your own machine is another. This walkthrough is for the engineer who wants to follow real steps and end up with a Claude orchestrator that decomposes a goal, runs subagents in parallel, shares their results, and recovers from a failed branch. We will use the Claude Agent SDK and MCP as the foundation because they give you batteries-included primitives for tool calls, subprocess agents, and structured output.
We will build a small but honest example: an orchestrator that takes a research request, splits it into independent research questions, dispatches a subagent per question, and synthesizes a final brief. The shape generalizes to almost any orchestration task — code migration, document processing, customer triage — so treat the research example as scaffolding you will swap out.
Step 1: Define the plan as a callable tool
The first decision is how the orchestrator emits its plan. Do not parse prose. Instead, give Claude a single tool, call it submit_plan, whose input schema holds an array of subtasks. Each subtask has an id, an objective string, a depends_on array of ids, and an allowed_tools list. Because the plan arrives as tool input shaped by that schema, it comes back as structured data rather than text you have to coax into shape. Still check it in code before scheduling: a schema alone cannot confirm that every id in depends_on refers to a real subtask.
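A minimal sketch of the tool definition and the follow-up check might look like this. The name / description / input_schema layout is the shape Anthropic's Messages API uses for tool definitions; how you register the tool with the Agent SDK depends on your SDK version and is not shown. Wrapping the array in an object with a subtasks field is an assumption (tool inputs are JSON objects), and validate_plan is a hypothetical helper, not part of any SDK.

```python
from typing import Any

# Tool definition in the name / description / input_schema shape.
# The four per-subtask fields come from the article; the "subtasks"
# wrapper and the description text are illustrative assumptions.
SUBMIT_PLAN_TOOL: dict[str, Any] = {
    "name": "submit_plan",
    "description": "Submit the complete, dependency-aware plan exactly once.",
    "input_schema": {
        "type": "object",
        "properties": {
            "subtasks": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "id": {"type": "string"},
                        "objective": {"type": "string"},
                        "depends_on": {"type": "array", "items": {"type": "string"}},
                        "allowed_tools": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["id", "objective", "depends_on", "allowed_tools"],
                },
            }
        },
        "required": ["subtasks"],
    },
}


def validate_plan(plan: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: check what the schema cannot express."""
    subtasks = plan["subtasks"]
    ids = [task["id"] for task in subtasks]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate subtask ids")
    known = set(ids)
    for task in subtasks:
        if task["id"] in task["depends_on"]:
            raise ValueError(f"{task['id']} depends on itself")
        missing = [dep for dep in task["depends_on"] if dep not in known]
        if missing:
            raise ValueError(f"{task['id']} depends on unknown ids: {missing}")
    return subtasks
```

Rejecting a bad plan here, before any subagent starts, is much cheaper than discovering a dangling dependency halfway through a run.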
Your orchestrator prompt is short and strict: state the goal, list the available subtask tools, and instruct the model to call submit_plan exactly once with a complete, dependency-aware breakdown. Tell it to prefer the smallest number of subtasks that still allows independent branches to run in parallel. That one instruction keeps the model from over-decomposing a simple goal into twenty needless agents.
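As a rough illustration, the planner prompt described above can be assembled by a small function. build_planner_prompt, its parameters, and the exact wording are assumptions for this sketch; tune the text against your own runs.

```python
def build_planner_prompt(goal: str, subtask_tools: list[str], max_subtasks: int = 8) -> str:
    """Hypothetical helper: a short, strict planning prompt."""
    tools = "\n".join(f"- {name}" for name in subtask_tools)
    return (
        f"Goal: {goal}\n\n"
        f"Tools available to subtasks:\n{tools}\n\n"
        "Call submit_plan exactly once with a complete, dependency-aware breakdown.\n"
        f"Use the smallest number of subtasks (at most {max_subtasks}) that still "
        "lets independent branches run in parallel."
    )
```

Keeping the prompt in one function makes it easy to version alongside the schema it refers to.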
Step 2: Turn the plan into execution waves
Once you have the plan, scheduling is pure code, not model calls. Build a dependency graph from the depends_on fields and compute waves: every subtask whose dependencies are already satisfied goes into the current wave and runs concurrently. When a wave finishes, recompute and run the next. This is a level-by-level topological sort (Kahn's algorithm), and it is the backbone that makes your system fast and predictable. If a pass finds no ready subtask while some are still waiting, the plan contains a cycle or a dangling dependency and should be rejected.
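Here is one way the wave computation could be written, as a sketch that assumes each subtask is a dict with the id and depends_on fields from Step 1. compute_waves is a hypothetical name.

```python
def compute_waves(subtasks: list[dict]) -> list[list[str]]:
    """Group subtask ids into waves; each wave depends only on earlier waves."""
    remaining = {task["id"]: set(task["depends_on"]) for task in subtasks}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A subtask is ready when all of its dependencies are already done.
        ready = sorted(tid for tid, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"cycle or unknown dependency among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for tid in ready:
            del remaining[tid]
    return waves
```

Sorting the ready ids is not required for correctness; it just makes the wave order reproducible, which helps when you assert on it in tests.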
flowchart TD
A["Research request"] --> B["Orchestrator calls submit_plan"]
B --> C["Build dependency graph"]
C --> D{"Compute ready wave"}
D --> E["Spawn subagents in parallel"]
E --> F["Write results to state store"]
F --> G{"More waves?"}
G -->|Yes| D
G -->|No| H["Orchestrator synthesizes brief"]
The loop in the diagram is the heart of the implementation. Notice that the model is only involved at the two ends (planning at the top and synthesis at the bottom), plus inside each subagent. The wave scheduling itself is deterministic. Keeping that boundary crisp is what makes the system debuggable: when something goes wrong you can immediately tell whether it was a model decision or a scheduling bug.
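The loop can be sketched with asyncio, keeping the model call behind an injected function so the scheduler stays deterministic. run_plan, Worker, and fake_worker are hypothetical names; fake_worker stands in for whatever actually launches a subagent in your setup.

```python
import asyncio
from typing import Awaitable, Callable

# A worker takes (subtask, upstream results) and returns a result dict.
Worker = Callable[[dict, dict], Awaitable[dict]]


async def run_plan(subtasks: list[dict], worker: Worker) -> dict[str, dict]:
    """Run subtasks wave by wave; returns the state store (id -> result)."""
    store: dict[str, dict] = {}
    pending = {task["id"]: task for task in subtasks}
    while pending:
        ready = [t for t in pending.values() if all(d in store for d in t["depends_on"])]
        if not ready:
            raise RuntimeError("no runnable subtasks; check the plan for cycles")
        # Each worker sees only the upstream results it declared.
        calls = [worker(t, {d: store[d] for d in t["depends_on"]}) for t in ready]
        results = await asyncio.gather(*calls)  # the whole wave runs concurrently
        for task, result in zip(ready, results):
            store[task["id"]] = result
            del pending[task["id"]]
    return store


async def fake_worker(task: dict, upstream: dict) -> dict:
    """Stand-in for a real subagent call (assumption for this sketch)."""
    await asyncio.sleep(0)
    return {"findings": f"stub for {task['id']}", "sources": sorted(upstream)}
```

Calling asyncio.run(run_plan(plan, fake_worker)) exercises the whole loop without spending a single token, which is a cheap way to find scheduling bugs before a model is involved.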
Step 3: Implement a subagent worker
A subagent is just a fresh Claude conversation scoped to one objective. The SDK lets you spin one up as a subprocess or an isolated session with its own system prompt, its own restricted tool set, and a clean context. Pass it exactly three things: the objective from the plan, the specific upstream results it depends on (pulled from the state store), and its allowed tools. Nothing else from the orchestrator's context should leak in.
Inside the worker, let Claude run its normal tool-use loop until it produces a result that satisfies the subtask contract. The contract is a small schema — for our research example, a findings string and a sources array. Have the worker return that structured object, and write it to the state store under the subtask's id. Because each worker is sandboxed to its objective, its context stays small, which keeps it fast and cheap and far less likely to wander.
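To make the contract concrete, here is a sketch of the worker's input and output as plain dataclasses. WorkerInput, ResearchResult, build_worker_input, and parse_result are hypothetical names; the actual subagent call is deliberately left out because it depends on how you launch sessions.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerInput:
    """The only three things a subagent receives."""
    objective: str
    upstream: dict[str, dict]
    allowed_tools: list[str]


@dataclass
class ResearchResult:
    """Subtask contract for the research example."""
    findings: str
    sources: list[str] = field(default_factory=list)


def build_worker_input(task: dict, store: dict[str, dict]) -> WorkerInput:
    # Pull only the declared dependencies from the state store.
    upstream = {dep: store[dep] for dep in task["depends_on"]}
    return WorkerInput(task["objective"], upstream, list(task["allowed_tools"]))


def parse_result(raw: dict) -> ResearchResult:
    """Reject worker output that does not fit the contract."""
    findings = raw.get("findings")
    sources = raw.get("sources")
    if not isinstance(findings, str) or not findings.strip():
        raise ValueError("findings must be a non-empty string")
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        raise ValueError("sources must be a list of strings")
    return ResearchResult(findings=findings, sources=sources)
```

Because build_worker_input reads only the ids listed in depends_on, nothing else in the store can reach the subagent by accident.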
Step 4: Wire the shared state store
For a first version, your state store can be an in-memory map keyed by subtask id, holding each completed result. The moment you care about runs that survive a crash or span minutes, swap that map for a tiny persistence layer — even a single SQLite or Postgres table with columns for run id, task id, status, and result JSON. The orchestrator and workers only ever touch this store through two operations: write a completed result, and read a dependency's result.
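A persistence layer along these lines fits in a few dozen lines using Python's built-in sqlite3 module. This is a sketch under assumptions: the StateStore class, the results table name, and the 'done' status value are illustrative, while the four columns follow the description above.

```python
import json
import sqlite3


class StateStore:
    """Two operations only: write a completed result, read a dependency's result."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS results (
                   run_id  TEXT NOT NULL,
                   task_id TEXT NOT NULL,
                   status  TEXT NOT NULL,
                   result  TEXT NOT NULL,
                   PRIMARY KEY (run_id, task_id))"""
        )

    def write_result(self, run_id: str, task_id: str, result: dict) -> None:
        # "with conn" commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?, 'done', ?)",
                (run_id, task_id, json.dumps(result)),
            )

    def read_result(self, run_id: str, task_id: str) -> dict:
        row = self.conn.execute(
            "SELECT result FROM results "
            "WHERE run_id = ? AND task_id = ? AND status = 'done'",
            (run_id, task_id),
        ).fetchone()
        if row is None:
            raise KeyError(f"no completed result for {run_id}/{task_id}")
        return json.loads(row[0])
```

Pass a file path instead of the default ":memory:" when you want results to outlive the process.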
This discipline pays off immediately in Step 5. Once the store is persistent, every completed subtask is durably recorded, so a failure anywhere downstream never forces you to recompute upstream work. You resume from the last good wave. For long Claude orchestrations that make dozens of tool calls, this is the difference between a robust system and one that throws away ten minutes of work on a single transient error.
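Resuming can be as simple as filtering the plan by what the store already holds. The sketch below assumes you can list the completed task ids for a run; resume_plan is a hypothetical helper.

```python
def resume_plan(subtasks: list[dict], completed: set[str]) -> tuple[list[dict], list[str]]:
    """Return (subtasks still to run, ids that can start immediately)."""
    remaining = [task for task in subtasks if task["id"] not in completed]
    # A remaining subtask is ready if everything it depends on is already stored.
    ready = [task["id"] for task in remaining if set(task["depends_on"]) <= completed]
    return remaining, ready
```

Feeding the remaining subtasks back into the same wave loop, with the store pre-populated, picks the run up where it stopped.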
Step 5: Add verification and recovery
Now make the system honest about failure. Before a worker's result is accepted into the store, run a cheap check: validate it against the contract schema, and optionally ask a Haiku-class model whether the findings actually address the objective. If the check fails, the orchestrator retries that single subtask with a sharper brief — usually appending what was wrong with the first attempt — up to a small retry cap.
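The check-then-retry step might be sketched like this. The attempt and judge callables are placeholders for your subagent call and the optional small-model check; requiring a non-empty sources list and the wording of the sharper brief are assumptions of this sketch.

```python
from typing import Callable, Optional

# judge(objective, result) returns None if acceptable, else a short reason.
Judge = Callable[[str, dict], Optional[str]]


def check_contract(result: dict) -> Optional[str]:
    """Return None if the result fits the contract, else what is wrong."""
    findings = result.get("findings")
    if not isinstance(findings, str) or not findings.strip():
        return "findings is missing or empty"
    sources = result.get("sources")
    if not isinstance(sources, list) or not sources:
        return "sources is missing or empty"
    return None


def run_with_retries(
    objective: str,
    attempt: Callable[[str], dict],
    max_retries: int = 2,
    judge: Optional[Judge] = None,
) -> Optional[dict]:
    """Try a subtask, retrying with a sharper brief; None means it failed for good."""
    brief = objective
    for _ in range(max_retries + 1):
        result = attempt(brief)
        problem = check_contract(result)
        if problem is None and judge is not None:
            problem = judge(objective, result)  # only runs if the cheap check passed
        if problem is None:
            return result
        # Sharpen the brief by appending what was wrong with this attempt.
        brief = f"{objective}\n\nA previous attempt was rejected: {problem}. Address this."
    return None
```

Running the schema check before the model-based check means malformed output never costs you a second model call.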
Recovery logic belongs in the scheduler, not in the workers. After each wave, the orchestrator inspects which subtasks passed and which need a retry, then folds retries into the next wave. Cap total retries per run so a genuinely impossible subtask cannot spin forever. With this in place you have a complete loop: plan, schedule, execute, verify, recover, synthesize. That is a real orchestrator, not a demo.
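A sketch of a scheduler that folds retries into the next wave under a per-run budget follows. It runs each wave sequentially to keep the bookkeeping visible; schedule_with_retries and the run_task signature are assumptions, with run_task returning whether the result passed verification.

```python
from typing import Callable

# run_task(task, attempt_number) -> (passed_verification, result)
RunTask = Callable[[dict, int], tuple[bool, dict]]


def schedule_with_retries(
    subtasks: list[dict], run_task: RunTask, max_run_retries: int = 3
) -> tuple[dict[str, dict], set[str], set[str]]:
    """Return (store, permanently failed ids, ids blocked by a failure)."""
    store: dict[str, dict] = {}
    failed: set[str] = set()
    attempts = {task["id"]: 0 for task in subtasks}
    pending = {task["id"]: task for task in subtasks}
    retries_left = max_run_retries
    while True:
        wave = [t for t in pending.values() if all(d in store for d in t["depends_on"])]
        if not wave:
            break
        for task in wave:
            tid = task["id"]
            attempts[tid] += 1
            ok, result = run_task(task, attempts[tid])
            if ok:
                store[tid] = result
                del pending[tid]
            elif retries_left > 0:
                retries_left -= 1  # stays pending, so it joins the next wave
            else:
                failed.add(tid)  # budget exhausted: give up on this subtask
                del pending[tid]
    # Whatever is left was waiting on a subtask that never succeeded.
    return store, failed, set(pending)
```

The third return value matters for synthesis: a blocked subtask never ran, which is a different kind of gap from one that ran and failed verification.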
Step 6: Synthesize and ship the result
The final orchestrator call reads every completed result from the state store and composes the answer the user asked for. Give it a synthesis prompt that names the original goal and instructs it to integrate the subagent findings into one coherent output, flagging any gaps where a subtask failed permanently. Attach the audit trail — which subagents ran, what tools they called, what they returned — so the result is inspectable. Now you can iterate on prompts and schemas with confidence, because every run leaves a clear record of what happened and why.
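As an illustration, the synthesis prompt and audit trail can be built from the plan and the store with two small functions. build_synthesis_prompt, build_audit_trail, the tool_calls mapping, and the prompt wording are all assumptions for this sketch.

```python
def build_synthesis_prompt(
    goal: str, subtasks: list[dict], results: dict[str, dict], failed: set[str]
) -> str:
    """Name the goal, list the findings, and flag permanently failed subtasks."""
    lines = [
        f"Original goal: {goal}",
        "",
        "Integrate the findings below into one coherent brief.",
        "Explicitly flag every gap listed under 'Failed subtasks'.",
        "",
        "Findings:",
    ]
    for task in subtasks:
        if task["id"] in results:
            res = results[task["id"]]
            sources = ", ".join(res["sources"]) or "none"
            lines.append(f"- [{task['id']}] {task['objective']}: {res['findings']} (sources: {sources})")
    lines += ["", "Failed subtasks:"]
    gaps = [f"- [{t['id']}] {t['objective']}" for t in subtasks if t["id"] in failed]
    lines += gaps or ["- none"]
    return "\n".join(lines)


def build_audit_trail(
    subtasks: list[dict], results: dict[str, dict], tool_calls: dict[str, list[str]]
) -> list[dict]:
    """One record per subtask: what ran, which tools it called, what it returned."""
    return [
        {
            "task_id": task["id"],
            "objective": task["objective"],
            "tools_called": tool_calls.get(task["id"], []),
            "result": results.get(task["id"]),  # None if the subtask never completed
        }
        for task in subtasks
    ]
```

Serializing the audit trail with json.dumps and storing it next to the final brief gives you the per-run record the paragraph above describes.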
Frequently asked questions
How many subagents should the orchestrator spawn?
As few as the work genuinely needs. Set a hard cap in code — say, eight per run — and instruct the planner to prefer fewer, broader subtasks over many tiny ones. Each subagent multiplies your token spend, so parallelism should buy you real speed or specialization before you pay for it.
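The hard cap is a one-line guard on the validated plan. The constant and function names here are hypothetical; the value eight is the example figure from the answer above.

```python
MAX_SUBAGENTS_PER_RUN = 8  # example figure; tune for your budget


def enforce_subagent_cap(subtasks: list[dict], cap: int = MAX_SUBAGENTS_PER_RUN) -> list[dict]:
    """Refuse plans that would spawn more subagents than the cap allows."""
    if len(subtasks) > cap:
        raise ValueError(f"plan has {len(subtasks)} subtasks; cap is {cap}")
    return subtasks
```

One option when the cap trips is to return the error message to the planner and ask for a coarser breakdown instead of failing the run outright.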
Can I run subagents on a cheaper model than the orchestrator?
Yes, and you usually should. Reserve Opus-level capability for planning and synthesis where mistakes are expensive, and run the narrow, well-scoped subagents on Sonnet or even Haiku depending on difficulty. Mixing models by role is one of the most effective cost levers available.
What is the simplest way to start?
Build the in-memory version first: plan, single-wave parallel subagents, synthesis, no persistence, no retries. Get that working on a real task, then add the state store, then verification, then recovery. Each addition is independent and testable, so you are never debugging the whole system at once.
How do I test an orchestrator?
Test the deterministic parts directly — feed a hand-written plan into the scheduler and assert the wave order. Then test the model parts with a handful of representative goals and check that the audit trail shows sensible decomposition and that the final brief covers each subtask. Treat planning quality and scheduling correctness as separate concerns.
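For the deterministic half, a property check is often sturdier than asserting one exact wave order. The sketch below is independent of any particular scheduler: it takes the hand-written plan plus whatever waves your scheduler produced. assert_valid_waves is a hypothetical helper.

```python
def assert_valid_waves(subtasks: list[dict], waves: list[list[str]]) -> None:
    """Every subtask runs exactly once, and only after all of its dependencies."""
    deps = {task["id"]: set(task["depends_on"]) for task in subtasks}
    seen: set[str] = set()
    for index, wave in enumerate(waves):
        assert len(wave) == len(set(wave)), f"duplicate id inside wave {index}"
        for tid in wave:
            assert tid in deps, f"unknown subtask {tid!r} in wave {index}"
            assert tid not in seen, f"{tid!r} scheduled twice"
            missing = deps[tid] - seen
            assert not missing, f"{tid!r} in wave {index} before {sorted(missing)}"
        seen.update(wave)
    assert seen == set(deps), f"never scheduled: {sorted(set(deps) - seen)}"
```

A check like this keeps passing if you later change tie-breaking inside a wave, while still catching a scheduler that runs a subtask before its inputs exist.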
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.