Build Parallel Agents in Claude Code: A Walkthrough

A hands-on walkthrough of building a parallel-agent workflow in Claude Code on desktop: define subagents, fan out, collect JSON results, retry, and verify.

Architecture diagrams are nice, but they do not compile. This post is the opposite: a hands-on walkthrough of standing up a parallel-agent workflow in Claude Code on desktop, from an empty project to a fleet of subagents finishing a multi-file task and reporting back. We will build a realistic example — translating a small library's docs into three languages at once — and you can follow every step.

The point of choosing this example is that the work is embarrassingly parallel: each translation is independent, each writes to its own file, and a clean merge is trivial. That makes it the ideal first parallel build before you tackle messier, contended work.

Key takeaways

  • Start with work that is naturally independent and writes to non-overlapping files — translation, per-module tests, per-endpoint docs.
  • Define subagents as named configs with a tight system prompt, a tool allowlist, and a return contract.
  • The orchestrator's job is to plan, fan out, collect summaries, and merge — never to do the leaf work.
  • Use a JSON result schema so the orchestrator can parse outcomes deterministically instead of re-reading prose.
  • Verify the merge with a single review pass before you accept the run.

Step 1 — Scope the task so it is truly parallel

Before any config, write down the chunks and confirm they share no state. For our example: translate README.md into French, German, and Japanese, writing README.fr.md, README.de.md, and README.ja.md. Three subagents, three output files, zero overlap. Each reads the same source (read-only) and writes a distinct target. If your real task does not factor this cleanly, stop and re-decompose — parallelism on contended work is where builds go wrong.
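The "write down the chunks and confirm they share no state" step can be made mechanical. The sketch below is one way to do it, not part of Claude Code: `plan_chunks` and `check_disjoint` are hypothetical helper names, and the target naming simply follows the README.fr.md pattern used in this walkthrough.

```python
from pathlib import PurePosixPath


def plan_chunks(source: str, languages: list[str]) -> list[dict]:
    """Build one chunk per language, each writing to its own target file."""
    src = PurePosixPath(source)
    return [
        {
            "language": lang,
            "source": source,
            # README.md + "fr" -> README.fr.md, next to the source
            "target": str(src.with_name(f"{src.stem}.{lang}{src.suffix}")),
        }
        for lang in languages
    ]


def check_disjoint(chunks: list[dict]) -> None:
    """Raise ValueError if any two chunks would touch the same file."""
    targets = [c["target"] for c in chunks]
    if len(set(targets)) != len(targets):
        raise ValueError("two chunks write the same file")
    if {c["source"] for c in chunks} & set(targets):
        raise ValueError("a chunk would overwrite a source file")


chunks = plan_chunks("README.md", ["fr", "de", "ja"])
check_disjoint(chunks)  # passes silently: three targets, zero overlap
print([c["target"] for c in chunks])
```

If `check_disjoint` raises on your real task, that is the signal to stop and re-decompose before writing any agent config.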

Step 2 — Define the subagent

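In Claude Code, a subagent is a reusable definition: a name, a focused system prompt, the tools it may use, and the model. Keeping the tool allowlist tight is both a safety and a focus measure; a translation agent has no business running shell commands. Here is a minimal definition, written as JSON to make the fields easy to see. Claude Code itself stores subagents as Markdown files with YAML frontmatter under .claude/agents/, with the system prompt as the file body and built-in tool names such as Read and Write, so adapt the field and tool names below to that format.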

{
  "name": "doc-translator",
  "description": "Translates one source doc into one target language.",
  "model": "claude-sonnet-4-6",
  "tools": ["read_file", "write_file"],
  "system_prompt": "You translate technical docs. Preserve code blocks and links verbatim. Translate prose into the requested target language. Return a JSON object: {\"file\": <output path>, \"language\": <target language>, \"status\": \"ok\" | \"failed\", \"notes\": <short string>}."
}

The return contract is the load-bearing part. Because the agent always ends with that JSON object, the orchestrator can parse three results deterministically instead of reading three blobs of prose and guessing whether each succeeded.
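To show what "parse deterministically" can look like, here is a minimal sketch of a validator for that contract. `parse_result` is a hypothetical helper, and it assumes the reply ends with a flat JSON object (no nested braces, and no `{` inside the notes string); a reply that breaks those assumptions raises instead of being guessed at.

```python
import json

REQUIRED_KEYS = {"file", "language", "status", "notes"}
VALID_STATUSES = {"ok", "failed"}


def parse_result(reply: str) -> dict:
    """Extract and validate the JSON object that ends a subagent reply."""
    start = reply.rfind("{")  # assumption: the contract object is flat
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    result = json.loads(reply[start : end + 1])  # JSONDecodeError is a ValueError
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if result["status"] not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {result['status']!r}")
    return result


reply = 'Done.\n{"file": "README.fr.md", "language": "fr", "status": "ok", "notes": ""}'
print(parse_result(reply)["status"])
```

Treating a malformed reply as an error, rather than as a success with odd formatting, is what lets the orchestrator tell "failed" apart from "unknown".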

Step 3 — Write the orchestrator prompt

The orchestrator is just another Claude loop, but its instructions forbid it from doing the translation itself. Its prompt names the three target languages, tells it to spawn one doc-translator per language with non-overlapping output paths, and tells it to collect the three JSON results and merge. The merge here is trivial — confirm all three files exist and all statuses are ok — but writing it explicitly is what makes the run auditable.
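As an illustration, the prompt described above could be rendered from the plan with a few lines of code. The wording of the prompt and the function name `build_orchestrator_prompt` are assumptions made for this sketch; only the ingredients (delegate only, one doc-translator per language, disjoint paths, collect and merge) come from the text.

```python
def build_orchestrator_prompt(source: str, targets: dict[str, str]) -> str:
    """Render a plan-and-delegate prompt; targets maps language -> output path."""
    if len(set(targets.values())) != len(targets):
        raise ValueError("output paths must not overlap")
    lines = [
        "You are the orchestrator. Plan and delegate only; do not translate anything yourself.",
        f"Source file (read-only): {source}",
        "Spawn one doc-translator subagent per line below:",
    ]
    lines += [f"- {lang}: write {path}" for lang, path in targets.items()]
    lines += [
        "Collect the JSON result from each subagent.",
        "Merge: confirm every output file exists and every status is ok, then report.",
    ]
    return "\n".join(lines)


prompt = build_orchestrator_prompt(
    "README.md",
    {"French": "README.fr.md", "German": "README.de.md", "Japanese": "README.ja.md"},
)
print(prompt)
```

Generating the prompt from the same plan you validated keeps the language list and the output paths from drifting apart.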

flowchart TD
  G["Goal: translate README into 3 langs"] --> O["Orchestrator plans tasks"]
  O --> T1["Spawn translator: fr"]
  O --> T2["Spawn translator: de"]
  O --> T3["Spawn translator: ja"]
  T1 --> W["Write README.fr.md"]
  T2 --> W2["Write README.de.md"]
  T3 --> W3["Write README.ja.md"]
  W --> C["Collect JSON results"]
  W2 --> C
  W3 --> C
  C --> M["Orchestrator verifies & reports"]

Notice each subagent writes a different file. Because the output paths never overlap, you do not even need write locks for this run — the independence is structural. That is exactly why this example is the right first build.

Step 4 — Run it and watch the fan-out

Kick off the orchestrator with a single instruction: "Translate the README into French, German, and Japanese." On desktop you will see three subagent panes spin up, each working in its own context. They do not talk to each other. Each reads the source, writes its file, and returns its JSON. The orchestrator collects all three and prints a short report: three files written, three statuses ok.

If one fails — say the Japanese agent hits an encoding issue and returns status: failed — the orchestrator does not silently pass. Because every result is structured, the orchestrator can re-spawn just the failed agent with a tweaked instruction, leaving the two successful files untouched. Selective retry is one of the biggest practical wins of the structured-result approach.
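The selective-retry logic is small enough to sketch. In the example below, `spawn` stands in for whatever mechanism re-runs a subagent; it is a hypothetical callable, not a Claude Code API, and `fake_spawn` exists only so the example runs on its own.

```python
from typing import Callable

Result = dict  # follows the return contract: file, language, status, notes


def retry_failed(
    results: list[Result],
    spawn: Callable[[str, str], Result],
    max_attempts: int = 2,
) -> list[Result]:
    """Re-run only failed chunks; successful results pass through untouched."""
    final = []
    for result in results:
        attempts = 0
        while result["status"] == "failed" and attempts < max_attempts:
            attempts += 1
            hint = f"Previous attempt failed: {result['notes']}"
            result = spawn(result["language"], hint)
        final.append(result)
    return final


calls = []


def fake_spawn(language: str, hint: str) -> Result:
    """Stand-in for re-spawning a subagent; always succeeds."""
    calls.append(language)
    return {"file": f"README.{language}.md", "language": language, "status": "ok", "notes": hint}


results = [
    {"file": "README.fr.md", "language": "fr", "status": "ok", "notes": ""},
    {"file": "README.ja.md", "language": "ja", "status": "failed", "notes": "encoding error"},
]
final = retry_failed(results, fake_spawn)
print(calls)  # only the failed language is re-run
```

The `max_attempts` bound matters: without it, an agent that fails for a structural reason would be re-spawned forever.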

It is worth pausing on what you are actually watching during the run, because it teaches you how parallelism behaves. The three panes do not progress in lockstep — one language is shorter, one model response comes back first, one agent re-reads the source an extra time. You will see them finish at different moments, and that is correct. The orchestrator does not act until it has all three results, so the slowest agent sets the wall-clock time for the round. This is the parallel-agent version of the classic rule that a fan-out is only as fast as its slowest branch, and it is why balancing chunk sizes matters once tasks get larger: a single oversized chunk stalls the whole join.
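The slowest-branch rule is easy to see in a toy simulation. The sketch below uses `asyncio.sleep` as a stand-in for agent work, with made-up durations; it says nothing about how Claude Code schedules subagents internally.

```python
import asyncio
import time


async def fake_agent(name: str, seconds: float) -> str:
    """Stand-in for a subagent: 'works' for a fixed time, then reports."""
    await asyncio.sleep(seconds)
    return name


async def fan_out(durations: dict[str, float]) -> tuple[list[str], float]:
    """Run all agents concurrently and wait for every one (the join)."""
    start = time.perf_counter()
    finished = await asyncio.gather(*(fake_agent(n, s) for n, s in durations.items()))
    return list(finished), time.perf_counter() - start


durations = {"fr": 0.1, "de": 0.2, "ja": 0.3}  # made-up numbers
finished, elapsed = asyncio.run(fan_out(durations))
print(f"sequential would take {sum(durations.values()):.1f}s, "
      f"slowest branch is {max(durations.values()):.1f}s, "
      f"measured {elapsed:.2f}s")
```

The measured time tracks the slowest branch rather than the sum, which is why one oversized chunk dominates the round.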

You should also resist the urge to chain rounds prematurely. A common early mistake is to have the orchestrator immediately fan out a second wave the moment the first returns, before a human has glanced at the merged output. For a low-stakes translation that is fine, but for anything touching production code, insert a deliberate checkpoint after the first join. The structured summaries make this cheap — the orchestrator can present a four-line digest of what each agent did, and you approve or redirect before the next wave spends more tokens.
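A checkpoint digest can be generated straight from the structured results. This is a minimal sketch with hypothetical helper names (`digest`, `approved`); for three agents it yields the four lines mentioned above, one per agent plus a summary.

```python
def digest(results: list[dict]) -> str:
    """One line per agent plus a summary line for the human checkpoint."""
    lines = []
    for r in results:
        note = f" ({r['notes']})" if r["notes"] else ""
        lines.append(f"{r['language']}: {r['status']} -> {r['file']}{note}")
    ok = sum(r["status"] == "ok" for r in results)
    lines.append(f"{ok}/{len(results)} ok")
    return "\n".join(lines)


def approved(answer: str) -> bool:
    """Only an explicit yes lets the next wave start."""
    return answer.strip().lower() in {"y", "yes"}


wave_one = [
    {"file": "README.fr.md", "language": "fr", "status": "ok", "notes": ""},
    {"file": "README.de.md", "language": "de", "status": "ok", "notes": ""},
    {"file": "README.ja.md", "language": "ja", "status": "failed", "notes": "encoding error"},
]
print(digest(wave_one))
```

Defaulting to "not approved" on anything but an explicit yes keeps an accidental Enter key from launching the next wave.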

Step 5 — Add a verification pass

A run that writes files is not done until something checks them. Add a final orchestrator instruction: read each output file and confirm that its code blocks and links are byte-for-byte identical to the source. Reading only the first 20 lines works as a quick spot check, but it misses anything further down the file. This catches the most common translation regression, an agent "helpfully" translating a code identifier or a URL. The check is cheap, can be made deterministic by scripting the comparison, and turns a hopeful run into a trustworthy one.
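One way to make this comparison mechanical is to extract the parts that must not change and compare them in code. The sketch below assumes Markdown with triple-backtick fences and http(s) links; the regular expressions are deliberately simple, and `invariants` and `verify` are hypothetical names.

```python
import re

TICKS = "`" * 3  # built programmatically so this example nests cleanly in a fenced block
FENCE = re.compile(rf"^{TICKS}.*?^{TICKS}[ \t]*$", re.DOTALL | re.MULTILINE)
URL = re.compile(r"https?://[\w\-./?%&=#:~+]+", re.ASCII)


def invariants(markdown: str) -> tuple[list[str], list[str]]:
    """Return (fenced code blocks in order, sorted URLs found outside them)."""
    blocks = FENCE.findall(markdown)
    prose = FENCE.sub("", markdown)
    # Sorting tolerates links that move when sentence order changes in translation.
    urls = sorted(u.rstrip(".,:") for u in URL.findall(prose))
    return blocks, urls


def verify(source: str, translation: str) -> list[str]:
    """List the problems found; an empty list means the check passed."""
    problems = []
    src_blocks, src_urls = invariants(source)
    out_blocks, out_urls = invariants(translation)
    if src_blocks != out_blocks:
        problems.append("code blocks differ from source")
    if src_urls != out_urls:
        problems.append("links differ from source")
    return problems


block = "\n".join([TICKS + "python", "total = 1", TICKS])
source = f"Install it. See [docs](https://example.com/docs).\n\n{block}\n"
good = f"Installez-le. Voir [docs](https://example.com/docs).\n\n{block}\n"
bad = good.replace("total", "somme")  # an identifier got "translated"
print(verify(source, good), verify(source, bad))
```

A check like this covers the whole file rather than a sample, and it gives the same answer on every run.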

Once the task grows, verification deserves an agent of its own. For three files an inline check is fine, but for forty modules you would spawn a dedicated reviewer subagent that consumes every leaf agent's result and flags the ones that drifted. The reviewer reads structured results, not raw transcripts, so its context stays small even as the fan-out width grows. This is the same separation of concerns as the rest of the build: leaf agents produce, the orchestrator plans, and a reviewer judges, each in its own clean context.

Common pitfalls

  • Letting the orchestrator translate. If its prompt is loose, it will just do the work itself in one context and you lose all parallelism. State explicitly: plan and delegate only.
  • Free-text results. Without a JSON contract you cannot reliably detect partial failure or retry one agent. Always end subagents with a fixed schema.
  • Overlapping output paths. The instant two agents write the same file, you need locks and conflict handling. Keep the first build path-disjoint.
  • Over-broad tool allowlists. A translator with shell access can wander. Grant the minimum tools the task needs.
  • Skipping verification. Parallel agents fail independently and quietly. A final review pass is non-negotiable.

Ship your first parallel run in 5 steps

  1. Pick a task with N independent chunks that write to N non-overlapping files.
  2. Define one subagent with a tight prompt, a minimal tool allowlist, and a JSON return contract.
  3. Write an orchestrator prompt that plans, fans out one subagent per chunk, and collects results.
  4. Run it, watch the fan-out, and let the orchestrator retry only failed agents.
  5. Add a verification pass that checks the merged output against the source.

| Decision | Choose this | Why |
| --- | --- | --- |
| First task type | Path-disjoint, independent | No locks, trivial merge |
| Subagent model | Sonnet 4.6 | Fast, capable, cheaper at fan-out |
| Result format | Fixed JSON schema | Deterministic parsing & retry |
| Failure handling | Re-spawn only failed agents | Saves tokens and time |

Frequently asked questions

How many subagents should I spawn at once?

Spawn one per genuinely independent chunk, up to a sane concurrency cap. Token cost scales with active agents, so more is not free. For a first build, three to five is a comfortable range to observe behavior before scaling.
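If you drive a fan-out from your own code, a concurrency cap is a few lines with a semaphore. This is a generic sketch: `run_capped` and `fake_worker` are hypothetical, and the cap of 2 is arbitrary. It does not describe any limit Claude Code applies itself.

```python
import asyncio
from typing import Awaitable, Callable


async def run_capped(
    chunks: list[str],
    worker: Callable[[str], Awaitable[str]],
    cap: int = 3,
) -> list[str]:
    """Run one worker per chunk, with at most `cap` running at a time."""
    semaphore = asyncio.Semaphore(cap)

    async def guarded(chunk: str) -> str:
        async with semaphore:  # waits here while `cap` workers are active
            return await worker(chunk)

    return list(await asyncio.gather(*(guarded(c) for c in chunks)))


stats = {"active": 0, "peak": 0}


async def fake_worker(chunk: str) -> str:
    """Stand-in for a subagent; records how many run at once."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return chunk.upper()


results = asyncio.run(run_capped(["a", "b", "c", "d", "e"], fake_worker, cap=2))
print(results, "peak concurrency:", stats["peak"])
```

Results come back in chunk order regardless of which worker finished first, which keeps the merge step simple.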

Do subagents share memory in Claude Code?

No. Each subagent runs in its own isolated context window and does not see the others' message history. They communicate only through the structured results they return to the orchestrator, which is what keeps them from confusing each other.

What model should subagents use?

For high-volume leaf work like translation or per-file edits, a fast capable model such as Sonnet 4.6 keeps cost down. Reserve the most capable model for the orchestrator's planning and merge decisions, where judgment matters most.

How do I retry just one failed agent?

Because each subagent returns a status in its JSON result, the orchestrator can detect exactly which chunk failed and re-spawn only that subagent with an adjusted instruction, leaving completed work in place.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
