Inside Claude Managed Agents: Architecture & Internals

When Anthropic shipped Claude Managed Agents, the headline feature was deceptively simple: you describe an outcome and the platform runs the agent that gets you there, including spinning up subagents when the work fans out. But the moment you put that in front of real traffic, the questions stop being about prompts and start being about systems. Where does state live? Who decides a task is "done"? What happens when a subagent stalls? This post pulls the lid off the architecture and walks the request from the first instruction to the final verified result.

I have built orchestrated agent systems on top of Claude's primitives for long enough to know that the magic is mostly plumbing. The model is the smart part; the architecture is what keeps the smart part honest, bounded, and observable. Understanding the internals is what lets you debug a run at 2 a.m. instead of re-rolling the dice.

Key takeaways

Managed Agents separate the outcome contract (what success looks like) from the execution graph (how Claude gets there), and the runtime owns the loop between them.
The orchestrator is itself a Claude turn: it plans, delegates to subagents, and reconciles their results against the success criteria before declaring completion.
Each subagent runs in its own context window with a scoped tool set, which is why multi-agent runs spend several times more tokens than a single agent.
State is split into durable run state (the platform's job) and conversational context (the model's working memory) — conflating them is the most common source of bugs.
Verification is a first-class node, not an afterthought: the loop only terminates when the outcome check passes or the budget is exhausted.

What "managed" actually means under the hood

A Claude Managed Agent is a runtime service that accepts an outcome specification, plans an execution graph, and drives that graph to completion while persisting state, enforcing budgets, and surfacing telemetry. The "managed" part is the runtime taking responsibility for the things you would otherwise hand-roll: the agent loop, retries, context isolation between subagents, tool authorization, and the decision of when to stop.

Contrast that with the raw Agent SDK, where you own the loop. In the SDK you write the while-loop that feeds tool results back to the model and you decide termination. With Managed Agents, you hand the platform a contract — "produce a reconciled month-end report and attach the three supporting queries" — and the runtime materializes the loop, the subagents, and the bookkeeping. You trade fine-grained control for a managed lifecycle.

The end-to-end request path

The cleanest mental model is three planes: a control plane that holds the run record and budgets, an orchestration turn that plans and delegates, and a fan-out of worker turns that do scoped work. The orchestrator and workers are all Claude calls, but they see different context and different tools.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

flowchart TD
  A["Outcome request + success criteria"] --> B["Control plane: create run, set budget"]
  B --> C["Orchestrator turn: plan + decompose"]
  C --> D{"Needs parallel work?"}
  D -->|No| E["Single worker turn with scoped tools"]
  D -->|Yes| F["Spawn subagents, isolated contexts"]
  E --> G["Reconcile results vs criteria"]
  F --> G
  G --> H{"Outcome met & budget OK?"}
  H -->|No| C
  H -->|Yes| I["Emit final artifact + trace"]

Notice the back-edge from the verification node to the orchestrator. That loop is the heart of the system. The orchestrator does not blindly trust a worker's "I'm done" — it re-reads the success criteria, inspects the produced artifacts, and decides whether to iterate, redirect, or finish. This is what makes the agent outcome-driven rather than step-driven.

How the orchestrator decomposes work

When a run starts, the orchestrator turn receives the outcome contract plus a description of the available tools and subagent types. Its first job is planning: it writes an explicit decomposition — usually a short list of subtasks, each with a goal and the tools it should be allowed to touch. Good decompositions are MECE-ish: mutually exclusive subtasks so two subagents do not stomp on each other, collectively exhaustive so nothing required is dropped.

The orchestrator then delegates. Each delegation is a fresh worker context seeded with only the slice of information that subtask needs — not the whole conversation. This isolation is deliberate. A worker tasked with "fetch and validate the three SQL queries" should not be carrying the full reasoning history of the report's narrative section; that context would dilute its attention and burn tokens. The price is coordination overhead, which is exactly why multi-agent runs cost several times more than a single agent and should be reserved for genuinely parallel or genuinely separable work.

State: durable run state vs. conversational context

The single most important internal distinction is between two kinds of state. Durable run state lives in the control plane: the run ID, the budget consumed, the artifacts produced, the status of each subagent, and the verification verdict. It survives restarts and is what you query from a dashboard. Conversational context is the model's working memory inside a turn — the messages, tool results, and intermediate reasoning. It is ephemeral and bounded by the context window.

Engineers get into trouble when they treat the context window as a database. If a fact must survive across subagents or across a retry, it has to be written to durable state — an artifact, a structured result, a note the orchestrator can re-read — not left implicit in a transcript that the next turn may never see. The runtime helps by persisting structured outputs, but the design discipline is yours.

Where verification and budgets live

Two control mechanisms keep the loop from running forever. The first is the outcome check: a structured evaluation, often itself a Claude call with a rubric, that compares the produced artifact against the success criteria. The second is the budget: a hard ceiling on tokens, tool calls, wall-clock time, or subagent count. The loop terminates when the outcome check passes or the budget is exhausted, and it always emits a trace explaining which.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

This is why I tell teams to write success criteria as if a skeptical reviewer will read them. "Generate a summary" gives the verifier nothing to check; "produce a 200-word summary that cites at least two of the attached sources and names the top risk" gives it a rubric. The crispness of your outcome contract directly determines how reliable the managed loop is.

Layer	Owns	Lifetime	You configure
Control plane	Run record, budget, status	Durable	Budgets, criteria
Orchestrator turn	Plan, delegation, reconcile	Per run	System prompt, subagent menu
Worker turns	Scoped task + tools	Per subtask	Tool scopes, context slice
Verifier	Outcome check vs rubric	Per iteration	Success criteria

Common pitfalls

Vague success criteria. Without a checkable rubric, the verifier rubber-stamps weak output. Fix: write criteria as testable assertions.
Over-parallelizing. Fanning out subagents for sequential work multiplies token cost without speed gains. Fix: only spawn subagents for truly independent subtasks.
Leaking context across workers. Passing the full transcript to every subagent dilutes attention and inflates cost. Fix: pass the minimal slice each subtask needs.
Treating context as storage. Facts that must survive a retry belong in durable artifacts, not in the transcript. Fix: persist structured outputs explicitly.
No budget ceiling. An unbounded loop on an under-specified outcome can spiral. Fix: always set token and subagent budgets.

Trace the architecture in 6 steps

Write the outcome contract as checkable success criteria, not a vibe.
Set hard budgets: max tokens, max subagents, max wall-clock.
Define the subagent menu and the tools each type may touch.
Run once and read the orchestrator's decomposition plan in the trace.
Inspect durable artifacts after each iteration to confirm state is being persisted, not just discussed.
Tighten the success rubric wherever the verifier passed weak output.

Frequently asked questions

How is a Managed Agent different from the Agent SDK?

The Agent SDK gives you the primitives and you write the loop, the retries, and the termination logic yourself. A Managed Agent is a runtime that owns that loop for you: you supply an outcome contract and budgets, and it plans, delegates, verifies, and persists state on your behalf.

No. Each subagent runs in its own isolated context with a scoped tool set, seeded with only the information its subtask needs. That isolation is what prevents cross-contamination but is also why multi-agent runs consume several times more tokens than a single agent.

What decides when a run is finished?

The verification node compares the produced artifact against your success criteria. The run terminates when that check passes or when a budget — tokens, tool calls, time, or subagent count — is exhausted, and it always emits a trace explaining which condition ended the loop.

From architecture to your phone lines

CallSphere runs these same outcome-driven, multi-agent patterns where they are hardest to fake — live voice and chat. Our assistants plan, call tools mid-conversation, and verify that the caller's goal was actually met before hanging up. See the architecture working in production at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Inside Claude Managed Agents: Architecture & Internals

Key takeaways

What "managed" actually means under the hood

The end-to-end request path

How the orchestrator decomposes work

State: durable run state vs. conversational context

Where verification and budgets live

Common pitfalls

Trace the architecture in 6 steps

Frequently asked questions

How is a Managed Agent different from the Agent SDK?

What decides when a run is finished?

From architecture to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Cowork is heading and how to prepare

Where Claude Code GTM engineering is heading next

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild

Key takeaways

What "managed" actually means under the hood

The end-to-end request path

How the orchestrator decomposes work

State: durable run state vs. conversational context

Where verification and budgets live

Common pitfalls

Trace the architecture in 6 steps

Frequently asked questions

How is a Managed Agent different from the Agent SDK?

Do subagents share a context window with the orchestrator?

What decides when a run is finished?

From architecture to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Cowork is heading and how to prepare

Where Claude Code GTM engineering is heading next

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild