Claude Managed Agents Architecture: How It Fits Together
How Claude Managed Agents fit together end to end: the runtime loop, context assembler, tool/MCP layer, subagents, and control plane for production agents.
Most teams that try to ship an agent in 2026 don't fail at the model. They fail at the plumbing around it: the loop that decides when to call a tool, the layer that hands results back, the memory that survives between turns, and the supervision that keeps a long-running task from drifting. Claude Managed Agents exist to take that plumbing off your plate. But to use them well — and to trust them in production — you need a mental model of what's actually running underneath. This post walks the architecture end to end.
The short version: a managed agent is the Claude model wrapped in a hosted execution loop, with a context assembler in front of it, a tool-and-MCP layer beside it, optional subagents below it, and a control plane around all of it. When you understand how those layers talk to each other, the difference between a flaky demo and a dependable production agent stops being mysterious.
What a managed agent actually is
A Claude Managed Agent is a hosted, stateful execution loop in which the Claude model repeatedly observes the current context, decides on an action — answer, call a tool, or spawn a subagent — and incorporates the result, until the task is complete or a stop condition fires. The word that matters is managed: Anthropic runs the loop, the retries, the context packing, and the tool transport, so you supply the instructions, the tools, and the policies rather than the orchestration code.
This is the same primitive set that powers Claude Code, exposed for your own agents through the Claude Agent SDK. The value is not that you couldn't write an agent loop yourself — plenty of teams have — it's that the hard, boring parts (token budgeting, tool-call parsing, partial-failure handling, parallel subagent fan-out) are already battle-tested. You inherit a loop that has survived millions of real tasks instead of debugging your own from scratch.
The five layers, from prompt to result
Think of the system as five layers stacked between the user's request and the final answer. The context assembler gathers the system prompt, the task instructions, relevant skills, tool schemas, and recent conversation into the model's window. The reasoning core — Opus, Sonnet, or Haiku depending on how you've routed — produces the next action. The tool layer executes that action against MCP servers or built-in tools and returns structured results. The subagent layer lets the core delegate a bounded chunk of work to a fresh agent with its own clean context. And the control plane enforces limits, logs every step, and decides when to stop.
flowchart TD
A["User request"] --> B["Context assembler: prompt + skills + tool schemas"]
B --> C{"Reasoning core: next action?"}
C -->|Answer| H["Final response"]
C -->|Tool call| D["Tool layer / MCP servers"]
D --> E["Structured result"] --> C
C -->|Delegate| F["Subagent with fresh context"]
F --> G["Summarized result"] --> C
C -->|Stop condition| I["Control plane: budget & logs"] --> H
The arrows that loop back into the reasoning core are the whole point. A managed agent is not a single forward pass; it's a cycle. Each pass through the core can read the accumulated results of every previous action, which is why these systems can do real multi-step work — and also why context management, covered below, is the make-or-break concern.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
How context is assembled each turn
Every turn, the assembler rebuilds the model's working set. It always includes the system prompt and the task framing. It conditionally includes skills — folders of instructions and scripts that Claude pulls in only when the current step looks relevant — so the window isn't permanently bloated with capabilities you rarely use. It includes the schemas for the tools the agent is allowed to call, and it includes a compacted view of the conversation so far.
That compaction is critical. With a 1M-token window you have a lot of room, but long-running agents still overrun it if they dump raw tool output back every turn. The managed loop summarizes older turns and prunes redundant tool results so the most recent, most relevant material stays sharp. When you design an agent, your job is to make the high-signal facts easy for the assembler to keep and the noise easy to drop — a theme we return to in the context-design facet of this series.
The tool and MCP layer
Tools are how the agent touches the world. In the managed architecture, every tool — whether a built-in like file editing or a custom capability you expose — is described by a JSON schema the model reads, and reached through a transport the runtime manages. Model Context Protocol (MCP) is the open standard that lets you plug external systems in as servers: a database, a CRM, an internal API, a search index. The agent sees a uniform tool interface; MCP handles the wire format underneath.
Architecturally, this decoupling is what lets you reach production quickly. You don't rewrite your agent when you swap a data source — you point it at a different MCP server. The runtime handles serializing the call, invoking the server, catching transport errors, and feeding a structured result back into the loop. Your responsibility shrinks to writing good schemas and idempotent handlers, which the tool-wiring facet of this series covers in depth.
Subagents and bounded delegation
When a task has a self-contained chunk — research this one vendor, refactor this one module, validate this one dataset — the core can spawn a subagent. The subagent gets a fresh context window seeded only with the sub-task, runs its own loop, and returns a compact summary rather than its entire transcript. This keeps the parent's context clean and lets independent chunks run in parallel.
The trade-off is cost. A multi-agent run typically burns several times more tokens than a single-agent run because each subagent carries its own prompt and tool overhead. The architectural discipline is to delegate only when the sub-task is genuinely separable and the parallelism or context-isolation pays for itself. Used deliberately, subagents are how a managed agent stays coherent on tasks too big for one window; used reflexively, they just multiply your bill.
The control plane: limits, logging, and stop conditions
Around everything sits the control plane. It enforces token and step budgets so a confused agent can't loop forever, captures a structured trace of every action for debugging and evals, and evaluates stop conditions — task complete, budget exhausted, or a guardrail tripped. In production this layer is where your operational confidence comes from: you can replay a run, see exactly which tool returned what, and tighten a policy without touching the agent's instructions.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
This is also where the 10x-faster claim earns its keep. Because retries, logging, budgeting, and supervision come built in, you spend your engineering time on the parts that are actually specific to your problem — the tools, the prompt, and the evals — instead of rebuilding orchestration infrastructure every project. The architecture hands you a reliable spine and asks you to supply the domain.
Frequently asked questions
How is a managed agent different from just calling the Claude API in a loop?
A raw API loop is one tool call at a time with no built-in context compaction, retry logic, subagent support, or trace logging — you write all of that. A managed agent is that loop already hardened: the runtime packs context, parses and retries tool calls, fans out subagents, and logs every step. You inherit production behavior instead of reinventing it.
Where does MCP fit in the architecture?
MCP is the tool-layer transport. Each external system you want the agent to use is an MCP server exposing typed operations; the agent calls them through a uniform interface while the runtime manages serialization and errors. It decouples your agent from any specific integration, so you can swap data sources without rewriting the agent.
When should I add subagents versus keeping one agent?
Add subagents only when a sub-task is genuinely separable — independent research items, parallel file edits, isolated validation — because each subagent multiplies token cost. If the work is sequential and shares context, one agent is cheaper and simpler. Reach for delegation when parallelism or context isolation clearly outweighs the overhead.
Which model should run the loop?
Route by step difficulty. Use Opus 4.8 for hard planning and ambiguous reasoning, Sonnet 4.6 for the bulk of competent tool-driven work, and Haiku 4.5 for cheap high-volume steps like classification or extraction. Many production agents mix models across the loop rather than picking one for everything.
Bringing agentic AI to your phone lines
The same architecture — a managed loop, a tool layer, deliberate delegation, and a control plane — is exactly what CallSphere runs for voice and chat: multi-agent assistants that answer every call and message, call tools mid-conversation, and book real work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.