---
title: "Claude Cowork Architecture: How Enterprise Agents Run"
description: "Inside Claude Cowork's enterprise architecture: how the planning model, plugins, MCP connectors, and sub-agents fit together end to end."
canonical: https://callsphere.ai/blog/claude-cowork-architecture-how-enterprise-agents-run
category: "Agentic AI"
tags: ["agentic ai", "claude", "claude cowork", "enterprise", "architecture", "mcp", "sub-agents"]
author: "CallSphere Team"
published: 2026-03-28T08:00:00.000Z
updated: 2026-06-07T01:28:22.748Z
---

# Claude Cowork Architecture: How Enterprise Agents Run

> Inside Claude Cowork's enterprise architecture: how the planning model, plugins, MCP connectors, and sub-agents fit together end to end.

The first time a non-engineering team adopts Claude Cowork at scale, the questions stop being "what can it do" and start being "where does the data go, who can it act as, and what happens when a sub-agent fails halfway through a multi-step task." Those are architecture questions, and the answer to most of them lives in how Cowork is assembled internally. If you are the platform engineer asked to make Cowork enterprise-ready, you cannot treat it as a chat box. You have to understand the layers underneath it well enough to reason about isolation, identity, and failure.

This post walks the full stack from the moment a knowledge worker types a request to the moment a tool mutates a record in a system of record. The goal is a mental model precise enough that you can design guardrails, debug a stuck run, and explain to a security reviewer exactly what crosses which boundary.

## Key takeaways

- Cowork is a layered runtime: a planning model on top, a plugin/skill loader in the middle, and MCP connectors plus sub-agents at the edges where real side effects happen.
- Plugins are the enterprise packaging unit — they bundle skills, connectors, and sub-agent definitions so governance happens at the plugin boundary, not per-prompt.
- Context is assembled just-in-time: skills and tool schemas load only when relevant, keeping the working context small and the planning model focused.
- Sub-agents run in their own context windows and report back summaries, which is what keeps long multi-step tasks from collapsing into one bloated transcript.
- Identity and isolation are enforced at the connector layer; the model never holds raw credentials, it requests actions that the connector authorizes.

## What problem the architecture actually solves

A single large language model call is stateless and blind to your systems. It can write beautiful prose about your quarterly close but it cannot open the ledger, reconcile two spreadsheets, file the variance, and notify the controller. Enterprise knowledge work is a sequence of grounded actions against real systems, with checkpoints where a human or a policy must approve. The architecture of Claude Cowork exists to turn one model into a system that can plan that sequence, ground each step in real data, take authorized actions, and stay coherent across dozens of steps.

Claude Cowork is Anthropic's agentic product for non-engineering knowledge work, where capabilities are packaged as plugins that bundle Agent Skills, MCP connectors, and sub-agents into a governed unit. That definition matters because every enterprise concern — who can do what, where data flows, how a task is audited — maps onto one of those packaged components. When you make Cowork enterprise-ready, you are really configuring and constraining the plugin boundary.

This layered design beats a monolithic "do everything in one prompt" approach because of context economics. The planning model has finite attention. If you stuff every tool schema, every policy document, and every data sample into one window, output quality degrades and token costs climb. Layering lets the system load only what the current step needs.

## The layers, from request to side effect

Picture the runtime as four bands. At the top is the **orchestrating model** — an Opus- or Sonnet-class model that reads the user's intent, decides on a plan, and chooses what to load next. Below it is the **skill and plugin loader**, which matches the task to installed skills and brings their instructions into context. Below that sits the **connector layer** (MCP servers) that exposes tools with typed schemas and holds the actual auth. At the edge are **sub-agents**, each a fresh model context spawned to handle a bounded chunk of work and return a compact result.

```mermaid
flowchart TD
  A["Knowledge worker request"] --> B["Orchestrating model: plan"]
  B --> C{"Skill or tool needed?"}
  C -->|Skill| D["Plugin loader injects skill instructions"]
  C -->|Action| E["MCP connector with typed schema"]
  D --> B
  E --> F["Connector authorizes & calls system of record"]
  F --> G["Structured result returned"]
  B --> H["Spawn sub-agent for bounded subtask"]
  H --> I["Sub-agent summary back to orchestrator"]
  G --> J["Orchestrator composes & checkpoints"]
  I --> J
```

The arrows that loop back to the orchestrator are the important ones. After a skill loads or a sub-agent finishes, control returns to the planning model with new information, and it re-plans. This loop is what makes the system agentic rather than a fixed pipeline. It also means a failure at any node surfaces back to the orchestrator, which can retry, route around it, or pause for a human.
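The re-planning loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Cowork's actual runtime: `StepResult`, `Orchestrator`, and the `paused_for_human` status are hypothetical names, and each step is a plain callable standing in for a skill load, tool call, or sub-agent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    """Hypothetical result shape that every node hands back to the orchestrator."""
    ok: bool
    payload: str


@dataclass
class Orchestrator:
    steps: dict[str, Callable[[], StepResult]]  # step name -> stand-in for a node
    max_retries: int = 1
    log: list[str] = field(default_factory=list)

    def run(self, plan: list[str]) -> str:
        for name in plan:
            attempts = 0
            while True:
                # Control returns here after every node, success or failure.
                result = self.steps[name]()
                self.log.append(f"{name}:{'ok' if result.ok else 'fail'}")
                if result.ok:
                    break
                attempts += 1
                if attempts > self.max_retries:
                    # Out of retries: stop and hand the decision to a person.
                    return f"paused_for_human:{name}"
        return "completed"
```

The point of the sketch is that a failure never disappears inside a node. It always comes back to one place that can decide whether to retry or pause.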

## How context is assembled just in time

The single most consequential design choice is lazy context assembly. Skills are not all loaded at startup. An Agent Skill is a folder of instructions, scripts, and resources that Claude pulls into context only when the task matches its trigger description. So the controller's "variance analysis" skill, the "vendor onboarding" skill, and the "board-deck formatting" skill can all be installed, yet only the relevant one occupies context during any given run.
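To make the lazy-loading idea concrete, here is a small sketch under loud assumptions. In the real product the model decides relevance by reading trigger descriptions; the keyword overlap below is a deliberately crude stand-in, and `Skill`, `select_skills`, and `build_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    trigger: str       # short description, cheap to keep visible
    instructions: str  # full body, loaded only when the trigger matches


def select_skills(task: str, installed: list[Skill]) -> list[Skill]:
    """Return skills whose trigger shares at least one word with the task.

    Assumption: word overlap stands in for the model's own relevance judgment.
    """
    task_words = set(task.lower().split())
    return [s for s in installed if task_words & set(s.trigger.lower().split())]


def build_context(task: str, installed: list[Skill]) -> str:
    """Assemble the working context from the task plus matched skills only."""
    chosen = select_skills(task, installed)
    return "\n\n".join([task] + [s.instructions for s in chosen])
```

All three skills from the example above can be installed, but only the one whose trigger matches contributes its instructions to the context.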

The same is true for tool schemas. The connector advertises which tools exist, but the full input schema and usage notes reach the model only when a tool becomes plausibly useful. This keeps the working context lean: a well-structured Cowork run can touch a few thousand tokens of skill and schema content rather than tens of thousands, which both sharpens the model's choices and cuts cost.

For enterprises this lazy assembly is also a governance lever. Because skills and connectors are scoped to a plugin, you can grant a finance team a plugin whose connectors only reach the ERP and whose skills only describe approved finance procedures. The model has no path to a system that no connector exposes.
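Plugin scoping as a hard boundary can be illustrated with a short sketch. The `Plugin` class and the `erp_lookup` function are hypothetical and do not reflect Cowork's real plugin format; the assumption is simply that a plugin holds a fixed set of connectors and refuses anything outside it.

```python
from typing import Callable


class Plugin:
    """Hypothetical plugin: the only systems reachable are its own connectors."""

    def __init__(self, name: str, connectors: dict[str, Callable[..., dict]]):
        self.name = name
        self._connectors = dict(connectors)

    def call(self, system: str, **args) -> dict:
        if system not in self._connectors:
            # No connector means no route to the system, whatever the prompt says.
            raise PermissionError(
                f"plugin {self.name!r} exposes no connector for {system!r}"
            )
        return self._connectors[system](**args)


def erp_lookup(invoice_id: str) -> dict:
    """Stand-in for a real ERP connector call."""
    return {"invoice_id": invoice_id, "status": "open"}


finance = Plugin("finance", {"erp": erp_lookup})
```

With this shape, a request for a CRM record from the finance plugin fails at the boundary instead of depending on the model to decline.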

## Sub-agents and why they keep long tasks coherent

Long knowledge tasks fail in single-context agents because the transcript fills with intermediate noise — raw query results, half-finished drafts, tool errors — until the model loses the thread. Cowork's answer is sub-agents. The orchestrator spawns a sub-agent with a tight brief, the sub-agent works in its own clean context window, and it returns only a summary. The messy middle never pollutes the orchestrator's context.
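The isolation described above can be sketched as follows. This is an illustration only: `SubAgent` is a hypothetical class, the reconciliation logic is a toy, and the assumption is that the orchestrator stores nothing from the sub-agent except the returned summary string.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    brief: str
    # Private working transcript; never copied into the orchestrator's context.
    transcript: list[str] = field(default_factory=list)

    def run(self, items: list[tuple[str, float, float]]) -> str:
        """Compare ledger and bank amounts, returning only a compact summary."""
        mismatches = []
        for item_id, ledger, bank in items:
            self.transcript.append(f"checked {item_id}: ledger={ledger} bank={bank}")
            if abs(ledger - bank) > 0.005:
                mismatches.append(item_id)
        listed = ", ".join(mismatches) or "none"
        return f"{len(items)} items checked, {len(mismatches)} mismatched: {listed}"


orchestrator_context: list[str] = []
agent = SubAgent(brief="Reconcile ledger against bank export")
orchestrator_context.append(agent.run([("A1", 10.0, 10.0), ("A2", 5.0, 7.5)]))
```

However many intermediate lines the sub-agent writes, the orchestrator's context grows by one summary.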

The tradeoff is real and worth stating plainly: multi-agent runs typically consume several times more tokens than a single-agent run, because each sub-agent carries its own context overhead. So you spawn sub-agents deliberately — for genuinely independent or large subtasks like "reconcile these 600 line items" — not for every trivial step. Used well, this is the difference between an agent that finishes a 40-step close and one that drifts at step 12.
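One way to make the spawn decision explicit is a small heuristic like the one below. The function name and both thresholds are placeholders chosen for illustration, not measured values or anything Cowork exposes; tune them against your own runs.

```python
def should_spawn_subagent(
    estimated_tokens: int,
    independent: bool,
    small: int = 2_000,    # placeholder: below this, always run inline
    large: int = 20_000,   # placeholder: at or above this, always isolate
) -> bool:
    """Decide whether a subtask justifies its own context window."""
    if estimated_tokens < small:
        return False  # trivial step: sub-agent overhead is not worth it
    if estimated_tokens >= large:
        return True   # big enough to crowd the orchestrator's context
    return independent  # mid-sized: spawn only if it can run on its own
```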

| Layer | Holds | Enterprise concern it owns |
| --- | --- | --- |
| Orchestrating model | Plan, task state | Reasoning quality, checkpoints |
| Plugin / skill loader | Instructions, procedures | Approved methods, scoping |
| MCP connectors | Tool schemas, auth | Identity, data flow, permissions |
| Sub-agents | Isolated sub-context | Coherence, cost control |

## Common pitfalls when reasoning about the architecture

- **Assuming the model holds credentials.** It does not. The connector layer authorizes calls. If you map permissions to the model instead of the connector, your threat model is wrong.
- **Loading every skill globally.** Skills are meant to be triggered. Installing them all as always-on bloats context and dulls the planner. Scope them to plugins and let triggers do the work.
- **Spawning sub-agents reflexively.** Each one multiplies token cost. Reserve them for large or parallelizable subtasks, and pass tight briefs so they return small summaries.
- **Treating a checkpoint as cosmetic.** The orchestrator's pause-for-approval points are where policy lives. Skipping them to "speed things up" removes the only human gate before a side effect.
- **Ignoring re-planning loops in debugging.** When a run stalls, look at what came back to the orchestrator — a failed tool result or an empty sub-agent summary — not just the last visible message.
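For the last pitfall, a debugging pass might look like the sketch below. The event shape (`kind`, `name`, `ok`, `text`) is an assumption made for illustration; adapt it to whatever run log or trace your deployment actually records.

```python
def find_stall_causes(events: list[dict]) -> list[str]:
    """Scan what came back to the orchestrator for likely stall causes."""
    causes = []
    for i, event in enumerate(events):
        if event["kind"] == "tool_result" and not event.get("ok", True):
            causes.append(f"event {i}: tool {event['name']} failed")
        elif event["kind"] == "subagent_summary" and not event.get("text", "").strip():
            causes.append(
                f"event {i}: sub-agent {event['name']} returned an empty summary"
            )
    return causes
```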

## Map your deployment in five steps

1. Inventory the systems of record each team must touch, and list the MCP connectors that would expose them.
2. Group connectors and the skills that describe their procedures into one plugin per team or workflow.
3. Define which steps require a human checkpoint before a side effect, and confirm the orchestrator surfaces them.
4. Decide which subtasks justify a sub-agent versus inline execution, based on size and independence.
5. Trace one real end-to-end task on paper through all four layers, naming where identity is checked and where data crosses a boundary.
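The paper trace in step 5 can also be written down as data and checked mechanically. The sketch below is one possible shape, with hypothetical names (`Step`, `audit_workflow`); it flags steps that use a connector outside the plugin and side effects that lack a human checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    connector: str            # which connector the step goes through
    side_effect: bool         # True if the step mutates a system of record
    checkpoint: bool = False  # True if a human approves before it runs


def audit_workflow(plugin_connectors: set[str], steps: list[Step]) -> list[str]:
    """Return a list of problems found in a traced workflow."""
    problems = []
    for step in steps:
        if step.connector not in plugin_connectors:
            problems.append(
                f"{step.name}: connector '{step.connector}' is not in the plugin"
            )
        if step.side_effect and not step.checkpoint:
            problems.append(f"{step.name}: side effect has no human checkpoint")
    return problems
```

An empty result does not prove the deployment is safe, but a non-empty one points at a specific step to fix before rollout.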

## Frequently asked questions

### How is Claude Cowork different from Claude Code architecturally?

They share primitives (skills, MCP connectors, sub-agents, and a planning model), but Claude Code targets engineering work in a terminal or IDE, while Cowork targets non-engineering knowledge work and packages capabilities as plugins for less technical users. The orchestration pattern underneath is the same.

### Where do MCP servers sit in the stack?

At the connector layer, between the planning model and your systems of record. The model requests a typed action; the MCP server validates it against a schema, applies auth, calls the backend, and returns structured data. The model never sees raw credentials.
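The validate, authorize, call, return sequence can be sketched in plain Python. This does not use the MCP SDK and is not a real MCP server; `Connector`, `SCHEMA`, and the error codes are hypothetical, and the backend call is a stub.

```python
from dataclasses import dataclass, field

# Hypothetical typed input schema for a "file invoice" tool.
SCHEMA = {"invoice_id": str, "amount": float}


@dataclass
class Connector:
    allowed_users: frozenset[str]
    # Credential lives here, hidden from repr and never placed in a result.
    api_key: str = field(repr=False, default="")

    def handle(self, user: str, args: dict) -> dict:
        # 1. Validate the requested action against the schema.
        if set(args) != set(SCHEMA) or any(
            not isinstance(args[key], typ) for key, typ in SCHEMA.items()
        ):
            return {"ok": False, "error": "schema_violation"}
        # 2. Authorize the acting user at the connector, not in the model.
        if user not in self.allowed_users:
            return {"ok": False, "error": "forbidden"}
        # 3. Call the backend and return structured data only.
        return {"ok": True, "result": self._call_backend(args)}

    def _call_backend(self, args: dict) -> dict:
        # Stand-in for an authenticated request that would use self.api_key.
        return {"invoice_id": args["invoice_id"], "status": "filed"}
```

Whatever the model sends, it only ever receives the structured result or an error code, which is the property a security reviewer will want to see.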

### Do sub-agents share memory with the orchestrator?

No. Each sub-agent runs in its own context window and reports back a summary. That isolation is deliberate — it keeps the orchestrator's context clean and is the main reason long tasks stay coherent.

### How do I keep context small in a large deployment?

Lean on lazy loading. Install many skills but rely on their trigger descriptions so only relevant ones enter context, scope connectors to plugins, and reserve sub-agents for large subtasks rather than every step.

## Bringing agentic AI to your phone lines

CallSphere takes these same architecture patterns — planning model, scoped tools, and isolated sub-agents — and applies them to **voice and chat**, so AI assistants answer every call and message, pull data mid-conversation, and book work around the clock. See the architecture in action at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

