Claude Cowork Architecture: How the Pieces Fit (Deploy Cowork Across Enterprise)
End-to-end look at Claude Cowork's internals — plugins, skills, MCP connectors, and sub-agents — and how to govern each layer in an enterprise rollout.
When an enterprise decides to roll Claude Cowork out to thousands of non-engineering staff, the first hard question is rarely "will it answer well?" — it's "what is actually running underneath, and what do we have to govern?" Cowork looks like a single chat surface, but the thing executing your finance analyst's request is a layered runtime: a model loop, a tool layer, a skill loader, a set of MCP connectors, and an orchestration tier that can fan work out to sub-agents. If you deploy it without understanding how those layers hand off to each other, you end up debugging behavior you can't see. This post walks the whole stack end to end so you know exactly what each piece does and where your controls live.
Key takeaways
- Claude Cowork is an agent runtime, not a chatbot: a model loop wrapped by a tool layer, skill loader, connector layer, and sub-agent orchestrator.
- Plugins are the enterprise packaging unit — they bundle skills, MCP connectors, and sub-agents so admins distribute capability, not config.
- Skills are loaded dynamically by relevance; only their short descriptions sit in context until Cowork decides to open one.
- MCP connectors are where every external read/write happens — so they are also where your auth, scoping, and audit controls belong.
- Sub-agents run in isolated context windows; the orchestrator passes them a brief and receives a result, not their raw transcript.
What Claude Cowork actually is
Claude Cowork is Anthropic's agentic product for non-engineering knowledge work — it gives a marketing, finance, legal, or operations user the same agent primitives that Claude Code gives engineers, but packaged for documents, spreadsheets, tickets, and business systems instead of repositories. Under the surface it runs the same core loop: the model receives a goal, decides whether it needs a tool, calls it, reads the structured result, and continues until the task is done or it needs the human.
The important architectural fact is that the model is not the system. The model is one component inside a runtime that decides which tools are exposed, which skills are visible, which connectors are authorized for this user, and whether a request is large enough to warrant spawning helpers. When you deploy across an enterprise you are configuring that runtime far more than you are configuring the model itself.
The four layers, end to end
Think of a single Cowork request as descending through four layers and coming back up. The orchestration layer receives the user goal and decides whether one agent can handle it or whether it should delegate. The skill layer surfaces relevant instruction folders so the agent knows how to do the specialized task. The tool and connector layer is where the agent actually touches your data through MCP. The model loop sits in the middle of all of it, reasoning across steps.
flowchart TD
A["User goal in Cowork"] --> B["Orchestrator: one agent or many?"]
B -->|Single| C["Model loop"]
B -->|Delegate| D["Spawn sub-agents"]
C --> E{"Skill relevant?"}
E -->|Yes| F["Load skill instructions"]
E -->|No| G["Proceed with base context"]
F --> H["Call MCP connector"]
G --> H
H --> I["Connector returns structured data"]
D --> H
I --> J["Compose result for user"]
The diagram makes the key handoffs visible. Notice that both the single-agent path and the sub-agent path converge on the same connector layer — that is deliberate, and it is good news for governance, because it means every external action funnels through one auditable boundary regardless of how the work was orchestrated.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Plugins: the unit of enterprise distribution
In a single-user setup you might hand-wire a skill here and a connector there. That does not scale to a company. Cowork solves this with plugins. A plugin is a bundle that contains skills (the how-to knowledge), connectors (the MCP servers that reach your systems), and optionally sub-agent definitions. An admin publishes a plugin once, and every user in scope receives a coherent capability rather than a pile of loose settings.
This matters because it changes the deployment mental model. You are not configuring Cowork per person; you are authoring capability packages and assigning them to roles. A "Quarterly Close" plugin for finance might bundle a close-checklist skill, a connector to the ERP, and a reconciliation sub-agent. A "Contract Review" plugin for legal bundles a clause-library skill and a connector to the contract store. Distribution becomes a publishing problem, which is exactly the shape enterprise IT already knows how to manage.
How skills stay cheap until they're needed
A naive design would stuff every instruction the agent might need into the prompt. That wastes context and degrades reasoning. Cowork instead keeps skills dormant: only a short name and description for each available skill sits in context. The model reads that index, and when a task clearly matches — "reconcile the AP ledger" matches the reconciliation skill — it loads the full skill folder on demand, pulling in the detailed steps, scripts, and reference material only at that moment.
The architectural payoff is that you can ship hundreds of skills to a user without drowning the model. The cost of an unused skill is a one-line description, not a page of instructions. When you are planning an enterprise rollout, this is what lets you give every department deep, specific procedures without every agent paying for every department's knowledge on every turn.
Sub-agents and context isolation
For larger jobs the orchestrator spawns sub-agents, each with its own fresh context window. The orchestrator hands a sub-agent a focused brief — "summarize these forty contracts for renewal risk" — and receives back a result, not the sub-agent's entire reasoning transcript. This isolation is what keeps a big task from collapsing under its own token weight: each helper works in a clean window, and only distilled findings flow back up.
The tradeoff is real and worth stating plainly: multi-agent runs typically consume several times more tokens than a single agent doing the same work, because each sub-agent re-establishes its own context. So the orchestrator should delegate deliberately, not reflexively. A good rule for enterprise tuning is to reserve fan-out for genuinely parallelizable, high-volume tasks and keep everyday requests on the single-agent path.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Common pitfalls
- Treating connectors as plumbing instead of policy. Every external read and write happens at the connector layer, so that is where data scoping and least-privilege belong. Wire a connector with broad credentials and you've granted every agent that breadth.
- Over-bundling plugins. A mega-plugin that ships every skill to everyone defeats the dynamic-loading design and makes audits painful. Bundle by role and task, not by department-wide convenience.
- Letting the orchestrator fan out by default. Sub-agents multiply token cost. If you see runaway usage, check whether routine tasks are being delegated when a single agent would do.
- Assuming the model is the security boundary. It isn't — the connector auth and the plugin assignment are. Govern those layers, not the prose of your prompts.
- Ignoring skill descriptions. Because loading is triggered by the one-line description, a vague description means the right skill never fires. The index line is load-bearing.
Map your deployment in 5 steps
- Inventory the systems Cowork must touch and define one MCP connector per system with least-privilege credentials.
- Write skills for each repeatable procedure, giving each a crisp, trigger-friendly one-line description.
- Group skills and connectors into role-scoped plugins (finance, legal, ops) rather than one global bundle.
- Decide which tasks justify sub-agents and document the delegation policy so usage stays predictable.
- Assign plugins to roles, turn on connector-level audit logging, and pilot with one team before company-wide release.
Single agent vs. orchestrated sub-agents
| Dimension | Single agent | Orchestrated sub-agents |
|---|---|---|
| Best for | Most everyday tasks | Large, parallelizable jobs |
| Token cost | Baseline | Several times higher |
| Context isolation | One shared window | Fresh window per sub-agent |
| Failure blast radius | Whole task | Contained per sub-agent |
Frequently asked questions
What is Claude Cowork in one sentence?
Claude Cowork is Anthropic's agentic product that brings agent primitives — skills, MCP connectors, and sub-agents, packaged as plugins — to non-engineering knowledge work like documents, spreadsheets, and business systems.
Where do my security controls live in the architecture?
Primarily at the MCP connector layer, where every external read and write passes through, and at the plugin-assignment layer, which decides who gets which capability. Those two boundaries, not the model's prompts, are your real controls.
How do skills avoid bloating the context window?
Only a short description for each available skill stays in context. Cowork loads a skill's full instructions only when the task clearly matches its description, so unused skills cost almost nothing per turn.
When should I let Cowork use sub-agents?
Reserve sub-agents for genuinely large or parallel work, because each runs in its own context and multiplies token usage. Keep routine requests on the single-agent path for predictable cost.
Bringing agentic AI to your phone lines
CallSphere takes these same layered agent patterns — orchestration, tool calls, and skills — and points them at voice and chat, so a multi-agent assistant answers every call, looks things up mid-conversation, and books the work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.