Inside Claude Cowork Plugins: Enterprise Architecture
How Claude Cowork plugins work end to end: skills, MCP connectors, sub-agents, routing, and context isolation across enterprise teams.
When a finance analyst, a recruiter, and a support lead all install "the same" Claude Cowork plugin, they are not getting one monolithic program. They are loading a bundle that quietly fans out into instructions, connectors, and helper agents — most of which never run unless the moment calls for them. Understanding that fan-out is the difference between a plugin that feels magical and one that silently does the wrong thing on a Friday close. This post walks the full internal architecture of a Cowork plugin across an enterprise, from the moment a user types a request to the moment a sub-agent writes a row back into a system of record.
I'll assume you've used Claude Cowork at least once and want to know what's underneath. The short version: a plugin is a packaging boundary, not a process. The interesting engineering is in how Cowork decides what to load, when, and with whose permissions.
What a Cowork plugin actually contains
A Claude Cowork plugin is a versioned bundle that packages three kinds of capability so non-engineering teams can install agentic behavior in one click: Agent Skills (folders of instructions, scripts, and reference files Claude loads on demand), connectors built on the Model Context Protocol (MCP) that expose external tools and data, and sub-agents that run scoped tasks in their own context windows. The manifest at the root of the bundle declares all three, plus metadata: which org units may install it, what scopes the connectors need, and which model tier the sub-agents should default to.
The crucial design point is that none of this is eagerly loaded. The manifest is cheap to read; the skill bodies, tool schemas, and sub-agent prompts are pulled into context lazily. A plugin can declare twenty skills and the main agent may load zero of them for a given conversation. That lazy posture is what keeps a department-wide plugin from blowing the context budget the instant someone installs it.
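Here is a minimal Python sketch of that lazy posture. It assumes a hypothetical JSON manifest that lists each skill's name, description, and file path; Cowork's real manifest format may differ. Descriptions are read at install time. Each skill body sits behind a loader that runs only on first use.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LazyBody:
    """Defers reading an expensive payload until the first time it is requested."""
    loader: Callable[[], str]
    _value: Optional[str] = field(default=None, init=False)
    loads: int = field(default=0, init=False)

    def get(self) -> str:
        if self._value is None:
            self._value = self.loader()
            self.loads += 1
        return self._value


@dataclass
class Plugin:
    name: str
    skill_descriptions: Dict[str, str]  # cheap: read eagerly from the manifest
    skill_bodies: Dict[str, LazyBody]   # expensive: read only when a skill is chosen


def load_plugin(manifest_json: str, read_file: Callable[[str], str]) -> Plugin:
    # Hypothetical manifest shape:
    # {"name": ..., "skills": [{"name": ..., "description": ..., "path": ...}]}
    manifest = json.loads(manifest_json)
    descriptions: Dict[str, str] = {}
    bodies: Dict[str, LazyBody] = {}
    for skill in manifest["skills"]:
        descriptions[skill["name"]] = skill["description"]
        # The default argument pins this iteration's path; nothing is read yet.
        bodies[skill["name"]] = LazyBody(lambda p=skill["path"]: read_file(p))
    return Plugin(manifest["name"], descriptions, bodies)
```

Installing reads nothing but the manifest. The first `get()` on a skill is the only moment its file is touched, and later calls reuse the cached text.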
The discovery and routing layer
The first real piece of machinery is discovery. When a user makes a request, Cowork doesn't search the full text of every skill. Instead it reads the compact metadata declared at the top of each skill: a name and a short description of what the skill does and when to use it. The model then decides which skills are relevant. Only the chosen skills get their full bodies expanded into the working context. This keeps the always-on footprint tiny while still giving the model a menu of everything it could reach for.
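Agent Skills keep this metadata in YAML frontmatter (`name` and `description`) at the top of each `SKILL.md`. The sketch below builds the always-on menu from that frontmatter and expands bodies only for the selected skills. Two simplifications are assumptions. First, the parser handles only flat `key: value` lines. Second, the hypothetical `naive_relevance` function swaps the model's judgment for plain keyword overlap, purely so the example runs.

```python
from typing import Dict, List, Set, Tuple

STOPWORDS = {"use", "when", "the", "a", "an", "and", "for", "to", "of", "this", "these"}


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md-style file into (frontmatter fields, body).

    Handles only flat `key: value` lines; a real loader would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, ""  # no closing delimiter: treat everything as metadata


def build_menu(skill_files: Dict[str, str]) -> Dict[str, str]:
    """The always-on part: just name -> description for every installed skill."""
    menu: Dict[str, str] = {}
    for text in skill_files.values():
        meta, _body = split_frontmatter(text)
        menu[meta["name"]] = meta["description"]
    return menu


def _keywords(text: str) -> Set[str]:
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def naive_relevance(request: str, menu: Dict[str, str]) -> List[str]:
    """Stand-in for the model's judgment: plain keyword overlap, for illustration only."""
    wanted = _keywords(request)
    return [name for name, desc in menu.items() if wanted & _keywords(desc)]


def expand(selected: List[str], skill_files: Dict[str, str]) -> Dict[str, str]:
    """Pull full bodies into context for the chosen skills only."""
    expanded: Dict[str, str] = {}
    for text in skill_files.values():
        meta, body = split_frontmatter(text)
        if meta.get("name") in selected:
            expanded[meta["name"]] = body
    return expanded
```

For a request like "Please reconcile these invoices", only a skill whose description mentions invoices is selected, and only that skill's body reaches the context.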
Routing is the same idea applied to work, not just instructions. The orchestrating agent reads the request, the loaded skills, and the available connectors, then decides whether to answer directly, call a tool, or delegate a chunk of work to a sub-agent. The diagram below shows that decision flow for a single enterprise request.
```mermaid
flowchart TD
A["User request in Cowork"] --> B["Read plugin manifest & skill descriptions"]
B --> C{"Relevant skill?"}
C -->|No| D["Answer from base context"]
C -->|Yes| E["Expand skill body into context"]
E --> F{"Needs external data or write?"}
F -->|No| G["Compose answer"]
F -->|Yes| H["Call MCP connector with scoped token"]
H --> I{"Large or parallel subtask?"}
I -->|Yes| J["Spawn scoped sub-agent"]
I -->|No| G
J --> G
```
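The same decision flow can be written as a small Python function. The `Request` flags and the `LARGE_SUBTASK` threshold are hypothetical stand-ins for judgments the model makes at run time. The point is the order of the gates, not the exact criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ANSWER_FROM_BASE = "answer from base context"
    COMPOSE_WITH_SKILL = "compose answer with skill loaded"
    CALL_CONNECTOR = "call MCP connector, then compose"
    DELEGATE = "call connector, spawn sub-agent, then compose"


@dataclass
class Request:
    # Hypothetical flags; in Cowork these are judgments the model makes.
    matches_skill: bool
    needs_external_io: bool
    subtask_items: int = 1


LARGE_SUBTASK = 10  # illustrative threshold, not a Cowork setting


def route(req: Request) -> Route:
    # Gate 1: no relevant skill means answer from base context.
    if not req.matches_skill:
        return Route.ANSWER_FROM_BASE
    # Gate 2: skill loaded, but no external data or write needed.
    if not req.needs_external_io:
        return Route.COMPOSE_WITH_SKILL
    # Gate 3: after the connector call, delegate only large or parallel work.
    if req.subtask_items >= LARGE_SUBTASK:
        return Route.DELEGATE
    return Route.CALL_CONNECTOR
```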
Notice that the model passes through several cheap gates before it spends tokens or touches a system of record. Each gate is a place where an enterprise can attach policy: a skill might be allowed for the support org but not legal, a connector might be read-only for one role and read-write for another.
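One way to attach policy to those gates is sketched below. The org names, roles, and tables are all hypothetical. Each skill has an allowlist of org units, and each role has an access level on each connector. Anything not listed is denied.

```python
from enum import Enum
from typing import Dict, Set, Tuple


class Access(Enum):
    NONE = 0
    READ = 1
    READ_WRITE = 2


# Hypothetical policy tables an administrator might attach to the gates.
SKILL_ALLOWLIST: Dict[str, Set[str]] = {
    "refund-triage": {"support"},
    "contract-review": {"legal"},
}
CONNECTOR_ACCESS: Dict[Tuple[str, str], Access] = {
    ("ticketing", "support-agent"): Access.READ_WRITE,
    ("ticketing", "support-viewer"): Access.READ,
}


def skill_allowed(skill: str, org: str) -> bool:
    # Unknown skills map to an empty set, so they are denied by default.
    return org in SKILL_ALLOWLIST.get(skill, set())


def connector_allows(connector: str, role: str, write: bool) -> bool:
    # Unknown (connector, role) pairs get Access.NONE.
    granted = CONNECTOR_ACCESS.get((connector, role), Access.NONE)
    needed = Access.READ_WRITE if write else Access.READ
    return granted.value >= needed.value
```

Default-deny matters here. A new skill or role gets no access until someone adds it to a table, so nothing inherits access by omission.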
How connectors bridge to systems of record
The connector layer is where Cowork stops being a chat product and becomes an operator. Each connector is an MCP server. Some are hosted by the vendor whose system they front, and some are self-hosted inside the enterprise's own network. The server advertises a set of tools, each with a JSON Schema describing its inputs (and, optionally, its structured output). When the agent decides to call one, Cowork serializes the arguments against that schema, attaches a scoped credential, and sends the call as a JSON-RPC message over the protocol's transport: stdio for a local server, HTTP for a remote one.
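MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request that carries the tool `name` and its `arguments`. The sketch below validates arguments against the `inputSchema` of a hypothetical `create_ticket` tool, then builds that request. The validator covers only a small subset of JSON Schema; a real client would use a full validator. The `Authorization: Bearer` header reflects how the MCP authorization spec has clients present OAuth access tokens over HTTP. The token value is a placeholder, and other transport headers are omitted.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical tool, shaped like one entry of an MCP tools/list result.
CREATE_TICKET = {
    "name": "create_ticket",
    "description": "Open a support ticket.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "body": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "normal", "high"]},
        },
        "required": ["title", "body"],
    },
}


def _type_ok(json_type: Any, value: Any) -> bool:
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type in ("integer", "number"):
        if isinstance(value, bool):  # bool is a subclass of int in Python
            return False
        return isinstance(value, int) if json_type == "integer" else isinstance(value, (int, float))
    return True  # types this sketch does not model are not checked


def check_arguments(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """A tiny subset of JSON Schema: required keys, primitive types, enums.
    Keys the schema does not list are ignored, as JSON Schema allows by default."""
    errors = [f"missing {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            continue
        if not _type_ok(spec.get("type"), value):
            errors.append(f"{key} should be {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} not in {spec['enum']}")
    return errors


def build_tools_call(request_id: int, tool: Dict[str, Any], args: Dict[str, Any],
                     token: str) -> Tuple[Dict[str, str], str]:
    errors = check_arguments(tool["inputSchema"], args)
    if errors:
        raise ValueError("; ".join(errors))
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": args},
    })
    return headers, body
```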
The architectural win is that the model never holds a raw API key or learns the quirks of a vendor's REST endpoint. It learns the tool: "create_ticket takes a title, body, and priority." The MCP server owns authentication, rate limiting, and the translation into whatever the underlying system actually wants. For an enterprise, this means you can swap the system of record behind a connector — migrate from one ticketing vendor to another — without touching a single skill or sub-agent prompt, as long as the tool surface stays stable.
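The sketch below shows why the tool surface survives a vendor swap. `create_ticket` is the only signature the model ever sees, and each backend adapter translates it into its own vendor's field layout. Both vendors and their payload shapes are invented for illustration. The adapters record payloads instead of calling real APIs.

```python
from typing import Dict, List, Protocol


class TicketBackend(Protocol):
    def open(self, title: str, body: str, priority: str) -> str: ...


class VendorABackend:
    """Hypothetical vendor that expects summary/description/urgency fields."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, object]] = []

    def open(self, title: str, body: str, priority: str) -> str:
        self.sent.append({"summary": title, "description": body, "urgency": priority.upper()})
        return f"A-{len(self.sent)}"


class VendorBBackend:
    """Hypothetical vendor with different field names and a numeric priority scale."""
    PRIORITY = {"low": 3, "normal": 2, "high": 1}

    def __init__(self) -> None:
        self.sent: List[Dict[str, object]] = []

    def open(self, title: str, body: str, priority: str) -> str:
        self.sent.append({"subject": title, "content": body, "p": self.PRIORITY[priority]})
        return f"B-{len(self.sent)}"


def create_ticket(backend: TicketBackend, title: str, body: str, priority: str = "normal") -> str:
    """The tool surface the model sees; it does not change when the backend does."""
    return backend.open(title, body, priority)
```

Migrating vendors means constructing a different backend object. No skill text or sub-agent prompt that mentions `create_ticket` needs to change.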
Connectors also localize blast radius. Because each one carries its own scope, a misbehaving plugin in marketing cannot reach into the HR connector it was never granted. The permission boundary lives at the connector, not in the prompt, which is exactly where a security team wants it.
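Here is a minimal sketch of enforcement at the connector boundary, with hypothetical plugin and connector names. The grant check is ordinary code that runs before any tool does work. Nothing in a prompt, including an injected instruction claiming HR access, can change its outcome.

```python
from typing import Any, Dict, Set


class ScopeError(PermissionError):
    """Raised when a plugin calls a connector it was never granted."""


class Connector:
    def __init__(self, name: str, granted_plugins: Set[str]) -> None:
        self.name = name
        self.granted_plugins = granted_plugins

    def call(self, plugin: str, tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        # The check runs in the connector before any work happens; no prompt text is consulted.
        if plugin not in self.granted_plugins:
            raise ScopeError(f"{plugin} has no grant for the {self.name} connector")
        # A real connector would forward `args` to its backend here.
        return {"connector": self.name, "tool": tool, "args": args, "ok": True}
```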
Sub-agents and context isolation
The third structural piece is delegation. When a task is large, repetitive, or parallelizable — reconcile forty invoices, screen three hundred resumes, summarize a quarter of support tickets — the orchestrator spawns sub-agents. Each sub-agent runs in its own context window with a narrow brief and only the tools it needs. It does the work, returns a compact result, and disappears. The orchestrator never sees the sub-agent's intermediate reasoning, only its conclusion.
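The fan-out can be sketched with Python's `concurrent.futures`. `run_subagent` is a hypothetical stand-in for a model call in its own context window; here it reconciles one invoice deterministically. Each sub-agent receives a narrow `Brief` with a task, its allowed tools, and its own slice of data. It keeps its scratch notes local and returns only a one-line conclusion to the orchestrator.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Brief:
    task: str
    allowed_tools: Tuple[str, ...]
    payload: Dict[str, Any]


@dataclass(frozen=True)
class Result:
    item_id: str
    conclusion: str


def run_subagent(brief: Brief) -> Result:
    """Hypothetical stand-in for a sub-agent running in its own context window."""
    inv = brief.payload
    scratch: List[str] = []  # intermediate notes; they never leave this function
    scratch.append(f"invoice {inv['invoice_total']} vs ledger {inv['ledger_total']}")
    diff = round(inv["invoice_total"] - inv["ledger_total"], 2)
    scratch.append(f"difference {diff}")
    conclusion = "matched" if diff == 0 else f"mismatch of {diff}"
    return Result(inv["id"], conclusion)


def reconcile(invoices: List[Dict[str, Any]], max_workers: int = 8) -> Dict[str, str]:
    """Orchestrator: one narrow brief per invoice, run in parallel, keep only conclusions."""
    briefs = [Brief("reconcile one invoice against the ledger", ("ledger.read",), inv)
              for inv in invoices]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subagent, briefs))
    return {r.item_id: r.conclusion for r in results}
```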
This isolation is doing two jobs. First, it protects the main context from being flooded with raw intermediate data, which keeps the orchestrator coherent over a long session. Second, it lets independent subtasks run in parallel, so a batch job finishes in a fraction of the wall-clock time. The cost is real: multi-agent runs typically burn several times more tokens than a single agent doing the same work serially, so a well-built plugin reserves delegation for tasks where the breadth genuinely pays for itself.
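Below is a back-of-envelope model of that trade-off. The token multiplier, per-item cost, worker count, and orchestration overhead are illustrative assumptions, not measured Cowork figures. Substitute your own numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    tokens: int
    wall_clock_s: float


def single_agent(n_items: int, tokens_per_item: int, secs_per_item: float) -> Estimate:
    # One agent works through every item serially.
    return Estimate(n_items * tokens_per_item, n_items * secs_per_item)


def multi_agent(n_items: int, tokens_per_item: int, secs_per_item: float,
                token_multiplier: float, workers: int, overhead_s: float) -> Estimate:
    # Items run in parallel waves of `workers`; coordination inflates tokens
    # by `token_multiplier` and adds a fixed orchestration overhead.
    waves = math.ceil(n_items / workers)
    return Estimate(round(n_items * tokens_per_item * token_multiplier),
                    waves * secs_per_item + overhead_s)
```

Take 40 invoices at 2,000 tokens and 30 seconds each, with a 3x token multiplier, 10 parallel workers, and 20 seconds of overhead. The serial run costs 80,000 tokens over 1,200 seconds. The delegated run costs 240,000 tokens but finishes in 140 seconds. For a single item, delegation is worse on both axes: 6,000 tokens and 50 seconds, against 2,000 tokens and 30 seconds.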
Putting it together across an enterprise
Now zoom out to the org. The same plugin installed in three departments behaves differently because the things around it differ: the connectors are bound to each team's systems, the role determines which scopes are live, and the loaded skills depend on the actual requests people make. The plugin is constant; the runtime context is per-team. This is what lets a central platform group ship one "customer operations" plugin and have it do the right, scoped thing for support, success, and billing without three separate builds.
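The sketch below illustrates the "plugin is constant, runtime is per-team" idea. The plugin declares abstract connector slots. Each team's binding decides which system fills each slot and which scopes are live. Team names, system names, and scope strings are all hypothetical.

```python
from typing import Dict, Tuple

# Abstract connector slots the hypothetical "customer operations" plugin's skills call.
CUSTOMER_OPS_SLOTS: Tuple[str, ...] = ("ticketing", "crm")

# Hypothetical per-team bindings, owned by each team's admins rather than by the plugin.
TEAM_BINDINGS = {
    "support": {"systems": {"ticketing": "helpdesk-eu", "crm": "crm-prod"},
                "scopes": {"ticketing:write", "crm:read"}},
    "success": {"systems": {"ticketing": "helpdesk-eu", "crm": "crm-prod"},
                "scopes": {"ticketing:read", "crm:write"}},
    "billing": {"systems": {"ticketing": "helpdesk-eu", "crm": "billing-crm"},
                "scopes": {"ticketing:read", "crm:write"}},
}


def resolve(slots: Tuple[str, ...], team: str) -> Dict[str, Dict[str, object]]:
    """Bind one plugin's slots to a team's systems and the scopes live for that team."""
    binding = TEAM_BINDINGS[team]
    missing = [s for s in slots if s not in binding["systems"]]
    if missing:
        raise LookupError(f"{team} has no system bound for {missing}")
    return {
        slot: {
            "system": binding["systems"][slot],
            "scopes": sorted(sc for sc in binding["scopes"] if sc.startswith(slot + ":")),
        }
        for slot in slots
    }
```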
The failure modes follow from the architecture too. Over-broad skill descriptions cause the model to load skills it doesn't need, wasting context. Connectors with sloppy schemas cause malformed tool calls. Sub-agents handed too much context lose the focus that made delegation worthwhile. Each of these is fixable precisely because the layers are separable — you can tighten a skill trigger without redeploying the connector.
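A small lint pass can catch the first failure mode before it ships. The name and length checks follow some of the limits Anthropic documents for Agent Skills frontmatter: a `name` of at most 64 lowercase letters, digits, and hyphens, and a non-empty `description` of at most 1,024 characters. The over-breadth and missing-"when" checks are an illustrative heuristic, not part of any spec.

```python
import re
from typing import List

NAME_RE = re.compile(r"[a-z0-9-]{1,64}")
# Illustrative heuristic only: words that often signal an over-broad trigger.
BROAD_WORDS = {"anything", "everything", "general", "misc"}


def lint_skill(name: str, description: str) -> List[str]:
    problems: List[str] = []
    if not NAME_RE.fullmatch(name):
        problems.append("name must be 1-64 lowercase letters, digits, or hyphens")
    if not description.strip():
        problems.append("description must be non-empty")
        return problems
    if len(description) > 1024:
        problems.append("description exceeds 1024 characters")
    words = {w.strip(".,;:!?").lower() for w in description.split()}
    if words & BROAD_WORDS:
        problems.append("description looks over-broad")
    if "when" not in words:
        problems.append("description never says when to use the skill")
    return problems
```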
Frequently asked questions
Is a Cowork plugin a single program or a bundle?
It's a bundle. A plugin packages Agent Skills, MCP connectors, and sub-agent definitions behind one manifest, but each piece is loaded lazily and runs in its own boundary. The plugin is a distribution and permission unit, not a single running process.
How does Cowork decide which skills to load?
It reads the short name and description declared in each skill's metadata and lets the model judge relevance to the current request. Only then does it expand the full body of the chosen skills into context. Unused skills cost almost nothing because their bodies are never pulled in.
Where do credentials live in this architecture?
Inside the MCP connector, not the model. The agent calls named tools with structured arguments; the connector attaches the scoped credential and talks to the underlying system. This keeps secrets out of the prompt and lets the enterprise enforce role-based scopes at the connector boundary.
When should a plugin use sub-agents?
For work that is broad, repetitive, or parallelizable enough that running it in isolated context windows outweighs the extra token cost. For a single focused answer, the orchestrator should handle it directly rather than delegating.
Bringing agentic AI to your phone lines
The same layered design — skills for know-how, connectors for tools, sub-agents for scale — is what powers CallSphere's voice and chat agents, which answer every call and message, reach into your systems mid-conversation, and book real work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.