Claude Cowork Architecture: How the Pieces Fit
How Claude Cowork works internally — the agent loop, dynamically loaded skills, MCP connectors, and sub-agents, fit together end to end for knowledge work.
The first time most teams open Claude Cowork, they treat it like a smarter chat window. That mental model breaks the moment a task spans three tools, a long document, and a multi-step plan. To actually get good results — and to debug the runs that go sideways — you need a picture of what is happening underneath. This post walks the full architecture of Claude Cowork end to end: where the model sits, how skills and connectors get pulled in, and how a single request becomes a coordinated set of actions across your work tools.
Claude Cowork is Anthropic's agentic product for non-engineering knowledge work, where bundles called plugins package together skills, MCP connectors, and sub-agents so that a request like "reconcile last month's invoices and draft the summary" can be executed rather than merely answered. That definition matters for everything below, because each of those three ingredients — skills, connectors, sub-agents — is a distinct layer with its own lifecycle.
What sits at the center: the model and the agent loop
At the core is a Claude model — typically Sonnet 4.6 for everyday throughput or Opus 4.8 for the hardest reasoning — running inside an agent loop. The loop is deceptively simple: Claude receives the current context, decides on the next action (call a tool, read a file, ask a question, or finish), the action executes, the result is appended to context, and the loop repeats. Everything sophisticated about Cowork is really about what gets fed into that loop and what tools are exposed on each turn.
The loop is also where the model's planning lives. Rather than emitting one giant answer, Claude works in increments: it forms a plan, takes a step, observes the outcome, and revises. This is why Cowork can recover when a connector returns an unexpected error or a spreadsheet has a column it did not anticipate — the next turn simply sees the failure and adapts. Understanding this turn-by-turn structure is the single most useful thing for predicting how Cowork will behave on a messy real-world task.
The three layers that wrap the model
Around the loop sit three layers. Skills are folders of instructions, scripts, and resources that Claude loads dynamically only when a task looks relevant — a "brand voice" skill, an "expense policy" skill, a "quarterly report format" skill. Connectors are MCP servers that expose your real tools and data: your document store, calendar, CRM, ticketing system. Sub-agents are scoped child runs that the main agent can spawn to handle a contained piece of work with its own fresh context.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
flowchart TD
A["User request in Cowork"] --> B["Agent loop (Claude model)"]
B --> C{"Relevant skill?"}
C -->|Yes| D["Load skill instructions & scripts"]
C -->|No| E["Proceed with base context"]
D --> F{"Need external data?"}
E --> F
F -->|Yes| G["Call MCP connector"] --> H["Structured result back to loop"]
F -->|No| I["Spawn sub-agent for subtask"]
H --> J["Compose result & next step"]
I --> JThe diagram makes the key insight visible: skills, connectors, and sub-agents are not invoked in a fixed order. On each turn the model chooses which layer it needs. A task might load a skill, then call two connectors, then spawn a sub-agent, then call a connector again. The plugin you install simply makes these capabilities available; the loop decides when to reach for each one.
How a request becomes a plan
When a request arrives, Cowork first assembles context. This includes the system prompt and the active plugin's metadata, a short index of which skills exist (names and one-line descriptions, not full bodies), the list of connected tools and their schemas, and any documents you've attached. Crucially, skill bodies are not all loaded up front — only their descriptions are. This progressive disclosure keeps the context lean so the model isn't drowning in instructions it doesn't need for the current job.
From that assembled context, Claude drafts a plan. For a request to "prepare the board deck update," the plan might be: pull the latest metrics from the analytics connector, load the "board deck format" skill for structure, draft each section, then hand a fact-check pass to a sub-agent. The plan is not rigid — it is a starting intention the loop will revise as real data comes back. This is why a clear, well-scoped request produces a dramatically better plan than a vague one.
Where skills get discovered and loaded
Skill discovery is a two-stage process worth understanding because it explains a lot of "why didn't it use my skill" confusion. In stage one, Claude sees only the skill's name and description in context. If the current task semantically matches, in stage two it loads the full skill folder — the detailed instructions, any reference files, and any helper scripts. Only then does the guidance actually shape behavior.
The practical consequence: your skill descriptions are doing real routing work. A description like "formatting" is too vague to reliably trigger; "format and structure the monthly board deck, including the metrics table and risk section" gives the model the signal it needs. Teams that get the most from Cowork treat skill descriptions as a routing layer, not an afterthought, and they keep skill bodies focused so that when one loads, it doesn't bloat context.
How connectors move data in and out
Connectors are MCP servers, and the protocol is what makes them composable. Each connector advertises a set of tools with typed input and output schemas. When the loop decides it needs, say, the latest invoices, Claude emits a tool call matching the schema; the connector executes against the real system and returns structured data; that data lands back in context for the next turn. Because everything is schema-described, Claude can chain tools across different connectors without bespoke glue.
This is also where most production failures live. A connector that returns ambiguous errors, lacks idempotency on writes, or returns enormous unfiltered payloads will degrade the whole run. Good connector design — tight schemas, clear error messages, paginated or summarized responses — is the difference between an agent that quietly succeeds and one that burns turns thrashing. The architecture rewards connectors that behave like well-designed APIs.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
When and why sub-agents enter the picture
Sub-agents handle work that benefits from isolation. If the main run needs to verify fifteen claims against source documents, doing that inline would flood its context with quoted passages it doesn't need to keep. Instead it spawns a sub-agent with a narrow brief — "check these claims, return pass/fail with citations" — that works in its own fresh context and returns only the verdict. The parent stays focused; the detail stays contained.
The tradeoff is cost. Multi-agent runs typically consume several times more tokens than a single-agent pass, because each sub-agent carries its own context and overhead. So the architecture supports parallel, isolated work, but the discipline is to spawn sub-agents only when the isolation genuinely pays for itself — large fan-out research, independent verification, or parallelizable subtasks — not for every minor step.
Frequently asked questions
Is Claude Cowork just Claude Code for non-engineers?
They share primitives — the agent loop, skills, MCP connectors, sub-agents — but the surface differs. Claude Code targets the terminal, IDE, and coding workflows; Cowork targets knowledge work like analysis, drafting, and operations, packaged through plugins. The internals rhyme, which is why understanding one transfers to the other.
Do all my skills load into every conversation?
No. Only skill names and short descriptions are present by default. The full skill folder loads only when the current task matches its description, which keeps context lean and is why descriptive, specific skill descriptions matter so much.
How does Cowork decide between calling a tool and answering directly?
On each turn the model evaluates whether it has enough information to take the next step. If the answer requires real data it doesn't hold — current invoices, today's calendar, a live CRM record — it calls the relevant connector. If it can reason from existing context, it answers directly. The loop, not a fixed rule, makes that call.
What's the most common architectural mistake teams make early?
Overloading a single run with too many connectors and skills at once. Because context is a shared budget, dumping every tool and every instruction in degrades reasoning. Scoping plugins to a job, writing tight skill descriptions, and reserving sub-agents for genuine fan-out keeps the architecture working in your favor.
Bringing agentic AI to your phone lines
The same architecture — an agent loop, dynamically loaded skills, schema-typed tool connectors, and scoped sub-agents — is exactly what CallSphere runs on voice and chat, so its assistants answer every call and message, use tools mid-conversation, and book real work around the clock. See it live at callsphere.ai.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.