---
title: "Claude Cowork architecture: how the pieces fit together"
description: "Inside Claude Cowork's internals — the orchestration loop, context assembly, skills, MCP connectors, and sub-agents that run real knowledge work end to end."
canonical: https://callsphere.ai/blog/claude-cowork-architecture-how-the-pieces-fit-together
category: "Agentic AI"
tags: ["agentic ai", "claude", "claude cowork", "mcp", "agent architecture", "anthropic", "knowledge work"]
author: "CallSphere Team"
published: 2026-06-05T08:00:00.000Z
updated: 2026-06-06T20:01:42.267Z
---

# Claude Cowork architecture: how the pieces fit together

> Inside Claude Cowork's internals — the orchestration loop, context assembly, skills, MCP connectors, and sub-agents that run real knowledge work end to end.

The first time you watch Claude Cowork take a vague request like "reconcile last month's invoices against the contracts and flag the discrepancies" and actually do it — opening files, calling a billing system, writing a summary doc — it can feel like magic. It is not magic. Underneath is a fairly legible architecture: a planning loop, a context assembly layer, a set of dynamically loaded skills, and a connector layer that reaches out to your real systems through the Model Context Protocol. Understanding how those pieces fit together is the difference between trusting the tool blindly and being able to design work for it that reliably succeeds.

This post walks the full stack of Claude Cowork — the agentic product Anthropic built for non-engineering knowledge work — from the moment a request lands to the moment a deliverable is produced. The goal is a mental model accurate enough that you can predict its behavior, debug it when it stalls, and extend it on purpose.

## What problem the architecture is actually solving

Knowledge work is not a single task; it is a loosely specified sequence of sub-tasks that each touch different tools and data. A human analyst reconciling invoices switches between a spreadsheet, an email thread, a contract PDF, and a finance app, holding a running model of what "done" looks like. The architectural challenge for an agent is to reproduce that fluid switching without a hard-coded script, because the next request will look nothing like this one.

Claude Cowork answers this with a model-driven control loop rather than a workflow engine. There is no pre-built flowchart of "first do A, then B." Instead, the model itself decides each next action based on the current state, the available tools, and the instructions it has been given. The architecture's job is to feed that loop the right context at the right moment and to execute whatever the model decides — safely.

## The four layers, top to bottom

It helps to think of the system as four cooperating layers. The **orchestration loop** is the heartbeat: read state, ask the model for the next step, execute it, observe the result, repeat until the goal is met or a stop condition fires. The **context assembly layer** decides what the model sees on each turn — the task, recent observations, loaded skills, and tool schemas — while aggressively leaving out everything irrelevant. The **capability layer** is the union of skills (instructions and scripts) and connectors (MCP servers) that define what the agent can actually do. The **execution and safety layer** runs tool calls, enforces permissions, and gates anything irreversible.

```mermaid
flowchart TD
  A["User request"] --> B["Orchestration loop"]
  B --> C["Context assembly: task, state, skills, schemas"]
  C --> D{"Model decides next action"}
  D -->|Use capability| E["Skill script or MCP connector call"]
  E --> F["Execution & safety: permissions, gating"]
  F --> G["Observation returned to loop"]
  G --> B
  D -->|Goal met| H["Deliverable produced"]
```

The arrow from observation back into the loop is the most important edge in the whole diagram. Each tool result becomes new context, the model re-plans against it, and the cycle continues. This is why agentic systems can recover from surprises a rigid script never could: a failed API call is just another observation the model reasons about.
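To make that feedback edge concrete, here is a minimal sketch of the loop in Python. It is not Cowork's actual implementation: `Action`, `run_loop`, `stub_decide`, and the two billing tools are hypothetical names, and `stub_decide` is a hard-coded stand-in for the model. The point is only to show a failed call turning into an observation that the next decision reacts to.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                      # "tool" or "done"
    tool: str = ""
    args: dict = field(default_factory=dict)
    result: str = ""               # the deliverable when kind == "done"


def run_loop(goal, decide, tools, max_steps=10):
    """Ask for the next step, execute it, observe the result, repeat."""
    observations = []
    for _ in range(max_steps):
        action = decide(goal, observations)
        if action.kind == "done":
            return action.result
        try:
            outcome = tools[action.tool](**action.args)
        except Exception as exc:
            # A failure is not fatal: it becomes one more observation.
            outcome = f"error: {exc}"
        observations.append(f"{action.tool} -> {outcome}")
    raise RuntimeError("stop condition: step budget exhausted")


def flaky_billing(invoice_id):
    raise TimeoutError("billing API timed out")


def cached_billing(invoice_id):
    return f"invoice {invoice_id}: $120.00"


def stub_decide(goal, observations):
    # Hypothetical policy standing in for the model's decision.
    if not observations:
        return Action("tool", "billing_api", {"invoice_id": 7})
    if observations[-1].startswith("billing_api -> error"):
        return Action("tool", "billing_cache", {"invoice_id": 7})
    return Action("done", result=observations[-1])


deliverable = run_loop(
    "look up invoice 7",
    stub_decide,
    {"billing_api": flaky_billing, "billing_cache": cached_billing},
)
print(deliverable)
```

The step budget in `max_steps` plays the role of a stop condition here; a real system would likely have richer ones.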

## How context assembly actually works

The single biggest lever on quality is what lands in the model's context window on each turn. Claude Cowork does not dump everything in. On a given step it typically assembles the original goal, a compacted history of what has happened, the schemas of currently relevant tools, and the body of any skill the model has chosen to load. A skill that is not relevant to the current task contributes only a one-line description until the model decides it is needed — at which point its full instructions are pulled in. This progressive disclosure keeps the window focused and the reasoning sharp.
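Progressive disclosure is easy to picture in code. The sketch below assumes a hypothetical `Skill` record and `assemble_context` function, neither of which comes from Cowork itself, and treats the context as a plain string. Skills that have not been loaded contribute one line; loaded skills contribute their full body.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str   # one-line summary, always visible
    body: str          # full instructions, included only once loaded


def assemble_context(goal, history_summary, skills, loaded, tool_schemas):
    """Build the text for one turn, keeping unloaded skills to one line."""
    parts = [f"GOAL: {goal}", f"HISTORY: {history_summary}"]
    for skill in skills:
        if skill.name in loaded:
            parts.append(f"SKILL {skill.name} (loaded):\n{skill.body}")
        else:
            parts.append(f"SKILL {skill.name}: {skill.description}")
    parts.extend(f"TOOL: {schema}" for schema in tool_schemas)
    return "\n".join(parts)


skills = [
    Skill("reconcile", "Reconcile invoices against contracts.",
          "Match each invoice to its contract by vendor ID, then compare totals."),
    Skill("expense-report", "Format expense reports.",
          "Use the quarterly template and group items by cost center."),
]

context = assemble_context(
    goal="Reconcile last month's invoices",
    history_summary="nothing done yet",
    skills=skills,
    loaded={"reconcile"},
    tool_schemas=["get_invoice(invoice_id: int)"],
)
print(context)
```

Only the `reconcile` body appears in the output; `expense-report` stays a one-line description until something adds it to `loaded`.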

As a run gets long, raw history would overflow even a large window. The architecture handles this by summarizing older turns into compact state — "invoices 1 through 40 reconciled, three discrepancies found and listed" — rather than carrying every raw tool output forever. Engineers extending Cowork should design their skills and tool outputs to be summarization-friendly: return structured, small results, not giant blobs that crowd out the model's working memory.
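A rough sketch of that compaction step, under stated assumptions: `compact_history` is a hypothetical helper, and `stub_summarize` merely counts turns where a real system would presumably ask the model to write the summary. The idea shown is only the shape of the operation: fold older turns into one entry and keep the most recent ones verbatim.

```python
def compact_history(turns, keep_recent, summarize):
    """Replace all but the last `keep_recent` turns with a single summary."""
    if len(turns) <= keep_recent:
        return list(turns)
    cut = len(turns) - keep_recent
    older, recent = turns[:cut], turns[cut:]
    return [summarize(older)] + recent


def stub_summarize(older):
    # Placeholder: a real summary would capture what was done, not a count.
    return f"summary: {len(older)} earlier turns compacted"


turns = [f"turn {i}" for i in range(1, 7)]
compacted = compact_history(turns, keep_recent=2, summarize=stub_summarize)
print(compacted)
```

Small, structured tool outputs make the `summarize` step cheaper and less lossy, which is the practical reason to shape results before they enter the history.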

## Skills and connectors: the capability layer in detail

An Agent Skill is a folder of instructions, optional scripts, and resources that Claude loads dynamically when the task calls for it. A connector is an MCP server that exposes tools and data from an external system such as Google Drive, a finance system, or a CRM. In Claude Cowork these are bundled as **plugins**: a plugin packages the skills, connectors, and any sub-agents needed for a domain so a team can install a coherent capability in one move rather than wiring five things by hand.

The clean separation matters. Skills carry the know-how ("here is our discrepancy policy, here is how we format the report"); connectors carry the reach ("here is how to read the contracts and write the doc"). The model is the reasoning that joins them. When you find yourself wishing the agent "just knew" your process, the answer is almost always a skill, not a bigger prompt — because a skill is loaded only when relevant and can carry far more detail than you would ever want permanently in context.
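One way to picture the bundle is as a small data structure. The sketch below is illustrative only: `SkillSpec`, `ConnectorSpec`, `Plugin`, `install`, and the `finance-ops` example are all hypothetical and do not reflect Cowork's real plugin format. It shows know-how (skills) and reach (connectors) registered together in one call.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSpec:
    name: str
    instructions: str      # the know-how, e.g. a discrepancy policy


@dataclass
class ConnectorSpec:
    name: str
    tools: list[str]       # tool names the MCP server would expose


@dataclass
class Plugin:
    name: str
    skills: list[SkillSpec] = field(default_factory=list)
    connectors: list[ConnectorSpec] = field(default_factory=list)
    subagents: list[str] = field(default_factory=list)


def install(registry: dict, plugin: Plugin) -> None:
    """Register everything a plugin bundles in a single call."""
    for skill in plugin.skills:
        registry.setdefault("skills", {})[skill.name] = skill.instructions
    for connector in plugin.connectors:
        for tool in connector.tools:
            # Namespace each tool by its connector to avoid collisions.
            registry.setdefault("tools", {})[f"{connector.name}.{tool}"] = connector.name
    registry.setdefault("subagents", []).extend(plugin.subagents)


finance = Plugin(
    name="finance-ops",
    skills=[SkillSpec("discrepancy-policy",
                      "Flag any invoice more than 2% over contract.")],
    connectors=[ConnectorSpec("drive", ["read_file", "write_doc"]),
                ConnectorSpec("billing", ["get_invoice"])],
    subagents=["vendor-reconciler"],
)

registry: dict = {}
install(registry, finance)
print(sorted(registry["tools"]))
```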

## Sub-agents and when the loop forks

For larger jobs, the orchestration loop can spawn sub-agents: separate context windows running their own loops on a slice of the work and reporting back a condensed result. A sub-agent reconciling one vendor's invoices does its messy intermediate work in its own window and hands back only the verdict, keeping the orchestrator's context clean. This is the same orchestrator–subagent pattern used across Claude's agentic tools. It buys parallelism and isolation, but a multi-agent run typically consumes several times more tokens than a single-agent run, so the architecture forks deliberately, not by default.
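The isolation property can be sketched with ordinary functions and a thread pool. Everything here is an assumption made for illustration: `reconcile_vendor` stands in for a sub-agent, its `scratch` list stands in for a private context window, and the vendors and rates are invented sample data. What matters is that the orchestrator only ever sees the one-line verdict.

```python
from concurrent.futures import ThreadPoolExecutor


def reconcile_vendor(vendor, invoices, contract_rates):
    """Stand-in for a sub-agent: works privately, returns a short verdict."""
    scratch = []           # intermediate notes that never leave this function
    discrepancies = 0
    expected = contract_rates[vendor]
    for invoice_id, amount in invoices:
        scratch.append(f"{invoice_id}: billed {amount}, expected {expected}")
        if amount != expected:
            discrepancies += 1
    return f"{vendor}: checked={len(invoices)} discrepancies={discrepancies}"


def orchestrate(work, contract_rates):
    """Fan out one sub-agent per vendor and collect only the verdicts."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            vendor: pool.submit(reconcile_vendor, vendor, invoices, contract_rates)
            for vendor, invoices in work.items()
        }
        return [futures[vendor].result() for vendor in work]


work = {
    "Acme": [("A-1", 100), ("A-2", 120)],
    "Globex": [("G-1", 50)],
}
rates = {"Acme": 100, "Globex": 50}

verdicts = orchestrate(work, rates)
print(verdicts)
```

In a real multi-agent run each slice would carry its own model calls, which is where the extra token cost comes from.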

## What to watch for when you build on it

Three failure modes recur. First, **context pollution**: a connector that returns a 50-page raw document instead of the relevant page degrades every subsequent decision. Shape your tool outputs. Second, **under-specified skills**: if the discrepancy policy lives in someone's head and not in a skill, the agent guesses. Write the know-how down. Third, **missing safety gates**: anything that sends email, moves money, or deletes data should require confirmation in the execution layer, because the model will occasionally be confidently wrong. Design the gates before you need them.
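The third point, gating irreversible actions, fits in a few lines. This is a minimal sketch under assumptions: the `IRREVERSIBLE` set, the `execute` wrapper, and the `confirm` callback are hypothetical names, and a real gate would live in the execution layer rather than in a helper like this.

```python
# Hypothetical list of tool names that must never run unconfirmed.
IRREVERSIBLE = {"send_email", "transfer_funds", "delete_record"}


def execute(tool_name, args, tools, confirm):
    """Run a tool call, pausing for confirmation if it cannot be undone."""
    if tool_name in IRREVERSIBLE and not confirm(tool_name, args):
        return f"blocked: {tool_name} requires confirmation"
    return tools[tool_name](**args)


sent = []  # records emails that were actually sent
tools = {
    "read_file": lambda path: f"contents of {path}",
    "send_email": lambda to, body: sent.append((to, body)) or f"sent to {to}",
}


def deny(tool_name, args):
    return False


def approve(tool_name, args):
    return True


email = {"to": "cfo@example.com", "body": "3 discrepancies"}

read_result = execute("read_file", {"path": "contract.pdf"}, tools, deny)
blocked_result = execute("send_email", email, tools, deny)
sent_after_block = len(sent)
sent_result = execute("send_email", email, tools, approve)
print(read_result, blocked_result, sent_result, sep="\n")
```

Returning the block as a string rather than raising keeps it consistent with the loop: a refused action is just another observation the model can reason about.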

## Frequently asked questions

### What is Claude Cowork?

Claude Cowork is Anthropic's agentic product for non-engineering knowledge work, where plugins bundle skills, MCP connectors, and sub-agents so the model can plan and execute multi-step tasks against your real tools and data.

### How is Cowork different from a workflow automation tool?

A workflow tool runs a fixed sequence you wired in advance. Cowork is model-driven: the model decides each next step from the current state, so it adapts to requests you never explicitly scripted and recovers from unexpected results mid-run.

### Why do skills load dynamically instead of always being present?

Progressive disclosure keeps the context window focused. Each skill contributes only a short description until the task makes it relevant, then its full instructions load. This lets a team install many skills without drowning the model in irrelevant detail.

### When should I use sub-agents in Cowork?

Use sub-agents when the work splits cleanly into parallel slices or when a noisy sub-task would otherwise pollute the main context. Because multi-agent runs cost several times more tokens, reserve the pattern for jobs where the isolation or parallelism clearly pays off.

## Bringing agentic AI to your phone lines

CallSphere takes these same architectural ideas (a planning loop, tool calls mid-conversation, and clean context) and applies them to **voice and chat**, so agents answer every call, look things up in real time, and book work around the clock. See it live at [callsphere.ai](https://callsphere.ai).
