Inside Claude Cowork: the agentic architecture explained
How Claude Cowork is built end to end — the planning loop, skills, MCP connectors, sub-agents, and context engine that make agentic knowledge work reliable.
The first time most knowledge workers watch Claude Cowork take a vague request like "reconcile last quarter's expense report against the contracts folder and flag anything weird," finish it, and hand back a tidy summary, the obvious question is: what is actually happening underneath? It does not feel like a chatbot. It opens files, runs little scripts, calls into your tools, second-guesses itself, and keeps going until the job is done. Understanding that machinery is the difference between treating Cowork as a novelty and trusting it with real work.
This post pulls the cover off. Claude Cowork is Anthropic's agentic product for non-engineering knowledge work, and architecturally it shares the same primitives that power Claude Code — a planning-and-execution loop, dynamically loaded skills, MCP connectors, and sub-agents — repackaged for people who write documents and reconcile spreadsheets rather than ship code. We will walk the whole stack, from the moment a request lands to the moment the model decides it is finished.
What problem the architecture is actually solving
A plain language model answers in one shot: prompt in, text out. That works for drafting an email and falls apart the instant a task needs more than one step, more than one tool, or more information than fits in a single prompt. Knowledge work is almost never one shot. Reconciling expenses means reading several files, comparing values, noticing exceptions, and writing them up — a loop, not a single function call.
The architecture exists to turn a single model into something that can plan, act, observe, and revise over many turns without a human steering every step. The core insight is that the model is not merely generating an answer; it is generating actions, watching what those actions return, and feeding the results back into its own context to decide what to do next. Everything else — skills, connectors, sub-agents — is scaffolding that makes that loop reliable, grounded, and safe.
This also explains why Cowork feels qualitatively different from a search box. A search box retrieves; an agent operates. The difference is the loop, and the loop is the load-bearing wall of the whole product.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
The agentic loop at the center
At the heart of Cowork is a control loop that runs until the task is complete or a guardrail stops it. Each iteration, the model receives the current context, decides whether to respond, call a tool, or load a skill, and the harness executes that decision and appends the result. An agentic loop is the cycle in which a model proposes an action, an external runtime executes that action, and the observed result is fed back into the model's context so it can choose the next action.
Concretely, a single turn looks like this: the model emits a structured block — sometimes plain text for the user, sometimes a tool-use request with a name and JSON arguments. If it is a tool request, the runtime pauses generation, runs the tool, and returns a tool-result block that the model reads on the next turn. The model never executes anything itself; it only proposes, and a trusted harness decides what to actually run. That separation is what keeps the system auditable.
flowchart TD
A["User request lands"] --> B["Build context: instructions & task"]
B --> C{"Model decides next action"}
C -->|Answer ready| H["Return result to user"]
C -->|Need a skill| D["Load matching skill"]
C -->|Need data/tool| E["Call MCP connector"]
C -->|Big subtask| F["Spawn sub-agent"]
D --> G["Append observation to context"]
E --> G
F --> G
G --> CSkills: the model's loadable know-how
Cowork does not stuff every possible instruction into one giant system prompt. Instead it uses Agent Skills — folders of instructions, scripts, and resources that the model loads dynamically only when a task makes them relevant. A skill might package the exact steps your finance team uses to format a board deck, or the regex rules for cleaning a vendor list. The model sees lightweight metadata about every available skill, and when a request matches, it pulls the full skill body into context.
This is progressive disclosure applied to capability. Keeping unused instructions out of context preserves the model's attention for the task at hand and keeps token usage sane, while still giving it deep, specific procedures the moment they are needed. In Cowork these skills frequently arrive bundled inside plugins — packages that combine related skills, the MCP connectors they depend on, and any sub-agents they orchestrate, so a non-technical user installs one thing and gets a coherent capability.
MCP connectors: how Cowork touches your tools
None of this matters if the agent cannot reach your actual data. That is the job of the Model Context Protocol. MCP is an open standard, introduced in late 2024, that connects Claude to external tools and data sources through MCP servers exposing a typed catalog of tools and resources. When Cowork needs your calendar, your CRM, or a folder in cloud storage, it is talking to an MCP server that advertises tools like list_events or search_records with JSON schemas describing their arguments.
The elegance is the division of labor: MCP gives the model access to a tool, and a skill gives the model the judgment to use that tool well. The connector says "here is a function that queries contracts"; the skill says "when reconciling expenses, query contracts by vendor first, then by date range, and treat anything over the approval threshold as a flag." The two layers compose, which is why the same connector can serve many different workflows.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Sub-agents and the context engine
Some tasks are too big or too noisy to keep in one conversation. For these, Cowork spawns sub-agents — fresh instances of the model with their own clean context window, handed a narrow brief and expected to return a compact result. An orchestrator might fan out one sub-agent per file, let each read and summarize independently, then collect the summaries. This keeps the main context uncluttered and lets work proceed in parallel.
The trade-off is real and worth stating plainly: multi-agent runs typically consume several times more tokens than a single-agent pass, because every sub-agent re-establishes its own context. So Cowork uses them deliberately, not reflexively, reserving fan-out for genuinely parallel or genuinely isolated subtasks. Tying everything together is the context engine — the component that decides, at every turn, what goes into the model's limited window: the system instructions, the live task state, the loaded skills, recent tool results, and a running summary of older history. Good agentic behavior is mostly good context management, and that engine is where Cowork earns or loses the user's trust.
Frequently asked questions
Is Claude Cowork just Claude Code with a different name?
They share primitives — the agentic loop, skills, MCP, sub-agents — but Cowork is tuned for non-engineering knowledge work: documents, spreadsheets, research, and operations rather than codebases. The packaging, default skills, and connectors differ even though the underlying engine is the same family.
How does Cowork avoid running out of context on long tasks?
The context engine continuously curates the window, summarizing older turns, dropping stale tool output, and offloading heavy subtasks to sub-agents with their own context. Skills are loaded only when relevant, so the window holds what matters now rather than everything that ever happened.
What stops the agent from taking an unsafe action?
The model only proposes actions; a separate harness decides what to actually execute, and connectors can require confirmation for sensitive operations. Because every tool call is structured and logged, the whole run is auditable after the fact.
Bringing agentic AI to your phone lines
The same architecture — a planning loop, tools called mid-task, and context curated turn by turn — is exactly what makes a voice agent feel competent. CallSphere applies these agentic-AI patterns to voice and chat, with assistants that answer every call, pull from your systems live, and book work around the clock. See it in action at callsphere.ai.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.