Skip to content
Agentic AI
Agentic AI8 min read0 views

How Claude Code Works Under the Hood: Architecture

A deep architectural tour of Claude Code — the agent loop, layered context, tool dispatch, subagents, and MCP — and how the pieces fit together end to end.

When a new engineer joins your team, they don't start by reading every file. They get a laptop, a few credentials, a map of where things live, and a sense of which questions to ask before touching production. Onboarding Claude Code well means giving it the same things — but to do that, you first have to understand the machine you're onboarding. Most teams treat Claude Code as a chat box that happens to edit files. It is actually a layered runtime with a tight control loop, and the quality of your results depends on how that loop is fed.

This post is the architecture tour I wish I'd had on day one: how the agent loop spins, where context comes from, how tools get dispatched, and how subagents and MCP servers extend the whole system without melting the context window.

The agent loop is the heartbeat

At the center of Claude Code is a loop, not a single request. The agent loop is the cycle in which the model reads the current context, decides on one action, the harness executes that action, and the result is appended back into context before the next turn. An action is usually a tool call — read a file, run a shell command, edit code, search the repo — or a final message to you. Nothing magic happens between turns; the model only ever sees what the harness chose to put in front of it.

That framing matters because it tells you where leverage lives. You don't control how the model thinks, but you control what enters the loop: the instructions, the file contents, the command output, the tool results. A well-onboarded Claude Code is one where each turn has exactly the signal it needs and little of the noise it doesn't. When the loop goes wrong — wandering, repeating, hallucinating a file path — it is almost always because the context being fed back was misleading, stale, or missing a key fact.

Context is assembled in layers

The single most important internal idea is that context is not one blob. It is assembled fresh on every turn from several sources with different lifetimes. Understanding the layers lets you predict the agent's behavior.

The durable layers load once and persist: the system prompt and tool definitions that define the harness's capabilities, and your project's CLAUDE.md memory file with conventions, commands, and gotchas. The session layer is the running conversation — your messages and the model's prior turns. The dynamic layer is everything pulled in mid-task: file reads, grep results, command output, and skill files loaded on demand. The ephemeral layer is scratch reasoning that informs one decision and is gone.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
flowchart TD
  A["CLAUDE.md memory & system prompt"] --> E["Context window for this turn"]
  B["Conversation history"] --> E
  C["Dynamic pulls: file reads, grep, command output"] --> E
  D["Skill files loaded on demand"] --> E
  E --> F{"Model picks one action"}
  F -->|Tool call| G["Harness executes tool"]
  G --> C
  F -->|Final answer| H["Reply to engineer"]

Notice the cycle: dynamic pulls feed the window, the model acts, the harness runs the tool, and the result loops back as a new dynamic pull. The durable layers never re-load, which is exactly why a precise CLAUDE.md pays off on every single turn — it is the cheapest, highest-leverage context you own.

Tool dispatch: how an intention becomes an effect

Claude Code never edits your disk directly. The model emits a structured tool call — a name and a JSON argument object that conforms to a schema the harness advertised. The harness validates the arguments, runs the underlying operation, captures the result, and serializes it back into the loop as a tool result message. This indirection is the safety boundary: permissions, sandboxing, and confirmation prompts all live in the harness, not in the model.

Because dispatch is schema-driven, the model's reliability is bounded by how well the tools are described. A tool named edit with a vague description and loose arguments invites mistakes; a tool with a crisp description, required fields, and a clear failure mode produces clean calls. This is the same principle that makes a good internal API easier for a human junior to use correctly — and it is why later posts in this series spend real time on tool and prompt design.

Subagents: parallelism without context collapse

A single context window is a finite resource, and long tasks fill it with detail that later steps don't need. Subagents solve this. The main agent — the orchestrator — can spawn a child agent with its own fresh context window, hand it a scoped task, and receive back only a compact summary. The child's noisy exploration never pollutes the parent's window.

This is genuinely powerful for work that fans out: search five directories at once, investigate three candidate root causes in parallel, or draft and critique in separate contexts. The tradeoff is cost. Multi-agent runs typically burn several times more tokens than a single agent doing the same work serially, because each subagent re-establishes its own context. The architectural rule of thumb: reach for subagents when subtasks are genuinely independent and each produces a small, mergeable result — not as a default for everything.

MCP servers extend the world the agent can touch

Out of the box, Claude Code can read and write your repo and run your shell. Real work needs more — your database, your issue tracker, your deployment API. Model Context Protocol is an open standard, introduced in late 2024, that lets Claude connect to external tools and data through MCP servers that expose typed tools and resources. Architecturally, an MCP server is just another tool provider: it advertises tool schemas, and those schemas join the same dispatch table the built-in tools use. From the agent loop's perspective, calling a database query tool over MCP looks identical to calling the built-in file reader.

The clean separation here is the point. The server owns the integration — auth, connection pooling, error mapping — while the agent owns the decision of when to call it. Skills then sit on top to teach the model how and when to use a given server effectively. The whole stack composes because every layer speaks the same tool-call protocol.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Why this architecture changes how you onboard

Once you see Claude Code as a layered loop rather than a clever autocomplete, onboarding stops being about clever phrasing and becomes engineering. You invest in durable memory so every turn starts informed. You shape tool schemas so dispatch is reliable. You reserve subagents for true parallelism. You wire MCP servers for the systems the agent must reach. Each of those is a concrete artifact you can review, version, and improve — the same way you'd improve a runbook for a human hire.

Frequently asked questions

What is the agent loop in Claude Code?

It is the core cycle where the model reads its current context, chooses a single action (typically a tool call), the harness executes that action, and the result is appended back into context before the next turn. Everything Claude Code does is a sequence of these turns; you influence behavior by controlling what enters each turn.

How does Claude Code avoid filling its context window on long tasks?

Mainly through subagents. The orchestrator spawns child agents with fresh, isolated context windows to handle scoped subtasks and returns only compact summaries, so noisy exploration never accumulates in the parent's window. The cost is higher total token usage, so subagents are used deliberately for independent subtasks.

How do MCP servers fit into the architecture?

An MCP server is a tool provider that advertises typed tools and resources over the Model Context Protocol. Its tool schemas join the same dispatch table as the built-in file and shell tools, so from the agent loop's perspective an MCP call is indistinguishable from a native tool call. The server handles the integration details; the agent decides when to call it.

Why does CLAUDE.md matter so much architecturally?

Because it loads into the durable context layer that persists across every turn without being re-pulled. A precise project memory file means each of the hundreds of turns in a task starts with your conventions, commands, and constraints already present — the highest-leverage, lowest-cost context you can provide.

Bringing agentic AI to your phone lines

The same layered agent loop, tool dispatch, and subagent patterns power great voice and chat agents too. CallSphere applies them so AI assistants answer every call and message, use tools mid-conversation, and book work around the clock. See it live at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.