Inside Claude Code: the HTML-shaped agent architecture
How Claude Code is wired end to end — agent loop, tool dispatch, context assembly, subagents, and why HTML-shaped context makes it all work.
Open up almost any production agent and you find the same quiet truth: the model is the easy part. What actually decides whether an agent works is the plumbing around it — how prompts are assembled, where tool results land, how the loop decides to keep going or stop. Claude Code is interesting precisely because its internals make those decisions legible. When you watch it run, you can see the pieces fit together, and a lot of that legibility comes from treating context as structured, tag-delimited markup rather than a flat blob of text.
This piece walks through Claude Code's architecture from the outside in: the agent loop, the tool dispatch layer, the context assembler, the subagent fabric, and the permission gate. The recurring theme — the "unreasonable effectiveness of HTML" — is that wrapping everything Claude reads in clear, nested, named boundaries (the way HTML wraps everything a browser reads) turns an unbounded prompt-engineering problem into a structural one you can reason about.
The agent loop is a state machine, not a chat
People picture an agent as a chat that occasionally calls a function. It is more accurate to picture a loop that holds a running transcript and, on every turn, decides among three moves: answer, call a tool, or stop. Claude Code's loop sends the current context to the model, parses the response for tool-use blocks, executes any tools it finds, appends the structured results back into the transcript, and repeats. The loop terminates when the model emits a final answer with no pending tool calls, or when a guard (turn cap, token budget, user interrupt) trips.
The crucial detail is that every artifact in that transcript has a shape. A tool call is a typed block with a name and a JSON arguments object. A tool result is a typed block keyed to the call that produced it. File contents get wrapped in delimiters that mark where the file starts and ends. This is the HTML insight applied to context: the model is never asked to guess where one thing ends and another begins, because boundaries are explicit. A browser can render a malformed page only because tags tell it the tree; Claude Code can act on a 40-file context only because tags tell it the structure.
The tool layer: dispatch, schema, and result framing
Between the model and the outside world sits a dispatch layer. When Claude emits a tool-use block, the runtime validates the arguments against that tool's JSON Schema, routes the call to the right handler (a built-in like file-read, a shell command, or an MCP server), runs it, and frames the output as a structured result. Framing matters as much as execution. A raw 8,000-line log dumped into context is nearly useless; the same log truncated, line-numbered, and wrapped in a labeled block is something Claude can navigate.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
This is also where idempotency and safety live. Read-style tools can be retried freely; write-style tools pass through a permission gate first. Errors are not thrown away — they are returned to the model as structured failure results so the loop can adapt rather than crash. The diagram below traces a single turn through these layers.
flowchart TD
A["Assembled context (tagged blocks)"] --> B["Claude Opus 4.x"]
B --> C{"Tool-use block?"}
C -->|No| D["Final answer emitted"]
C -->|Yes| E["Validate args vs JSON Schema"]
E --> F{"Write action?"}
F -->|Yes| G["Permission gate"]
F -->|No| H["Execute handler / MCP server"]
G --> H
H --> I["Frame result as labeled block"]
I --> AContext assembly: the part that actually wins
The context assembler is the component that decides, on every turn, what the model gets to see. It is the highest-leverage piece of the whole system. It pulls from several sources — the system prompt, the user's instructions, loaded skills, recent tool results, open files, and a compacted summary of older history — and lays them out in a deterministic order with stable delimiters.
Think of it as building a DOM before each render. The system prompt is the head; skills and tool definitions are like included stylesheets and scripts that tell Claude how to behave; the live file and result blocks are the body. Because each region is named and bounded, the assembler can swap pieces in and out — drop a stale file, inject a freshly-read one, summarize a long tool result — without confusing the model about what changed. Flat-string prompting cannot do this safely; structured context can.
When history grows past budget, the assembler compacts: older turns are summarized into a dense block and the verbatim originals are dropped. Because the boundaries were explicit, the assembler knows exactly which spans are safe to fold. This is why a long Claude Code session does not degrade into mush — the structure is load-bearing.
Subagents and the orchestrator fabric
For larger jobs, Claude Code can spawn subagents. The orchestrator decomposes a task, hands each subagent a focused brief and its own fresh context window, lets them run in parallel, and collects their structured outputs. The win is context isolation: a subagent grinding through fifty files of a search never pollutes the orchestrator's window with that noise — only the distilled finding comes back.
The cost is tokens. A multi-agent run can burn several times the tokens of a single agent because each subagent re-establishes its own context. The architectural discipline is to reach for subagents only when subtasks are genuinely independent and each produces a small, structured result worth more than the tokens it costs. The orchestrator treats each subagent like a tool that returns a tagged block — same HTML-shaped contract as everything else.
The permission gate and hooks
Wrapping the tool layer is a policy boundary. Before any side-effecting action runs, a permission gate checks it against configured rules and, where required, asks the user. Hooks let you splice deterministic code into lifecycle points — before a tool runs, after a file is edited, when the session ends — to lint, format, log, or veto. This is how you bolt non-negotiable guarantees onto a probabilistic core: the model proposes, your hooks dispose.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Architecturally, hooks and the gate keep the model's freedom bounded without constraining its reasoning. Claude can plan whatever it wants; the gate decides what is allowed to touch the world. That separation — flexible reasoning, rigid enforcement — is what makes the system safe enough to run against real repositories.
Frequently asked questions
What is the Claude Code agent loop?
The Claude Code agent loop is the runtime cycle that sends structured context to Claude, parses the response for tool calls, executes those tools, frames the results as labeled blocks, appends them to the transcript, and repeats until the model returns a final answer or a guard stops it.
Why does HTML-like structure matter for an agent?
Because the model never has to guess where one piece of context ends and another begins. Named, nested, delimited blocks — like HTML tags — let the runtime add, drop, summarize, and reorder context deterministically, which keeps long sessions coherent and makes compaction safe.
When should I use subagents in Claude Code?
Use them when subtasks are independent and each returns a small, distilled result — large searches, parallel file analysis, isolated investigations. Avoid them for tightly coupled work, since each subagent runs its own context and multi-agent runs cost several times more tokens.
What stops the agent from doing something dangerous?
A permission gate intercepts every side-effecting tool call before it executes, and hooks let you run deterministic checks at lifecycle points. The model proposes actions; your policy layer decides which ones actually run.
Bringing agentic AI to your phone lines
CallSphere takes these same architectural ideas — a tight agent loop, structured context, tool calls mid-task, and hard policy gates — and points them at voice and chat, so multi-agent assistants answer every call and message and book work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.