Inside a Built-with-Opus Claude Code agent's architecture

At a recent Built-with-Opus hackathon, the teams that shipped something genuinely useful in 36 hours had one thing in common: they understood the shape of a Claude Code agent before they wrote a line of glue. The teams that flailed treated Claude like a magic text box and kept bolting on prompts until something stuck. This post is a teardown of the architecture that actually held up — not a single facet of it, but the whole machine, from the moment a user types a request to the moment the agent reports back.

We are deliberately staying at the internals level here. No step-by-step build, no prompt cookbook — just how the pieces connect and why the connections matter when you are running Opus 4.8 under a deadline and a token budget.

What an agent actually is under the hood

A Claude Code agent is a loop, not a function. A useful working definition: an agentic system is a program in which a model decides which actions to take, executes them through tools, observes the results, and repeats until a stopping condition is met. That last clause matters more than people expect. The hackathon agents that hung forever or burned $40 of tokens in an afternoon all had a weak stopping condition.

Concretely, the runtime holds four things in memory between turns: the running message transcript, the set of tool definitions the model is allowed to call, the accumulated tool results, and a small block of policy text (the system prompt) that never changes. Each turn, Claude reads all of that, emits either a final answer or one-or-more tool calls, and the harness executes the calls and feeds the outputs back in. The model never touches your filesystem or your database directly — it only ever emits structured tool-call requests, and your harness decides whether and how to honor them. That separation is the single most important architectural fact, because it is where every safety, auth, and idempotency control lives.

The five subsystems and how they connect

When we drew the winning agent on a whiteboard, it decomposed into five subsystems: the driver loop, the context store, the tool layer, the skill loader, and the orchestrator that can fan work out to subagents. The driver loop owns control flow. The context store owns what the model can see. The tool layer owns side effects. The skill loader owns just-in-time instructions. The orchestrator owns parallelism. Keeping these as separate concerns is what let a three-person team reason about the system at 3 a.m.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

flowchart TD
  A["User request"] --> B["Driver loop"]
  B --> C{"Need a tool or skill?"}
  C -->|Skill relevant| D["Skill loader injects instructions"]
  C -->|Tool needed| E["Tool layer executes via MCP"]
  C -->|Big subtask| F["Orchestrator spawns subagents"]
  D --> G["Context store updates transcript"]
  E --> G
  F --> G
  G --> H{"Stopping condition met?"}
  H -->|No| B
  H -->|Yes| I["Final report to user"]

Notice the cycle. Everything routes back through the context store before the loop re-evaluates. That is intentional: the context store is the one place where you can compress, redact, or summarize before the next expensive model call. Teams that wrote tool results straight into the transcript without a compression step hit the context ceiling on long tasks and watched quality fall off a cliff.

The driver loop and its stopping condition

The driver loop is deceptively small — often under 80 lines. It calls the model, inspects the response for tool calls, runs them, appends results, and checks whether to continue. The art is entirely in the continue/stop decision. The robust agents used a composite condition: stop when the model emits a final answer with no tool calls, OR when a hard turn-count cap is hit, OR when a wall-clock budget expires, OR when the same tool is called with identical arguments twice in a row (a loop-detection heuristic). Any one of those alone is insufficient. The turn cap saves you from infinite loops; the repeat-detection saves you from a model that is convinced one more retry will work.

We also learned to make the loop observable. Each iteration logged the turn number, the tool called, the argument hash, and the latency. When an agent misbehaved during judging, that trace was the difference between a five-minute fix and a panicked rewrite.

The context store: the real bottleneck

Opus 4.8 ships with a very large context window, which tempts you to throw everything in. Do not. The context store should treat tokens as a managed resource. The winning pattern kept a structured store with three tiers: pinned facts that survive every turn (the task spec, key file paths, the user's actual goal), recent tool outputs kept verbatim for a few turns, and archived material that gets summarized into a one-paragraph digest and dropped from the live window. When the model needs an archived detail, it re-fetches it through a tool rather than carrying it forever.

This tiering is what kept long sessions coherent. A retrieval-heavy agent that dumps every search result into context will, by turn fifteen, be reasoning over mostly stale noise. By contrast, an agent that summarizes aggressively stays sharp because the model's attention is spent on what is currently relevant.

Where the tool layer and skills sit

The tool layer is the boundary between Claude's intentions and the real world. Each tool is a typed contract: a name, a JSON schema for arguments, and a handler that performs the side effect and returns a structured result. Skills sit one layer up — they are not tools, they are bundles of instructions and helper scripts that the skill loader injects only when relevant, teaching the model how to use the tools well for a particular kind of task. The mental model that clicked for the team: tools give the agent hands, skills give it know-how, and MCP servers are how both reach across process boundaries to external systems.

Architecturally, the important thing is that skills are loaded lazily. You do not pay for a skill's instructions on every turn; you pay only when its trigger fires. That keeps the base context lean while still giving the agent deep, specialized competence on demand.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

When to add a second agent

The orchestrator subsystem is the one most teams reached for too early. A multi-agent system is a configuration where a lead agent decomposes a task and delegates independent sub-tasks to separate agent instances that each carry their own context. It shines when sub-tasks are genuinely parallel and independent — researching five competitors at once, or refactoring six modules that do not touch each other. It is a trap when the sub-tasks are sequential or share state, because multi-agent runs commonly burn several times more tokens than a single agent, and coordination overhead eats the gains. The architectural rule we settled on: stay single-agent until you can name two or more sub-tasks that have no data dependency between them.

Frequently asked questions

What is the minimum viable architecture for a Claude Code agent?

A driver loop, a context store, and a tool layer with at least one real tool. Skills and orchestration are powerful additions but not required to ship something useful. Start with the three-part core and add layers only when a concrete need appears.

Why keep the context store separate from the transcript?

Because the transcript is what you send to the model, and the store is where you decide what the transcript should contain. Separating them gives you a single place to compress, redact, and re-prioritize before every expensive call — which is the main lever you have over both cost and quality on long tasks.

How does Opus 4.8 change the architecture versus smaller models?

Mostly it lets you trust the planning layer more, so you can hand the model larger, fuzzier goals and fewer hand-held steps. The subsystem boundaries stay the same; you simply lean harder on the model's own decomposition and lighter on scripted control flow.

Where do safety and permissions live in this picture?

In the tool layer, always. The model only requests actions; the handler decides whether to perform them. Gating, allow-lists, and confirmation prompts belong in the harness around each tool, never in the prompt alone.

Bringing agentic AI to your phone lines

CallSphere runs this same loop-plus-tools architecture over voice and chat — agents that answer every call, pull data through tools mid-conversation, and book real work around the clock. See the architecture in action at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Inside a Built-with-Opus Claude Code agent's architecture

What an agent actually is under the hood

The five subsystems and how they connect

The driver loop and its stopping condition

The context store: the real bottleneck

Where the tool layer and skills sit

When to add a second agent

Frequently asked questions

What is the minimum viable architecture for a Claude Code agent?

Why keep the context store separate from the transcript?

How does Opus 4.8 change the architecture versus smaller models?

Where do safety and permissions live in this picture?

Bringing agentic AI to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Code GTM engineering is heading next

Where Claude Cowork is heading and how to prepare

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild