Skip to content
Agentic AI
Agentic AI8 min read0 views

Inside the Claude API Skill: Architecture of Agentic Tooling

How the Claude API skill wires models, tools, MCP servers, and the agentic loop into one system across developer tools. A deep architectural walkthrough.

When engineers first reach for the Claude API inside a developer tool, they usually treat it as a thin wrapper: send a prompt, get a string back. That mental model survives exactly until the first tool call. The moment Claude needs to read a file, query a database, or hit an MCP server, the architecture stops being a request-response line and becomes a small distributed system with a loop at its center. Understanding the pieces of that system — and how they fit end to end — is the difference between a demo that works once and a tool that ships.

This post walks the full architecture of what I'll call the Claude API skill: the bundle of conventions, primitives, and control flow that lets Claude act as an agent inside something like an IDE plugin, a CLI, or a code-review bot. Everything routes through one endpoint, POST /v1/messages, yet the behavior that emerges is anything but a single round trip.

One endpoint, many surfaces

The first thing to internalize is that tools, structured outputs, thinking, and streaming are not separate APIs. They are features of the same Messages endpoint. You pass a tools array, an output_config, a thinking mode, and a list of messages, and the model decides what to emit: plain text, a thinking block, or a tool_use block requesting an action. The endpoint is stateless — you resend the full conversation history every turn — which means the architecture's memory lives entirely in the message array you maintain on the client side.

This statelessness is a design gift, not a limitation. Because the server holds no session, your tool can fork a conversation, replay it, snapshot it to disk, or hand it to a subagent, all by manipulating an ordinary list. The cost is discipline: every byte you put in front of a cache breakpoint is part of the prefix, and any change invalidates the cache downstream. Architecture decisions about where volatile data lives (request IDs, timestamps, the user's latest question) are really caching decisions in disguise.

The agentic loop is the load-bearing wall

At the heart of the skill sits a loop. You send messages with a tool list; the model responds; you inspect stop_reason. If it's end_turn, you're done. If it's tool_use, you execute the requested tools, append their results as a user-role message, and call again. The loop repeats until the model stops asking for tools. That's the entire control structure — and the official SDKs ship a tool_runner that runs it for you, so you only write the tool functions.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
flowchart TD
  A["User prompt + tool list"] --> B["POST /v1/messages"]
  B --> C{"stop_reason?"}
  C -->|end_turn| D["Return final text"]
  C -->|tool_use| E["Execute requested tools"]
  E --> F["Append tool_result messages"]
  F --> G{"MCP or local tool?"}
  G -->|MCP server| H["Call server, get structured data"]
  G -->|Local function| I["Run client-side code"]
  H --> B
  I --> B

The reason this loop is the load-bearing wall is that everything else — schemas, error handling, idempotency, context management — hangs off it. A tool's JSON schema shapes what the model emits in the tool_use block. Your error handling decides whether a failed tool result flows back as a recoverable is_error message or crashes the loop. Idempotency keys protect you when the loop retries a side-effecting call. The loop is small, but it is where your tool's reliability is won or lost.

Where MCP and Skills slot in

Two newer pieces extend the architecture without changing the loop. Model Context Protocol (MCP) is an open standard that connects Claude to external tools and data through MCP servers, exposing their capabilities as callable tools. From the loop's perspective an MCP tool is just another entry in the tools array — the SDK converts the server's advertised tools into Anthropic tool definitions, and when the model calls one, the call is routed to the server, which returns structured data. The model never knows whether a tool ran locally or on a remote server.

Agent Skills are the complementary half. A skill is a folder containing a SKILL.md plus optional scripts and resources; its short description sits in context by default, and Claude reads the full file only when a task makes it relevant. Where MCP gives Claude capabilities, skills give it know-how — the procedural knowledge of how and when to use those capabilities. Architecturally, skills keep the base system prompt small while preserving discoverability, because the description-then-load pattern is a progressive-disclosure mechanism layered on top of the same message array.

Server-side versus client-side tools

The architecture has a clean split that trips up newcomers. Client-side tools are defined by Anthropic (name, schema, expected usage) but executed by your harness — the model emits a call, your code runs it, you send the result back. Server-side tools like code execution and web search run entirely on Anthropic's infrastructure; you just declare them in tools and the model handles the rest, sometimes pausing the turn with pause_turn when a server-side loop hits its iteration cap.

This split matters because it determines where your security boundary sits. A client-side bash tool gives the model broad leverage but hands your harness an opaque command string. Promoting an action to a dedicated typed tool gives the harness a hook it can gate, render, or audit. The architecture lets you choose per action: breadth via bash, control via dedicated tools. Most production tools end up with a mix, and the mix is itself an architectural statement about what you trust the model to do unsupervised.

Context as a first-class component

Over a long agentic run the message array grows, and the architecture provides three knobs to manage it. Context editing prunes stale tool results and thinking blocks. Compaction summarizes earlier history server-side when you approach the context window, returning a compaction block you must append back verbatim. Memory persists state across sessions via a tool-backed directory. These aren't optional polish — on a tool that runs for thousands of turns, they are the only thing standing between you and a context-window wall.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

The subtlety with compaction is that you must append response.content in full, not just the extracted text. The compaction block is how the API replaces compacted history on the next request; drop it and you silently lose the state. This is the kind of architectural detail that doesn't show up in a quickstart but ends every long-running agent that ignores it.

Frequently asked questions

Is the Claude API skill a separate product from the Messages API?

No. It's a way of using the Messages API — the same POST /v1/messages endpoint — with tools, an agentic loop, and conventions like MCP and Skills layered on top. There's no separate agent endpoint for the custom-tool path; you orchestrate the loop yourself or let the SDK's tool runner do it.

How does Claude know which tool to call?

From the tool descriptions and input schemas you provide. The model reads them at inference time and emits a tool_use block naming the tool and its arguments. Prescriptive descriptions that state when to call a tool — not just what it does — measurably improve selection, especially on recent Opus models that reach for tools more conservatively.

Do I need MCP to build an agent with Claude?

No. MCP is one way to expose tools, valuable when you want a standard interface to external systems or want to reuse community servers. You can build a complete agent with only locally-defined functions. MCP shines when the same tools must be shared across many tools or teams.

What model should the loop run on?

Default to the most capable Opus tier (claude-opus-4-8) for the main reasoning loop, and consider a cheaper model like Haiku for narrow subagent tasks. Switching models mid-conversation invalidates the prompt cache, so isolate model changes inside subagents rather than swapping the main loop's model.

Bringing agentic AI to your phone lines

CallSphere takes the same end-to-end agentic architecture — a tool-using loop, structured context, and disciplined error handling — and points it at voice and chat, so every call and message is answered by an assistant that can use tools mid-conversation and book real work around the clock. See the architecture in action at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.