How Claude Agent Skills Work: Architecture Internals

The first time you watch an agent pull off something genuinely specialized — formatting a brand-compliant invoice, running a finicky data migration, following your company's release checklist without being told twice — it can feel like magic. It isn't. Underneath, Claude Agent Skills are a remarkably simple architecture: folders of instructions and code that the model decides to read, on demand, when the work calls for them. The magic is entirely in how those pieces are arranged and when they enter the model's context.

This post takes the architecture apart end to end. We'll trace what actually lives in a skill, how the agent runtime discovers and indexes skills, how a skill is matched to a task and loaded, and how its scripts and resources get executed. If you understand this flow, you stop guessing at why a skill did or didn't fire, and you start designing skills the way the system wants to consume them.

What a skill actually is on disk

An Agent Skill is a directory. At its root sits a SKILL.md file that begins with a small block of YAML metadata: a name, a short description, and, depending on the runtime, optional fields such as the tools the skill is allowed to use. Below that metadata is the body: prose instructions written for the model, the same way you'd brief a capable colleague. Around that file, the folder can hold anything else the skill needs: Python or shell scripts, reference documents, JSON schemas, example outputs, templates.
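To make the layout concrete, here is a minimal sketch of splitting a SKILL.md into its metadata header and its body. It assumes the header is a block delimited by `---` lines containing flat `key: value` pairs. The skill name `invoice-formatter`, its description, and the helper `split_skill_file` are hypothetical, and a real runtime would likely use a full YAML parser rather than this simplified one.

```python
from typing import Dict, Tuple

# Hypothetical SKILL.md contents, for illustration only.
EXAMPLE_SKILL_MD = """\
---
name: invoice-formatter
description: Format invoices to brand standards. Use when the user asks for an invoice or billing document.
---
# Invoice formatting

1. Run scripts/validate.py on the input data.
2. Apply the template in templates/invoice.html.
"""


def split_skill_file(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md string into (metadata, body).

    Assumes a header delimited by '---' lines containing flat 'key: value'
    pairs. This is a simplification: nested YAML is not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing metadata header")
    try:
        end = lines.index("---", 1)  # closing delimiter of the header
    except ValueError:
        raise ValueError("unterminated metadata header") from None
    metadata: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


metadata, body = split_skill_file(EXAMPLE_SKILL_MD)
```

The two return values correspond to the two tiers the rest of this post relies on: `metadata` is the cheap part that can be indexed, and `body` is the part that is only worth loading once the skill is relevant.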

The crucial design choice is that only the metadata is cheap to read. The runtime can scan dozens or hundreds of skill descriptions without spending much context. The full body and the bundled files are loaded lazily — pulled in only once a skill is judged relevant. This is what lets a single agent carry a large library of specialized capabilities without drowning every prompt in instructions it doesn't need.

Discovery and the skill index

When an agent session starts, the runtime walks the configured skill locations — a project directory, a user-level folder, a plugin bundle — and reads the metadata header from each SKILL.md. It builds a lightweight index: name, description, and location, nothing more. That index is what the model sees as available capabilities. The bodies stay on disk.
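The discovery pass can be sketched in a few lines. This is an illustration under stated assumptions, not the actual runtime code: it assumes each skill is a direct child folder of a scan root, that SKILL.md opens with a `---`-delimited header of flat `key: value` pairs, and the names `SkillEntry`, `read_header`, and `build_index` are hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SkillEntry:
    name: str
    description: str
    location: Path  # directory containing SKILL.md


def read_header(skill_md: Path) -> Dict[str, str]:
    """Read only the '---'-delimited header and stop before the body."""
    header: Dict[str, str] = {}
    with skill_md.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return header
        for line in fh:
            if line.strip() == "---":
                break  # the body starts here and is never read
            key, sep, value = line.partition(":")
            if sep:
                header[key.strip()] = value.strip()
    return header


def build_index(roots: Iterable[Path]) -> List[SkillEntry]:
    """Scan each root for <skill>/SKILL.md and index name, description, location."""
    index: List[SkillEntry] = []
    for root in roots:
        for skill_md in sorted(root.glob("*/SKILL.md")):
            header = read_header(skill_md)
            if "name" in header and "description" in header:
                index.append(
                    SkillEntry(header["name"], header["description"], skill_md.parent)
                )
    return index


# Demo with a throwaway directory standing in for a project-level skill folder.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "spreadsheet-cleanup"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\nname: spreadsheet-cleanup\n"
        "description: Format and validate financial spreadsheets to internal standards.\n"
        "---\nFull instructions live here and stay on disk.\n",
        encoding="utf-8",
    )
    index = build_index([Path(tmp)])
```

Note that `read_header` stops at the closing delimiter, so the cost of indexing a skill does not grow with the length of its instructions.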

This separation is the whole trick, so it's worth stating precisely. An Agent Skill is a self-contained folder of instructions, scripts, and resources that an agent loads into context only when the skill's description matches the task at hand. Because matching happens against the short description, the description is the single most load-bearing line in the entire skill. A vague description means the skill rarely fires, or fires at the wrong time; a sharp, trigger-rich description means it fires when it should.

flowchart TD
  A["Agent session starts"] --> B["Scan skill folders"]
  B --> C["Read SKILL.md metadata only"]
  C --> D["Build lightweight skill index"]
  D --> E{"User task arrives"}
  E --> F{"Description matches task?"}
  F -->|No| G["Answer with base capabilities"]
  F -->|Yes| H["Load full SKILL.md body"]
  H --> I["Execute scripts & read resources"]
  I --> J["Compose final result"]

Matching a task to a skill

When a request comes in, the model evaluates it against the indexed descriptions. This is not keyword grep — it's the same semantic judgment the model uses for everything else. A user who says "clean up this spreadsheet before the board sees it" can trigger a skill described as "format and validate financial spreadsheets to internal standards" even though none of the words overlap exactly. The description's job is to capture the situations the skill applies to, not just its nouns.
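Because the matching is done by the model rather than by code, the runtime's part is mostly presentation: putting the index in front of the model in a compact form. The sketch below shows one way that could look. The exact format any given runtime uses is not described here, so the layout, the function name `render_skill_index`, and the two example skills are assumptions.

```python
from typing import List, Tuple


def render_skill_index(skills: List[Tuple[str, str]]) -> str:
    """Render (name, description) pairs as a compact block of prompt text.

    No matching logic lives here. The model reads this block alongside the
    user's request and judges which description, if any, fits the task.
    """
    lines = ["Available skills (load one when its description fits the task):"]
    for name, description in skills:
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


# Hypothetical index entries.
skills = [
    ("spreadsheet-cleanup", "Format and validate financial spreadsheets to internal standards."),
    ("brand-style", "Apply company brand guidelines to documents and slides."),
]
prompt_block = render_skill_index(skills)
```

The absence of a scoring function is the point: a description written around situations ("before the board sees it") gives the model more to match on than one written around nouns alone.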

Multiple skills can be relevant at once, and the architecture handles that by composition rather than exclusion. The agent may load a data-cleaning skill and a brand-style skill together, threading both sets of instructions into its working context. Because each skill is self-contained, they layer cleanly — there's no central registry forcing them to know about each other.

Loading: progressive disclosure in action

Once a skill is selected, the runtime reads the full SKILL.md body into context. This is the moment the agent inherits the specialized behavior: the step-by-step procedure, the gotchas, the references to bundled files. If the body says "use the script at scripts/validate.py to check the schema," that path is now meaningful — the file is sitting right there in the skill folder.

Larger skills lean on a second tier of disclosure. The body stays lean and points to deeper reference files that are only opened when a specific branch of the work demands them. A skill might keep a 60-page style guide in reference/style.md and instruct the model to read just the relevant section. The architecture rewards this discipline: context is finite, and a skill that loads its entire knowledge base up front competes with the actual task for the model's attention.
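Reading "just the relevant section" can itself be a small bundled helper. The following is a minimal sketch, assuming the reference file uses Markdown `#` headings and contains no `#`-prefixed lines inside code fences. The function name `extract_section` and the sample style guide are hypothetical.

```python
from typing import List


def extract_section(markdown: str, heading: str) -> str:
    """Return the text under `heading`, up to the next heading of the
    same or a higher level. Returns '' if the heading is not found."""
    collected: List[str] = []
    level = 0  # 0 means the target heading has not been found yet
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            hashes = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[hashes:].strip()
            if level and hashes <= level:
                break  # reached the next sibling or parent heading
            if not level and title == heading:
                level = hashes
                continue  # do not include the heading line itself
        if level:
            collected.append(line)
    return "\n".join(collected).strip()


# Hypothetical excerpt of reference/style.md.
STYLE_GUIDE = """\
# Style guide
## Colors
Use navy for headers.
### Accents
Gold, sparingly.
## Typography
Use Inter at 11pt.
"""

colors = extract_section(STYLE_GUIDE, "Colors")
```

Here `colors` contains the "Colors" section including its "Accents" subsection, and nothing from "Typography", so only the branch of the guide the task needs ends up in context.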

Execution: scripts, tools, and the model loop

Many skills don't just instruct — they ship executable code. When the body tells the agent to run a bundled script, the runtime invokes it through the agent's normal tool surface (a code-execution or shell tool), captures stdout and stderr, and feeds the result back into the loop. This is where skills become more than prompts: deterministic work — parsing, validation, transformation — runs as real code, while the model handles judgment and orchestration around it.

The execution loop is iterative. The model reads the skill, runs a step, observes the output, and decides the next move. If a script fails, the error text returns to context and the model can adapt — fix an argument, try a fallback, or surface the problem to the user. Crucially, the skill author controls how robust this is: a well-written skill anticipates failure modes and tells the model what to do when they happen, rather than leaving it to improvise.
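The run-and-observe step can be sketched with the standard library. This is an assumption-laden illustration of what a code-execution tool might do, not a description of any specific runtime: the helper `run_bundled_script`, the result dictionary's shape, and the toy `validate.py` script are all hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, Union


def run_bundled_script(
    script: Path, *args: str, timeout: float = 30.0
) -> Dict[str, Union[int, str]]:
    """Run a skill's Python script and package the outcome for the model."""
    completed = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


# Demo: a toy validator that fails when called without an argument.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "validate.py"
    script.write_text(
        "import sys\n"
        "if len(sys.argv) < 2:\n"
        "    sys.exit('usage: validate.py <file>')\n"
        "print('schema ok:', sys.argv[1])\n",
        encoding="utf-8",
    )
    success = run_bundled_script(script, "report.csv")
    failure = run_bundled_script(script)
```

On the failing call, the usage message lands in `failure["stderr"]` with a non-zero exit code. That error text is what returns to the model's context, which is why a skill body that says what to do on a given error tends to be more robust than one that leaves recovery to improvisation.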

How skills sit alongside MCP and subagents

Skills don't operate in isolation. They share the runtime with two other agentic primitives, and understanding the boundaries clarifies the whole architecture. MCP servers provide tools: the typed, callable surface for touching databases, APIs, and services. A skill frequently references those tools, telling the model when and how to use them, but the skill is instructions while the MCP server is capability. One teaches; the other does. They pair so naturally that many skills exist precisely to help the model use a set of MCP tools the way an expert would.

Subagents are the third piece. When a task is large or parallelizable, an orchestrating agent can spawn subagents, each with its own focused context — and each subagent can load whichever skills its slice of the work needs. Because skills are folders matched by description, a subagent inherits the same library and pulls only what's relevant to its narrow job. The architecture composes cleanly: subagents partition the work, skills supply specialized procedure, and MCP servers supply the hands. None of the three knows the internals of the others; they coordinate through clean interfaces, which is exactly why the system stays maintainable as it grows.

Why this architecture scales

The payoff of all this indirection is that capability scales independently of context cost. Adding a skill costs one short description in the index; its body and bundled files cost nothing until the skill is used. A team can maintain hundreds of skills across a shared folder, and any agent picks up exactly the handful each task needs. Versioning is just editing files. Sharing is just copying a folder or shipping a plugin bundle.
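A quick back-of-the-envelope calculation shows the size of the gap. All the numbers below are hypothetical, chosen only to illustrate the shape of the trade-off: 200 skills, roughly 50 tokens per description, roughly 2,000 tokens per body, and 3 skills actually used in a session.

```python
from typing import Tuple


def context_cost(
    num_skills: int, desc_tokens: int, body_tokens: int, skills_used: int
) -> Tuple[int, int]:
    """Return (eager, lazy) token costs.

    eager: every description and every body is loaded up front.
    lazy:  every description is indexed, but only used bodies are loaded.
    """
    eager = num_skills * (desc_tokens + body_tokens)
    lazy = num_skills * desc_tokens + skills_used * body_tokens
    return eager, lazy


# Hypothetical figures, not measurements.
eager, lazy = context_cost(
    num_skills=200, desc_tokens=50, body_tokens=2000, skills_used=3
)
# eager = 200 * 2050 = 410000 tokens; lazy = 10000 + 6000 = 16000 tokens
```

Under these assumptions, adding a 201st skill raises the lazy cost by 50 tokens and the eager cost by 2,050, which is the sense in which the library can grow without the prompt growing with it.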

It also keeps the model and the knowledge cleanly separated. When Anthropic ships a stronger model, your skills don't have to change: the same folders work with the new model as they are. And when your process changes, you edit a Markdown file, not a fine-tune. That decoupling is a large part of why Skills are an appealing way to give Claude-based agents durable, specialized expertise.

Frequently asked questions

Where do Agent Skills live and who loads them?

Skills live in folders the agent runtime is configured to scan — project-level, user-level, or inside a plugin bundle. At session start the runtime reads only each skill's metadata to build an index; the full instructions and bundled files load lazily when a skill is matched to a task.

How does the agent decide which skill to use?

It compares the incoming task against the indexed skill descriptions using semantic judgment, not keyword matching. The description should describe the situations the skill applies to. Multiple skills can load together and compose, since each skill folder is self-contained.

Can a skill run real code, not just give instructions?

Yes. A skill folder can bundle scripts that the agent executes through its code or shell tool. Output flows back into the model's loop, letting deterministic work run as code while the model handles orchestration and judgment around it.

Why not just put everything in the system prompt?

Because context is finite and shared with the task. Loading every capability up front crowds out the work itself and slows the agent. Skills use progressive disclosure — cheap descriptions in the index, full bodies only on demand — so capability scales without paying for what you don't use.

Bringing agentic AI to your phone lines

CallSphere takes these same skill-driven agent patterns and points them at voice and chat — assistants that load the right playbook mid-call, run tools while the caller is still on the line, and book real work around the clock. See it live at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
