How Claude Agent Skills Work: The Internal Architecture

The first time an engineering org adopts Agent Skills across more than a handful of teams, the question stops being "how do I write one" and becomes "how does this actually work under the hood." If you don't understand the internals, you end up with a hundred overlapping skills, a context window that bloats on every turn, and no idea why Claude picks one skill over another. This post walks the full architecture: what a skill is on disk, how Claude knows it exists, when it gets loaded, and how all the pieces fit together end to end.

Key takeaways

A skill is a folder of instructions, scripts, and resources that Claude loads dynamically only when it judges the skill relevant to the task.
Discovery and loading are two separate phases: lightweight metadata is always in context, but the heavy body is read on demand.
The SKILL.md frontmatter (name + description) is the single most important architectural lever — it is the index Claude searches.
Skills compose with MCP servers: MCP supplies the tools, skills supply the know-how for using them.
Progressive disclosure is the core design principle — keep the top level small, push detail into linked files.

What a skill actually is on disk

Strip away the marketing and a skill is a directory. At minimum it contains a SKILL.md file with YAML frontmatter and a Markdown body. Optionally it carries scripts (Python, shell, anything executable), reference documents, templates, and example files that the body can point to. An Agent Skill is a self-contained folder of natural-language instructions and supporting files that Claude reads at runtime to perform a specialized task without those instructions living permanently in the system prompt.

The frontmatter is the load-bearing part. It holds a name and a description, and the description is what Claude reads when deciding whether the skill applies. The body below the frontmatter can be long and detailed, because it is only pulled into context after the skill has been selected. This split — tiny always-on header, large on-demand body — is the architectural heart of the whole system.

my-skill/
  SKILL.md            # frontmatter + instructions
  reference/
    api-schema.md     # loaded only if SKILL.md links to it
    error-codes.md
  scripts/
    validate.py       # executed, not read into context
  templates/
    report.html

That layout matters because of how context budgets work. The validate.py script never needs to enter the model's context — Claude runs it as a tool and reads only the output. The reference docs only enter context if the body explicitly tells Claude to open them for a given subtask. You are, in effect, building a small lazy-loaded knowledge tree.
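To make the "executed, not read into context" point concrete, here is a minimal sketch of a runtime running a skill script in a subprocess and keeping only what it prints. The script contents, the `run_skill_script` helper, and the field names are hypothetical stand-ins for illustration, not part of any Anthropic API.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for scripts/validate.py; a real script would do real checks.
SCRIPT_SOURCE = '''\
import json, sys
record = json.loads(sys.argv[1])
missing = [k for k in ("id", "amount") if k not in record]
print(json.dumps({"ok": not missing, "missing": missing}))
'''


def run_skill_script(script: Path, *args: str) -> str:
    """Run a skill script in a subprocess and return only its stdout."""
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "validate.py"
    script.write_text(SCRIPT_SOURCE, encoding="utf-8")
    # Only this short string would reach the model, never the script source.
    output = run_skill_script(script, '{"id": 7}')

print(output)
```

The caller sees a one-line JSON verdict, so the cost in context is the size of the output rather than the size of the script.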

Discovery: how Claude knows the skill exists

At session start, the runtime scans the skill directories available to it and builds an index from each skill's frontmatter. Only the name and description go into this index, not the body, the scripts, or the reference files. This keeps the always-present footprint small: Anthropic's documentation puts the metadata at roughly 100 tokens per skill, rather than the thousands a full body can cost.
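The indexing step can be sketched as follows. This is an illustration under stated assumptions, not the actual runtime: `read_frontmatter` and `build_index` are hypothetical names, and the parser handles only flat `key: value` pairs where a real loader would use a proper YAML parser.

```python
import tempfile
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` markers."""
    fields: dict[str, str] = {}
    with skill_md.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return fields
        for line in fh:
            if line.strip() == "---":
                break  # stop before the body; it is not part of the index
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


def build_index(skills_root: Path) -> list[dict[str, str]]:
    """Collect name and description from every <skill>/SKILL.md under the root."""
    index = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta and "description" in meta:
            index.append({"name": meta["name"], "description": meta["description"]})
    return index


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pdf-tables").mkdir()
    (root / "pdf-tables" / "SKILL.md").write_text(
        "---\n"
        "name: pdf-tables\n"
        "description: Extracts tables and figures from financial PDFs "
        "and converts them to normalized JSON\n"
        "---\n"
        "# Instructions\nLong body that the index never includes.\n",
        encoding="utf-8",
    )
    (root / "notes").mkdir()  # no SKILL.md here, so the directory is skipped
    index = build_index(root)

print(index)
```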

flowchart TD
  A["Session starts"] --> B["Runtime scans skill dirs"]
  B --> C["Read each SKILL.md frontmatter"]
  C --> D["Build name + description index"]
  D --> E{"User prompt arrives"}
  E --> F{"Description matches intent?"}
  F -->|No| G["Skill stays dormant"]
  F -->|Yes| H["Load full SKILL.md body"]
  H --> I["Follow links to reference files as needed"]

When a user prompt arrives, Claude compares the request's intent against those descriptions. This is why the description is not throwaway prose: it is the text Claude matches every request against. A description like "helps with documents" will misfire constantly; "Extracts tables and figures from financial PDFs and converts them to normalized JSON" tells Claude exactly when to reach for the skill. The discovery phase is cheap, repeats on every turn, and is driven almost entirely by that one string.

Loading: progressive disclosure in action

Once a skill is judged relevant, the runtime reads the full SKILL.md body into context. The body is where the real instructions live: step-by-step procedures, constraints, formatting rules, and which scripts to run in which order. If the body references additional files, say reference/api-schema.md, those are loaded only when Claude reaches the point of needing them. This is progressive disclosure: the model descends only into the branches of the tree a task requires and pays context cost only for the files it actually opens.

The payoff is concrete. A skill that documents a 40-endpoint internal API can keep its SKILL.md under 500 words by listing endpoints briefly and linking each domain's full schema to a separate file. A request that only touches billing pulls in the billing schema; the inventory and auth schemas never cost a token. Without progressive disclosure that same skill would force every detail into context on every invocation.
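The billing example can be modelled with a small lazy loader. This is a toy simulation, assuming a hypothetical `LazySkill` class and made-up file contents; it counts characters as a rough proxy for context cost and does not reflect how any real runtime tracks tokens.

```python
import tempfile
from pathlib import Path


class LazySkill:
    """Tracks which skill files have been pulled into a simulated context."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.loaded: dict[str, str] = {}  # relative path -> file text

    def open(self, relative_path: str) -> str:
        """Read a file on first request and cache it for later requests."""
        if relative_path not in self.loaded:
            path = self.root / relative_path
            self.loaded[relative_path] = path.read_text(encoding="utf-8")
        return self.loaded[relative_path]

    def context_chars(self) -> int:
        """Total characters loaded so far (a crude stand-in for tokens)."""
        return sum(len(text) for text in self.loaded.values())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reference").mkdir()
    (root / "SKILL.md").write_text(
        "Billing: see reference/billing.md\n"
        "Inventory: see reference/inventory.md\n"
        "Auth: see reference/auth.md\n",
        encoding="utf-8",
    )
    for domain in ("billing", "inventory", "auth"):
        (root / "reference" / f"{domain}.md").write_text(
            f"{domain} schema " * 200, encoding="utf-8"
        )

    skill = LazySkill(root)
    skill.open("SKILL.md")               # loaded when the skill is selected
    skill.open("reference/billing.md")   # loaded because the task is about billing

    loaded_paths = sorted(skill.loaded)
    loaded_chars = skill.context_chars()
    total_chars = sum(len(p.read_text(encoding="utf-8")) for p in root.rglob("*.md"))

print(loaded_paths, loaded_chars, total_chars)
```

The inventory and auth schemas exist on disk but never appear in `skill.loaded`, which is the whole point of the split.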

How skills, MCP, and tools fit together

Skills do not replace tools — they orchestrate them. Model Context Protocol servers expose tools and data sources to Claude; a skill tells Claude the playbook for using those tools well. Picture an MCP server that connects to your data warehouse and exposes a run_query tool. The server makes the capability available, but it says nothing about your table naming conventions, which joins are safe, or how to format results for finance. The skill supplies exactly that institutional knowledge.

This separation is what makes skills the right unit for organizational knowledge. Tools are generic and reusable; skills are where your company's specific way of doing things lives. When the architecture is healthy you'll see thin, broadly useful MCP servers paired with rich, opinionated skills layered on top. The skill can even call the script in its own folder to pre-validate inputs before invoking an MCP tool, closing the loop between local logic and remote capability.
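As a rough sketch of that last idea, the snippet below puts a local validation step in front of a remote tool call. Everything here is an assumption for illustration: `validate_query` and `guarded_query` are hypothetical helpers, the two rules are example conventions, and `fake_run_query` merely stands in for the MCP server's run_query tool.

```python
from typing import Callable


def validate_query(sql: str) -> list[str]:
    """Return a list of problems; an empty list means the query may proceed."""
    problems = []
    normalized = sql.strip().lower()
    if not normalized.startswith("select"):
        problems.append("only SELECT statements are allowed")
    # A semicolon anywhere except the very end suggests stacked statements.
    if ";" in normalized.rstrip(";"):
        problems.append("multiple statements are not allowed")
    return problems


def guarded_query(sql: str, run_query: Callable[[str], list]) -> list:
    """Validate locally first, then hand the query to the remote tool."""
    problems = validate_query(sql)
    if problems:
        raise ValueError("; ".join(problems))
    return run_query(sql)


calls: list[str] = []


def fake_run_query(sql: str) -> list:
    """Stand-in for the remote tool; records each call it receives."""
    calls.append(sql)
    return [{"total": 42}]


rows = guarded_query("SELECT total FROM finance_invoices", fake_run_query)
print(rows)

try:
    guarded_query("DROP TABLE finance_invoices", fake_run_query)
except ValueError as err:
    print("rejected:", err)  # the remote tool was never called for this query
```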

The runtime lifecycle, end to end

Putting it together, a single agentic turn flows like this. The user sends a prompt. Claude consults the always-present skill index and decides whether any skill's description matches. If one does, the body loads and becomes part of the working instructions. Claude then executes — running the skill's scripts as tools, opening reference files as the body directs, and calling MCP tools where the skill instructs. The skill's content shapes the plan; the tools do the side-effecting work; the reference files supply detail just in time.

Crucially, the cost of a skill body is paid only by conversations that trigger it. A new session starts from the index alone, and an unrelated task loads a different skill or none at all; whether a body that is already loaded gets dropped later in the same conversation depends on how the runtime manages context. This on-demand loading is why an org can ship hundreds of skills without every conversation paying for all of them. The context cost scales with what a given task touches, not with the size of your skill library.

Common pitfalls

Vague descriptions. If the frontmatter description is generic, discovery fails silently — Claude simply never loads the skill. Write descriptions as precise trigger conditions, naming the inputs and outputs.
Monolithic SKILL.md files. Stuffing everything into the body defeats progressive disclosure and bloats context on every load. Push detail into linked reference files.
Overlapping skills. Two skills with near-identical descriptions make selection ambiguous and nondeterministic. Give each skill a distinct, non-overlapping scope.
Embedding data that a script should produce. Don't paste a 300-line lookup table into the body when a script could generate it. A script's source never enters context; only the output it prints does.
No versioning discipline. Skills are organizational knowledge; treat them like code with review and version control, or they drift out of sync with the systems they describe.
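Overlapping descriptions are one pitfall that lends itself to a mechanical check. The sketch below flags pairs of skills whose descriptions share most of their words, using Jaccard similarity on word sets. This is a crude lexical heuristic chosen for illustration, with a hypothetical threshold of 0.6; it is not how Claude selects skills, and it will miss overlaps that are phrased differently.

```python
import re
from itertools import combinations


def tokens(description: str) -> set[str]:
    """Lowercase word set for a description."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both descriptions."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_pairs(index: dict[str, str], threshold: float = 0.6):
    """Return (name, name, score) for every pair at or above the threshold."""
    flagged = []
    for (n1, d1), (n2, d2) in combinations(sorted(index.items()), 2):
        score = jaccard(d1, d2)
        if score >= threshold:
            flagged.append((n1, n2, round(score, 2)))
    return flagged


# Hypothetical skill index: name -> frontmatter description.
index = {
    "pdf-tables": "Extracts tables and figures from financial PDFs "
                  "and converts them to normalized JSON",
    "pdf-extract": "Extracts tables from financial PDFs and converts them to JSON",
    "sql-style": "Formats warehouse SQL queries for the finance team",
}

flagged = overlapping_pairs(index)
print(flagged)
```

A flagged pair is a prompt for a human to merge the skills or sharpen their scopes, not proof that selection will misfire.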

Architecture review in 5 steps

1. Audit every skill's frontmatter description and rewrite any that don't name concrete triggers, inputs, and outputs.
2. Measure each SKILL.md body length and split anything over a few hundred lines into linked reference files.
3. Identify data baked into bodies that should instead be produced by a script, and move it.
4. Map each skill to the MCP servers it relies on, and confirm the tools exist and are scoped correctly.
5. De-duplicate overlapping descriptions so discovery is unambiguous.
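The first two review steps can be partly automated. Here is a minimal sketch, assuming a hypothetical `audit_skill` helper and arbitrary thresholds (at least 8 words in a description, at most 300 lines in a body); tune both to your own library, since neither number comes from Anthropic.

```python
def audit_skill(
    name: str,
    description: str,
    body: str,
    min_description_words: int = 8,   # assumed threshold, not an official limit
    max_body_lines: int = 300,        # assumed reading of "a few hundred lines"
) -> list[str]:
    """Return human-readable findings for one skill; empty means no findings."""
    findings = []
    word_count = len(description.split())
    if word_count < min_description_words:
        findings.append(
            f"{name}: description has {word_count} words; "
            "name the trigger, inputs, and outputs"
        )
    line_count = len(body.splitlines())
    if line_count > max_body_lines:
        findings.append(
            f"{name}: body has {line_count} lines; "
            "move detail into linked reference files"
        )
    return findings


vague = audit_skill("docs-helper", "helps with documents", "step\n" * 400)
sharp = audit_skill(
    "pdf-tables",
    "Extracts tables and figures from financial PDFs "
    "and converts them to normalized JSON",
    "1. Locate tables\n2. Normalize columns\n3. Emit JSON\n",
)

for finding in vague:
    print(finding)
print("pdf-tables findings:", sharp)
```

Word count is only a proxy for precision, so treat findings as a reading list for manual review rather than a pass/fail gate.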

Layer	Lives where	Enters context
Frontmatter (name + description)	Top of SKILL.md	Always (the index)
Instruction body	SKILL.md body	On selection
Reference files	Linked docs	When the body opens them
Scripts	scripts/ folder	Source never; only output, when run

Frequently asked questions

What is an Agent Skill in one sentence?

An Agent Skill is a self-contained folder of instructions, scripts, and resources that Claude discovers via its frontmatter description and loads into context only when the current task makes it relevant.

How is a skill different from an MCP server?

An MCP server exposes tools and data — the raw capability — while a skill encodes the procedural knowledge for using those tools well. You typically pair one rich skill with one or more thinner MCP servers.

Why doesn't a large skill library blow up the context window?

Because only frontmatter names and descriptions live in the always-on index. Bodies and reference files load lazily, and scripts contribute only their output, so context cost scales with what a task touches, not with library size.

Can a skill run code?

Yes. Scripts placed in the skill folder are executed as tools, and only their output enters context, which keeps deterministic logic out of the model's reasoning path entirely.

Bringing agentic AI to your phone lines

CallSphere takes these same skill-and-tool architectures and points them at voice and chat — agents that load the right playbook mid-call, use tools while the caller is on the line, and book real work around the clock. See it live at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
