Inside Claude Code Skills: The Architecture End to End

How Claude Code Skills work end to end: discovery, progressive disclosure, the SKILL.md contract, and how skills, MCP, and subagents fit together.

The first time you watch Claude Code reach for a skill mid-task, it can feel like magic: you ask it to fix a flaky deploy, and out of nowhere it pulls in a folder of release-engineering instructions you wrote three weeks ago, runs a bundled script, and gets right the steps it used to fumble. There is no magic, only an architecture, and once you understand the pieces you can build skills that fire reliably instead of hoping they trigger. This post walks the whole system end to end, from how a skill gets noticed in the first place to how its instructions land in the model's context window.

What a skill actually is

An Agent Skill is a folder on disk containing a SKILL.md file plus any supporting scripts, reference documents, and templates. The SKILL.md opens with YAML frontmatter that carries two required fields, a name and a description, followed by Markdown instructions. That is the entire contract. The folder can sit in a project's .claude/skills directory, in the user's home configuration (~/.claude/skills), or ship inside a plugin. Nothing about a skill is compiled or registered ahead of time; it is discovered by scanning those directories at startup.
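
To make the contract concrete, here is a minimal sketch of a SKILL.md and a parser that separates the frontmatter from the body. The skill name, description, and file paths in the sample are invented for illustration, and the parser handles only flat key: value pairs; it is not how Claude Code itself reads the file.

```python
# Hypothetical SKILL.md content; the name, description, and paths are made up.
SAMPLE_SKILL_MD = """\
---
name: release-deploy
description: Use when deploying or rolling back the web service, including flaky deploy triage.
---

# Release deploy

1. Run scripts/preflight.sh before touching production.
2. Follow reference/rollback.md if the health check fails.
"""


def split_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, Markdown body).

    Only flat `key: value` lines are handled, which covers name and
    description. A real loader would use a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must open with a '---' frontmatter fence")
    try:
        end = lines.index("---", 1)  # closing fence of the frontmatter
    except ValueError:
        raise ValueError("frontmatter is missing its closing '---'") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = split_skill_md(SAMPLE_SKILL_MD)
print(meta["name"])           # release-deploy
print(body.splitlines()[0])   # # Release deploy
```

The split mirrors the two roles of the file: `meta` is what a router would need, and `body` is what gets read only after the skill is chosen.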

The reason this format matters is that it separates two concerns that older prompt-engineering approaches jammed together. The frontmatter is metadata used for routing — deciding whether this skill is relevant right now. The body is the payload — the detailed knowledge the model uses once the skill is chosen. Keeping routing cheap and the payload rich is the central design idea, and it is what lets a single agent carry hundreds of latent capabilities without drowning its context window.

Progressive disclosure: the core mechanism

The heart of the architecture is progressive disclosure. Claude Code does not load every skill's full instructions into context. At session start it reads only the lightweight metadata — each skill's name and one-line description — and holds that compact index in the system prompt. When your request comes in, the model matches it against that index. Only when a skill looks relevant does Claude read the full SKILL.md body into context, and only when the body points at a bundled file does that file get read. This is a three-tier loading strategy: metadata always, instructions on demand, resources on demand.

The flow below shows how a single user prompt travels from arrival to a loaded skill and back to an answer.

flowchart TD
  A["User prompt arrives"] --> B["Claude scans skill metadata index"]
  B --> C{"Any description match the task?"}
  C -->|No| D["Answer with base model + tools"]
  C -->|Yes| E["Read full SKILL.md body into context"]
  E --> F{"Body references a script or doc?"}
  F -->|Yes| G["Load bundled resource on demand"]
  F -->|No| H["Follow instructions directly"]
  G --> H
  H --> I["Execute steps, call tools, return result"]
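
The same flow can be sketched as three small loaders, one per tier. This is an illustrative model under stated assumptions, not Claude Code's implementation: the class and function names are hypothetical, the frontmatter reader handles only flat key: value lines, and the relevance decision (which the model makes itself) is replaced by simply picking the only skill in the demo.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    description: str
    path: Path  # location of SKILL.md; the body is not read at index time


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Read only the frontmatter lines and stop before the body."""
    meta: dict[str, str] = {}
    with skill_md.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # no frontmatter, so nothing to index
        for line in fh:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def build_index(skills_root: Path) -> list[SkillEntry]:
    """Tier 1: scan skill folders once, keeping just name and description."""
    index = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta and "description" in meta:
            index.append(SkillEntry(meta["name"], meta["description"], skill_md))
    return index


def load_body(entry: SkillEntry) -> str:
    """Tier 2: read the instructions only after the skill is chosen.

    Assumes the frontmatter itself contains no '---' sequence.
    """
    text = entry.path.read_text(encoding="utf-8")
    return text.split("---", 2)[2].strip()


def load_resource(entry: SkillEntry, relative_path: str) -> str:
    """Tier 3: read a bundled file only when the body points at it."""
    return (entry.path.parent / relative_path).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "release-deploy"  # hypothetical skill folder
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\nname: release-deploy\n"
        "description: Use when deploying or rolling back the web service.\n"
        "---\nRead checklist.md before deploying.\n",
        encoding="utf-8",
    )
    (skill_dir / "checklist.md").write_text("1. Freeze merges.\n", encoding="utf-8")

    index = build_index(Path(tmp))
    chosen = index[0]  # stand-in for the model's own relevance decision
    body = load_body(chosen)
    checklist = load_resource(chosen, "checklist.md")

print([entry.name for entry in index])
print(body)
```

Note that `build_index` never reads past the closing frontmatter fence, which is the property that keeps the always-loaded tier small.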

Progressive disclosure is what makes the token economics work. If you have fifty skills averaging two thousand tokens of instructions each, eagerly loading them would burn a hundred thousand tokens before the user even speaks. With the index approach, the always-loaded catalog costs only a name and a short description per skill, which for fifty skills is a few thousand tokens at most, and you pay the full cost only for the skills that actually fire. This is the same principle that makes a good library index useful: you read the spine labels, not every book.
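
The arithmetic is easy to check. In the sketch below, the fifty skills and two-thousand-token bodies come from the paragraph above; the figure of 100 metadata tokens per skill and the single fired skill are assumptions chosen for illustration, and shorter descriptions shrink the catalog proportionally.

```python
def eager_tokens(num_skills: int, body_tokens: int) -> int:
    """Every skill body is loaded up front, whether or not it is used."""
    return num_skills * body_tokens


def progressive_tokens(num_skills: int, metadata_tokens: int,
                       body_tokens: int, skills_fired: int) -> int:
    """The metadata index is always loaded; bodies only for skills that fire."""
    return num_skills * metadata_tokens + skills_fired * body_tokens


# 50 skills with 2,000-token bodies, as in the text.
# 100 metadata tokens per skill and one fired skill are assumed values.
eager = eager_tokens(50, 2_000)
lazy = progressive_tokens(50, 100, 2_000, skills_fired=1)
print(eager, lazy, f"{eager / lazy:.1f}x")  # 100000 7000 14.3x
```

Under these assumptions the catalog (5,000 tokens) costs more than the one skill that fires (2,000 tokens), yet the total is still about a fourteenth of the eager approach.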

How the description field drives routing

Because routing happens against the description alone, that one line is the single most load-bearing string in the whole skill. A vague description like "helps with data" will either never trigger or trigger constantly. A precise one — "Use when converting CSV exports from the billing system into the reconciled ledger format, including currency normalization" — gives the model a sharp signal about both when and for what. The model is effectively doing semantic retrieval over these descriptions, so they should read like the trigger conditions an experienced teammate would recognize.

A useful mental model is that the description is a tiny classifier prompt. It needs the activating nouns (the systems, file types, and domains involved) and the activating verbs (convert, reconcile, deploy, audit). Engineers who treat the description as an afterthought end up with skills that exist on disk but never load — the most common failure mode in practice, and one that no amount of body quality can fix.
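
One way to catch weak descriptions before they ship is a small lint pass. The sketch below is a heuristic of my own, not something Claude Code runs: the trigger phrases, the vague-word list, and the minimum length are all assumed thresholds, and the real routing decision is made by the model reading the description, not by keyword rules.

```python
import re

# Assumed conventions for illustration; tune these to your own skills.
TRIGGER_PHRASES = ("use when", "use for", "use this when")
VAGUE_WORDS = {"helps", "stuff", "things", "various", "misc", "general"}


def lint_description(description: str, min_words: int = 8) -> list[str]:
    """Return a list of likely problems with a skill description."""
    problems = []
    lowered = description.lower()
    words = re.findall(r"[a-z0-9']+", lowered)
    if len(words) < min_words:
        problems.append(
            f"too short ({len(words)} words): name the systems, file types, and actions"
        )
    if not any(phrase in lowered for phrase in TRIGGER_PHRASES):
        problems.append("no trigger phrase such as 'Use when ...'")
    vague = sorted(VAGUE_WORDS.intersection(words))
    if vague:
        problems.append("vague wording: " + ", ".join(vague))
    return problems


vague_result = lint_description("helps with data")
precise_result = lint_description(
    "Use when converting CSV exports from the billing system into the "
    "reconciled ledger format, including currency normalization"
)
print(len(vague_result), precise_result)  # 3 []
```

A lint like this cannot prove a description will trigger; it only flags the obvious cases where the activating nouns and verbs are missing.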

Where MCP and subagents fit

Skills do not replace tools; they orchestrate them. Model Context Protocol servers give Claude Code access to external systems — a database, a ticketing API, a file store — by exposing typed tools. A skill is the layer that teaches Claude how and when to use those tools for a particular job. The MCP server provides the verbs; the skill provides the playbook. You can ship both together in a plugin so that installing the plugin wires up the connector and the know-how at once.

Subagents add a third axis. Claude Code can spawn parallel subagents, each with its own context window, and a skill's instructions can tell the orchestrator to fan a task out — for example, "spawn one reviewer per changed file." Each subagent can itself discover and load skills. The architecture composes cleanly because every layer respects the same context-window discipline: load metadata broadly, load detail narrowly, and keep each agent's working set small enough to reason over.
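
The fan-out pattern can be illustrated with ordinary Python concurrency. This is an analogy under stated assumptions, not Claude Code's subagent API: `review_file` is a hypothetical stand-in for a subagent, and a thread pool stands in for the orchestrator. The point it shows is that each worker sees only its own file and hands back a small summary.

```python
from concurrent.futures import ThreadPoolExecutor


def review_file(path: str, diff: str) -> dict:
    """Hypothetical subagent: works from a private context, returns a summary."""
    context = [f"Review {path}", diff]  # this worker's entire working set
    findings = [line for line in diff.splitlines() if "TODO" in line]
    return {
        "path": path,
        "context_chars": sum(len(part) for part in context),
        "findings": findings,
    }


def fan_out(changed: dict[str, str], max_workers: int = 4) -> list[dict]:
    """Spawn one reviewer per changed file and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(review_file, path, diff)
                   for path, diff in changed.items()]
        return [future.result() for future in futures]


changed_files = {
    "a.py": "+ x = 1\n+ # TODO: handle None",
    "b.py": "+ y = 2",
}
results = fan_out(changed_files)
print([(r["path"], r["findings"]) for r in results])
```

The orchestrator only ever holds the summaries, so its own working set grows with the number of files rather than with the size of every diff.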

The lifecycle of one skilled task

Putting it together, a single task moves through distinct phases. Discovery happens once at startup when directories are scanned and the index is built. Matching happens per turn as the model compares the request to descriptions. Activation reads the chosen body into context. Execution follows the instructions, calling MCP tools or shell commands and pulling in bundled resources only as the steps demand them. Finally, the result is composed and the heavy context can be dropped, leaving room for the next turn.

Understanding these phases tells you exactly where to debug. If a skill never fires, the problem is in matching — fix the description. If it fires but does the wrong thing, the problem is in the body — tighten the instructions. If it loads a script that fails, the problem is in execution — check the bundled resource and its environment. The architecture's clean seams are also its debugging seams.
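
The triage order above can be written down as a tiny decision function. The function name, parameter names, and return shape are hypothetical; the mapping from symptom to seam is the one described in the paragraph.

```python
from typing import Optional


def triage(skill_loaded: bool, steps_correct: bool,
           resource_ok: bool) -> Optional[tuple[str, str]]:
    """Map an observed symptom to (seam, first fix), checking seams in order."""
    if not skill_loaded:
        return ("matching", "rewrite the description")
    if not steps_correct:
        return ("body", "tighten the instructions")
    if not resource_ok:
        return ("execution", "check the bundled resource and its environment")
    return None  # nothing to debug


print(triage(skill_loaded=False, steps_correct=True, resource_ok=True))
# ('matching', 'rewrite the description')
```

The checks run in lifecycle order on purpose: there is no point inspecting the body of a skill that never loaded.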

Frequently asked questions

What is an Agent Skill in one sentence?

An Agent Skill is a folder containing a SKILL.md file with a name, a description, and Markdown instructions, which Claude loads dynamically when its description matches the current task. It packages reusable procedural knowledge plus optional scripts and reference files.

How does Claude decide which skill to use?

At startup Claude reads a lightweight index of every skill's name and description. For each request it matches the task against those descriptions and loads the full instructions only for the skills that look relevant, so the description field is what controls triggering.

Do skills increase token usage a lot?

Not if progressive disclosure is working. Only the compact metadata index is always in context; full instructions and bundled resources load on demand, so you pay the large cost only for the skills that actually fire on a given task.

How are skills different from MCP servers?

MCP servers expose external tools and data to Claude, while skills teach Claude how and when to use those tools for a specific job. They are complementary — connectors provide capability, skills provide the procedure, and plugins can bundle both.
