
Build Your First Claude Agent Skill: A Walkthrough

Step-by-step: build a working Claude Agent Skill from an empty folder to a tested, agent-loaded capability with a bundled script and house-style rules.

Reading about Agent Skills only gets you so far. The understanding clicks when you've built one with your own hands — picked the task, written the description that makes it fire, bundled a script, and watched an agent actually use it on a real request. This walkthrough does exactly that. We'll build a single, concrete skill end to end: a release-notes generator that turns a list of merged pull requests into a clean, house-style changelog. By the end you'll have a folder you can drop into Claude Code or a Claude Agent SDK project and a clear mental model for building any skill.

I'm using release notes because the task has the two ingredients every good skill needs: judgment the model is good at (deciding what's user-facing, grouping changes, choosing tone) and deterministic plumbing better handled by code (parsing the input, enforcing the section order). That split is the heart of skill design, and you'll see it shape every step.

Step 1: Define the trigger before you write anything

Resist the urge to start writing instructions. Start by deciding exactly when this skill should fire, because that decision becomes the description, and the description is what the agent reads when it decides whether to load the skill. Write three or four real phrasings a user might use: "draft release notes for this release," "turn these merged PRs into a changelog," "write the v2.4 release summary." Find the common shape. Ours is: the user has a set of changes and wants a published-quality changelog in our format.

That gives us a description: "Generate house-style release notes and changelogs from a list of merged pull requests or commits. Use when the user wants a changelog, release summary, or 'what changed' write-up for a version." Notice it names both the inputs (PRs, commits) and the trigger situations (changelog, release summary). A description this specific fires reliably; a generic one like "helps with releases" would not.
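A quick lint can catch a malformed header before an agent ever sees it. The sketch below assumes the limits given in Anthropic's Agent Skills documentation as I understand them (a lowercase, hyphenated name of at most 64 characters and a non-empty description of at most 1024 characters); treat those numbers as assumptions and confirm them against the current docs. `check_metadata` is a hypothetical helper, not part of any SDK, and the "use when" check is only a house heuristic.

```python
import re

# Assumed limits; confirm against the current Agent Skills documentation.
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def check_metadata(name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means the header looks sane."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be lowercase letters, digits, and hyphens")
    if len(name) > MAX_NAME_LEN:
        problems.append(f"name is longer than {MAX_NAME_LEN} characters")
    if not description.strip():
        problems.append("description is empty")
    if len(description) > MAX_DESCRIPTION_LEN:
        problems.append(
            f"description is longer than {MAX_DESCRIPTION_LEN} characters"
        )
    # House heuristic (not a platform rule): name the trigger situation.
    if "use when" not in description.lower():
        problems.append("description does not say when to use the skill")
    return problems


DESCRIPTION = (
    "Generate house-style release notes and changelogs from a list of merged "
    "pull requests or commits. Use when the user wants a changelog, release "
    "summary, or 'what changed' write-up for a version."
)

print(check_metadata("release-notes", DESCRIPTION))
```

A lint like this only checks the header's shape. Whether the skill actually fires on paraphrased requests still has to be tested against a live agent.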

Step 2: Scaffold the folder

Create the directory structure. Everything the skill needs lives inside one folder so it can be copied or version-controlled as a unit:

release-notes/
  SKILL.md
  scripts/
    parse_prs.py
  reference/
    house-style.md

SKILL.md holds the metadata (a short header with the skill's name and description) and the procedure. scripts/parse_prs.py does deterministic parsing. reference/house-style.md holds the longer style rules we don't want cluttering the main body; it is loaded only when the model needs the fine detail.
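If you prefer to create the skeleton from code, a minimal sketch might look like the following. The `name` and `description` frontmatter fields follow the SKILL.md convention; the `scaffold` helper, the stub body text, and the use of a temporary directory for the demo are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Header plus a stub body. The description is the one written in Step 1.
SKILL_MD = """\
---
name: release-notes
description: "Generate house-style release notes and changelogs from a list of merged pull requests or commits. Use when the user wants a changelog, release summary, or 'what changed' write-up for a version."
---

# Release notes

(Procedure goes here.)
"""


def scaffold(root: Path) -> Path:
    """Create the release-notes/ skeleton under root and return its path."""
    skill = root / "release-notes"
    (skill / "scripts").mkdir(parents=True, exist_ok=True)
    (skill / "reference").mkdir(parents=True, exist_ok=True)
    (skill / "SKILL.md").write_text(SKILL_MD, encoding="utf-8")
    # Empty placeholders; later steps fill these in.
    (skill / "scripts" / "parse_prs.py").touch()
    (skill / "reference" / "house-style.md").touch()
    return skill


# Demo in a throwaway directory so nothing is written to your project.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp))
    for path in sorted(created.rglob("*")):
        print(path.relative_to(tmp))
```

In a real project you would pass the directory that should contain the skill instead of a temporary one.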

Figure (workflow): define the trigger and description, scaffold the skill folder, write the SKILL.md procedure, and add the parse_prs.py script. The agent then matches the description, loads the body, and runs the script. If the output fails the style check, revise the reference rules and return to the SKILL.md procedure; if it passes, ship the skill folder.

Step 3: Write the SKILL.md body

The body is a briefing for a capable colleague, written in the second person. Keep it as a numbered procedure so the model follows it predictably. Ours reads roughly like this:

1. If the user gave raw PR text, run scripts/parse_prs.py to normalize it into a structured list.
2. Drop anything internal-only: dependency bumps, CI tweaks, refactors with no user impact.
3. Group the rest under our fixed headings: Added, Changed, Fixed, Deprecated.
4. Write each line in imperative, user-facing voice.
5. If tone or edge cases are unclear, read reference/house-style.md.
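The fixed heading order in step 3 is the kind of mechanical rule that code can enforce once the model has decided which change belongs where. The sketch below is one way to do that. `render_changelog`, the "Release vX" title line, and the wording of the empty-release message are assumptions for illustration, not part of the skill as described.

```python
# Fixed heading order from the procedure above.
SECTION_ORDER = ("Added", "Changed", "Fixed", "Deprecated")


def render_changelog(version: str, entries: dict[str, list[str]]) -> str:
    """Render entries under the fixed headings, skipping empty sections."""
    unknown = set(entries) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")

    lines = [f"Release {version}", ""]
    for section in SECTION_ORDER:
        items = entries.get(section, [])
        if not items:
            continue
        lines.append(section)
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    if len(lines) == 2:
        # Nothing survived the internal-only filter; say so explicitly.
        lines.append("No user-facing changes in this release.")
    return "\n".join(lines).rstrip() + "\n"


# The dict is deliberately out of order; the output is not.
print(render_changelog("v2.4", {
    "Fixed": ["Stop report export from timing out on large workspaces (#1236)"],
    "Added": ["Add CSV export to the reports page (#1234)"],
}))
```

The judgment calls (what is user-facing, which heading fits, how to phrase the line) stay with the model; the helper only guarantees the order.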

That last line is deliberate. The body stays short and skimmable; the heavy detail sits in the reference file and loads only when a branch of the work needs it. This is progressive disclosure applied inside a single skill, and it keeps your context budget honest.

A subtle point about voice: write the body as if briefing a new hire on their first day, not as if configuring a machine. "Drop anything internal-only" reads better and is followed more faithfully than a terse "filter(label != internal)." The model is a strong reader of natural instruction, and prose that explains the intent behind a step — why we cut internal changes, so users aren't shown noise they don't care about — generalizes to cases your literal rule didn't anticipate. Explain the why once, briefly, and the model handles the long tail of edge cases you never enumerated.

Step 4: Write the deterministic script

The script handles what code does better than prose. parse_prs.py takes pasted PR lines (numbers, titles, labels, and authors where present) and emits clean JSON with a title, PR number, label, and author for each PR. It strips noise, normalizes labels to our taxonomy, and exits non-zero with a clear message if the input is unparseable. The point is to hand the model structured data instead of asking it to eyeball a messy paste, which is where formatting errors creep in.
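A minimal sketch of what the core of parse_prs.py could look like follows. The input shape ("#1234 Title (label) @author", with the author optional), the regular expression, and the label taxonomy in `LABEL_ALIASES` are all assumptions; adapt them to whatever your tracker actually exports.

```python
import io
import json
import re
import sys

# Assumed input shape: "#1234 Title (label) @author", "@author" optional.
LINE_RE = re.compile(
    r"^\s*#(?P<number>\d+)\s+(?P<title>.+?)\s*"
    r"\((?P<label>[^()]+)\)\s*(?:@(?P<author>[\w-]+))?\s*$"
)

# Hypothetical label taxonomy; replace with your own.
LABEL_ALIASES = {
    "feature": "added", "enhancement": "added",
    "change": "changed",
    "bug": "fixed", "bugfix": "fixed", "fix": "fixed",
    "deprecation": "deprecated",
    "chore": "internal", "ci": "internal",
    "deps": "internal", "refactor": "internal",
}


def parse_prs(text: str) -> list[dict]:
    """Parse pasted PR lines into dicts; raise ValueError if none match."""
    items = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip blank lines and stray noise
        label = match["label"].strip().lower()
        items.append({
            "number": int(match["number"]),
            "title": match["title"].strip(),
            "label": LABEL_ALIASES.get(label, label),
            "author": match["author"],  # None when the line has no @author
        })
    if not items:
        raise ValueError(
            "no PR numbers found; expected lines like '#1234 Title (label)'"
        )
    return items


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr) -> int:
    """Read PR lines from stdin, write JSON to stdout, return an exit code."""
    try:
        items = parse_prs(stdin.read())
    except ValueError as exc:
        print(f"parse_prs: {exc}", file=stderr)
        return 1
    json.dump(items, stdout, indent=2)
    stdout.write("\n")
    return 0


# Demo on in-memory streams; a real script would end with sys.exit(main()).
out, err = io.StringIO(), io.StringIO()
pasted = "#1234 Add CSV export (feature) @dana\n#1235 Bump lodash (deps)\n"
code = main(io.StringIO(pasted), out, err)
print(code)
print(out.getvalue())
```

Passing the streams as parameters keeps the function easy to test without touching the real stdin and stdout.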

Keep scripts small, single-purpose, and loud about failure. When a script fails, its stderr returns to the model's context, so a precise error message ("no PR numbers found — expected lines like '#1234 Title (label)'") lets the agent recover or ask the user a sharp follow-up instead of flailing.
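To see what a caller has to work with when a script fails, the sketch below runs a tiny stand-in script in a subprocess and inspects the exit code and stderr. It does not show how Claude executes scripts internally; it only illustrates what any calling process can observe. `STAND_IN_SCRIPT` and `run_script` are hypothetical names, and the stand-in's "contains a #" check is a placeholder for real parsing.

```python
import subprocess
import sys

# Stand-in for a failing parse_prs.py: explain on stderr, exit non-zero.
STAND_IN_SCRIPT = r'''
import sys
text = sys.stdin.read()
if "#" not in text:
    print(
        "no PR numbers found; expected lines like '#1234 Title (label)'",
        file=sys.stderr,
    )
    sys.exit(1)
print("ok")
'''


def run_script(stdin_text: str) -> subprocess.CompletedProcess:
    """Run the stand-in script and capture stdout, stderr, and exit code."""
    return subprocess.run(
        [sys.executable, "-c", STAND_IN_SCRIPT],
        input=stdin_text,
        capture_output=True,
        text=True,
    )


result = run_script("fixed some stuff, bumped deps")
print(result.returncode)      # non-zero signals failure
print(result.stderr.strip())  # the message is all the caller has to go on
```

A bare traceback or a silent exit code gives the caller nothing to act on, which is why the message should name both the problem and the expected format.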

Step 5: Load it and run a real request

Drop the release-notes/ folder into your project's skills directory (.claude/skills/ in a Claude Code project), or register it through the Agent SDK. Start a session and give it a genuine task: paste a dozen merged PRs and ask for release notes. Watch the trace. You should see the agent recognize the skill from its description, read the body, invoke parse_prs.py, and assemble the changelog under the right headings. If it skips the script or ignores a rule, that's signal: the instruction wasn't explicit enough, not that the model is incapable.

Test the unhappy paths too. Feed it PRs that are all internal — it should produce a near-empty changelog and say so, not invent user-facing features. Feed it malformed input — the script should fail cleanly and the agent should ask for the right format. These cases are where skills earn their reliability.
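Some of these checks can be automated against the text the agent returns. The following is a rough sketch of a style check, assuming headings appear on their own lines, entries are "- " bullets, and an empty release is announced with the phrase "no user-facing changes". All three are assumptions about the house format, and `check_changelog` is a hypothetical helper.

```python
SECTION_ORDER = ("Added", "Changed", "Fixed", "Deprecated")


def check_changelog(text: str) -> list[str]:
    """Return style problems found in a changelog (empty list = pass)."""
    problems = []
    seen = []
    bullet_count = 0
    for line in text.splitlines():
        stripped = line.strip()
        heading = stripped.lstrip("#").strip()  # tolerate "## Added"
        if heading in SECTION_ORDER:
            if heading in seen:
                problems.append(f"duplicate section: {heading}")
            else:
                seen.append(heading)
        elif stripped.startswith("- "):
            bullet_count += 1
            if not seen:
                problems.append(f"entry outside any section: {stripped}")
    # Sections that appear must follow the fixed order.
    if seen != [s for s in SECTION_ORDER if s in seen]:
        problems.append("sections out of order")
    # An all-internal release should say so rather than stay silent.
    if bullet_count == 0 and "no user-facing changes" not in text.lower():
        problems.append("empty changelog does not say so")
    return problems


good = "Release v2.4\n\nAdded\n- Add CSV export (#1234)\n\nFixed\n- Stop export timeouts (#1236)\n"
swapped = "Fixed\n- Stop export timeouts (#1236)\n\nAdded\n- Add CSV export (#1234)\n"
print(check_changelog(good))
print(check_changelog(swapped))
```

A check like this cannot tell whether the agent invented a feature; that still needs a human or a comparison against the parsed PR list.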

Step 6: Add a reference file for the long-tail detail

Our skill works, but real changelogs have edge cases: how to phrase a breaking change, how to handle a security fix you can't fully disclose, whether to credit external contributors. None of that belongs in the main procedure — it would bloat the body and load on every run even when today's release has no breaking changes. This is exactly what reference/house-style.md is for. Move the long-tail rules there and keep the body's pointer: "If tone or edge cases are unclear, read reference/house-style.md."

The discipline pays off the first time the model hits an unusual release. It reads the section it needs, applies the rule, and moves on — without you having carried that rule's weight through a hundred ordinary runs. As you discover new edge cases in production, you add them to the reference file rather than the body. The body stays the same lean procedure it was on day one; the skill gets smarter underneath it. This is the single habit that keeps a maturing skill from collapsing under its own accumulated rules.

Step 7: Iterate on the description and the body separately

When something's off, diagnose which layer to fix. If the skill doesn't fire at all, the problem is the description — broaden or sharpen the trigger phrasings. If it fires but does the work wrong, the problem is the body or the script — add an explicit rule, tighten the procedure, or move a fuzzy decision into code. Treating these as two distinct knobs keeps iteration fast. Most beginners conflate them and end up rewriting the whole skill when one line was the issue.

Once it's solid, the skill is shippable as-is. Commit the folder, share it with teammates, or bundle it into a plugin. The same folder works in any Claude-based agent that loads skills, and it tends to get better as the underlying model improves, because a stronger model follows the same instructions more faithfully. You usually don't need to touch the skill to benefit from a model upgrade, though rerunning your test requests after one is cheap insurance.

Frequently asked questions

What's the minimum viable skill?

A single folder with one SKILL.md containing a metadata header (name and a trigger-rich description) and a short procedure body. Scripts and reference files are optional — add them only when you have deterministic work or detail that shouldn't live in the main body.

How do I know my skill will actually fire?

Write three or four real user phrasings up front and make sure the description covers those situations, not just keywords. Then test with paraphrased requests. If it fires on the exact wording but not on synonyms, the description is too narrow and needs broader trigger language.

Where should logic go — the body or a script?

Put judgment in the body (what counts as user-facing, tone, grouping) and put deterministic, error-prone mechanics in a script (parsing, normalizing, validating). The split makes the skill both reliable and easy to reason about when something breaks.

Do I need to redeploy when Claude updates?

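No. Skills are model-independent folders, so there is nothing to redeploy. A stronger model reads the same instructions and generally follows them better, so you get the upgrade without editing the skill. Rerun your test requests after an upgrade to confirm the behavior you rely on still holds.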



Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
