Build Your First Claude Skill: A Step-by-Step Guide

A hands-on walkthrough for building, testing, and shipping a Claude Agent Skill — folder layout, SKILL.md, bundled scripts, and debugging triggers.

The fastest way to understand Agent Skills is to build one that actually does something, watch it trigger, and then deliberately break it so you learn the failure modes. In this walkthrough we will build a Skill that turns messy customer-export CSVs into a clean, validated report — a task that is annoying to describe in a prompt every time and error-prone when Claude does the parsing by hand. By the end you will have a working folder, you will understand exactly why it loads when it does, and you will know how to debug the moment it refuses to.

Step 1: Create the folder and the SKILL.md spine

A Skill is a directory. Create one named after the job, not after the tool: csv-report-builder, say. Inside it, the only mandatory file is SKILL.md. Its YAML frontmatter carries two fields the runtime cares about: name and description. The name is a stable identifier; the description is the single most important line in the whole Skill because it is what Claude reads to decide whether to load the Skill.

Write the description in terms of triggers and outcomes, not vague capability. "Helps with data" will almost never fire. "Use when the user provides a customer CSV export and wants it cleaned, validated, and summarized into a report" fires reliably because it names the input, the intent, and the output. Spend real time here — a Skill with a perfect body and a fuzzy description is invisible.
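
As a rough sketch of Step 1, the folder and a skeleton SKILL.md can be generated with a few lines of Python. The frontmatter layout (a block delimited by --- lines holding name and description) follows the text above; the scaffold_skill helper, the heading, and the placeholder body are illustrative assumptions, not official tooling.

```python
from pathlib import Path

# Description adapted from the example phrasing in the text above.
DESCRIPTION = (
    "Use when the user provides a customer CSV export and wants it "
    "cleaned, validated, and summarized into a report."
)


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/SKILL.md with name/description frontmatter.

    Hypothetical helper. The description is written as a plain YAML
    scalar, so text containing ': ' or a leading quote would need quoting.
    """
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        "# CSV report builder\n\n"
        "(Runbook steps go here.)\n",
        encoding="utf-8",
    )
    return skill_md
```

Calling scaffold_skill(Path("."), "csv-report-builder", DESCRIPTION) leaves you with a csv-report-builder/SKILL.md ready for the body you write in Step 2.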

A useful exercise before you write a single line of the body: imagine ten different ways a teammate might phrase the request that should trigger this Skill, then make sure your description contains the nouns and verbs those phrasings share. If half of them would not obviously match, the description is too narrow or too abstract. This five-minute check up front saves you the far more frustrating experience later of a Skill that works perfectly once loaded but mysteriously refuses to load for half the requests it should handle.
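
One crude way to run that exercise is to count the words each phrasing shares with the description. The sketch below is only a word-overlap heuristic with made-up helper names and a tiny stop-word list. It says nothing about how Claude actually matches requests to Skills, but it does flag phrasings that share almost no vocabulary with your description.

```python
import re

# Small, assumed stop-word list; extend it for your own domain.
STOP_WORDS = {"the", "and", "this", "that", "into", "for", "with", "from", "when", "wants"}


def content_words(text: str) -> set[str]:
    """Lowercase words of three or more letters, minus stop words."""
    return {w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOP_WORDS}


def uncovered_phrasings(description: str, phrasings: list[str], min_shared: int = 2) -> list[str]:
    """Return the phrasings sharing fewer than min_shared words with the description."""
    desc_words = content_words(description)
    return [p for p in phrasings if len(content_words(p) & desc_words) < min_shared]
```

There is no stemming here, so "clean" and "cleaned" count as different words. Treat anything this returns as a prompt to reread the description, not as a verdict.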

Step 2: Write the body as a procedure, not an essay

Below the frontmatter, the Markdown body is the instructions Claude loads once triggered. Treat it like a runbook a competent new hire would follow. List the steps in order: detect the delimiter, validate required columns, drop rows with malformed emails, compute the summary stats, and emit the report in the agreed format. Be specific about edge cases you have already hit — duplicate headers, BOM characters, mixed date formats — because those are exactly the places an unprompted model improvises and gets it wrong.

flowchart TD
  A["Create csv-report-builder/ dir"] --> B["Write SKILL.md: name + trigger description"]
  B --> C["Author body as ordered runbook"]
  C --> D["Add scripts/validate.py for parsing"]
  D --> E["Drop Skill into .claude/skills/"]
  E --> F["Send a real CSV task"]
  F --> G{"Did the Skill load?"}
  G -->|Yes| H["Iterate on body + script"]
  G -->|No| I["Sharpen description triggers"]
  I --> F

Step 3: Bundle a script for the deterministic parts

Anything that is pure mechanics — parsing, validation, math — should be code, not prose the model executes in its head. Add a scripts/validate.py that reads a CSV, applies the rules, and prints a structured result. In the body, instruct Claude to run that script rather than reason through the parse manually. This is the single biggest reliability win in Skill authoring: you offload the brittle, repetitive work to deterministic code and reserve the model for judgment — deciding what the report should emphasize, writing the prose summary, handling ambiguity the script flagged.
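
A minimal sketch of what the core of scripts/validate.py might look like, using only the standard library. The required column names, the loose email pattern, and the shape of the returned dictionary are assumptions for illustration, and the sketch assumes a comma-delimited file.

```python
import csv
import re

# Assumed schema; adjust to the columns your exports actually carry.
REQUIRED_COLUMNS = ("email", "name", "signup_date")
# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(path: str) -> dict:
    """Check required columns and count rows dropped for malformed emails."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            return {"ok": False, "missing_columns": missing}
        rows_in = 0
        kept = 0
        for row in reader:
            rows_in += 1
            # Short rows yield None for missing fields, hence the "or".
            if EMAIL_RE.match((row["email"] or "").strip()):
                kept += 1
    dropped = rows_in - kept
    return {"ok": True, "rows_in": rows_in, "rows_dropped": dropped, "bad_emails": dropped}
```

The runbook then only needs one instruction for this step: run the script on the attached file and work from its result, rather than parsing the CSV by hand.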

The script can assume it runs from the Skill directory, so reference bundled resources with relative paths. Keep it dependency-light; a Skill that needs an exotic package to be pre-installed will fail silently on a teammate's machine. Standard-library Python or a single well-known dependency is the safe zone.
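
If you would rather not depend on the working directory at all, one defensive variant is to resolve bundled files relative to the script's own location. This is a sketch under the assumption that scripts sit one level below the Skill root; resource_path and the reference/columns.json file are hypothetical names.

```python
from pathlib import Path


def resource_path(script_file: str, relative: str) -> Path:
    """Resolve a bundled resource relative to the Skill root.

    Assumes the layout <skill root>/scripts/<script>, so the Skill root
    is the parent of the directory that holds the script.
    """
    skill_root = Path(script_file).resolve().parent.parent
    return skill_root / relative
```

Inside a script you would call it as resource_path(__file__, "reference/columns.json"), which keeps working no matter where the process was started from.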

There is a design decision worth making explicit here: have the script print structured output — a small JSON blob or a clearly tagged block — rather than a paragraph of prose. When the script returns {"rows_in": 4120, "rows_dropped": 37, "bad_emails": 37}, Claude reads those numbers directly and builds the report around them. When it returns a chatty sentence, the model has to re-parse the prose and sometimes re-derives a value incorrectly. Structured hand-offs between your deterministic code and the model's judgment are the seam where reliability is won or lost, so design that seam deliberately rather than letting it happen by accident.
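
The structured hand-off can be as small as one line of JSON on stdout. A minimal sketch, with render_result as a hypothetical helper name; sorting the keys is a choice made here so that the output is stable between runs.

```python
import json


def render_result(result: dict) -> str:
    """Serialize the validation result as a single line of JSON."""
    return json.dumps(result, sort_keys=True)


# Example with the numbers used in the text above:
# render_result({"rows_in": 4120, "rows_dropped": 37, "bad_emails": 37})
# returns '{"bad_emails": 37, "rows_dropped": 37, "rows_in": 4120}'
```

The script would print this string as its only output, so nothing chatty ends up mixed in with the numbers.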

Step 4: Install it and trigger it for real

Drop the folder where the agent looks for Skills — for Claude Code that is typically a .claude/skills/ directory in the project, or a user-level skills location for ones you want everywhere. Restart or refresh so the metadata index picks it up. Now send a genuine task: attach a CSV and ask for a cleaned report. If the description is good, Claude reads the index, matches your request, loads the body, and follows the runbook — including running your script.
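
Installing is just a folder copy. Here is a minimal sketch, assuming the project-level .claude/skills/ location mentioned above; install_skill is a hypothetical helper, and the restart or refresh still has to happen in the agent itself.

```python
import shutil
from pathlib import Path


def install_skill(skill_dir: Path, project_root: Path) -> Path:
    """Copy a Skill folder into <project_root>/.claude/skills/<name>/."""
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")
    target = project_root / ".claude" / "skills" / skill_dir.name
    # dirs_exist_ok lets a re-install overwrite an earlier copy (Python 3.8+).
    shutil.copytree(skill_dir, target, dirs_exist_ok=True)
    return target
```

Refusing to copy a folder without SKILL.md catches the most basic packaging mistake before the agent ever sees the Skill.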

Watch the trace. You want to see the Skill name appear and the script invocation happen. If both occur and the output is right, you have a working Skill. Test it again with a deliberately broken CSV — missing columns, junk rows — and confirm the validation logic catches it instead of producing a confidently wrong report. That second test is what separates a demo from something you trust in production.
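
The broken-CSV check is worth automating so it runs every time the Skill changes. The sketch below assumes the bundled script takes the CSV path as its only argument and prints one JSON object with an "ok" field; both conventions and both function names are illustrative assumptions, not something the Skill format requires.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_validator(script: Path, csv_path: Path) -> dict:
    """Run the bundled script on a CSV and parse its JSON stdout."""
    proc = subprocess.run(
        [sys.executable, str(script), str(csv_path)],
        capture_output=True,
        text=True,
        timeout=30,
    )
    if not proc.stdout.strip():
        raise RuntimeError(f"script printed nothing; stderr: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


def rejects_broken_csv(script: Path, broken_csv: Path) -> bool:
    """True if the script flags the deliberately broken file as not ok."""
    return run_validator(script, broken_csv).get("ok") is False
```

Running the script outside the agent like this also confirms that it works standalone, which rules out one whole class of problems before you start reading traces.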

Step 5: Debug the most common failure — it won't load

The number one problem is a Skill that simply never fires. Most of the time the description is the culprit: too abstract, missing the words a user would actually use, or overlapping ambiguously with another Skill. Rewrite it to name the trigger concretely and re-test. The second most common issue is the body asking for a resource that is not in the folder, or a script that errors because of an environment assumption, so check that the script runs standalone before you blame the model. The third is scope collision: two Skills with descriptions so similar that Claude picks the wrong one. Tighten both descriptions so their boundaries are crisp.
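
Two of these failure modes can be linted for mechanically. The sketch below uses hypothetical helpers: one looks for scripts/ paths mentioned in SKILL.md that are missing from the folder, and the other scores how much vocabulary two descriptions share. The overlap score is a plain Jaccard similarity on words, offered as a rough warning signal and not as a model of how Claude chooses between Skills.

```python
import re
from pathlib import Path


def missing_resources(skill_dir: Path) -> list[str]:
    """List scripts/... paths named in SKILL.md that do not exist on disk."""
    body = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    # Strip a trailing period so a path at the end of a sentence still matches.
    referenced = {m.rstrip(".") for m in re.findall(r"\bscripts/[\w./-]+", body)}
    return sorted(p for p in referenced if not (skill_dir / p).is_file())


def description_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two descriptions (0.0 to 1.0)."""
    words_a = set(re.findall(r"[a-z]+", a.lower()))
    words_b = set(re.findall(r"[a-z]+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)
```

A non-empty list from missing_resources is a definite bug; a high overlap score is only a hint that two descriptions deserve a second look.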

Step 6: Promote, version, and share

Once it works, treat the Skill like code. Commit the folder to your repo so the whole team gets it. Because Skills are portable folders, the same csv-report-builder works in Claude Code, in Cowork as part of a plugin, and inside an Agent SDK build with no changes. Version it the way you version any internal tool, and when you improve the runbook, everyone benefits on their next pull. That portability is what turns a one-off prompt into durable team infrastructure.

One habit pays off enormously as your Skill count grows: keep a short changelog comment at the top of the body and review Skill edits in pull requests like any other code. A Skill that silently drifts — a teammate tweaks the runbook to fix their case and breaks yours — is the agentic equivalent of an untested config change in production. Because the whole Skill is plain files in your repo, you already have every tool you need to govern it: diffs, reviews, blame, and the ability to roll back. Use them. The teams that get the most out of Skills are the ones that treat them with the same seriousness as the rest of their codebase rather than as throwaway prompt snippets.

Frequently asked questions

Where exactly do I put the Skill folder?

For project-scoped Skills, a .claude/skills/ directory inside the project is the conventional home; for Skills you want available everywhere, use the user-level skills location. The runtime scans these on startup and indexes the frontmatter. Refresh after adding one so the index updates.

My Skill never triggers — what's the first thing to check?

The description. It is what Claude matches against the task. Rewrite it to name the concrete input and intent ("use when the user provides X and wants Y") rather than a vague capability. This fixes the large majority of non-triggering Skills.

Should logic live in the body or in a script?

Deterministic mechanics — parsing, validation, arithmetic — belong in a bundled script that Claude runs. Judgment, summarization, and handling ambiguity belong in the model. Splitting them this way makes the Skill both cheaper and more reliable.

Can one task pull in more than one Skill?

Yes. If a request matches several descriptions, Claude can load multiple Skill bodies and combine them. Keep descriptions distinct so it loads the ones you intend rather than a confusing overlap.

Bringing agentic AI to your phone lines

CallSphere applies these same build-and-trigger patterns to voice and chat agents that load the right procedure mid-call, run tools to validate and book work, and never miss a conversation. See it live at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
