Build a Claude Agent Step by Step with the Agent SDK
A concrete, follow-along walkthrough for building a production Claude agent in 2026 — from a bare loop to tools, MCP, skills, and a real stop condition.
Reading about agent architecture is one thing; getting a working agent running on your own machine is another. This is a hands-on walkthrough. We'll build a single Claude agent from an empty file up to something that reads code, calls a tool, queries an external service over MCP, and knows when to stop. No hand-waving — every step is a thing you actually type, in the order you'd type it. By the end you'll have a mental template you can reuse for any agent you ship.
Step 1: Start with the smallest possible loop
Before tools or MCP, build the bare heartbeat. Create a project, install the Claude Agent SDK, and write a loop that sends a prompt to the model, prints the reply, and exits. The Claude Agent SDK is Anthropic's toolkit for building production agents on top of the same primitives that power Claude Code — the model loop, tool execution, context management, and subagents. Pick your model deliberately at this stage: Sonnet 4.6 is the sensible default for development, Haiku 4.5 for cheap high-volume steps, Opus 4.8 when a task needs the deepest reasoning.
Run it with a trivial request like "summarize this paragraph." If you see a coherent answer, your credentials, model selection, and transport are all working. Resist the urge to add features. A reliable agent is a reliable loop with things bolted on; if the loop is shaky, nothing above it will be stable. Confirm this layer end to end before moving on.
Step 2: Give the agent one tool
Now make it act. Define a single tool — say, a function that reads a file from disk. A tool definition has three parts the model relies on: a clear name, a one-line description of when to use it, and a JSON schema for its inputs. Register the tool with the SDK, then change your prompt to something that requires it: "read config.json and tell me which port the server uses."
Watch the loop now. The model won't answer directly; it will emit a tool call with the filename as an argument. The SDK runs your function, captures the result, appends it to context, and re-invokes the model, which now answers using the file's real contents. This is the entire agentic pattern in miniature. If the model calls the wrong tool or passes bad arguments, the fix is almost always the description and schema, not the model — invest in making those crisp.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
flowchart TD
A["Write bare model loop"] --> B["Add one file-read tool"]
B --> C["Register MCP server"]
C --> D{"Tool call emitted?"}
D -->|Yes| E["SDK executes tool / MCP"]
E --> F["Append result to context"]
F --> G{"Stop condition met?"}
G -->|No| D
G -->|Yes| H["Return final answer"]
D -->|No| H
Step 3: Connect an MCP server
File reads are local. Real agents need to reach systems you don't own the code for — a database, an issue tracker, a search API. That's where MCP comes in. Configure an MCP server in your agent's settings: point it at the server command or URL, supply credentials through environment variables rather than hardcoding them, and let the SDK handle the handshake that lists the server's available tools.
Once connected, the server's tools appear to the model exactly like your local file-read tool — same call shape, same result shape. Ask a question that needs live data: "how many open tickets are assigned to the billing team?" The model picks the MCP tool, the SDK routes the call to the server, and the structured result flows back into context. The first time you do this, log every MCP request and response. Most early failures are schema mismatches or auth errors at the server boundary, and you want them visible.
Step 4: Add a skill for repeatable procedures
By now your agent can act and fetch data, but it doesn't yet know your way of doing things. Encode that as a skill: a folder with an instructions file describing a procedure, plus any helper scripts or reference files it needs. Maybe it's "how we triage a bug report" — check these fields, query that service, format the summary this way.
The agent loads the skill only when a task matches, so it costs no context the rest of the time. Test it by giving a request that should trigger the skill and confirming the agent follows your procedure instead of improvising. Skills are where institutional knowledge lives; the difference between a generic agent and one that feels like a seasoned teammate is almost entirely the quality of its skills.
Step 5: Define a real stop condition
An agent without a clear stop condition either quits too early or loops forever burning tokens. Decide explicitly what "done" means for your task and make it checkable. For a coding agent, done might be "the test suite passes." For a research agent, "every sub-question has a sourced answer." Encode the check as a tool the agent can call or a hook the harness runs after each step.
Pair this with a hard ceiling — a maximum number of iterations or a token budget — so a confused agent fails loudly instead of spinning silently. In practice the stop condition is the difference between a demo and something you'd run unattended. Build it before you trust the agent with anything that costs money or touches production.
Step 6: Observe, then harden
With the agent working, instrument it. Log each turn: what the model decided, which tool it called, the arguments, and the result. When something goes wrong — and it will — this trace tells you exactly which step drifted. Common fixes you'll discover: a tool description that's too vague, an MCP result that's too verbose and crowds context, or a skill that should have triggered but didn't.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
From there, hardening is incremental. Add retries and idempotency keys to tools that mutate state. Add a moderation or approval hook before destructive actions. Introduce a subagent only when you hit work that genuinely parallelizes. Each addition is small and testable because you built the foundation cleanly. That's the whole method: a solid loop, one capability at a time, observability throughout.
Frequently asked questions
Do I need Claude Code to build an agent, or just the SDK?
The SDK is enough to build a custom agent in your own application. Claude Code is the ready-made agentic coding tool built on the same primitives. Use Claude Code when you want an out-of-the-box terminal/IDE agent; use the Agent SDK when you're embedding agentic behavior into your own product or workflow.
How do I stop the agent from looping forever?
Define an explicit, checkable stop condition for the task and enforce a hard iteration or token ceiling alongside it. The stop condition tells the agent when it has succeeded; the ceiling guarantees it fails loudly rather than spinning silently if it gets confused.
Which Claude model should I develop against?
Start with Sonnet 4.6 — it balances capability, speed, and cost for most development. Drop to Haiku 4.5 for cheap, high-volume steps, and reach for Opus 4.8 on tasks that need the strongest reasoning. You can route different steps to different models within one agent.
What's the single most common reason a tool call goes wrong?
A weak tool description or schema. The model decides when and how to call a tool almost entirely from those two fields. If it picks the wrong tool or passes bad arguments, sharpen the description and tighten the schema before blaming the model.
Bringing agentic AI to your phone lines
The same step-by-step pattern — a loop, tools, MCP, skills, and a firm stop condition — is exactly how CallSphere builds voice and chat agents that pick up every call, act on live data, and book work without a human in the loop. Try one yourself at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.