---
title: "Build a Claude Agent: A Step-by-Step Walkthrough"
description: "Follow a concrete walkthrough to build a working Claude agent: define tools, run the loop, handle results, add a scratchpad, and test against real inputs."
canonical: https://callsphere.ai/blog/build-a-claude-agent-a-step-by-step-walkthrough
category: "Agentic AI"
tags: ["agentic ai", "claude", "agent tutorial", "tool use", "claude agent sdk", "agent loop"]
author: "CallSphere Team"
published: 2026-03-05T08:23:11.000Z
updated: 2026-06-06T21:47:43.897Z
---

# Build a Claude Agent: A Step-by-Step Walkthrough

> Follow a concrete walkthrough to build a working Claude agent: define tools, run the loop, handle results, add a scratchpad, and test against real inputs.

Reading about agent architecture is one thing; getting a loop to actually run, call a tool, and finish a task is another. This walkthrough builds a small but real Claude agent from an empty file to a working loop. The example task is a support-triage agent: it reads an incoming customer message, looks up the account, decides on an action, and drafts a reply. You can swap the tools for your own domain, but the skeleton is identical for almost any agent you will ever write. I will keep the code honest and minimal so the moving parts stay visible.

## Step 1: Define the task and the tools it needs

Start by writing down, in one sentence, what "done" looks like — "a drafted reply plus a chosen disposition (refund, escalate, or answer)" — and then list the tools required to get there. For triage that is roughly three: `lookup_account` to fetch the customer record, `search_kb` to find relevant help articles, and `submit_draft` to record the final reply and disposition. Resist adding tools you do not yet need; every extra tool is more schema for Claude to reason over and more surface area for mistakes.

Each tool gets a JSON schema describing its name, a one-line purpose, and its parameters with types and descriptions. Write these descriptions for Claude, not for yourself — say exactly when to use the tool and what it returns. A vague description like "gets account info" produces vague calls; "returns the customer's plan, open tickets, and refund eligibility given an email" produces precise ones.

## Step 2: Frame the system prompt and the loop

The system prompt sets the agent's job, its constraints, and its stopping condition. Tell Claude it is a triage agent, that it must look up the account before drafting, that refunds over a threshold must be escalated rather than auto-approved, and that it finishes by calling `submit_draft` exactly once. A clear stopping condition is what keeps the loop from spinning. Then write the loop: send the messages to Claude, inspect the response, run any tool calls, append results, and repeat until Claude stops requesting tools.

```mermaid
flowchart TD
  A["Customer message in"] --> B["Build messages + tool schemas"]
  B --> C["Call Claude"]
  C --> D{"submit_draft called?"}
  D -->|Yes| E["Persist reply + disposition, exit"]
  D -->|No| F["Run lookup_account / search_kb"]
  F --> G["Append tool_result to messages"]
  G --> H{"Step budget exceeded?"}
  H -->|Yes| I["Stop, flag for human"]
  H -->|No| C
```

Notice the two exits: the happy path where Claude calls `submit_draft`, and the safety valve where a step budget is exceeded and the case is flagged for a human. Never ship an agent loop without a hard step limit. It is the difference between a bounded cost per run and a runaway that burns tokens until someone notices.

## Step 3: Execute tools and format results honestly

When Claude emits a tool-use block, your executor matches it by name, validates the arguments against the schema, and runs the real function. The result must go back as a tool-result message tied to the same tool-use id. Return structured, compact data — JSON the model can parse — not a wall of prose. If a lookup returns nothing, say so explicitly: `{"found": false}` beats an empty string, because Claude can reason about an explicit negative but will guess about silence.

Error handling lives here too, and it matters more than the happy path. If `lookup_account` throws, do not crash the loop — return a tool result like `{"error": "account service timeout, retry once"}`. Claude is remarkably good at recovering when you tell it plainly what went wrong and what it can do about it. Swallowing the error or returning a raw stack trace both lead the agent astray.

## Step 4: Add a scratchpad for longer tasks

Triage is short, but the moment a task spans many steps you want an externalized scratchpad so the agent does not lose the plot when context tightens. Give it a `write_notes` and a `read_notes` tool, or in Claude Code simply let it write to a file. Early in the run, instruct the agent to record its plan and the facts it has gathered; later, instruct it to re-read those notes before deciding. This keeps live context lean and makes the agent's reasoning auditable after the fact.

The scratchpad also gives you a clean recovery point. If a run is interrupted, the notes file holds enough state to resume without starting over. For production agents this durability is not a nicety — it is what lets you retry safely after a transient failure without redoing expensive, already-completed steps.

## Step 5: Test against real and adversarial inputs

Before you trust the agent, run it against a fixed set of cases: a routine question, a refund just under the escalation threshold, one just over it, a message about an account that does not exist, and a deliberately confusing multi-part request. For each, assert on the observable outcome — the disposition and whether escalation fired — not on the exact wording of the reply. Wording varies run to run; the decision should not. This small eval suite is what lets you change the prompt later without silently breaking behavior.

Pay special attention to the boundary cases. The over-threshold refund that must escalate is exactly the kind of policy the model will occasionally try to shortcut if your system prompt is soft. Watching it fail that case once teaches you to make the constraint sharper — "you do not have authority to approve refunds above X; you must escalate" — and the fix sticks.

## Step 6: Decide single-agent versus subagents

For triage, one agent is plenty. But suppose the job grew to "triage, then if it is a bug, reproduce it and file a detailed report." Now you have two very different skills with different context needs, and splitting them into an orchestrator plus a reproduce-and-file subagent can keep each context clean. The rule of thumb: stay single-agent until a clear seam appears where subtasks are independent and one's intermediate mess would pollute the other. Multi-agent setups cost several times the tokens, so earn that cost before paying it.

## Frequently asked questions

### How many tools should my first agent have?

As few as the task strictly requires — often two or three. Each tool adds schema the model must weigh on every turn, so a lean tool set produces sharper decisions. Add tools only when a concrete step in your task cannot be completed without one, and remove any the agent never reaches for.

### What is the simplest way to stop an agent from looping forever?

Combine a clear stopping condition in the system prompt with a hard step budget in the loop. The prompt tells Claude how to know it is finished; the step budget guarantees termination even if the prompt fails. When the budget trips, hand the task to a human rather than returning a half-finished result silently.

### Should tool results be plain text or structured JSON?

Structured JSON, almost always. It is unambiguous, compact, and lets you encode explicit states like `found: false` or `error` that the model can reason about. Plain text invites the agent to guess at meaning, which is exactly what you are trying to avoid in a production loop.

## Bringing agentic AI to your phone lines

The same build-the-loop discipline in this walkthrough — clear tools, a step budget, honest results — is how CallSphere ships agents that work over **voice and chat**, answering calls, acting mid-conversation, and booking real jobs 24/7. Try it at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

---

Source: https://callsphere.ai/blog/build-a-claude-agent-a-step-by-step-walkthrough