---
title: "Build a Claude Agent: A Step-by-Step Walkthrough (Enterprise AI Transformation Claude)"
description: "A concrete, code-level walkthrough to build an enterprise Claude agent from empty repo to a running tool-using loop with MCP and guardrails."
canonical: https://callsphere.ai/blog/build-a-claude-agent-a-step-by-step-walkthrough-enterprise-ai-transfor
category: "Agentic AI"
tags: ["agentic ai", "claude", "implementation", "tool use", "anthropic", "agent sdk"]
author: "CallSphere Team"
published: 2026-04-25T08:23:11.000Z
updated: 2026-06-07T01:28:22.483Z
---

# Build a Claude Agent: A Step-by-Step Walkthrough (Enterprise AI Transformation Claude)

> A concrete, code-level walkthrough to build an enterprise Claude agent from empty repo to a running tool-using loop with MCP and guardrails.

Architecture diagrams are useful right up until you open an empty editor and have to type something that runs. This post is the opposite of a diagram: it is a concrete, ordered walkthrough an engineer can follow to stand up a working Claude agent that uses tools, respects guardrails, and could plausibly graduate to production. We will build a small internal-operations agent — one that can look up an order, check shipping status, and draft a customer reply — because that shape generalizes to almost any enterprise use case. By the end you will have a tool-using loop you understand line by line.

## Key takeaways

- Start with the **smallest agent that does one real task**; resist adding tools you do not yet need.
- Define each tool as a name, description, and JSON schema, then write the actual **handler function** behind it.
- The agent **loop** you write by hand is only about 30 lines — most of the work is good tools and good context.
- Add a **dry-run mode** and a confirmation gate before any tool that changes state.
- Instrument every turn with **logging and a turn cap** before you let anyone else touch it.

## Step 1: Scaffold and define the contract

Begin with a single file and the Anthropic SDK. Before writing any loop, decide the agent's contract: what it is allowed to do. For our operations agent that is three capabilities: `lookup_order`, `get_shipping_status`, and `draft_reply`. The first two are read-only tools, defined below; the third produces text a human will review and is covered in Step 4. Writing the contract first keeps scope honest.

```javascript
const tools = [
  {
    name: "lookup_order",
    description: "Fetch order details by order ID. Read-only.",
    input_schema: {
      type: "object",
      properties: { order_id: { type: "string" } },
      required: ["order_id"]
    }
  },
  {
    name: "get_shipping_status",
    description: "Return current carrier status for an order's shipment. Read-only.",
    input_schema: {
      type: "object",
      properties: { order_id: { type: "string" } },
      required: ["order_id"]
    }
  }
];
```

These objects are not documentation — they are the literal interface Claude reasons over. Notice how each description states the side-effect profile ("Read-only"). That single phrase nudges the model toward safe defaults and gives your gate something to key on later.
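Because the schema is the contract, it can also be checked before a handler runs. The sketch below is a deliberately small validator, not a full JSON Schema implementation: it only looks at `required` and at the `string`, `number`, and `boolean` types, which is enough for the two tools above. The `validateInput` helper and the `A-1001` order ID are assumptions made for illustration.

```javascript
// Minimal check of a tool call against its input_schema.
// Covers only `required` and three primitive types; a production agent
// would more likely use a complete JSON Schema validator.
const PRIMITIVES = ["string", "number", "boolean"];

function validateInput(tool, input) {
  if (input === null || typeof input !== "object" || Array.isArray(input)) {
    return ["input must be an object"];
  }
  const schema = tool.input_schema;
  const errors = [];
  for (const key of schema.required || []) {
    if (!(key in input)) errors.push("missing required field: " + key);
  }
  for (const [key, spec] of Object.entries(schema.properties || {})) {
    const checkable = PRIMITIVES.includes(spec.type);
    if (key in input && checkable && typeof input[key] !== spec.type) {
      errors.push(key + " must be of type " + spec.type);
    }
  }
  return errors; // an empty array means the input passed these checks
}

const lookupOrder = {
  name: "lookup_order",
  description: "Fetch order details by order ID. Read-only.",
  input_schema: {
    type: "object",
    properties: { order_id: { type: "string" } },
    required: ["order_id"]
  }
};

console.log(validateInput(lookupOrder, { order_id: "A-1001" }));
console.log(validateInput(lookupOrder, { order_id: 1001 }));
console.log(validateInput(lookupOrder, {}));
```

Returning a list of errors rather than throwing lets the loop hand the problem back to the model as a tool result, so it can correct its own call.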

## Step 2: Write the loop

The loop is the spine. You send the conversation plus the tool list to Claude. If the response asks to use a tool, you run the matching handler, append the result, and call again. You stop when Claude returns a normal text answer or you hit a turn cap. The diagram makes the control flow concrete.

```mermaid
flowchart TD
  A["Append user message"] --> B["Call Claude with tools"]
  B --> C{"stop_reason == tool_use?"}
  C -->|No| D["Return final text"]
  C -->|Yes| F{"State-changing tool?"}
  F -->|Yes| G["Require confirmation"]
  F -->|No| E["Run matching handler"]
  G --> E
  E --> H["Append tool result"]
  H --> I{"Turn cap reached?"}
  I -->|No| B
  I -->|Yes| D
```

In code, that flow is short. The key detail is `stop_reason`: when Claude wants a tool, the response has `stop_reason` set to `tool_use` and its content includes one or more `tool_use` blocks. You must reply with a matching `tool_result` block for each one before the next call, or the conversation is malformed.

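```javascript
let messages = [{ role: "user", content: userInput }];
for (let turn = 0; turn < MAX_TURNS; turn++) {
  const res = await client.messages.create({ model: MODEL, max_tokens: 1024, tools, messages });
  messages.push({ role: "assistant", content: res.content });
  if (res.stop_reason !== "tool_use") break; // plain text answer: done
  const results = []; // one tool_result per tool_use block
  for (const block of res.content.filter((b) => b.type === "tool_use")) {
    const output = await handlers[block.name](block.input);
    results.push({ type: "tool_result", tool_use_id: block.id, content: output });
  }
  messages.push({ role: "user", content: results });
}
```

## Step 3: Implement the handlers

Each tool name needs a handler: a thin function that does the real work and returns a compact result for the model to read.

```javascript
const handlers = {
  lookup_order: async ({ order_id }) => {
    const o = await db.orders.find(order_id);
    if (!o) return "No order found for " + order_id;
    return JSON.stringify({ id: o.id, status: o.status, total: o.total });
  },
  get_shipping_status: async ({ order_id }) => carrier.status(order_id)
};
```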
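To exercise the loop and handlers together without calling the API, one option is to wrap the loop in a function that receives the client as a parameter. The sketch below does that and drives it with a stub. `runAgent`, `fakeClient`, `demoHandlers`, and the `stub-model` string are illustrative names, not part of any SDK; with the real SDK, `client` would be an Anthropic client instance and `model` a real model ID.

```javascript
// The tool loop as a function, with the client injected so it can be stubbed.
async function runAgent({ client, model, tools, handlers, userInput, maxTurns = 8 }) {
  const messages = [{ role: "user", content: userInput }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const res = await client.messages.create({ model, max_tokens: 1024, tools, messages });
    messages.push({ role: "assistant", content: res.content });
    if (res.stop_reason !== "tool_use") {
      // Final answer: join whatever text blocks came back.
      return res.content.filter((b) => b.type === "text").map((b) => b.text).join("");
    }
    const results = [];
    for (const block of res.content) {
      if (block.type !== "tool_use") continue;
      const handler = handlers[block.name];
      const content = handler ? await handler(block.input) : "Unknown tool: " + block.name;
      results.push({ type: "tool_result", tool_use_id: block.id, content });
    }
    messages.push({ role: "user", content: results });
  }
  return "Stopped: turn cap reached.";
}

// Stub client: requests one tool call, then answers using the tool result.
const fakeClient = {
  messages: {
    create: async ({ messages }) => {
      const last = messages[messages.length - 1];
      if (typeof last.content === "string") {
        return {
          stop_reason: "tool_use",
          content: [
            { type: "tool_use", id: "toolu_1", name: "lookup_order", input: { order_id: "A-1001" } }
          ]
        };
      }
      return {
        stop_reason: "end_turn",
        content: [{ type: "text", text: "Order says: " + last.content[0].content }]
      };
    }
  }
};

const demoHandlers = {
  lookup_order: async ({ order_id }) => JSON.stringify({ id: order_id, status: "shipped" })
};

runAgent({
  client: fakeClient,
  model: "stub-model",
  tools: [],
  handlers: demoHandlers,
  userInput: "Where is A-1001?"
}).then((answer) => console.log(answer));
```

Injecting the client keeps the control flow testable: the same function runs against the stub in a unit test and against the real client in production.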

## Step 4: Add a system prompt and the draft tool

So far the agent has only read-only tools. The task asked for a drafted customer reply, which is text the model produces and a human approves — a soft action, not a hard write. This is where a clear system prompt does real work. The prompt sets the agent's role, its hard constraints, and the shape of its output, so the model knows it must look up facts before drafting and must never invent order details. A few well-chosen sentences here save dozens of corrective tool calls later.

```javascript
const system = `You are an operations assistant.
Always look up the order before drafting a reply.
Never invent order numbers, totals, or ship dates.
If data is missing, say so and ask for the order ID.
When drafting, keep replies under 120 words and friendly.`;
```

Pass this as the `system` field on each call. Notice it does not list the tools or restate their schemas — those already live in the `tools` array, and duplicating them in the prompt only invites drift between the two. The system prompt is for judgment and tone; the tool definitions are for capability. Keeping that separation clean is one of the quiet habits that distinguishes an agent that ages well from one that rots.
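One way to keep that separation visible in code is to build the request in a single place. The sketch below assumes a hypothetical `buildRequest` helper and a placeholder `MODEL` value; it places `system` as a top-level field next to `tools` and rejects any attempt to pass it as a message, since the `messages` array only takes the `user` and `assistant` roles.

```javascript
// MODEL is a placeholder; substitute the model ID you have chosen.
const MODEL = "your-model-id";

// Assemble request parameters with `system` as a top-level field.
function buildRequest({ system, tools, messages, maxTokens = 1024 }) {
  for (const m of messages) {
    if (m.role !== "user" && m.role !== "assistant") {
      throw new Error("messages may only use the user and assistant roles; got: " + m.role);
    }
  }
  return { model: MODEL, max_tokens: maxTokens, system, tools, messages };
}

const request = buildRequest({
  system: "You are an operations assistant.",
  tools: [],
  messages: [{ role: "user", content: "Where is order A-1001?" }]
});
console.log(Object.keys(request));
```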

## Step 5: Add guardrails before anyone else runs it

An agent that can only read is low-risk; the moment you add a write tool, you need a gate. The simplest effective pattern is a dry-run flag plus a confirmation step. In dry-run, state-changing handlers log what they *would* do and return that description instead of executing. For real runs, a human approves the proposed action before it executes. Combine this with a turn cap, so a confused agent cannot loop a thousand times, and a per-session token budget, so cost stays bounded.
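The dry-run flag and confirmation step can be sketched as a wrapper around the handler map, so the loop itself stays unchanged. Everything named here is an assumption for illustration: `gateHandlers`, the `writeTools` set, the `confirm` callback, and the `update_order_note` write tool, which this walkthrough does not otherwise define.

```javascript
// Wrap handlers so state-changing tools go through dry-run or confirmation.
function gateHandlers(handlers, { writeTools, dryRun, confirm }) {
  const gated = {};
  for (const [name, handler] of Object.entries(handlers)) {
    gated[name] = async (input) => {
      if (!writeTools.has(name)) return handler(input); // read tools pass through
      const summary = name + " " + JSON.stringify(input);
      if (dryRun) {
        console.log("[dry-run] would call " + summary);
        return "DRY RUN: would call " + summary;
      }
      const approved = await confirm(name, input);
      if (!approved) return "Action declined by reviewer: " + summary;
      return handler(input);
    };
  }
  return gated;
}

// Hypothetical handlers: one read tool, one write tool.
const rawHandlers = {
  lookup_order: async ({ order_id }) => JSON.stringify({ id: order_id, status: "shipped" }),
  update_order_note: async ({ order_id, note }) => "Saved note on " + order_id + ": " + note
};

const safeHandlers = gateHandlers(rawHandlers, {
  writeTools: new Set(["update_order_note"]),
  dryRun: true,
  confirm: async () => false // replace with a real approval prompt
});

safeHandlers
  .update_order_note({ order_id: "A-1001", note: "Customer called" })
  .then((result) => console.log(result));
```

Returning the dry-run description as the tool result, rather than throwing, lets the model see what would have happened and continue the conversation.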

This is also the moment to add structured logging. Record, per turn, the tools requested, their inputs, the outputs, and the token usage. When the agent does something surprising next week, this trace is the only thing that will let you explain why. A few lines of structured logging now are the difference between a five-minute root cause and a lost afternoon.
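A per-turn log record might look like the sketch below, written as one JSON line per turn. The `turnRecord` and `logTurn` names and the record layout are assumptions; the `usage.input_tokens` and `usage.output_tokens` fields are read from the API response, and the sample response is made up for the demo.

```javascript
// Build one structured record per turn from the response and tool results.
function turnRecord(turn, res, toolResults) {
  const calls = res.content.filter((b) => b.type === "tool_use");
  return {
    turn,
    stop_reason: res.stop_reason,
    tools: calls.map((call) => {
      const result = toolResults.find((r) => r.tool_use_id === call.id);
      return { name: call.name, input: call.input, output: result ? result.content : null };
    }),
    input_tokens: res.usage ? res.usage.input_tokens : null,
    output_tokens: res.usage ? res.usage.output_tokens : null
  };
}

// Emit the record as a single JSON line with a timestamp.
function logTurn(turn, res, toolResults) {
  const record = { ts: new Date().toISOString(), ...turnRecord(turn, res, toolResults) };
  console.log(JSON.stringify(record));
}

// Made-up response and tool result, shaped like one tool-using turn.
const sampleResponse = {
  stop_reason: "tool_use",
  content: [
    { type: "text", text: "Checking the order." },
    { type: "tool_use", id: "toolu_1", name: "lookup_order", input: { order_id: "A-1001" } }
  ],
  usage: { input_tokens: 120, output_tokens: 30 }
};
const sampleResults = [
  { type: "tool_result", tool_use_id: "toolu_1", content: "{\"id\":\"A-1001\"}" }
];

logTurn(0, sampleResponse, sampleResults);
```

JSON lines are easy to grep and to load into whatever log store the team already uses, which is the main reason for choosing them here.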

## Step 6: Promote tools to an MCP server

The in-process handlers above are perfect for a prototype. To make the same tools reusable across agents and teams, move them behind a Model Context Protocol server. The server exposes the identical schemas over a standard transport, so any Claude host (Claude Code, Claude Cowork, or your own app) can discover and call them without copying code. The loop does not change: you swap local function calls for MCP calls and gain reuse, plus one place to enforce auth and isolation.

| Stage | Tool delivery | Best for |
| --- | --- | --- |
| Prototype | In-process handlers | One agent, fast iteration |
| Team | Shared MCP server | Reuse, central auth, audit |
| Org | MCP + skills | Procedures encoded, governed rollout |
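Moving from in-process handlers to MCP is mostly a change of shape. The sketch below shows only the data mapping, not a server: MCP lists tools with a camelCase `inputSchema` field, and a tool call returns a `content` array with an optional `isError` flag. The `toMcpTool` and `toMcpHandler` helpers are hypothetical, and a real server would register these through an MCP SDK and a transport, which are left out here.

```javascript
// Map an Anthropic-style tool definition to the MCP tool listing shape.
function toMcpTool(tool) {
  return { name: tool.name, description: tool.description, inputSchema: tool.input_schema };
}

// Wrap a string-returning handler so it yields an MCP-style call result.
function toMcpHandler(handler) {
  return async (args) => {
    try {
      const text = String(await handler(args));
      return { content: [{ type: "text", text }], isError: false };
    } catch (err) {
      return { content: [{ type: "text", text: "Tool failed: " + err.message }], isError: true };
    }
  };
}

const lookupOrderTool = {
  name: "lookup_order",
  description: "Fetch order details by order ID. Read-only.",
  input_schema: {
    type: "object",
    properties: { order_id: { type: "string" } },
    required: ["order_id"]
  }
};

// Stand-in handler; the real one would query your order store.
const lookupOrderHandler = async ({ order_id }) => JSON.stringify({ id: order_id, status: "shipped" });

console.log(toMcpTool(lookupOrderTool));
toMcpHandler(lookupOrderHandler)({ order_id: "A-1001" }).then((result) => console.log(result));
```

Because the schemas carry over unchanged, the prototype's tool definitions remain the single source of truth after the move.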

## Common pitfalls

- **Forgetting tool_result blocks.** Every `tool_use` must be answered with a matching `tool_result` in the next message, or the API rejects the call.
- **Returning raw, oversized output.** A handler that returns megabytes of JSON blows the context budget. Trim to what the model needs to decide.
- **No turn cap.** Without `MAX_TURNS`, a single misjudgment can spin into an expensive loop. Cap it from day one.
- **Skipping the dry-run.** Letting write tools execute during development invites real, irreversible mistakes against real data.
- **Choosing the biggest model reflexively.** Sonnet handles most tool-loops well; reserve Opus for genuinely hard reasoning to control latency and spend.
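Two of these pitfalls, missing `tool_result` blocks and oversized output, can be handled in one place. The sketch below always produces a `tool_result` for a `tool_use` block, even when the tool is unknown or the handler throws, and truncates long output. The `toToolResult` helper and the 4000-character limit are assumptions; `is_error` is the flag a `tool_result` block can carry to tell the model the call failed.

```javascript
const MAX_RESULT_CHARS = 4000; // assumed limit; tune to your context budget

// Always return a tool_result for a tool_use block, trimmed to a size limit.
async function toToolResult(block, handlers, maxChars = MAX_RESULT_CHARS) {
  const base = { type: "tool_result", tool_use_id: block.id };
  if (!Object.prototype.hasOwnProperty.call(handlers, block.name)) {
    return { ...base, content: "Unknown tool: " + block.name, is_error: true };
  }
  try {
    let out = await handlers[block.name](block.input);
    if (typeof out !== "string") out = JSON.stringify(out) ?? String(out);
    if (out.length > maxChars) {
      const dropped = out.length - maxChars;
      out = out.slice(0, maxChars) + "\n[truncated " + dropped + " characters]";
    }
    return { ...base, content: out };
  } catch (err) {
    // Report the failure to the model instead of breaking the conversation.
    return { ...base, content: "Tool failed: " + err.message, is_error: true };
  }
}

// Stand-in handlers for the demo.
const sampleHandlers = {
  lookup_order: async () => "x".repeat(50),
  get_shipping_status: async () => { throw new Error("carrier timeout"); }
};

toToolResult({ id: "toolu_1", name: "lookup_order", input: {} }, sampleHandlers, 10)
  .then((result) => console.log(result));
toToolResult({ id: "toolu_2", name: "get_shipping_status", input: {} }, sampleHandlers)
  .then((result) => console.log(result));
```

Blind truncation is the crudest form of trimming; a handler that selects only the fields the model needs, as `lookup_order` does in Step 3, is usually the better first line of defense.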

## Ship your first agent in 6 steps

1. Pick **one task** and the two or three tools it needs.
2. Write the tool **schemas**, marking read vs. write.
3. Implement **thin handlers** that return compact results.
4. Drop in the **loop** with a turn cap and tool dispatch.
5. Add a **dry-run + confirmation gate** and per-turn logging.
6. Move the tools behind an **MCP server** once a second agent needs them.

## Frequently asked questions

### Which Claude model should I start with?

Start with Sonnet 4.6 for most tool-using agents — it is fast and capable. Move specific hard-reasoning steps to Opus 4.8 and offload cheap, high-volume calls to Haiku 4.5. Pick per task, not per project.

### How do I stop the agent from looping forever?

Set a hard `MAX_TURNS` in the loop and a per-session token budget. If either is hit, return what the agent has so far rather than continuing. A loop limit is the single most important safety control in a hand-built agent.
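Both limits can live in one small tracker that the loop consults after every call. This is a minimal sketch: `createBudget`, the limits, and the simulated usage figures are assumptions chosen for the demo, and the token count simply sums `input_tokens` and `output_tokens` from each response.

```javascript
// Track turns and tokens for one session; report why the loop should stop.
function createBudget({ maxTurns, maxTokens }) {
  let turns = 0;
  let tokens = 0;
  return {
    record(usage) {
      turns += 1;
      tokens += usage.input_tokens + usage.output_tokens;
    },
    exhausted() {
      if (turns >= maxTurns) return "turn cap";
      if (tokens >= maxTokens) return "token budget";
      return null; // still within both limits
    },
    snapshot() {
      return { turns, tokens };
    }
  };
}

// Simulated per-call usage; in the loop this would come from res.usage.
const budget = createBudget({ maxTurns: 8, maxTokens: 1000 });
const simulatedUsage = [
  { input_tokens: 300, output_tokens: 50 },
  { input_tokens: 450, output_tokens: 80 },
  { input_tokens: 600, output_tokens: 90 }
];
for (const usage of simulatedUsage) {
  budget.record(usage);
  const reason = budget.exhausted();
  if (reason) {
    console.log("Stopping: " + reason, budget.snapshot());
    break;
  }
}
```

Reporting the reason, not just a boolean, makes it clear in the logs whether a session ended on turns or on cost.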

### When should I switch from local handlers to MCP?

As soon as a second agent or team needs the same tools, or you want central authentication and audit. MCP gives you reuse and governance without rewriting your loop.

## Bringing agentic AI to your phone lines

The same tool-loop you just built is what powers CallSphere's **voice and chat** agents — they look things up mid-call, draft replies, and complete bookings live. Hear it work at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

