Build a Claude Agent: A Step-by-Step Walkthrough (Enterprise AI Transformation Claude)

A concrete, code-level walkthrough to build an enterprise Claude agent from empty repo to a running tool-using loop with MCP and guardrails.

Architecture diagrams are useful right up until you open an empty editor and have to type something that runs. This post is the opposite of a diagram: it is a concrete, ordered walkthrough an engineer can follow to stand up a working Claude agent that uses tools, respects guardrails, and could plausibly graduate to production. We will build a small internal-operations agent — one that can look up an order, check shipping status, and draft a customer reply — because that shape generalizes to almost any enterprise use case. By the end you will have a tool-using loop you understand line by line.

Key takeaways

  • Start with the smallest agent that does one real task; resist adding tools you do not yet need.
  • Define each tool as a name, description, and JSON schema, then write the actual handler function behind it.
  • The agent loop you write by hand is only about 30 lines — most of the work is good tools and good context.
  • Add a dry-run mode and a confirmation gate before any tool that changes state.
  • Instrument every turn with logging and a turn cap before you let anyone else touch it.

Step 1: Scaffold and define the contract

Begin with a single file and the Anthropic SDK. Before writing any loop, decide the agent's contract: what it is allowed to do. For our operations agent that is three tools: lookup_order, get_shipping_status, and draft_reply. The first two read; the third produces text a human will review. The array below defines the two read-only tools; draft_reply is taken up in Step 4. Writing the contract first keeps scope honest.

const tools = [
  {
    name: "lookup_order",
    description: "Fetch order details by order ID. Read-only.",
    input_schema: {
      type: "object",
      properties: { order_id: { type: "string" } },
      required: ["order_id"]
    }
  },
  {
    name: "get_shipping_status",
    description: "Return current carrier status for an order's shipment. Read-only.",
    input_schema: {
      type: "object",
      properties: { order_id: { type: "string" } },
      required: ["order_id"]
    }
  }
];

These objects are not documentation — they are the literal interface Claude reasons over. Notice how each description states the side-effect profile ("Read-only"). That single phrase nudges the model toward safe defaults and gives your gate something to key on later.

Step 2: Write the loop

The loop is the spine. You send the conversation plus the tool list to Claude. If the response asks to use a tool, you run the matching handler, append the result, and call again. You stop when Claude returns a normal text answer or you hit a turn cap. The diagram makes the control flow concrete.

flowchart TD
  A["Append user message"] --> B["Call Claude with tools"]
  B --> C{"stop_reason == tool_use?"}
  C -->|No| D["Return final text"]
  C -->|Yes| F{"State-changing tool?"}
  F -->|Yes| G["Require confirmation"]
  F -->|No| E["Run matching handler"]
  G --> E
  E --> H["Append tool result"]
  H --> I{"Turn cap reached?"}
  I -->|No| B
  I -->|Yes| D

In code, that flow is short. The key detail is stop_reason: when Claude wants a tool, stop_reason is "tool_use" and the response content contains one or more tool_use blocks. You must answer each of them with a tool_result block carrying the same tool_use_id, all in the next user message, or the conversation is malformed.

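import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const MAX_TURNS = 8; // example value; hard cap on model calls per request

async function runAgent(userInput) {
  const messages = [{ role: "user", content: userInput }];
  let last = null;
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const res = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      tools,
      messages
    });
    last = res;
    messages.push({ role: "assistant", content: res.content });
    // Anything other than tool_use means Claude has finished (or was cut off).
    if (res.stop_reason !== "tool_use") return res;
    const results = [];
    for (const block of res.content) {
      if (block.type !== "tool_use") continue;
      const handler = handlers[block.name];
      // Report an unknown tool back to the model instead of crashing the loop.
      const output = handler ? await handler(block.input) : "Unknown tool: " + block.name;
      results.push({ type: "tool_result", tool_use_id: block.id, content: typeof output === "string" ? output : JSON.stringify(output) });
    }
    // Every tool_result for this turn goes back in one user message.
    messages.push({ role: "user", content: results });
  }
  // Turn cap reached: hand back the last response rather than continuing.
  return last;
}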
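
The loop returns the whole response object, so the caller still has to pull the answer text out of it. A minimal sketch of that step, assuming the response content is an array of blocks in which text blocks look like { type: "text", text: "..." }. The name finalText is a hypothetical helper, not part of the SDK, and the sample response is made-up data.

```javascript
// Hypothetical helper: join the text blocks of a Messages API response.
function finalText(res) {
  return res.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// Usage with a hand-built response of the same shape (illustrative data only).
const example = {
  stop_reason: "end_turn",
  content: [{ type: "text", text: "Order A-100 shipped on Tuesday." }]
};
console.log(finalText(example));
```

Filtering by block type means the helper returns an empty string when a response holds only tool_use blocks, which is a cheap signal that the loop stopped on the turn cap.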

Step 3: Implement the handlers

Each tool name maps to a real function. Keep handlers thin: validate input, call your actual system, return a compact string or JSON the model can read. Do not return a 4,000-row dump; return the rows that matter. The handler is also where you enforce reality — a missing order should return a clear "not found," not throw, so the model can recover gracefully.

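// db and carrier stand in for your own data-access and carrier clients.
const handlers = {
  lookup_order: async ({ order_id }) => {
    const o = await db.orders.find(order_id);
    // A clear "not found" string lets the model recover instead of failing.
    if (!o) return "No order found for " + order_id;
    // Return only the fields the model needs, not the whole record.
    return JSON.stringify({ id: o.id, status: o.status, total: o.total });
  },
  get_shipping_status: async ({ order_id }) => {
    const s = await carrier.status(order_id);
    if (!s) return "No shipment found for order " + order_id;
    // Serialize so the tool_result content is always a string.
    return typeof s === "string" ? s : JSON.stringify(s);
  }
};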
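
The "validate input" step can be as small as checking the model's arguments against the schema you already wrote. The sketch below is one way to do that, assuming schemas shaped like the ones in Step 1. validateInput is a hypothetical helper that covers only required fields and primitive types; it is not a full JSON Schema validator.

```javascript
const PRIMITIVES = new Set(["string", "number", "boolean"]);

// Hypothetical helper: return a list of problems, empty when the input is acceptable.
function validateInput(schema, input) {
  const obj = input !== null && typeof input === "object" ? input : {};
  const errors = [];
  for (const key of schema.required ?? []) {
    if (!(key in obj)) errors.push("missing " + key);
  }
  for (const [key, spec] of Object.entries(schema.properties ?? {})) {
    // Only primitive types are checked; arrays and nested objects are skipped.
    if (key in obj && PRIMITIVES.has(spec.type) && typeof obj[key] !== spec.type) {
      errors.push(key + " must be a " + spec.type);
    }
  }
  return errors;
}

const orderSchema = {
  type: "object",
  properties: { order_id: { type: "string" } },
  required: ["order_id"]
};

console.log(validateInput(orderSchema, { order_id: "A-100" })); // []
console.log(validateInput(orderSchema, {}));                    // [ 'missing order_id' ]
console.log(validateInput(orderSchema, { order_id: 42 }));      // [ 'order_id must be a string' ]
```

A handler can return the joined error list as its result, so the model sees what was wrong and can retry with corrected arguments.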

Step 4: Add a system prompt and the draft tool

So far the agent has only read-only tools. The task asked for a drafted customer reply, which is text the model produces and a human approves — a soft action, not a hard write. This is where a clear system prompt does real work. The prompt sets the agent's role, its hard constraints, and the shape of its output, so the model knows it must look up facts before drafting and must never invent order details. A few well-chosen sentences here save dozens of corrective tool calls later.

const system = `You are an operations assistant.
Always look up the order before drafting a reply.
Never invent order numbers, totals, or ship dates.
If data is missing, say so and ask for the order ID.
When drafting, keep replies under 120 words and friendly.`;

Pass this as the system field on each call. Notice it does not list the tools or restate their schemas — those already live in the tools array, and duplicating them in the prompt only invites drift between the two. The system prompt is for judgment and tone; the tool definitions are for capability. Keeping that separation clean is one of the quiet habits that distinguishes an agent that ages well from one that rots.
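
The tools array in Step 1 stops at the two read tools, so the draft tool still needs a definition and a handler. One possible shape is sketched here as an assumption, not the author's design: draft_reply takes the order ID and the drafted body, and its handler sends nothing, only wrapping the draft for human review. The field names, the pending_review status, and the word-count check are all illustrative.

```javascript
// Assumed schema for the third tool in the contract; field names are illustrative.
const draftReplyTool = {
  name: "draft_reply",
  description:
    "Record a drafted customer reply for human review. Does not send anything.",
  input_schema: {
    type: "object",
    properties: {
      order_id: { type: "string" },
      body: { type: "string" }
    },
    required: ["order_id", "body"]
  }
};

// Hypothetical handler: returns the draft marked as pending, never sends it.
async function draftReply({ order_id, body }) {
  const words = body.trim().split(/\s+/).filter(Boolean).length;
  return JSON.stringify({
    order_id,
    status: "pending_review",
    word_count: words,
    over_limit: words >= 120 // the system prompt asks for replies under 120 words
  });
}
```

To wire it in, push draftReplyTool onto tools and register the handler as handlers.draft_reply. Returning the word count gives the model a concrete signal to shorten a draft that runs long.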

Step 5: Add guardrails before anyone else runs it

An agent that can only read is low-risk; the moment you add a write tool, you need a gate. The simplest effective pattern is a dry-run flag plus a confirmation step. In dry-run, state-changing handlers log what they would do and return that description instead of executing. For real runs, a human approves the proposed action before it executes. Combine this with a turn cap, so a confused agent cannot loop a thousand times, and a per-session token budget, so cost stays bounded.
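
The dry-run flag and confirmation step can live in a small wrapper around each handler. The sketch below assumes an explicit allow-list of write tools rather than parsing tool descriptions. The names WRITE_TOOLS, gate, and confirm, and the issue_refund tool, are hypothetical and exist only for illustration.

```javascript
// Illustrative names only; none of this comes from an SDK.
const WRITE_TOOLS = new Set(["issue_refund"]); // hypothetical state-changing tool

// Wrap a handler so write tools are described in dry-run and confirmed otherwise.
function gate(name, handler, { dryRun, confirm }) {
  return async (input) => {
    if (!WRITE_TOOLS.has(name)) return handler(input); // reads pass straight through
    const summary = name + " " + JSON.stringify(input);
    if (dryRun) return "DRY RUN: would call " + summary;
    const approved = await confirm(summary); // e.g. a CLI prompt or a review UI
    if (!approved) return "Action declined by reviewer: " + summary;
    return handler(input);
  };
}

// Usage with a stand-in handler.
const refund = async ({ order_id }) => "Refunded " + order_id;
const dry = gate("issue_refund", refund, { dryRun: true, confirm: async () => true });
dry({ order_id: "A-100" }).then(console.log);
// DRY RUN: would call issue_refund {"order_id":"A-100"}
```

Because a declined action comes back as an ordinary tool result, the model learns that the action did not happen and can say so instead of assuming success.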

This is also the moment to add structured logging. Record, per turn, the tools requested, their inputs, the outputs, and the token usage. When the agent does something surprising next week, this trace is the only thing that will let you explain why. A few lines of structured logging now is the difference between a five-minute root cause and a lost afternoon.
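
A per-turn trace and the token budget can share one small recorder. This is a sketch under assumptions: it relies on the response's usage.input_tokens and usage.output_tokens fields, writes one JSON line per turn, and uses a hypothetical createTrace helper whose field names are illustrative.

```javascript
// Hypothetical per-session recorder: one JSON line per turn plus a running token total.
function createTrace({ tokenBudget, write = console.log }) {
  let used = 0;
  return {
    // Returns true while the session is within budget, false once it is over.
    record(turn, res, toolResults) {
      used += res.usage.input_tokens + res.usage.output_tokens;
      write(JSON.stringify({
        turn,
        stop_reason: res.stop_reason,
        tools: res.content
          .filter((b) => b.type === "tool_use")
          .map((b) => ({ name: b.name, input: b.input })),
        outputs: toolResults.map((r) => r.content),
        tokens_used: used
      }));
      return used <= tokenBudget;
    }
  };
}
```

Inside the loop, calling record(turn, res, results) after the tool results are built and stopping when it returns false bounds cost with the same check that produces the log.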

Step 6: Promote tools to an MCP server

The in-process handlers above are perfect for a prototype. To make the same tools reusable across agents and teams, move them behind a Model Context Protocol (MCP) server. The server exposes the same schemas over a standard transport, so any MCP-capable Claude host (Claude Code, Claude Cowork, or your own app) can discover and call them without copying code. The loop does not change; you swap local function calls for MCP calls and gain reuse and isolation, plus a single place to enforce auth.

Stage      | Tool delivery        | Best for
Prototype  | In-process handlers  | One agent, fast iteration
Team       | Shared MCP server    | Reuse, central auth, audit
Org        | MCP + skills         | Procedures encoded, governed rollout
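
Moving to MCP mostly means re-expressing the same definitions in the protocol's shapes. The sketch below is not a server. It only shows how the Step 1 definitions and Step 3 handlers could map onto MCP's tools/list and tools/call results, where the schema field is named inputSchema and results are wrapped in a content array. A real deployment would use an MCP SDK for transport and session handling. The function names toMcpToolList and callMcpTool are hypothetical.

```javascript
// Convert Anthropic-style tool definitions to the MCP tools/list result shape.
function toMcpToolList(tools) {
  return {
    tools: tools.map((t) => ({
      name: t.name,
      description: t.description,
      inputSchema: t.input_schema // same JSON Schema, camelCase key in MCP
    }))
  };
}

// Run a handler and wrap its output in the MCP tools/call result shape.
async function callMcpTool(handlers, { name, arguments: args }) {
  const handler = handlers[name];
  // An unknown tool is a protocol-level error; a server would map this to a JSON-RPC error.
  if (!handler) throw new Error("Unknown tool: " + name);
  try {
    const text = await handler(args ?? {});
    return { content: [{ type: "text", text: String(text) }], isError: false };
  } catch (err) {
    // Failures inside the tool are reported in-band so the model can react.
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: "text", text: message }], isError: true };
  }
}
```

Keeping the handlers unchanged and adapting only the envelope is what lets the prototype and the shared server run the same code.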

Common pitfalls

  • Forgetting tool_result blocks. Every tool_use must be answered with a matching tool_result in the next message, or the API rejects the call.
  • Returning raw, oversized output. A handler that returns megabytes of JSON blows the context budget. Trim to what the model needs to decide.
  • No turn cap. Without MAX_TURNS, a single misjudgment can spin into an expensive loop. Cap it from day one.
  • Skipping the dry-run. Letting write tools execute during development invites real, irreversible mistakes against real data.
  • Choosing the biggest model reflexively. Sonnet handles most tool-loops well; reserve Opus for genuinely hard reasoning to control latency and spend.
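
For the oversized-output pitfall, a size cap applied to every handler result is a cheap backstop. The sketch assumes a character limit is a good enough proxy for tokens. compact is a hypothetical helper and the 2000-character default is an arbitrary example value.

```javascript
// Hypothetical helper: cap a handler's output before it goes into a tool_result.
function compact(output, maxChars = 2000) {
  const text = typeof output === "string" ? output : JSON.stringify(output) ?? "";
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  // The marker tells the model the result is incomplete rather than silently cut.
  return text.slice(0, maxChars) + "\n[truncated " + omitted + " characters]";
}

console.log(compact("abcdefghij", 4));
// abcd
// [truncated 6 characters]
```

Truncation can leave JSON unparseable, so selecting the relevant rows and fields inside the handler remains the better fix. The cap only protects the context budget when a handler misbehaves.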

Ship your first agent in 6 steps

  1. Pick one task and the two or three tools it needs.
  2. Write the tool schemas, marking read vs. write.
  3. Implement thin handlers that return compact results.
  4. Drop in the loop with a turn cap and tool dispatch.
  5. Add a dry-run + confirmation gate and per-turn logging.
  6. Move the tools behind an MCP server once a second agent needs them.

Frequently asked questions

Which Claude model should I start with?

Start with Sonnet 4.6 for most tool-using agents — it is fast and capable. Move specific hard-reasoning steps to Opus 4.8 and offload cheap, high-volume calls to Haiku 4.5. Pick per task, not per project.

How do I stop the agent from looping forever?

Set a hard MAX_TURNS in the loop and a per-session token budget. If either is hit, return what the agent has so far rather than continuing. A loop limit is the single most important safety control in a hand-built agent.

When should I switch from local handlers to MCP?

As soon as a second agent or team needs the same tools, or you want central authentication and audit. MCP gives you reuse and governance without rewriting your loop.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
