Build a Claude Agent: A Step-by-Step Walkthrough (Building Effective AI Agents)

Reading about agent architecture is one thing; getting a blinking cursor to turn into a working agent is another. This is the walkthrough I wish I had the first time — a linear path from an empty Python file to a Claude agent that plans, calls tools, recovers from failures, and stops when it should. No hand-waving, no "and then add error handling later." We build it in the order you would actually build it.

The running example is a small "research assistant" agent that can search a knowledge base and fetch a URL, then synthesize an answer. It is deliberately modest so the scaffolding is visible. Everything here calls the Claude Messages API directly through the anthropic Python SDK, with no agent framework in between.

Key takeaways

Start with the dumbest possible loop and a single tool — get one full perceive-act-observe cycle working before anything else.
Define tools as explicit JSON schemas; the description field does more for reliability than any prompt tweak.
Add a turn cap and token budget before you add a second tool, not after.
Normalize every tool result — success and failure — into the same structured shape Claude can reason over.
Trace every turn from day one; you cannot debug an agent you cannot replay.

Step 1 — the bare loop

Begin with the smallest thing that exercises the full cycle: send a prompt, let Claude ask for a tool, run it, feed the result back. Resist adding features. The goal of this step is to see the model request a tool and your code execute it.

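from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One tool, with a description precise enough that Claude knows when to use it.
TOOLS = [{
  "name": "kb_search",
  "description": "Search the internal knowledge base. Use for any factual question about our product or docs. Returns up to 5 snippets with source IDs.",
  "input_schema": {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"]
  }
}]

def lookup(query):
    # Placeholder backend: swap in your real search index here.
    return [{"source_id": "doc-1", "text": f"no real index yet; query was {query!r}"}]

def kb_search(query):
    # Hard-coded executor: cap at 5 snippets to match the description.
    return {"snippets": lookup(query)[:5]}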

That description field is not decoration. "Use for any factual question about our product or docs" tells Claude exactly when to reach for this tool instead of answering from its own knowledge. Vague descriptions are among the most common causes of agents that either never call tools or call them constantly.
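Before writing any loop, it helps to watch one complete cycle by hand. The sketch below sends the goal, runs whatever tools Claude requested, and returns their results in a second request. It assumes only the messages.create call shape used throughout this walkthrough. one_round_trip and ScriptedClient are hypothetical names, and ScriptedClient is not part of the anthropic SDK: it replays canned responses so the cycle can be exercised offline. To talk to Claude instead, pass the real client from above and a model ID your account can use.

```python
import json
from types import SimpleNamespace


def one_round_trip(client, goal, tools, execute, model):
    """One perceive-act-observe cycle: ask, run the requested tools, report back."""
    messages = [{"role": "user", "content": goal}]
    first = client.messages.create(
        model=model, max_tokens=1024, tools=tools, messages=messages)
    messages.append({"role": "assistant", "content": first.content})
    if first.stop_reason != "tool_use":
        return first  # Claude answered without needing a tool
    results = [
        {"type": "tool_result", "tool_use_id": b.id,
         "content": json.dumps(execute(b.name, b.input))}
        for b in first.content if b.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
    # Second call: Claude reads the tool output and (ideally) answers.
    return client.messages.create(
        model=model, max_tokens=1024, tools=tools, messages=messages)


class ScriptedClient:
    """Hypothetical offline stand-in that replays canned responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = []       # kwargs of every create() call, for inspection
        self.messages = self  # lets callers write client.messages.create(...)

    def create(self, **kwargs):
        # Snapshot the list, because the caller keeps appending to it.
        self.calls.append({**kwargs, "messages": list(kwargs["messages"])})
        return self._responses.pop(0)


# Usage: one scripted tool request, then one scripted final answer.
scripted = ScriptedClient([
    SimpleNamespace(stop_reason="tool_use", content=[SimpleNamespace(
        type="tool_use", id="tu_1", name="kb_search", input={"query": "refunds"})]),
    SimpleNamespace(stop_reason="end_turn", content=[SimpleNamespace(
        type="text", text="stub answer")]),
])
final = one_round_trip(
    scripted, "How do refunds work?", tools=[],
    execute=lambda name, args: {"snippets": [f"{name}:{args['query']}"]},
    model="your-model-id")
```

Once this single cycle behaves, the loop in the next step is the same code repeated until Claude stops asking for tools.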

Step 2 — drive the loop

Now wrap the call in a loop that handles the tool_use stop reason. This is the engine. Every later feature hangs off this skeleton, so get the control flow exactly right before decorating it.

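import json

SYSTEM = "You are a research assistant."  # placeholder; Step 5 writes the real prompt

def run(goal, max_turns=8):
    messages = [{"role": "user", "content": goal}]
    resp = None
    for turn in range(max_turns):
        resp = client.messages.create(
            model="claude-opus-4-8",  # use any current model ID your account can access
            system=SYSTEM,
            messages=messages, tools=TOOLS, max_tokens=2048)
        # Record the assistant turn, including its tool_use blocks, every time.
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return resp  # Claude is done
        results = []
        for b in resp.content:
            if b.type == "tool_use":
                out = dispatch(b.name, b.input)  # dispatch is defined in Step 3
                results.append({"type": "tool_result",
                    "tool_use_id": b.id, "content": json.dumps(out)})
        # Every tool_result for this turn goes back in one user message.
        messages.append({"role": "user", "content": results})
    return resp  # hit the turn cap (None if max_turns is 0)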

The max_turns guard is in place from the first real version. This is intentional. An agent without a turn cap is a billing incident waiting to happen, and you will forget to add it later.

How a single request flows

Before adding more tools, picture what one user goal does as it moves through the code you just wrote. The diagram below maps the exact control flow of the run function, including the two ways the loop can terminate.

flowchart TD
  A["Goal in"] --> B["messages.create with tools"]
  B --> C{"stop_reason == tool_use?"}
  C -->|No| D["Return final answer"]
  C -->|Yes| E["dispatch each tool_use block"]
  E --> F["Append tool_result to messages"]
  F --> G{"turn < max_turns?"}
  G -->|Yes| B
  G -->|No| D

Two terminal edges both land on "Return final answer": one when Claude is genuinely done, one when the turn cap fires. Make sure your code returns something sensible in the cap case — a partial answer plus a flag is far better than an exception that loses the work.
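One way to follow that advice is to return a small result object instead of the raw response. The sketch below is layered on top of run and is not part of the SDK: AgentResult and summarize are hypothetical names. It collects whatever text Claude produced on the last turn and sets a hit_turn_cap flag so callers can tell a finished answer from a truncated one.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class AgentResult:
    text: str           # whatever prose Claude produced on its last turn
    hit_turn_cap: bool  # True when the loop stopped because of max_turns
    turns_used: int


def summarize(resp, turns_used, max_turns):
    """Turn the last API response into a result the caller can act on."""
    text = "".join(b.text for b in resp.content if b.type == "text")
    # If Claude was still asking for tools when the turns ran out, the cap fired.
    hit_cap = resp.stop_reason == "tool_use" and turns_used >= max_turns
    if hit_cap and not text:
        text = "(stopped at the turn cap before producing an answer)"
    return AgentResult(text=text, hit_turn_cap=hit_cap, turns_used=turns_used)


# Usage: the last response before the cap still contained a tool request.
last = SimpleNamespace(stop_reason="tool_use", content=[
    SimpleNamespace(type="text", text="Searching one more source."),
    SimpleNamespace(type="tool_use", id="tu_9", name="kb_search", input={"query": "x"}),
])
result = summarize(last, turns_used=8, max_turns=8)
```

Wired into run, the early exit would return summarize(resp, turn + 1, max_turns) (turn counts from zero) and the exit after the loop would return summarize(resp, max_turns, max_turns), so both edges in the diagram hand back the same shape.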

Step 3 — robust dispatch

The dispatch function is where reliability lives. It must turn every outcome, including exceptions and timeouts, into a structured result the model can read and act on. A tool that throws and crashes the loop is useless; a tool that returns a clean error lets Claude apologize, retry, or try a different approach.

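# Map each tool name declared in TOOLS to the Python function that runs it.
REGISTRY = {"kb_search": kb_search}

def dispatch(name, args):
    fn = REGISTRY.get(name)
    if fn is None:
        # Look the tool up outside the try block, so a KeyError raised
        # inside a tool is not misreported as an unknown tool.
        return {"ok": False, "error": f"unknown tool {name}"}
    try:
        return {"ok": True, "data": fn(**args)}
    except Exception as e:
        # Never let a tool crash the loop; report the failure as data.
        return {"ok": False, "error": str(e), "retryable": True}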

Returning retryable: true is a small signal that pays off enormously. Claude reads it and decides whether to try again or change course, instead of you encoding that policy in brittle runtime code. The same idea extends to richer signals: a requires_human flag tells the agent to escalate rather than thrash, and an empty: true on a zero-result search tells it the query was fine but found nothing — a distinction the model genuinely uses when deciding whether to broaden its search or give up gracefully.
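Here is one way those richer signals might be produced inside a tool (not inside dispatch) and passed back to Claude. It is a sketch under assumptions: search_with_signals and to_tool_result are hypothetical helpers, and the needs_approval argument is an illustrative stand-in for whatever your real escalation rule is. The is_error field on a tool_result block is part of the Messages API and marks the call as failed; the ok, empty, requires_human, and retryable keys are conventions of this walkthrough, not of the API.

```python
import json


def search_with_signals(snippets, needs_approval=False):
    """Wrap raw search output in the structured shape Claude reasons over."""
    if needs_approval:
        # Escalate instead of letting the agent thrash on something it cannot fix.
        return {"ok": False, "error": "access requires approval",
                "requires_human": True, "retryable": False}
    # empty=True means the query ran fine but matched nothing.
    return {"ok": True, "data": {"snippets": snippets}, "empty": not snippets}


def to_tool_result(tool_use_id, out):
    """Build a tool_result block, flagging failures with is_error."""
    block = {"type": "tool_result", "tool_use_id": tool_use_id,
             "content": json.dumps(out)}
    if not out.get("ok", False):
        block["is_error"] = True
    return block
```

Keeping is_error and the JSON body consistent means Claude gets the same answer whether it reads the block flag or the ok key.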

A subtle point about dispatch: keep it pure plumbing. The temptation is to sneak business logic in here — "if the search returned nothing, automatically broaden it." Don't. That decision belongs to the model. The dispatcher's only job is to run the tool and report what happened in a shape the model can read. The moment you start making decisions in dispatch, you have quietly turned your agent back into a hard-coded workflow.

Step 4 — budgets and tracing

Before you register a second tool, add a token budget that accumulates resp.usage each turn and halts the run when the total crosses a ceiling. Then add a trace: log the full messages array and usage after every turn to a file or table keyed by a run ID. When an agent does something baffling, you replay the trace turn by turn, and the cause is usually obvious within minutes.
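A minimal version of both might look like this. TokenBudget, BudgetExceeded, and Tracer are hypothetical names; the only SDK detail relied on is that each response carries usage.input_tokens and usage.output_tokens. Writing the trace as JSON Lines, one file per run ID, is an assumption about storage rather than a requirement.

```python
import json
import uuid
from pathlib import Path


class BudgetExceeded(RuntimeError):
    pass


class TokenBudget:
    """Accumulates resp.usage across turns and halts once past a ceiling."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, usage):
        self.used += usage.input_tokens + usage.output_tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"used {self.used} of {self.max_tokens} tokens")


def _to_jsonable(obj):
    # SDK content blocks are pydantic models; fall back to str for anything else.
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    return str(obj)


class Tracer:
    """Appends one JSON line per turn, keyed by a run ID, for later replay."""

    def __init__(self, directory="traces", run_id=None):
        self.run_id = run_id or uuid.uuid4().hex
        self.path = Path(directory) / f"{self.run_id}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, turn, messages, usage):
        entry = {"run_id": self.run_id, "turn": turn,
                 "usage": {"input_tokens": usage.input_tokens,
                           "output_tokens": usage.output_tokens},
                 "messages": messages}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, default=_to_jsonable) + "\n")
```

Inside run, budget.charge(resp.usage) and tracer.record(turn, messages, resp.usage) would go right after the assistant message is appended. Because every turn resends the whole conversation, input_tokens grows from turn to turn, so summing per-request usage tracks the cumulative tokens processed rather than just the newest message.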

Step 5 — add a system prompt with a stop rule

The last piece is the system prompt, and the part people skip is the stop rule. Tell the agent, in plain language, what "done" looks like for this task and what to do when it is stuck. Something as simple as "When you have an answer supported by at least one source, write it and stop. If two searches return nothing useful, say so and stop" prevents a very common failure mode: an agent that keeps searching variations of the same query forever. Pair this prompt-level stop with the runtime turn cap from Step 2 and you have two independent guarantees that the loop terminates, which is exactly what you want for anything running unattended.
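The stop rule can live directly in the SYSTEM constant that run passes to messages.create. The wording below is only an illustration built from the rule quoted above, plus an assumed instruction to cite source IDs; tune it to your own task.

```python
# Illustrative system prompt: when to use the tool, what "done" means,
# and what to do when stuck.
SYSTEM = (
    "You are a research assistant. Use the kb_search tool for factual "
    "questions about our product or docs.\n"
    "When you have an answer supported by at least one source, write it, "
    "cite the source IDs you used, and stop.\n"
    "If two searches return nothing useful, say so and stop. Do not keep "
    "searching variations of the same query."
)
```

Keeping the stop condition concrete (one supporting source, two empty searches) gives Claude something checkable, instead of an open-ended "be thorough".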

Common pitfalls

Adding tools before the loop is solid. Each new tool multiplies the ways the loop can misbehave. Lock the engine first.
Forgetting to append the assistant message. If you execute the tool but never add the assistant's tool_use block to messages, the next API call is malformed. Append both sides every turn.
Stringifying errors as plain text. Return JSON with an ok flag so the model can branch on it reliably.
No run ID. Without a correlation ID you cannot tie logs, traces, and costs together when investigating.
Overlong tool descriptions. One or two precise sentences beat a paragraph; Claude reads the whole catalog every turn, so verbosity costs tokens and clarity.
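The second pitfall is easy to catch mechanically before each API call. check_history is a hypothetical helper that sketches the pairing rule the Messages API enforces: every tool_result in a user message must answer a tool_use block in the assistant message immediately before it, and every tool_use must be answered in the next message. It accepts content blocks either as plain dicts or as SDK objects with attributes.

```python
def _field(block, name):
    # Content blocks may be plain dicts or SDK objects with attributes.
    return block.get(name) if isinstance(block, dict) else getattr(block, name, None)


def _blocks(message, block_type):
    content = message["content"]
    if isinstance(content, str):
        return []
    return [b for b in content if _field(b, "type") == block_type]


def check_history(messages):
    """Return a list of tool_use/tool_result pairing problems (empty if OK)."""
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] == "user":
            prev = messages[i - 1] if i > 0 else None
            asked = ({_field(b, "id") for b in _blocks(prev, "tool_use")}
                     if prev is not None and prev["role"] == "assistant" else set())
            for r in _blocks(msg, "tool_result"):
                if _field(r, "tool_use_id") not in asked:
                    problems.append(f"message {i}: tool_result "
                                    f"{_field(r, 'tool_use_id')} has no matching tool_use")
        elif msg["role"] == "assistant":
            asked = {_field(b, "id") for b in _blocks(msg, "tool_use")}
            nxt = messages[i + 1] if i + 1 < len(messages) else None
            if asked and nxt is not None:
                answered = {_field(b, "tool_use_id") for b in _blocks(nxt, "tool_result")}
                for missing in sorted(asked - answered):
                    problems.append(f"message {i}: tool_use {missing} has no tool_result")
    return problems
```

A trailing assistant message with unanswered tool_use blocks is not flagged, because that is simply the state right before your code runs the tools.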

Ship it in five steps

Write one tool with a sharp description and a hard-coded executor.
Build the loop with a max_turns cap and correct message appending.
Wrap dispatch so every result is structured JSON, including errors.
Add a token budget and a per-turn trace keyed by run ID.
Register remaining tools one at a time, testing each in isolation before combining.
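Registering tools one at a time is easier when each tool's schema and executor are added together, so TOOLS and REGISTRY cannot drift apart. In this sketch both collections start empty and are filled only through register. register and smoke_test are hypothetical helpers, and fetch_url is a stub standing in for the URL-fetching tool the research assistant would eventually get. The isolation test calls the executor directly, outside the agent loop, after checking that the sample arguments cover the schema's required keys.

```python
import json

TOOLS = []     # schemas sent to Claude
REGISTRY = {}  # name -> Python function that executes the tool


def register(name, description, properties, required, fn):
    """Add one tool's schema and executor together so they stay in sync."""
    if name in REGISTRY:
        raise ValueError(f"tool {name} is already registered")
    TOOLS.append({
        "name": name,
        "description": description,
        "input_schema": {"type": "object", "properties": properties,
                         "required": required},
    })
    REGISTRY[name] = fn


def smoke_test(name, sample_args):
    """Run one tool in isolation and check its result can be sent to Claude."""
    schema = next(t["input_schema"] for t in TOOLS if t["name"] == name)
    missing = [k for k in schema["required"] if k not in sample_args]
    if missing:
        raise ValueError(f"sample args for {name} lack required keys {missing}")
    out = REGISTRY[name](**sample_args)
    json.dumps(out)  # raises TypeError if the result is not JSON-serializable
    return out


# Usage: register the second tool, then test it alone before running the agent.
register(
    "fetch_url",
    "Fetch a web page the user explicitly mentions. Returns the page text.",
    {"url": {"type": "string"}},
    ["url"],
    lambda url: {"url": url, "text": "stub page text"},  # hypothetical stub
)
```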

Single tool vs. full agent

Capability	One-shot call	Agent loop
Multi-step tasks	No	Yes
Recovers from tool errors	No	Yes
Cost predictability	High	Needs a budget
Best for	Extraction, classification	Research, automation

Frequently asked questions

How many tools should my first agent have?

One. Get a single tool working through the full loop, then add the second only once the first is reliable. Many early agent bugs come from too many tools with overlapping descriptions, which forces Claude to guess.

What should max_turns be?

Start low — six to eight — and raise it only if you observe legitimate tasks getting cut off. A low cap surfaces looping bugs early instead of letting them hide behind a generous ceiling.

Should I stream responses during development?

Not at first. Streaming adds parsing complexity that obscures the loop logic. Build and debug with non-streaming calls, then add streaming for the user-facing layer once the agent behaves correctly.

Agentic AI for every conversation

The same loop you just built — decide, call a tool, observe, repeat — is exactly what powers CallSphere's voice and chat agents, which handle live calls, use tools mid-conversation, and book work around the clock. Try it at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
