Build a Claude Agent: Step-by-Step Tool-Use Walkthrough
A step-by-step guide to building a working Claude agent with tool use, the agentic loop, and structured outputs you can ship today. Code included.
Reading about agent architecture is one thing; getting a cursor blinking in a real loop is another. This is a hands-on walkthrough for building a working Claude agent from an empty file. By the end you'll have an agent that takes a natural-language request, decides which of your tools to call, executes them, and returns a grounded answer — the same skeleton that powers coding assistants, support bots, and internal automation tools. I'll use Python, but the shape translates directly to the TypeScript, Go, and Ruby SDKs.
Step 1: The smallest thing that works
Start with a single message and no tools, just to confirm your credentials and model are wired up. The client resolves your key from the environment, so don't hardcode it.
import anthropic

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(next(b.text for b in resp.content if b.type == "text"))

Notice the response is a list of content blocks, not a string. You filter for the text block. This matters because once tools enter the picture, the same response can carry thinking blocks and tool_use blocks side by side, and you'll need to handle each by type.
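To make the by-type handling concrete, here is a minimal sketch that runs without calling the API. The `Block` dataclass is a hypothetical stand-in for the SDK's content block objects, carrying only the fields this walkthrough relies on (`type`, `text`, `name`, `input`), and `summarize` is an illustrative helper, not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Block:
    """Hypothetical stand-in for an SDK content block (not the real class)."""
    type: str
    text: str = ""
    name: str = ""
    input: dict[str, Any] = field(default_factory=dict)


def summarize(content: list[Block]) -> dict[str, list]:
    """Sort a response's blocks into text and tool calls, skipping the rest."""
    out: dict[str, list] = {"text": [], "tool_calls": []}
    for block in content:
        if block.type == "text":
            out["text"].append(block.text)
        elif block.type == "tool_use":
            out["tool_calls"].append((block.name, block.input))
        # Other block types (for example thinking) are ignored in this sketch.
    return out


sample = [
    Block(type="thinking"),
    Block(type="text", text="Let me check that order."),
    Block(type="tool_use", name="get_order_status",
          input={"order_id": "ORD-4821"}),
]
result = summarize(sample)
print(result)
```

Branching on `block.type` rather than indexing `content[0]` keeps the code from breaking when a response starts with a non-text block.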
Step 2: Define a real tool
An agent is only as useful as the actions it can take. Define one tool that does something concrete — say, looking up an order's status. With the SDK's tool runner you write a typed function and decorate it; the schema is generated from the signature and docstring.
from anthropic import beta_tool

@beta_tool
def get_order_status(order_id: str) -> str:
    """Look up the current status of a customer order.

    Args:
        order_id: The order identifier, e.g. ORD-4821.
    """
    record = db.fetch(order_id)  # your data layer
    if record is None:
        return f"No order found with id {order_id}."
    return f"Order {order_id}: {record.status}, ships {record.eta}."

The docstring is not decoration: it becomes the tool description the model reads to decide when to call it. Be prescriptive: say when to use the tool, name the argument format, and describe the return shape. Vague descriptions are the single most common reason an agent ignores a tool it should have used.
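For comparison, here is roughly what a prescriptive description could look like as a hand-written tool definition, using the `name` / `description` / `input_schema` shape the Messages API accepts in its `tools` parameter. The description wording, the refunds exclusion, and the `ORD-<digits>` format are illustrative assumptions, and `is_valid_order_id` is a hypothetical helper.

```python
import json
import re

# Hand-written counterpart of the decorated function's schema (illustrative).
get_order_status_tool = {
    "name": "get_order_status",
    "description": (
        "Look up the current status of a customer order. "
        "Use this whenever the user asks where an order is or when it ships. "
        "Do not use it for returns or refunds. "
        "Returns one sentence with the status and estimated ship date, "
        "or a 'No order found' message if the id is unknown."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order identifier in the form ORD-<digits>, "
                               "e.g. ORD-4821.",
            }
        },
        "required": ["order_id"],
    },
}

# Assumed id format for this example; check it before touching the data layer.
ORDER_ID_PATTERN = re.compile(r"ORD-\d+")


def is_valid_order_id(order_id: str) -> bool:
    """Return True only when the whole string matches the assumed format."""
    return ORDER_ID_PATTERN.fullmatch(order_id) is not None


print(json.dumps(get_order_status_tool, indent=2))
```

The description covers the three things the model needs: when to call the tool, what the argument looks like, and what comes back.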
Step 3: Let the runner drive the loop
Here's where the walkthrough diverges from a plain API call. The tool runner handles the whole agentic loop: it calls the API, sees the tool_use request, runs your function, feeds the result back, and repeats until Claude is done.
runner = client.beta.messages.tool_runner(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[get_order_status],
    messages=[{"role": "user",
               "content": "Where is order ORD-4821 and when does it ship?"}],
)
for message in runner:
    for block in message.content:
        if block.type == "text":
            print(block.text)

Each iteration yields a complete BetaMessage; the loop stops automatically when there are no more tool calls. That's a full agent in roughly fifteen lines. The flow underneath is worth visualizing before we make it production-grade.
flowchart TD
    A["User: where is ORD-4821?"] --> B["tool_runner calls Messages API"]
    B --> C{"Model output"}
    C -->|text only| F["Print answer, stop"]
    C -->|tool_use: get_order_status| D["Runner executes function"]
    D --> E["db.fetch returns record"]
    E --> G["Append tool_result"]
    G --> B

Step 4: Take control with a manual loop
The runner is great until you need a gate — human approval before a destructive call, custom logging, or conditional execution. Then you write the loop yourself. The pattern is mechanical: call, check stop_reason, execute tool_use blocks, append results, repeat.
messages = [{"role": "user", "content": user_input}]
while True:
    resp = client.messages.create(
        model="claude-opus-4-8", max_tokens=4096,
        tools=tools, messages=messages)
    # Exit unless the model asked for a tool. Checking only for "end_turn"
    # would let a "max_tokens" stop fall through and send an empty result list.
    if resp.stop_reason != "tool_use":
        break
    calls = [b for b in resp.content if b.type == "tool_use"]
    messages.append({"role": "assistant", "content": resp.content})
    results = []
    for call in calls:
        out = execute(call.name, call.input)  # your dispatch
        results.append({"type": "tool_result",
                        "tool_use_id": call.id, "content": out})
    messages.append({"role": "user", "content": results})

Two details people miss: you must append the full resp.content (including the tool_use blocks) before sending results, and every tool_result must carry the matching tool_use_id. Get either wrong and the API rejects the turn or the model loses track of what it asked for.
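The `execute` dispatch in the manual loop is where an approval gate naturally lives. One way it might look is sketched below, under stated assumptions: `cancel_order`, `NEEDS_APPROVAL`, and the `approve` hook are hypothetical names invented for this example, and the tool bodies return canned strings instead of querying a real data layer.

```python
from typing import Any, Callable


def get_order_status(order_id: str) -> str:
    # Stand-in result; a real version would query your data layer.
    return f"Order {order_id}: shipped."


def cancel_order(order_id: str) -> str:
    # Hypothetical destructive tool, included only to show the gate.
    return f"Order {order_id} cancelled."


TOOLS: dict[str, Callable[..., str]] = {
    "get_order_status": get_order_status,
    "cancel_order": cancel_order,
}
NEEDS_APPROVAL = {"cancel_order"}  # tools that require a human decision


def deny_all(name: str, tool_input: dict[str, Any]) -> bool:
    """Default approval hook: refuse every gated call."""
    return False


def execute(name: str, tool_input: dict[str, Any],
            approve: Callable[[str, dict[str, Any]], bool] = deny_all) -> str:
    """Route a tool call by name, asking `approve` before gated tools."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    if name in NEEDS_APPROVAL and not approve(name, tool_input):
        return f"Call to {name} was not approved by a human reviewer."
    return TOOLS[name](**tool_input)


print(execute("get_order_status", {"order_id": "ORD-4821"}))
print(execute("cancel_order", {"order_id": "ORD-4821"}))
```

Refusals come back as ordinary strings rather than exceptions, so the model can read them in the tool_result and explain the situation to the user.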
Step 5: Make the output structured
For an agent feeding a downstream system, free text is a liability. Constrain the final answer to a schema with messages.parse() and a Pydantic model, so what you hand off is guaranteed to validate.
from pydantic import BaseModel

class Answer(BaseModel):
    order_id: str
    status: str
    ships_on: str

resp = client.messages.parse(
    model="claude-opus-4-8", max_tokens=1024,
    messages=messages, output_format=Answer)
print(resp.parsed_output.status)

Step 6: Handle the unhappy paths
Before you ship, wrap the loop in typed exception handling — RateLimitError, APIStatusError for 5xx, APIConnectionError — and return tool failures as tool_result blocks with is_error: true rather than throwing. That last point is what lets the agent recover: when a tool returns an informative error, the model reads it and tries a different approach instead of the loop dying. Also handle pause_turn if you use any server-side tools, and set a max-iterations cap so a confused agent can't loop forever.
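Returning failures as data can be sketched as a small wrapper that turns any exception into a tool_result block with `is_error` set. This is a minimal illustration, not the SDK's own mechanism: `to_tool_result` and `flaky_lookup` are hypothetical names, and the `toolu_01` / `toolu_02` ids are made up for the example.

```python
from typing import Any, Callable


def to_tool_result(tool_use_id: str, fn: Callable[..., str],
                   tool_input: dict[str, Any]) -> dict[str, Any]:
    """Run a tool and wrap its outcome, success or failure, as a tool_result."""
    try:
        content = fn(**tool_input)
    except Exception as exc:
        # Give the model something it can act on: error type plus message.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": content}


def flaky_lookup(order_id: str) -> str:
    # Simulated failure so the error path can be seen without a real backend.
    raise TimeoutError("order database did not respond within 2s")


ok = to_tool_result("toolu_01",
                    lambda order_id: f"Order {order_id}: shipped.",
                    {"order_id": "ORD-4821"})
failed = to_tool_result("toolu_02", flaky_lookup, {"order_id": "ORD-4821"})
print(ok)
print(failed)
```

Catching a broad `Exception` is deliberate here because any tool failure should reach the model as text; narrow it if some errors should abort the run instead.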
Frequently asked questions
Should I use the tool runner or write the loop myself?
Start with the tool runner — it eliminates an entire class of bugs around appending content and matching tool_use_ids. Drop to a manual loop only when you need human-in-the-loop approval, custom logging, or conditional execution that the runner can't express.
How many tools should one agent have?
Keep the set focused. Too many tool schemas in context can confuse selection and inflate token use. If you genuinely have dozens, look at the tool-search pattern, which lets the model discover relevant tools on demand instead of loading every schema upfront.
How do I stop the loop from running forever?
Cap iterations explicitly in the manual loop, and rely on stop_reason == "end_turn" as the natural exit. For server-side tools watch for pause_turn and bound the number of continuations you allow.
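A bounded loop can be sketched without the API by treating each model turn as a function that returns a stop reason. Everything here is an assumption for illustration: `run_bounded`, `LoopBudgetExceeded`, and the budget of 10 are invented names and values, and `scripted` fakes the model's stop reasons.

```python
from typing import Callable

MAX_ITERATIONS = 10  # assumed budget; tune it to your longest legitimate task


class LoopBudgetExceeded(RuntimeError):
    """Raised when the agent has not finished within the iteration cap."""


def run_bounded(step: Callable[[int], str],
                max_iterations: int = MAX_ITERATIONS) -> int:
    """Call step(i) until it returns "end_turn"; return how many turns ran."""
    for i in range(max_iterations):
        if step(i) == "end_turn":
            return i + 1
    raise LoopBudgetExceeded(
        f"no end_turn after {max_iterations} iterations")


# Fake model: requests a tool twice, then finishes.
scripted = ["tool_use", "tool_use", "end_turn"]
turns = run_bounded(lambda i: scripted[i])
print(turns)  # 3
```

Raising a dedicated exception keeps the failure visible, so the caller can log the transcript or fall back to a human instead of returning a half-finished answer.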
What max_tokens should I set?
Default to a generous value (around 4K–16K for non-streaming) so answers aren't truncated mid-thought. For long outputs, stream and raise the ceiling. Only go small for classification-style tasks where you want a one-word reply.
Bringing agentic AI to your phone lines
The same loop you just built — define tools, let the model call them, return structured results — is exactly what CallSphere runs for voice and chat, so an assistant can check an order, book an appointment, or escalate a call without a human in the seat. Hear it answer a live call at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.