Build a Claude Code agent: a step-by-step walkthrough (Built With Opus Hackathon)
An hour-by-hour walkthrough of building a working Claude Code agent at a Built-with-Opus hackathon: scaffold, loop, tools, skills, and ship.
Architecture diagrams are nice, but at a hackathon you need to actually build the thing. This post is the implementation walkthrough I wish we had on hour one of a Built-with-Opus event: a linear path from empty directory to a working Claude Code agent that does real work, with the decisions called out where they matter. Follow it in order and you will have something demoable in an afternoon.
We will build a small but honest agent: one that takes a vague request like "summarize what changed in this repo this week and draft release notes," reasons about it, calls a couple of tools, and produces a real artifact. The point is the method, not the demo — swap the domain and the steps are identical.
Step 1 — Scaffold the project and pin the goal
Start by writing down the agent's job in one sentence and pinning it where the model will always see it. Create a project directory, a config file holding the model id (Opus 4.8 for the planning-heavy work) and the turn and budget caps, and a single SYSTEM.md that states the agent's role, its hard constraints, and its definition of done. Resist the urge to write three pages here. A tight, specific system prompt outperforms a sprawling one, because every extra paragraph competes for the model's attention on every turn.
Concretely, your SYSTEM.md should answer four questions: who the agent is, what it is allowed to do, what it must never do, and what "finished" looks like. That fourth question is the one beginners skip, and it is why their agents ramble. Give the model an explicit stopping target and it will aim for it.
Step 2 — Write the driver loop before any tools
Build the loop with zero tools first, so you can confirm the message round-trip works. The loop sends the system prompt plus the running transcript to the model, receives a response, and — for now — just prints it. Once that round-trips cleanly, add the tool-call branch: if the response contains tool calls, execute them, append the results to the transcript, and continue; otherwise, treat it as the final answer and stop. Wrap the whole thing in a turn counter that hard-stops at, say, twenty iterations.
flowchart TD
A["Scaffold + SYSTEM.md"] --> B["Driver loop, no tools"]
B --> C["Add first tool with schema"]
C --> D["Test: does model call it?"]
D -->|No| E["Tighten tool description"]
E --> D
D -->|Yes| F["Add a skill for the task"]
F --> G["Add stop + budget caps"]
G --> H["Demo end to end"]
Doing the loop first feels slow but pays off immediately: when a tool later misbehaves, you already trust the surrounding machinery, so you know the bug is in the tool. Teams that wrote tools and loop simultaneously spent the night unable to tell which half was broken.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Add your first tool with a real schema
Pick the single most valuable tool and define it precisely. A tool is a name, a JSON schema for its arguments, and a handler. For our example, the first tool is list_commits with arguments { since: string, until: string } returning an array of commit objects. Two things make or break this step. First, the description field is prompt engineering, not documentation — write it for the model: "List git commits in a date range. Use this before drafting release notes to learn what changed." Second, validate the arguments in the handler and return a structured error the model can read, never an uncaught exception that crashes the loop.
Run the agent and watch whether it actually calls the tool unprompted. If it does not, the description is too vague or the system prompt did not establish the need. Tighten the description, re-run, repeat. This tight feedback loop is the core skill of agent building.
Step 4 — Layer in a skill for the task
Now teach the agent how to do the domain task well. A skill is a folder of instructions and optional helper scripts that the agent loads only when the task is relevant. For release notes, the skill contains a markdown file describing your house style — group changes by type, lead with user-facing impact, link issue numbers — plus maybe a small script that formats the final output. The skill loader injects this only when the request matches, so your base context stays small.
The difference is dramatic. Without the skill, the agent produces a flat bullet list. With it, the agent groups, prioritizes, and matches your team's voice — because the know-how arrived exactly when needed instead of cluttering the system prompt permanently.
Step 5 — Add the safety rails: stop conditions and budgets
Before you demo, make the agent impossible to run away. Add three guards to the loop: a turn cap (already in place), a wall-clock budget that aborts after N seconds, and a repeat-call detector that stops if the model issues the same tool call with identical arguments twice running. Each guard catches a different failure mode — runaway planning, slow external calls, and stubborn retry loops respectively. At a hackathon, a runaway agent during the live judging is the most common self-inflicted disaster, and these three lines of code prevent it.
Also add a dry-run flag for any tool with side effects. During development and judging, you want to see the agent's intended writes before they happen. Flipping the flag to live should be a one-line change, not a refactor.
Step 6 — Wire observability and do a full dress rehearsal
Finally, log every turn: iteration number, tool name, argument hash, latency, and token count. Then run the whole task end to end at least twice on different inputs. The second run is where you catch the prompt that only works on your one happy-path example. If the agent succeeds on input A but stalls on input B, the trace tells you exactly which turn diverged. Fix it, run a third time, and you are demo-ready.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
The teams that medaled were not the ones with the cleverest idea — they were the ones whose agents survived an unfamiliar input in front of the judges. That survival comes entirely from steps 5 and 6, which is why I refuse to skip them even when the clock is screaming.
Frequently asked questions
How long does this take for a first agent?
An experienced engineer can get through all six steps in three to five hours, with most of the time spent on step 3 tuning the tool description until the model calls it reliably. The loop and scaffold are quick; the model-facing prose is where the iteration lives.
Should I use the Claude Agent SDK or build the loop by hand?
For a hackathon, building the loop by hand for the first agent teaches you the mechanics and is only about eighty lines. Once you understand the loop, the Agent SDK saves real time on production concerns like retries, streaming, and tool plumbing. Learn the manual version first, then graduate.
What's the most common mistake in this walkthrough?
Writing tools before the loop works, and writing a giant system prompt. Both stem from skipping the discipline of testing one layer at a time. Add capability only after the layer beneath it is proven.
How do I know my tool description is good enough?
The model calls the tool, unprompted, in the situations you intended, and does not call it when it shouldn't. If it over-calls, the description is too broad; if it under-calls, the description doesn't connect to a need stated in the system prompt.
Bringing agentic AI to your phone lines
The same build sequence powers CallSphere's voice and chat agents — loop, tools, skills, and guardrails — answering calls and booking jobs 24/7. Watch a built-out agent handle real conversations at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.