Build a Claude Code agent: a step-by-step walkthrough
Follow a concrete build of a working agent on Claude Code primitives — memory, schema-typed tools, the loop, HTML-framed results, and a real task.
Reading about agent architecture only gets you so far. The fastest way to internalize how Claude Code works is to build a small agent and watch each piece light up. This walkthrough takes you from an empty directory to an agent that reads a codebase, makes a scoped change, runs the tests, and reports back — using the same HTML-shaped context discipline that keeps real agents coherent. Follow it in order; each step assumes the last one worked.
Step 1: Lay out the project and give Claude its memory
Start with a clean repository and a single file that orients the agent: a CLAUDE.md at the root. This is the agent's standing brief — what the project is, how to run it, what conventions to honor, what never to touch. Keep it tight and factual. Treat it like the <head> of a page: metadata that applies to every turn, not a place to dump narrative.
A good memory file states the build command, the test command, the directory map in two lines, and three or four hard rules ("never edit generated files," "all DB access goes through the repository layer"). The agent reads this on startup, so anything you would otherwise repeat in every prompt belongs here. This single step removes most of the per-task friction people blame on the model.
Step 2: Define your tools with strict schemas
An agent is only as capable as the tools you give it. For this build you need three: read a file, write a file, and run a shell command. Each tool needs a name, a one-line description written for the model ("Read a UTF-8 text file and return its contents with line numbers"), and a JSON Schema for its arguments. The schema is not bureaucracy — it is how the runtime validates what Claude asks for before anything executes.
Write descriptions from the model's point of view. Say what the tool does, what it returns, and when to use it. A vague description like "file tool" produces vague tool use; a precise one produces precise calls. Mark read tools as safe and write/shell tools as requiring confirmation, so the permission gate knows which calls to wave through and which to hold.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3: Wire the agent loop
Now the engine. The loop holds a transcript, sends it to Claude, and reacts to the response. The diagram shows the exact cycle you are about to implement.
flowchart TD
A["Load CLAUDE.md + task"] --> B["Assemble tagged context"]
B --> C["Call Claude (Sonnet 4.x)"]
C --> D{"Tool calls present?"}
D -->|No| E["Print final report & stop"]
D -->|Yes| F["Validate & run each tool"]
F --> G["Wrap output in labeled block"]
G --> H{"Turn cap hit?"}
H -->|Yes| E
H -->|No| BIn code, the loop is small. Maintain a list of message blocks. On each iteration, send the system prompt plus the running blocks to the model. If the response contains tool-use blocks, validate each against its schema, execute it, and append a tool-result block tied to the call's id. If the response contains no tool calls, you have your final answer — print it and break. Add a turn cap (say twelve) so a confused agent cannot spin forever.
Step 4: Frame tool results like HTML, not like logs
This is the step most people skip and most regret. When your file-read tool returns content, do not append a naked string. Wrap it: a clear opening marker with the path, line-numbered content, and a closing marker. When your shell tool returns output, separate stdout, stderr, and exit code into labeled regions, and truncate anything enormous with an explicit "[truncated 1,200 lines]" note.
The reason is the same one that makes browsers robust: explicit boundaries let the reader navigate. Claude can refer to "line 84 of src/server.ts" only if line numbers are present and the file's edges are marked. It can tell a real error from noise only if stderr is its own labeled block. Spend your effort here and the model's behavior visibly sharpens.
Step 5: Run a real task and watch the loop work
Give the agent something concrete: "Add input validation to the createUser endpoint and update its test." Watch the transcript. The agent should read the memory file, search for the endpoint, read the handler and its test, propose an edit, write it (the permission gate prompts you), run the test command, read the result block, and either finish or fix and retry. Each turn is one trip around the loop you built.
If it flails, the cause is almost always context, not capability. A tool result that wasn't labeled, a file dropped without a marker, a description that was too vague — those are the failure modes. Fix the framing and rerun. You are debugging the structure, not the model.
Step 6: Add a hook to enforce a non-negotiable
Finally, bolt on a guarantee. Register a post-edit hook that runs your formatter and linter every time a file is written, and rejects the change if the linter fails. Now the agent physically cannot leave malformed code behind, regardless of what it intended. This is the production pattern: let the model reason freely, and let deterministic hooks enforce the rules you refuse to leave to chance.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
With those six steps you have the whole shape of a real Claude Code agent in miniature — memory, schema-typed tools, a bounded loop, HTML-framed results, a live task, and a hard enforcement hook. Everything bigger is the same pattern scaled up.
Frequently asked questions
Do I need the Claude Agent SDK to build this?
The SDK gives you the loop, tool plumbing, permission gate, and hooks out of the box, so you write less infrastructure. You can also build the loop yourself against the model API to learn the mechanics; the concepts in this walkthrough are identical either way.
How do I stop the agent from looping forever?
Add a turn cap and a token budget as guards in the loop, and terminate when the model returns a response with no tool calls. A confused agent burns turns quickly, so a cap of around a dozen turns is a reasonable starting guard for small tasks.
Why line-number the file contents I return?
Line numbers give the model stable coordinates to reference and edit against. Without them, an agent has to quote surrounding text to locate a change, which is brittle; with them, it can say "replace lines 40 to 52" precisely.
What's the single biggest mistake in a first build?
Returning raw, unframed tool output. Wrapping results in labeled, bounded blocks with truncation and separate error regions does more for reliability than any prompt tweak, because it lets the model navigate context instead of guessing at it.
Bringing agentic AI to your phone lines
The same build pattern — memory, typed tools, a bounded loop, and framed results — is what powers CallSphere's voice and chat agents, which take real calls and messages, call tools mid-conversation, and book work 24/7. See a working agent at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.