A Step-by-Step Claude Opus + Claude Code Walkthrough

A reproducible Claude Code workflow with Opus: write CLAUDE.md, plan first, drive tools, iterate on real test output, then verify the diff before committing.

Reading about agentic coding is one thing; sitting down and shipping a real change with it is another. This walkthrough is the version I wish I had when I started — a concrete sequence you can follow on your own repository to make Claude Opus inside Claude Code productive on the first afternoon, not the third week. We will take a single realistic task, adding rate limiting to an API endpoint, and move through it the way an experienced operator does, calling out the decision at each step.

The goal is not to show off a clever one-liner prompt. It is to build a repeatable workflow: prepare the ground, get a plan before code, let Opus drive the tools, iterate against real signals, and verify before you trust. Each phase has a specific best practice attached, and by the end you will have a loop you can reuse on any task.

Step 1: Prepare the project so Opus starts informed

Before you type a single request, give the agent a map. Create a CLAUDE.md at the repo root that states the stack, the package manager, how to run the test suite, the lint command, and any conventions a new engineer would need on day one. Claude Code loads this file into context when a session starts, so it shapes nearly every turn, and a tight, accurate one saves Opus from rediscovering your build commands by trial and error.

Keep it lean and factual. List the exact commands — "run tests with pytest -q", "format with ruff format" — rather than prose. If you have a directory whose purpose is non-obvious, one line about it pays for itself. Resist the urge to dump the entire architecture; the harness lets Opus read files on demand, so CLAUDE.md should hold only what is hard to infer and expensive to get wrong.
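
To make "lean and factual" concrete, here is a minimal sketch that renders a short CLAUDE.md from the handful of facts it needs. The function name `render_claude_md`, the section headings, and every value passed in (the stack, `uv`, `ruff check`, the `generated/` note) are illustrative assumptions, not a required format; substitute your repository's real commands.

```python
def render_claude_md(stack: str, package_manager: str,
                     commands: dict[str, str], notes: list[str]) -> str:
    """Render a short CLAUDE.md: facts and exact commands, no architecture essay."""
    lines = ["# Project notes for Claude Code", ""]
    lines.append(f"- Stack: {stack}")
    lines.append(f"- Package manager: {package_manager}")
    lines += ["", "## Commands"]
    # One bullet per command, with the exact invocation in backticks.
    lines += [f"- {label}: `{cmd}`" for label, cmd in commands.items()]
    if notes:
        lines += ["", "## Conventions"]
        lines += [f"- {note}" for note in notes]
    return "\n".join(lines) + "\n"


# Placeholder values (assumptions); replace with your project's real details.
claude_md = render_claude_md(
    stack="Python 3.12 web API",
    package_manager="uv",
    commands={"Run tests": "pytest -q", "Format": "ruff format", "Lint": "ruff check"},
    notes=["`generated/` is build output; edit the source schema instead."],
)
print(claude_md)
```

The output is a dozen lines of Markdown. If yours grows far past that, it is probably holding things Opus could read from the code on demand.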

Step 2: Ask for a plan before any edits

Open the task by asking Opus to investigate and propose a plan, explicitly without writing code yet. A prompt like "Look at how requests flow through the API layer and propose where rate limiting should live. Don't edit anything; just give me a plan with the files you'd touch" sets the right mode. Opus will grep, read the relevant handlers, and come back with a scoped proposal.

This step is the highest-leverage habit in the whole workflow. A plan is cheap to read and cheap to correct, while a wrong implementation is expensive to unwind. You catch misunderstandings — it picked the wrong middleware layer, it missed an existing limiter — before they cost tokens and edits. Only when the plan looks right do you say "go ahead and implement that."

flowchart TD
  A["Write CLAUDE.md"] --> B["Ask Opus for a plan"]
  B --> C{"Plan correct?"}
  C -->|No| D["Correct scope & constraints"] --> B
  C -->|Yes| E["Opus implements via tools"]
  E --> F["Run tests & linter"]
  F --> G{"Green?"}
  G -->|No| E
  G -->|Yes| H["Review diff & commit"]

Step 3: Let Opus drive the tools, and watch the trace

Once you approve the plan, Opus enters the agent loop: it reads the handler, writes the limiter, wires it into the route, and adds a test. You will see each tool call stream by — the greps, the file edits, the shell commands. Do not look away. The trace is your window into the model's reasoning, and the cheapest place to catch a wrong turn is the moment it happens.
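
For a sense of what "writes the limiter" might produce, here is a minimal in-memory sketch of a fixed-window rate limiter. It is an assumption about one reasonable shape, not the code Opus would necessarily write: the class name `FixedWindowLimiter`, the per-process dictionary, and the injectable `clock` are all hypothetical, and a real service would likely need a shared backing store instead.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each key."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        # key -> (window index, requests counted in that window)
        self._state: dict[str, tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        """Count one request for `key`; return False once the window is full."""
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._state.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window started, so the counter resets
        if count >= self.limit:
            return False
        self._state[key] = (window, count + 1)
        return True


# Usage with a fake clock, the same way a unit test would drive it.
now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
results = [limiter.allow("client-a") for _ in range(3)]  # third call is over the limit
now[0] = 61.0  # move into the next window
after_reset = limiter.allow("client-a")
```

The injectable clock is the design choice worth watching for in the trace: it lets the accompanying test exercise the window rollover without sleeping.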

If you see it about to edit a generated file or run a migration you did not expect, interrupt. Claude Code lets you steer mid-loop. A quick "stop — don't touch the generated client, edit the source schema instead" course-corrects far more cheaply than letting the run finish and reverting. The best operators treat the session as a pair-programming conversation, not a fire-and-forget command.

Step 4: Iterate against real signals, not vibes

When the first pass lands, run the tests. If they fail, hand the failure straight back: paste the failing output under a one-line note such as "the rate-limit test fails with this error" and let Opus read the traceback. The single biggest accelerator here is giving the model ground truth (actual test output, actual stack traces, actual lint warnings) rather than your paraphrase of what went wrong.

This is why a working test command in CLAUDE.md matters so much. With it, Opus can run the suite itself, see the red, and fix forward without you in the loop for every cycle. Let it iterate two or three rounds against the real signal. If it gets stuck looping on the same failure, that is your cue to step in with a hypothesis — usually it is missing a piece of context only you hold.
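
If you are the one relaying failures, a small helper keeps the signal verbatim. The sketch below uses only the standard library's `subprocess.run`; the helper names `run_and_capture` and `failure_report` and the report layout are assumptions for illustration.

```python
import subprocess
import sys
from typing import Optional


def run_and_capture(command: list[str]) -> tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks stay in order
        text=True,
        check=False,  # a failing suite is data here, not an exception
    )
    return completed.returncode, completed.stdout


def failure_report(command: list[str]) -> Optional[str]:
    """Return the verbatim output to paste back if the command failed, else None."""
    code, output = run_and_capture(command)
    if code == 0:
        return None
    return f"$ {' '.join(command)}\n(exit code {code})\n{output}"
```

Calling `failure_report([sys.executable, "-m", "pytest", "-q"])` in your repository would return `None` on a green suite and the full, unedited output otherwise, which is the text to hand to Opus.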

Step 5: Verify before you trust the green checkmark

A passing suite is necessary, not sufficient. Read the diff yourself. Opus is strong, but you are accountable for the code. Check that the limiter actually keys on the right identifier, that error responses use your conventions, and that no test was quietly weakened to pass. Ask pointed questions: "What happens when the limiter's backing store is unreachable?" Good follow-ups surface gaps the happy-path tests missed.

When the diff holds up, commit with a clear message. A practical habit is to ask Opus to summarize the change and the reasoning behind the design choice; that summary makes a good commit body and a good PR description. Now you have not just a change but a record of why it was made — which is what turns a one-off into a maintainable contribution.

Frequently asked questions

Should I let Opus write code on the first prompt?

Usually no. Ask for a plan first. Reviewing and correcting a plan is far cheaper than unwinding a wrong implementation, and it surfaces misunderstandings before any edits land.

How do I stop Opus from going down a wrong path?

Watch the streaming tool trace and interrupt the moment something looks off. Claude Code supports mid-loop steering, so a short corrective instruction redirects the run without a full revert.

Why does giving real test output help so much?

Opus debugs best against ground truth. An actual traceback tells it exactly what failed and where, whereas a paraphrase forces it to guess. A working test command in CLAUDE.md lets it run the suite and fix forward on its own.

Do I still need to review the diff if tests pass?

Yes. Passing tests prove the cases you wrote, not correctness. Read the diff, probe edge cases like failure of backing stores, and confirm no test was weakened to go green.
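
As an illustration of the backing-store edge case, the sketch below makes the outage policy an explicit choice instead of an accident. `StoreUnavailable` and `check_rate_limit` are hypothetical names, and whether to fail open or fail closed is a product decision this sketch does not settle.

```python
from typing import Callable


class StoreUnavailable(Exception):
    """Hypothetical error raised when the limiter's backing store cannot be reached."""


def check_rate_limit(key: str, allow: Callable[[str], bool], fail_open: bool) -> bool:
    """Return whether the request may proceed, with an explicit policy for outages."""
    try:
        return allow(key)
    except StoreUnavailable:
        # Fail open: keep serving traffic unthrottled during the outage.
        # Fail closed: reject requests until the store recovers.
        return fail_open
```

If the diff has no branch like this at all, asking Opus which behavior it intended is a good follow-up question.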

Bringing agentic AI to your phone lines

This same plan, act, verify rhythm is exactly how a reliable voice agent should operate. CallSphere brings these agentic patterns to voice and chat — assistants that gather context, act through tools mid-call, and confirm outcomes before booking work. See it live at callsphere.ai.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
