
A Claude Code dynamic workflow walkthrough end to end

A realistic end-to-end Claude Code workflow: from a vague ticket to a merged PR — the plan, subagents, evals that caught a boundary bug, and what shipped.

Most explanations of dynamic workflows stay abstract. Let's stay concrete instead. This is a walkthrough of one realistic piece of work — adding rate limiting to an internal API — taken from a vague ticket all the way to a merged pull request, using dynamic workflows in Claude Code. I will name the decisions a senior engineer makes at each step, where Claude takes over, where the human stays in the loop, and what actually shipped. The point is not that this exact feature matters; it is that the shape of the work generalizes to almost anything you build.

The starting point: a ticket that is barely a spec

The ticket reads: "API is getting hammered by one client, add rate limiting." That is not a specification — it is a symptom and a guess at the fix. Before any agent touches anything, the human's first job is to turn this into something an autonomous workflow can act on and verify. What counts as "hammered"? Per client or global? What happens when a limit is hit — a 429, a queue, a hard block? Should existing well-behaved clients notice anything?

Ten minutes of thinking produces real acceptance criteria: per-API-key sliding-window limit, default 100 requests per minute, configurable per key, return HTTP 429 with a Retry-After header, no impact on clients under the limit, and full test coverage of the boundary cases. That spec is the contract. Everything downstream is judged against it, which is exactly what makes a dynamic workflow trustworthy rather than a guessing game.
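
One way to make that contract concrete is to write it down as data before any middleware exists. The sketch below is illustrative only: `RateLimitSpec`, its field names, and the `partner-key` override are hypothetical, not taken from the actual project.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class RateLimitSpec:
    """Hypothetical encoding of the acceptance criteria; all names are illustrative."""

    default_limit: int = 100      # requests allowed per window, per API key
    window_seconds: int = 60      # length of the sliding window
    reject_status: int = 429      # HTTP status returned once the limit is exceeded
    # Optional per-key overrides (assumed shape: API key -> requests per window).
    per_key_limits: Dict[str, int] = field(default_factory=dict)

    def limit_for(self, api_key: str) -> int:
        """Return the configured limit for a key, falling back to the default."""
        return self.per_key_limits.get(api_key, self.default_limit)


spec = RateLimitSpec(per_key_limits={"partner-key": 500})
print(spec.limit_for("partner-key"), spec.limit_for("anyone-else"))
```

Writing the spec this way gives both the implementation and the tests a single source of truth to be checked against.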

Handing it to a dynamic workflow

Now Claude Code takes the spec and assembles its own plan rather than following hardcoded steps. It explores the codebase to learn how middleware is structured, where API keys are validated, and which datastore is available for counters. It proposes an approach — a sliding-window counter in the existing Redis instance, implemented as middleware — and lays out the steps it intends to run. The human reviews this plan, not the code yet: is the approach sound, does it fit the architecture, are the edge cases covered?
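
To make the proposed approach easier to review, here is a minimal sketch of a per-key sliding-window check. It is an assumption-laden stand-in: it keeps timestamps in process memory rather than in Redis, and `SlidingWindowLimiter` and `allow` are hypothetical names. A Redis-backed version might keep the same timestamps in a sorted set per key, but the walkthrough does not show the real implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window log, one deque of timestamps per API key.

    Illustrative only: the walkthrough's limiter stores counters in Redis.
    """

    def __init__(self, limit: int = 100, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, api_key: str, now: float) -> bool:
        hits = self._hits[api_key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        # Requests 1..limit are allowed; request limit + 1 is the first rejected.
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=10.0)
results = [limiter.allow("key-a", t) for t in (0.0, 1.0, 2.0, 3.0, 10.5)]
print(results)
```

Passing `now` in explicitly, instead of reading the clock inside `allow`, keeps the boundary and window-reset cases easy to test deterministically.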

flowchart TD
  A["Vague ticket"] --> B["Human writes acceptance criteria"]
  B --> C["Claude explores codebase & proposes plan"]
  C --> D{"Plan sound?"}
  D -->|No| B
  D -->|Yes| E["Implement middleware subagent"]
  E --> F["Write tests subagent"]
  F --> G["Run full test suite"]
  G -->|Fails| E
  G -->|Passes| H["Open PR for human review"]
  H --> I["Merge & deploy to staging"]

This plan-review checkpoint is the most valuable human moment in the whole flow. Catching a wrong approach here costs minutes; catching it after implementation costs an afternoon. The human approves the Redis sliding-window approach but adds one constraint the agent missed: the limit check must fail open, so a Redis outage degrades to allowing traffic rather than blocking every request.
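
The fail-open constraint is small in code but easy to get backwards. A minimal sketch, assuming the limit check raises a dedicated error when its datastore is unreachable; `StoreUnavailable` and `check_fail_open` are hypothetical names used only for illustration:

```python
class StoreUnavailable(Exception):
    """Hypothetical error raised when the counter store (e.g. Redis) is unreachable."""


def check_fail_open(check, api_key: str) -> bool:
    """Run the limit check, but allow the request if the store is down."""
    try:
        return check(api_key)
    except StoreUnavailable:
        # Fail open: an outage degrades to "no rate limiting", not "no traffic".
        return True


def store_is_down(api_key: str) -> bool:
    raise StoreUnavailable("counter store unreachable")


print(check_fail_open(store_is_down, "key-a"))      # allowed despite the outage
print(check_fail_open(lambda key: False, "key-a"))  # a real over-limit verdict still blocks
```

Only the outage error is swallowed; any other exception still propagates, so genuine bugs in the check are not silently converted into "allow".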

Implementation and the role of subagents

With the plan approved, the workflow fans out. One line of work implements the middleware: the counter logic, the configurable per-key limits, the 429 response with Retry-After, and the fail-open behavior the human added. A separate effort writes the tests — boundary cases at exactly the limit, just over it, the reset window, and the Redis-down path. Splitting implementation from test authoring is deliberate: tests written with fresh eyes on the spec, rather than to match the implementation, catch more real bugs.

This is where dynamic workflows earn their keep. The human did not write the middleware or the tests. They wrote the spec, reviewed the plan, and added one crucial constraint. Claude handled exploration, implementation, and test authoring, loading whatever project skills exist — say, a skill describing the team's testing conventions — automatically when relevant.

It is not flawless on the first pass. The initial test run reveals an off-by-one in the sliding-window counter at the exact boundary: the 100th request in a window is rejected, although the spec allows 100 requests and says the 101st should be the first rejected. The workflow catches this itself because a test written from the spec encodes that boundary; it fixes the comparison and re-runs the suite. No human intervention is needed: the eval did its job.
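
The article does not show the actual diff, but an off-by-one of this kind usually comes down to a single comparison operator. The sketch below assumes the counter already includes the current request; both function names are hypothetical.

```python
def is_rejected_buggy(count_including_current: int, limit: int = 100) -> bool:
    # Bug: rejects the 100th request, which the spec says must be allowed.
    return count_including_current >= limit


def is_rejected_fixed(count_including_current: int, limit: int = 100) -> bool:
    # Fix: only requests beyond the limit are rejected.
    return count_including_current > limit


# Boundary cases taken straight from the spec: at the limit, and just over it.
for count in (99, 100, 101):
    print(count, is_rejected_buggy(count), is_rejected_fixed(count))
```

A test that asserts the 100th request is allowed and the 101st is rejected fails against the first version and passes against the second, which is exactly the signal the workflow acted on.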

Verification, review, and what shipped

Before the work reaches a human reviewer, the workflow runs the full test suite and confirms every acceptance criterion: per-key limits, the 429 with Retry-After, no impact on under-limit clients, and the fail-open path. Only a green run opens the pull request. The human review that follows is now meaningful rather than exhausting — the reviewer reads a focused diff with passing tests and a clear description, checking judgment and design rather than hunting for typos.

One thing did surface in review that no test caught: in one code path, the Retry-After header was returning seconds as a float, whereas HTTP defines the delay value as a whole number of seconds. A small fix, made in seconds, but a reminder that human review still catches things evals miss. What shipped was a tested, scoped, fail-open rate limiter that matched the original intent, produced in a fraction of the time a fully manual build would have taken. The human's effort was concentrated on the parts where human judgment is irreplaceable: defining success, choosing the approach, and the final design check.
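
A fix for the header issue could be as small as the helper below. This is a sketch under assumptions: `retry_after_header` is a hypothetical name, and rounding up is a design choice of this example rather than something the article specifies.

```python
import math


def retry_after_header(seconds_until_reset: float) -> str:
    """Format a Retry-After value as a non-negative whole number of seconds."""
    # Round up so a client that honours the header never retries too early.
    return str(max(0, math.ceil(seconds_until_reset)))


print(retry_after_header(12.3))
print(retry_after_header(0.2))
```

A test asserting that the header value consists only of digits on every 429 path would have turned this review finding into something the evals catch next time.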

What this walkthrough generalizes to

The specifics were rate limiting, but the pattern is the whole point. Every effective dynamic workflow follows the same arc: a human converts ambiguity into verifiable acceptance criteria, Claude proposes a plan the human reviews, the work fans out into implementation and verification, evals catch mechanical errors automatically, and a human does a final judgment-level review before anything irreversible happens. Master that arc and the underlying task barely matters.

Frequently asked questions

Where does the human add the most value in this flow?

Two moments: writing the acceptance criteria up front and reviewing the proposed plan before implementation. Those are the points where a small amount of judgment prevents a large amount of wasted work. The final code review matters too, but catching a wrong approach early is the highest-leverage human action in the entire workflow.

Why split implementation and test writing?

Tests written to mirror the implementation tend to pass for the wrong reasons. Authoring tests directly from the spec, with fresh attention, surfaces boundary bugs the implementation quietly got wrong — like the off-by-one at exactly the rate limit in this walkthrough. The separation makes the verification genuinely independent.

What still needs a human after all the checks pass?

Judgment that tests do not encode: design quality, subtle correctness like the float-seconds header, and whether the solution fits the system's long-term direction. Evals catch mechanical failures; humans catch taste and context. A final focused review on a green PR is fast and still worth it.

How long does a workflow like this actually take?

Far less than a manual build, but the savings come from compression, not magic. The human spends real time on the spec and reviews; Claude compresses the exploration, implementation, and test-writing that used to dominate the clock. The net is a shippable, tested feature in a fraction of the usual time.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
