Build a Dynamic Workflow in Claude Code: A Walkthrough

Step-by-step: build a dynamic workflow in Claude Code, from setup through tools, skills, and hooks to a first end-to-end run an engineer can follow today.

Reading about dynamic workflows is one thing; standing one up that actually does useful work is another. This walkthrough takes you from an empty repository to a working Claude Code workflow that investigates an issue, proposes a fix, runs checks, and reports back — with you in control at the right moments. I will keep it concrete: every step is something you can do at a terminal, and I will explain why each piece earns its place so you can adapt the shape to your own task.

The example task is deliberately ordinary: "given a failing test, find the cause, fix it, and verify." It is a good first dynamic workflow because it exercises every mechanism you will reuse later — reading code, calling tools, loading a skill, and gating dangerous actions with a hook — without needing exotic infrastructure.

Step 1: Establish project context

The first thing Claude reads on every run is your project's persistent instructions. Create a CLAUDE.md at the repo root that states the stack, the commands to build and test, the conventions, and anything Claude would otherwise have to rediscover each session. Keep it tight and factual: the file stays in context for the whole session, so bloat here costs you on every call.

A good first version is short: the language and framework, the single command that runs the test suite, the directories that matter, and a one-line statement of how you want changes proposed (for example, "explain the root cause before editing"). You are not writing documentation; you are writing the standing orders the agent operates under. This file is the cheapest, highest-leverage thing you will create.
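
As a rough illustration of how small a first version can be, the Python sketch below renders a CLAUDE.md from a handful of facts and refuses to produce one that exceeds a line budget. Every value in it (the stack, the test command, the directories, the 40-line budget) is a placeholder invented for the example, not a recommendation from Claude Code itself.

```python
# Sketch: render a lean CLAUDE.md from a few standing facts.
# All values are hypothetical placeholders; substitute your project's own.
FACTS = {
    "Stack": "Python 3.12, FastAPI, PostgreSQL",
    "Run tests": "pytest -q",
    "Key directories": "src/ (application code), tests/ (test suite)",
    "Proposing changes": "Explain the root cause before editing.",
}

MAX_LINES = 40  # arbitrary budget chosen for this example


def render_claude_md(facts: dict, max_lines: int = MAX_LINES) -> str:
    """Return CLAUDE.md text, or raise if it outgrows the line budget."""
    lines = ["# Project instructions", ""]
    lines += [f"- {label}: {value}" for label, value in facts.items()]
    if len(lines) > max_lines:
        raise ValueError(f"CLAUDE.md has {len(lines)} lines; budget is {max_lines}")
    return "\n".join(lines) + "\n"


print(render_claude_md(FACTS))
```

Redirecting that output into CLAUDE.md at the repo root gives you a starting file. The budget check is only there to make growth visible when someone adds "just one more" paragraph.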

Step 2: Decide what the workflow needs to do

Before adding any tooling, write the goal as a single instruction you would give a competent colleague. For our example: "Run the failing test, read the failure, locate the root cause in the source, make the smallest correct fix, and re-run the test to confirm it passes. Show me the diff before applying it." The clarity of this sentence directly shapes the run, because in a dynamic workflow the instruction is the plan.

flowchart TD
  A["Give Claude the goal"] --> B["Claude runs failing test"]
  B --> C["Reads failure output"]
  C --> D["Searches source for cause"]
  D --> E{"Root cause found?"}
  E -->|No| C
  E -->|Yes| F["Proposes minimal diff"]
  F --> G["Hook gates the edit"]
  G --> H["Re-runs test, reports result"]

Notice the loop back from "root cause found?" to reading more. You are not scripting that loop — Claude generates it because the goal told it to keep investigating until it understands the failure. Your job is to make the success criterion unambiguous so the model knows when to stop.

Step 3: Give it the right tools

Claude Code ships with the core capabilities a coding workflow needs — running shell commands, reading and editing files, searching the codebase. For our task that is enough; the test runner is just a shell command, and code navigation is built in. Resist the urge to add tools you do not need. Every tool schema in context is something the model has to consider, and a lean toolset produces more focused runs.

If your real workflow touches an external system — a ticket tracker, a database, an internal API — that is where MCP servers come in, and you add them now. But for the first build, prove the loop with built-in tools only. You will learn more from a clean run against your own code than from wiring five integrations you cannot yet debug.

Step 4: Add a skill for the tricky part

Suppose your test failures often involve a domain-specific debugging procedure — say, your project has a particular way of reproducing flaky integration tests. Encode that as a skill: a folder with a short description and a body of instructions Claude loads only when the task is relevant. The description is what Claude sees by default; the full procedure loads on demand, keeping context lean until the moment it is needed.

The discipline here is to put procedural knowledge in skills and standing facts in CLAUDE.md. A skill answers "how do I do this specific thing well"; CLAUDE.md answers "what is always true about this project." Getting this split right is most of what makes a workflow feel like it knows your codebase instead of guessing.
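
To make the "short description, longer body" shape concrete, here is a minimal Python sketch that scaffolds a skill folder. It assumes the layout .claude/skills/<name>/SKILL.md with name and description in a frontmatter header, which is my understanding of the Claude Code convention; check the current documentation before relying on it. The skill name and the three steps are hypothetical stand-ins for your own procedure.

```python
from pathlib import Path


def render_skill(name: str, description: str, steps: list) -> str:
    """Build SKILL.md text: short frontmatter up top, full procedure below."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{numbered}\n"
    )


def write_skill(repo_root: Path, name: str, description: str, steps: list) -> Path:
    """Write the skill under an assumed .claude/skills/<name>/SKILL.md layout."""
    skill_file = repo_root / ".claude" / "skills" / name / "SKILL.md"
    skill_file.parent.mkdir(parents=True, exist_ok=True)
    skill_file.write_text(render_skill(name, description, steps), encoding="utf-8")
    return skill_file


# Hypothetical skill content for the flaky-integration-test example.
EXAMPLE = dict(
    name="repro-flaky-integration",
    description=(
        "Reproduce a flaky integration test. "
        "Use when an integration test fails intermittently."
    ),
    steps=[
        "Re-run the single failing test several times and record the pass/fail pattern.",
        "Re-run it in isolation to rule out ordering effects from other tests.",
        "Compare logs from a passing and a failing run before changing any code.",
    ],
)
```

Calling write_skill(Path("."), **EXAMPLE) from the repo root creates the file. The description carries the "when to use this" signal, so it is worth more editing effort than the body.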

Step 5: Gate dangerous actions with a hook

You asked to see the diff before it is applied. Rather than trusting the instruction alone, enforce that with a PreToolUse hook on file edits that pauses for your approval. Hooks run your code deterministically around lifecycle events, so they are the reliable place for "never do X without confirmation" rules. The model's good intentions are not a safety mechanism; the hook is.

Hooks are also where you add logging and policy: record every command Claude runs, block writes outside the repo, or inject a required environment variable. Start with one hook that does one thing well — gate the edit — and add more only as you discover real risks during runs. Over-hooking a workflow before you understand its behavior just makes it harder to follow.
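
A gate like this can be a few lines of Python. The sketch below assumes the hook receives the pending tool call as JSON on stdin (with tool_name and tool_input fields) and answers with a JSON permission decision of "ask" or "deny" on stdout. Those field names, and the tool names in EDIT_TOOLS, reflect my reading of the Claude Code hooks documentation and should be verified against the current version. Registering the script under a PreToolUse matcher in your settings is left out for the same reason.

```python
import json
from pathlib import Path
from typing import Optional

# Tool names assumed to perform file edits; adjust to what your setup exposes.
EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}


def decide(event: dict, repo_root: Path) -> Optional[dict]:
    """Return a permission decision for a file edit, or None for other tools."""
    if event.get("tool_name") not in EDIT_TOOLS:
        return None  # not a file edit: leave normal permission handling alone

    raw_path = event.get("tool_input", {}).get("file_path", "")
    target = (repo_root / raw_path).resolve()
    if raw_path and target.is_relative_to(repo_root.resolve()):
        decision = "ask"  # inside the repo: pause for human approval
        reason = f"Review the proposed edit to {raw_path} before it is applied."
    else:
        decision = "deny"  # missing path or outside the repo: block outright
        reason = "Edit target is missing or outside the repository."

    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }


def handle(raw_event: str, repo_root: Path) -> str:
    """Map the JSON text read from stdin to the JSON text to print on stdout."""
    decision = decide(json.loads(raw_event), repo_root)
    return json.dumps(decision) if decision else ""


# Demo with a made-up event and a hypothetical repo path. An installed hook
# would instead finish with: print(handle(sys.stdin.read(), Path.cwd()))
sample = json.dumps({"tool_name": "Edit", "tool_input": {"file_path": "src/app.py"}})
print(handle(sample, Path("/repo")))
```

Keeping decide() free of I/O is deliberate: you can unit-test the policy without launching Claude Code, which matters once the hook is the thing standing between the model and your files.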

Step 6: Run it and read the transcript

Now give Claude the goal from Step 2 and watch. A well-built first run looks like this: it runs the test, quotes the failure, searches a few files, narrows to a function, explains the cause in a sentence or two, proposes a small diff, waits at your hook, applies on approval, re-runs the test, and confirms green. If it wanders, the transcript tells you why — usually a vague goal, a missing fact in CLAUDE.md, or a tool that returned confusing output.

Iterate on the inputs, not the output. If the run was sloppy, tighten the goal sentence, add the missing fact to CLAUDE.md, or sharpen the skill description. Because the workflow is generated from context, improving context is how you improve behavior. After a few cycles you will have a workflow you trust to run on the next failing test without babysitting.

Step 7: Promote it to a repeatable command

Once the run is reliable, capture the goal as a reusable slash command or saved prompt so you and your team invoke it the same way every time. This is the moment a one-off becomes infrastructure: the workflow is now a named capability — "fix the failing test" — that anyone can run, backed by the same CLAUDE.md, skill, and hook. That repeatability, not any single run, is the payoff.
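
One way to capture the goal, sketched in Python: write it to a Markdown file that Claude Code picks up as a project slash command. The .claude/commands/<name>.md location and the $ARGUMENTS placeholder are my understanding of how custom commands work and should be checked against the current documentation. The command name fix-failing-test is a hypothetical choice.

```python
from pathlib import Path

# The goal from Step 2, with a placeholder for the test the caller names.
COMMAND_BODY = """\
Run the failing test named in: $ARGUMENTS

Read the failure, locate the root cause in the source, and make the smallest
correct fix. Show me the diff before applying it, then re-run the test and
confirm it passes.
"""


def install_command(repo_root: Path, name: str = "fix-failing-test") -> Path:
    """Write the goal as a project-level command file and return its path."""
    command_file = repo_root / ".claude" / "commands" / f"{name}.md"
    command_file.parent.mkdir(parents=True, exist_ok=True)
    command_file.write_text(COMMAND_BODY, encoding="utf-8")
    return command_file
```

After install_command(Path(".")) runs in the repo and the file is committed, a teammate would invoke the workflow by typing the command name followed by the test to fix. The goal text then lives in version control next to the CLAUDE.md, skill, and hook it depends on.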

Common first-build mistakes to avoid

The mistake I see most often is over-specifying the steps. Engineers used to writing pipelines try to script each action — "first do this, then do that, then this" — which fights the whole model. A dynamic workflow wants a goal and a done condition, not a recipe; when you over-specify, you both constrain the model's ability to recover from surprises and make the prompt brittle the moment the situation differs from what you imagined. State the destination and let Claude find the route.

The second common mistake is skipping the hook because the first few runs behaved well. Good behavior on easy inputs is not safety; it is luck that has not run out yet. The run that finally tries something destructive will look exactly like the runs that did not, right up until the edit lands. Put the deterministic gate in before you need it, keep CLAUDE.md lean so it stays cheap on every turn, and resist adding tools you have not yet seen the workflow ask for.

Frequently asked questions

Do I need to write any orchestration code?

No. The whole point of a dynamic workflow is that Claude generates the orchestration at runtime. You supply context (CLAUDE.md, skills), capabilities (tools, MCP servers), and guardrails (hooks), then state the goal. The harness runs the loop. You write configuration and instructions, not a control-flow graph.

How do I keep the first run from doing something destructive?

Use a PreToolUse hook to gate edits and shell commands so they require your approval, and constrain which tools are available. Treat the instructions you give the model as intent and the hook as enforcement. Be permissive only for read-only actions, and tighten around anything that writes or executes.

What goes in CLAUDE.md versus a skill?

CLAUDE.md holds standing facts that are always true about the project — stack, commands, conventions — and loads every turn, so keep it short. A skill holds a procedure for a specific situation and loads only when relevant, so it can be longer and more detailed. Standing facts in CLAUDE.md; how-to procedures in skills.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
