Migrating an existing workflow onto Claude Code safely
A staged, reversible playbook for moving a legacy workflow onto Claude Code: shadow runs, canary cutover, gradual ramp, and instant rollback.
Most teams don't start with a blank page. They already have a workflow that works: a brittle pile of scripts, a manual runbook, a scheduled job that someone babysits. The question isn't whether an agentic approach could do it better; it's how to get there without breaking the thing that's currently keeping the business running. A big-bang rewrite, where you flip from the old system to a Claude Code workflow overnight, is one of the most common ways these migrations fail. The safe path is staged, reversible, and boring, and that's exactly what makes it work.
This post lays out a migration playbook that treats the existing workflow as the source of truth you're trying to match before you're allowed to replace it. The whole strategy is built so that at every step, you can prove the new approach is at least as good before you trust it, and back out instantly if it isn't.
Start by writing down what the old workflow actually does
The most under-appreciated migration risk is that nobody fully knows what the current workflow does. It has accreted edge-case handling over years — a special rule for one customer, a retry that exists because of an outage in 2024, a silent assumption about input format. If you migrate only the documented behavior, you'll faithfully reproduce the happy path and quietly drop the hard-won corner cases, and those corner cases are where the production incidents live.
So the first step is archaeology. Read the existing code or runbook end to end and write down every behavior, including the ones that look like accidents — they're often load-bearing. This artifact does double duty: it's the spec for the new workflow, and it's the basis for the eval suite you'll use to prove equivalence. Interestingly, Claude Code itself is useful here; pointed at a legacy script, it can trace and summarize what the code actually does, surfacing behaviors the team forgot were there.
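One way to make that inventory double as an eval suite is to record each behavior as a replayable case. The following is a minimal sketch, not a prescribed format: the `BehaviorCase` schema, the `run_cases` helper, and the two example rules are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BehaviorCase:
    """One documented behavior of the legacy workflow (hypothetical schema)."""
    name: str
    inputs: dict
    expected: dict   # output recorded from the legacy workflow, not hand-written
    note: str = ""   # why the behavior exists, if anyone still knows


def run_cases(candidate: Callable[[dict], dict], cases: list[BehaviorCase]) -> list[str]:
    """Return the names of the cases where the candidate differs from the legacy output."""
    return [case.name for case in cases if candidate(case.inputs) != case.expected]


# Illustrative inventory; these rules are made up for the example.
CASES = [
    BehaviorCase(
        name="happy_path",
        inputs={"customer": "acme", "amount": 100},
        expected={"status": "approved"},
    ),
    BehaviorCase(
        name="legacy_customer_override",
        inputs={"customer": "initech", "amount": 100},
        expected={"status": "manual_review"},
        note="special rule for one customer",
    ),
]


def happy_path_only(inputs: dict) -> dict:
    """A candidate that reproduces only the documented behavior."""
    return {"status": "approved"}


failures = run_cases(happy_path_only, CASES)
print(failures)
```

A candidate built from the documentation alone passes the first case and fails the second, which is the kind of dropped corner case the inventory exists to catch.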
Resist the urge to "improve" behavior during this phase. The goal of the migration is to match the existing workflow, not to redesign it. Mixing migration with redesign means that when the output differs, you can't tell whether it's a migration bug or an intended improvement — and that ambiguity is what turns a clean cutover into a week of confused debugging.
Run the new workflow in shadow before it touches anything
Once you have a candidate Claude Code workflow, the next stage is to run it in shadow mode: it processes real inputs alongside the old system, but its outputs are recorded and compared rather than acted upon. The old workflow stays in charge; the new one is auditioning. This is the single highest-value safety technique in the whole migration, because it lets you measure real-world equivalence on real traffic with zero blast radius.
Shadow running surfaces the disagreements that hand-testing never would. You diff the two systems' outputs across hundreds or thousands of real cases and investigate every divergence. Some divergences will be the new workflow being wrong — fix those. Some will be the new workflow being right where the old one was subtly broken — document those as known, intended differences. Either way, you're building justified confidence instead of hoping.
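The comparison loop can be sketched in a few lines. This is an illustrative harness under stated assumptions: `legacy` and `candidate` are stand-in callables for the two workflows, and `ShadowReport` is a hypothetical record type. The property that matters is that only the legacy result is ever returned, and a crash in the candidate is logged as a divergence instead of reaching the live path.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class ShadowReport:
    """Running tally of a shadow run (hypothetical structure)."""
    total: int = 0
    matches: int = 0
    divergences: list = field(default_factory=list)

    @property
    def match_rate(self) -> float:
        return self.matches / self.total if self.total else 0.0


def shadow_run(
    inputs: Iterable[Any],
    legacy: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    report: ShadowReport,
) -> list:
    """Act on the legacy output only; record how the candidate compares."""
    acted_on = []
    for item in inputs:
        old = legacy(item)      # the old workflow stays in charge
        acted_on.append(old)
        try:
            new = candidate(item)
            diverged = new != old
        except Exception as exc:
            # A candidate failure must never affect the live result.
            new, diverged = f"candidate raised {exc!r}", True
        report.total += 1
        if diverged:
            report.divergences.append({"input": item, "legacy": old, "candidate": new})
        else:
            report.matches += 1
    return acted_on


# Toy workflows: the candidate disagrees on exactly one input.
report = ShadowReport()
results = shadow_run(
    [1, 2, 3, 4],
    legacy=lambda x: x * 2,
    candidate=lambda x: 7 if x == 3 else x * 2,
    report=report,
)
```

Every entry in `report.divergences` is then triaged by hand as either a candidate bug to fix or a known, intended difference to document.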
```mermaid
flowchart TD
    A["Document old workflow behavior"] --> B["Build Claude Code candidate"]
    B --> C["Shadow run on real inputs"]
    C --> D{"Outputs match old system?"}
    D -->|No| E["Investigate divergence & fix"] --> C
    D -->|Yes| F["Canary: route small % live traffic"]
    F --> G{"Metrics & evals healthy?"}
    G -->|No| H["Rollback to old workflow"]
    G -->|Yes| I["Ramp traffic gradually to 100%"]
```
The diagram is the spine of the whole playbook: document, build, shadow until outputs match, canary a small slice of live traffic, watch the metrics, and either ramp up or roll back. Rollback is drawn once, at the canary gate, but it stays available at every live stage, including each ramp step. It is a first-class path, not an afterthought.
Cut over gradually, never all at once
When shadow runs show the new workflow matching the old one within your tolerance, you move to a canary: route a small fraction of real, acted-upon traffic to the Claude Code workflow while the rest stays on the old system. Start small — a few percent — and keep both the eval metrics and the real business metrics under close watch. The canary is your first taste of the new workflow actually affecting the world, so the stakes go up and the population you're risking should stay deliberately small.
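A common way to pick the canary slice is deterministic hashing, so the same request key always lands on the same side of the split. The sketch below assumes each unit of work has a stable string key (a customer ID, a job ID); the function names and the bucket count are illustrative choices, not something the playbook prescribes.

```python
import hashlib

BUCKETS = 10_000  # resolution of 0.01 percent


def bucket(key: str) -> int:
    """Map a key to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def use_candidate(key: str, canary_percent: float) -> bool:
    """True if this key should be handled by the new workflow."""
    return bucket(key) < canary_percent * (BUCKETS / 100)


def route(key: str, canary_percent: float) -> str:
    return "candidate" if use_candidate(key, canary_percent) else "legacy"
```

Because a key's bucket never changes, raising the percentage only adds keys to the candidate side. Nothing already on the new workflow flips back during a ramp, which keeps per-customer behavior stable and makes incidents easier to trace.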
If the canary stays healthy, ramp gradually — ten percent, then half, then full — pausing at each step long enough to be sure the metrics hold. Gradual ramp matters because some failure modes only appear at volume or on rare inputs that a small slice won't surface. Each ramp step is a checkpoint where you confirm health before increasing exposure, which means the worst case at any moment is a small, contained problem rather than a company-wide outage.
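The ramp itself can be modeled as a small gate: advance one step only while health holds, and drop to zero exposure otherwise. This is a sketch under assumptions. The 2 percent starting step stands in for "a few percent", and the single error-rate comparison with its tolerance is a placeholder for whatever eval and business metrics you actually track.

```python
RAMP_STEPS = [2, 10, 50, 100]  # percent of live traffic; the first step is an assumed value


def is_healthy(candidate_error_rate: float, baseline_error_rate: float,
               tolerance: float = 0.005) -> bool:
    """Healthy if the candidate is no worse than the baseline plus a tolerance."""
    return candidate_error_rate <= baseline_error_rate + tolerance


def next_exposure(current: int, healthy: bool, steps: list = RAMP_STEPS) -> int:
    """Advance to the next ramp step when healthy; roll back to 0 when not."""
    if not healthy:
        return 0
    for step in steps:
        if step > current:
            return step
    return current  # already at the final step


# Walk the ramp with a candidate that stays within tolerance.
exposure = 0
history = []
for _ in range(5):
    exposure = next_exposure(exposure, is_healthy(0.011, 0.010))
    history.append(exposure)
```

In practice each call would be separated by a deliberate pause, long enough for rare inputs and volume-dependent failures to show up in the metrics.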
Keep the old workflow alive and runnable through the entire ramp, even after you hit a hundred percent, for a deliberate cooling-off period. The cost of keeping a deprecated system warm for a few weeks is trivial next to the cost of discovering a subtle problem after you've deleted your only fallback.
Make rollback instant and boring
The feature that makes the whole staged approach safe is that rollback is fast, tested, and unremarkable. If a single switch — a config flag, a routing rule — can send all traffic back to the old workflow in seconds, then every forward step is low-risk because it's reversible. Teams that treat rollback as an emergency scramble end up hesitating to cut over at all, or worse, riding out a degradation because backing out feels disruptive.
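As a sketch of what a single switch might look like, the function below reads the exposure from a small JSON flag file on every routing decision. The file layout and the `force_legacy` and `canary_percent` keys are hypothetical. The design choice worth copying is that every failure mode (missing file, bad JSON, nonsense values) falls back to the old workflow.

```python
import json
from pathlib import Path


def load_canary_percent(path: Path) -> float:
    """Return the share of traffic for the new workflow; 0.0 means all legacy."""
    try:
        config = json.loads(path.read_text())
        if config.get("force_legacy", False):
            return 0.0  # the rollback switch overrides everything else
        percent = float(config.get("canary_percent", 0))
        return min(100.0, max(0.0, percent))
    except (OSError, ValueError, TypeError, AttributeError):
        # Missing file, invalid JSON, or an unexpected shape: fail safe to legacy.
        return 0.0
```

Rolling back is then a one-line config change, `{"force_legacy": true}`, with no deploy involved.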
Test the rollback before you need it. Actually flip the switch back during the canary phase and confirm it works cleanly, so that when something goes wrong at 50 percent traffic, reverting is a known-good, muscle-memory move rather than an experiment performed under pressure. The combination of shadow validation, gradual ramp, and instant rollback turns a scary migration into a sequence of small, reversible, well-instrumented steps — which is the only kind of migration that reliably succeeds.
Frequently asked questions
Why not just rewrite the workflow and switch over at once?
Because a big-bang cutover gives you no way to prove the new workflow matches the old one's hard-won edge cases until it's already live, and no graceful way to back out. Staged migration — document, shadow, canary, ramp — lets you validate equivalence on real traffic with the old system still in charge.
What is shadow running and why does it matter so much?
Shadow running means the new Claude Code workflow processes real inputs alongside the old system, but its outputs are recorded and compared rather than acted upon. It surfaces real-world disagreements with zero blast radius, letting you fix genuine bugs and document intended improvements before any live exposure.
Should I improve the workflow's behavior during migration?
No. Match the existing behavior first, then improve later as a separate change. Mixing migration with redesign makes every output difference ambiguous — you can't tell a migration bug from an intended improvement — which turns a clean cutover into confused debugging.
How fast should the traffic ramp be?
Slow enough that each step is a real checkpoint. Start with a few percent, confirm both eval and business metrics hold, then ramp to ten, fifty, and a hundred percent with pauses between. Some failures only appear at volume, so each step trades a little speed for a lot of safety.
Bringing agentic AI to your phone lines
CallSphere migrated its own call handling onto agentic AI the same careful way — shadow, canary, ramp — and now its voice and chat agents answer every call and message, use tools mid-conversation, and book work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.