Migrating a Workflow to Claude Agents Without Breaking It (Anthropic Economic Index)

Move an existing workflow onto a Claude agent without breaking it — shadow mode, staged rollout, eval baselines, human gates, and one-switch rollback.

The Anthropic Economic Index is, among other things, a map of where agentic AI is taking over real tasks. For an engineering leader, that map raises a concrete decision: there's an existing workflow — a manual process, a brittle script, a rules engine — and the question is whether to move it onto a Claude agent, and if so, how to do it without setting the building on fire. Migrations are where good intentions meet production reality. A big-bang cutover from a working process to an autonomous agent is how you end up rolling back at 2 a.m.

The safe path is incremental: shadow the agent against the current system, compare outputs, expand its autonomy in stages, and keep a fast rollback the whole way. This post is a playbook for moving an existing workflow onto a Claude agent with the risk dialed down to something a cautious team can actually approve.

Key takeaways

  • Never big-bang. Migrate in stages: shadow, then assist, then supervised autonomy, then full autonomy.
  • Shadow mode runs the agent in parallel with the current system and compares outputs without acting — zero production risk.
  • Decompose the workflow into discrete steps and migrate the cheapest, most reversible step first.
  • An eval baseline from the legacy system gives you the bar the agent must clear before promotion.
  • Keep a one-switch rollback at every stage; the old system stays warm until the agent has earned trust.

Map the workflow before you touch it

The first move is not to write a prompt — it's to write down what the current workflow actually does, step by step, including the edge cases everyone handles by reflex. Most legacy processes have undocumented branches: the special-case customer, the manual override, the "if the file is empty, skip it" rule someone added two years ago. An agent that doesn't know about those branches will confidently get them wrong.

Decompose the workflow into discrete, individually testable steps. Each step is a candidate for migration, and you migrate them one at a time — starting with whichever is cheapest to get wrong and easiest to reverse. Classification and drafting steps are usually safe early targets; anything that moves money or sends customer-facing messages comes last, behind a human gate.
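One way to make that ordering explicit is to score each step and sort. The sketch below is only an illustration: the step names, the 1-to-5 scales, and the rule that money-moving or customer-facing steps always sort last are assumptions based on the paragraph above, not an established method.

```javascript
// Hypothetical steps for a support workflow. The 1-5 scores are ones you would
// assign yourself after mapping the workflow; nothing here is measured data.
const steps = [
  { name: "issue-refund",    costOfError: 5, reversibility: 1, movesMoney: true,  customerFacing: true  },
  { name: "classify-ticket", costOfError: 1, reversibility: 5, movesMoney: false, customerFacing: false },
  { name: "draft-reply",     costOfError: 2, reversibility: 4, movesMoney: false, customerFacing: false },
  { name: "send-reply",      costOfError: 4, reversibility: 1, movesMoney: false, customerFacing: true  },
];

// Steps that move money or reach customers go behind a human gate.
const needsHumanGate = (step) => step.movesMoney || step.customerFacing;

// Lower score means migrate earlier: cheap to get wrong and easy to reverse.
function migrationRisk(step) {
  const base = step.costOfError + (6 - step.reversibility);
  // Assumed rule: gated steps always sort after ungated ones.
  return needsHumanGate(step) ? base + 100 : base;
}

function migrationOrder(workflowSteps) {
  return [...workflowSteps]
    .sort((a, b) => migrationRisk(a) - migrationRisk(b))
    .map((step) => ({ name: step.name, needsHumanGate: needsHumanGate(step) }));
}

const order = migrationOrder(steps);
console.log(order.map((s) => s.name).join(" -> "));
```

With these sample scores the classification and drafting steps come first and the refund step comes last. The exact weights matter less than writing the ranking down so the team can argue about it before the migration starts.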

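This decomposition also gives you the natural unit for evals: each step gets its own golden set drawn from real historical runs of the legacy system. Those runs are already input/output pairs, so most of the labeling work is done. Spot-check a sample before trusting them, because any mistakes the legacy system made are baked into the labels.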
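A per-step golden set can be assembled with very little code. The following is a minimal sketch under stated assumptions: the record shape (`step`, `input`, `output`, `verified`) is hypothetical, `verified` stands for "a human has spot-checked this legacy output", and exact string equality is only a reasonable scorer for classification-style steps.

```javascript
// Hypothetical historical records, one per legacy run of a single step.
const history = [
  { step: "classify-ticket", input: "refund please",        output: "billing",   verified: true  },
  { step: "classify-ticket", input: "refund please",        output: "billing",   verified: true  }, // duplicate
  { step: "classify-ticket", input: "app crashes on login", output: "technical", verified: true  },
  { step: "classify-ticket", input: "hello??",              output: "billing",   verified: false }, // unchecked
  { step: "route-priority",  input: "site is down",         output: "urgent",    verified: true  },
];

// Group verified, de-duplicated examples by workflow step.
function buildGoldenSets(records) {
  const sets = new Map();
  const seen = new Set();
  for (const r of records) {
    if (!r.verified) continue;                       // skip labels nobody has checked
    const key = `${r.step}|${JSON.stringify(r.input)}`;
    if (seen.has(key)) continue;                     // keep one example per distinct input
    seen.add(key);
    if (!sets.has(r.step)) sets.set(r.step, []);
    sets.get(r.step).push({ input: r.input, expected: r.output });
  }
  return sets;
}

// Fraction of golden examples a candidate function reproduces exactly.
function passRate(goldenSet, candidate) {
  if (goldenSet.length === 0) return 0;
  const passed = goldenSet.filter((ex) => candidate(ex.input) === ex.expected).length;
  return passed / goldenSet.length;
}

const goldenSets = buildGoldenSets(history);

// Toy stand-in for whatever is being evaluated (agent wrapper, script, rules engine).
const toyClassifier = (text) => (text.includes("refund") ? "billing" : "other");
const rate = passRate(goldenSets.get("classify-ticket"), toyClassifier);
console.log(`classify-ticket pass rate: ${rate}`);
```

Scoring the legacy system and the agent with the same `passRate` function against the same set is what makes the two numbers comparable.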

The staged rollout, visualized

flowchart TD
  A["Legacy workflow in production"] --> B["Agent runs in shadow, no actions"]
  B --> C{"Outputs match legacy & pass evals?"}
  C -->|No| D["Fix prompt/tools, stay in shadow"]
  C -->|Yes| E["Assist mode: agent suggests, human acts"]
  E --> F{"Quality & trust holding?"}
  F -->|No| G["Roll back one stage"]
  F -->|Yes| H["Supervised autonomy with approval gates"]
  H --> I["Full autonomy, legacy on standby"]
  D --> B
  G --> B

Shadow mode: the cheapest way to build confidence

Shadow mode is arguably the most valuable migration technique. You run the Claude agent on the same live inputs as the production system, capture what it would have done, and compare that against what the legacy system actually did, but the agent's actions are never executed. No action reaches production, so the only exposure is the extra compute and API spend of the parallel run, as long as the agent's tools are stubbed or read-only while it shadows. Once enough live traffic has passed through, you have a measure, drawn from the real input distribution, of where the agent agrees with the incumbent and where it diverges.

The divergences are gold. Some are agent bugs you fix; some are cases where the agent is actually right and the legacy system was quietly wrong. Either way you learn before any customer is affected. Here's the shape of a shadow harness wrapping an Agent SDK run.

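import { isDeepStrictEqual } from "node:util"; // Node's built-in deep comparison

// legacySystem, claudeAgent, and log are placeholders for your own modules.
// `execute: false` stands in for however your agent wrapper disables side effects.
async function shadowRun(input) {
  const legacy = await legacySystem.run(input);   // the action of record
  try {
    const agent = await claudeAgent.run(input, { execute: false });
    await log.compare({
      input,
      legacyResult: legacy,
      agentProposed: agent.proposedAction,
      matched: isDeepStrictEqual(legacy, agent.proposedAction)
    });
  } catch (err) {
    // A shadow failure must never affect production: record it and move on.
    console.error("shadow run failed", err);
  }
  return legacy; // legacy still drives production
}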

Run this for a few weeks, track the match rate over time, and only promote to assist mode once the agreement rate clears the bar your evals define. The agent has to earn each step of autonomy with evidence.
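Tracking the match rate can be as simple as bucketing the comparison log by day. This is a sketch, assuming each log entry carries an ISO `timestamp` and a boolean `matched`. Those field names, the sample dates, and the thresholds passed to `clearsBar` are illustrative rather than prescribed.

```javascript
// Hypothetical shadow log entries, as a comparison logger might store them.
const shadowLog = [
  { timestamp: "2025-01-06T09:00:00Z", matched: true  },
  { timestamp: "2025-01-06T10:30:00Z", matched: false },
  { timestamp: "2025-01-06T13:15:00Z", matched: true  },
  { timestamp: "2025-01-06T16:45:00Z", matched: false },
  { timestamp: "2025-01-07T09:10:00Z", matched: true  },
  { timestamp: "2025-01-07T11:20:00Z", matched: true  },
  { timestamp: "2025-01-07T14:00:00Z", matched: true  },
  { timestamp: "2025-01-07T17:30:00Z", matched: false },
];

// Overall agreement between agent proposals and legacy results.
function matchRate(entries) {
  if (entries.length === 0) return 0;
  return entries.filter((e) => e.matched).length / entries.length;
}

// Per-day agreement, oldest day first, so you can see the trend.
function dailyMatchRates(entries) {
  const byDay = new Map();
  for (const e of entries) {
    const day = e.timestamp.slice(0, 10); // "YYYY-MM-DD" prefix of the ISO timestamp
    const bucket = byDay.get(day) ?? { total: 0, matched: 0 };
    bucket.total += 1;
    if (e.matched) bucket.matched += 1;
    byDay.set(day, bucket);
  }
  return [...byDay.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([day, { total, matched }]) => ({ day, total, rate: matched / total }));
}

// Promotion needs both enough cases and a high enough agreement rate.
function clearsBar(entries, bar, minCases) {
  return entries.length >= minCases && matchRate(entries) >= bar;
}

const daily = dailyMatchRates(shadowLog);
const overall = matchRate(shadowLog);
const divergences = shadowLog.filter((e) => !e.matched); // the cases worth reading by hand
console.log(daily, overall, clearsBar(shadowLog, 0.95, 500));
```

The `divergences` list deserves as much attention as the headline rate, since each entry is either an agent bug or a legacy mistake.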

Promotion, gates, and rollback

After shadow comes assist mode: the agent's proposal is shown to a human who accepts, edits, or rejects it. This generates a second wave of labeled data — human corrections — that both improves the agent and proves its trustworthiness. From there you move to supervised autonomy, where the agent acts on low-risk cases automatically but routes high-risk or low-confidence cases to a human gate. Full autonomy comes last, and even then the legacy system stays on standby.
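The supervised-autonomy routing described above might look like the following. It is a sketch with assumptions: the `action` and `confidence` fields on a proposal, the action names, and the 0.9 threshold are all hypothetical. Where the confidence score comes from (a separate evaluator, agreement across samples, and so on) is left open here, and a model's self-reported confidence should not be assumed to be calibrated.

```javascript
// Hypothetical policy values; tune them from your own assist-mode data.
const POLICY = {
  minConfidence: 0.9,
  highRiskActions: new Set(["refund", "send_email"]),
};

// Decide whether a proposed action runs automatically or waits for a human.
function route(proposal, policy = POLICY) {
  if (policy.highRiskActions.has(proposal.action)) {
    return { decision: "human_gate", reason: "high-risk action" };
  }
  // Written as a negated >= so a missing or NaN confidence fails safe to the gate.
  if (!(proposal.confidence >= policy.minConfidence)) {
    return { decision: "human_gate", reason: "low confidence" };
  }
  return { decision: "auto_execute", reason: "low risk and confident" };
}

const examples = [
  { action: "tag_ticket", confidence: 0.97 },
  { action: "tag_ticket", confidence: 0.62 },
  { action: "refund",     confidence: 0.99 },
  { action: "tag_ticket" }, // no confidence reported
];
const decisions = examples.map((p) => route(p));
console.log(decisions.map((d) => `${d.decision} (${d.reason})`));
```

Checking risk before confidence means a high-risk action is always gated, however confident the agent claims to be.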

The non-negotiable companion to every promotion is a fast rollback. A single config flag should be able to drop the agent back to the previous stage — or all the way to the legacy system — without a deploy. Keep the old system warm and exercised, not decommissioned, until the agent has weeks of clean production behavior behind it.
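A one-switch rollback can be sketched as a stage flag that is read on every request. Everything named here is an assumption for illustration: the `flags` object stands in for whatever runtime flag store you use, and the stage and actor names are hypothetical.

```javascript
const STAGES = ["legacy", "shadow", "assist", "supervised", "full"];

// Stand-in for a runtime flag store (feature-flag service, config table, etc.).
// It is read on every request, so changing it takes effect without a deploy.
const flags = { agentStage: "supervised" };

// Unknown or missing values fail safe to the legacy system.
function resolveStage(raw) {
  return STAGES.includes(raw) ? raw : "legacy";
}

// One stage back, never below "legacy".
function previousStage(stage) {
  const i = STAGES.indexOf(stage);
  return STAGES[Math.max(i - 1, 0)];
}

// Who acts at each stage, and whether the agent runs at all.
function planFor(stage) {
  switch (resolveStage(stage)) {
    case "shadow":     return { runAgent: true,  actor: "legacy" };
    case "assist":     return { runAgent: true,  actor: "human" };
    case "supervised": return { runAgent: true,  actor: "agent-with-gates" };
    case "full":       return { runAgent: true,  actor: "agent" };
    default:           return { runAgent: false, actor: "legacy" };
  }
}

function handle(input) {
  return { input, ...planFor(flags.agentStage) };
}

const before = handle("ticket-1").actor;
flags.agentStage = previousStage(flags.agentStage); // roll back one stage
const afterOneStep = handle("ticket-2").actor;
flags.agentStage = "legacy";                        // or all the way back
const afterFull = handle("ticket-3");
console.log(before, afterOneStep, afterFull.actor);
```

Because `resolveStage` treats any unrecognized value as `"legacy"`, a typo in the flag degrades to the old system instead of to an undefined state.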

It helps to define promotion criteria as numbers, agreed in advance, so the decision to advance a stage isn't a judgment call made under pressure. For shadow-to-assist, that might be a match rate above a set threshold across a minimum number of real cases. For assist-to-supervised, a human acceptance rate that holds steady over a couple of weeks. For supervised-to-full, a stretch of clean operation with escalations trending down and zero customer-impacting incidents. Writing these thresholds down before you start does two things: it prevents optimism from rushing a promotion, and it gives stakeholders a concrete, auditable reason to trust each step. Migration becomes a sequence of small, evidence-backed decisions rather than one terrifying leap of faith.
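Those criteria translate directly into small, checkable functions. The thresholds below are placeholders, not recommendations, and the metric field names are hypothetical. "Holds steady" and "trending down" are each given one simple reading here that a team may well want to replace.

```javascript
// Placeholder thresholds: agree on real numbers with stakeholders before starting.
const CRITERIA = {
  shadowToAssist:     { minCases: 500, minMatchRate: 0.95 },
  assistToSupervised: { minDays: 14, minAcceptanceRate: 0.9 },
  supervisedToFull:   { minCleanDays: 21, maxIncidents: 0 },
};

const verdict = (failures) => ({ promote: failures.length === 0, failures });

function shadowToAssist(m, c = CRITERIA.shadowToAssist) {
  const failures = [];
  if (m.cases < c.minCases) failures.push(`need ${c.minCases} cases, have ${m.cases}`);
  if (m.cases === 0 || m.matched / m.cases < c.minMatchRate) {
    failures.push(`match rate below ${c.minMatchRate}`);
  }
  return verdict(failures);
}

function assistToSupervised(m, c = CRITERIA.assistToSupervised) {
  const failures = [];
  // "Holds steady" is read here as: each of the last minDays daily rates clears the bar.
  const recent = m.dailyAcceptanceRates.slice(-c.minDays);
  if (recent.length < c.minDays) failures.push(`need ${c.minDays} days of data, have ${recent.length}`);
  if (recent.some((r) => r < c.minAcceptanceRate)) {
    failures.push(`a daily acceptance rate fell below ${c.minAcceptanceRate}`);
  }
  return verdict(failures);
}

function supervisedToFull(m, c = CRITERIA.supervisedToFull) {
  const failures = [];
  if (m.cleanDays < c.minCleanDays) failures.push(`need ${c.minCleanDays} clean days, have ${m.cleanDays}`);
  if (m.incidents > c.maxIncidents) failures.push(`${m.incidents} customer-impacting incident(s)`);
  // "Trending down" is read here as: the latest week is below the earliest week.
  const w = m.weeklyEscalations; // oldest week first
  if (w.length < 2 || w[w.length - 1] >= w[0]) failures.push("escalations are not trending down");
  return verdict(failures);
}

const ready    = shadowToAssist({ cases: 620, matched: 601 });
const tooEarly = shadowToAssist({ cases: 120, matched: 119 });
const steady   = assistToSupervised({ dailyAcceptanceRates: Array(14).fill(0.93) });
const dipped   = assistToSupervised({ dailyAcceptanceRates: [...Array(13).fill(0.93), 0.8] });
const goFull   = supervisedToFull({ cleanDays: 30, incidents: 0, weeklyEscalations: [12, 9, 7, 4] });
const notYet   = supervisedToFull({ cleanDays: 30, incidents: 1, weeklyEscalations: [4, 6] });
console.log(ready, tooEarly, steady, dipped, goFull, notYet);
```

Returning the list of failed checks, not just a boolean, gives stakeholders the auditable reason for each decision.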

Common pitfalls

  • Big-bang cutover. Replacing a working system with an autonomous agent overnight gives you no comparison data and no graceful failure path.
  • Skipping shadow mode. Without it you discover divergences in production, on real customers, instead of in logs.
  • Ignoring undocumented edge cases. The legacy system's reflexive special-case handling is exactly what the agent will miss. Mine it before migrating.
  • Decommissioning the old system too early. Once you can't roll back, every agent bug is an incident. Keep the incumbent warm.
  • No baseline. If you didn't measure the legacy system's quality, you can't prove the agent is at least as good.

Migrate a workflow in six steps

  1. Document the current workflow end to end, including undocumented edge cases and overrides.
  2. Decompose it into discrete steps and rank them by reversibility and risk.
  3. Build an eval baseline from historical legacy runs for each step.
  4. Run the agent in shadow mode and track the match rate until it clears the bar.
  5. Promote step by step — assist, then supervised, then full autonomy — with a human gate on risky actions.
  6. Keep a one-switch rollback and the legacy system warm until the agent has earned full trust.

What each rollout stage gives you

| Stage | Agent acts? | Risk | Promote when |
| --- | --- | --- | --- |
| Shadow | No | None | Match rate clears eval bar |
| Assist | Human acts on suggestion | Low | Human acceptance rate high |
| Supervised | Yes, gated on risk | Medium | Few escalations, no incidents |
| Full | Yes | Contained by rollback | Weeks of clean behavior |

Frequently asked questions

What is shadow mode in an agent migration?

Shadow mode runs the Claude agent on the same live inputs as the production system and records what it would have done, without executing any action. It gives you a real, risk-free comparison against the legacy system so you can measure divergence before the agent touches anything.

How do I decide which workflow step to migrate first?

Start with the step that is cheapest to get wrong and easiest to reverse — usually a classification or drafting step. Save anything that moves money or contacts customers for last, behind a human approval gate, once the agent has proven itself on lower-risk steps.

When is it safe to give the agent full autonomy?

After it has cleared shadow mode, performed well in assist and supervised stages, and accumulated weeks of clean production behavior — with a one-switch rollback and the legacy system still on standby. Autonomy is earned with evidence, not granted on day one.

Do I really need to keep the old system running?

Yes, until the agent has a long, clean track record. Keeping the incumbent warm and exercised is what makes rollback a non-event instead of an incident, and it's cheap insurance against the bug you didn't anticipate.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
