Migrating a Finance Workflow to Claude Cowork Safely
A phased plan to move an existing finance workflow onto Claude Cowork plugins: shadow mode, parallel runs, scoped rollout, and a real rollback path.
Most finance teams do not start from a blank page. They have a working month-end close, a variance process, an AP routine — built over years out of spreadsheets, macros, and tribal knowledge. Moving that onto a Claude Cowork plugin is not a greenfield build; it is a migration, and migrations of money-touching workflows go wrong in predictable ways when teams rush. The instinct to flip from the old process to the agent in one cutover is the single most dangerous choice you can make, because the first time the agent disagrees with reality, it does so in production on a real close.
The safe path is phased and boring: shadow the agent against the existing process, run them in parallel and reconcile the differences, roll out to a narrow scope first, and keep an instant rollback for as long as it takes to earn trust. This post lays out that migration plan in detail, with the checkpoints that tell you it is safe to advance and the rollback triggers that tell you to stop.
Key takeaways
- Never cut over a money-touching workflow in one step; migrate in phases with a rollback path at each stage.
- Start in shadow mode — the agent runs and produces output, but the human process remains the source of truth and nothing the agent says is acted on.
- Move to parallel runs where you reconcile the agent's output against the trusted process and investigate every divergence.
- Roll out by scope: one entity or one low-risk sub-task first, expanding only as the eval pass rate and parallel-run agreement hold.
- Keep the old workflow runnable and the rollback one command away until the agent has earned several clean closes.
Map the existing workflow before you automate it
The first step has nothing to do with Claude. You cannot safely migrate a process you have not made explicit, and finance workflows are full of undocumented judgment: the analyst who knows that entity 12's intercompany always lags a day, or the rule that a particular vendor's credits post to a specific sub-account. Write the workflow down as discrete steps with inputs, outputs, the tools each step touches, and the decision points where human judgment enters. This map becomes both your plugin's design and the source of your eval's golden cases.
Pay special attention to the irreversible steps and the judgment steps. Irreversible steps (posting an entry, initiating a payment) are where you will keep humans in the loop longest. Judgment steps are where the agent will most often diverge from the old process, and you want to know in advance which divergences are the agent being wrong versus the agent surfacing something the old process quietly got wrong.
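One way to keep the map honest is to store it as data rather than prose. The sketch below assumes a hypothetical `Step` record (not a Claude Cowork format) with placeholder step and tool names. It derives the steps that must stay human-gated and turns each judgment step into a golden eval case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One documented step of the existing workflow (hypothetical schema)."""
    name: str
    inputs: list[str]
    outputs: list[str]
    tools: list[str]
    irreversible: bool = False       # e.g. posting an entry, initiating a payment
    judgment: Optional[str] = None   # the unwritten rule a human applies here, if any


@dataclass
class GoldenCase:
    step: str
    expectation: str


# Illustrative map: step and tool names are placeholders, not a real plugin.
WORKFLOW = [
    Step("pull_trial_balance", ["erp_export"], ["trial_balance"], ["erp_read"]),
    Step("match_intercompany", ["trial_balance"], ["ic_matches"], ["erp_read"],
         judgment="Entity 12 intercompany always lags a day"),
    Step("classify_vendor_credits", ["ap_ledger"], ["credit_lines"], ["erp_read"],
         judgment="A particular vendor's credits post to a specific sub-account"),
    Step("post_entries", ["ic_matches", "credit_lines"], ["journal_entries"], ["erp_write"],
         irreversible=True),
]


def human_gated_steps(workflow: list[Step]) -> list[str]:
    """Irreversible steps are the ones that stay behind human approval longest."""
    return [s.name for s in workflow if s.irreversible]


def golden_cases(workflow: list[Step]) -> list[GoldenCase]:
    """Every judgment step becomes at least one eval case the agent must pass."""
    return [GoldenCase(s.name, s.judgment) for s in workflow if s.judgment]
```

Making judgment a field forces the question "what rule does the analyst apply here?" for every step, which is exactly the input the eval set needs.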
```mermaid
flowchart TD
A["Map existing workflow"] --> B["Phase 1: Shadow mode"]
B --> C{"Agent output sane?"}
C -->|No| A
C -->|Yes| D["Phase 2: Parallel runs"]
D --> E{"Outputs reconcile?"}
E -->|Divergence| F["Investigate: agent or process wrong?"]
F --> D
E -->|Agreement holds| G["Phase 3: Scoped rollout"]
G --> H{"Eval & agreement hold?"}
H -->|No| I["Rollback to prior process"]
H -->|Yes| J["Expand scope"]
```
Phase one: shadow mode
In shadow mode the plugin runs on real inputs and produces real output, but it is wired to do nothing. No journal entries, no payments, no writes — the connectors are read-only and the agent's conclusions are logged for review, not acted on. The trusted human process remains the source of truth and the close proceeds exactly as it always has. Shadow mode is cheap insurance: it lets you watch the agent's behavior on real data, accumulate transcripts, and catch the obvious failure modes before any consequence is possible.
The exit criterion for shadow mode is qualitative but firm: the agent's outputs should be sane and its tool-call traces clean on a representative set of real closes. If it is looping, calling wrong tools, or producing numbers that are wildly off, you stay in shadow and fix the plugin. You do not advance to touch anything until shadow output is consistently reasonable.
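A minimal sketch of the shadow-mode boundary, assuming the agent's tool calls can be routed through a wrapper you control. `ShadowToolbox` and the tool names are hypothetical, not part of any Claude Cowork API, and the wrapper is defense in depth on top of read-only connectors, not a substitute for them. Reads execute, writes are recorded as proposals for review, and the trace feeds a rough "clean traces" check for looping and calls to tools that do not exist.

```python
from collections import Counter
from typing import Any, Callable


class ShadowToolbox:
    """Hypothetical shadow-mode wrapper: reads run, writes are recorded but never executed."""

    def __init__(self, tools: dict[str, Callable[..., Any]], write_tools: set[str]):
        self.tools = tools
        self.write_tools = write_tools
        self.trace: list[tuple[str, dict[str, Any]]] = []
        self.proposed_writes: list[tuple[str, dict[str, Any]]] = []
        self.unknown_calls: list[str] = []

    def call(self, name: str, **kwargs: Any) -> Any:
        self.trace.append((name, kwargs))
        if name not in self.tools:
            self.unknown_calls.append(name)  # the agent asked for a tool that does not exist
            return {"status": "unknown_tool"}
        if name in self.write_tools:
            self.proposed_writes.append((name, kwargs))  # logged for review, not acted on
            return {"status": "not_executed"}
        return self.tools[name](**kwargs)

    def trace_is_clean(self, max_repeats: int = 3) -> bool:
        """Rough check: no unknown tools and no identical call repeated more than max_repeats times."""
        counts = Counter((name, repr(sorted(kw.items()))) for name, kw in self.trace)
        return not self.unknown_calls and all(n <= max_repeats for n in counts.values())
```

The `proposed_writes` list is what reviewers read during shadow mode: it shows what the agent would have done without anything having been done.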
Phase two: parallel runs and reconciliation
Once shadow output looks good, run the agent and the human process in parallel on the same period and reconcile the two outputs line by line. This is the phase where you learn the most. Every divergence is a signal: either the agent is wrong (a bug to fix, a new eval case to add) or the agent is right and the old process had a latent error (a genuine win, and a reason to trust the agent more). Resist the urge to assume the human process is always correct — part of the value of the migration is finding where it was not.
Set a quantitative bar for advancing, and set it before the first parallel run: how many consecutive closes must agree, what agreement rate is required on the material numbers, and what tolerance counts as agreement. Every divergence must also be explained and resolved. Track this agreement rate alongside your eval pass rate. When both hold steady above their thresholds, you have evidence, not hope, that the agent is ready to do real work.
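The reconciliation and the advance bar can be sketched as follows, assuming both processes produce balances keyed by account and that `tolerance` stands in for whatever materiality rule your team defines. The function names are illustrative. Amounts use `Decimal` to avoid floating-point drift when comparing money.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Divergence:
    account: str
    agent: Optional[Decimal]
    human: Optional[Decimal]


def reconcile(agent: dict[str, Decimal], human: dict[str, Decimal],
              tolerance: Decimal = Decimal("0.01")) -> list[Divergence]:
    """Compare balances account by account. A difference above tolerance, or an
    account present on only one side, is a divergence to investigate."""
    divergences = []
    for account in sorted(agent.keys() | human.keys()):
        a, h = agent.get(account), human.get(account)
        if a is None or h is None or abs(a - h) > tolerance:
            divergences.append(Divergence(account, a, h))
    return divergences


def agreement_rate(agent: dict[str, Decimal], human: dict[str, Decimal],
                   tolerance: Decimal = Decimal("0.01")) -> float:
    """Share of accounts on which the agent and the trusted process agree."""
    accounts = agent.keys() | human.keys()
    if not accounts:
        return 1.0
    return 1 - len(reconcile(agent, human, tolerance)) / len(accounts)


def ready_to_advance(rates: list[float], open_divergences: int,
                     threshold: float, required_closes: int) -> bool:
    """Advance only if the last `required_closes` closes all met the threshold
    and every divergence has been explained and resolved."""
    recent = rates[-required_closes:]
    return (open_divergences == 0
            and len(recent) == required_closes
            and all(r >= threshold for r in recent))
```

An account missing from one side counts as a divergence on purpose: a silently dropped account is exactly the kind of difference that needs explaining.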
Phase three: scoped rollout with rollback
Do not turn the agent loose on the entire close at once. Pick the narrowest valuable scope — one low-risk entity, or one sub-task like classifying transactions while humans still post entries — and let the agent actually do that work in production. Keep the irreversible steps behind human approval even now. Expand scope one increment at a time, advancing only when the eval pass rate and parallel-run agreement continue to hold at the new scope.
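The expansion rule above reduces to a small, explicit gate. This sketch assumes scope is a list of entity IDs and that the thresholds are fixed in advance; `IRREVERSIBLE_ACTIONS` and the action names are hypothetical.

```python
# Hypothetical action names: the point is that the gate is declared, not inferred.
IRREVERSIBLE_ACTIONS = {"post_journal_entry", "initiate_payment"}


def needs_human_approval(action: str) -> bool:
    """Irreversible steps stay human-gated even inside the rollout scope."""
    return action in IRREVERSIBLE_ACTIONS


def next_scope(current: list[str], candidates: list[str],
               eval_pass_rate: float, agreement: float,
               min_eval: float, min_agreement: float) -> list[str]:
    """Expand by at most one entity, and only when both signals hold at the current scope."""
    if eval_pass_rate < min_eval or agreement < min_agreement:
        return current  # hold at the current scope; rollback is a separate decision
    remaining = [c for c in candidates if c not in current]
    return current + remaining[:1]
```

Keeping "hold" and "roll back" as separate decisions means a weak close pauses expansion without automatically unwinding scope that is still performing.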
Throughout, the rollback path must be real and fast. "Rollback" means the old workflow is still runnable and switching back is a single, documented action, not a weekend of reconstruction. Define explicit rollback triggers in advance: a divergence above a threshold, a guardrail violation, an eval regression. The snippet below captures the kind of config-level kill switch worth wiring in so a rollback is one flag flip:
```jsonc
{
  "plugin": "month_end_close",
  "mode": "scoped_rollout",          // shadow | parallel | scoped_rollout | full
  "scope_entities": ["12"],
  "write_enabled": false,            // irreversible steps still human-gated
  "rollback_to": "manual_process_v3",
  "rollback_triggers": { "divergence_pct": 0.5, "guardrail_violation": true }
}
```
Common pitfalls
- Big-bang cutover. Flipping the whole close to the agent at once means the first failure happens in production. Phase it.
- Assuming the old process is ground truth. Some divergences are the agent catching a real error. Investigate every difference rather than auto-trusting the human path.
- No real rollback. If switching back takes days, you do not have a rollback — you have a hope. Keep the old workflow runnable and the switch one action away.
- Skipping the workflow map. Automating an undocumented process bakes in its hidden judgment as guesswork. Make every step and decision explicit first.
- Granting write access in shadow mode. Shadow means do-nothing. Read-only connectors only until you have earned the right to act.
Migrate your workflow in five steps
- Document the existing workflow as explicit steps with inputs, outputs, tools, and the human judgment points, and turn it into golden eval cases.
- Run the plugin in shadow mode with read-only connectors, acting on nothing, until its output and tool traces are consistently sound.
- Run agent and human process in parallel, reconcile every divergence, and require sustained agreement across several closes before advancing.
- Roll out to one narrow, low-risk scope in production with irreversible steps still human-gated, expanding only as evals and agreement hold.
- Keep the old workflow runnable with documented rollback triggers and a one-action switch back until the agent has earned several clean closes.
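The triggers in step five can be made mechanical so that the rollback decision is never improvised. This sketch reuses the field names from the illustrative `month_end_close` config earlier in this post and rests on several assumptions: `divergence_pct` is read as the observed percentage of accounts that diverge (100 × (1 − agreement rate)), `eval_baseline` is the pass rate recorded before rollout, rolling back means returning the plugin to shadow mode, and `active_process` is a hypothetical field.

```python
import copy


def fired_triggers(config: dict, divergence_pct: float, guardrail_violated: bool,
                   eval_pass_rate: float, eval_baseline: float) -> list[str]:
    """Return every rollback trigger that fired; any non-empty result means roll back."""
    limits = config["rollback_triggers"]
    fired = []
    if divergence_pct > limits["divergence_pct"]:
        fired.append("divergence")
    if limits.get("guardrail_violation", False) and guardrail_violated:
        fired.append("guardrail_violation")
    if eval_pass_rate < eval_baseline:
        fired.append("eval_regression")
    return fired


def rollback(config: dict) -> dict:
    """The single documented action: stop the agent acting and hand back to the prior process."""
    rolled = copy.deepcopy(config)
    rolled["mode"] = "shadow"          # the agent may keep running, but acts on nothing
    rolled["write_enabled"] = False
    rolled["active_process"] = config["rollback_to"]  # hypothetical field
    return rolled
```

Returning to shadow rather than switching the agent off entirely keeps transcripts flowing, so the cause of the rollback can be investigated on the next close without any risk.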
| Phase | Agent can act? | Advance criterion |
|---|---|---|
| Shadow mode | No — read-only, logs only | Sane output & clean traces |
| Parallel runs | No — compared, not used | Sustained agreement on numbers |
| Scoped rollout | Yes — narrow scope, gated writes | Evals & agreement hold at scope |
| Full rollout | Yes, with standing human gates | Multiple clean closes before retiring the old process |
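The table can also be enforced in code so that no phase is skipped. A sketch, assuming the phase names from the `mode` comment in the illustrative config (`shadow | parallel | scoped_rollout | full`); deciding whether a phase's criterion is met happens elsewhere, for example with the reconciliation checks described earlier.

```python
from enum import Enum


class Phase(Enum):
    SHADOW = "shadow"
    PARALLEL = "parallel"
    SCOPED_ROLLOUT = "scoped_rollout"
    FULL = "full"


ORDER = [Phase.SHADOW, Phase.PARALLEL, Phase.SCOPED_ROLLOUT, Phase.FULL]

# "Agent can act?" column: output is acted on only from scoped rollout onward.
CAN_ACT = {
    Phase.SHADOW: False,
    Phase.PARALLEL: False,
    Phase.SCOPED_ROLLOUT: True,
    Phase.FULL: True,
}


def advance(phase: Phase, criterion_met: bool) -> Phase:
    """Move forward exactly one phase, and only when the current phase's criterion is met."""
    if not criterion_met or phase is Phase.FULL:
        return phase
    return ORDER[ORDER.index(phase) + 1]
```

Because `advance` moves one step at a time, shadow mode can never jump straight to full rollout, however clean its output looks.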
Frequently asked questions
What is shadow mode in a finance plugin migration?
Shadow mode is the first migration phase where the Claude Cowork plugin runs on real inputs and produces real output, but is wired to do nothing — read-only connectors, no writes, no payments — while the human process stays the source of truth. It lets you observe the agent's behavior and catch failures on real data before any action has consequences.
How long should I run in parallel before letting the agent act?
Long enough to see sustained agreement between the agent and the trusted process across several consecutive closes, with every divergence investigated and resolved. There is no fixed number of days; the signal is a stable, high agreement rate on material numbers plus a passing eval set, not the calendar.
What makes a rollback path real instead of theoretical?
A real rollback keeps the previous workflow fully runnable and makes switching back a single documented action — flipping a mode flag, re-enabling the manual process — rather than a reconstruction effort. Pair it with explicit triggers, such as a divergence threshold or a guardrail violation, so the decision to roll back is defined in advance, not improvised under pressure.
Should I migrate the whole close at once if shadow mode looks perfect?
No. Even with flawless shadow output, expand by scope — one entity or one sub-task at a time — so any surprise surfaces in a contained part of the close rather than across everything. Perfect shadow behavior is necessary but not sufficient evidence to skip scoped rollout.
Bringing agentic AI to your phone lines
CallSphere rolls out voice and chat agents the same careful way — shadowing real conversations and expanding scope as trust is earned — so the agent answers every call and books work without risking the business on a single cutover. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.