---
title: "Migrating a Clinical Abstraction Workflow Onto Claude"
description: "Move an existing clinical-abstraction workflow onto a Claude agent safely with shadow mode, phased autonomy, and fast rollback in 2026."
canonical: https://callsphere.ai/blog/migrating-a-clinical-abstraction-workflow-onto-claude
category: "Agentic AI"
tags: ["agentic ai", "claude", "migration", "rollout", "shadow mode", "human in the loop", "clinical nlp"]
author: "CallSphere Team"
published: 2026-04-08T12:32:44.000Z
updated: 2026-06-06T21:47:43.749Z
---

# Migrating a Clinical Abstraction Workflow Onto Claude

> Move an existing clinical-abstraction workflow onto a Claude agent safely with shadow mode, phased autonomy, and fast rollback in 2026.

Every healthcare organization already abstracts charts. Today it's done by trained human abstractors, maybe assisted by rule-based software or an older NLP pipeline. So the realistic question in 2026 isn't "how do I build a Claude abstraction agent from scratch" — it's "how do I move my existing, working abstraction process onto a Claude agent without breaking it, without a compliance incident, and without the team losing trust on day one." Migration, not greenfield, is where most of these projects actually live, and it has its own playbook.

The temptation is to flip a switch: turn off the old process, turn on the agent, declare victory. That's how migrations fail. A safer migration treats the agent as a candidate that has to earn the work, running alongside the incumbent until the data says it's ready, with a clean path back if it isn't. This post walks through that staged approach.

## Map and freeze the current process first

Before introducing the agent, document exactly what exists. Record which fields are abstracted, from which source documents, under which guideline version, and with which edge-case rules the human abstractors apply but never wrote down. That tacit knowledge (for example, "if the discharge summary and the problem list disagree, we trust the discharge summary") is the hardest part to migrate and the most likely to be lost. Interview the abstractors and capture these rules explicitly, because they become both your agent's instructions and your eval set's adjudication logic.
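
One way to make a tacit rule like the discharge-summary example explicit is to write it down as data. The sketch below is only an illustration: the source names, the `SOURCE_PRECEDENCE` list, and the `resolve_conflict` helper are hypothetical, and real precedence rules will likely differ per field.

```python
# Hypothetical precedence order captured from abstractor interviews:
# when sources disagree, the earlier source in this list wins.
SOURCE_PRECEDENCE = ["discharge_summary", "problem_list"]

def resolve_conflict(values_by_source):
    """Return (value, winning_source) using the documented precedence.

    values_by_source maps a source-document name to the value found there,
    or None if that source does not mention the field.
    """
    for source in SOURCE_PRECEDENCE:
        value = values_by_source.get(source)
        if value is not None:
            return value, source
    return None, None

# Illustrative values only.
value, source = resolve_conflict(
    {"problem_list": "E11.9", "discharge_summary": "E11.65"}
)
```

Written this way, the same rule can be pasted into the agent's instructions and reused when adjudicating the eval set, so the two never drift apart.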

This mapping also gives you a baseline. You can't claim the agent is as good as the current process if you never measured the current process. Pull a sample, have it independently re-abstracted, and quantify the existing error rate. The agent's target isn't perfection; it's parity-or-better against a baseline you've actually measured.
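
A minimal sketch of that baseline measurement, assuming each abstraction is stored as a dictionary of field values keyed by chart ID. The function name, field names, and sample values are illustrative. Note that it measures disagreement between the two passes; deciding which pass was wrong still needs adjudication.

```python
from collections import defaultdict

def field_disagreement_rates(original, reabstracted):
    """Per-field disagreement rate between the original abstraction and an
    independent re-abstraction of the same charts.

    Both arguments map chart_id -> {field_name: value}. Only charts present
    in both samples are compared.
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for chart_id in original.keys() & reabstracted.keys():
        first, second = original[chart_id], reabstracted[chart_id]
        for name in first.keys() | second.keys():
            totals[name] += 1
            # A field missing on one side counts as a disagreement.
            if first.get(name) != second.get(name):
                mismatches[name] += 1
    return {name: mismatches[name] / totals[name] for name in totals}

# Illustrative two-chart sample.
original = {
    "c1": {"principal_dx": "I21.4", "discharge_disposition": "home"},
    "c2": {"principal_dx": "J18.9", "discharge_disposition": "snf"},
}
reabstracted = {
    "c1": {"principal_dx": "I21.4", "discharge_disposition": "home"},
    "c2": {"principal_dx": "J15.9", "discharge_disposition": "snf"},
}
baseline = field_disagreement_rates(original, reabstracted)
```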

## Run in shadow mode before it touches anything

The first live phase is shadow mode: the agent abstracts the same charts the humans are working, but its output goes nowhere near production. It writes to a comparison store, not the registry. Now you can measure agreement on real, current data — not a curated test set — across the full messy distribution of charts your facility actually produces. Shadow mode is where you discover the failure modes your gold set never imagined, with zero risk, because nothing the agent emits is used.
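
The shape of a shadow run can be sketched as below. Everything here is an assumption for illustration: `ComparisonStore` is an in-memory stand-in for whatever storage you use, and `fake_agent` is a placeholder for the real agent call, which is out of scope.

```python
from dataclasses import dataclass, field

@dataclass
class ComparisonStore:
    """In-memory stand-in for wherever shadow results are kept."""
    rows: list = field(default_factory=list)

    def record(self, chart_id, human, agent, error=None):
        self.rows.append(
            {"chart_id": chart_id, "human": human, "agent": agent, "error": error}
        )

def shadow_abstract(chart_id, chart_text, human_result, agent_fn, store):
    """Run the agent on a chart the humans already abstracted.

    The human result is returned unchanged; the agent output is only
    recorded for comparison and never reaches the registry.
    """
    try:
        agent_result = agent_fn(chart_text)
        store.record(chart_id, human_result, agent_result)
    except Exception as exc:
        # An agent failure must not disturb the live human workflow.
        store.record(chart_id, human_result, None, error=repr(exc))
    return human_result

def fake_agent(chart_text):
    # Placeholder for the real agent call.
    return {"discharge_disposition": "home"}

store = ComparisonStore()
final = shadow_abstract(
    "c1", "chart text", {"discharge_disposition": "home"}, fake_agent, store
)
```

The design point is that the function's return value is always the human result, so there is no code path by which agent output can become the record of truth during this phase.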

```mermaid
flowchart TD
  A["Existing human workflow"] --> B["Shadow: agent abstracts in parallel"]
  B --> C["Compare agent vs human"]
  C --> D{"Agreement >= baseline?"}
  D -->|No| E["Fix agent; stay in shadow"]
  E --> B
  D -->|Yes| F["Assist mode: agent drafts, human confirms"]
  F --> G{"Stable over time?"}
  G -->|Yes| H["Autonomous on low-risk fields"]
  G -->|No| F
```

Set an explicit exit criterion for shadow mode: the agent must hit your agreement target against humans, on the consequential fields, sustained over a meaningful volume of charts — not a lucky week. Where it disagrees, investigate every case. Some disagreements are agent errors; some reveal that a human abstractor was wrong or inconsistent. Both findings are valuable, and shadow mode is the only phase where you can surface them without consequences.
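
A sustained exit check could look like the sketch below, assuming agreement is tallied per period (for example, per week). The function name, the thresholds, and the period structure are illustrative choices, not a prescribed standard.

```python
def shadow_exit_ready(windows, target, min_charts, min_windows):
    """Decide whether shadow mode can end for one field.

    windows: list of (charts_compared, charts_agreeing) per period, oldest first.
    Ready only if the most recent `min_windows` periods each meet `target`
    and together cover at least `min_charts` charts.
    """
    if len(windows) < min_windows:
        return False
    recent = windows[-min_windows:]
    if sum(n for n, _ in recent) < min_charts:
        return False
    return all(n > 0 and agree / n >= target for n, agree in recent)

# Illustrative weekly tallies for one consequential field.
steady = [(200, 180), (210, 202), (190, 184), (205, 199)]
lucky_week = [(200, 170), (210, 180), (190, 188)]

steady_ready = shadow_exit_ready(steady, target=0.95, min_charts=500, min_windows=3)
lucky_ready = shadow_exit_ready(lucky_week, target=0.95, min_charts=500, min_windows=3)
```

Requiring every recent period to clear the target, rather than the average, is what keeps one strong week from masking two weak ones.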

## Move to assist mode, then narrow autonomy

When shadow numbers clear the bar, promote the agent to assist mode: it drafts the abstraction, a human reviews and confirms before anything is finalized. This is where the agent starts saving real time, because reviewing a good draft is far faster than abstracting from scratch — and the human stays firmly in the loop on every record. Track how often reviewers accept the draft unchanged; that acceptance rate is your readiness signal for the next step.
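
The acceptance rate can be computed from pairs of drafted and finalized records. This sketch assumes each record is a plain dictionary of field values; the function name is hypothetical.

```python
def unchanged_acceptance_rate(reviews):
    """Fraction of agent drafts that reviewers finalized without edits.

    reviews: iterable of (draft, final) dict pairs, one per reviewed record.
    Returns None when there is nothing to measure yet.
    """
    reviews = list(reviews)
    if not reviews:
        return None
    unchanged = sum(1 for draft, final in reviews if draft == final)
    return unchanged / len(reviews)

# Illustrative reviews: the second draft was corrected by the reviewer.
reviews = [
    ({"discharge_disposition": "home"}, {"discharge_disposition": "home"}),
    ({"discharge_disposition": "home"}, {"discharge_disposition": "snf"}),
    ({"discharge_disposition": "snf"}, {"discharge_disposition": "snf"}),
]
rate = unchanged_acceptance_rate(reviews)
```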

Only after assist mode proves stable should you grant any autonomy, and grant it narrowly. Let the agent finalize the low-risk, high-agreement fields on its own while still routing the consequential or uncertain ones to a human. This field-by-field, risk-graded autonomy is far safer than an all-or-nothing cutover, and it matches automation to the cost of being wrong. The principal diagnosis can stay human-confirmed long after routine fields go autonomous.
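
Field-level routing can be as simple as the sketch below. The field names in `AUTONOMOUS_FIELDS` are hypothetical, and how the agent flags a value as uncertain is an assumption left open here.

```python
# Hypothetical low-risk fields that have been granted autonomy.
AUTONOMOUS_FIELDS = {"admission_source", "discharge_disposition"}

def route_field(field_name, agent_uncertain):
    """Return 'auto_finalize' or 'human_review' for one drafted field."""
    if field_name in AUTONOMOUS_FIELDS and not agent_uncertain:
        return "auto_finalize"
    # Anything consequential, not yet promoted, or uncertain goes to a human.
    return "human_review"

# Illustrative draft for a single chart.
draft = {
    "admission_source": {"value": "emergency", "uncertain": False},
    "discharge_disposition": {"value": "home", "uncertain": True},
    "principal_dx": {"value": "I21.4", "uncertain": False},
}
routes = {name: route_field(name, item["uncertain"]) for name, item in draft.items()}
```

Defaulting to `human_review` means a newly added or misspelled field name fails safe rather than being finalized automatically.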

## Decide what "good enough to promote" means up front

The single most common migration mistake is promoting on gut feel. Someone watches a few drafts, decides the agent "looks ready," and advances it — and now you're arguing about regressions with no agreed bar. Before shadow mode even starts, write down the promotion criteria for each phase: the exact agreement metric against humans, on which fields, over what minimum volume, sustained for how long. Make those numbers a contract everyone — engineering, the abstraction team, and compliance — signs off on, so promotion becomes a calm check against a threshold rather than a negotiation.

Tie the criteria to the consequence of the field. A field that feeds a regulatory submission should demand a higher, longer-sustained agreement bar than an internal annotation. Also set a minimum sample size for edge cases: the agent must hit the bar not just on the easy majority but specifically on the hard slices, such as missing fields and conflicting documentation, measured over enough of those charts to be meaningful. Those slices are where a clinical abstractor earns their keep and where a premature promotion does the most damage.
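
Writing the criteria as a small, immutable record makes them easy to review and sign off on. The sketch below is one possible shape; the class, its attribute names, and the numbers are illustrative, and it checks volume and agreement only (the "sustained for how long" condition would be layered on top).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromotionCriteria:
    field_name: str
    min_agreement: float             # overall agreement with humans
    min_charts: int                  # minimum volume behind that number
    hard_slice_min_agreement: float  # bar applied to each hard slice
    hard_slice_min_charts: int       # minimum charts needed per hard slice

def clears_bar(criteria, overall, slices):
    """overall and each slice value are (charts_compared, charts_agreeing)."""
    n, agree = overall
    if n < max(criteria.min_charts, 1) or agree / n < criteria.min_agreement:
        return False
    for slice_n, slice_agree in slices.values():
        if slice_n < max(criteria.hard_slice_min_charts, 1):
            return False  # not enough hard cases seen to judge
        if slice_agree / slice_n < criteria.hard_slice_min_agreement:
            return False
    return True

# Illustrative numbers for a consequential field.
criteria = PromotionCriteria("principal_dx", 0.97, 1000, 0.95, 50)
weak_on_conflicts = clears_bar(
    criteria, (1200, 1180), {"missing_fields": (60, 58), "conflicting_docs": (55, 50)}
)
strong_everywhere = clears_bar(
    criteria, (1200, 1180), {"missing_fields": (60, 58), "conflicting_docs": (55, 53)}
)
```

In the first call the overall agreement is comfortably above the bar, yet promotion is refused because the conflicting-documentation slice falls short.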

## Keep the old path warm and the rollback fast

Throughout migration, the incumbent process stays available. Don't dismantle the human abstraction capacity the moment the agent looks good; keep enough that you can revert to fully human abstraction quickly if the agent regresses — after a model update, a guideline change, or an unforeseen distribution shift. A migration without a rollback plan is a bet, and you don't bet a regulatory submission. Define in advance what triggers a rollback: an agreement metric dropping below a floor, a spike in reviewer corrections, or any incident on a critical field.

Make rollback a routine operation, not a fire drill. Because you ran in parallel and kept the comparison store, you always know what the agent is doing relative to the baseline. If a metric crosses the line, you fall back to assist or shadow mode for the affected fields, fix the agent under the eval loop, and re-promote once it clears the bar again. Reversibility is what lets you move fast safely.
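
The triggers and the fallback step can be expressed as a small check that runs on every metrics refresh. This is a sketch under assumptions: the "spike" in corrections is modeled as a simple ceiling, and the one-step-back `FALLBACK` mapping is just the simplest policy, not the only reasonable one.

```python
def rollback_reasons(agreement, correction_rate, critical_incident,
                     *, agreement_floor, correction_ceiling):
    """Return the reasons a field should be rolled back (empty list = stay put)."""
    reasons = []
    if agreement < agreement_floor:
        reasons.append("agreement below floor")
    if correction_rate > correction_ceiling:
        reasons.append("reviewer corrections above ceiling")
    if critical_incident:
        reasons.append("incident on critical field")
    return reasons

# One step back per phase; a stricter policy could send everything to shadow.
FALLBACK = {"autonomous": "assist", "assist": "shadow", "shadow": "shadow"}

def next_phase(current_phase, reasons):
    return FALLBACK[current_phase] if reasons else current_phase

# Illustrative check after a model update.
reasons = rollback_reasons(
    agreement=0.93, correction_rate=0.04, critical_incident=False,
    agreement_floor=0.95, correction_ceiling=0.10,
)
phase = next_phase("autonomous", reasons)
```

Returning the reasons, rather than a bare boolean, gives the team a record of why a field moved back when they review the rollback later.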

## Sequence the technical cutover field by field

Even within a single workflow, don't migrate all fields at once. Order them by risk and by how cleanly the agent matched humans in shadow mode, and promote them through assist and autonomy on independent schedules. A high-agreement administrative field can be running autonomously while the principal diagnosis is still in assist mode and a thorny comorbidity field is back in shadow. This staggering keeps each promotion small and reversible, and it means a problem with one field never forces you to roll the whole workflow back.

Keep a per-field status board so everyone — engineers, abstractors, and compliance — can see exactly which fields are in which phase and what each one's current agreement metric is. Ambiguity about what is and isn't automated is its own risk: a reviewer who assumes a field is human-confirmed when it's actually autonomous, or vice versa, makes mistakes the technology didn't cause. Explicit, visible state per field is what lets a phased migration stay legible as it grows.
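
A status board does not need to be elaborate; a plain-text table generated from the per-field state is enough to start with. The `FieldStatus` record, the field names, and the numbers below are illustrative.

```python
from dataclasses import dataclass

@dataclass
class FieldStatus:
    name: str
    phase: str        # "shadow", "assist", or "autonomous"
    agreement: float  # latest agreement with humans, between 0 and 1

def render_board(statuses):
    """Render one line per field, sorted by name, under a header row."""
    lines = [f"{'field':<24}{'phase':<12}{'agreement':>9}"]
    for status in sorted(statuses, key=lambda s: s.name):
        lines.append(f"{status.name:<24}{status.phase:<12}{status.agreement:>9.1%}")
    return "\n".join(lines)

board = render_board([
    FieldStatus("principal_dx", "assist", 0.962),
    FieldStatus("admission_source", "autonomous", 0.987),
    FieldStatus("comorbidity_flags", "shadow", 0.881),
])
print(board)
```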

## Communicate the change to the people doing the work

A migration is as much organizational as technical. Abstractors understandably worry the agent is there to replace them; framing matters. In practice the durable model is the agent doing the first-pass labor and the abstractor doing review, adjudication, and the hard cases — which raises throughput without removing the human judgment regulators and clinicians expect. Involve the abstractors in building the gold set and reviewing disagreements; they become the experts who make the agent good, and the migration goes far smoother when they own it rather than fear it.

## Frequently asked questions

### How long should shadow mode run?

Long enough to see the real distribution of charts and to hit your agreement target sustainably — typically until the agent matches or beats the measured human baseline on consequential fields across a representative volume, not just a good week. Resist the urge to cut it short because early numbers look promising.

### Do I need to keep human abstractors after migrating?

Yes, in a changed role. Even with autonomy on routine fields, you want humans for review of consequential fields, adjudication of edge cases, and the ability to roll back to full human abstraction. The role shifts from first-pass labor to oversight and hard cases rather than disappearing.

### What's the safest first field to automate?

A high-volume, low-consequence field where the agent showed near-perfect agreement in shadow and assist modes — something easily corrected downstream if wrong. Earn trust on the easy, safe fields before touching anything that feeds a regulatory or clinical decision.

### How do I handle a guideline change after migrating?

Treat it as a change that re-enters the eval loop. Update the agent's instructions and gold labels for the new guideline, re-run shadow or assist mode for affected fields, and only restore autonomy once the metrics clear the bar again under the new rules.

## Migrate your phone lines the same way

CallSphere rolls **voice and chat** agents into live operations with the same caution — shadow runs, human-in-the-loop, and fast rollback — so the handoff never drops a call. See how at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

