Build a Claude Cowork Sales Agent: Step-by-Step Guide
A buildable walkthrough: scaffold skills, wire a read-only CRM connector, run sub-agents, add a write gate, and roll out writes for a 4,000-account book.
The architecture post in this series stayed at the level of boxes and arrows. This one gets your hands dirty. If you are an engineer who has been handed a 4,000-account sales book and told to "make the AI work it," here is a buildable sequence: what to scaffold first, what to wire next, and how to get to a safe daily run without setting fire to anyone's inbox. I'll assume you have Claude Cowork with a plugin you can edit, and a CRM you can reach over an API.
The guiding principle throughout is to build the safety rails before the firepower. It is genuinely easy to get Claude drafting outreach on day one — and genuinely dangerous to let it send anything before the gate, dedup, and review queue exist. So we build inert, then add the ability to act last.
Step 1: Scaffold the plugin and its skills
Start by creating the plugin's skill folders. A skill in Claude Cowork is a folder containing a SKILL file of instructions plus any scripts or reference files Claude loads when the work is relevant. For a sales book you want three skills at minimum: an account-research skill (how to assemble a fresh picture of one account), a prioritization skill (how to score and segment the book), and a drafting skill (how to write a recommended next action and message). Keep each SKILL file short and imperative — a checklist, not an essay.
Resist the urge to put everything in one giant skill. The reason skills are folders that load on demand is so the model only ever reads the instructions relevant to the task in front of it. A bloated single skill defeats that and pollutes every sub-agent's context. Write the three skills as if three different specialists wrote them, because effectively three different sub-agents will use them.
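As a rough illustration of the three-skill layout, the scaffold could be generated with a few lines of Python. This is a minimal sketch, not a definitive layout: the folder names, the checklist wording, and the `SKILL.md` file name are assumptions made for the example (the text above only says "a SKILL file"), so check them against your plugin before relying on them.

```python
from pathlib import Path

# Hypothetical skill names and checklist steps; adjust to your own plugin.
SKILLS = {
    "account-research": [
        "Fetch the freshest account record by ID.",
        "Assemble recent activity.",
        "Summarize what changed since the last touch.",
    ],
    "prioritization": [
        "Score each account from days-since-touch, inbound signals, and stage.",
        "Return the top slice with a one-line reason per account.",
    ],
    "drafting": [
        "Recommend one next action with a rationale.",
        "Draft a message only if outreach is warranted.",
    ],
}


def scaffold_skills(root: Path) -> list[Path]:
    """Create one folder per skill, each holding a short imperative checklist."""
    skill_files = []
    for name, steps in SKILLS.items():
        skill_dir = root / name
        skill_dir.mkdir(parents=True, exist_ok=True)
        skill_file = skill_dir / "SKILL.md"  # assumed file name
        if not skill_file.exists():  # never overwrite hand-edited instructions
            checklist = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
            skill_file.write_text(f"# {name}\n\n{checklist}\n", encoding="utf-8")
        skill_files.append(skill_file)
    return skill_files
```

Calling `scaffold_skills(Path("skills"))` would leave three small folders, each with a checklist short enough to read in one glance, which is the property the section argues for.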
Step 2: Wire the CRM connector read-only first
Connect your CRM as an MCP connector, but expose read operations only to begin with. You want list_accounts, get_account, and get_recent_activity available before any write tool exists. This lets you build and test the entire research-and-recommend pipeline while the system is physically incapable of changing anything. Run your prioritization skill against the live book and inspect the scores. If the ranking looks wrong, you fix it now — cheaply, with nothing at stake.
flowchart TD
A["Engineer: run daily job"] --> B["Orchestrator reads book (read-only MCP)"]
B --> C["Score & pick working set"]
C --> D["Spawn N research sub-agents"]
D --> E["Each: get_account + get_recent_activity"]
E --> F["Drafting skill produces recommendation"]
F --> G{"Write gate checks"}
G -->|Fail| H["Review queue"]
G -->|Pass| I["Enable write tools later"]
The diagram shows the order of construction as much as the order of execution: everything upstream of the write gate works with read-only access. You can run the full loop end to end and write drafts to a file or a review queue long before you grant a single write scope. That is the safe path to a 4,000-account book.
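One way to make "read-only first" hard to violate by accident is an allowlist in front of whatever function actually invokes connector tools. The sketch below assumes a hypothetical `call_tool(name, args)` callable standing in for your MCP client; it is not a real client API. The stronger guarantee is still to leave write tools unregistered on the connector itself, with this wrapper as a second line of defense.

```python
from typing import Any, Callable, Optional

# The three read tools named in Step 2; nothing else is callable yet.
READ_ONLY_TOOLS = frozenset({"list_accounts", "get_account", "get_recent_activity"})


class ReadOnlyConnector:
    """Wraps a tool-calling function and refuses anything off the allowlist."""

    def __init__(self, call_tool: Callable[[str, dict], Any]) -> None:
        # call_tool is a hypothetical stand-in for your MCP client's invoke function.
        self._call_tool = call_tool

    def call(self, tool: str, args: Optional[dict] = None) -> Any:
        if tool not in READ_ONLY_TOOLS:
            raise PermissionError(f"tool {tool!r} is not in the read-only allowlist")
        return self._call_tool(tool, args or {})
```

With this in place, a stray call to a write tool fails loudly in testing instead of silently changing a record.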
Step 3: Implement the orchestrator's working-set selection
The orchestrator should never iterate over all 4,000 accounts with expensive per-account reasoning. Implement a cheap scoring pass first. Pull the lightweight fields for the whole book in one or two paginated calls, compute a score per account from days-since-touch, recent inbound signals, and stage, and take the top slice — start with 100 to 200. Hand only that slice to sub-agents. Everything else waits; its score will rise on its own as time passes or a new signal lands.
Concretely, the orchestrator's output at this step is just a list of account IDs and a one-line reason each was chosen ("45 days no touch + inbound page visit"). Log those reasons. They are your debugging surface: when a stakeholder asks why account X got worked and account Y didn't, you have an answer, and when the selection looks off you can tune the score weights without touching anything downstream.
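The cheap scoring pass might look something like the following. Treat it as a sketch under stated assumptions: the `AccountSummary` fields, the stage names, and every weight are invented for illustration and would need tuning against your own book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountSummary:
    """Lightweight fields only; names are illustrative, not a real CRM schema."""
    account_id: str
    days_since_touch: int
    inbound_signals: int
    stage: str


# Hypothetical stage weights; unknown stages fall back to a low default.
STAGE_WEIGHT = {"prospect": 1.0, "evaluation": 2.0, "negotiation": 3.0}


def score_account(a: AccountSummary) -> tuple[float, str]:
    """Return (score, one-line reason). All weights here are assumptions."""
    touch = min(a.days_since_touch, 90) / 90 * 50   # staleness, capped at 90 days
    signal = min(a.inbound_signals, 3) * 15         # recent inbound interest
    stage = STAGE_WEIGHT.get(a.stage, 0.5) * 5      # later stages matter more
    reason = (
        f"{a.days_since_touch} days no touch + "
        f"{a.inbound_signals} inbound signal(s); stage={a.stage}"
    )
    return touch + signal + stage, reason


def pick_working_set(book: list[AccountSummary], limit: int = 150) -> list[tuple[str, str]]:
    """Score the whole book cheaply and return the top slice as (id, reason)."""
    scored = [(a.account_id, *score_account(a)) for a in book]
    scored.sort(key=lambda row: row[1], reverse=True)
    return [(account_id, reason) for account_id, _, reason in scored[:limit]]
```

Because the reason string is built in the same function as the score, the logged explanation cannot drift away from the numbers that actually drove the selection.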
Step 4: Run research sub-agents with a tight per-account brief
For each account in the working set, spawn a sub-agent whose entire brief is one account. Its instructions: read the freshest record now (do not trust anything the orchestrator handed you beyond the ID), assemble recent activity, apply the research skill, and produce a structured recommendation — recommended action, rationale, and a draft message if outreach is warranted. The sub-agent never sends and never writes; it returns a structured object to the orchestrator.
Cap concurrency to something modest, like 15 to 25 in flight. The model is rarely your bottleneck at this scale; your CRM and enrichment API rate limits are. Batch the working set so you respect those limits, and have each sub-agent fetch its own data at start time so its view is current even if the run takes a while. If a sub-agent errors, retry once, then emit a failure record rather than crashing the batch.
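The concurrency cap and the retry-once rule can be sketched with `asyncio`. Here `research_one` is a hypothetical coroutine standing in for "spawn one research sub-agent and await its structured result"; how you actually launch a sub-agent depends on your setup and is not shown.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_batch(
    account_ids: list[str],
    research_one: Callable[[str], Awaitable[Any]],  # hypothetical sub-agent launcher
    max_in_flight: int = 20,
) -> list[dict]:
    """Research every account with at most max_in_flight running at once."""
    sem = asyncio.Semaphore(max_in_flight)

    async def worker(account_id: str) -> dict:
        async with sem:  # blocks while max_in_flight workers are already running
            last_error = ""
            for _attempt in range(2):  # first try plus exactly one retry
                try:
                    result = await research_one(account_id)
                    return {"account_id": account_id, "status": "ok",
                            "recommendation": result}
                except Exception as exc:
                    last_error = repr(exc)
            # Emit a failure record instead of crashing the whole batch.
            return {"account_id": account_id, "status": "failed",
                    "error": last_error}

    # gather preserves input order, so results line up with account_ids.
    return await asyncio.gather(*(worker(a) for a in account_ids))
```

Usage would be along the lines of `results = asyncio.run(run_batch(ids, research_one, max_in_flight=15))`. The semaphore limits concurrency but not request rate, so if your CRM enforces per-second limits you would still need a rate limiter inside `research_one`.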
Step 5: Build the write gate before any write tool exists
Now implement the deterministic gate that every recommendation must pass. Write it as plain code, not a model call: reject if a live human email thread exists on the account; reject if the draft references a fact not present in the record; reject if another in-flight recommendation targets the same contact; reject if the action contradicts the account stage. Passing recommendations go to an "approved" list; failing ones go to a review queue with the failing rule attached.
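The four rules above translate fairly directly into plain code. The sketch below makes several assumptions that are not in the text: the field names, the `ALLOWED_ACTIONS` table, and, most importantly, that each sub-agent returns the facts its draft relies on as a structured `cited_facts` list, since checking free-text claims deterministically is not practical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    account_id: str
    contact_id: str
    action: str
    cited_facts: list[str]  # assumed: sub-agent lists the facts its draft uses


@dataclass
class AccountRecord:
    stage: str
    has_live_human_thread: bool
    facts: set[str]


# Hypothetical mapping of account stage to the actions that make sense in it.
ALLOWED_ACTIONS = {
    "prospect": {"send_intro", "log_note"},
    "negotiation": {"schedule_call", "log_note"},
    "closed_lost": {"log_note"},
}


def check(rec: Recommendation, record: AccountRecord, in_flight: set[str]) -> list[str]:
    """Return the names of every failed rule; an empty list means pass."""
    failures = []
    if record.has_live_human_thread:
        failures.append("live_human_thread")
    if any(fact not in record.facts for fact in rec.cited_facts):
        failures.append("unsupported_fact")
    if rec.contact_id in in_flight:
        failures.append("duplicate_contact")
    if rec.action not in ALLOWED_ACTIONS.get(record.stage, set()):
        failures.append("stage_mismatch")
    return failures


def route(recs: list[Recommendation], records: dict[str, AccountRecord]):
    """Split recommendations into (approved, review_queue_with_failed_rules)."""
    approved, review, in_flight = [], [], set()
    for rec in recs:
        failures = check(rec, records[rec.account_id], in_flight)
        if failures:
            review.append((rec, failures))
        else:
            approved.append(rec)
            in_flight.add(rec.contact_id)  # later recs for this contact now fail
    return approved, review
```

In this sketch only approved recommendations count as in flight. Whether items waiting in the review queue should also block a contact is a policy choice left open here.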
At this point run the whole pipeline daily and read the output by hand for a week. You will catch the model's worst habits — overclaiming, stale references, near-duplicate outreach — entirely on paper, with zero real-world consequences. Every miss you find becomes a new gate rule. This unglamorous week is what earns you the right to let the system act.
Step 6: Grant write scopes and go live, narrowly
Only now do you add write tools to the MCP connector — and add them narrowly. Start with the lowest-stakes write: logging an activity note or updating a task, not sending email. Let the system run for a few days writing only notes and tasks, with the gate enforced, and confirm the CRM stays clean. Then add the ability to draft-into-the-rep's-outbox (not auto-send) so a human still clicks send. Auto-send, if you ever enable it, comes last and only on the segments you trust most.
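The staged rollout can be encoded as an explicit, ordered setting so that enabling a riskier write is a one-line, reviewable change. The stage names mirror the sequence described above, while the write tool names and the notion of a "trusted segment" label are hypothetical placeholders for whatever your connector and CRM actually expose.

```python
from enum import IntEnum


class RolloutStage(IntEnum):
    READ_ONLY = 0
    NOTES_AND_TASKS = 1
    DRAFT_TO_OUTBOX = 2
    AUTO_SEND = 3


# Hypothetical write tool names mapped to the earliest stage that permits them.
TOOL_MIN_STAGE = {
    "log_activity_note": RolloutStage.NOTES_AND_TASKS,
    "update_task": RolloutStage.NOTES_AND_TASKS,
    "create_outbox_draft": RolloutStage.DRAFT_TO_OUTBOX,
    "send_email": RolloutStage.AUTO_SEND,
}


def tool_allowed(
    tool: str,
    stage: RolloutStage,
    segment: str,
    trusted_segments: frozenset = frozenset(),
) -> bool:
    """Decide whether a write tool may run at the current rollout stage."""
    required = TOOL_MIN_STAGE.get(tool)
    if required is None:
        return False  # unknown write tools are denied by default
    if stage < required:
        return False
    # Auto-send additionally requires the account's segment to be trusted.
    if required == RolloutStage.AUTO_SEND and segment not in trusted_segments:
        return False
    return True
```

Rolling back is then a matter of lowering `stage` by one, which keeps each step reversible.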
This staged rollout is the difference between a tool your sales team trusts and one they quietly disable. Each stage is reversible, observable, and scoped. By the time the system is sending anything autonomously, you have weeks of evidence that its judgment and your gate are sound across the real 4,000-account book.
Frequently asked questions
How long does it take to get a safe daily run working?
The read-only pipeline — scaffold, connector, orchestrator, sub-agents, gate — is a few days of focused work. The real time goes into the week of reading output by hand before granting write scopes. Don't shortcut that week.
What's the smallest viable first version?
Read-only CRM access, a single prioritization skill, one research sub-agent type, and a gate that writes recommendations to a file. That alone delivers a ranked daily action list with rationale and drafts, and because nothing in it can write to the CRM or contact a prospect, it can run against the live book safely.
Should the orchestrator and sub-agents use the same model?
No. Use a faster, cheaper model for the orchestrator's whole-book scoring pass and a more capable model for the per-account research and drafting where judgment matters. Matching model to task is most of your cost control.
How do I test without touching real prospects?
Keep the system read-only and route every recommendation to a review queue until you trust it. Never validate sends against real prospect addresses. During rollout, have the system draft into the rep's outbox so that a human still clicks send.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.