Build a Claude Code GTM Workflow: Step by Step
A follow-along guide to building a nightly Claude Code lead-enrichment workflow: scaffold, wire CRM tools, score, guardrail, and ship safely.
The fastest way to understand a Claude Code go-to-market (GTM) system is to build one. This post is a concrete, follow-along walkthrough: by the end you'll have a project that takes raw inbound leads, enriches and scores them, drafts outreach, and writes results back to your CRM, runnable nightly and safe to point at real data. I'll assume you're comfortable in a terminal and have a Postgres database and a CRM with an API. Everything else we build as we go.
I'm deliberately not hand-waving the boring parts. The difference between a demo and a workflow you trust on Monday morning is in the schemas, the idempotency, and the guardrails — so we'll spend real time there.
Step 1: Scaffold the project and define the contract
Start by creating a project directory and a single source-of-truth file that describes what the workflow does. In Claude Code, project memory and a clear specification matter more than clever prompting. Create a CLAUDE.md at the repo root stating the goal ("enrich and score new inbound leads nightly"), the data sources, the scoring rubric, and the hard rules (never email anyone, never write a lead with confidence below 0.7 without flagging). This file becomes shared context every run loads.
Next, define the data contract before any code. Write a JSON schema for an EnrichedLead: the fields you require (domain, industry, employee_count, icp_score, confidence, rationale) and their types. Having this contract first means every tool and every model output can be validated against one shape, and it's the anchor the rest of the build hangs on.
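Here is one way that contract might look, sketched in Python. The field names come from the list above; the types, the 0 to 100 range for icp_score, and the 0 to 1 range for confidence are assumptions. To keep the sketch dependency-free, validate_lead (a hypothetical helper) checks only a small subset of JSON Schema keywords.

```python
from typing import Any

# Hypothetical EnrichedLead contract. Field names come from the post;
# the types and numeric bounds are assumptions to adjust for your rubric.
ENRICHED_LEAD_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["domain", "industry", "employee_count",
                 "icp_score", "confidence", "rationale"],
    "properties": {
        "domain": {"type": "string", "minLength": 1},
        "industry": {"type": "string", "minLength": 1},
        "employee_count": {"type": "integer", "minimum": 0},
        "icp_score": {"type": "number", "minimum": 0, "maximum": 100},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "rationale": {"type": "string", "minLength": 1},
    },
    "additionalProperties": False,
}

# Map JSON Schema type names to Python types for the small checker below.
_PY_TYPES = {"string": str, "integer": int, "number": (int, float)}


def validate_lead(record: dict[str, Any],
                  schema: dict[str, Any] = ENRICHED_LEAD_SCHEMA) -> list[str]:
    """Return error messages; an empty list means the record fits the contract.

    Checks only a subset of JSON Schema (required, type, minLength,
    minimum, maximum, additionalProperties) using the standard library.
    """
    errors: list[str] = []
    props = schema["properties"]
    for field in schema["required"]:
        if field not in record:
            errors.append(f"missing field: {field}")
    if not schema.get("additionalProperties", True):
        errors.extend(f"unexpected field: {f}" for f in record if f not in props)
    for field, rules in props.items():
        if field not in record:
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[rules["type"]]):
            errors.append(f"{field}: expected {rules['type']}")
            continue
        if "minLength" in rules and len(value) < rules["minLength"]:
            errors.append(f"{field}: shorter than {rules['minLength']}")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{field}: below {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{field}: above {rules['maximum']}")
    return errors
```

In a real project you would likely hand the same schema dict to the jsonschema package (jsonschema.validate(instance, schema)) rather than maintain a checker yourself; the point is that one schema gates every model output.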
Step 2: Wire the CRM and warehouse as tools
Claude Code acts through tools, so the next job is exposing your systems. The cleanest path in 2026 is an MCP server per system. For the CRM, expose three narrow tools: get_new_leads(since), upsert_lead(record), and create_review_task(lead_id, reason). Keep them small and typed; a tool that does one verifiable thing is far easier for the model to use correctly than a sprawling "do everything" endpoint.
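A minimal sketch of those three tools as typed Python, assuming a simple Lead shape of your own design. CrmTools and InMemoryCrm are hypothetical names; the in-memory fake is the kind of thing you would test against or use in a dry run. Registering the real implementations with an MCP server (for example through the official MCP Python SDK) is left out here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Protocol


@dataclass(frozen=True)
class Lead:
    """Hypothetical raw inbound lead, as get_new_leads might return it."""
    lead_id: str
    email: str
    created_at: datetime


class CrmTools(Protocol):
    """The three narrow tools; each does one thing you can verify."""

    def get_new_leads(self, since: datetime) -> list[Lead]: ...

    def upsert_lead(self, record: dict[str, Any]) -> str: ...

    def create_review_task(self, lead_id: str, reason: str) -> str: ...


@dataclass
class InMemoryCrm:
    """Fake CRM for tests and dry runs; a real version would call the CRM's API."""
    leads: list[Lead] = field(default_factory=list)
    records: dict[str, dict[str, Any]] = field(default_factory=dict)
    review_tasks: list[tuple[str, str]] = field(default_factory=list)

    def get_new_leads(self, since: datetime) -> list[Lead]:
        # Strictly newer than `since`, so the previous run's last lead is not refetched.
        return [lead for lead in self.leads if lead.created_at > since]

    def upsert_lead(self, record: dict[str, Any]) -> str:
        # Keyed on domain: a second write for the same domain replaces the first.
        key = record["domain"]
        self.records[key] = record
        return key

    def create_review_task(self, lead_id: str, reason: str) -> str:
        if not reason:
            raise ValueError("a review task needs a reason a human can act on")
        self.review_tasks.append((lead_id, reason))
        return f"task-{len(self.review_tasks)}"


if __name__ == "__main__":
    crm: CrmTools = InMemoryCrm(
        leads=[Lead("L1", "sam@acme.com", datetime(2026, 1, 2, tzinfo=timezone.utc))]
    )
    print(crm.get_new_leads(since=datetime(2026, 1, 1, tzinfo=timezone.utc)))
```

Because each method takes and returns plain, typed values, a bad call fails loudly at the boundary instead of half-succeeding inside a do-everything endpoint.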
The flow from a single new lead to a written record looks like the diagram below. Notice that the model never writes directly — it proposes a record, validation runs, and only then does an idempotent upsert touch the CRM.
```mermaid
flowchart TD
A["Cron triggers nightly run"] --> B["get_new_leads(since=last_run)"]
B --> C["For each lead: enrichment subagent"]
C --> D["Call enrichment MCP + read website"]
D --> E["Compute icp_score via scoring tool"]
E --> F{"Validate against EnrichedLead schema?"}
F -->|Invalid or low confidence| G["create_review_task"]
F -->|Valid| H["upsert_lead (idempotent by domain)"]
H --> I["Draft outreach & record run log"]
```

Idempotency is the detail that lets you re-run safely. Make upsert_lead key on a stable identifier, such as the email domain or CRM ID, so running the workflow twice produces one record, not two. Without this, a single retry doubles your pipeline and the revenue team stops trusting the system within a week.
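The sketch below shows the idea with a hypothetical IdempotentStore standing in for the CRM write path. The normalization rules in domain_key (lowercasing, taking the part after "@", dropping a leading "www.") are assumptions about what should count as the same company.

```python
from typing import Any


def domain_key(email_or_domain: str) -> str:
    """Derive a stable upsert key from an email address or a bare domain.

    Normalizing case and a leading "www." keeps 'Sam@Acme.com' and
    'www.acme.com' on the same record.
    """
    domain = email_or_domain.strip().lower().rsplit("@", 1)[-1]
    return domain.removeprefix("www.")


class IdempotentStore:
    """Stand-in for the CRM write path: one record per key, however often it runs."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, Any]] = {}
        self.writes = 0

    def upsert_lead(self, record: dict[str, Any]) -> str:
        key = domain_key(record["domain"])
        # Merge onto any existing record instead of appending a new row.
        self._records[key] = {**self._records.get(key, {}), **record, "domain": key}
        self.writes += 1
        return key

    def count(self) -> int:
        return len(self._records)


if __name__ == "__main__":
    store = IdempotentStore()
    batch = [{"domain": "Acme.com", "icp_score": 72},
             {"domain": "www.acme.com", "icp_score": 72}]
    for _ in range(2):  # simulate a retry of the whole nightly batch
        for rec in batch:
            store.upsert_lead(rec)
    print(store.count(), store.writes)  # 1 record despite 4 writes
```

Against Postgres, the same guarantee usually comes from a unique constraint on the key column plus INSERT ... ON CONFLICT (domain) DO UPDATE, so the database enforces it even if a caller forgets.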
Step 3: Make scoring deterministic
Resist the urge to let Claude "just score" each lead in prose. Instead, write the scoring as a small function exposed as a tool: it takes the enriched fields and returns a number plus a short rationale. For example, it might weight industry match, company size band, and observed buying signals. Because it's code, it's testable and reproducible: two runs on the same inputs give the same score.
What Claude does handle is the part code can't: reading a company's homepage to infer what they actually sell, reconciling an enrichment vendor that says "50 employees" against a LinkedIn signal that says "200", and writing the one-line rationale a human will read. The model fills the schema fields that require judgment; the deterministic tool turns those fields into a score. Keep that division crisp and your scoring stays auditable.
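A minimal sketch of such a scoring tool. The target industries, size bands, weights, and the cap of three signals are all placeholder assumptions, not a recommended rubric; score_lead and Score are hypothetical names. What matters is that it is a pure function with no model call inside.

```python
from dataclasses import dataclass

# Hypothetical rubric: every industry, band, and weight here is an assumption.
TARGET_INDUSTRIES = {"saas", "fintech", "healthtech"}
SIZE_BANDS = [(1, 10, 0.2), (11, 50, 0.6), (51, 500, 1.0), (501, 5000, 0.7)]
WEIGHTS = {"industry": 40, "size": 35, "signals": 25}  # sums to 100
MAX_SIGNALS = 3  # signals beyond this add nothing more


@dataclass(frozen=True)
class Score:
    icp_score: float
    rationale: str


def size_fit(employee_count: int) -> float:
    """Fit in [0, 1] for the band the headcount falls into."""
    for low, high, fit in SIZE_BANDS:
        if low <= employee_count <= high:
            return fit
    return 0.0  # outside every band (including 0 or more than 5000)


def score_lead(industry: str, employee_count: int, buying_signals: list[str]) -> Score:
    """Pure function: the same inputs always produce the same 0-100 score."""
    industry_fit = 1.0 if industry.lower() in TARGET_INDUSTRIES else 0.0
    band_fit = size_fit(employee_count)
    signal_fit = min(len(buying_signals), MAX_SIGNALS) / MAX_SIGNALS
    total = (WEIGHTS["industry"] * industry_fit
             + WEIGHTS["size"] * band_fit
             + WEIGHTS["signals"] * signal_fit)
    rationale = (f"industry {'match' if industry_fit else 'no match'}, "
                 f"size fit {band_fit:.1f}, {len(buying_signals)} signal(s)")
    return Score(round(total, 1), rationale)
```

Claude supplies industry, the reconciled employee_count, and the list of observed signals; score_lead turns them into the number, and a unit test can pin a known input to a known score.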
Step 4: Add the guardrails
Now make it safe to point at production. First, a confidence gate: any lead the model isn't sure about — conflicting signals, missing domain, ambiguous industry — gets routed to create_review_task instead of an upsert. Second, a dry-run mode controlled by an environment flag, so your first real executions write to a staging table and you can diff the output before going live.
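Here is a sketch of both guardrails together. The 0.7 threshold is the hard rule from the CLAUDE.md in Step 1; the GTM_DRY_RUN variable name, the fields treated as required, and the choice to default to dry run when the flag is unset are assumptions. The upsert, stage, and review callables stand in for the real tools.

```python
import os
from typing import Any, Callable, Mapping, Optional

CONFIDENCE_THRESHOLD = 0.7  # hard rule from CLAUDE.md (Step 1)
REQUIRED_FOR_UPSERT = ("domain", "industry")  # assumption: never upsert without these


def is_dry_run(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the hypothetical GTM_DRY_RUN flag.

    Only an explicit "0" or "false" turns dry run off, so a missing flag is the safe case.
    """
    source = os.environ if env is None else env
    return source.get("GTM_DRY_RUN", "1").strip().lower() not in {"0", "false"}


def route(record: dict[str, Any]) -> tuple[str, str]:
    """Return ("upsert" or "review", reason) for one proposed EnrichedLead."""
    missing = [f for f in REQUIRED_FOR_UPSERT if not record.get(f)]
    if missing:
        return "review", "missing " + ", ".join(missing)
    confidence = float(record.get("confidence") or 0.0)
    if confidence < CONFIDENCE_THRESHOLD:
        return "review", f"confidence {confidence:.2f} below {CONFIDENCE_THRESHOLD}"
    return "upsert", "passed gate"


def dispatch(record: dict[str, Any],
             upsert: Callable[[dict[str, Any]], None],
             stage: Callable[[dict[str, Any]], None],
             review: Callable[[str, str], None],
             dry_run: bool) -> str:
    """Send the record to review, to staging (dry run), or to the live upsert."""
    action, reason = route(record)
    if action == "review":
        review(record.get("lead_id", "unknown"), reason)
        return "review"
    if dry_run:
        stage(record)  # e.g. a staging table you diff before going live
        return "staged"
    upsert(record)
    return "upserted"
```

Conflicting signals and an ambiguous industry are expected to surface as low model-reported confidence, so the gate itself only has to compare one number and check for missing fields.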
Third, and non-negotiable for GTM: nothing leaves the building automatically on day one. The workflow drafts outreach into a queue; a human approves before anything sends. You can relax this later for low-risk segments, but starting with a human in the loop is how you catch the model confidently emailing a competitor or a churned customer. A hook that blocks any external-send tool unless an APPROVED flag is set enforces this at the system level rather than relying on the prompt.
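A sketch of such a hook, assuming Claude Code's PreToolUse hook convention: the hook command receives the pending tool call as JSON on stdin, and exiting with code 2 blocks the call and feeds stderr back to Claude. The send tool names, the draft_id argument, and the approvals.json file that a human approval step writes are all hypothetical.

```python
import json
import sys
from pathlib import Path

# Hypothetical names: the send tools your MCP servers expose, and the file
# a human approval step writes. Neither comes from Claude Code itself.
SEND_TOOLS = {"mcp__crm__send_email", "mcp__outreach__send_sequence"}
APPROVALS_FILE = Path("approvals.json")  # e.g. {"approved_draft_ids": ["d-42"]}


def decide(event: dict, approved_ids: set[str]) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 asks Claude Code to block the call."""
    if event.get("tool_name") not in SEND_TOOLS:
        return 0, ""  # not an external send, so let it through
    draft_id = (event.get("tool_input") or {}).get("draft_id")
    if draft_id in approved_ids:
        return 0, ""
    return 2, (f"Blocked: draft {draft_id!r} has no human APPROVED flag. "
               "Queue it for review instead.")


def load_approved() -> set[str]:
    if not APPROVALS_FILE.exists():
        return set()  # no file means nothing is approved
    data = json.loads(APPROVALS_FILE.read_text())
    return set(data.get("approved_draft_ids", []))


def main() -> None:
    event = json.load(sys.stdin)  # the pending tool call, as JSON
    code, message = decide(event, load_approved())
    if message:
        print(message, file=sys.stderr)  # shown to Claude when the call is blocked
    sys.exit(code)


if __name__ == "__main__":
    main()
```

You would register the script as a PreToolUse command hook in the project's .claude/settings.json with a matcher covering the send tools; it is worth checking the exact settings shape against the current hooks documentation. Keeping approval outside the tool call's own arguments means the model cannot approve its own send.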
Step 5: Run it, observe it, and iterate
Trigger the first run manually over a handful of leads, not the whole list. Watch the run log: which tools were called, what each subagent decided, where confidence dipped. Claude Code's transparency here is the point — you can read the reasoning trail and find the prompt or schema gap that caused a bad score, then fix it before scaling.
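If the workflow also writes its own structured log, that first-run review becomes a query rather than a scroll. A minimal sketch, assuming a hypothetical JSON-lines format with one entry per tool call or decision; log_event, parse, and summarize are illustrative names, and the 0.7 threshold again comes from Step 1.

```python
import io
import json
from collections import Counter
from typing import Any, Iterable, Optional, TextIO


def log_event(out: TextIO, run_id: str, lead_id: str, tool: str,
              decision: str, confidence: Optional[float] = None) -> None:
    """Append one decision as a JSON line (out can be a log file opened in append mode)."""
    out.write(json.dumps({"run_id": run_id, "lead_id": lead_id, "tool": tool,
                          "decision": decision, "confidence": confidence}) + "\n")


def parse(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Read JSON lines back, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(entries: list[dict[str, Any]], run_id: str,
              threshold: float = 0.7) -> dict[str, Any]:
    """Which tools ran in one run, and which leads dipped below the threshold."""
    run = [e for e in entries if e["run_id"] == run_id]
    dips = sorted({e["lead_id"] for e in run
                   if e["confidence"] is not None and e["confidence"] < threshold})
    return {"tool_calls": dict(Counter(e["tool"] for e in run)),
            "low_confidence": dips}


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for the log file to keep the demo self-contained
    log_event(buf, "r1", "L1", "get_new_leads", "fetched")
    log_event(buf, "r1", "L1", "enrich", "industry=saas", 0.92)
    log_event(buf, "r1", "L2", "enrich", "headcount conflict: 50 vs 200", 0.55)
    print(summarize(parse(buf.getvalue().splitlines()), "r1"))
```

The low_confidence list is where to start reading: each entry points at a lead whose prompt or schema gap is worth fixing before you scale up.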
Once a small batch looks right, widen the input and move the trigger to a nightly cron. Add run metadata to a table so you can answer "how many leads did we process, how many went to review, what was the average confidence" without re-running anything. Treat the first two weeks as a tuning period: adjust the scoring weights, tighten the schema, and lower the review rate as you gain confidence. The workflow that survives is the one you measured into shape, not the one you trusted blindly.
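A sketch of that run metadata table and the summary query. It uses Python's built-in sqlite3 so it runs anywhere; the run_leads table, its columns, and the outcome values are hypothetical, and on Postgres you would swap INSERT OR REPLACE for INSERT ... ON CONFLICT ... DO UPDATE.

```python
import sqlite3
from typing import Any, Optional

# sqlite3 stands in for Postgres so this runs anywhere; names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS run_leads (
    run_id     TEXT NOT NULL,
    lead_id    TEXT NOT NULL,
    outcome    TEXT NOT NULL CHECK (outcome IN ('upserted', 'review', 'staged')),
    confidence REAL,
    PRIMARY KEY (run_id, lead_id)
)
"""

SUMMARY = """
SELECT COUNT(*),
       SUM(CASE WHEN outcome = 'review' THEN 1 ELSE 0 END),
       AVG(confidence)
FROM run_leads
WHERE run_id = ?
"""


def record_outcome(conn: sqlite3.Connection, run_id: str, lead_id: str,
                   outcome: str, confidence: Optional[float]) -> None:
    # One row per (run, lead); recording the same lead twice overwrites it.
    conn.execute(
        "INSERT OR REPLACE INTO run_leads (run_id, lead_id, outcome, confidence) "
        "VALUES (?, ?, ?, ?)",
        (run_id, lead_id, outcome, confidence),
    )


def summarize_run(conn: sqlite3.Connection, run_id: str) -> dict[str, Any]:
    processed, reviewed, avg_conf = conn.execute(SUMMARY, (run_id,)).fetchone()
    reviewed = reviewed or 0  # SUM over zero rows is NULL
    return {
        "processed": processed,
        "sent_to_review": reviewed,
        "review_rate": reviewed / processed if processed else 0.0,
        "avg_confidence": avg_conf,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_outcome(conn, "2026-01-02", "L1", "upserted", 0.91)
    record_outcome(conn, "2026-01-02", "L2", "review", 0.55)
    print(summarize_run(conn, "2026-01-02"))
```

Tracking review_rate per run gives you a concrete number to watch fall during the two-week tuning period.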
Step 6: Hand it off without it rotting
A workflow only counts if it keeps working after you stop watching. Document the contract in CLAUDE.md, pin the model tiers you chose, and write a short runbook: how to re-run a failed night, how to read the review queue, how to roll back a bad batch. Because writes are idempotent and gated, an on-call engineer who has never seen the code can safely re-run a failed job — which is the real test of whether you built a workflow or just a clever script.
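One runbook question, how to re-run a failed night, reduces to choosing the right since for get_new_leads. A minimal sketch, assuming the run metadata table records a status and a watermark (the newest lead timestamp a run covered) for each run; RunRecord and next_since are hypothetical names, and reading last_run in the diagram as "last successful run" is an interpretation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical row from the run metadata table."""
    run_id: str
    status: str          # e.g. "succeeded", "failed", "running"
    watermark: datetime  # newest lead created_at the run processed


def next_since(runs: list[RunRecord], default: datetime) -> datetime:
    """Pick the `since` for get_new_leads: the newest watermark among successful runs.

    Failed or unfinished runs are ignored, so re-running after a bad night
    fetches that night's leads again.
    """
    good = [r.watermark for r in runs if r.status == "succeeded"]
    return max(good) if good else default


if __name__ == "__main__":
    utc = timezone.utc
    history = [
        RunRecord("2026-01-01", "succeeded", datetime(2026, 1, 1, 23, 50, tzinfo=utc)),
        RunRecord("2026-01-02", "failed", datetime(2026, 1, 2, 23, 40, tzinfo=utc)),
    ]
    # The failed night is retried from the last good watermark, not its own.
    print(next_since(history, default=datetime(2025, 1, 1, tzinfo=utc)))
```

Because the window starts from the last good watermark, a re-run overlaps leads the failed night may have partly written, and the idempotent upsert is what makes that overlap safe.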
Frequently asked questions
Do I need MCP servers, or can I use local scripts?
Both work. Local scripts exposed as tools are fine for systems you fully control and run on the same host. MCP servers are the better choice when you want a reusable, auth-bounded interface to an external system like a CRM, because the schema and error handling live in one place every agent shares.
How do I keep a nightly run from creating duplicate leads?
Make your write tool an upsert keyed on a stable identifier such as email domain or CRM record ID. Idempotent writes mean a retry or an overlapping run converges to a single record instead of doubling your pipeline.
Should Claude score the leads directly?
No. Compute the numeric score in a small deterministic tool so it's testable and reproducible. Let Claude fill the judgment fields — inferred industry, reconciled headcount, the rationale — and feed those into the scoring function.
How do I avoid the agent emailing the wrong people?
Keep a human in the loop for external sends at first: the workflow drafts into an approval queue, and a hook blocks any send tool unless an approval flag is set. Relax this only for low-risk segments once you've watched it behave.
Bringing agentic AI to your phone lines
The same build pattern — typed tools, idempotent writes, human approval gates — powers CallSphere's voice and chat agents, which answer every call and message, use tools mid-conversation, and book work 24/7. See a live version at callsphere.ai.