Claude Cowork Patterns for Prompts, Tools, and Context
Reusable patterns for a Claude Cowork sales book: single-job skills, loud idempotent tools, budgeted context, structured handoffs, and evals as regression tests.
Once your Claude Cowork sales system runs end to end, the next problem is keeping it maintainable. A 4,000-account book is not a one-shot project; it is software you will edit weekly for a year. The teams that succeed treat their skills, tool schemas, and context assembly as a real codebase with patterns and conventions — not as a pile of clever prompts. This post collects the reusable patterns I keep reaching for, the ones that survive contact with a book that keeps changing under you.
I'll organize these around three surfaces you control: how you write the prompts and skills, how you shape the tools the model sees, and how you assemble context for each sub-agent. Get these three right and most of the system's reliability follows. Get them wrong and you'll be debugging the same hallucination three different ways forever.
Pattern 1: One skill, one job, one verb
The strongest structural pattern is ruthless single-responsibility for skills. Each skill should do one thing and be nameable with a verb: score-account, research-account, draft-outreach, check-hygiene. When a skill starts needing the word "and" to describe it, split it. This matters because skills load into context on demand, and a skill that does five things drags five things' worth of instructions into every sub-agent that needs only one of them.
Within a skill, write imperatively and lead with the output contract. The first lines should state exactly what structured object the skill must produce — fields, types, and what each means. Models follow an explicit output contract far more reliably than a buried one, and a sharp contract is also what lets your downstream code parse results deterministically. End the skill with the failure behavior: what to emit when the data is missing rather than inventing it.
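To make the contract concrete from the code side, here is a minimal sketch of a parser for a scoring skill's output. It assumes the skill emits JSON in one of two shapes: a complete score, or an explicit missing-data record. The field names (`account_id`, `score`, `reason`, `status`, `missing`) and the 0-100 scale are my own illustrative choices, not part of any Cowork API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    account_id: str
    score: int   # assumed scale: 0-100
    reason: str


@dataclass(frozen=True)
class MissingData:
    account_id: str
    missing: list[str]   # names of the fields the skill could not find


def parse_skill_output(raw: str) -> Score | MissingData:
    """Parse a skill's JSON output, rejecting anything off-contract."""
    obj = json.loads(raw)
    # The failure shape is part of the contract, not an exception path.
    if obj.get("status") == "missing_data":
        return MissingData(account_id=obj["account_id"], missing=list(obj["missing"]))
    score = obj["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"score out of contract: {score!r}")
    return Score(account_id=obj["account_id"], score=score, reason=obj["reason"])
```

Because the missing-data record is a first-class return type, downstream code has to handle it explicitly and cannot mistake it for a low score.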
Pattern 2: Tools that fail loud and act once
Tool schemas are the model's interface to reality, and two properties make them safe at scale. First, tools should fail loud: return a clear, structured error the model can read and reason about, never a silent empty result that looks like "no data." A model that sees {error: "rate_limited", retry_after: 30} can wait; a model that sees [] assumes the account has no activity and acts on a lie.
flowchart TD
A["Sub-agent needs data"] --> B["Call tool with idempotency key"]
B --> C{"Tool result?"}
C -->|Structured data| D["Apply skill, build recommendation"]
C -->|Loud error| E{"Retryable?"}
E -->|Yes| F["Wait + retry once"] --> B
E -->|No| G["Emit failure record"]
D --> H["Return structured object"]
Second, write tools should act once. Every state-changing tool call carries an idempotency key derived from the account and action, so a retry after a timeout cannot create a duplicate note or send a second email. The diagram above shows the shape: the key rides along on the call, retryable errors loop once, and non-retryable errors become a clean failure record instead of a guess. This is the single most valuable pattern for sleeping soundly while an agent works a large book.
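The retry loop in the diagram can be sketched in a few lines of Python. This is an illustration under assumptions, not a real tool API: the `error`, `retryable`, and `retry_after` fields on the result and the `tool(key)` call signature are hypothetical, and the key derivation (a SHA-256 hash of account and action) is just one reasonable choice.

```python
import hashlib
import time
from typing import Any, Callable

ToolResult = dict[str, Any]


def idempotency_key(account_id: str, action: str) -> str:
    """The same account and action always yield the same key."""
    digest = hashlib.sha256(f"{account_id}|{action}".encode()).hexdigest()
    return digest[:32]


def call_with_one_retry(
    tool: Callable[[str], ToolResult],
    key: str,
    sleep: Callable[[float], None] = time.sleep,
) -> ToolResult:
    """Call a tool; on a retryable error wait and retry once, else fail cleanly."""
    result = tool(key)
    if result.get("error") is None:
        return result
    if result.get("retryable"):
        sleep(result.get("retry_after", 1))
        result = tool(key)  # same key, so the server can deduplicate the write
        if result.get("error") is None:
            return result
    # Non-retryable, or still failing after one retry: emit a failure record.
    return {"status": "failed", "error": result["error"], "key": key}
```

The key only protects you if the tool's backend actually deduplicates on it, so treat that as a requirement on the write side rather than something this wrapper provides by itself.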
Pattern 3: Context as a budget, not a dump
The reusable insight on context is to treat the window as a budget you allocate, not a bucket you fill. For each sub-agent, ask: what is the minimum this agent needs to do this one account well? Usually that is the account's recent activity, the single relevant skill, the tool schemas, and a compact playbook snippet — not the whole CRM history and not the other 3,999 accounts. A definition worth quoting: context engineering is the practice of deliberately selecting the minimal, highest-signal information an agent needs for a task, and excluding everything else to preserve attention and reduce error.
Concretely, build a small context-assembly function that each sub-agent calls at start: it fetches the fresh record, trims activity to the last N relevant events, selects the one skill that matches the task, and stops. Make exclusion a first-class step. The instinct to "give the model everything just in case" is the most common cause of expensive, distracted, error-prone agents on a big book.
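A minimal sketch of such an assembly function follows. It assumes the caller has already fetched the fresh record and its events, and it leaves out tool schemas and the playbook snippet for brevity. The `Event` shape, the `relevant_kinds` filter, and the default of five events are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: str  # ISO 8601, so string order matches time order
    kind: str
    summary: str


def assemble_context(
    record: dict,
    events: list[Event],
    skills: dict[str, str],
    task: str,
    relevant_kinds: set[str],
    max_events: int = 5,
) -> dict:
    """Build the minimal context for one account and one task."""
    # Exclusion is an explicit step: drop irrelevant event kinds first.
    relevant = [e for e in events if e.kind in relevant_kinds]
    # Then keep only the most recent N of what remains.
    recent = sorted(relevant, key=lambda e: e.timestamp)[-max_events:]
    return {
        "record": record,
        "activity": recent,
        "skill": skills[task],  # exactly one skill; KeyError if none matches
    }
```

Note what the function never sees: other accounts. Keeping them out of the signature makes it hard to leak them into the window by accident.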
Pattern 4: Structured handoffs between agents
The orchestrator and sub-agents communicate only through structured objects, never free-form prose. A sub-agent returns something like a recommendation with fields for account ID, action, confidence, rationale, draft, and the facts it relied on. The orchestrator never re-reads a sub-agent's natural-language musings to figure out what happened; it reads fields. This makes the boundary between agents a real API, which means you can test each side independently and change one without breaking the other.
The "facts it relied on" field is quietly powerful. By forcing each recommendation to enumerate the specific record facts that justify it, you give the write gate something concrete to verify against the live record, and you give humans a fast way to audit. A recommendation that can't name its supporting facts is a recommendation the gate should reject — and the structured handoff is what makes that check trivial to implement.
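Here is a rough sketch of that handoff object and the gate check. The fields mirror the ones listed above; representing the facts as a mapping from record field name to the value relied on is my assumption, as are the function name `gate` and the example field names in the usage note.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Recommendation:
    account_id: str
    action: str
    confidence: float
    rationale: str
    draft: str
    # Record field name -> the value the sub-agent relied on.
    facts: dict[str, object] = field(default_factory=dict)


def gate(rec: Recommendation, live_record: dict[str, object]) -> tuple[bool, str]:
    """Approve only if every cited fact still matches the live record."""
    if not rec.facts:
        return False, "no supporting facts"
    for name, claimed in rec.facts.items():
        if name not in live_record:
            return False, f"unknown field: {name}"
        if live_record[name] != claimed:
            return False, f"stale or wrong fact: {name}"
    return True, "ok"
```

For example, a recommendation citing `{"stage": "negotiation"}` passes against a live record whose stage is still `negotiation`, and is rejected the moment the record moves on or the fact was never true.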
Pattern 5: Evals as regression tests for prompts
Treat changes to skills and prompts like code changes: gate them with evals. Keep a fixed set of representative accounts — a few dozen spanning your real segments and edge cases — with known-good expected recommendations. When you edit a skill, re-run it over that set and diff the outputs. A change that improves one segment while quietly breaking another is invisible without this, and on a 4,000-account book a silent regression can mis-handle hundreds of accounts before anyone notices.
Your eval set should grow from real failures. Every time the review queue catches a bad recommendation, add that account to the eval set with the correct expected output. Over months this becomes the most valuable asset in the system — a precise, accumulating specification of what "good" means for your book that no prompt rewrite can accidentally undo.
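The eval loop itself can be very small. The sketch below assumes each case is a dict with an `account` and an `expected` output, and that a skill can be invoked as a plain function from account to output dict; both are simplifications for illustration, since a real run would call the model.

```python
from typing import Callable


def run_evals(skill: Callable[[dict], dict], cases: list[dict]) -> list[dict]:
    """Run a skill over the fixed eval set and return every mismatch."""
    failures = []
    for case in cases:
        actual = skill(case["account"])
        expected = case["expected"]
        # Compare only the fields the case pins down, field by field.
        changed = {
            k: {"expected": expected[k], "actual": actual.get(k)}
            for k in expected
            if actual.get(k) != expected[k]
        }
        if changed:
            failures.append({"account_id": case["account"]["id"], "diff": changed})
    return failures
```

An empty list means the edit is safe to ship against the current set. Adding a review-queue failure is then just appending one more case with the corrected expected output.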
Pattern 6: Make the cheap model do the bulk work
A cost-and-latency pattern that pays off everywhere: push as much volume as possible to a smaller, faster model and reserve the most capable model for genuine judgment. Whole-book scoring, simple field extraction, and hygiene checks run fine on a fast model. Per-account research synthesis and outreach drafting — where nuance matters — earn the capable model. Wiring this routing in as a per-skill model preference, rather than hardcoding one model everywhere, is a small change that compounds across thousands of daily account touches.
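As a sketch, the per-skill preference can be a plain lookup table. The tier names and the model identifiers below are placeholders I made up; substitute whatever model names your deployment actually uses. Falling back to the capable tier for unlisted skills is a design assumption that trades cost for safety.

```python
DEFAULT_TIER = "capable"

# Per-skill preference: bulk work on the fast tier, judgment on the capable tier.
SKILL_MODEL_TIER = {
    "score-account": "fast",
    "check-hygiene": "fast",
    "research-account": "capable",
    "draft-outreach": "capable",
}

# Placeholder identifiers, not real model names.
TIER_TO_MODEL = {
    "fast": "fast-model-placeholder",
    "capable": "capable-model-placeholder",
}


def model_for(skill_name: str) -> str:
    """Resolve the model a skill should run on, defaulting to the capable tier."""
    tier = SKILL_MODEL_TIER.get(skill_name, DEFAULT_TIER)
    return TIER_TO_MODEL[tier]
```

Keeping the table separate from the skills means you can move a skill between tiers, re-run the evals, and compare cost against quality without touching any prompt text.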
Frequently asked questions
How granular should skills be?
One verb, one job. If you need "and" to describe a skill, split it. Granular skills keep each sub-agent's context lean because skills load on demand, and they're far easier to eval and edit in isolation.
Why return structured errors from tools instead of empty results?
An empty result is indistinguishable from "truly no data," so the model acts on a false assumption. A structured error tells the model what went wrong and whether to retry, which is the difference between a safe agent and a confidently wrong one.
What belongs in a sub-agent's context and what doesn't?
Include the fresh account record (trimmed to relevant recent activity), the single matching skill, tool schemas, and a compact playbook snippet. Exclude other accounts, full history, and unrelated skills. Treat the window as a budget you allocate deliberately.
How do I keep prompt edits from causing silent regressions?
Run an eval set of representative accounts with expected outputs and diff every skill change against it. Grow the set from real review-queue failures so it becomes a precise, accumulating spec of correct behavior.
Bringing agentic AI to your phone lines
These patterns — single-job skills, loud-and-idempotent tools, budgeted context, structured handoffs — are exactly how CallSphere keeps its voice and chat agents reliable while they answer every call, use tools mid-conversation, and book work 24/7. See the patterns at work at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.