Prompt and Context Design for Claude Code GTM Agents
How to budget context for Claude Code GTM agents: what to include, what to leave out, just-in-time retrieval, and instruction layering.
Two engineers can wire the same CRM tools to Claude Code and get wildly different results, and the difference is almost never the tools; it's what they put in the context window. Prompt and context design is the most underrated skill in agentic GTM engineering. Pour in too little and the agent guesses; pour in too much and it drowns, latency climbs, and your token bill grows while the answers get worse. This post is about that balance: what belongs in context, what doesn't, and the reasoning behind each call.
I'll be opinionated, because vague advice here is useless. The governing idea is that context is a scarce, expensive resource you budget deliberately, the same way you'd budget memory in a constrained system. Every token you add should earn its place.
Context is a budget, not a bucket
The first mental shift is to stop treating the context window as a place to dump everything that might be relevant. Even with a 1M-token window, more context is not free: it raises cost, slows responses, and — counterintuitively — can lower quality by burying the signal the model needs under noise it has to wade through. The discipline is to ask of every chunk: does the model need this to make the next decision? If not, it stays out, available behind a tool call when needed.
A clean GTM agent context has three tiers. The instruction tier — the goal, the rules, the output schema — is small and stable. The working tier holds the specific account being processed right now. The reference tier is everything else, kept out of context and pulled in just in time. Conflating these is the root of most overload problems: when the rules, twelve accounts, and an entire knowledge base share one window, the model's attention is spread too thin to do any of it well.
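As a rough illustration of keeping the three tiers separate, here is a minimal Python sketch. The `ContextWindow` class, its field names, and the four-characters-per-token estimate are all assumptions made for this example; nothing here is a Claude Code API.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude size estimate: roughly 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


@dataclass
class ContextWindow:
    # Instruction tier: goal, rules, output schema. Small and stable.
    instructions: str
    # Working tier: only the account being processed right now.
    working: str = ""
    # Reference tier: never inlined; only the names of tools that can fetch it.
    reference_tools: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Assemble the prompt text; reference material appears only as tool names."""
        tools = ", ".join(self.reference_tools) or "none"
        return (
            f"{self.instructions}\n\n"
            f"ACCOUNT:\n{self.working}\n\n"
            f"Reference material is available via tools: {tools}"
        )

    def tokens_by_tier(self) -> dict[str, int]:
        """Report estimated size per inlined tier so bloat is visible."""
        return {
            "instruction": estimate_tokens(self.instructions),
            "working": estimate_tokens(self.working),
        }
```

Keeping the tiers as separate fields makes it obvious when something from the reference tier has leaked into the text the model sees on every run.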
What to put in context, and what to leave out
Put in: the task framed precisely, the output schema, the small set of hard rules (never email churned accounts, flag confidence below 0.7), and the single account's relevant data. Leave out: your entire CRM, the full enrichment-vendor docs, every historical interaction across all accounts, and long boilerplate the model doesn't need to reason over. The leave-out list is usually longer than the put-in list, and that asymmetry is the whole point.
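One way to enforce the leave-out list mechanically is an allowlist applied to the account record before it reaches the prompt. This is a sketch under assumptions: the field names in `RELEVANT_FIELDS` and the shape of the CRM record are hypothetical, so substitute whatever your scoring task actually reasons over.

```python
# Hypothetical allowlist: the only account fields the scoring task needs.
RELEVANT_FIELDS = ("domain", "status", "industry", "employee_count", "last_touch")


def working_tier(record: dict) -> dict:
    """Keep only allowlisted fields from a raw CRM record, dropping everything else."""
    return {key: record[key] for key in RELEVANT_FIELDS if key in record}
```

An allowlist fails safe: a new field added to the CRM stays out of context until someone decides it earns its place.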
The decision flow below shows how to route a piece of information — inline, retrieved on demand, or omitted entirely.
flowchart TD
A["Candidate context item"] --> B{"Needed for the next decision?"}
B -->|No| C["Leave out entirely"]
B -->|Yes| D{"Needed every run?"}
D -->|Yes| E["Inline in instruction tier"]
D -->|No| F{"Account-specific?"}
F -->|Yes| G["Retrieve just in time via tool"]
F -->|No| H["Put behind a skill, load on demand"]
This routing is what keeps context lean. Stable rules live inline; account detail is fetched per account; large bodies of knowledge live in skills or behind retrieval tools and load only when relevant. The agent assembles exactly the context each decision needs and nothing more.
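The same decision flow can be written as a small function, which is handy when you want to audit a list of candidate context items. The `Route` enum and the three boolean parameters are names chosen for this sketch; they mirror the three questions in the flowchart.

```python
from enum import Enum


class Route(Enum):
    OMIT = "leave out entirely"
    INLINE = "inline in instruction tier"
    RETRIEVE = "retrieve just in time via tool"
    SKILL = "put behind a skill, load on demand"


def route(needed_for_next_decision: bool,
          needed_every_run: bool,
          account_specific: bool) -> Route:
    """Walk the flowchart's three questions in order and return where the item belongs."""
    if not needed_for_next_decision:
        return Route.OMIT
    if needed_every_run:
        return Route.INLINE
    if account_specific:
        return Route.RETRIEVE
    return Route.SKILL
```

For example, a rule like "never email churned accounts" is needed every run and routes inline, while one account's interaction history is needed only for that account and routes to just-in-time retrieval.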
Just-in-time retrieval over pre-loading
The single highest-leverage pattern is just-in-time retrieval. Rather than stuffing every account's history into the orchestrator's context, give the orchestrator a thin index — domains and one-line summaries — and let each subagent pull its own account's full detail when it starts working. This keeps the orchestrator's context clean for planning and lets retrieval happen in parallel at the leaves.
The payoff compounds at scale. Scoring 400 accounts by pre-loading all their data into one window is both expensive and unreliable: with that much material in view, the model loses track of which facts belong to which account. Scoring them with per-account just-in-time retrieval keeps each subagent's context focused on one account, makes runs cheaper, and parallelizes cleanly. Retrieval also keeps the agent current: it reads the account's latest CRM state at run time rather than a stale snapshot baked into a prompt written days ago.
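The shape of this pattern can be sketched in plain Python, with a thread pool standing in for parallel subagents. Everything named here is hypothetical: `_FAKE_CRM` replaces a real CRM client, and `score_account` is a placeholder where the model call would go.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a CRM; a real agent would call a tool instead.
_FAKE_CRM = {
    "acme.example": {"status": "active", "employee_count": 420},
    "globex.example": {"status": "churned", "employee_count": 75},
}


def thin_index() -> list[dict]:
    """What the orchestrator holds: domains and one-line summaries, nothing more."""
    return [
        {"domain": "acme.example", "summary": "Mid-market logistics, active customer"},
        {"domain": "globex.example", "summary": "Small retailer, churned last year"},
    ]


def fetch_account(domain: str) -> dict:
    """Just-in-time retrieval: read one account's current state at run time."""
    return dict(_FAKE_CRM[domain], domain=domain)


def score_account(domain: str) -> dict:
    """One 'subagent': pulls its own account detail, then would call the model."""
    account = fetch_account(domain)
    # Placeholder for the model call; we only record what was in context.
    return {"domain": domain, "context_fields": sorted(account)}


def run(index: list[dict]) -> list[dict]:
    """Orchestrator: plans from the thin index and fans out retrieval in parallel."""
    domains = [entry["domain"] for entry in index]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(score_account, domains))
```

The orchestrator never sees full account records; each worker's context contains exactly one account, fetched when the work starts.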
Instruction design: rules the model can actually follow
How you phrase instructions changes adherence. Vague directives like "be thoughtful about which leads to contact" give the model room to drift. Concrete, checkable rules — "never draft outreach to an account with status=churned; if status is unknown, route to review" — are followed reliably because they're unambiguous. Write rules as conditions and actions, the way you'd write code, not as aspirations.
Order and emphasis matter too. Put the non-negotiables where they're hard to miss and state them positively where you can ("do X") alongside the prohibition ("never Y"). And keep the rule set small: a focused set of five rules the model honors every time beats thirty rules it partially tracks. When you find yourself adding the eleventh rule, ask whether it belongs in a deterministic tool instead — often the cleanest place to enforce a constraint isn't the prompt at all, it's a hook or a validation gate that the model can't talk its way around.
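To make the "deterministic tool" option concrete, here is a sketch of the churned-account rule enforced as a validation gate that runs before any outreach draft is sent. The `outreach_gate` function, the account dictionary shape, and the three return values are assumptions for illustration, not part of any Claude Code hook API.

```python
def outreach_gate(account: dict) -> str:
    """Condition-and-action rule applied outside the prompt.

    Returns "block", "review", or "allow".
    """
    status = account.get("status")
    if status == "churned":
        # Never draft outreach to a churned account.
        return "block"
    if status is None or status == "unknown":
        # Unknown status goes to a human instead of guessing.
        return "review"
    return "allow"
```

Because the gate is ordinary code, the rule holds no matter how the prompt is worded or what the model produces.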
The output schema as context discipline
An often-missed point: the output schema is itself a powerful piece of context. By telling the model precisely what shape to produce — the exact fields, types, and constraints — you implicitly tell it what to pay attention to. A schema requiring icp_score, confidence, rationale, and signals focuses the model's reasoning on exactly those, and discourages the rambling, unstructured output that bloats responses and resists validation.
This is why structured output and lean context go together. A tight schema means the model doesn't need to be told in prose what to emphasize; the shape carries that signal. It also makes the agent's output machine-checkable, so context design and reliability reinforce each other — you spend fewer tokens explaining the task because the schema already encodes most of the intent.
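A minimal sketch of what machine-checkable looks like, using only the standard library. The four field names come from the schema described above; the types, the 0 to 100 range for `icp_score`, and the 0 to 1 range for `confidence` are assumptions, and the 0.7 review threshold reuses the earlier example rule.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"icp_score", "confidence", "rationale", "signals"}


@dataclass(frozen=True)
class AccountScore:
    icp_score: int        # assumed range: 0..100
    confidence: float     # assumed range: 0.0..1.0
    rationale: str
    signals: tuple

    @property
    def needs_review(self) -> bool:
        """Flag low-confidence scores, per the 'confidence below 0.7' rule."""
        return self.confidence < 0.7


def parse_score(model_output: str) -> AccountScore:
    """Parse and validate the model's JSON; raise ValueError on any mismatch."""
    data = json.loads(model_output)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != REQUIRED_FIELDS:
        raise ValueError(f"expected exactly the fields {sorted(REQUIRED_FIELDS)}")
    score, conf = data["icp_score"], data["confidence"]
    if type(score) is not int or not 0 <= score <= 100:
        raise ValueError("icp_score must be an integer in 0..100")
    if type(conf) not in (int, float) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be a number in 0..1")
    if not isinstance(data["rationale"], str) or not data["rationale"].strip():
        raise ValueError("rationale must be a non-empty string")
    signals = data["signals"]
    if not isinstance(signals, list) or not all(isinstance(s, str) for s in signals):
        raise ValueError("signals must be a list of strings")
    return AccountScore(score, float(conf), data["rationale"], tuple(signals))
```

Rejecting extra fields as well as missing ones keeps the model from padding its output with unrequested commentary.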
Diagnosing context problems in production
When a GTM agent underperforms, context is the first place to look. Symptoms map to causes. If it ignores a rule, the rule is buried or vaguely worded — move it up and sharpen it. If answers are generic or hedged, the working-tier context is too thin — the account detail it needs isn't being retrieved. If runs are slow and expensive for mediocre output, the context is bloated — something that should be behind retrieval is being pre-loaded.
Treat context as something you tune with evidence, not vibes. Read the run logs, see what was in the window when a bad decision happened, and adjust the tiers. More often than not the fix is removing context, not adding it — the discipline of leaving things out is what separates an agent that scales from one that limps along expensive and unreliable.
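The symptom-to-cause mapping above is small enough to keep as a lookup table next to your run logs. This sketch only restates that mapping in code; the `Symptom` enum and `triage` function are hypothetical names, and how you detect each symptom from your logs is left to your own tooling.

```python
from enum import Enum


class Symptom(Enum):
    IGNORES_RULE = "ignores a rule"
    GENERIC_ANSWERS = "answers are generic or hedged"
    SLOW_AND_EXPENSIVE = "slow and expensive for mediocre output"


# Each symptom maps to (likely cause, first fix to try).
TRIAGE = {
    Symptom.IGNORES_RULE: (
        "rule is buried or vaguely worded",
        "move it up and sharpen it",
    ),
    Symptom.GENERIC_ANSWERS: (
        "working-tier context is too thin",
        "check that account detail is actually being retrieved",
    ),
    Symptom.SLOW_AND_EXPENSIVE: (
        "context is bloated",
        "move pre-loaded material behind retrieval",
    ),
}


def triage(symptoms: list[Symptom]) -> list[tuple[str, str, str]]:
    """Return (symptom, likely cause, first fix) for each observed symptom."""
    return [(s.value, *TRIAGE[s]) for s in symptoms]
```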
Frequently asked questions
Doesn't a 1M-token window mean I can just include everything?
No. A large window raises the ceiling but doesn't remove the cost: more context means higher token spend, slower responses, and often worse quality as the signal gets buried in noise. Budget context deliberately and pull reference material in just in time rather than pre-loading it.
What's the difference between inline context and retrieved context?
Inline context is the small, stable instruction tier loaded every run — the goal, rules, and output schema. Retrieved context is account-specific or reference material fetched on demand through tools, so each decision sees only the data it needs and the agent reads the latest state rather than a stale snapshot.
How do I write rules the agent reliably follows?
Phrase rules as concrete conditions and actions, the way you'd write code: "never draft outreach to an account with status=churned; if status is unknown, route to review" rather than "be careful about contacting churned accounts." Keep the rule set small and prominent; if a constraint must hold absolutely, enforce it in a validation gate or hook rather than relying on the prompt.
Why does the output schema count as context?
Because specifying the exact output shape tells the model what to pay attention to and discourages rambling, unstructured responses. A tight schema focuses reasoning on the required fields and makes output machine-checkable, so you spend fewer tokens explaining the task in prose.