Claude Managed Agent Patterns: Tools, Prompts, and Context
Reusable patterns for tools, prompts, and context in Claude managed agents: verb tools, discovery-first, layered prompts, and context budgeting.
The first managed agent you build works because you watched it closely. The tenth one works because you stopped reinventing the structure each time. After enough of these, the same patterns keep paying off: how you name and shape tools, how you layer the prompt, how you decide what stays in context and what gets summarized away. This post is a field guide to those reusable patterns — the code-level conventions that make managed agents in self-hosted sandboxes predictable instead of artisanal.
Key takeaways
- Name tools as verbs, not nouns, and size each tool to one decision rather than a dozen behaviors.
- Layer prompts into stable instructions, dynamic task context, and ephemeral tool results — each changes at a different cadence.
- Use a discovery-then-act pattern so the agent fetches valid identifiers instead of guessing them.
- Return results as compact, typed structures; push summarization into the tool, not the model.
- Treat context as a budget you actively manage, evicting stale observations before they crowd out the goal.
Pattern 1: Tools as verbs, sized to one decision
The single most useful tool-design rule is that a tool should map to one decision the agent makes, named as an action. list_overdue_invoices, refund_payment, open_ticket — each is a thing the agent decides to do. The anti-pattern is the mega-tool: a single invoice_ops with a mode argument that branches into ten behaviors. The model has to reason about that branching in prose, which it does poorly, and your schema can no longer constrain arguments per behavior.
Small, verb-named tools also make traces readable and make capabilities auditable. When every tool is one action, the list of tools is literally the list of things the agent can do. That property is worth a few extra tool definitions.
There is a sizing question underneath this too. A tool that is too small forces the agent into long chains of trivial calls, each a round trip through the tunnel; a tool that is too large hides branching logic the model must reason about in prose. The sweet spot is a tool that completes one meaningful unit of work the agent would naturally treat as a single decision. "Get the customer's last three orders" is one decision; "get a row from a table" is a primitive the agent will have to orchestrate, and that orchestration is exactly the part it does least reliably.
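To make the contrast concrete, here is a minimal sketch of the mega-tool next to its verb-named replacements. The tool names come from the text above; every field, argument name, and description is a hypothetical illustration, and the name/description/input_schema layout is an assumption you would adapt to your own agent runtime.

```python
# Hypothetical tool definitions for illustration only.
# The name/description/input_schema layout is an assumed convention;
# adapt it to whatever your agent runtime expects.

# Anti-pattern: one tool whose "mode" argument selects among many behaviors.
MEGA_TOOL = {
    "name": "invoice_ops",
    "description": "Do things with invoices.",
    "input_schema": {
        "type": "object",
        "properties": {
            "mode": {"type": "string"},  # branches into many behaviors
            "invoice_id": {"type": "string"},
            "amount": {"type": "number"},
            "min_days_overdue": {"type": "integer"},
        },
        # Only "mode" can be required, because every other argument
        # applies to some behaviors and not others.
        "required": ["mode"],
    },
}

# Preferred: one verb-named tool per decision, each with its own schema.
VERB_TOOLS = [
    {
        "name": "list_overdue_invoices",
        "description": "List invoices overdue by at least min_days_overdue days.",
        "input_schema": {
            "type": "object",
            "properties": {
                "min_days_overdue": {"type": "integer", "minimum": 1},
            },
            "required": ["min_days_overdue"],
            "additionalProperties": False,
        },
    },
    {
        "name": "refund_payment",
        "description": "Refund a payment in full, identified by payment_id.",
        "input_schema": {
            "type": "object",
            "properties": {"payment_id": {"type": "string"}},
            "required": ["payment_id"],
            "additionalProperties": False,
        },
    },
]


def capabilities(tools):
    """Return the sorted tool names, which double as the audit list."""
    return sorted(tool["name"] for tool in tools)


print(capabilities(VERB_TOOLS))
```

With the split version, each schema can mark its own arguments as required, which is the per-behavior constraint the mega-tool cannot express.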
Pattern 2: Discovery before action
Agents hallucinate identifiers. The fix is structural, not a stern prompt: never give the agent a tool that requires an id it cannot obtain from another tool. Pair every "act on X" tool with a "find X" tool, and instruct the agent to discover before it acts.
```mermaid
flowchart TD
    A["Goal: refund the late order"] --> B{"Have a valid order id?"}
    B -->|No| C["Call search_orders"]
    C --> D["Pick matching order from results"]
    B -->|Yes| E["Call refund_order(order_id)"]
    D --> E
    E --> F["Confirm result & summarize"]
```
This pattern turns a guess into a lookup. The agent calls search_orders, reads back real ids, and only then calls refund_order with one it actually saw. Fabricated ids are ruled out by construction, because the only ids in play came from your own data. The agent still has to pick the right match, but it can no longer invent one.
The same idea generalizes beyond ids. Any time a tool needs a value that must match something real — a SKU, a user handle, an enum the agent cannot reliably recall — pair it with a lookup that returns the valid options. This keeps the agent grounded in your data rather than its priors, and it makes the failure mode loud instead of silent: a bad lookup returns nothing the agent can act on, whereas a hallucinated value sails straight into a mutation. Grounding through discovery is cheaper to build than it looks and saves you the entire class of "the agent confidently used an id that does not exist" incidents.
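One way to enforce discovery on the tool side, rather than relying on the prompt alone, is to have the act tool reject any id the find tool has not returned during the run. The sketch below assumes a toy in-memory order store; the class name, field names, and result shapes are hypothetical.

```python
class OrderTools:
    """Toy in-memory order tools that only act on ids they have handed out.

    Hypothetical sketch: a real implementation would sit in front of your
    order service and track discovered ids per agent run.
    """

    def __init__(self, orders):
        # Copy each record so refunds do not mutate the caller's data.
        self._orders = {o["order_id"]: dict(o) for o in orders}
        self._discovered = set()

    def search_orders(self, customer, status=None):
        """Find tool: return real ids and remember which ones were shown."""
        matches = [
            o for o in self._orders.values()
            if o["customer"] == customer
            and (status is None or o["status"] == status)
        ]
        self._discovered.update(o["order_id"] for o in matches)
        return [{"order_id": o["order_id"], "status": o["status"]} for o in matches]

    def refund_order(self, order_id):
        """Act tool: refuse any id that did not come from search_orders."""
        if order_id not in self._discovered:
            return {
                "ok": False,
                "error": "unknown_id",
                "hint": "call search_orders to get a valid id",
            }
        self._orders[order_id]["status"] = "refunded"
        return {"ok": True, "order_id": order_id, "status": "refunded"}


tools = OrderTools([
    {"order_id": "ord_1", "customer": "ada", "status": "late"},
    {"order_id": "ord_2", "customer": "ada", "status": "delivered"},
])
print(tools.refund_order("ord_1"))        # rejected: id not yet discovered
print(tools.search_orders("ada", "late"))  # discovery returns real ids
print(tools.refund_order("ord_1"))        # accepted: id came from the lookup
```

The design choice here is that a guessed id fails loudly with a recovery hint, even when the guess happens to match a real record.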
Pattern 3: Layer the prompt by change cadence
A managed-agent prompt is not one blob. Split it by how often each part changes. The system layer — role, boundaries, output format, stop conditions — is stable across runs. The task layer — the specific goal and any run-specific constraints — changes per run. The observation layer — tool results — changes every turn. Keeping these distinct makes prompt caching effective, because the stable prefix can be cached while only the volatile tail varies.
SYSTEM (stable): role, allowed tools, output contract, when to stop
TASK (per run): "Resolve the dispute for order #4821 under policy v3"
OBSERVATIONS: [appended each turn: tool results, errors]
Beyond caching, the layering keeps each part honest. The system layer should never mention a specific order; the task layer should never redefine the agent's role. When they bleed together, prompts drift and become hard to reuse across agents.
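The layering can be sketched as a small prompt builder. This is a minimal illustration, assuming the prompt is rendered as plain text; the system wording and function name are hypothetical, and a real runtime would likely pass these layers as separate request fields. The point it shows is that the stable layer forms an identical prefix across runs, and that each turn only appends to the tail.

```python
# Stable layer: role, boundaries, output contract, stop conditions.
# It never mentions a specific order, so it is identical across runs.
SYSTEM_PROMPT = (
    "You are a dispute-resolution agent. Use only the provided tools. "
    "Reply with a JSON decision. Stop once the dispute is resolved or escalated."
)


def render_prompt(task, observations):
    """Assemble the prompt from layers ordered by how often they change."""
    parts = [
        f"SYSTEM:\n{SYSTEM_PROMPT}",  # stable across runs
        f"TASK:\n{task}",             # changes per run
    ]
    if observations:
        # Changes every turn; new results are only ever appended.
        parts.append("OBSERVATIONS:\n" + "\n".join(observations))
    return "\n\n".join(parts)


task = "Resolve the dispute for order #4821 under policy v3"
turn_1 = render_prompt(task, [])
turn_2 = render_prompt(task, ['search_orders -> [{"order_id": "4821"}]'])

# Each turn extends the previous prompt, so the earlier text is a reusable prefix.
print(turn_2.startswith(turn_1))
```

Because volatile content only ever lands at the end, anything that caches on a shared prefix has the largest possible prefix to work with.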
Pattern 4: Typed, compact results — summarize in the tool
Where you do the summarizing decides your token bill. If a tool returns a 200-line log and you ask the model to find the error, you pay for those 200 lines on every subsequent turn. If the tool returns { "status": "failed", "error": "timeout", "line": 142 }, you pay for almost nothing and the model reasons better. Push the reduction into the tool, server-side, before it crosses the tunnel.
Type the results too. A consistent shape — same keys, same units — lets the model build a stable mental model across calls and lets you validate outputs. Free-form text results force the model to re-parse natural language each turn, which is both costlier and flakier than reading a typed object.
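As an illustration of pushing the reduction into the tool, the sketch below scans a raw log and returns the same small, fixed-shape object described above. The log format and the ERROR/FATAL markers are assumptions made for the example; a real tool would match whatever your logs actually contain.

```python
import re

# Assumed log convention: failing lines contain the word ERROR or FATAL,
# optionally followed by a colon and a message.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b[:\s]*(.*)")


def summarize_log(log_text):
    """Reduce a raw log to a compact result with the same keys every time."""
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        match = ERROR_PATTERN.search(line)
        if match:
            return {
                "status": "failed",
                "error": match.group(2).strip(),
                "line": line_number,
            }
    # Same keys on success, so the model always sees one shape.
    return {"status": "ok", "error": None, "line": None}


# A synthetic log: 141 routine lines, then a failure, then cleanup.
raw_log = "\n".join(
    [f"INFO step {i} ok" for i in range(1, 142)]
    + ["ERROR: timeout", "INFO cleanup"]
)
print(summarize_log(raw_log))
```

The model receives three fields instead of the full log, and the success and failure cases share one shape that you can validate.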
Pattern 5: Context as a managed budget
Context is finite even at a million tokens. More to the point, every token you keep is a token the model re-reads every turn. Treat context as a budget you actively curate. Keep the goal and the most recent, decision-relevant observations; evict or compress old tool results that no longer matter. A simple rule that works well: once an observation has been acted on and is no longer referenced, replace it with a one-line summary of what it told you.
This is the difference between an agent that stays sharp over a 30-step run and one that drowns in its own history. The model's attention is a shared resource across everything in context; protect the goal's share of it by being ruthless about what else is allowed to stay.
A practical way to operationalize this is a small bookkeeping field in your orchestration: track which observations have been referenced since they were added. Anything that has gone several turns untouched is a strong candidate for compression. You do not need anything clever — a last-referenced counter is enough to make eviction a mechanical step rather than a judgment call you forget to make under deadline.
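The last-referenced counter can be sketched in a few lines. This is a minimal illustration under stated assumptions: the Observation fields, the three-turn idle threshold, and the idea that each tool result arrives with a ready-made one-line summary are all hypothetical choices, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    obs_id: str
    content: str          # what the model currently sees for this result
    summary: str          # one-line replacement, assumed supplied by the tool
    last_referenced: int  # turn on which the agent last used this result
    compressed: bool = False


def mark_referenced(observations, obs_id, turn):
    """Record that the agent used an observation on the given turn."""
    for obs in observations:
        if obs.obs_id == obs_id:
            obs.last_referenced = turn


def compress_stale(observations, current_turn, max_idle_turns=3):
    """Swap observations idle for more than max_idle_turns for their summary."""
    for obs in observations:
        idle_turns = current_turn - obs.last_referenced
        if not obs.compressed and idle_turns > max_idle_turns:
            obs.content = obs.summary
            obs.compressed = True
    return observations


context = [
    Observation("o1", "<long order history>", "order 4821 shipped 6 days late", 1),
    Observation("o2", "<policy v3 text>", "policy v3 allows refunds on late orders", 2),
]
mark_referenced(context, "o2", turn=5)        # the agent used o2 again on turn 5
compress_stale(context, current_turn=6)
print([(o.obs_id, o.content) for o in context])
```

Running the compression step once per turn makes eviction mechanical: o1 collapses to its summary while the recently used o2 stays intact.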
Pattern 6: Make failure a first-class result
Agents recover well when failures are structured and badly when they are a stack trace dumped into context. Return errors as data the model can act on: a code, a human-readable reason, and ideally a hint about the next step. { "error": "not_found", "hint": "call search_orders to get a valid id" } turns a dead end into a recovery path. The model reads the hint and self-corrects on the next turn, which is exactly the behavior you want.
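A thin wrapper around tool calls is one way to guarantee that shape. The sketch below is an assumption-laden example: the error codes, the mapping from Python exceptions to codes, and the get_order helper are hypothetical, chosen only to show an exception becoming data with a hint.

```python
def error_result(code, reason, hint=None):
    """Build a structured failure the model can read and act on."""
    result = {"ok": False, "error": code, "reason": reason}
    if hint is not None:
        result["hint"] = hint
    return result


def run_tool(tool, **kwargs):
    """Call a tool and turn exceptions into structured results, not traces."""
    try:
        return {"ok": True, "data": tool(**kwargs)}
    except KeyError as exc:
        # Hypothetical mapping: a missing key means the record does not exist.
        return error_result(
            "not_found",
            f"no record for {exc.args[0]!r}",
            hint="call search_orders to get a valid id",
        )
    except TimeoutError:
        return error_result(
            "timeout",
            "upstream service did not respond",
            hint="retry once, then report the failure",
        )
    except Exception as exc:
        # Keep the stack trace in your own logs; give the model only a code.
        return error_result("internal_error", type(exc).__name__)


# Hypothetical tool backed by a toy in-memory store.
ORDERS = {"ord_1": {"status": "late"}}


def get_order(order_id):
    return ORDERS[order_id]


print(run_tool(get_order, order_id="ord_1"))
print(run_tool(get_order, order_id="ord_9"))
```

Success and failure share the ok flag, so the agent can branch on one field and follow the hint when it is present.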
Apply these patterns in 5 steps
- Rewrite any mega-tool into several verb-named tools, one per decision.
- For each "act" tool, add the matching "find" tool and require discovery first.
- Split your prompt into stable system, per-run task, and per-turn observation layers.
- Move all summarization into the tools so results cross the tunnel already compact and typed.
- Add a context-eviction rule that compresses acted-on observations into one-line summaries.
Pattern tradeoffs
| Choice | Cheap option | Robust option |
|---|---|---|
| Tool granularity | One mega-tool | Many verb tools |
| Identifiers | Pass ids in prompt | Discover via tool |
| Result shape | Raw text/logs | Typed, summarized |
| Context | Keep everything | Evict & compress |
| Errors | Stack trace | Coded result + hint |
Frequently asked questions
Why prefer many small tools over one flexible tool?
Small verb-named tools let the JSON schema constrain arguments per action, make traces readable, and make the agent's capabilities equal to the list of tools — which is far easier to audit and reason about than a branchy mega-tool.
How do I stop an agent from hallucinating ids?
Use discovery-before-action: never expose an act tool whose id the agent cannot obtain from a find tool, and instruct it to look ids up first. The only ids in play then come from your own data.
Where should summarization happen — in the model or the tool?
In the tool, server-side, before results cross the tunnel. That keeps every later turn cheap, since results stay in context for the rest of the run, and gives the model a clean typed object to reason over.
Does a 1M-token window mean I can stop managing context?
No. Every retained token is re-read each turn and competes for the model's attention, so curating context still improves both cost and reliability on long runs.
Bringing agentic AI to your phone lines
CallSphere applies these tool, prompt, and context patterns to voice and chat agents that answer every call, use tools mid-conversation, and book work 24/7. See the patterns in production at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.