Reusable Claude Agent Patterns: Prompts, Tools, Context (Claude API Skill Ecosystem)

Anyone can get a Claude agent working once. The hard part is building one that stays reliable across a hundred different inputs, a thousand turns, and six months of feature creep. That reliability doesn't come from a clever prompt — it comes from patterns: repeatable ways to structure the prompt, the tool surface, and the context that you apply the same way every time. This post collects the code-level patterns I reach for on every Claude agent, the ones that turn a fragile script into something a team can maintain.

Pattern 1: A frozen system prompt with volatile data pushed down

The most consequential structural decision is where dynamic data lives. The instinct is to interpolate everything into the system prompt: current date, user name, the request ID. Resist it. The system prompt sits near the front of the cache prefix (only the tool definitions precede it), so a timestamp there invalidates the cache on every single request. The pattern is a frozen system prompt that stays byte-for-byte identical across requests, with all volatile context pushed into the message array.

system = [{"type": "text", "text": STABLE_INSTRUCTIONS,
          "cache_control": {"type": "ephemeral"}}]
messages = [
    {"role": "user", "content": f"Today is {today}. {user_question}"}
]

The date goes in a user turn, not the system prompt. A message at turn five invalidates nothing before turn five, while a changed system prompt re-bills your entire history at full price. Treat the system prompt as a compiled artifact: write it once, keep it deterministic, and verify with usage.cache_read_input_tokens that hits actually accrue.
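As a minimal sketch of this pattern, the helper below builds the request arguments so that the system block is identical on every call and only the user turn carries the date. The names build_request and cache_read_fraction are hypothetical, STABLE_INSTRUCTIONS holds placeholder text, and the usage values are assumed to arrive as a plain dict carrying the usage field names from the API response.

```python
from datetime import date

# Placeholder text; in practice this is your real, unchanging system prompt.
STABLE_INSTRUCTIONS = "You are a support agent. Answer concisely and cite sources."


def build_request(user_question: str, today: date) -> dict:
    """Return system and messages arguments for a Messages API call.

    The system block is byte-identical on every call; the date travels
    in the user turn instead.
    """
    system = [{
        "type": "text",
        "text": STABLE_INSTRUCTIONS,
        "cache_control": {"type": "ephemeral"},
    }]
    messages = [{
        "role": "user",
        "content": f"Today is {today.isoformat()}. {user_question}",
    }]
    return {"system": system, "messages": messages}


def cache_read_fraction(usage: dict) -> float:
    """Fraction of input tokens served from cache (assumes a plain dict)."""
    read = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)
    total = read + written + uncached
    return read / total if total else 0.0
```

Logging cache_read_fraction per request is one way to notice a regression: a value that falls to zero after a deploy suggests something volatile slipped into the prefix.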

Pattern 2: Tool descriptions that say when, not just what

The default failure mode of an agent is calling the wrong tool, or no tool, at the wrong moment. The fix is almost never more tools — it's better descriptions. The pattern is to make each tool's description prescriptive about its trigger condition, baked into the schema rather than the system prompt.

{
  "name": "search_docs",
  "description": "Search the internal knowledge base. Call this whenever "
      "the answer depends on company-specific policy, pricing, or "
      "product details not present in the conversation. Do NOT call "
      "it for general knowledge questions.",
  "input_schema": {"type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"]}
}

On recent Opus models, which reach for tools more conservatively, this when-to-call language gives a measurable lift in should-call rate. Putting the trigger in the tool's own description rather than a sprawling system-prompt rulebook also keeps the rule co-located with the tool, so it survives refactors.

flowchart TD
  A["Incoming request"] --> B{"Needs company data?"}
  B -->|No, general knowledge| C["Answer directly"]
  B -->|Yes| D["search_docs(query)"]
  D --> E["Structured snippets returned"]
  E --> F{"Enough to answer?"}
  F -->|No| D
  F -->|Yes| G["Compose grounded answer"]
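The loop in the flowchart can be sketched in a few lines of harness code. This is an illustration under stated assumptions, not the SDK's own loop: create is a hypothetical wrapper around your API call that returns the response as a plain dict with "stop_reason" and "content" keys (the real SDK returns typed objects), and tools maps tool names such as search_docs to local Python functions.

```python
from typing import Callable


def run_tool_loop(create: Callable[[list], dict],
                  tools: dict[str, Callable[..., str]],
                  messages: list,
                  max_turns: int = 5) -> str:
    """Drive the request -> tool call -> grounded answer loop.

    `create` is a hypothetical wrapper: it takes the message list and
    returns a dict with "stop_reason" and "content" (a list of blocks).
    """
    for _ in range(max_turns):
        response = create(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            # No tool requested: join the text blocks into the final answer.
            return "".join(b["text"] for b in response["content"]
                           if b["type"] == "text")
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                output = tools[block["name"]](**block["input"])
                results.append({"type": "tool_result",
                                "tool_use_id": block["id"],
                                "content": output})
        # Tool results go back to the model in a user turn.
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")
```

The max_turns cap is a design choice for the sketch: it bounds the "not enough to answer, search again" edge of the flowchart so a confused agent cannot loop forever.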

Pattern 3: Bash for breadth, dedicated tools for control

A recurring design question is whether to give the agent a single bash tool or a fleet of narrow ones. The pattern is to start with bash for breadth and promote actions to dedicated tools when you need to gate, render, audit, or parallelize them. Bash gives the model broad leverage but hands your harness an opaque string. A dedicated send_email tool gives the harness a typed hook it can intercept and require confirmation for; bash -c "curl -X POST ..." gives it nothing.

Reversibility is the criterion I use. Hard-to-reverse actions — sending messages, deleting data, hitting external APIs — earn a dedicated tool so they can be gated. Read-only operations like search can stay in bash or be marked parallel-safe. The shape of your tool surface is really a map of your trust and safety boundaries.
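One way to picture the typed hook is a small dispatcher in the harness. The sketch below is hypothetical throughout: the IRREVERSIBLE set, the dispatch function, and the confirm callback are illustrative names, and a real harness would likely add logging and argument validation.

```python
from typing import Callable

# Hypothetical tool names whose effects are hard to undo.
IRREVERSIBLE = {"send_email", "delete_record"}


def dispatch(name: str, args: dict,
             handlers: dict[str, Callable[..., str]],
             confirm: Callable[[str, dict], bool]) -> str:
    """Run a tool call, asking for confirmation before irreversible ones."""
    if name not in handlers:
        return f"error: unknown tool {name!r}"
    if name in IRREVERSIBLE and not confirm(name, args):
        # Return the denial as a tool result so the model can adapt.
        return f"denied: {name} requires user confirmation"
    return handlers[name](**args)
```

Because the gate keys on the tool name and sees structured arguments, the same hook can render a confirmation dialog or write an audit record, which an opaque bash string does not allow.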

Pattern 4: Programmatic tool calling for chained work

When an agent must chain several tool calls — read a profile, then look up orders, then check inventory — standard tool use makes each a round trip, and every intermediate result lands in the context window whether you need it or not. The pattern for this is programmatic tool calling: the model writes a short script that invokes tools as functions inside the code-execution container. Intermediate results stay in the running script; only the final output returns to the model's context.

The payoff is concrete. Three sequential lookups become one script execution instead of three round trips, and a thousand-row intermediate result gets filtered down before it ever costs you context tokens. Reach for this whenever you see the agent making many sequential calls or hauling large payloads it immediately discards.
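To make the shape concrete, here is the kind of script a model might write for the profile, orders, and inventory chain. It is a local simulation only: get_profile, list_orders, and check_inventory are hypothetical stand-ins with canned data, and how the real container exposes your tools as functions is not shown here.

```python
# Hypothetical tool functions with canned data so the sketch runs locally.
def get_profile(user_id: str) -> dict:
    return {"user_id": user_id, "region": "EU"}


def list_orders(user_id: str) -> list[dict]:
    # A large intermediate result: 1,000 rows, 4 of them open.
    return [{"sku": f"SKU-{i}", "open": i % 250 == 0} for i in range(1000)]


def check_inventory(sku: str) -> int:
    return 0 if sku == "SKU-500" else 12


def open_orders_out_of_stock(user_id: str) -> dict:
    """Chain three lookups and return only a small summary."""
    profile = get_profile(user_id)
    orders = list_orders(user_id)  # stays inside the script
    open_skus = [o["sku"] for o in orders if o["open"]]
    blocked = [s for s in open_skus if check_inventory(s) == 0]
    # Only this summary would be returned to the model's context.
    return {"region": profile["region"],
            "open": len(open_skus),
            "out_of_stock": blocked}
```

The thousand order rows never leave the function; the model would see a three-key dictionary instead of the full payload.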

Pattern 5: Structured outputs at every boundary

Free text is fine for a chat reply and a liability everywhere else. The pattern is to constrain output with a schema at any boundary where another system consumes the result — using output_config.format for the response shape and strict: true on tools whose arguments must validate. With messages.parse() and a typed model, the SDK validates for you and hands back a real object.

This isn't just hygiene. A schema is a contract that lets you change the prompt freely without breaking the downstream consumer, and it surfaces refusals and truncation explicitly — a refusal or max_tokens stop reason tells you the output won't match the schema before you try to parse it.
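The stop-reason check can be sketched independently of any SDK helper. In the example below, Ticket, OutputUnavailable, and parse_ticket are hypothetical names, and the function assumes you have already pulled the stop reason and the response text out of the API response.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    category: str
    priority: int


class OutputUnavailable(Exception):
    """Raised when the stop reason says the output may not match the schema."""


def parse_ticket(stop_reason: str, text: str) -> Ticket:
    # Check the stop reason first: a refusal or a truncated response
    # should not be handed to the JSON parser at all.
    if stop_reason in ("refusal", "max_tokens"):
        raise OutputUnavailable(f"stop_reason={stop_reason}")
    data = json.loads(text)
    return Ticket(category=str(data["category"]), priority=int(data["priority"]))
```

Raising a dedicated exception keeps "the model declined or ran out of room" distinct from "the payload was malformed", which are usually handled differently downstream.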

Pattern 6: Context lifecycle as code

Long-running agents need a deliberate context strategy, not hope. The pattern is to pick the right tool for the right horizon: context editing to prune stale tool results within a session, compaction to summarize when you near the window (appending the compaction block back verbatim), and memory for state that must survive across sessions. Many durable agents use all three, layered. Writing this as explicit policy — "clear tool results older than N turns, compact at 150K tokens, persist user preferences to memory" — turns context management from an emergency into a configuration.
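The quoted policy could be written down as a small configuration object plus a function that decides what to do each turn. This is a sketch with hypothetical names (ContextPolicy, plan_actions) and a placeholder value of 10 for N. It only decides which actions apply; carrying them out with context editing, compaction, or memory is left to the harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPolicy:
    clear_tool_results_after_turns: int = 10  # the "N" in the policy; placeholder
    compact_at_tokens: int = 150_000
    memory_keys: tuple[str, ...] = ("user_preferences",)


def plan_actions(policy: ContextPolicy,
                 oldest_tool_result_age: int,
                 context_tokens: int,
                 changed_keys: set[str]) -> list[str]:
    """Return the context actions the policy calls for this turn."""
    actions = []
    if oldest_tool_result_age > policy.clear_tool_results_after_turns:
        actions.append("clear_stale_tool_results")
    if context_tokens >= policy.compact_at_tokens:
        actions.append("compact")
    if any(key in policy.memory_keys for key in changed_keys):
        actions.append("persist_memory")
    return actions
```

Keeping the thresholds in one frozen object means a change to the policy is a reviewed configuration diff rather than an edit scattered through the agent loop.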

Frequently asked questions

Why does my prompt cache hit rate keep dropping to zero?

Something volatile is sitting in your prefix — a datetime.now() in the system prompt, an unsorted json.dumps of your tools, or a per-user ID interpolated early. Diff the rendered prompt bytes between two requests, move the volatile piece after the last cache breakpoint, and confirm with cache_read_input_tokens.
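A rough way to run that diff locally is to render the prefix deterministically and find the first byte where two requests diverge. The helper names below are hypothetical, and the rendering is a local proxy for comparison, not the exact bytes the API hashes.

```python
import json
from typing import Optional


def render_prefix(tools: list[dict], system: str) -> bytes:
    """Serialize tools and system prompt deterministically for comparison."""
    ordered = sorted(tools, key=lambda t: t["name"])
    body = json.dumps({"tools": ordered, "system": system},
                      sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return body.encode("utf-8")


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a prefix of the other, or they are equal.
    return None if len(a) == len(b) else min(len(a), len(b))
```

Printing a few dozen bytes on either side of the returned offset usually points straight at the culprit, such as a timestamp or a reordered key.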

How do I keep an agent from over-calling a tool?

Soften the language. Aggressive instructions like "ALWAYS use this tool" overtrigger on recent models that follow the system prompt closely. Replace "you MUST call X" with "call X when Y," and let the trigger condition do the gating.

When should I split work across subagents?

When the work fans out into independent streams, or when a sub-task wants a cheaper model. A subagent keeps the main loop's cache and model stable while the sub-task runs in isolation. Don't spawn one for a single file read or a sequential step — that just burns tokens.

Do strict tools and structured outputs slow things down?

A new schema incurs a one-time compilation cost on first use, then is cached for 24 hours. The validation guarantee is almost always worth the first-call latency, especially at a system boundary where a malformed payload would otherwise crash a downstream service.

Bringing agentic AI to your phone lines

These same patterns — frozen prompts, prescriptive tools, disciplined context — are what keep CallSphere's voice and chat agents reliable across thousands of live conversations, using tools mid-call to look things up and book work without a human stepping in. See the patterns running in production at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
