Reusable Patterns for Claude Code Security Agents
Code-level patterns for prompts, tools, and context in a Claude Code threat-detection agent that stays reliable.
The first version of any security agent works because you babied it. The tenth version works because you found the patterns that hold up under change. Across a few Claude Code threat-detection agents, the same handful of structural decisions keeps separating the systems that stay maintainable from the ones that calcify into an unreadable mega-prompt. This post is a catalog of those reusable patterns: how to structure the prompts, the tools, and the context so the agent stays reliable as it grows.
None of these patterns are exotic. They're the equivalent of good function decomposition: boring, durable, and the reason your future self isn't paged at 3 a.m. trying to figure out why the agent suddenly trusts a known-bad IP.
Pattern 1: The investigation procedure as a skill
Don't pack every detection's logic into one giant system prompt. Instead, express each investigation type as an Agent Skill — a folder of instructions and helper scripts that Claude Code loads dynamically when an alert of that type arrives. The impossible-travel procedure lives in one skill; the privilege-escalation procedure lives in another. The orchestrator's job shrinks to routing a seed to the right skill.
This pattern pays off three ways. New detection logic is a new folder, not a risky edit to a shared prompt. The agent only loads the procedure relevant to the current alert, keeping its working context lean. And you can version and test each skill independently, so improving privilege-escalation triage can't accidentally regress your login analysis.
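As a rough illustration of "routing a seed to the right skill", here is a minimal sketch in plain Python. It does not show Claude Code's own skill-loading mechanism. The folder layout, the alert-type names, and the `Seed` and `route` names are all assumptions made up for this example.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical layout: one folder per investigation type, each holding the
# procedure instructions plus any helper scripts.
SKILLS_ROOT = Path("skills")

# Hypothetical alert types mapped to skill folder names.
SKILL_BY_ALERT_TYPE = {
    "impossible_travel": "impossible-travel",
    "privilege_escalation": "privilege-escalation",
}


@dataclass(frozen=True)
class Seed:
    alert_id: str
    alert_type: str


def route(seed: Seed) -> Path:
    """Return the skill folder for this seed; fail loudly on unknown types."""
    try:
        folder = SKILL_BY_ALERT_TYPE[seed.alert_type]
    except KeyError:
        raise ValueError(
            f"no skill registered for alert type {seed.alert_type!r}"
        ) from None
    return SKILLS_ROOT / folder
```

Adding a detection then means adding one folder and one registry entry. An unknown alert type raises instead of silently falling back to some default procedure.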
Pattern 2: Tools return evidence, the model returns judgment
Draw a hard line between gathering and deciding. Tools should be dumb and deterministic (fetch logins, look up an IP, read an asset record) and return clean structured data. The model should be the only place where judgment happens. The failure mode to avoid is a "smart" tool that pre-decides ("this login looks suspicious"): once two systems are reasoning, you can't audit which one was wrong.
Concretely, every tool's output schema should be facts: timestamps, counts, reputation scores, sensitivity labels. The verdict, the confidence, and the recommended action only ever come from the model, and they must cite the specific tool facts that justify them. This separation makes the whole system explainable, because a verdict is always traceable to the evidence the tools returned.
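One way to hold that line in code is to give each tool a facts-only return type and to check that no judgment fields leak into it. The sketch below assumes a hypothetical `lookup_ip` tool backed by a fake in-memory table. The field names, the reputation scale, and the `JUDGMENT_KEYS` list are illustrative, not part of any real API.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Keys that only the model may produce (hypothetical list for this sketch).
JUDGMENT_KEYS = frozenset(
    {"verdict", "confidence", "recommended_action", "is_suspicious"}
)


@dataclass(frozen=True)
class IpFacts:
    ip: str
    reputation_score: Optional[int]  # assumed 0-100 scale; None if unknown
    first_seen: Optional[str]        # ISO 8601 timestamp; None if unknown
    logins_last_24h: int


# Stand-in for a real intel source, so the example runs on its own.
_FAKE_INTEL = {"203.0.113.7": (92, "2024-01-05T10:00:00Z", 14)}


def lookup_ip(ip: str, result_id: str) -> dict:
    """Deterministic lookup: returns facts plus an ID the model can cite."""
    score, first_seen, logins = _FAKE_INTEL.get(ip, (None, None, 0))
    facts = IpFacts(ip, score, first_seen, logins)
    return {"result_id": result_id, "tool": "lookup_ip", "facts": asdict(facts)}


def assert_facts_only(tool_result: dict) -> None:
    """Reject tool output that smuggles in a judgment."""
    leaked = JUDGMENT_KEYS & set(tool_result["facts"])
    if leaked:
        raise ValueError(f"tool returned judgment fields: {sorted(leaked)}")
```

The `result_id` gives the model something concrete to cite, so each verdict can be traced back to a specific tool result.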
flowchart TD
A["Seed arrives"] --> B["Router selects investigation skill"]
B --> C["Skill loads procedure & criteria"]
C --> D["Tools fetch structured evidence"]
D --> E{"Enough evidence to decide?"}
E -->|No| F["Call one more targeted tool"]
F --> E
E -->|Yes| G["Model emits verdict + cited evidence"]
G --> H["Schema validation before handoff"]
Pattern 3: Constrain output with a verdict schema
Free-text verdicts rot. The pattern that scales is a strict verdict schema the model must fill: verdict from a fixed enum, confidence as a number, evidence as a list where each item references a tool result, and recommended_action from a fixed enum. Validate this schema before anything downstream touches it. If the model returns malformed output, you reject and retry rather than letting a half-formed verdict flow into your governance gate.
The enums matter more than they look. By forcing verdict to be one of a small, fixed set, you make every downstream consumer — dashboards, routing rules, the approval gate — trivial to write and impossible to surprise. A free-text verdict means every consumer has to interpret natural language, which is exactly the fragility you were trying to remove.
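A minimal sketch of such a schema and its validate-then-retry loop, using only the standard library. The enum members other than `needs_human`, the 0 to 1 confidence range, and the `ask_model` callable are assumptions chosen for illustration. A real deployment would likely use whatever structured-output or validation tooling it already has.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from enum import Enum


class VerdictLabel(str, Enum):  # hypothetical members, except needs_human
    BENIGN = "benign"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"
    NEEDS_HUMAN = "needs_human"


class Action(str, Enum):  # hypothetical members
    CLOSE = "close"
    MONITOR = "monitor"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Verdict:
    verdict: VerdictLabel
    confidence: float
    evidence: tuple[str, ...]  # IDs of the tool results being cited
    recommended_action: Action


def parse_verdict(raw: str, known_result_ids: set[str]) -> Verdict:
    """Return a Verdict, or raise ValueError if anything is malformed."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    expected = {"verdict", "confidence", "evidence", "recommended_action"}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected exactly the keys {sorted(expected)}")
    label = VerdictLabel(data["verdict"])  # ValueError if not in the enum
    action = Action(data["recommended_action"])
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= conf <= 1:
        raise ValueError("confidence must be between 0 and 1")
    evidence = data["evidence"]
    if not isinstance(evidence, list) or not evidence:
        raise ValueError("evidence must be a non-empty list")
    for item in evidence:
        if not isinstance(item, str) or item not in known_result_ids:
            raise ValueError(f"evidence cites unknown tool result: {item!r}")
    return Verdict(label, float(conf), tuple(evidence), action)


def verdict_with_retry(ask_model, known_result_ids, max_attempts=3) -> Verdict:
    """Call the (hypothetical) model until it yields a valid verdict."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_verdict(ask_model(last_error), known_result_ids)
        except ValueError as exc:
            last_error = str(exc)  # fed back so the model can correct itself
    raise RuntimeError(f"no valid verdict after {max_attempts} attempts")
```

Checking cited evidence against the set of tool result IDs actually produced in the investigation catches a verdict that references a lookup that never happened.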
Pattern 4: Budget the investigation
Agents will happily loop forever chasing one more lookup. Give every investigation an explicit budget: a maximum number of tool calls and a maximum number of subagents. When the agent hits the budget without reaching confidence, it must emit a verdict of needs_human rather than spinning. This single constraint prevents the two worst production incidents — runaway token spend and a quiet self-inflicted denial-of-service against your own MCP servers — and it encodes a healthy humility: sometimes the right answer is "escalate, I can't decide."
Pair the budget with a confidence threshold. The procedure should aim to reach a confident verdict within budget; if it can't, escalating to a human is the correct, designed behavior, not a failure.
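The budget-plus-threshold loop can be sketched as below. The `Budget` class, the `run_one_step` callable (assumed to perform one tool call and return the model's current label and confidence), and the 0.8 threshold are hypothetical choices for this example.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_tool_calls: int
    max_subagents: int
    tool_calls_used: int = 0
    subagents_used: int = 0

    def try_spend_tool_call(self) -> bool:
        """Consume one tool call if any remain; report whether it was allowed."""
        if self.tool_calls_used >= self.max_tool_calls:
            return False
        self.tool_calls_used += 1
        return True

    def try_spend_subagent(self) -> bool:
        """Consume one subagent slot if any remain."""
        if self.subagents_used >= self.max_subagents:
            return False
        self.subagents_used += 1
        return True


def investigate(run_one_step, budget: Budget, threshold: float = 0.8):
    """Loop until confident or out of budget; never spin past the budget."""
    confidence = 0.0
    while budget.try_spend_tool_call():
        label, confidence = run_one_step()
        if confidence >= threshold:
            return label, confidence
    # Budget exhausted without confidence: escalate by design.
    return "needs_human", confidence
```

The loop condition is the budget check itself, so there is no code path that makes a tool call without spending from the budget first.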
Pattern 5: Idempotent, replayable cases
Treat every investigation as a pure-ish function of its seed plus the tool results at investigation time. Persist the seed, the tool results, and the verdict together as a case record. Two benefits follow. First, replays are deterministic enough to test against: rerun a historical case through a new prompt and compare verdicts. Second, if the same alert fires twice, keying on the case ID lets you continue the existing investigation rather than starting a duplicate, which keeps your audit trail clean and your costs down.
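Here is a small sketch of what a replayable, idempotent case record might look like. Deriving the case ID from a hash of the seed is an assumption of this example, not something the pattern requires. The in-memory `CaseStore` stands in for a real database, and `decide` is a hypothetical stand-in for the prompt under test.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def case_id_for(seed: dict) -> str:
    """Stable ID: the same seed fields always hash to the same case."""
    canonical = json.dumps(seed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class CaseRecord:
    case_id: str
    seed: dict
    tool_results: list = field(default_factory=list)
    verdict: Optional[dict] = None


class CaseStore:
    """In-memory stand-in for a persistent store."""

    def __init__(self) -> None:
        self._cases: dict = {}

    def open_case(self, seed: dict) -> CaseRecord:
        """Return the existing case for this seed, or create it once."""
        cid = case_id_for(seed)
        if cid not in self._cases:
            self._cases[cid] = CaseRecord(cid, seed)
        return self._cases[cid]


def replay(record: CaseRecord, decide) -> bool:
    """Re-run a decision function on stored evidence; True if verdict matches."""
    return decide(record.seed, record.tool_results) == record.verdict
```

For the duplicate-alert case to work, the seed should contain only fields that are stable across re-fires. A per-delivery timestamp in the seed would produce a new ID each time.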
Pattern 6: Subagents for independent hypotheses only
Multi-agent fan-out is powerful and expensive — multi-agent runs typically burn several times more tokens than a single agent. So apply it surgically: spawn subagents only when an investigation genuinely has independent hypotheses to chase in parallel, like "is this credential theft?" versus "is this an automated scanner?" Each subagent investigates one hypothesis with its own tool budget and returns structured findings; the orchestrator reconciles. For linear investigations with a single thread of reasoning, a single agent is cheaper and just as accurate. The pattern is to make fan-out a deliberate choice in the routing layer, never the default.
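A minimal sketch of fan-out as a deliberate, budgeted choice. The `investigate_one` callable stands in for a subagent run. The `Finding` shape is hypothetical, and the `reconcile` rule (accept only a single supported hypothesis, otherwise escalate) is a placeholder policy for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    hypothesis: str
    supported: bool
    evidence_ids: tuple


def fan_out(hypotheses, investigate_one, max_subagents: int) -> list:
    """Run one subagent per independent hypothesis, within the budget."""
    if len(hypotheses) > max_subagents:
        raise ValueError("fan-out exceeds the subagent budget")
    if len(hypotheses) <= 1:
        # Linear investigation: stay single-agent, no parallelism needed.
        return [investigate_one(h) for h in hypotheses]
    with ThreadPoolExecutor(max_workers=len(hypotheses)) as pool:
        # pool.map preserves input order, so findings line up with hypotheses.
        return list(pool.map(investigate_one, hypotheses))


def reconcile(findings) -> str:
    """Placeholder policy: one supported hypothesis wins, else escalate."""
    supported = [f.hypothesis for f in findings if f.supported]
    return supported[0] if len(supported) == 1 else "needs_human"
```

The caller has to pass an explicit list of hypotheses and a subagent limit, so parallelism only happens when the routing layer asks for it.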
Frequently asked questions
Where should detection logic live — prompt or code?
The decision criteria and procedure live in the skill's instructions so the model can reason over them; the deterministic facts live in code behind tools. Keep judgment in the prompt and data-gathering in code, and never blur the two.
How do I stop the agent from looping?
Give every investigation a hard tool-call and subagent budget, plus a needs_human escape verdict. When the agent can't reach confidence within budget, escalating is the designed outcome, which kills both runaway cost and self-DoS against your tools.
Are these patterns specific to security?
The shapes — skill-per-procedure, evidence-vs-judgment separation, schema-constrained output, budgets, replayable cases — generalize to most production Claude Code agents. Security just makes the cost of getting them wrong unusually visible.
Bringing agentic AI to your phone lines
CallSphere builds on these same patterns — schema-constrained outputs, tool budgets, and clean evidence-to-judgment separation — for voice and chat agents that handle every call and message and book work 24/7. See the patterns at work at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.