Claude Opus Security Agent Patterns You Can Reuse

The difference between a security agent demo and a security agent you trust at 3 a.m. is rarely the model — it is the patterns wrapped around it. After you build your first Claude Opus triage agent, you start seeing the same structural problems recur: context that bloats, tools that overlap, verdicts that are confident but unsupported. This post collects the reusable patterns that fix those problems, written at the level of how you actually structure prompts, tools, and context in code.

These are not abstract principles. Each one is a concrete shape you can copy into your own agent, and together they form a house style that keeps a security agent accurate, cheap, and explainable as it grows.

Pattern: the evidence-ledger prompt

Free-form reasoning is where security agents quietly go wrong — they assert a conclusion without anchoring it to data. Counter this with an evidence-ledger pattern. Instruct the model, in the system prompt, to maintain an explicit ledger: every claim it makes about the incident must be paired with the tool call and field that supports it. Before producing a verdict, it restates the ledger. "Host WIN-204 ran encoded PowerShell — source: edr_process_tree, pid 8821. The hash is flagged — source: intel_lookup, verdict malicious."

This single structural rule changes the model's behavior. Unsupported claims become visibly awkward to write, so the model stops making them, and your audit log gains a built-in citation trail. When a human reviews the verdict, they check the ledger, not the prose.

Pattern: narrow, verb-named tools with strict schemas

Resist the urge to ship one giant run_query tool that does everything. Broad tools force the model to construct complex arguments and make it hard to reason about least privilege. Instead, define narrow tools named for exactly what they do — edr_process_tree, intel_lookup_hash, siem_failed_logins — each with a strict JSON schema and a description that reads like guidance to a junior analyst. The schema is part of the prompt; a vague schema produces vague tool calls.

flowchart TD
  A["Tool result returns"] --> B{"Result size?"}
  B -->|Large| C["Haiku subagent: summarize + extract IOCs"]
  B -->|Small| D["Pass through unchanged"]
  C --> E["Structured digest"]
  D --> E
  E --> F["Append to evidence ledger"]
  F --> G["Opus reasons over compact context"]

Strict schemas also give you a free safety net: arguments that fail validation never reach your backend, so a malformed model output is rejected at the boundary instead of producing a bad query against your SIEM.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

Pattern: the compaction subagent

Security data is enormous and context is finite. The compaction pattern keeps your main loop lean: when a tool returns a large payload, do not dump it into Opus's context. Hand it to a cheap Haiku or Sonnet subagent whose only job is to summarize and extract structured indicators, then feed the compact digest back to Opus. The orchestrating model reasons over a clean evidence ledger while the messy log-wrangling happens in a disposable side context.

This pattern is also where you control cost. Opus tokens are expensive; spending them to read raw logs is wasteful. By isolating bulk text into Haiku subagents, you keep Opus focused on judgment and your bill proportional to reasoning, not volume. The same multi-agent design that costs several times more when used carelessly becomes a savings when you push the right work down to cheaper models.

Pattern: hypothesis-first triage

Structure the agent's investigation as hypothesis-driven rather than data-driven. The system prompt tells it: before querying anything, state two or three competing explanations — benign admin activity, misconfiguration, genuine intrusion — and then query specifically to confirm or rule out each one. This prevents the aimless "pull everything and hope" behavior that burns tokens and produces mushy conclusions.

Hypothesis-first triage mirrors how strong human analysts work, and it makes the transcript readable. A reviewer can see the agent considered and dismissed the benign explanation with specific evidence, which is exactly the reasoning a SOC lead wants to audit.

Pattern: the refusal-and-escalate contract

Encode a hard contract for uncertainty. When evidence is contradictory or insufficient, the agent must stop, summarize what it knows and what it could not determine, and escalate to a human — never fabricate a clean story. Implement this as an explicit instruction plus a structured escalate output the model can emit, which your control layer routes to an analyst queue. Calibrated escalation is a feature, not a failure; it is what lets you trust the verdicts the agent does deliver confidently.

Pattern: context layering — stable, task, and ephemeral

Organize the context window into three deliberate layers. The stable layer is the system prompt and the relevant skill — your role contract and runbook, loaded once. The task layer is the alert and the evolving evidence ledger. The ephemeral layer is raw tool output that gets compacted and discarded. Keeping these mentally and structurally separate stops the common failure where a giant raw payload shoves your runbook out of the effective context and the agent forgets its own rules.

In code, this means you never append raw tool results straight to the running transcript. You append digests. The stable layer is reconstructed deterministically each turn so the agent's core instructions are always present and never crowded out by data.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Pattern: deterministic post-checks on every verdict

Do not trust the model to enforce your hardest rules — verify them in code. After the agent produces a verdict, run deterministic checks: does a "malicious" verdict cite at least two sources in its ledger? Does any recommended write action map to an allowed, gated tool? If a check fails, reject the verdict and send it back or escalate. This belt-and-suspenders pattern catches the rare case where the model's prose drifts from your policy, and it gives compliance a guarantee the model alone cannot.

Frequently asked questions

Why force an evidence ledger instead of trusting the reasoning?

Because prose can sound right while being unsupported. Requiring every claim to cite a tool call and field makes unsupported assertions structurally awkward, turns the audit log into a citation trail, and gives reviewers a fast way to verify a verdict. It is the single highest-leverage pattern for trustworthy security triage.

When should I split work into subagents?

Push high-volume, low-judgment work — summarizing logs, extracting IOCs from a blob — to cheap Haiku or Sonnet subagents, and keep correlation and final judgment on Opus. This keeps the main context clean and your cost proportional to reasoning. Avoid multi-agent designs where a single Opus pass would do; they cost several times more tokens.

How do I keep the runbook from getting crowded out of context?

Use context layering. Reconstruct the stable layer — system prompt plus the relevant skill — deterministically each turn, and never append raw tool output to the transcript. Compact large payloads into digests first so the ephemeral data never displaces your role contract and runbook.

Are deterministic post-checks really necessary if the prompt is good?

Yes. Prompts shape behavior probabilistically; your hardest safety rules need a guarantee. A short code-level check — two sources for a malicious verdict, write actions limited to gated tools — catches the rare drift the prompt misses and gives compliance a deterministic backstop.

Bringing agentic AI to your phone lines

These same patterns — evidence trails, narrow tools, layered context — power CallSphere's voice and chat agents, which reason carefully, cite what they used, and act only within bounds. Hear one in action at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

Claude Opus Security Agent Patterns You Can Reuse

Pattern: the evidence-ledger prompt

Pattern: narrow, verb-named tools with strict schemas

Pattern: the compaction subagent

Pattern: hypothesis-first triage

Pattern: the refusal-and-escalate contract

Pattern: context layering — stable, task, and ephemeral

Pattern: deterministic post-checks on every verdict

Frequently asked questions

Why force an evidence ledger instead of trusting the reasoning?

When should I split work into subagents?

How do I keep the runbook from getting crowded out of context?

Are deterministic post-checks really necessary if the prompt is good?

Bringing agentic AI to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Code GTM engineering is heading next

Where Claude Cowork is heading and how to prepare

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild