Build a Claude Opus Security Triage Agent: Walkthrough

You have read the architecture posts and the marketing decks. Now you actually have to build the thing. This is the hands-on walkthrough: a step-by-step path from an empty directory to a working Claude Opus triage agent that can investigate a real alert, query your SIEM, and recommend a response — with a human gate in front of anything dangerous. I will keep it concrete enough that an engineer can follow along and end up with something running.

We will use the Claude Agent SDK, which gives you the agent loop, tool execution, and MCP wiring as primitives so you do not reinvent them. The example is intentionally small but production-shaped: the patterns scale straight up to a real deployment.

Step 1 — Scaffold the project and pin the model

Start a fresh service with one responsibility: run the agent loop. Add the Agent SDK as a dependency, set your ANTHROPIC_API_KEY in the environment, and pin the model id explicitly to claude-opus-4-8 rather than an alias. Pinning matters in security tooling — you want your agent's behavior to change only when you deliberately upgrade and re-test, never silently.

Create three top-level folders: tools/ for your MCP server definitions, skills/ for runbooks, and policy/ for the gate logic. This structure mirrors the four-layer architecture and keeps each concern reviewable on its own. Commit it before you write any logic so your audit history starts clean.
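As a rough illustration of this step, the scaffold and the pinned model can live in one small bootstrap script. This is a minimal sketch using only the Python standard library, not the Agent SDK's own setup; the function names `scaffold` and `load_settings` and the `.gitkeep` convention are assumptions made for the example.

```python
import os
from pathlib import Path

# Model id as given in this walkthrough: pinned explicitly, never an alias.
PINNED_MODEL = "claude-opus-4-8"

# The three top-level folders described in this step.
FOLDERS = ("tools", "skills", "policy")


def scaffold(root: Path) -> list[Path]:
    """Create the project folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        # A .gitkeep file lets an otherwise empty folder be committed.
        (folder / ".gitkeep").touch()
        created.append(folder)
    return created


def load_settings(env=os.environ) -> dict:
    """Fail fast if the API key is missing, then return the pinned model."""
    if not env.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return {"model": PINNED_MODEL}
```

Calling `scaffold(Path("."))` once and committing the result gives you the clean starting history; `load_settings()` is the kind of check worth running at service start so a missing key fails loudly.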

Step 2 — Write the system prompt as a role contract

The system prompt is where you define what kind of analyst the agent is. Be specific and bounded. State the role ("You are a Tier-1 SOC triage analyst"), the goal ("determine whether each alert is benign, suspicious, or malicious, and recommend a next action"), and the hard rules ("never claim a host is compromised without evidence from at least two sources; never recommend isolation without citing the indicator that justifies it").

Crucially, tell the model how to behave when it is uncertain: escalate to a human with a clear summary rather than guessing. In security, a calibrated "I am not sure, here is why" is worth more than a confident wrong verdict. This single instruction prevents a whole class of dangerous over-confidence.
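One way to keep the role contract reviewable is to assemble it from named parts, so a change to a single rule shows up as a one-line diff. The sketch below uses the role, goal, and hard rules quoted above; the wording of the uncertainty instruction and the `build_system_prompt` helper are illustrative assumptions, not a required format.

```python
ROLE = "You are a Tier-1 SOC triage analyst."

GOAL = (
    "Determine whether each alert is benign, suspicious, or malicious, "
    "and recommend a next action."
)

HARD_RULES = [
    "Never claim a host is compromised without evidence from at least two sources.",
    "Never recommend isolation without citing the indicator that justifies it.",
]

# Assumed wording for the uncertainty instruction; adapt it to your own process.
UNCERTAINTY = (
    "If you are uncertain, escalate to a human with a clear summary of what "
    "you checked and why you are unsure. Do not guess."
)


def build_system_prompt() -> str:
    """Join the parts of the role contract into one system prompt string."""
    rules = "\n".join(f"- {rule}" for rule in HARD_RULES)
    return (
        f"{ROLE}\n\n"
        f"Goal: {GOAL}\n\n"
        f"Hard rules:\n{rules}\n\n"
        f"When uncertain: {UNCERTAINTY}"
    )
```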

flowchart TD
  A["Receive alert payload"] --> B["Opus: form hypotheses"]
  B --> C["Call SIEM tool: get context"]
  C --> D["Call EDR tool: process tree"]
  D --> E{"Evidence sufficient?"}
  E -->|No| B
  E -->|Yes| F["Classify: benign / suspicious / malicious"]
  F -->|malicious| G["Draft containment + gate for approval"]
  F -->|benign| H["Close with note in ticket"]
  G --> I["Write audit record"]
  H --> I

Step 3 — Define your first tool: SIEM query

Your agent is blind until you give it a tool. Define a single MCP tool called siem_query with a tight JSON schema: a query string in your SIEM's syntax, an optional time_range, and a max_results cap. The schema description is part of the prompt — write it as if explaining to a junior analyst, because that is effectively what you are doing. Note in the description that results are truncated to max_results and that the agent should narrow its query rather than asking for everything.
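A tool definition along these lines might look like the following. The `name` / `description` / `input_schema` layout follows the tool-definition shape of the Anthropic Messages API, and MCP servers declare tools in a very similar way; the `'24h'` time-range format, the default of 50, the ceiling of 200, and the hand-rolled `validate_args` helper are all assumptions for illustration.

```python
SIEM_QUERY_TOOL = {
    "name": "siem_query",
    "description": (
        "Run a read-only search against the SIEM and return matching events. "
        "Results are truncated to max_results, so narrow the query (add a host, "
        "a user, or a tighter time_range) instead of asking for everything."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search expression in the SIEM's own query syntax.",
            },
            "time_range": {
                "type": "string",
                "description": "Optional look-back window, for example '24h'.",
            },
            "max_results": {
                "type": "integer",
                "minimum": 1,
                "maximum": 200,  # assumed ceiling
                "default": 50,   # assumed default
                "description": "Upper bound on returned events.",
            },
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}


def validate_args(args: dict) -> dict:
    """Minimal check of tool arguments; a JSON Schema library could replace this."""
    props = SIEM_QUERY_TOOL["input_schema"]["properties"]
    unknown = set(args) - set(props)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    cap = props["max_results"]
    max_results = args.get("max_results", cap["default"])
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("max_results must be an integer")
    if not cap["minimum"] <= max_results <= cap["maximum"]:
        raise ValueError("max_results is out of range")
    return {
        "query": query,
        "time_range": args.get("time_range"),
        "max_results": max_results,
    }
```

Validating arguments in the handler, rather than trusting the model to respect the schema, keeps the cap enforceable even when a run goes off the rails.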

Implement the handler to run the query against a read-only SIEM credential, then return structured JSON. Always return a small, normalized shape (event time, source, host, user, and a short message), not the raw firehose. The model reasons far better over clean structured data than over megabytes of unformatted logs, and the smaller payload cuts token cost considerably.
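The normalization step can be sketched as below. The SIEM client is passed in as a plain callable (`run_query`) so the example stays independent of any vendor; that callable, the raw field names, and the 200-character message limit are assumptions for illustration.

```python
from typing import Any, Callable

# The small, fixed shape returned to the model.
NORMALIZED_FIELDS = ("time", "source", "host", "user", "message")
MESSAGE_LIMIT = 200  # assumed cap on message length, in characters


def normalize_event(raw: dict[str, Any]) -> dict[str, str]:
    """Keep only the normalized fields and trim the message."""
    event = {name: str(raw.get(name, "")) for name in NORMALIZED_FIELDS}
    event["message"] = event["message"][:MESSAGE_LIMIT]
    return event


def handle_siem_query(
    run_query: Callable[[str], list[dict[str, Any]]],
    query: str,
    max_results: int = 50,
) -> dict[str, Any]:
    """Run the query through the supplied read-only client and shape the result."""
    raw_events = run_query(query)
    events = [normalize_event(raw) for raw in raw_events[:max_results]]
    return {
        "events": events,
        "returned": len(events),
        # Tells the model there was more, so it can narrow the query.
        "truncated": len(raw_events) > max_results,
    }
```

The `truncated` flag is a small design choice worth keeping: it gives the model an explicit signal to refine its search instead of silently reasoning over a partial result.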

Step 4 — Add the EDR tool and a threat-intel lookup

Repeat the pattern for two more read tools. edr_process_tree takes a host and a process id and returns the parent/child chain. intel_lookup takes a hash, domain, or IP and returns a reputation verdict from your threat-intel feed. Keep every one of these read-only. At this stage your agent can investigate thoroughly but cannot change anything — which is exactly where you want to be before adding write capability.
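To make the two read tools concrete, here is a minimal sketch that works on in-memory data in place of a real EDR or threat-intel API. The process-table layout (`name`, `ppid`), the verdict strings, and the rule that anything that is not an IP or a hash is treated as a domain are assumptions for the example.

```python
import ipaddress
import re

HASH_PATTERN = re.compile(r"[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64}")


def process_chain(processes: dict[int, dict], pid: int) -> list[dict]:
    """Walk from pid up through its parents and return the chain root-first."""
    chain, seen = [], set()
    current = pid
    # The seen set guards against a corrupt table with a parent cycle.
    while current in processes and current not in seen:
        seen.add(current)
        proc = processes[current]
        chain.append({"pid": current, "name": proc["name"]})
        current = proc["ppid"]
    return list(reversed(chain))


def indicator_type(value: str) -> str:
    """Classify an indicator as ip, hash (MD5/SHA-1/SHA-256 length), or domain."""
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if HASH_PATTERN.fullmatch(value):
        return "hash"
    return "domain"  # simplification: everything else is treated as a domain


def intel_lookup(feed: dict[str, str], value: str) -> dict:
    """Look the indicator up in a feed keyed by lower-cased indicator."""
    return {
        "indicator": value,
        "type": indicator_type(value),
        "verdict": feed.get(value.lower(), "unknown"),
    }
```

Both functions only read their inputs, which matches the rule for this stage: the agent can look at everything and change nothing.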

Test each tool in isolation first, outside the agent. Call the handler directly with known inputs and confirm the JSON shape. Tool bugs masquerade as model bugs and waste hours; verifying the boundary first saves you that pain.
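A direct shape check of this kind needs very little code. The helper below is a sketch; `intel_lookup_stub` is a hypothetical stand-in for whichever real handler you are testing, and the expected keys are an assumption.

```python
def shape_problems(result: dict, required: dict[str, type]) -> list[str]:
    """Return a list of shape problems; an empty list means the shape is correct."""
    problems = []
    for key, expected in required.items():
        if key not in result:
            problems.append(f"missing key: {key}")
        elif not isinstance(result[key], expected):
            actual = type(result[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems


# Hypothetical handler standing in for one of your real tools.
def intel_lookup_stub(indicator: str) -> dict:
    return {"indicator": indicator, "verdict": "unknown"}


# The keys and types the agent is told to expect from this tool.
EXPECTED_SHAPE = {"indicator": str, "verdict": str}

if __name__ == "__main__":
    # Call the handler directly, with a known input, outside the agent loop.
    problems = shape_problems(intel_lookup_stub("example.test"), EXPECTED_SHAPE)
    print(problems or "shape ok")
```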

Step 5 — Gate the one write action

Now add a single write tool, isolate_host, and wrap it in policy. The handler does not call the EDR directly. Instead it writes an approval request to a queue and returns a status of pending_approval to the model. The model reports to the analyst that containment is staged and awaiting sign-off. A human reviews the request — with the full reasoning attached — and either approves it, at which point the real isolation fires, or rejects it. This is how you get the agent's speed without surrendering authority over destructive actions.
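The gate described above can be sketched with an in-memory queue. In a real deployment the queue would be durable and the approval would come from a ticketing or chat workflow; the class names, the `decide` method, and the injected `isolate_fn` callable are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass
class ApprovalRequest:
    request_id: str
    host: str
    reasoning: str
    status: str = "pending_approval"
    decided_by: str = ""


class ApprovalQueue:
    """Holds isolation requests until a human approves or rejects them."""

    def __init__(self, isolate_fn: Callable[[str], None]):
        self._isolate_fn = isolate_fn  # the real EDR call, injected by the service
        self._requests: dict[str, ApprovalRequest] = {}

    def submit(self, host: str, reasoning: str) -> ApprovalRequest:
        request = ApprovalRequest(uuid.uuid4().hex, host, reasoning)
        self._requests[request.request_id] = request
        return request

    def decide(self, request_id: str, approver: str, approve: bool) -> ApprovalRequest:
        request = self._requests[request_id]
        if request.status != "pending_approval":
            raise ValueError("request already decided")
        request.decided_by = approver
        if approve:
            self._isolate_fn(request.host)  # the only path to real isolation
            request.status = "approved"
        else:
            request.status = "rejected"
        return request


def isolate_host_tool(queue: ApprovalQueue, host: str, reasoning: str) -> dict:
    """Tool handler exposed to the model: it stages the action and never executes it."""
    if not reasoning.strip():
        raise ValueError("reasoning with the justifying indicator is required")
    request = queue.submit(host, reasoning)
    return {"status": request.status, "request_id": request.request_id}
```

Note that the model-facing handler has no reference to the EDR at all; only `decide`, which is called from the human review path, can trigger the isolation.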

Step 6 — Load a runbook as a skill

Create your first skill: a folder named ransomware-triage with an instructions file describing exactly which log sources to check, what early indicators look like in your environment, and your escalation thresholds. The Agent SDK loads this skill into context only when the agent's task matches it, so you can write a thorough runbook without bloating every prompt. Now when Opus recognizes a ransomware pattern, it follows your playbook instead of a generic one.
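If you prefer to generate the skill folder from a script, something like the following would do it. This sketch assumes the instructions file is named `SKILL.md` and starts with `name` and `description` front matter, which is the Agent Skills convention as far as I know; confirm the file name and the directory the SDK scans against its documentation. The runbook outline is a placeholder to fill in with your own environment's details.

```python
from pathlib import Path

# Placeholder outline: replace each bullet with your environment's specifics.
RUNBOOK_OUTLINE = """
# Ransomware triage

## Log sources to check
- (list the sources that matter in your environment)

## Early indicators
- (describe what early indicators look like in your environment)

## Escalation thresholds
- (state when to escalate to a human)
"""


def write_skill(skills_root: Path, name: str, description: str, body: str) -> Path:
    """Create skills_root/<name>/SKILL.md with front matter and return its path."""
    skill_dir = skills_root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"  # assumed file name
    front_matter = f"---\nname: {name}\ndescription: {description}\n---\n\n"
    skill_file.write_text(front_matter + body.strip() + "\n", encoding="utf-8")
    return skill_file
```

For example, `write_skill(Path("skills"), "ransomware-triage", "Runbook for triaging suspected ransomware alerts.", RUNBOOK_OUTLINE)` creates the folder from this step. The description matters more than it looks, since it is what the agent's task gets matched against.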

Step 7 — Run a live investigation and read the transcript

Feed the agent a real (or replayed) alert and watch it work. It should form a hypothesis, call siem_query and edr_process_tree, fold the results back in, and reach a classification. Open the transcript and read every tool call and the model's reasoning between them. This is your most valuable debugging artifact. If a verdict is wrong, you will see exactly where — a missing log source, an ambiguous schema description, a runbook gap — and you fix that one thing.
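When transcripts get long, a small helper that pulls out just the tool calls makes the review faster. The transcript format below (a list of records with a `type` of `reasoning`, `tool_call`, or `tool_result`) is a hypothetical shape chosen for this sketch, not the SDK's actual message format; adapt the field names to whatever your run logs contain.

```python
import json


def tool_call_trail(transcript: list[dict]) -> list[str]:
    """Return one line per tool call, in order, with its step number and arguments."""
    trail = []
    for step, record in enumerate(transcript, start=1):
        if record.get("type") == "tool_call":
            args = json.dumps(record.get("input", {}), sort_keys=True)
            trail.append(f"{step}: {record['name']}({args})")
    return trail


if __name__ == "__main__":
    # A tiny replayed transcript in the assumed format.
    sample = [
        {"type": "reasoning", "text": "Check recent activity on the host."},
        {"type": "tool_call", "name": "siem_query", "input": {"query": "host=ws-042"}},
        {"type": "tool_result", "name": "siem_query", "output": {"returned": 3}},
        {"type": "tool_call", "name": "edr_process_tree", "input": {"host": "ws-042", "pid": 77}},
    ]
    for line in tool_call_trail(sample):
        print(line)
```

The trail is only an index. Once it shows you which call looks off, go back to the full transcript and read the reasoning on either side of it.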

A common first-run issue is the agent asking the SIEM for too much and overflowing its context. The fix is in the tool description and the system prompt: instruct it to scope queries tightly and iterate. Another is over-eager isolation recommendations; the fix there is to tighten the evidence rule in the system prompt. Each fix is small, local, and testable.

Step 8 — Add budgets before you ship

Finally, cap the loop. Set a maximum number of tool calls and a token budget per investigation in the SDK configuration. If a confused run hits the cap, it stops and escalates to a human instead of spinning indefinitely. For cost, route pure summarization sub-tasks to Haiku or Sonnet via subagent calls and keep Opus on the correlation work. Now you have a bounded, auditable, genuinely useful triage agent.
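The budget logic itself is simple, whether you set it through SDK options or enforce it in your own wrapper. The sketch below is a hand-rolled illustration of the second approach, not the SDK's configuration API; `Budget`, `investigate`, and the `next_step` callable (which stands in for one model turn plus its tool call) are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Budget:
    max_tool_calls: int
    max_tokens: int
    tool_calls: int = 0
    tokens: int = 0

    def charge(self, tokens: int) -> None:
        """Record one tool call and the tokens it consumed."""
        self.tool_calls += 1
        self.tokens += tokens

    def exhausted(self) -> bool:
        return self.tool_calls >= self.max_tool_calls or self.tokens >= self.max_tokens


def investigate(
    next_step: Callable[[], tuple[Optional[str], int]],
    budget: Budget,
) -> dict:
    """Run steps until a verdict is reached or the budget runs out.

    next_step returns (verdict or None, tokens used) for one step of the loop.
    """
    while not budget.exhausted():
        verdict, tokens = next_step()
        budget.charge(tokens)
        if verdict is not None:
            return {"outcome": verdict, "tool_calls": budget.tool_calls}
    # Cap reached without a verdict: hand off to a human instead of spinning.
    return {"outcome": "escalate_to_human", "tool_calls": budget.tool_calls}
```

With `Budget(max_tool_calls=3, max_tokens=10_000)` and a step that never reaches a verdict, the loop stops after three calls and returns `escalate_to_human`, which is the behavior you want from a confused run.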

Frequently asked questions

Do I need the Claude Agent SDK to build this?

No, but it saves you from reimplementing the agent loop, tool execution, and MCP wiring. The SDK is built on Claude Code primitives, so the same patterns — tools, skills, subagents, budgets — carry over directly. You can build the loop by hand against the API, but expect to rebuild a lot of plumbing.

How many tools should the first version have?

Three or four reads and exactly one gated write is plenty for a first working agent. More tools mean a larger decision surface and more ways to go wrong. Get a tight set working end to end, read the transcripts, and add capability only when a real investigation needs it.

What is the single most important system-prompt rule?

Tell the agent to escalate with a clear summary when uncertain rather than guess. Calibrated uncertainty is the behavior that makes a security agent trustworthy. Pair it with an evidence rule (no compromise claim without two sources) and you head off most of the dangerous failure modes.

How do I test without risking production?

Use read-only credentials for every data tool and route the one write action through an approval queue that you control. Replay historical alerts so the agent investigates real data with zero ability to change anything until a human approves.

Bringing agentic AI to your phone lines

CallSphere builds the same kind of bounded, tool-using agents for voice and chat — assistants that investigate, look things up mid-conversation, and take action only where allowed. Watch one work at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
