---
title: "Build an AI Triage Agent for Your SOC: A Walkthrough"
description: "Build a Claude-powered security triage agent step by step: schemas, MCP tools, the run loop, approvals, and the evals you need before you trust it."
canonical: https://callsphere.ai/blog/build-an-ai-triage-agent-for-your-soc-a-walkthrough
category: "Agentic AI"
tags: ["agentic ai", "claude", "soc automation", "claude agent sdk", "security engineering", "mcp"]
author: "CallSphere Team"
published: 2026-04-10T08:23:11.000Z
updated: 2026-06-06T21:47:43.546Z
---

# Build an AI Triage Agent for Your SOC: A Walkthrough

> Build a Claude-powered security triage agent step by step: schemas, MCP tools, the run loop, approvals, and the evals you need before you trust it.

You have read the architecture posts. Now you have a ticket to actually build the thing: an agent that takes a raw alert, investigates it, and hands a human analyst a ranked verdict with evidence. This walkthrough is the part that usually gets hand-waved. We will go from an empty repository to a working Claude triage agent, in the order an engineer actually builds it, with the decisions called out at each step.

The goal is narrow on purpose: one detection type — suspicious OAuth application grants — investigated end to end. Narrow scope is how you ship something real instead of a six-month platform that never goes live. Once one detection works, the second is mostly configuration.

## Step 1: Define the alert contract and the verdict contract

Before any model code, pin down two schemas. The **input contract** is the normalized alert: an event id, the actor identity, the resource, raw fields, and a timestamp. The **output contract** is the verdict the agent must produce: a classification (benign / suspicious / malicious), a confidence score, a list of evidence items each with a source, and a recommended action. Forcing structured output is what turns a chatty model into a pipeline component.

Write these as JSON Schemas and validate both directions. If the agent returns malformed output, you reject and retry rather than passing garbage downstream. This single discipline — strict contracts at the boundary — prevents most of the flakiness people blame on "the model being unreliable."

## Step 2: Stand up the tools the agent will need

List the questions a human analyst would ask for an OAuth grant: who is this user, is this app known, what scopes did it request, has this app appeared in intel, what did the user do right after. Each becomes a tool. Build them as MCP server endpoints so they are reusable and audited. Start strictly read-only: `get_user_context`, `lookup_oauth_app`, `check_threat_intel`, `get_recent_activity`. The one state-changing tool, `revoke_grant`, lives on a separate server you will not even register until step 6.

```mermaid
flowchart TD
  A["Normalized OAuth-grant alert"] --> B["Validate against input schema"]
  B --> C["Claude agent run loop"]
  C --> D{"Need more evidence?"}
  D -->|Yes| E["Call read-only MCP tool"]
  E --> F["Append result to context"]
  F --> C
  D -->|No| G["Emit verdict (validate output schema)"]
  G --> H{"Verdict = malicious & high confidence?"}
  H -->|Yes| I["Queue revoke_grant for human approval"]
  H -->|No| J["File ranked ticket for analyst review"]
```

Notice the loop in the diagram between the agent and the read-only tools. That is the investigation. The agent decides which tool to call next based on what it has learned, not a fixed script — but every tool it can reach is safe to call as many times as needed.

## Step 3: Write the system prompt as an operating procedure

Treat the system prompt like an SOP you would hand a new analyst, not a personality. State the agent's single job, the exact tools available and when to use each, the evidence standard required before declaring something malicious, and the hard rule that it must never recommend revocation without at least two corroborating evidence items. Define the verdict format inline and give one fully worked example of a benign case and one malicious case so the model calibrates its thresholds.

Crucially, instruct it on uncertainty: when evidence is thin, the correct output is low confidence and a request for human review, not a confident guess. Security triage rewards calibrated humility far more than bravado, and the prompt is where you encode that value.

## Step 4: Implement the run loop

The loop is mechanical. Send the system prompt plus the alert, expose the read-only tools, and let the model run. On each turn, if it requests a tool, execute it, validate the result, append it, and continue. Cap the loop — say eight tool calls — so a confused run cannot spin forever. When the model emits a verdict, validate it against the output schema. If validation fails, send the error back and let it correct once before you escalate to a human.

Use the Claude Agent SDK or a thin wrapper over the Messages API; either way the structure is the same: tool definitions, a turn loop, structured-output validation, and a hard call budget. Log every turn — prompt, tool calls, results, final verdict — to your audit store. You will need that log for the eval step and for any incident postmortem.

## Step 5: Build an eval set before you trust it

Pull thirty to fifty historical OAuth-grant alerts with known ground truth — confirmed benign, confirmed malicious, and the genuinely ambiguous ones. Run the agent against all of them offline and score precision and recall on the malicious class. The number that matters most is false-negative rate: a missed malicious grant is the failure that hurts. Tune the prompt and evidence thresholds until the agent reliably escalates the bad ones, even at the cost of a few extra false positives that humans can dismiss quickly.

Keep this eval set in version control and re-run it on every prompt change. Without it, every edit is a guess, and you will silently regress the cases you cannot see. The eval is not a one-time gate; it is the regression test that lets you change the agent with confidence.

## Step 6: Add the one dangerous tool, behind approval

Only now register `revoke_grant`. It does not auto-execute. When the agent recommends revocation, the action is written to an approval queue with the full evidence chain attached, and a human clicks to confirm. Make the tool idempotent — revoking an already-revoked grant is a no-op that returns success — so a retry never causes a double action. Once the team trusts the verdicts after weeks of supervised operation, you can promote a narrow, well-bounded subset to auto-execute. Earn automation; never assume it.

## Frequently asked questions

### Which model should the triage agent use?

Sonnet 4.6 is the right default for high-volume triage: strong judgment at a cost that survives SOC alert volumes. Escalate genuinely ambiguous or high-stakes cases to Opus 4.8, and use deterministic rules or Haiku 4.5 to filter obvious noise before the agent runs.

### How do I stop the agent from looping forever?

Cap the number of tool calls per investigation (eight is a reasonable start) and validate output on each completion. If it has not produced a valid verdict by the budget, escalate to a human with whatever evidence it gathered rather than letting it spin.

### Do I really need structured output schemas?

Yes. Strict input and output contracts are what make the agent a reliable pipeline stage instead of a chatbot. Validate both directions and retry on schema violations; this removes most of the perceived flakiness.

### When is it safe to auto-execute containment?

After the agent has run supervised against real traffic long enough to prove its precision, and only for narrow, reversible, idempotent actions on low-blast-radius units. Until then, every state change goes through a human approval queue.

## Bringing agentic AI to your phone lines

This same build pattern — strict contracts, scoped tools, a bounded run loop, evals before trust — is exactly how CallSphere ships reliable **voice and chat** agents that investigate, decide, and act mid-conversation. See the live product at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

---

Source: https://callsphere.ai/blog/build-an-ai-triage-agent-for-your-soc-a-walkthrough
