---
title: "Build a Threat-Detection Agent with Claude Code: A Walkthrough"
description: "Step-by-step walkthrough to build a threat-detection triage agent on Claude Code, from ingestion to a working loop."
canonical: https://callsphere.ai/blog/build-a-threat-detection-agent-with-claude-code-a-walkthrough
category: "Agentic AI"
tags: ["agentic ai", "claude", "claude code", "threat detection", "implementation", "tutorial", "security"]
author: "CallSphere Team"
published: 2026-05-12T08:23:11.000Z
updated: 2026-06-06T21:47:42.587Z
---

# Build a Threat-Detection Agent with Claude Code: A Walkthrough

> Step-by-step walkthrough to build a threat-detection triage agent on Claude Code, from ingestion to a working loop.

Reading about agent architecture is one thing; getting an agent to actually triage your first alert is another. This walkthrough is for the engineer who wants to build, not theorize. We'll stand up a working threat-detection triage agent on Claude Code in concrete steps: define the alert it consumes, give it the tools it needs, write the system prompt that drives it, and close the loop so it produces an auditable verdict. I'll assume you've used Claude Code before and have a SIEM or log store you can query.

The trick to a successful build is to resist the urge to make the agent omniscient on day one. Start with one alert type, two tools, and one decision. Get that loop trustworthy, then widen it.

## Step 1: Pick one alert and define its seed

Choose a single, well-understood detection to start with — say, impossible-travel logins. Don't feed the agent the whole log stream. Instead, define a compact *seed object* that your existing pipeline emits when this detection fires: the user identity, the two login locations, the timestamps, and a case ID. Everything else the agent will fetch itself. Keeping the seed small is deliberate; it forces the agent to gather context through tools, which is exactly the behavior you can audit later.
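As a rough illustration, here is a minimal sketch of such a seed as a strict Python type. The field names (`first_ip`, `second_seen`, and so on) are hypothetical; match them to whatever your detection pipeline already emits. Parsing rejects both missing and unexpected fields, so a malformed seed fails before the agent ever sees it.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Hypothetical seed shape for the impossible-travel detection. Field names are
# illustrative; match whatever your detection pipeline already emits.
@dataclass(frozen=True)
class ImpossibleTravelSeed:
    case_id: str
    user: str
    first_ip: str
    first_seen: datetime
    second_ip: str
    second_seen: datetime

_FIELDS = {"case_id", "user", "first_ip", "first_seen", "second_ip", "second_seen"}

def parse_seed(raw: str) -> ImpossibleTravelSeed:
    """Parse a seed strictly: reject missing or unexpected fields."""
    data = json.loads(raw)
    keys = set(data)
    missing = _FIELDS - keys
    extra = keys - _FIELDS
    if missing or extra:
        raise ValueError(f"bad seed: missing={sorted(missing)} extra={sorted(extra)}")
    return ImpossibleTravelSeed(
        case_id=data["case_id"],
        user=data["user"],
        first_ip=data["first_ip"],
        # ISO 8601 with an explicit UTC offset, e.g. 2026-05-01T08:00:00+00:00
        first_seen=datetime.fromisoformat(data["first_seen"]),
        second_ip=data["second_ip"],
        second_seen=datetime.fromisoformat(data["second_seen"]),
    )
```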

Write the seed as a strict JSON shape and version it. When you later add new alert types, each gets its own seed schema, and your orchestrator routes on the schema name. This is the cheapest decision you'll make and the one you'll thank yourself for in six months.
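A minimal sketch of routing on the schema name, assuming each seed carries a `schema` field with a versioned name such as `impossible_travel.v1` (both the field and the naming convention are our assumptions). The handler here is a placeholder for whatever launches the Claude Code run.

```python
import json
from typing import Callable, Dict

def investigate_impossible_travel(seed: dict) -> str:
    # Placeholder: in a real build this would start the Claude Code run
    # with the impossible-travel procedure and its tools.
    return f"impossible_travel procedure for case {seed['case_id']}"

# Each seed schema (name plus version) maps to exactly one investigation procedure.
ROUTES: Dict[str, Callable[[dict], str]] = {
    "impossible_travel.v1": investigate_impossible_travel,
}

def route(raw_seed: str) -> str:
    seed = json.loads(raw_seed)
    schema = seed.get("schema")
    handler = ROUTES.get(schema)
    if handler is None:
        # Fail loudly instead of guessing: an unknown schema is a pipeline bug.
        raise ValueError(f"no procedure registered for schema {schema!r}")
    return handler(seed)
```

Adding a second alert type later means registering one more entry; bumping to `impossible_travel.v2` lets old and new seed shapes coexist while you migrate.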

## Step 2: Stand up two MCP tools

An investigation needs to ask questions. Give the agent exactly two MCP-exposed tools to begin: a `query_logins` tool that returns a user's recent authentication history, and a `lookup_ip` tool that returns geolocation and any threat-intel reputation for an address. Define each tool's input and output schema tightly. The output of `query_logins` should be a structured list of login records, not a prose summary — the model reasons far better over clean structured data than over a paragraph your tool generated.
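The sketch below shows what "tight schemas" can look like. The tool descriptors follow the general shape MCP uses when listing tools (`name`, `description`, and a JSON Schema `inputSchema`); the specific properties, the in-memory data, and the return fields are hypothetical stand-ins for your SIEM and threat-intel feed. Note that `query_logins` returns rows, not prose.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class LoginRecord:
    user: str
    ip: str
    timestamp: str  # ISO 8601, UTC
    success: bool

# Descriptors in the shape MCP uses for tool listings; the schemas are assumptions.
TOOLS = [
    {
        "name": "query_logins",
        "description": "Return a user's recent authentication records as structured rows.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "user": {"type": "string"},
                "limit": {"type": "integer", "minimum": 1, "maximum": 500},
            },
            "required": ["user"],
            "additionalProperties": False,
        },
    },
    {
        "name": "lookup_ip",
        "description": "Return geolocation and threat-intel reputation for one IP address.",
        "inputSchema": {
            "type": "object",
            "properties": {"ip": {"type": "string"}},
            "required": ["ip"],
            "additionalProperties": False,
        },
    },
]

# Hypothetical in-memory stand-ins for the real data sources.
_LOGINS = [
    LoginRecord("alice", "198.51.100.7", "2026-05-01T08:00:00+00:00", True),
    LoginRecord("alice", "203.0.113.9", "2026-05-01T09:30:00+00:00", True),
]
_IP_INTEL = {
    "198.51.100.7": {"country": "US", "lat": 40.71, "lon": -74.01, "reputation": "clean"},
    "203.0.113.9": {"country": "GB", "lat": 51.51, "lon": -0.13, "reputation": "known_proxy"},
}

def query_logins(user: str, limit: int = 50) -> list:
    """Most recent `limit` login records for `user`, as plain dicts."""
    rows = [asdict(r) for r in _LOGINS if r.user == user]
    return rows[-limit:]

def lookup_ip(ip: str) -> dict:
    """Geolocation and reputation; an explicit `found` flag instead of an empty answer."""
    intel = _IP_INTEL.get(ip)
    if intel is None:
        return {"ip": ip, "found": False}
    return {"ip": ip, "found": True, **intel}
```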

Resist adding a containment tool yet. At this stage the agent only reads and reasons. You want to trust its verdicts before you hand it any capability that changes the world.

```mermaid
flowchart TD
  A["Detection fires: impossible travel"] --> B["Emit seed: user, IPs, timestamps, case_id"]
  B --> C["Claude Code agent receives seed"]
  C --> D["Call query_logins for user"]
  D --> E["Call lookup_ip for each address"]
  E --> F{"Evidence consistent with travel?"}
  F -->|Yes, benign| G["Verdict: benign, close case"]
  F -->|No, anomalous| H["Verdict: suspicious or malicious + evidence chain"]
  H --> I["Write case record & notify analyst"]
```

## Step 3: Write the system prompt as a job description

The system prompt is where most builds go wrong. Don't write a vague "you are a security expert." Write a job description with a procedure. Tell the agent: its role is to investigate one alert at a time; it must call `query_logins` before forming any opinion; it must classify the verdict as one of `benign`, `suspicious`, or `malicious`; and it must justify the verdict by citing specific records it retrieved. Then give it the decision criteria explicitly — for impossible travel, the questions are whether the travel is physically possible, whether the second location appears in the user's history, and whether either IP has bad reputation.
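For the "is the travel physically possible?" question, one option (our suggestion, not part of the procedure above) is to compute the implied speed deterministically in your tool layer instead of asking the model to estimate distances. This sketch uses the haversine great-circle distance; the 1000 km/h ceiling is an assumed threshold, and GeoIP coordinates are only approximate, so treat the result as one piece of evidence.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
# Assumed ceiling: roughly airliner cruise speed. Tune for your environment.
MAX_PLAUSIBLE_KMH = 1000.0

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def travel_is_possible(lat1: float, lon1: float, t1: str,
                       lat2: float, lon2: float, t2: str) -> bool:
    """True if the implied speed between two logins is at or under the ceiling."""
    elapsed = datetime.fromisoformat(t2) - datetime.fromisoformat(t1)
    hours = abs(elapsed.total_seconds()) / 3600
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if hours == 0:
        # Simultaneous logins are only plausible from (nearly) the same place.
        return distance < 1.0
    return distance / hours <= MAX_PLAUSIBLE_KMH
```

For example, New York to London (about 5,570 km) in 90 minutes implies well over 3,000 km/h and comes back `False`, which is exactly the fact the agent should cite in its evidence.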

Constrain the output format. Ask for a small JSON object: `verdict`, `confidence`, `evidence` (a list of cited facts), and `recommended_action`. Structured output is what lets the next stage — your governance gate and your dashboards — consume the agent's work mechanically instead of parsing prose.
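Because the output is structured, it can be validated mechanically before anything downstream trusts it. A minimal sketch follows; the four keys and the three verdict labels come from the prompt described above, while the `[0, 1]` confidence range and the non-empty evidence rule are our assumptions.

```python
import json
from dataclasses import dataclass
from typing import List

VERDICTS = {"benign", "suspicious", "malicious"}
EXPECTED_KEYS = {"verdict", "confidence", "evidence", "recommended_action"}

@dataclass(frozen=True)
class Verdict:
    verdict: str
    confidence: float
    evidence: List[str]
    recommended_action: str

def parse_verdict(raw: str) -> Verdict:
    """Reject any agent output that does not match the agreed shape."""
    data = json.loads(raw)
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"unexpected or missing keys: {sorted(set(data) ^ EXPECTED_KEYS)}")
    if data["verdict"] not in VERDICTS:
        raise ValueError(f"unknown verdict {data['verdict']!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:  # assumed scale
        raise ValueError("confidence must be between 0 and 1")
    evidence = data["evidence"]
    # A verdict without cited facts is not auditable, so treat it as invalid.
    if not isinstance(evidence, list) or not evidence or not all(isinstance(e, str) for e in evidence):
        raise ValueError("evidence must be a non-empty list of strings")
    return Verdict(data["verdict"], confidence, evidence, str(data["recommended_action"]))
```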

## Step 4: Run the loop and watch the tool calls

Now wire the seed into Claude Code and run it against a handful of real, already-resolved alerts where you know the answer. Watch the transcript. You're checking three things: does it call the tools in a sensible order, does it ask for the right user and IPs, and does its verdict match what the human analyst concluded last week? When it diverges, the fix is almost always in the prompt's decision criteria or in a tool returning ambiguous data, not in the model's intelligence.
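Checking the transcript by eye gets tedious, so part of it can be automated. This sketch assumes you have already extracted the agent's tool calls from the run into a list of `{"tool": ..., "args": ...}` dicts (a hypothetical format; adapt it to however you capture the run). It flags the first two failure modes above: wrong order and wrong arguments.

```python
from typing import Dict, List

def check_transcript(tool_calls: List[Dict], seed: Dict) -> List[str]:
    """Return human-readable problems with the agent's tool usage (empty list if none)."""
    problems = []
    names = [call["tool"] for call in tool_calls]
    # The prompt requires login history before any opinion is formed.
    if not names or names[0] != "query_logins":
        problems.append("first tool call was not query_logins")
    queried_users = {c["args"].get("user") for c in tool_calls if c["tool"] == "query_logins"}
    if seed["user"] not in queried_users:
        problems.append(f"never queried logins for {seed['user']}")
    looked_up = {c["args"].get("ip") for c in tool_calls if c["tool"] == "lookup_ip"}
    for ip in (seed["first_ip"], seed["second_ip"]):
        if ip not in looked_up:
            problems.append(f"never looked up {ip}")
    return problems
```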

This replay-against-known-cases step is your unit test. Build a small set of labeled historical alerts and keep rerunning the agent against them every time you change the prompt or a tool. It's the difference between an agent you hope works and one you've actually measured.
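A minimal sketch of that replay harness. `triage` stands in for whatever function runs your agent on a seed and returns its verdict label; here it can be any callable, so the harness itself stays testable without a live model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class LabeledCase:
    case_id: str
    seed: dict
    analyst_verdict: str  # what the human analyst concluded

def replay(cases: List[LabeledCase],
           triage: Callable[[dict], str]) -> Tuple[float, List[str]]:
    """Rerun the agent on labeled cases; return the agreement rate and the mismatches."""
    mismatches = []
    for case in cases:
        got = triage(case.seed)
        if got != case.analyst_verdict:
            mismatches.append(f"{case.case_id}: agent={got} analyst={case.analyst_verdict}")
    agreement = (len(cases) - len(mismatches)) / len(cases) if cases else 0.0
    return agreement, mismatches
```

Tracking the agreement rate across prompt and tool changes turns "it seems better" into a number, and the mismatch list tells you exactly which cases to reread.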

## Step 5: Add the verdict record and the human handoff

An investigation that vanishes when the process ends is useless. After the agent returns its structured verdict, persist a case record: the seed, every tool call and its result, and the final verdict. This record is your audit trail and your replay corpus. For any verdict that crosses your review threshold (a non-benign classification, for instance, or a low confidence score), push a notification to an analyst with the evidence chain attached, so the human starts from the agent's reasoning instead of from scratch. The agent's job is to compress an hour of manual lookups into a one-paragraph case the human can confirm in seconds.
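A minimal sketch of the record and the handoff, assuming JSON Lines storage and a review policy we made up for illustration: every `suspicious` or `malicious` verdict goes to a human, and so does a `benign` verdict below a confidence floor. How the summary is delivered (chat, ticket, pager) is left to your stack.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Assumed policy: tune both values for your environment.
REVIEW_VERDICTS = {"suspicious", "malicious"}
MIN_AUTO_CLOSE_CONFIDENCE = 0.9

def build_case_record(seed: dict, tool_calls: list, verdict: dict) -> dict:
    """Bundle everything needed to audit or replay this investigation."""
    return {
        "case_id": seed["case_id"],
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "seed": seed,
        "tool_calls": tool_calls,  # each entry: tool name, arguments, and result
        "verdict": verdict,
    }

def append_case_record(path: Path, record: dict) -> None:
    # One JSON object per line keeps the file appendable and easy to replay.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

def needs_review(verdict: dict) -> bool:
    if verdict["verdict"] in REVIEW_VERDICTS:
        return True
    return verdict["confidence"] < MIN_AUTO_CLOSE_CONFIDENCE

def analyst_summary(record: dict) -> str:
    """Short message for the analyst with the evidence chain attached."""
    v = record["verdict"]
    lines = [f"Case {record['case_id']}: {v['verdict']} (confidence {v['confidence']:.2f})"]
    lines += [f"- {fact}" for fact in v["evidence"]]
    lines.append(f"Recommended: {v['recommended_action']}")
    return "\n".join(lines)
```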

## Step 6: Widen carefully

With one alert type working and measured, widening is mechanical. Add a second seed schema and the tools it needs. Introduce a thin orchestrator that routes a seed to the right investigation procedure. Only once read-only triage is trustworthy across several alert types should you introduce action tools — and even then, route every action through an approval gate keyed on asset sensitivity. The discipline of widening slowly is what keeps the platform debuggable as it grows.
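When action tools do arrive, the approval gate can start very small. This sketch assumes a three-tier sensitivity label pulled from your asset inventory and an approval policy invented for illustration (one human for low and medium assets, two distinct humans for high). The action name is hypothetical. Unknown sensitivity fails closed.

```python
from dataclasses import dataclass
from typing import Set

# Assumed policy: every action needs at least one human approval;
# high-sensitivity assets need two distinct approvers.
REQUIRED_APPROVALS = {"low": 1, "medium": 1, "high": 2}

@dataclass(frozen=True)
class ProposedAction:
    action: str       # hypothetical action tool name, e.g. "disable_account"
    target: str       # asset or identity the action touches
    sensitivity: str  # "low" | "medium" | "high", from your asset inventory

def gate(action: ProposedAction, approvers: Set[str]) -> str:
    # Unknown sensitivity fails closed: treat it as the strictest tier.
    required = REQUIRED_APPROVALS.get(action.sensitivity, max(REQUIRED_APPROVALS.values()))
    if len(approvers) >= required:
        return "execute"
    return f"held: needs {required - len(approvers)} more approval(s)"
```

Keeping the gate outside the agent means the model can propose an action but can never approve its own.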

## Frequently asked questions

### How long does a first working agent take?

For one alert type with two read-only tools, a focused engineer can have a measurable triage loop running in a few days. Most of the time goes into clean tool schemas and a labeled set of historical alerts to test against, not into the agent wiring itself.

### Should I let the agent take containment actions early?

No. Keep the first iterations read-only. Earn trust with verdicts that match your analysts, then add action tools behind a human approval gate. An agent that can disable accounts before you've measured its accuracy is a liability, not a feature.

### What's the most common implementation mistake?

Returning prose from tools instead of structured data. When `query_logins` hands back a paragraph, the model has to re-parse it and errors creep in. Tight output schemas make the agent's reasoning sharper and your verdicts more consistent.

## Bringing agentic AI to your phone lines

The same build pattern — small seed, tight tools, measured loop — is how CallSphere ships **voice and chat** agents that answer every call, pull context mid-conversation, and book work 24/7. Watch it in action at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

