Build a Threat-Detection Agent with Claude Code: A Walkthrough
Step-by-step walkthrough to build a threat-detection triage agent on Claude Code, from ingestion to a working loop.
Reading about agent architecture is one thing; getting a cursor blinking and an agent actually triaging your first alert is another. This walkthrough is for the engineer who wants to build, not theorize. We'll stand up a working threat-detection triage agent on Claude Code in concrete steps — define the alert it consumes, give it the tools it needs, write the system prompt that drives it, and close the loop so it produces an auditable verdict. I'll assume you've used Claude Code before and have a SIEM or log store you can query.
The trick to a successful build is to resist the urge to make the agent omniscient on day one. Start with one alert type, two tools, and one decision. Get that loop trustworthy, then widen it.
Step 1: Pick one alert and define its seed
Choose a single, well-understood detection to start with — say, impossible-travel logins. Don't feed the agent the whole log stream. Instead, define a compact seed object that your existing pipeline emits when this detection fires: the user identity, the two login locations, the timestamps, and a case ID. Everything else the agent will fetch itself. Keeping the seed small is deliberate; it forces the agent to gather context through tools, which is exactly the behavior you can audit later.
Write the seed as a strict JSON shape and version it. When you later add new alert types, each gets its own seed schema, and your orchestrator routes on the schema name. This is the cheapest decision you'll make and the one you'll thank yourself for in six months.
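To make "strict and versioned" concrete, here is a minimal sketch of a seed for the impossible-travel alert, using only the Python standard library. The schema name `impossible_travel.v1`, the field names, and the `parse_seed` helper are all assumptions for illustration; match them to whatever your pipeline actually emits.

```python
import json
from dataclasses import dataclass
from datetime import datetime

SCHEMA_NAME = "impossible_travel.v1"  # hypothetical schema name with a version suffix


@dataclass(frozen=True)
class LoginEvent:
    ip: str
    location: str    # free-text place name; the format is an assumption
    timestamp: str   # ISO 8601 with an explicit UTC offset


@dataclass(frozen=True)
class ImpossibleTravelSeed:
    schema: str
    case_id: str
    user: str
    first_login: LoginEvent
    second_login: LoginEvent


def parse_seed(raw: str) -> ImpossibleTravelSeed:
    """Parse a seed and reject anything that does not match the expected shape."""
    data = json.loads(raw)
    expected = {"schema", "case_id", "user", "first_login", "second_login"}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError("seed fields do not match the expected shape")
    if data["schema"] != SCHEMA_NAME:
        raise ValueError(f"unsupported schema: {data['schema']!r}")
    logins = []
    for key in ("first_login", "second_login"):
        event = LoginEvent(**data[key])           # TypeError on missing or extra fields
        datetime.fromisoformat(event.timestamp)   # ValueError on a malformed timestamp
        logins.append(event)
    return ImpossibleTravelSeed(data["schema"], data["case_id"], data["user"], *logins)
```

Rejecting unknown fields outright, rather than ignoring them, is a design choice: it keeps the seed from quietly growing into the whole log stream.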
Step 2: Stand up two MCP tools
An investigation needs to ask questions. Give the agent exactly two MCP-exposed tools to begin: a query_logins tool that returns a user's recent authentication history, and a lookup_ip tool that returns geolocation and any threat-intel reputation for an address. Define each tool's input and output schema tightly. The output of query_logins should be a structured list of login records, not a prose summary — the model reasons far better over clean structured data than over a paragraph your tool generated.
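As a rough sketch of what "tight" output schemas might look like, the two tools can be written as plain Python functions that return typed records. Everything here is assumed for illustration: the record fields, the reputation labels, and the in-memory fixtures standing in for your SIEM and threat-intel feed. Registering the functions with an MCP server is deliberately left out.

```python
from typing import Dict, List, Optional, TypedDict


class LoginRecord(TypedDict):
    user: str
    timestamp: str   # ISO 8601, all records using the same UTC offset
    ip: str
    country: str
    success: bool


class IpReport(TypedDict):
    ip: str
    country: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]
    reputation: str  # assumed labels: "clean", "suspicious", "malicious", "unknown"


# Hypothetical fixtures; a real tool would query your log store and intel feed.
_LOGINS: List[LoginRecord] = [
    {"user": "alice", "timestamp": "2024-05-01T08:00:00+00:00",
     "ip": "198.51.100.4", "country": "DE", "success": True},
    {"user": "alice", "timestamp": "2024-05-01T09:00:00+00:00",
     "ip": "203.0.113.7", "country": "US", "success": True},
    {"user": "bob", "timestamp": "2024-05-01T07:30:00+00:00",
     "ip": "198.51.100.9", "country": "FR", "success": False},
]
_IP_INTEL: Dict[str, dict] = {
    "198.51.100.4": {"country": "DE", "latitude": 52.52, "longitude": 13.405,
                     "reputation": "clean"},
    "203.0.113.7": {"country": "US", "latitude": 40.7128, "longitude": -74.006,
                    "reputation": "suspicious"},
}


def query_logins(user: str, limit: int = 50) -> List[LoginRecord]:
    """Return the user's most recent logins, newest first, as structured records."""
    records = [r for r in _LOGINS if r["user"] == user]
    # Lexical order equals time order only because every timestamp shares one format.
    records.sort(key=lambda r: r["timestamp"], reverse=True)
    return records[:limit]


def lookup_ip(ip: str) -> IpReport:
    """Return geolocation and reputation; unknown addresses get explicit nulls."""
    known = _IP_INTEL.get(ip)
    if known is None:
        return {"ip": ip, "country": None, "latitude": None,
                "longitude": None, "reputation": "unknown"}
    return {"ip": ip, **known}
```

Returning an explicit `"unknown"` report for an unrecognized address, rather than an error or an empty result, gives the model a fact it can cite instead of a gap it has to guess around.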
Resist adding a containment tool yet. At this stage the agent only reads and reasons. You want to trust its verdicts before you hand it any capability that changes the world.
flowchart TD
A["Detection fires: impossible travel"] --> B["Emit seed: user, IPs, timestamps, case_id"]
B --> C["Claude Code agent receives seed"]
C --> D["Call query_logins for user"]
D --> E["Call lookup_ip for each address"]
E --> F{"Evidence consistent with travel?"}
F -->|Yes, benign| G["Verdict: low risk, close case"]
F -->|No, anomalous| H["Verdict: high risk + evidence chain"]
H --> I["Write case record & notify analyst"]
Step 3: Write the system prompt as a job description
The system prompt is where most builds go wrong. Don't write a vague "you are a security expert." Write a job description with a procedure. Tell the agent: its role is to investigate one alert at a time; it must call query_logins before forming any opinion; it must classify the verdict as one of benign, suspicious, or malicious; and it must justify the verdict by citing specific records it retrieved. Then give it the decision criteria explicitly — for impossible travel, the questions are whether the travel is physically possible, whether the second location appears in the user's history, and whether either IP has bad reputation.
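The first criterion, whether the travel is physically possible, comes down to arithmetic: distance divided by elapsed time. One way to pin it down, for instance when labeling test cases, is a great-circle (haversine) calculation like the sketch below. The 900 km/h ceiling is an assumption, roughly an airliner's cruising speed, and the `(lat, lon, timestamp)` tuple layout is invented for this example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumed ceiling, roughly airliner cruising speed


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def required_speed_kmh(first: tuple, second: tuple) -> float:
    """Speed needed to travel between two logins, each given as (lat, lon, iso_timestamp)."""
    distance = haversine_km(first[0], first[1], second[0], second[1])
    elapsed = datetime.fromisoformat(second[2]) - datetime.fromisoformat(first[2])
    hours = abs(elapsed.total_seconds()) / 3600
    if hours == 0:
        # Simultaneous logins: impossible unless both came from the same place.
        return math.inf if distance > 0 else 0.0
    return distance / hours


def travel_is_plausible(first: tuple, second: tuple,
                        max_kmh: float = MAX_PLAUSIBLE_KMH) -> bool:
    """True when the implied speed does not exceed the assumed ceiling."""
    return required_speed_kmh(first, second) <= max_kmh
```

A plausible speed does not make a login benign; it only answers the first of the three questions. The location-history and IP-reputation checks still apply.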
Constrain the output format. Ask for a small JSON object: verdict, confidence, evidence (a list of cited facts), and recommended_action. Structured output is what lets the next stage — your governance gate and your dashboards — consume the agent's work mechanically instead of parsing prose.
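Whatever consumes the verdict should refuse anything that drifts from the format. The sketch below validates the four fields named above. The 0-to-1 range for `confidence` and the rule that `evidence` must be non-empty are assumptions, since the exact types are yours to define.

```python
import json

ALLOWED_VERDICTS = {"benign", "suspicious", "malicious"}
REQUIRED_FIELDS = {"verdict", "confidence", "evidence", "recommended_action"}


def parse_verdict(raw: str) -> dict:
    """Parse the agent's JSON verdict and raise ValueError on any deviation."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("verdict must be a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']!r}")
    confidence = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= confidence <= 1:  # assumed range
        raise ValueError("confidence must be between 0 and 1")
    evidence = data["evidence"]
    if not isinstance(evidence, list) or not evidence:
        raise ValueError("evidence must be a non-empty list")
    if not all(isinstance(item, str) and item.strip() for item in evidence):
        raise ValueError("each evidence item must be a non-empty string")
    if not isinstance(data["recommended_action"], str):
        raise ValueError("recommended_action must be a string")
    return data
```

Requiring at least one evidence item enforces mechanically what the prompt asks for in words: no verdict without a cited record.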
Step 4: Run the loop and watch the tool calls
Now wire the seed into Claude Code and run it against a handful of real, already-resolved alerts where you know the answer. Watch the transcript. You're checking three things: does it call the tools in a sensible order, does it ask for the right user and IPs, and does its verdict match what the human analyst concluded last week? When it diverges, the fix is almost always in the prompt's decision criteria or in a tool returning ambiguous data, not in the model's intelligence.
This replay-against-known-cases step is your unit test. Build a small set of labeled historical alerts and keep rerunning the agent against them every time you change the prompt or a tool. It's the difference between an agent you hope works and one you've actually measured.
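A replay harness can be very small. In this sketch, `run_agent` is a hypothetical callable that takes a seed and returns the parsed verdict; how you actually invoke Claude Code behind it is left to you. The labeled-case layout (`seed` plus `expected_verdict`) is likewise an assumption.

```python
from typing import Callable, List


def replay(cases: List[dict], run_agent: Callable[[dict], dict]) -> dict:
    """Run the agent over labeled historical alerts and report where it diverges."""
    mismatches = []
    for case in cases:
        got = run_agent(case["seed"])["verdict"]
        if got != case["expected_verdict"]:
            mismatches.append({
                "case_id": case["seed"]["case_id"],
                "expected": case["expected_verdict"],
                "got": got,
            })
    total = len(cases)
    matched = total - len(mismatches)
    return {
        "total": total,
        "matched": matched,
        "accuracy": matched / total if total else 0.0,
        "mismatches": mismatches,
    }


def stub_agent(seed: dict) -> dict:
    """Stand-in for the real agent call, so the harness can be exercised offline."""
    return {"verdict": "benign"}


if __name__ == "__main__":
    labeled = [
        {"seed": {"case_id": "CASE-0001"}, "expected_verdict": "benign"},
        {"seed": {"case_id": "CASE-0002"}, "expected_verdict": "malicious"},
    ]
    report = replay(labeled, stub_agent)
    print(report["accuracy"], report["mismatches"])
```

The list of mismatched case IDs is more useful than the accuracy number: each one points you at a specific transcript to read.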
Step 5: Add the verdict record and the human handoff
An investigation that vanishes when the process ends is useless. After the agent returns its structured verdict, persist a case record: the seed, every tool call and its result, and the final verdict. This record is your audit trail and your replay corpus. For any verdict above your review threshold, push a notification to an analyst with the evidence chain attached, so the human starts from the agent's reasoning instead of from scratch. The agent's job is to compress an hour of manual lookups into a one-paragraph case the human can confirm in seconds.
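One simple way to persist the record is an append-only JSON Lines file, sketched below. The record layout, the file-based storage, and the choice to treat `suspicious` and `malicious` as the review threshold are all assumptions; a database or your case-management system would work just as well.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List

REVIEW_VERDICTS = {"suspicious", "malicious"}  # assumed review threshold


def build_case_record(seed: dict, tool_calls: List[dict], verdict: dict) -> dict:
    """Bundle the seed, every tool call with its result, and the final verdict."""
    return {
        "case_id": seed["case_id"],
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "seed": seed,
        "tool_calls": tool_calls,  # each entry: {"tool": ..., "input": ..., "output": ...}
        "verdict": verdict,
    }


def append_case_record(path: Path, record: dict) -> None:
    """Append one record per line so earlier cases are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def load_case_records(path: Path) -> List[dict]:
    """Read every stored record back, for audits or for the replay corpus."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def needs_analyst_review(verdict: dict) -> bool:
    """True when the verdict should be pushed to a human with its evidence chain."""
    return verdict["verdict"] in REVIEW_VERDICTS
```

The actual notification, whether a ticket, a chat message, or a page, would hang off `needs_analyst_review` and carry the record's `evidence` list with it.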
Step 6: Widen carefully
With one alert type working and measured, widening is mechanical. Add a second seed schema and the tools it needs. Introduce a thin orchestrator that routes a seed to the right investigation procedure. Only once read-only triage is trustworthy across several alert types should you introduce action tools — and even then, route every action through an approval gate keyed on asset sensitivity. The discipline of widening slowly is what keeps the platform debuggable as it grows.
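The thin orchestrator can start as a lookup table keyed on the seed's schema name. The sketch below assumes the seed carries a `schema` field; the registry, the placeholder procedure, and the sensitivity tiers with their approver roles are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

Procedure = Callable[[dict], dict]
_PROCEDURES: Dict[str, Procedure] = {}


def procedure_for(schema_name: str) -> Callable[[Procedure], Procedure]:
    """Decorator that registers an investigation procedure under a schema name."""
    def decorator(func: Procedure) -> Procedure:
        _PROCEDURES[schema_name] = func
        return func
    return decorator


def route(seed: dict) -> dict:
    """Dispatch a seed to its procedure; unknown schemas fail loudly."""
    schema = seed.get("schema")
    procedure = _PROCEDURES.get(schema)
    if procedure is None:
        raise LookupError(f"no investigation procedure for schema {schema!r}")
    return procedure(seed)


@procedure_for("impossible_travel.v1")
def investigate_impossible_travel(seed: dict) -> dict:
    # Placeholder: a real procedure would hand the seed to the agent.
    return {"case_id": seed["case_id"], "procedure": "impossible_travel"}


# Hypothetical tiers: every action needs a human, and sensitivity decides which one.
APPROVER_BY_SENSITIVITY = {
    "low": "on_call_analyst",
    "medium": "on_call_analyst",
    "high": "security_lead",
}
STRICTEST_APPROVER = "security_lead"


def required_approver(asset_sensitivity: str) -> str:
    """Return who must approve an action; unknown tiers fall back to the strictest."""
    return APPROVER_BY_SENSITIVITY.get(asset_sensitivity, STRICTEST_APPROVER)
```

Failing on an unregistered schema, and defaulting an unlabeled asset to the strictest approver, both follow the same principle: when the platform does not recognize something, it should stop rather than guess.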
Frequently asked questions
How long does a first working agent take?
For one alert type with two read-only tools, a focused engineer can have a measurable triage loop running in a few days. Most of the time goes into clean tool schemas and a labeled set of historical alerts to test against, not into the agent wiring itself.
Should I let the agent take containment actions early?
No. Keep the first iterations read-only. Earn trust with verdicts that match your analysts, then add action tools behind a human approval gate. An agent that can disable accounts before you've measured its accuracy is a liability, not a feature.
What's the most common implementation mistake?
Returning prose from tools instead of structured data. When query_logins hands back a paragraph, the model has to re-parse it and errors creep in. Tight output schemas make the agent's reasoning sharper and your verdicts more consistent.
Bringing agentic AI to your phone lines
The same build pattern — small seed, tight tools, measured loop — is how CallSphere ships voice and chat agents that answer every call, pull context mid-conversation, and book work 24/7. Watch it in action at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.