Code-Level Patterns for LLM Source-Code Security Agents

Reusable prompt, tool, and context patterns for Claude security agents: candidate-and-confirm, evidence-gated findings, focused slices, deterministic severity.

Once you have built one LLM security reviewer, you start to notice that the same handful of patterns keep deciding whether it works. The agents that ship findings developers trust are not the ones with the cleverest prompt; they are the ones whose structure forces good behavior. This post collects the reusable patterns — at the level of prompts, tools, and context shaping — that I reach for every time I build a Claude agent to secure source code. None of them are exotic. The skill is in applying them consistently.

Pattern 1: candidate-and-confirm

The single most important pattern is to split the work into a cheap generator and an expensive judge. A deterministic pass (a taint engine, a Semgrep ruleset, even a set of grep patterns) nominates candidate sinks with high recall and low precision. Claude then confirms or rejects each candidate with full reasoning. This beats asking the model to "find all vulnerabilities" in two ways: it bounds cost, because reasoning runs only on candidates, and it bounds recall variance, because the deterministic pass does not forget to look in a file the way an open-ended agent might.

In code terms, your loop iterates over candidates, and each iteration is a fresh, focused agent task: "Here is a candidate sink and its location. Determine whether untrusted input can reach it and whether that is exploitable." Keeping each task small and independent also makes the workload trivially parallel — you can fan out candidates across subagents and reassemble the confirmed findings.
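As a rough illustration of that loop, the sketch below fans candidates out to a thread pool and keeps only the confirmed verdicts. The names `Candidate`, `Verdict`, `confirm_all`, and `fake_judge` are hypothetical; in a real agent the judge would wrap a model call, which is stubbed here so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    file: str
    line: int
    sink: str  # e.g. "cursor.execute"


@dataclass(frozen=True)
class Verdict:
    candidate: Candidate
    confirmed: bool
    reasoning: str


def confirm_all(
    candidates: Iterable[Candidate],
    judge: Callable[[Candidate], Verdict],
    max_workers: int = 8,
) -> List[Verdict]:
    """Run one independent judge task per candidate; return confirmed verdicts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with candidates.
        verdicts = list(pool.map(judge, candidates))
    return [v for v in verdicts if v.confirmed]


def fake_judge(c: Candidate) -> Verdict:
    # Stand-in for the model call (assumption): confirms only SQL execute sinks.
    return Verdict(c, c.sink == "cursor.execute", "stubbed reasoning")


candidates = [
    Candidate("app/db.py", 42, "cursor.execute"),
    Candidate("app/util.py", 7, "logging.info"),
]
confirmed = confirm_all(candidates, fake_judge)
```

Because each judge call receives a single candidate and returns a single verdict, swapping the thread pool for subagents or a job queue should not change the surrounding code.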

Pattern 2: evidence-gated reporting

The second pattern is a hard rule in the output contract: a finding is not allowed to exist without evidence. Concretely, your finding schema makes the evidence field required, and the only ways to fill it are a passing reproduction from the sandbox or a precisely specified exploit input with the exact tainted path. The model is told, in the system prompt and reinforced by examples, that an unproven suspicion must be emitted as a low-confidence note, never as a finding.

flowchart TD
  A["Candidate sink"] --> B["Assemble focused slice"]
  B --> C{"Reachable from untrusted input?"}
  C -->|No| D["Reject"]
  C -->|Yes| E["Draft exploit hypothesis"]
  E --> F{"Evidence: PoC passes?"}
  F -->|Yes| G["Emit finding + proof"]
  F -->|No| H["Emit low-confidence note"]

This pattern is what makes the output usable at scale. When every finding carries a reproduction, triage collapses from "is this real?" to "how soon do we fix it?" And because the schema mechanically separates findings from notes, your CI gate can act on findings while a human skims notes, without the two ever being confused.
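One way to enforce the gate mechanically is to make evidence a constructor argument with no default, and to route anything that lacks valid evidence into a separate note type. The sketch below assumes hypothetical names (`Evidence`, `Finding`, `Note`, `emit`) and two evidence kinds that mirror the ones described above; it is an illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional, Union

# The two evidence kinds accepted by the output contract (labels are assumptions).
ALLOWED_KINDS = {"reproduction", "exploit_input"}


@dataclass(frozen=True)
class Evidence:
    kind: str          # "reproduction" or "exploit_input"
    detail: str        # PoC result, or the exact exploit input
    tainted_path: str  # source -> ... -> sink


@dataclass(frozen=True)
class Finding:
    title: str
    evidence: Evidence  # no default: a Finding cannot be built without it


@dataclass(frozen=True)
class Note:
    title: str
    confidence: str = "low"


def emit(title: str, evidence: Optional[Evidence]) -> Union[Finding, Note]:
    """Return a Finding only when the evidence is present and well-formed."""
    if evidence is None or evidence.kind not in ALLOWED_KINDS:
        return Note(title)
    if not evidence.detail.strip() or not evidence.tainted_path.strip():
        return Note(title)
    return Finding(title, evidence)


proof = Evidence("reproduction", "PoC exited 0", "request.args -> query -> execute")
finding = emit("SQL injection in search", proof)
note = emit("Possible SSRF in fetcher", None)
```

A CI gate can then branch on the type alone: block on `Finding`, pass `Note` through to a human review queue.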

Pattern 3: structure the prompt as method, contract, and calibration

A reusable security-agent prompt has three parts and they do different jobs. The method section describes how to think — trace untrusted data from a trust boundary to a sink, confirm reachability, attempt reproduction. The contract section is the rigid output schema: exact field names, severity scale, what counts as evidence. The calibration section is two or three contrasting examples — a true positive and a look-alike false positive — that teach the boundary. Keeping these separated means you can tune one without disturbing the others; you will iterate on calibration examples far more often than on the contract.

Resist the urge to grow the method section into a vulnerability encyclopedia. The model already knows what SQL injection and SSRF are. What it needs from you is the procedure for confirming them in this codebase and the discipline of not reporting without proof. A short, sharp method section outperforms a long checklist.
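Keeping the three parts as separate inputs makes that independence concrete. The following is a minimal sketch, assuming a hypothetical `build_system_prompt` helper and plain-text section markers; the exact delimiters and wording are placeholders to adapt.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Example:
    label: str    # e.g. "TRUE POSITIVE" or "FALSE POSITIVE"
    code: str
    verdict: str


def build_system_prompt(method: str, contract: str, examples: List[Example]) -> str:
    """Assemble the prompt from three independently editable sections."""
    calibration = "\n\n".join(
        f"[{e.label}]\n{e.code}\nVerdict: {e.verdict}" for e in examples
    )
    return (
        f"## METHOD\n{method.strip()}\n\n"
        f"## CONTRACT\n{contract.strip()}\n\n"
        f"## CALIBRATION\n{calibration}"
    )


prompt = build_system_prompt(
    method="Trace untrusted data from a trust boundary to the sink. "
           "Confirm reachability, then attempt reproduction.",
    contract="Return JSON with fields: title, evidence, rationale.",
    examples=[
        Example("TRUE POSITIVE", 'cur.execute("... " + name)', "finding"),
        Example("FALSE POSITIVE", 'cur.execute("... %s", (name,))', "reject"),
    ],
)
```

With this layout, swapping calibration examples is a change to one list, and the method and contract text stay untouched.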

Pattern 4: package context, do not stream files

How you shape context determines how well the model reasons. The pattern that works is the focused slice: for each candidate, assemble exactly the code needed to decide it — the sink function, the caller chain up to the nearest trust boundary, the definitions of helpers that touch the value, and the framework configuration that governs escaping or parameterization. Label each piece ("SINK", "CALLER PATH", "CONFIG") so the model knows what role it plays. A labeled, complete slice produces far sharper reasoning than a larger but unstructured dump.
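A labeled slice can be rendered with very little code. The sketch below is an assumption about layout rather than a required format: `render_slice` and the `HELPERS` label are hypothetical, while `SINK`, `CALLER PATH`, and `CONFIG` follow the labels suggested above.

```python
from typing import List, Tuple

# Fixed section order so every slice reads the same way (HELPERS is an assumed label).
SECTION_ORDER = ["SINK", "CALLER PATH", "HELPERS", "CONFIG"]


def render_slice(pieces: List[Tuple[str, str, str]]) -> str:
    """Render (label, location, code) pieces as one labeled context block."""
    unknown = {label for label, _, _ in pieces} - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # sorted() is stable, so pieces sharing a label keep their given order.
    ordered = sorted(pieces, key=lambda p: SECTION_ORDER.index(p[0]))
    return "\n\n".join(
        f"=== {label} ({location}) ===\n{code.rstrip()}"
        for label, location, code in ordered
    )


slice_text = render_slice([
    ("CONFIG", "settings.py", "DB_PARAMSTYLE = 'format'"),
    ("CALLER PATH", "views.py:10", "def search(request):\n    return run(request.GET['q'])"),
    ("SINK", "db.py:42", "def run(q):\n    cursor.execute('SELECT ... ' + q)"),
])
```

Rejecting unknown labels keeps the vocabulary small, which helps the prompt's instructions about each role stay accurate.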

The corollary is to make missing context retrievable rather than guessed. If a helper definition is absent, the model should call read_definition for it, not assume its behavior. Encode this in the prompt: "If you cannot see whether a function sanitizes its input, fetch its definition before concluding." This one instruction removes a whole class of hallucinated findings that come from the model inventing the behavior of code it never saw.
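To make the retrieval path concrete, here is a minimal sketch of a `read_definition` handler for Python sources, built on the standard `ast` module. The tool description dictionary uses a JSON Schema style input definition; its exact shape, the `function_name` parameter, and the single-module lookup are all assumptions for illustration, so check your agent framework's tool format before reusing it.

```python
import ast
from typing import Optional

# Hypothetical tool description handed to the model alongside the prompt.
READ_DEFINITION_TOOL = {
    "name": "read_definition",
    "description": "Return the source of a function by name, or null if not found.",
    "input_schema": {
        "type": "object",
        "properties": {"function_name": {"type": "string"}},
        "required": ["function_name"],
    },
}


def read_definition(module_source: str, function_name: str) -> Optional[str]:
    """Return the exact source text of the named function, or None if absent."""
    tree = ast.parse(module_source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == function_name:
            return ast.get_source_segment(module_source, node)
    return None


source = "def clean(x):\n    return x.strip()\n\ndef other():\n    pass\n"
found = read_definition(source, "clean")
missing = read_definition(source, "sanitize")
```

Returning `None` for an unknown name matters as much as the happy path: it gives the model an explicit "not found" to reason about instead of a gap to fill with a guess.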

Pattern 5: deterministic severity, model-supplied rationale

Let the model explain a finding, but do not let it freelance the severity number. Compute severity from facts the verification step established: was the path reachable from an external entry point, was it authenticated, did the payload survive encoding, what is the impact class. Feed those facts into a fixed scoring function. The model writes the human-readable rationale; the system assigns the score. This keeps prioritization stable across runs and prevents the model's tone from inflating or deflating urgency.
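A fixed scoring function over those verified facts might look like the sketch below. The weights, the impact class names, and the 0 to 10 scale are invented for illustration; the point is only that the same facts always produce the same number.

```python
from dataclasses import dataclass

# Hypothetical base scores per impact class; tune to your own risk model.
IMPACT_BASE = {"info_leak": 4.0, "data_tamper": 6.0, "rce": 9.0}


@dataclass(frozen=True)
class VerifiedFacts:
    externally_reachable: bool
    requires_auth: bool
    payload_survives_encoding: bool
    impact_class: str


def severity(facts: VerifiedFacts) -> float:
    """Map verified facts to a score clamped to the range 0.0 to 10.0."""
    score = IMPACT_BASE[facts.impact_class]
    if facts.externally_reachable:
        score += 1.0
    if facts.requires_auth:
        score -= 1.5
    if not facts.payload_survives_encoding:
        score -= 3.0
    return max(0.0, min(10.0, score))


unauth_rce = severity(VerifiedFacts(True, False, True, "rce"))
authed_tamper = severity(VerifiedFacts(True, True, True, "data_tamper"))
```

The model's rationale can sit next to this score in the report, but it never feeds back into the number.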

The same separation applies to deduplication. The model is good at describing a bug but unreliable at remembering whether it already reported the same one; a deterministic fingerprint over (file, sink, tainted-source) collapses duplicates so a recurring pattern lands as one finding, not twenty.
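The fingerprint itself can be a plain hash of the three fields. This is a minimal sketch using `hashlib` from the standard library; the helper names `fingerprint` and `dedupe` and the tuple representation of a finding are assumptions.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

FindingKey = Tuple[str, str, str]  # (file, sink, tainted_source)


def fingerprint(file: str, sink: str, tainted_source: str) -> str:
    """Stable hex digest over the three identifying fields."""
    # A unit-separator character avoids collisions from naive concatenation.
    key = "\x1f".join((file, sink, tainted_source))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(findings: Iterable[FindingKey]) -> List[FindingKey]:
    """Keep the first finding seen for each fingerprint, in input order."""
    seen: Dict[str, FindingKey] = {}
    for f in findings:
        seen.setdefault(fingerprint(*f), f)
    return list(seen.values())


unique = dedupe([
    ("app/db.py", "cursor.execute", "request.args['q']"),
    ("app/db.py", "cursor.execute", "request.args['q']"),
    ("app/api.py", "requests.get", "request.json['url']"),
])
```

Storing the digest alongside each finding also lets later runs recognize a bug that was already reported in an earlier one.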

Putting the patterns together

Stacked, these patterns produce an agent that behaves the same way every run: a deterministic pass nominates candidates, each candidate becomes a small focused reasoning task with a labeled context slice, the model confirms reachability and reproduces the exploit, evidence-gated reporting filters out hunches, severity is scored deterministically, and duplicates are fingerprinted away. None of the pieces is fancy on its own. Together they turn an unpredictable model into a dependable component you can wire into a release pipeline.

Frequently asked questions

What is the candidate-and-confirm pattern?

It splits vulnerability detection into a cheap, high-recall generator and an expensive, high-precision judge. A deterministic engine nominates candidate sinks, and Claude reasons about each one in isolation to confirm or reject it. This bounds cost and prevents the recall gaps you get when an open-ended agent simply forgets to inspect part of the codebase.

Why force evidence into the finding schema?

Making evidence a required field mechanically prevents the model from reporting unproven suspicions as findings. The only valid evidence is a passing reproduction or a precisely specified exploit, so triagers never have to ask "is this real?" — that question was already answered before the finding was emitted.

Should the model assign severity scores?

No. Let the model write the rationale, but compute severity deterministically from established facts — reachability, authentication, encoding survival, impact class. Deterministic scoring keeps prioritization consistent across runs and stops the model's wording from inflating or deflating how urgent a bug appears.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
