How an LLM Source-Code Security Scanner Works

Inside the architecture of a Claude-powered code security agent: ingestion, retrieval, reasoning, and verification that turns hunches into proven findings.

Static analyzers have been finding security bugs for two decades, yet the average engineering team still drowns in noise. Pattern-based scanners flag every eval() and every string concatenation near a SQL call, and most of those flags are false positives. The interesting shift in 2026 is that you can put a model like Claude Opus 4.8 in the loop to reason about whether a flagged path is actually reachable, actually attacker-controlled, and actually dangerous. But reasoning alone is not a product. The hard part is the architecture around the model — how code gets in, how the model gets the right slice of context, what tools it can call, and how a raw hunch becomes a verified finding. This post walks the whole pipeline end to end.

Why a single prompt is the wrong unit of design

The naive design is to paste a file into a prompt and ask "is this secure?" That fails for three structural reasons. First, security bugs are rarely local; an injection sink in one file is only exploitable because of a source in another file three call-frames away, and the model cannot see what is not in its context. Second, a one-shot answer has no way to confirm itself — the model will confidently describe a vulnerability that a five-second test would disprove. Third, a real repository is far larger than any context window you want to pay for, even at a million tokens, so you must select what the model sees.

The right unit of design is therefore not a prompt but an agent loop: a controller that gives the model a goal, a set of tools, and a budget, then lets it iterate — fetch a function, follow a call edge, read a config file, run a query — until it has enough evidence to commit to a finding or drop it. The model supplies judgment; the surrounding system supplies memory, retrieval, and verification.
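
The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real integration: `Action`, `LoopResult`, `fetch_function`, and `scripted_model` are hypothetical names, the model is replaced by a scripted stand-in, and the budget is counted in steps rather than tokens.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One decision from the model: call a tool, or finish with a verdict."""
    kind: str                 # "tool", "finding", or "drop"
    tool: str = ""
    arg: str = ""
    note: str = ""


@dataclass
class LoopResult:
    verdict: str              # "finding", "drop", or "budget_exhausted"
    evidence: List[str] = field(default_factory=list)


def run_agent_loop(
    goal: str,
    decide: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> LoopResult:
    """Iterate until the model commits to a verdict or the budget runs out."""
    evidence: List[str] = []
    for _ in range(max_steps):
        action = decide(goal, evidence)
        if action.kind in ("finding", "drop"):
            return LoopResult(action.kind, evidence)
        if action.kind != "tool" or action.tool not in tools:
            evidence.append(f"error: unknown action {action.kind!r} / tool {action.tool!r}")
            continue
        evidence.append(f"{action.tool}({action.arg}) -> {tools[action.tool](action.arg)}")
    return LoopResult("budget_exhausted", evidence)


# Hypothetical stand-ins so the sketch runs without a model or a repository.
def fetch_function(name: str) -> str:
    source = {"getOrder": "db.execute('SELECT * FROM orders WHERE id=' + id)"}
    return source.get(name, "<not found>")


def scripted_model(goal: str, evidence: List[str]) -> Action:
    # A real controller would send goal + evidence to the model here.
    if not evidence:
        return Action("tool", tool="fetch_function", arg="getOrder")
    return Action("finding", note="id is concatenated into SQL")


result = run_agent_loop("Is getOrder injectable?", scripted_model,
                        {"fetch_function": fetch_function})
```

The controller owns the evidence list and the step budget, so the model only ever chooses the next action; a run that never commits ends as `budget_exhausted` rather than looping forever.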

The four layers of the pipeline

It helps to think in four layers. The ingestion layer turns a repository into searchable artifacts: a parsed syntax tree per file, a symbol index, and ideally a call graph and a coarse dataflow graph built by an existing engine like a language server or Semgrep. The retrieval layer answers questions the agent asks — "where is this symbol defined," "who calls this function," "show me the route handler for this path." The reasoning layer is Claude inside an agent loop, deciding what to look at next and forming hypotheses. The verification layer takes a hypothesis and tries to break or confirm it before anything is shown to a human.

```mermaid
flowchart TD
  A["Repo snapshot"] --> B["Ingestion: AST + symbol & call index"]
  B --> C["Candidate sinks: taint engine pre-pass"]
  C --> D{"Claude agent loop: reachable & tainted?"}
  D -->|Need more code| E["Retrieval tool: fetch callers / defs"]
  E --> D
  D -->|Hypothesis| F["Verification: write & run a PoC or test"]
  F -->|Confirmed| G["Finding with evidence & severity"]
  F -->|Refuted| H["Drop as false positive"]
```
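
To make the ingestion layer's call index concrete, here is a rough sketch built on Python's standard `ast` module. It is an assumption-laden toy: `build_call_index` is a hypothetical helper, it resolves calls by name only (no types, imports, or methods on specific classes), and a production pipeline would more likely lean on an existing engine as described above.

```python
import ast
from collections import defaultdict
from typing import Dict, Set


def build_call_index(source: str) -> Dict[str, Set[str]]:
    """Map each called name to the set of top-level functions that call it."""
    callers: Dict[str, Set[str]] = defaultdict(set)
    tree = ast.parse(source)
    for func in tree.body:
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                callee = node.func
                # foo(...) gives "foo"; obj.foo(...) gives "foo"; anything else is skipped.
                if isinstance(callee, ast.Name):
                    name = callee.id
                else:
                    name = getattr(callee, "attr", None)
                if name:
                    callers[name].add(func.name)
    return dict(callers)


# Hypothetical three-function service used only for illustration.
sample = '''
def run_query(db, sql):
    return db.execute(sql)

def get_order(db, order_id):
    return run_query(db, "SELECT * FROM orders WHERE id = " + order_id)

def order_handler(request, db):
    return get_order(db, request.args["id"])
'''
index = build_call_index(sample)
```

The resulting index answers "who calls this function" with a dictionary lookup, which is the kind of cheap, deterministic question the retrieval layer should not need a model for.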

The pre-pass, the taint-engine step that sits between ingestion and the agent loop in the diagram, matters more than it looks. You do not want Claude reading every file at random; you want a cheap deterministic engine to nominate candidate sinks (places where untrusted data could reach a dangerous operation) and then spend the expensive reasoning budget only on confirming or rejecting those candidates. This keeps token cost bounded and gives the agent a concrete starting point instead of an open-ended hunt.
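
A toy version of such a pre-pass, for Python sources, might look like the following. It is far cruder than a real taint engine: the `SINK_NAMES` set and the `nominate_sinks` helper are illustrative assumptions, and the only filter is whether the first argument is a literal constant.

```python
import ast
from typing import List, NamedTuple

# Assumed sink names for illustration; a real engine ships a much larger catalog.
SINK_NAMES = {"eval", "exec", "system", "execute"}


class Candidate(NamedTuple):
    sink: str
    line: int


def call_name(node: ast.Call) -> str:
    """Return the trailing name of the callee: eval(...) or obj.execute(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def nominate_sinks(source: str) -> List[Candidate]:
    """Flag sink calls whose first argument is not a plain literal."""
    candidates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or call_name(node) not in SINK_NAMES:
            continue
        if node.args and isinstance(node.args[0], ast.Constant):
            continue  # a constant argument cannot carry attacker input
        candidates.append(Candidate(call_name(node), node.lineno))
    return sorted(candidates, key=lambda c: c.line)


sample = '''
def get_order(cursor, order_id):
    cursor.execute("SELECT * FROM orders WHERE id = " + order_id)

def healthcheck(cursor):
    cursor.execute("SELECT 1")
'''
candidates = nominate_sinks(sample)
```

Here the concatenated query in `get_order` is nominated while the constant query in `healthcheck` is not, so the agent's budget goes to the one call that could plausibly carry untrusted data.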

How context gets assembled for each candidate

For every candidate, the controller builds a focused context package rather than dumping the file. A good package contains the sink function in full, the chain of callers up to the nearest trust boundary (an HTTP handler, a queue consumer, a CLI entry point), the definitions of any helper that touches the tainted value, and the relevant framework configuration — for example whether an ORM auto-parameterizes or whether a template engine auto-escapes. The model then reasons over a tight, complete slice instead of a noisy ocean.
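
One way to picture the package assembly is a walk up a reverse call graph until a trust boundary is reached. The sketch below assumes the graph and the boundary set already exist; `CALLERS`, `TRUST_BOUNDARIES`, `chain_to_boundary`, and `build_context_package` are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical reverse call graph: callee -> list of direct callers.
CALLERS: Dict[str, List[str]] = {
    "run_query": ["get_order"],
    "get_order": ["order_handler", "nightly_report"],
    "nightly_report": [],
    "order_handler": [],
}
# Assumed trust boundaries, e.g. functions registered as HTTP handlers.
TRUST_BOUNDARIES: Set[str] = {"order_handler"}


def chain_to_boundary(sink: str) -> Optional[List[str]]:
    """Breadth-first search up the call graph; returns the shortest path
    from the sink to a trust boundary, or None if no boundary reaches it."""
    queue = deque([[sink]])
    seen = {sink}
    while queue:
        path = queue.popleft()
        if path[-1] in TRUST_BOUNDARIES:
            return path
        for caller in CALLERS.get(path[-1], []):
            if caller not in seen:
                seen.add(caller)
                queue.append(path + [caller])
    return None


def build_context_package(sink: str, sources: Dict[str, str],
                          config: Dict[str, str]) -> dict:
    """Bundle the sink, its caller chain, their source, and framework config."""
    chain = chain_to_boundary(sink)
    return {
        "sink": sink,
        "caller_chain": chain or [],
        "code": {name: sources[name] for name in (chain or [sink]) if name in sources},
        "framework_config": config,
        "reachable_from_boundary": chain is not None,
    }
```

A sink with no path to any boundary still gets a package, but with `reachable_from_boundary` set to false, which is itself useful evidence for the reasoning step.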

The agent fills gaps by calling retrieval tools mid-reasoning. If Claude sees a value passed into render(template, data) and cannot tell whether render escapes its output, it calls a tool to fetch that definition. This is exactly the pattern the Model Context Protocol (MCP) formalizes: MCP is an open standard that lets a model call external tools and data sources through a uniform interface, so the same agent can talk to a code-search server, a git-blame server, and a sandboxed test runner without bespoke glue for each.
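
The body of a "fetch that definition" tool can be quite small. The sketch below is plain Python and deliberately does not use any MCP SDK; `find_definition` and the in-memory `repo` are hypothetical, and how such a function would be registered with a tool server is left out.

```python
import ast
from typing import Dict, Optional


def find_definition(symbol: str, files: Dict[str, str]) -> Optional[dict]:
    """Return the file, line, and source of the first function or class named `symbol`."""
    for path, source in sorted(files.items()):
        for node in ast.walk(ast.parse(source)):
            is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            if is_def and node.name == symbol:
                return {
                    "path": path,
                    "line": node.lineno,
                    "source": ast.get_source_segment(source, node),
                }
    return None


# Hypothetical two-file repository held in memory.
repo = {
    "views.py": "def show(data):\n    return render('page.html', data)\n",
    "templating.py": (
        "import html\n\n"
        "def render(template, data):\n"
        "    return template + html.escape(str(data))\n"
    ),
}
hit = find_definition("render", repo)
```

With the definition in hand, the model can see that this particular `render` escapes its data and revise the hypothesis accordingly.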

Turning a hunch into a verified finding

Verification is the layer that separates a toy from something you would put in CI. Once the model commits to a hypothesis — "the id query parameter flows unsanitized into a raw SQL string in getOrder" — the controller asks it to produce a concrete proof: a minimal request, a unit test, or a script that demonstrates the issue in a sandbox. If the proof runs and confirms the behavior, the finding ships with that evidence attached. If the proof fails, the finding is dropped or downgraded. This loop is what crushes false positives, because a model that has to show the exploit cannot get away with a plausible-sounding but wrong claim.
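
As a hedged illustration of what a runnable proof could look like for the getOrder hypothesis, the sketch below uses Python's built-in `sqlite3` with an in-memory database as the sandbox. The table, the two `get_order_*` functions, and the `1 OR 1=1` payload are assumptions made for the example, not the scanner's actual output.

```python
import sqlite3
from typing import Callable, List, Tuple

Rows = List[Tuple]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")   # throwaway database, nothing persists
    conn.execute("CREATE TABLE orders (id INTEGER, owner TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    return conn


def get_order_raw(conn: sqlite3.Connection, order_id: str) -> Rows:
    # Hypothetical vulnerable version: the parameter is spliced into the SQL text.
    return conn.execute("SELECT * FROM orders WHERE id = " + order_id).fetchall()


def get_order_bound(conn: sqlite3.Connection, order_id: str) -> Rows:
    # Parameterized version: the driver treats the value as data, not SQL.
    return conn.execute("SELECT * FROM orders WHERE id = ?", (order_id,)).fetchall()


def verify_injection(target: Callable[[sqlite3.Connection, str], Rows]) -> str:
    """Run the proof: a lookup by one id should never return more than one row."""
    conn = make_db()
    try:
        rows = target(conn, "1 OR 1=1")
    except sqlite3.Error:
        return "refuted"          # the payload did not execute as SQL
    finally:
        conn.close()
    return "confirmed" if len(rows) > 1 else "refuted"
```

Calling `verify_injection(get_order_raw)` confirms the hypothesis because the payload widens the query to every row, while `verify_injection(get_order_bound)` refutes it; the same claim about two similar-looking functions gets opposite verdicts based on what actually runs.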

Severity and exploitability are computed here too, not guessed. The verification step knows whether the path was actually reachable from an external entry point, whether authentication gated it, and whether the payload survived any encoding. Those facts drive the severity score, which means two structurally identical bugs can land at different priorities because one sits behind an admin-only route and the other is public.
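
A small sketch of how verified facts could drive a severity label follows. The fields, weights, and thresholds are illustrative assumptions, not a standard scoring scheme; the point is only that the inputs are observations from the verification run rather than guesses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedFacts:
    externally_reachable: bool   # path starts at a public entry point
    requires_auth: bool          # an authentication check gates the path
    admin_only: bool             # route is restricted to administrators
    payload_survived: bool       # payload reached the sink unencoded


def severity(facts: VerifiedFacts) -> str:
    """Map verified facts to a label. Weights are illustrative assumptions."""
    if not facts.payload_survived:
        return "info"
    score = 2
    score += 2 if facts.externally_reachable else 0
    score -= 1 if facts.requires_auth else 0
    score -= 1 if facts.admin_only else 0
    return {4: "critical", 3: "high", 2: "medium"}.get(score, "low")


# Two structurally identical bugs, differing only in where they sit.
public_bug = VerifiedFacts(True, False, False, True)
admin_bug = VerifiedFacts(True, True, True, True)
```

Under these assumed weights the public bug scores higher than the admin-only one, which mirrors the prioritization described above.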

Where the architecture earns its keep

The payoff of all this structure is signal. A pattern scanner might emit four hundred raw hits on a mid-sized service; the candidate pre-pass narrows that to perhaps sixty plausible sinks; the agent loop confirms a handful with running proofs and explains each one in plain language with the exact tainted path. Engineers triage findings that come with a reproduction far faster than they triage a regex match. The architecture also degrades gracefully: if the test runner is unavailable, findings still ship but are flagged as unverified, so a sandbox outage never silently hides bugs.
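
The graceful-degradation rule can be sketched as a small wrapper around the proof step. `SandboxUnavailable` and `finalize` are hypothetical names; the essential behavior is that a runner outage yields an "unverified" record instead of silently dropping the hypothesis.

```python
from typing import Callable, Optional


class SandboxUnavailable(Exception):
    """Raised by the (hypothetical) test runner when it cannot start."""


def finalize(hypothesis: str, run_proof: Callable[[], bool]) -> Optional[dict]:
    """Return a finding record, or None when the proof refutes the hypothesis."""
    try:
        confirmed = run_proof()
    except SandboxUnavailable:
        # Do not drop the hypothesis: surface it, clearly marked.
        return {"claim": hypothesis, "status": "unverified"}
    if confirmed:
        return {"claim": hypothesis, "status": "verified"}
    return None
```

Only a proof that actually ran and failed removes a hypothesis from the report; every other outcome leaves something for a human to see.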

The other quiet benefit is auditability. Because every finding carries the retrieval calls, the reasoning trace, and the proof, a security reviewer can replay exactly how the agent reached its conclusion. That trace is also your debugging surface: when the agent misses a real bug, you can see whether ingestion failed to index the file, retrieval returned the wrong definition, or reasoning made a bad call — and fix the right layer.

Frequently asked questions

Does the LLM replace static analysis tools?

No. The most reliable designs keep a deterministic taint or pattern engine as a cheap candidate generator and use Claude as a reasoning and verification layer on top. The engine provides broad recall at low cost; the model adds precision by confirming reachability and exploitability. Replacing the engine entirely with a model is both slower and less complete.

How is context-window cost kept under control?

By never feeding the whole repository. A deterministic pre-pass selects candidate sinks, and the agent assembles a focused slice — the sink, its caller chain to a trust boundary, and relevant config — pulling additional code only through retrieval tools when reasoning demands it. Even with a large context window, disciplined retrieval keeps per-finding cost predictable.

What stops the agent from hallucinating a vulnerability?

The verification layer. A hypothesis is not a finding until the agent produces a runnable proof — a request, test, or script — that demonstrates the issue in a sandbox. Hypotheses that fail their proof are dropped, so confident-but-wrong claims are filtered out before any human sees them.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
