How a PM Ships an App with Claude Code: The Architecture

Six weeks. That's how long it took a product manager with no formal engineering background to take a vague idea — "a portal where our field techs log jobs and customers see status" — and turn it into a deployed web app real people used. No bootcamp, no co-founder who codes. The whole thing ran through Claude Code, and the part that surprised her most wasn't that the model wrote code. It was that the system around the model behaved like a disciplined senior engineer who never got tired. If you want to understand how that's possible, you have to look past the chat box and into the architecture.

This post walks the full stack end to end: the agent loop at the center, the tools that give it hands, the Model Context Protocol servers that connect it to the outside world, and the context engineering that keeps it grounded. Understanding how these pieces fit is what separates a PM who ships from one who generates a pile of code that never runs.

What is actually running when you type a prompt

When our PM typed "add a login screen with email and password," she imagined Claude reading her words and emitting a file. What really happened was a loop. Claude Code is an agentic runtime: it wraps a Claude model in a perceive-decide-act cycle, where the model can read files, run commands, and inspect results, then decide its next move — repeating until the task is done. The model is the brain; the runtime is the body and the spinal reflexes.

Concretely, each turn the runtime assembles a context window — the system prompt, the conversation so far, the contents of relevant files, and the list of available tools — and sends it to a Claude model like Opus 4.8 or Sonnet 4.6. The model responds either with text for the human or with a structured tool call: "read this file," "run this test," "edit these lines." The runtime executes the tool, captures the real output, appends it to the context, and loops again. This is the heartbeat. Nothing about it requires the PM to know what a closure is; it requires her to describe outcomes and react to what comes back.

The critical architectural insight is that the model never edits your repo directly. It requests actions, and the runtime mediates every one. That mediation layer is where safety, permissions, and reversibility live — which is exactly what let a non-engineer move fast without nuking the project.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

The four planes the system is built from

It helps to think of the architecture as four planes stacked on top of each other. The reasoning plane is the Claude model. The action plane is the tool layer — file edits, shell, search. The integration plane is MCP, which reaches into databases, issue trackers, and deployment platforms. The grounding plane is context: what the runtime chooses to show the model on each turn. A request flows down through these planes and the results flow back up.

flowchart TD
  A["PM types an outcome"] --> B["Agent loop assembles context"]
  B --> C["Claude model decides next action"]
  C --> D{"Action type?"}
  D -->|Edit/Run| E["Built-in tools touch the repo"]
  D -->|External| F["MCP server calls DB or deploy API"]
  E --> G["Real output captured"]
  F --> G
  G --> H{"Goal met & tests green?"}
  H -->|No| B
  H -->|Yes| I["Change committed & shown to PM"]

What makes this diagram more than a box-drawing exercise is the feedback edge from H back to B. The agent doesn't fire one shot and hope. It runs the test suite, reads the failure, and re-enters the loop with that failure now in context. For a non-technical builder this is enormous: the system self-corrects against reality instead of against the PM's untrained intuition about whether code is right.

How the repo becomes the source of truth

One reason the six-week build held together is that the architecture treats the filesystem as durable memory. The conversation in the chat is ephemeral and gets compacted, but the repository — the actual code, the tests, the config, and a project memory file the PM and Claude maintained together — persists across sessions. Each morning she'd open Claude Code fresh, and the runtime would re-ground itself by reading that memory file and the current state of the code.

This matters because context windows, even at a million tokens, are finite and the model has no memory of yesterday beyond what's written down. The PM learned to ask Claude to record decisions: "write down in the project notes that we chose Postgres and why." Those notes became load-bearing architecture. The next session's agent loop read them and stayed consistent. The repo, not the chat, is the system of record — a pattern any engineer would recognize, surfaced to someone who'd never thought about it.

Where MCP servers fit into the picture

Pure code generation only gets an app to localhost. To become a real product it needed a database, authentication, and a way to deploy. That's the integration plane, and it runs on the Model Context Protocol. The Model Context Protocol is an open standard, introduced in late 2024, that lets Claude connect to external tools and data through MCP servers exposing typed actions and resources. Each server advertises a schema of what it can do; the runtime makes those actions available to the model as tools.

In practice the PM connected an MCP server for her hosting platform and one for her database. When she said "deploy this to staging," Claude didn't hallucinate a deploy — it called the platform's MCP tool with structured arguments, got back a real URL or a real error, and reported it. The architecture cleanly separates knowing what to do (the model) from being able to do it safely (the MCP server with its own auth and validation). Skills sit alongside MCP here: where MCP gives Claude the capability, a Skill gives it the procedure — the team's specific way of using that capability.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

The guardrails that let a non-engineer move fast

None of this would be responsible without guardrails, and they're architectural, not optional. Permission prompts gate dangerous actions: the runtime can be configured so destructive shell commands or pushes require a human yes. Every change lands as a discrete, reviewable edit, and because the repo is under version control, any bad turn is one revert away. The PM didn't read every line of code, but she read every diff summary and every test result, and the architecture made those the natural checkpoints.

There's also a quieter guardrail: the loop's reliance on real feedback. Because the agent runs tests and reads actual errors rather than asserting success, whole classes of confident-but-wrong output get caught inside the loop before the PM ever sees them. The architecture is biased toward truth from the environment over fluency from the model. That bias is precisely what makes it trustworthy in non-expert hands.

Frequently asked questions

Do you need to read the code Claude Code writes?

You don't need to read every line, but you do need to engage with the checkpoints the architecture surfaces: diffs, test results, and the project memory file. The PM treated those like a product spec review — judging outcomes and behavior rather than syntax. The agent loop's test-and-correct cycle handles correctness; the human handles intent.

What's the difference between a tool and an MCP server here?

Built-in tools are the local hands of the runtime — read, edit, run, search — operating on your project directly. MCP servers are pluggable connectors to external systems like databases, deployment platforms, or trackers, each exposing typed actions. Both appear to the model as callable tools; MCP is how you extend the reach of the action plane without modifying the runtime.

Can the architecture really run for weeks without drifting?

It can, but only because state lives in durable places: the repository and a written project memory, not the chat history. Each session re-grounds from those artifacts. Drift happens when teams keep important decisions only in the conversation, which gets compacted away. Write decisions down and the architecture stays coherent across many sessions.

Bringing the same architecture to your phone lines

CallSphere runs this exact agentic blueprint — a model in a tool-using loop, MCP connectors, and context grounding — but for voice and chat: assistants that answer every call, pull live data mid-conversation, and book the job. See it working at callsphere.ai.

Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.

How a PM Ships an App with Claude Code: The Architecture

What is actually running when you type a prompt

The four planes the system is built from

How the repo becomes the source of truth

Where MCP servers fit into the picture

The guardrails that let a non-engineer move fast

Frequently asked questions

Do you need to read the code Claude Code writes?

What's the difference between a tool and an MCP server here?

Can the architecture really run for weeks without drifting?

Bringing the same architecture to your phone lines

Try CallSphere AI Voice Agents

Related Articles You May Like

Where Claude Code GTM engineering is heading next

Where Claude Cowork is heading and how to prepare

Measuring Claude Cowork success: metrics that prove it

How to measure success of Claude Code GTM workflows

Claude Cowork walkthrough: from problem to shipped

End-to-end Claude Code GTM workflow: a real rebuild