Ship an App with Claude Code: A Step-by-Step Walkthrough
A concrete week-by-week walkthrough of how a non-technical PM built, tested, and deployed a real app with Claude Code and MCP in six weeks.
Plenty of articles tell you Claude Code is powerful. Far fewer show you the actual sequence of moves a real person made to go from empty folder to deployed app. This is that walkthrough. It reconstructs how a non-technical product manager shipped a working field-service portal in six weeks, with enough specificity that an engineer could reproduce the path or hand it to a less-technical colleague as a runbook.
The throughline is deliberately unglamorous: scaffold a skeleton, get one vertical slice working end to end, then widen. No big-bang generation, no "build me the whole app" prompt. Each step produces something runnable and tested before the next begins. That discipline is what kept six weeks from collapsing into six weeks of debugging.
Week 0: set up the workspace and the rules
Before any feature, she set up the ground. She installed Claude Code, opened it in an empty project directory, and spent the first session not building but configuring. She had Claude initialize a project memory file describing the goal, the intended stack, and the constraints: "Next.js, Postgres, keep it boring and well-tested, never push without my okay." This file becomes the runtime's standing brief, re-read at the start of every future session.
She also turned on permission gates so destructive shell commands and any deploy would pause for her approval. That single setting is what made it safe for a non-engineer to let the agent run. The first real instruction was small and verifiable: "scaffold a Next.js app with a single page that says Hello, and run it so I can see it in the browser." Claude scaffolded, started the dev server, and reported the local URL. She opened it. Green light. Step one done — and crucially, proven done.
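To make the idea of a permission gate concrete, here is a minimal sketch of the decision it encodes: routine commands run, risky ones pause for approval. This is an illustration only, not Claude Code's actual implementation or configuration format; the `gate` function and the pattern list are hypothetical.

```typescript
// Hypothetical sketch of a permission gate. Not Claude Code's real logic.
type Decision = "allow" | "ask";

// Patterns this sketch treats as risky (illustrative, not exhaustive).
const RISKY_PATTERNS: RegExp[] = [
  /\brm\s+-[a-z]*r/i, // recursive delete, e.g. "rm -rf build"
  /\bgit\s+push\b/i, // pushing code
  /\bdrop\s+table\b/i, // destructive SQL
  /\bdeploy\b/i, // any deploy command
];

// Returns "ask" when a command should pause for human approval.
function gate(command: string): Decision {
  return RISKY_PATTERNS.some((pattern) => pattern.test(command)) ? "ask" : "allow";
}
```

Under these assumptions, `gate("npm run dev")` is allowed to run, while `gate("git push origin main")` stops and waits for her okay, matching the "never push without my okay" rule in the project brief.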
Week 1: the first vertical slice
The temptation for non-engineers is to ask for everything. The move that worked was the opposite: pick one complete thing a user can do, and make only that work top to bottom. She chose "a tech logs in and sees a list of jobs." That single slice touches the database, authentication, an API route, and a page, which makes it a representative cross-section of the whole app.
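A vertical slice like this can be sketched in a few lines, with in-memory stand-ins for the database and the login layer. Everything here is hypothetical: the source does not show the real schema, so the `Job` fields, the `sessions` map, and the `listJobs` handler are assumptions chosen only to show how one request passes through auth, data access, and a response.

```typescript
// Hypothetical shapes; the real app's schema is not given in the article.
interface Job {
  id: number;
  techId: string;
  title: string;
  status: "open" | "done";
}

interface Session {
  techId: string;
}

// Stand-in for the Postgres jobs table.
const jobsTable: Job[] = [
  { id: 1, techId: "ana", title: "Replace filter", status: "open" },
  { id: 2, techId: "ana", title: "Inspect boiler", status: "done" },
  { id: 3, techId: "ben", title: "Fix thermostat", status: "open" },
];

// Stand-in for authentication: maps a login token to a session.
const sessions = new Map<string, Session>([["token-ana", { techId: "ana" }]]);

type ListJobsResponse =
  | { status: 200; jobs: Job[] }
  | { status: 401; error: string };

// Stand-in for the API route: reject anonymous callers, then return
// only the jobs that belong to the logged-in tech.
function listJobs(token: string | undefined): ListJobsResponse {
  const session = token ? sessions.get(token) : undefined;
  if (!session) return { status: 401, error: "Not logged in" };
  return {
    status: 200,
    jobs: jobsTable.filter((job) => job.techId === session.techId),
  };
}
```

The page is then just a rendering of `listJobs(token)`: a 401 sends the tech to the login screen, and a 200 becomes the job list.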
She drove it as a conversation of outcomes, not implementations. "Create a jobs table with these fields." "Add a page that lists jobs from the database." "Now put it behind a login." After each, Claude made the change, wrote or updated a test, ran it, and showed her the result in the browser. When a test failed, the agent read the failure and fixed it inside the same loop. Her job was to look at the screen and say whether it matched what she meant.
flowchart TD
A["Pick one vertical slice"] --> B["Describe the outcome to Claude"]
B --> C["Claude edits code & writes a test"]
C --> D["Runtime runs the test suite"]
D --> E{"Green?"}
E -->|No| F["Claude reads error & retries"]
F --> D
E -->|Yes| G["PM checks it in the browser"]
G --> H{"Matches intent?"}
H -->|No| B
H -->|Yes| I["Commit & move to next slice"]

That loop is the whole method in miniature. Notice it has two distinct gates: an automated one (do the tests pass?) and a human one (does it match intent?). The agent owns the first; the PM owns the second. Keeping those separate is what lets a non-engineer contribute exactly the judgment they're qualified to give without pretending to review code.
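The two-gate loop can also be written as a small control-flow sketch. The hook names (`applyChange`, `runTests`, `humanApproves`) are hypothetical, and the retry cap is an added assumption, since the flowchart itself shows no limit on retries.

```typescript
// Hypothetical hooks standing in for the agent, the test runner, and the PM.
interface LoopHooks {
  applyChange: (outcome: string, lastError?: string) => void;
  runTests: () => { green: boolean; error?: string };
  humanApproves: () => boolean;
}

type SliceResult = "committed" | "needs-new-description" | "gave-up";

// Runs one slice through both gates. maxRetries is an assumption of this
// sketch, added so the loop always terminates.
function runSlice(outcome: string, hooks: LoopHooks, maxRetries = 3): SliceResult {
  let lastError: string | undefined;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // The agent edits code, using the previous failure (if any) as input.
    hooks.applyChange(outcome, lastError);
    const result = hooks.runTests();
    if (result.green) {
      // Automated gate passed; the human gate decides what happens next.
      return hooks.humanApproves() ? "committed" : "needs-new-description";
    }
    lastError = result.error;
  }
  return "gave-up";
}
```

The human check is only reached after the tests are green, so the PM never has to judge a build that is known to be broken.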
Weeks 2–3: widen the app, one slice at a time
With login-and-list working, she repeated the slice pattern across features: creating a job, updating its status, a customer-facing read-only status page, email notifications. Each followed the identical rhythm — describe, edit, test, eyeball, commit. Because the first slice had established the real patterns (how auth works, how the database is accessed, how pages are structured), later slices got faster; Claude reused the conventions already in the codebase rather than inventing new ones each time.
This is the compounding return of going slice-by-slice. The codebase itself becomes a form of context: when she asked for the status-update feature, Claude read the existing job-creation code and matched its style. She reinforced this by occasionally asking, "keep this consistent with how we already do API routes." The repo grew more internally coherent over time instead of more chaotic — the opposite of what usually happens when you generate features in isolation.
Week 4: connect the outside world with MCP
An app running on localhost wasn't a product. To get real, she wired in external systems through MCP servers. She connected an MCP server for her managed Postgres provider and one for her hosting platform. Now "run the migration on staging" or "deploy the current branch" became real, typed tool calls the agent could make, returning actual URLs and actual errors, not optimistic prose.
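What "typed tool call" buys you is easiest to see in the shape of the result. The sketch below uses hypothetical types (`DeployResult`, `Environment`, `describeDeploy`); they are not the Model Context Protocol's actual message types, only an illustration of a result that is either a real URL or a real error and never free-form prose.

```typescript
// Hypothetical result shape for a deploy tool. Not the actual MCP schema.
type DeployResult =
  | { ok: true; url: string }
  | { ok: false; error: string };

type Environment = "staging" | "production";

// Turns a structured result into a plain-language summary. The compiler
// forces both the success and the failure branch to be handled.
function describeDeploy(env: Environment, result: DeployResult): string {
  if (result.ok) return `Deployed to ${env}: ${result.url}`;
  return `Deploy to ${env} failed: ${result.error}`;
}
```

Because the failure case carries the provider's own error text, the agent has something concrete to read and act on when a deploy goes wrong.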
The first deploy failed: a missing environment variable. This is the moment that scares non-engineers, and it's where the agent loop earned its keep. The deploy MCP tool returned the real error; Claude read it, explained in plain language what was missing, added the variable to the config, and redeployed. She approved each step at the permission prompt. Within an hour the app was live on a staging URL she could send to a colleague. The architecture turned a cryptic failure into a guided fix.
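One common way to keep this class of failure from reaching a deploy is to check required variables at startup and fail with a readable message. This is a sketch, not what the article's app did: the source does not say which variable was missing, so the names in `REQUIRED_ENV` are hypothetical.

```typescript
// Hypothetical variable names; the article does not name the missing one.
const REQUIRED_ENV = ["DATABASE_URL", "SESSION_SECRET"] as const;

// Returns the names of required variables that are unset or empty.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Throws a plain-language error listing everything that is missing.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = missingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node-based app this would typically be called once at startup with the process environment, so a misconfigured staging target fails immediately with the variable name instead of failing later with a cryptic stack trace.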
Weeks 5–6: harden, test for real, and launch
The final stretch was the unsexy work that separates a demo from a product. She asked Claude to add input validation, handle the empty states, and write tests for the failure paths, not just the happy path. She had it run through the app as a user would and list anything broken, then fix the list. She asked specifically, "what happens if two techs update the same job at once?" — and the agent surfaced and handled a race condition she'd never have known to ask about, prompted only by her describing the real-world scenario.
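The article does not say how the race condition was handled. One widely used approach is optimistic concurrency: each job carries a version number, and an update only succeeds if the version the tech read is still current. The sketch below assumes that technique; `TrackedJob`, `jobsById`, and `updateStatus` are hypothetical names, and an in-memory map stands in for the database.

```typescript
// Hypothetical job shape with a version column for optimistic concurrency.
interface TrackedJob {
  id: number;
  status: string;
  version: number;
}

// Stand-in for the jobs table.
const jobsById = new Map<number, TrackedJob>([
  [1, { id: 1, status: "open", version: 1 }],
]);

type UpdateResult =
  | { ok: true; job: TrackedJob }
  | { ok: false; reason: "not-found" | "conflict" };

// Applies the update only if nobody else changed the job since it was read.
function updateStatus(id: number, newStatus: string, expectedVersion: number): UpdateResult {
  const job = jobsById.get(id);
  if (!job) return { ok: false, reason: "not-found" };
  if (job.version !== expectedVersion) return { ok: false, reason: "conflict" };
  const updated: TrackedJob = { ...job, status: newStatus, version: job.version + 1 };
  jobsById.set(id, updated);
  return { ok: true, job: updated };
}
```

If two techs both load the job at version 1, the first update succeeds and bumps the version to 2; the second is rejected as a conflict, and the app can ask that tech to reload instead of silently overwriting the first change. In a real database the check and the write would need to happen in one atomic statement or transaction.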
Launch itself was anticlimactic, which is the goal. Because every change had shipped through tests and a real staging environment, promoting to production was one more permission-gated MCP call. Six weeks after the empty folder, field techs were logging jobs and customers were watching status update live. The walkthrough's real lesson isn't any single command; it's the cadence — small verifiable steps, automated tests plus human intent checks, and real environments early.
Frequently asked questions
What's the single most important first step?
Set up the project memory file and permission gates before building anything. The memory file gives every future session a consistent brief, and the gates make it safe to let the agent act. Skipping this and jumping to features is the most common reason non-technical builds spiral out of control.
How small should each step be?
Small enough that you can verify it in the browser or in a test result in under a minute. "Add a login page" is a good step; "build the whole auth system, dashboard, and billing" is not. Small steps keep the human-intent gate meaningful and keep failures isolated and easy to fix.
When should you bring in MCP servers?
Once your vertical slice runs locally and you need real persistence or a real deployment. Wiring MCP early, before you have anything working, adds moving parts with nothing to connect them to. Get one slice green on localhost first, then connect the database and deploy targets through MCP.
Bringing this build cadence to voice and chat
CallSphere uses the same slice-by-slice agentic approach for phone and chat assistants that take real calls, call tools mid-conversation, and book work around the clock. Watch one handle a live call at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.