Inside Claude Code: Session and 1M Context Architecture
How Claude Code manages sessions, assembles turns, and feeds a 1M-token context window end to end — the internals engineers need to push it hard.
Open a terminal, type a request, and Claude Code starts reading your files, running commands, and editing code. It feels like one continuous conversation, but underneath there is a careful pipeline moving bytes between your disk, a context window, and the model. If you want to push Claude Code hard — long refactors, multi-hour sessions, a 1M-token context window full of source — it pays to understand the machinery. This post walks the architecture end to end: what a session actually is, how turns are assembled, where the token budget goes, and how the pieces fit together.
What a Claude Code session really is
A Claude Code session is a durable, append-only transcript of everything the agent has seen and done, paired with a live working state for your project. When you start the tool, it loads project configuration, discovers available tools and MCP servers, reads any project memory files, and opens a connection to the model. Each thing you type becomes a user turn appended to the transcript; each model response — text, tool calls, or both — becomes an assistant turn. The session is the source of truth, and the context window is a rendered view of it.
That distinction matters. The full session history can be far larger than what is sent to the model on any given turn. Claude Code keeps the canonical log (so it can resume, replay, and summarize) separate from the prompt it constructs for each request. A session is therefore best understood as a state machine: it accumulates events, and on every turn a builder selects, orders, and trims those events into a single prompt that fits the budget.
Resumption falls out of this design. Because the transcript is persisted, you can close the terminal and pick the same session back up later, or fork it. The agent rehydrates the working state — open files, recent tool results, accumulated decisions — from the log rather than from your memory.
How a single turn is assembled
Every turn, Claude Code builds a prompt from layered components: a system prompt that defines the agent's behavior and tool-use contract, the tool and MCP schemas, project context and memory, then the conversation history, and finally your newest message. The model reads all of it, decides whether it needs a tool, and either answers or emits a tool call. If it calls a tool, Claude Code executes it, appends the result, and loops — the agent keeps going until it produces a final answer or hits a stop condition.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
flowchart TD
A["User message"] --> B["Append to session log"]
B --> C["Context builder selects & orders events"]
C --> D{"Fits token budget?"}
D -->|No| E["Compact: summarize old turns, drop stale tool output"]
E --> C
D -->|Yes| F["Send prompt to Claude"]
F --> G{"Tool call needed?"}
G -->|Yes| H["Run tool / MCP server, append result"] --> C
G -->|No| I["Stream final answer, persist turn"]
The agentic loop in the diagram is the heart of the system. A single user request can spin through many internal iterations — read a file, grep the codebase, run tests, edit, re-run — each adding events to the session before any text comes back to you. This is why Claude Code can complete substantial work from one prompt: the loop, not the single forward pass, is the unit of work.
The context builder sits in the critical path of every iteration. It is the component deciding what the model is allowed to see this turn, and it is where the 1M-token window becomes either a superpower or a liability depending on how it is filled.
The 1M-token context window and the budget
The context window is the maximum number of tokens the model can attend to in a single request. Claude Code's 1M-token window is enormous — comfortably enough to hold a mid-sized service's entire source tree, long tool transcripts, and a detailed system prompt at once. But it is a budget, not free space. Every token you spend on stale file dumps is a token not spent on the reasoning you actually want, and very large prompts cost more and respond more slowly than tight ones.
Claude Code treats the budget as a finite resource and actively manages it. Newer, more relevant material — your latest message, recently touched files, fresh tool results — is favored. Older, lower-value content is candidate for compaction: the agent summarizes earlier stretches of the conversation into compact notes and drops verbose tool output it no longer needs. The session log keeps the originals; the prompt carries the summary.
Prompt caching is the other half of the economics. The stable front of the prompt — system instructions, tool schemas, durable project context — changes little between turns, so it can be cached and reused, cutting cost and latency on the parts that repeat. The practical takeaway: keep the stable prefix actually stable, and let the volatile material live at the end, so the cache stays warm.
Compaction, memory, and staying coherent
Long sessions live or die on compaction. When the assembled prompt approaches the budget, Claude Code compacts: it replaces a span of older turns with a faithful summary that preserves decisions, file paths, and open threads while discarding the raw bytes. Done well, the agent stays coherent across hours of work without ever overflowing the window. Done poorly, it forgets why it made a choice three files ago.
Project memory complements compaction. Persistent instruction files give the agent durable facts — conventions, architecture notes, do-not-touch zones — that survive every compaction because they are re-injected as stable context rather than ephemeral conversation. Treat these files as the long-term memory and the transcript as the short-term memory; the combination is what keeps a marathon session on the rails.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Subagents and the shared context picture
Claude Code can spawn subagents to work in parallel — one explores tests, another reads the data layer — each with its own context window. This is a deliberate context-management move: instead of cramming everything into one window, the orchestrator delegates a focused slice to a subagent, which returns a condensed result. The parent's window stays lean while the work fans out. The trade-off is tokens: parallel agents consume several times more than a single agent, so they are worth it for genuinely separable exploration, not for trivial tasks.
Frequently asked questions
Is the whole session sent to the model every turn?
No. The persisted session log is the canonical record, but each turn Claude Code builds a fresh prompt that selects and orders the most relevant events and may summarize or drop older ones. The prompt is a view of the session shaped to fit the token budget, not the entire history verbatim.
What is the 1M-token context window in Claude Code?
The 1M-token context window is the maximum amount of text — measured in tokens — that Claude can consider in a single request, large enough to hold an entire mid-sized codebase plus the conversation. It is a budget the agent manages, not unlimited free space.
How does Claude Code avoid running out of context on long tasks?
It compacts. As the prompt nears the budget, older turns are summarized into compact notes and verbose tool output is dropped, while project memory files re-inject durable facts. This keeps long sessions coherent without overflowing the window.
Do subagents share the main context window?
No. Each subagent runs with its own context window and returns a condensed result to the orchestrator. This keeps the parent's window lean, at the cost of higher overall token usage when running agents in parallel.
Bringing agentic AI to your phone lines
The same session and context discipline that keeps Claude Code coherent over a long coding run is what CallSphere brings to voice and chat — agents that hold context across a whole conversation, call tools mid-call, and book real work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.