---
title: "MCP Architecture Explained: Hosts, Clients & Servers"
description: "How Model Context Protocol works internally — hosts, clients, servers, transports, and the JSON-RPC handshake that connects Claude to external tools."
canonical: https://callsphere.ai/blog/mcp-architecture-explained-hosts-clients-servers
category: "Agentic AI"
tags: ["agentic ai", "claude", "model context protocol", "mcp", "architecture", "anthropic", "ai engineering"]
author: "CallSphere Team"
published: 2026-02-12T08:00:00.000Z
updated: 2026-06-06T21:47:44.524Z
---

# MCP Architecture Explained: Hosts, Clients & Servers

> How Model Context Protocol works internally — hosts, clients, servers, transports, and the JSON-RPC handshake that connects Claude to external tools.

The first time you wire Claude into a real toolset, the abstractions feel deceptively simple: point the agent at a server, expose some tools, watch it work. Then a tool call hangs, a server silently drops a session, and you realize you have no mental model of what is actually moving across the wire. Understanding the Model Context Protocol architecture — not just its API surface but its moving parts — is the difference between an agent you can debug and one that fails in ways you can only guess at.

This piece walks the full MCP stack: the host process, the clients it creates (one per server), the servers they talk to, the transports that carry messages, and the JSON-RPC dialogue that ties them together. By the end you should be able to picture every hop a tool call takes from Claude's reasoning to an external system and back.

## What Model Context Protocol actually is

Model Context Protocol is an open standard, introduced by Anthropic in November 2024, that defines how an AI application connects a model like Claude to external tools, data sources, and prompts through a uniform client-server interface. Instead of every integration being a bespoke glue script, MCP gives you one protocol that any compliant server can speak and any compliant host can consume. The payoff is composability: a server someone wrote for Postgres works in Claude Code, Claude Cowork, or your own SDK app without modification.

The mental shortcut people reach for is "MCP is USB-C for AI tools," and it is a fair one. The protocol does not care what your tool does; it standardizes how the tool advertises itself, how it is invoked, and how results come back. The model never talks to your database directly. It talks to a host, the host talks to a client, and the client talks to a server that wraps your database. Every layer has a job, and keeping them distinct is what makes the architecture debuggable.

## The three roles: host, client, server

MCP defines three architectural roles, and conflating them is the most common source of confusion. The **host** is the application the user actually runs — Claude Code in your terminal, Claude Cowork, or a process you built on the Agent SDK. The host owns the model loop, the conversation, and the user's trust boundary. The **client** is a connector the host instantiates, one per server, that manages a single stateful session. The **server** is the external program that exposes capabilities: tools (functions the model can call), resources (readable data the model can pull in), and prompts (reusable templates).

The key invariant is the one-to-one pairing between client and server. If your host connects to three servers — a filesystem server, a GitHub server, and an internal CRM server — the host spins up three clients, each holding its own session, its own capability list, and its own lifecycle. This isolation matters: a misbehaving CRM server cannot corrupt the GitHub session, and the host can tear down one connection without disturbing the others.

```mermaid
flowchart TD
  U["User prompt to Claude"] --> H["Host (Claude Code / SDK app)"]
  H --> CL["MCP Client (one per server)"]
  CL -->|JSON-RPC over transport| S["MCP Server"]
  S --> EXT["External system (DB / API / files)"]
  EXT --> S
  S -->|structured result| CL
  CL --> H
  H --> M["Claude composes answer"]
```
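
The one-client-per-server pairing can be sketched in a few lines. This is a minimal illustration, not how any real host is implemented: `Host` and `ClientSession` are hypothetical names, and the "session" here is just a record rather than a live connection.

```python
from dataclasses import dataclass, field


@dataclass
class ClientSession:
    """Hypothetical stand-in for one MCP client: a single session with one server."""
    server_name: str
    tools: list[str] = field(default_factory=list)
    connected: bool = True

    def close(self) -> None:
        self.connected = False


class Host:
    """Hypothetical host that keeps exactly one client per connected server."""

    def __init__(self) -> None:
        self._clients: dict[str, ClientSession] = {}

    def connect(self, server_name: str, tools: list[str]) -> ClientSession:
        # Refuse a second client for the same server to preserve the 1:1 pairing.
        if server_name in self._clients:
            raise ValueError(f"already connected to {server_name!r}")
        session = ClientSession(server_name, list(tools))
        self._clients[server_name] = session
        return session

    def disconnect(self, server_name: str) -> None:
        # Tearing down one session leaves every other session untouched.
        self._clients.pop(server_name).close()

    def connected_servers(self) -> list[str]:
        return sorted(self._clients)


host = Host()
github = host.connect("github", ["create_issue"])
host.connect("filesystem", ["read_file"])
crm = host.connect("crm", ["create_ticket"])
host.disconnect("crm")
print(host.connected_servers())  # ['filesystem', 'github']
```

Dropping the CRM session does not touch the GitHub session object, which is the isolation property described above.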

## How a session begins: the initialize handshake

Nothing useful happens until the client and server agree on terms. When the host launches a client, the client opens the transport and sends an `initialize` request. This is a negotiation, not a hello. The client announces the protocol version it speaks and the capabilities it supports, such as sampling, which lets a server request model completions through the host. The server replies with the protocol version it will use and the feature set it offers: whether it exposes tools, resources, or prompts, and whether it can notify the client when those lists or subscribed resources change. If the two sides cannot settle on a version, the client disconnects, so the connection fails fast and loudly, which is exactly what you want.
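
As a rough sketch of that exchange, the messages can be written as plain Python dictionaries. The field names follow the published MCP specification as I understand it; the version list, the `example-host` and `example-crm` names, and the `negotiate` helper are assumptions made for illustration.

```python
# Assumption: protocol revisions this hypothetical client can speak, newest first.
SUPPORTED_VERSIONS = ["2025-06-18", "2025-03-26", "2024-11-05"]


def build_initialize_request(request_id: int) -> dict:
    """Build the first message a client sends after opening the transport."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": SUPPORTED_VERSIONS[0],
            "capabilities": {"roots": {"listChanged": True}},
            "clientInfo": {"name": "example-host", "version": "0.1.0"},
        },
    }


def negotiate(response: dict) -> dict:
    """Accept the server's reply, or fail fast if its version is unknown to us."""
    result = response["result"]
    version = result["protocolVersion"]
    if version not in SUPPORTED_VERSIONS:
        raise RuntimeError(f"unsupported protocol version: {version}")
    return {"version": version, "capabilities": result["capabilities"]}


# Illustrative server reply; a real one arrives over the transport.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {"listChanged": True}, "resources": {"subscribe": True}},
        "serverInfo": {"name": "example-crm", "version": "1.0.0"},
    },
}

session = negotiate(sample_response)
print(session["version"], sorted(session["capabilities"]))
```

Raising on an unknown version is one way to get the fail-fast behavior; a real client would also close the transport at that point.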

After the server responds, the client sends a `notifications/initialized` notification to confirm the session is live. Only now does the host ask the server what it can do. A `tools/list` request returns each tool's name, human-readable description, and a JSON Schema for its input. The host hands those schemas to Claude as available functions. This is why the model can call a tool it has never seen before: the schema arrived at runtime, not at training time. The handshake turns a generic model into one that knows your specific CRM has a `create_ticket` tool taking a `priority` enum.
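
To make the schema hand-off concrete, here is a small sketch that converts a `tools/list` result into function definitions for the model. The `create_ticket` tool and its `priority` enum come from the example above; the enum values, the `title` field, and the `to_model_tools` helper are invented for illustration, and the `input_schema` key assumes the tool format used by Anthropic's Messages API.

```python
def to_model_tools(tools_list_result: dict) -> list[dict]:
    """Map MCP tool descriptors to the function definitions handed to the model."""
    return [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP calls this field inputSchema; the model-facing key is an assumption.
            "input_schema": tool["inputSchema"],
        }
        for tool in tools_list_result["tools"]
    ]


# Illustrative result of a tools/list request against a hypothetical CRM server.
sample_tools_list_result = {
    "tools": [
        {
            "name": "create_ticket",
            "description": "Open a support ticket in the CRM.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "priority": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["title", "priority"],
            },
        }
    ]
}

model_tools = to_model_tools(sample_tools_list_result)
print(model_tools[0]["name"], model_tools[0]["input_schema"]["properties"]["priority"]["enum"])
```

Nothing about `create_ticket` is compiled into the host; swapping the server swaps the tool list on the next session.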

## Transports: stdio versus streamable HTTP

The protocol is transport-agnostic, but in practice you will use one of two. **stdio** runs the server as a child process of the host and exchanges newline-delimited JSON-RPC messages over standard input and output. It is the simplest possible wiring — no network, no ports, no auth — and it is the default for local servers like a filesystem or git wrapper. Because the server is a subprocess, its lifecycle is bound to the host: when the host exits, the server dies with it.
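
The newline-delimited framing is simple enough to sketch directly. The example below uses an in-memory text stream in place of a real child process's pipes, and the `write_message` and `read_messages` helpers are hypothetical names, so treat it as an illustration of the framing rather than a working transport.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    """Write one JSON-RPC message as a single line."""
    # json.dumps escapes newlines inside strings, so a message never spans two lines.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


def read_messages(stream):
    """Yield one parsed message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for the pipe between host and server subprocess.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "echo", "arguments": {"text": "line one\nline two"}},
})

wire_text = pipe.getvalue()
pipe.seek(0)
received = list(read_messages(pipe))
print(len(received), wire_text.count("\n"))
```

Even though one argument contains a newline, the wire text holds exactly one line per message, which is what lets the reader split on line boundaries.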

The other option is **streamable HTTP**, used for remote servers that live behind a URL. Here the client makes HTTP requests and the server can push messages back over a server-sent-events stream, which is what enables long-running operations and server-initiated notifications across a network boundary. Streamable HTTP is where authentication enters the picture — OAuth tokens, API keys, session headers — because now you are crossing a trust boundary the stdio model never had to consider. Choosing the wrong transport is a frequent early mistake: people reach for HTTP for a purely local tool and inherit auth complexity they did not need.
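
A hedged sketch of the client side of this transport follows. It builds the headers for a POST and parses a server-sent-events body into JSON-RPC messages without opening a network connection. The header names reflect my reading of the MCP specification; the helper names and the sample stream are assumptions, and the parser handles only `data:` fields.

```python
import json
from typing import Optional


def build_post_headers(session_id: Optional[str] = None,
                       bearer_token: Optional[str] = None) -> dict:
    """Headers for a JSON-RPC POST; the client accepts JSON or an SSE stream."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers


def parse_sse_messages(body: str) -> list:
    """Collect the data lines of each event and parse them as one JSON message."""
    messages, data_lines = [], []
    for line in body.splitlines():
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                messages.append(json.loads("\n".join(data_lines)))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    return messages


sample_stream = (
    'event: message\n'
    'data: {"jsonrpc":"2.0","method":"notifications/progress",'
    '"params":{"progressToken":"t1","progress":1,"total":2}}\n'
    '\n'
    'event: message\n'
    'data: {"jsonrpc":"2.0","id":7,"result":{"content":[{"type":"text","text":"done"}]}}\n'
    '\n'
)

headers = build_post_headers(session_id="abc123", bearer_token="example-token")
events = parse_sse_messages(sample_stream)
print(len(events), events[0]["method"], events[1]["id"])
```

The first event is a server-initiated progress notification and the second is the final response, which shows how one request can yield several messages on the same stream.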

## What flows during a tool call

Trace a single call end to end. Claude, mid-reasoning, decides it needs data and emits a tool-use request naming a tool and arguments that conform to the schema it received. The host intercepts this, routes it to the correct client by matching the tool to its originating server, and the client sends a `tools/call` JSON-RPC request. The server executes — querying a database, hitting an API — and returns a structured result, typically content blocks the model can read. The client passes the result up to the host, the host feeds it back into the conversation as a tool result, and Claude continues reasoning with the new information in context.
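
The routing step can be sketched as a lookup table from tool name to the client that advertised it. `StubClient`, `build_routes`, and `handle_tool_use` are hypothetical, the server reply is canned rather than executed, and the `tool_use` and `tool_result` shapes assume the block format of Anthropic's Messages API.

```python
import itertools


class StubClient:
    """Hypothetical client that records requests and returns a canned result."""

    def __init__(self, server_name: str, tool_names: list[str]) -> None:
        self.server_name = server_name
        self.tool_names = tool_names
        self.sent: list[dict] = []
        self._ids = itertools.count(1)

    def call_tool(self, name: str, arguments: dict) -> dict:
        request = {
            "jsonrpc": "2.0",
            "id": next(self._ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
        self.sent.append(request)
        # A real client would send the request and wait for the server's reply.
        return {
            "content": [{"type": "text", "text": f"{self.server_name} ran {name}"}],
            "isError": False,
        }


def build_routes(clients: list[StubClient]) -> dict[str, StubClient]:
    """Map each tool name to its originating client; reject ambiguous names."""
    routes: dict[str, StubClient] = {}
    for client in clients:
        for tool in client.tool_names:
            if tool in routes:
                raise ValueError(f"tool name collision: {tool}")
            routes[tool] = client
    return routes


def handle_tool_use(routes: dict[str, StubClient], tool_use: dict) -> dict:
    """Route a model tool-use request and wrap the result for the conversation."""
    client = routes[tool_use["name"]]
    result = client.call_tool(tool_use["name"], tool_use["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": result["content"],
        "is_error": result.get("isError", False),
    }


crm_client = StubClient("crm", ["create_ticket"])
github_client = StubClient("github", ["create_issue"])
routes = build_routes([crm_client, github_client])

tool_use = {"type": "tool_use", "id": "toolu_01", "name": "create_ticket",
            "input": {"title": "Printer on fire", "priority": "high"}}
tool_result = handle_tool_use(routes, tool_use)
print(tool_result["content"][0]["text"])
```

Only the CRM client sees a request here; the GitHub client's session is never touched by a call that does not belong to it.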

Two architectural details deserve emphasis. First, results are structured, not free text: a server can return text, images, or embedded resources, and the model treats them accordingly. Second, the host is a control point. It can require human approval before a call executes, redact arguments, rate-limit, or refuse. Because every external action funnels through the host, you have one place to enforce policy rather than scattering checks across servers. That centralization is a feature of the architecture, not an afterthought.
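
One way to picture the host as a control point is a small gate that every call passes through before it reaches a client. The sketch below is an assumption-heavy illustration: `PolicyGate`, the sensitive key names, the 60-second window, and the choice to redact arguments in the audit log are all invented here, not taken from any real host.

```python
import time

# Assumption: argument names treated as secrets in this sketch.
SENSITIVE_KEYS = {"password", "api_key", "token"}


def redact(arguments: dict) -> dict:
    """Return a copy that is safe to show a user or write to a log."""
    return {k: "[REDACTED]" if k in SENSITIVE_KEYS else v for k, v in arguments.items()}


class PolicyGate:
    """Hypothetical host-side check run before any tools/call is sent."""

    def __init__(self, needs_approval, max_calls_per_minute, approve, clock=time.monotonic):
        self._needs_approval = set(needs_approval)
        self._max = max_calls_per_minute
        self._approve = approve          # callback standing in for a human prompt
        self._clock = clock
        self._recent: list[float] = []
        self.audit_log: list[tuple] = []

    def check(self, tool_name: str, arguments: dict) -> str:
        now = self._clock()
        # Keep only calls inside the sliding 60-second window.
        self._recent = [t for t in self._recent if now - t < 60.0]
        if len(self._recent) >= self._max:
            decision = "rate_limited"
        elif tool_name in self._needs_approval and not self._approve(tool_name, redact(arguments)):
            decision = "denied"
        else:
            self._recent.append(now)
            decision = "allowed"
        self.audit_log.append((tool_name, redact(arguments), decision))
        return decision


fake_now = [0.0]  # controllable clock so the example is deterministic
gate = PolicyGate(
    needs_approval={"delete_record"},
    max_calls_per_minute=2,
    approve=lambda tool, args: False,   # the "user" declines every prompt
    clock=lambda: fake_now[0],
)
decisions = [
    gate.check("search", {"query": "acme", "api_key": "secret"}),
    gate.check("delete_record", {"id": 42}),
    gate.check("search", {"query": "globex"}),
    gate.check("search", {"query": "initech"}),
]
print(decisions)  # ['allowed', 'denied', 'allowed', 'rate_limited']
```

Because the gate sits in the host, the same rules apply to every server without any server needing to know about them.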

## Why this separation pays off

The layered design seems like ceremony until you operate it. Because servers are decoupled from the host, the same server runs unchanged across different hosts — write a Stripe MCP server once and it works in Claude Code today and your SDK agent tomorrow. Because clients are isolated per server, you can reason about failures locally. Because capabilities are negotiated at runtime, you upgrade a server's toolset without redeploying the agent. And because the protocol is open, the ecosystem of servers grows independently of any one vendor, which is the entire point of a standard.

The flip side is that you must respect the boundaries. Reaching around the host to call a server directly, or sharing one client across servers to "save resources," breaks the guarantees the architecture gives you. The discipline of the three roles is what keeps a ten-server agent comprehensible.

## Frequently asked questions

### Is MCP specific to Claude?

No. MCP is an open standard, and any model application can implement a host. Anthropic introduced it and ships first-class support across Claude Code, Cowork, and the Agent SDK, but servers are model-agnostic: the same server can serve a Claude host or any other compliant host. That portability is precisely why teams invest in writing servers rather than one-off integrations.

### What is the difference between a tool and a resource?

A tool is a function the model invokes to take an action or fetch dynamic data, and the model decides when to call it. A resource is readable data the host can pull into context, identified by a URI, and it is typically surfaced under application control rather than invoked autonomously. Tools are verbs; resources are nouns.
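
On the wire the distinction shows up in the request shape. The sketch below contrasts the two; the method names follow the MCP specification as I understand it, while the helper functions and the `crm://accounts/42` URI are hypothetical.

```python
def tool_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """A verb: invoke a named function with arguments chosen by the model."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def resource_read_request(request_id: int, uri: str) -> dict:
    """A noun: fetch the data behind a URI chosen by the application."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


call = tool_call_request(1, "create_ticket", {"title": "Renewal", "priority": "low"})
read = resource_read_request(2, "crm://accounts/42")
print(call["method"], sorted(call["params"]))
print(read["method"], sorted(read["params"]))
```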

### Does the model see the raw JSON-RPC?

No. The JSON-RPC layer lives entirely between client and server. Claude sees tool descriptions and schemas as available functions and emits tool-use requests in its own format; the host translates those into protocol messages. The wire format is an implementation detail the model is insulated from.

### How many servers can one host connect to?

There is no hard protocol limit; hosts routinely run many servers at once, each in its own client session. The practical ceiling is context and clarity — every server's tools add descriptions to the model's prompt, so connecting dozens of servers can crowd context and degrade tool selection. Curate the set to what the task needs.
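
A rough way to see the effect of curation is to compare how much descriptor text reaches the prompt before and after filtering. The sketch below counts serialized characters as a crude proxy, not tokens; the tool names, the allowlist, and both helpers are made up for illustration.

```python
import json


def curate_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only the tools the current task needs."""
    return [tool for tool in tools if tool["name"] in allowed]


def footprint_chars(tools: list[dict]) -> int:
    """Crude size estimate: characters of serialized tool descriptors."""
    return sum(len(json.dumps(tool)) for tool in tools)


# Thirty hypothetical tools spread across three servers.
all_tools = [
    {
        "name": f"{server}_tool_{i}",
        "description": f"Tool {i} exposed by the {server} server.",
        "inputSchema": {"type": "object", "properties": {"id": {"type": "string"}}},
    }
    for server in ("crm", "github", "filesystem")
    for i in range(10)
]

task_tools = curate_tools(all_tools, {"crm_tool_0", "crm_tool_1", "github_tool_3"})
print(len(all_tools), len(task_tools))
print(footprint_chars(task_tools) < footprint_chars(all_tools))
```

A character count is only a stand-in; a real host would measure tokens with the model's own tokenizer before deciding what to expose.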

## Bringing agentic AI to your phone lines

CallSphere takes the same host-client-server discipline behind MCP and applies it to **voice and chat** — agents that answer every call, pull live data through tools mid-conversation, and book work around the clock. See the architecture in action at [callsphere.ai](https://callsphere.ai).

---

*Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.*

---

Source: https://callsphere.ai/blog/mcp-architecture-explained-hosts-clients-servers
