By Sagar Shankaran, Founder of CallSphere
The November 2025 MCP spec mandates OAuth 2.1 with PKCE for public remote MCP servers. We unpack RFC 9728, S256 challenge, resource indicators, and the patterns to ship safely.
Key takeaways
TL;DR — The MCP November 2025 spec mandates OAuth 2.1 with PKCE for any public remote MCP. PKCE S256, RFC 9728 protected-resource metadata, RFC 8414 authorization-server metadata, and resource indicators are the four primitives you need. Token hygiene is non-negotiable.
Auth-wise, an MCP server is a resource server: it accepts bearer tokens issued by a separate authorization server, validates them, and decides which tools the token can call. The spec moves authorization out of the MCP server itself — your MCP doesn't run a login UI; it points clients at an OAuth 2.1 server.
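As a rough illustration of "decides which tools the token can call", here is a minimal sketch of a scope-to-tool check. The tool names, scope strings, and the `ValidatedToken` type are hypothetical and do not come from the MCP spec or from any particular provider. The sketch assumes the token's signature, audience, and expiry were already validated upstream.

```python
from dataclasses import dataclass

# Hypothetical mapping from MCP tool name to the OAuth scope it requires.
TOOL_SCOPES = {
    "calendar.read": "calendar:read",
    "calendar.book": "calendar:write",
}


@dataclass(frozen=True)
class ValidatedToken:
    """Claims kept after the bearer token has already been validated."""
    subject: str
    scopes: frozenset


def parse_scopes(scope_claim: str) -> frozenset:
    # OAuth carries scopes as one space-delimited string.
    return frozenset(scope_claim.split())


def can_call(token: ValidatedToken, tool: str) -> bool:
    # Deny by default: a tool with no registered scope is never callable.
    required = TOOL_SCOPES.get(tool)
    return required is not None and required in token.scopes
```

Denying unknown tools by default keeps a newly added tool from becoming callable before someone assigns it a scope.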
flowchart LR
A[Client] -->|GET /.well-known/oauth-protected-resource| B[MCP Server]
B -->|RFC 9728 metadata| A
A -->|GET /.well-known/oauth-authorization-server| C[Auth Server]
C -->|RFC 8414 metadata| A
A -->|PKCE S256 + auth code| C
C -->|access token| A
A -->|Bearer token + tools/call| B
B -->|validate| C
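The discovery half of the flowchart can be sketched as follows. This is a minimal sketch with no network calls: it only builds the two well-known URLs and reads the fields a client needs from already-parsed JSON. The hostnames are placeholders, and the helper names are hypothetical. The field names `authorization_servers` and `code_challenge_methods_supported` are taken from RFC 9728 and RFC 8414 respectively.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_url(base: str, suffix: str) -> str:
    """Insert /.well-known/<suffix> between the host and any path component."""
    parts = urlsplit(base)
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc, f"/.well-known/{suffix}{path}", "", "")
    )


def pick_auth_server(resource_metadata: dict) -> str:
    """Return the first authorization server listed in RFC 9728 metadata."""
    servers = resource_metadata.get("authorization_servers") or []
    if not servers:
        raise ValueError("protected-resource metadata lists no authorization server")
    return servers[0]


def require_s256(auth_server_metadata: dict) -> None:
    """Refuse to continue if the server does not advertise PKCE S256."""
    methods = auth_server_metadata.get("code_challenge_methods_supported", [])
    if "S256" not in methods:
        raise ValueError("authorization server does not advertise S256")


# Usage with placeholder hosts:
resource_url = well_known_url("https://mcp.example.com", "oauth-protected-resource")
issuer = pick_auth_server({"authorization_servers": ["https://auth.example.com"]})
as_url = well_known_url(issuer, "oauth-authorization-server")
```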
Stdio servers don't need auth: the process boundary is the auth. Streamable HTTP servers MUST implement OAuth 2.1 + PKCE if they're public. The S256 code-challenge method is required whenever the client is technically capable of it. Resource indicators (RFC 8707) bind the token to a specific MCP audience, so a token issued for mcp.stripe.com can't be replayed against mcp.github.com.
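For reference, the S256 method mentioned above is small enough to sketch with the standard library alone. The function names here are hypothetical. The transformation itself, base64url without padding over the SHA-256 of the ASCII verifier, follows RFC 7636.

```python
import base64
import hashlib
import hmac
import secrets


def make_verifier() -> str:
    # 32 random bytes give a 43-character URL-safe verifier,
    # the minimum length RFC 7636 allows.
    return secrets.token_urlsafe(32)


def s256_challenge(verifier: str) -> str:
    # code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), no padding.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_s256(verifier: str, stored_challenge: str) -> bool:
    # What an authorization server does at the token endpoint:
    # recompute the challenge and compare in constant time.
    return hmac.compare_digest(s256_challenge(verifier), stored_challenge)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code is useless without the verifier.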
Our internal MCPs all sit behind WorkOS for OAuth 2.1. The flow:
1. The client connects to mcp.callsphere.ai.
2. It fetches /.well-known/oauth-protected-resource → discovers WorkOS as the auth server.
3. It fetches /.well-known/oauth-authorization-server → gets endpoints.
4. It starts the authorization request with code_challenge=...&code_challenge_method=S256.
5. The token is issued with audience=mcp.callsphere.ai.

This pattern is the same one WorkOS, Auth0, Clerk, and Cognito support natively. Picking any of them is fine; building it yourself is mostly a bad idea in 2026.
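The authorization request in that flow can be sketched as a URL builder. This is an illustration under assumptions, not the WorkOS API: the endpoint, client ID, redirect URI, and function name are placeholders. It sends the target MCP server in the `resource` parameter defined by RFC 8707; some providers use a provider-specific `audience` parameter instead, so check your provider's documentation.

```python
from urllib.parse import urlencode


def build_authorize_url(
    authorization_endpoint: str,
    client_id: str,
    redirect_uri: str,
    code_challenge: str,
    resource: str,
    state: str,
) -> str:
    """Build an OAuth 2.1 authorization-code request with PKCE S256."""
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "code_challenge": code_challenge,
            "code_challenge_method": "S256",
            "resource": resource,  # RFC 8707 resource indicator
            "state": state,
        }
    )
    return f"{authorization_endpoint}?{query}"


# Usage with placeholder values:
url = build_authorize_url(
    authorization_endpoint="https://auth.example.com/oauth2/authorize",
    client_id="example-client",
    redirect_uri="http://127.0.0.1:8765/callback",
    code_challenge="E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM",
    resource="https://mcp.example.com",
    state="opaque-random-state",
)
```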
Serve /.well-known/oauth-protected-resource, returning the auth server URL + audience (RFC 9728).
Validate every token by introspection (POST /oauth/introspect) or verify the JWT signature with the provider's JWKS.
Why PKCE if I already have a client secret? PKCE protects the code exchange against interception even when the client is public (mobile, desktop). MCP clients are often desktop apps, so PKCE is non-optional.
Why S256 not plain? plain is not allowed when the client is technically capable of S256. Every modern runtime ships SHA-256, so there's no excuse.
Token lifetime? Short — 5–15 minutes for access tokens, longer refresh tokens behind tight rotation.
Resource indicators? Yes — set audience and validate. Without it, tokens are replayable across MCP servers.
Scopes per tool? The 2026 spec push is for fine-grained scopes mapped to specific tools. Implement it now or pay later.
Trial auth-equipped CallSphere agents? Yes — every MCP we expose is OAuth-gated.
MCP Server Auth in 2026: OAuth 2.1, PKCE S256, and the New Authorization Spec is also a cost-per-conversation problem hiding in plain sight. Once you instrument tokens-in, tokens-out, tool calls, ASR seconds, and TTS seconds against booked-revenue per call, the right tradeoff between Realtime API and an async ASR + LLM + TTS pipeline becomes obvious — and it's almost never the same answer for healthcare as it is for salons.
The big fork is managed (OpenAI Realtime, ElevenLabs Conversational AI) versus self-hosted on GPUs you operate. Managed wins on cold-start, model freshness, and zero-ops; self-hosted wins on unit economics past a certain conversation volume and on data residency for regulated verticals. CallSphere runs hybrid: Realtime for live calls, self-hosted Whisper + a hosted LLM for async, both routed through a Go gateway that enforces per-tenant rate limits.
Latency budgets are non-negotiable on voice. End-to-end target is sub-800ms ASR-to-first-token and sub-1.4s first-audio-out; anything beyond that and turn-taking feels stilted. GPU residency in the same region as your TURN servers matters more than choosing a slightly bigger model.
Observability is the unglamorous backbone — every conversation produces logs, traces, sentiment scoring, and cost attribution piped to a per-tenant dashboard. HIPAA + SOC 2 aligned isolation keeps healthcare traffic separated from salon traffic at the storage layer, not just the API.
How does this apply to a CallSphere pilot specifically? Setup runs 3–5 business days, the trial is 14 days with no credit card, and pricing tiers are $149, $499, and $1,499 — so a vertical-specific pilot is a same-week decision, not a quarterly project. For a topic like "MCP Server Auth in 2026: OAuth 2.1, PKCE S256, and the New Authorization Spec", that means you're not starting from scratch — you're configuring an agent template that's already been hardened across thousands of conversations.
What does the typical first-week implementation look like? Day one is integration mapping (scheduler, CRM, messaging) and prompt tuning against your top 20 real call transcripts. Day two through five is shadow-mode running, where the agent transcribes and recommends but a human still answers, so you can compare side-by-side. Go-live is the moment your eval pass-rate clears your internal bar.
Where does this break down at scale? The honest answer: it scales until your tool catalog gets stale. The agent is only as good as the integrations it can actually call, so the operational discipline is keeping schemas, webhooks, and fallback paths green. The platform handles the rest — observability, retries, multi-region routing — without your team owning the GPU layer.
Want to see how this maps to your stack? Book a live walkthrough at calendly.com/sagar-callsphere/new-meeting, or try the vertical-specific demo at escalation.callsphere.tech. 14-day trial, no credit card, pilot live in 3–5 business days.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.