By Sagar Shankaran, Founder of CallSphere
Pydantic AI v1.85 ships with MCP, dependency injection, and 16k+ stars. We break down where strict typing pays for itself and where it gets in the way of shipping.
Key takeaways
TL;DR: Pydantic AI treats every LLM output as a Pydantic model. That sounds boring until you've shipped your fourth agent and realized validation, dependency injection, and structured outputs were the actual hard parts. As of v1.85.1 (April 22, 2026), it is a reasonable production default for type-first Python teams.
flowchart LR
User --> Triage["Triage / Supervisor"]
Triage -->|tool A| A["Specialist A"]
Triage -->|tool B| B["Specialist B"]
Triage -->|tool C| C["Specialist C"]
A --> Mem[(Shared memory · mem0/Letta)]
B --> Mem
C --> Mem
Mem --> Final["Final response"]
Most agent failures in production are not "the model hallucinated a tool name"; those get caught instantly. The failures that hurt are:
- amount field is a string instead of a Decimal.
Pydantic AI's pitch is that the same library you already use for FastAPI request validation and DB schemas should validate LLM outputs too. Define a pydantic.BaseModel, hand it to the agent, and the framework retries the model on validation failure with a clear error message.
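To make the validation step concrete, here is a minimal sketch that uses plain Pydantic with no LLM call. The Charge model and the validation_feedback helper are hypothetical names chosen for illustration. The error text it returns is the kind of message a validate-or-retry loop can hand back to the model.

```python
from decimal import Decimal
from typing import Optional

from pydantic import BaseModel, ValidationError


class Charge(BaseModel):
    """Hypothetical output schema for a billing agent (illustration only)."""

    amount: Decimal
    currency: str


def validation_feedback(raw_json: str) -> Optional[str]:
    """Return None if the payload validates, else the error text for a retry."""
    try:
        Charge.model_validate_json(raw_json)
    except ValidationError as exc:
        return str(exc)
    return None


# A numeric string is parsed into a real Decimal instead of staying a str.
good = Charge.model_validate_json('{"amount": "12.50", "currency": "USD"}')

# A non-numeric amount is rejected, and the message names the failing field.
feedback = validation_feedback('{"amount": "about twelve", "currency": "USD"}')
```

Downstream code then only ever sees a Charge whose amount is a Decimal, or no Charge at all.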
The library reached its 1.x stable API in late 2025 and now sits at v1.85.1 (April 22, 2026) with 16.5k+ GitHub stars. Highlights:
- deps_type parameter: type-safe context for tools and unit tests.
- MCPServerStreamableHTTP, MCPServerSSE, MCPServerStdio.
Financial/health/regulated workflows. When the cost of a malformed response is a wrong charge or a wrong dose, Pydantic AI's "validate or retry" loop is the cheapest insurance you can buy.
Legacy Python codebases. If your team already runs Pydantic v2 in FastAPI services, the agent code reads like normal application code. No new mental model.
Evaluation pipelines. Because every output is a typed model, snapshot testing and contract testing are trivial.
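As a rough illustration of the evaluation point, the sketch below shows a snapshot test and a contract test against a typed output, using plain Pydantic. The Triage model and the recorded payload are hypothetical. In a real suite the snapshot would usually live in a checked-in fixture file.

```python
from typing import Literal

from pydantic import BaseModel


class Triage(BaseModel):
    """Hypothetical typed agent output (illustration only)."""

    label: Literal["qualified", "not_interested"]
    confidence: float
    notes: str = ""


# A recorded model output, as it might be stored in a test fixture.
RECORDED = '{"label": "qualified", "confidence": 0.9}'


def test_snapshot() -> None:
    # Snapshot test: the parsed output must match the stored expectation exactly.
    parsed = Triage.model_validate_json(RECORDED)
    assert parsed.model_dump() == {"label": "qualified", "confidence": 0.9, "notes": ""}


def test_contract() -> None:
    # Contract test: fail loudly if the set of required fields ever changes.
    required = {name for name, info in Triage.model_fields.items() if info.is_required()}
    assert required == {"label", "confidence"}
```

Both tests run without a model call, so they are cheap enough to run on every commit.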
Voice / streaming. The validate-or-retry loop adds latency that voice agents can't afford. CallSphere's voice surface uses OpenAI Realtime with light-weight Zod validation in TypeScript instead.
Heavy code-gen agents. If your agent's "output" is hundreds of lines of Python, validating with a Pydantic schema is the wrong tool. Use smolagents or deepagents instead.
Highly experimental loops. When you don't yet know what the output should look like, strict typing fights you. Prototype with raw openai or anthropic SDKs first, then port to Pydantic AI when the schema stabilizes.
For our healthcare and behavioral-health verticals, the non-voice parts of the workflow — intake form parsing, insurance eligibility extraction, prior-auth letter drafting — run on Pydantic AI agents. The validators reject any output where, say, policy_number doesn't match a known carrier's format, and the agent retries with a corrective system message. That's saved us from at least three "we wrote the wrong number to the EHR" incidents during pilot.
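A format check like this can be written as an ordinary Pydantic field validator. The sketch below is an assumption-heavy illustration, not our production code: the carrier names, the regular expressions, and the Eligibility model are all hypothetical.

```python
import re

from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator

# Hypothetical formats; real carrier patterns would come from configuration.
CARRIER_PATTERNS = {
    "acme_health": re.compile(r"AH-\d{8}"),
    "northwind": re.compile(r"NW[A-Z]{2}\d{6}"),
}


class Eligibility(BaseModel):
    """Hypothetical extraction result for an insurance eligibility check."""

    carrier: str
    policy_number: str

    @field_validator("policy_number")
    @classmethod
    def matches_carrier_format(cls, value: str, info: ValidationInfo) -> str:
        # info.data holds fields validated so far; carrier is declared first.
        carrier = info.data.get("carrier")
        pattern = CARRIER_PATTERNS.get(carrier)
        if pattern is None or not pattern.fullmatch(value):
            raise ValueError(f"policy_number does not match the format for carrier {carrier!r}")
        return value


def is_rejected(carrier: str, policy_number: str) -> bool:
    """Return True when validation fails for the given pair."""
    try:
        Eligibility(carrier=carrier, policy_number=policy_number)
    except ValidationError:
        return True
    return False
```

When a model like this is the agent's output type, a failed check is what triggers the retry described above.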
The post-call summary pipeline for our Sales browser dialer also runs on Pydantic AI: the model emits a typed CallSummary with required disposition, next_action, mentioned_competitors[], and buying_signals[]. Downstream systems (Salesforce, HubSpot, our internal pipeline) consume the validated object directly.
Pricing: $149 Starter / $499 Growth / $1499 Scale. 14-day trial. 22% affiliate.
1. Install: pip install "pydantic-ai[mcp]".
2. Define the output: a BaseModel with fields, enums, and validators.
3. Create the agent: agent = Agent("openai:gpt-5", output_type=CallSummary, deps_type=Deps).
4. Add tools: the @agent.tool decorator and RunContext[Deps].
5. Test: agent.override(model=TestModel()), no API calls needed.
6. Mount MCP servers: agent = Agent(..., toolsets=[MCPServerStreamableHTTP(url=...)]).

from datetime import datetime
from typing import Literal

from pydantic import BaseModel
from pydantic_ai import Agent

class CallSummary(BaseModel):
    disposition: Literal["qualified", "not_interested", "callback", "voicemail"]
    next_action: str
    mentioned_competitors: list[str]
    buying_signals: list[str]
    follow_up_at: datetime | None = None

agent = Agent("openai:gpt-5", output_type=CallSummary,
              system_prompt="Summarize the call into the typed schema.")

async def summarize(transcript: str) -> CallSummary:
    # run() returns once the output validates against CallSummary,
    # or raises after the retry budget is exhausted.
    result = await agent.run(transcript)
    assert isinstance(result.output, CallSummary)
    return result.output
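Less talked about: your own application can also become an MCP server. You write tools as typed Python functions and expose them from an MCP server, for example with FastMCP from the official MCP Python SDK, over streamable HTTP or stdio. Any MCP-aware client (Claude Desktop, Cursor, Cline, your own agent) can then mount them. On the Pydantic AI side, MCPServerStreamableHTTP and MCPServerStdio are the client classes an agent uses to connect to such a server.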
For CallSphere this means we can ship a "CallSphere MCP server" to customers' internal agents. The server exposes typed tools like get_call_summary(call_id), book_callback(at, contact_id), update_disposition(call_id, value) — same Pydantic models we use internally, automatically reflected as MCP tool schemas. Customers point their Claude Desktop or their internal agents at the server and our entire surface becomes available with strict typing.
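from mcp.server.fastmcp import FastMCP

# FastMCP (MCP Python SDK) hosts the tools; MCPServerStreamableHTTP is the
# client class a Pydantic AI agent uses to connect to this server.
server = FastMCP("CallSphere")

@server.tool()
async def get_call_summary(call_id: str) -> CallSummary:
    """Return the validated summary for a CallSphere call."""
    # CallSummary is the model defined earlier; db is the app's data layer (not shown).
    return await db.fetch_call_summary(call_id)

if __name__ == "__main__":
    server.run(transport="streamable-http")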
The big benefit: every MCP client gets the same JSON Schema we use server-side, with no hand-written schema duplication. That alone has saved us hours per integration.
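The shared schema comes straight from the model class. As a small sketch, using a trimmed and hypothetical copy of CallSummary, this is the JSON Schema that Pydantic generates. How a given MCP server library maps it onto tool schemas is outside the scope of the sketch.

```python
from typing import Literal

from pydantic import BaseModel


class CallSummary(BaseModel):
    """Trimmed, hypothetical copy of the summary model (illustration only)."""

    disposition: Literal["qualified", "callback"]
    next_action: str
    mentioned_competitors: list[str]


# One call produces the schema; nothing is written by hand.
schema = CallSummary.model_json_schema()

# The allowed enum values and the required field list both come from the type hints.
allowed_dispositions = schema["properties"]["disposition"]["enum"]
required_fields = schema["required"]
```

Adding a field to the model changes the schema on the next import, which is why the two sides cannot drift apart.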
Pydantic AI models can also be used within MCP servers, allowing agents to use sampling via MCPSamplingModel to make LLM calls through the MCP client. This is the inverse pattern: the MCP server delegates LLM calls to whatever model the client chose. Useful when you want to ship tools but stay model-agnostic — your tool's logic uses the user's API key, not yours.
Does it work with Anthropic prompt caching? Yes. Caching is opt-in rather than automatic: Anthropic-specific model settings and explicit cache points let you place cache breakpoints around stable system prompts and tool definitions.
How does the dependency injection compare to FastAPI's? Same mental model: deps flows through the run, and RunContext[Deps] is your handle inside tools and validators.
Can I stream typed outputs? Yes, via agent.run_stream; partial validation runs as fields arrive.
How do I demo this on CallSphere? Pick the 14-day trial, choose the Sales product, and the post-call summary pipeline is the demo.
Does it support graph workflows like LangGraph? Partly. Pydantic AI is single-agent first, and its companion pydantic-graph library covers graph-style control flow. For LangGraph topologies, run Pydantic AI agents inside LangGraph nodes. We do this in our healthcare deployment.
What's the testing story? First-class. Use agent.override(model=TestModel()) to swap a deterministic test model in unit tests, no API calls.
Is it FastAPI-friendly? Extremely. Pydantic models flow seamlessly between FastAPI endpoints and Pydantic AI agents — same dependency-injection mental model.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.