By Sagar Shankaran, Founder of CallSphere
AI SDK 5 ships fully typed chat for React, Svelte, Vue, and Angular plus first-class agent loop primitives. Here are the patterns that matter for shipping in 2026.
Key takeaways
TL;DR — Vercel AI SDK 5 (released July 31, 2025) is the TypeScript-first answer to agent loops.
stopWhen controls when a tool-calling loop ends; prepareStep mutates the next step's settings; the redesigned useChat hook ships modular transports for WebSockets and SSE. The result is the cleanest TS path from prototype to production agent in 2026.
flowchart TD
Client[MCP client · Claude Desktop] --> MCP[MCP server]
MCP --> Tool1[Tool: Calendar]
MCP --> Tool2[Tool: CRM]
MCP --> Tool3[Tool: KB search]
Tool1 --> SaaS1[(Calendly)]
Tool2 --> SaaS2[(Salesforce)]
Tool3 --> SaaS3[(Notion)]
The headline numbers from the v5 release:
useChat with three extensibility patterns: flexible transports (WebSockets, SSE, custom), client-only support, and end-to-end typed message envelopes.
stopWhen and prepareStep.
The classic SDK loop runs generateText (or streamText) with a tools object. The model emits a tool call, the SDK runs the tool, appends the result, and runs the model again. This repeats until the model emits text without a tool call, or until you stop it.
stopWhen controls the stop condition. Built-ins:
stepCountIs(20): stop after 20 steps. Default safety.
hasToolCall("submit"): stop when a specific tool is called.
isLoopFinished(): never trigger; let the agent run to natural completion.
You can compose them: stopWhen: [stepCountIs(20), hasToolCall("submit")] stops on whichever fires first.
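To make the stop semantics concrete, here is a minimal, self-contained sketch of a tool-calling loop with composable stop conditions. It is a simplified model, not the SDK's implementation: the local stepCountIs and hasToolCall only mirror the SDK's names, and Step, runLoop, and the scripted fakeModel are hypothetical helpers invented for this illustration.

```typescript
// Simplified model of a tool-calling loop. These are local stand-ins that
// mirror the SDK's names for illustration; they are not the SDK's code.
type Step = { toolCalls: string[]; text: string };
type StopCondition = (steps: Step[]) => boolean;

// Stop once the loop has produced `n` steps.
const stepCountIs = (n: number): StopCondition => (steps) => steps.length >= n;

// Stop once the most recent step called the named tool.
const hasToolCall = (name: string): StopCondition => (steps) => {
  const last = steps[steps.length - 1];
  return last !== undefined && last.toolCalls.includes(name);
};

// Run `model` until it answers without a tool call or any condition fires.
function runLoop(model: (stepNumber: number) => Step, stopWhen: StopCondition[]): Step[] {
  const steps: Step[] = [];
  while (true) {
    const step = model(steps.length);
    steps.push(step);
    const finishedNaturally = step.toolCalls.length === 0;
    if (finishedNaturally || stopWhen.some((cond) => cond(steps))) return steps;
  }
}

// Hypothetical scripted model: lookup, lookup, submit, lookup, then plain text.
const script: string[] = ["lookup", "lookup", "submit", "lookup"];
const fakeModel = (i: number): Step => {
  const name = script[i];
  return name !== undefined ? { toolCalls: [name], text: "" } : { toolCalls: [], text: "done" };
};

// The submit call on the third step ends the loop before the 15-step cap.
const bounded = runLoop(fakeModel, [stepCountIs(15), hasToolCall("submit")]);
console.log(bounded.length); // 3

// With a cap of 2, the step count fires before submit is ever reached.
const capped = runLoop(fakeModel, [stepCountIs(2), hasToolCall("submit")]);
console.log(capped.length); // 2
```

Passing an array means "stop on whichever fires first", which is why the sketch uses some() rather than every().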
prepareStep is the per-step config callback. It runs before each step and can override that step's settings, such as the model it uses or the messages it sees.
This is your hook for adaptive cost and quality control mid-loop.
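import { generateText, stepCountIs, hasToolCall } from "ai";
import { openai } from "@ai-sdk/openai";

// lookup, schedule, submit (tools) and messages are assumed to be defined elsewhere.
const result = await generateText({
  model: openai("gpt-5"),
  tools: { lookup, schedule, submit },
  messages,
  // Stop after 15 steps or as soon as the submit tool is called.
  stopWhen: [stepCountIs(15), hasToolCall("submit")],
  prepareStep: async ({ stepNumber, messages }) => {
    if (stepNumber === 0) {
      return { model: openai("gpt-5-mini") }; // cheap first pass
    }
    if (messages.length > 30) {
      return { messages: messages.slice(-20) }; // truncate context
    }
    return {}; // keep the default settings for this step
  },
});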
The first step runs on a cheaper model to triage. Later steps escalate to the bigger model. If history grows beyond 30 messages, we truncate. The loop stops at 15 steps or when submit is called. This pattern can cut your agent's per-conversation cost by 40-60% with no quality drop on common inputs.
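One way to keep this per-step logic easy to unit test is to pull the decision into a pure function and call it from prepareStep. The sketch below does that under stated assumptions: Message, StepOverrides, and planStep are hypothetical names, the model is represented by its id string rather than a provider instance, and the thresholds are the ones from the example above.

```typescript
// Hypothetical message and override shapes; the SDK's real types are richer.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type StepOverrides = { modelId?: string; messages?: Message[] };

const CHEAP_MODEL = "gpt-5-mini"; // model id taken from the example above
const MAX_HISTORY = 30; // truncate once the history grows past this
const KEEP_LAST = 20; // number of most recent messages to keep

// Decide what to override for the next step. Returning {} keeps the defaults.
function planStep(stepNumber: number, messages: Message[]): StepOverrides {
  if (stepNumber === 0) return { modelId: CHEAP_MODEL };
  if (messages.length > MAX_HISTORY) return { messages: messages.slice(-KEEP_LAST) };
  return {};
}

// Example: a short history on step 1 needs no overrides.
console.log(planStep(1, [{ role: "user", content: "hi" }]));
```

A plain slice like this may cut between a tool call and its result, or drop an early system message, so a production version would probably want to trim on message-pair boundaries.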
The new useChat is transport-agnostic. The default is HTTP streaming. You can plug in WebSockets, SSE, or a custom transport.
Each transport ships with full typed message envelopes — no any, no string-typing.
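As a rough illustration of what typed envelopes buy you, the sketch below models a message as a discriminated union of parts and renders it with an exhaustive switch. The part shapes and the names Part, ChatMessage, renderPart, and renderMessage are hypothetical simplifications for this example, not the SDK's actual message types.

```typescript
// Hypothetical, simplified message envelope; the SDK's real types differ.
type TextPart = { type: "text"; text: string };
type ToolCallPart = { type: "tool-call"; toolName: string; input: unknown };
type ErrorPart = { type: "error"; message: string };
type Part = TextPart | ToolCallPart | ErrorPart;
type ChatMessage = { id: string; role: "user" | "assistant"; parts: Part[] };

// The switch is exhaustive: adding a new Part variant without handling it
// here becomes a compile-time error at the `never` assignment.
function renderPart(part: Part): string {
  switch (part.type) {
    case "text":
      return part.text;
    case "tool-call":
      return `[calling ${part.toolName}]`;
    case "error":
      return `[error: ${part.message}]`;
    default: {
      const unreachable: never = part;
      return unreachable;
    }
  }
}

const renderMessage = (message: ChatMessage): string =>
  message.parts.map(renderPart).join(" ");

const sample: ChatMessage = {
  id: "msg_1",
  role: "assistant",
  parts: [
    { type: "text", text: "Checking the calendar" },
    { type: "tool-call", toolName: "schedule", input: { day: "Monday" } },
  ],
};
console.log(renderMessage(sample));
```

Because the compiler narrows each case on the type field, the rendering code never needs any or string comparisons against untyped payloads.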
CallSphere's public-facing chat widgets are built on AI SDK 5. The useChat hook in our Next.js 15 App Router pages connects to a Node Edge runtime endpoint that runs the agent loop. stopWhen keeps the loop bounded; prepareStep swaps to a cheaper model for the first triage step.
For our post-call summarization workflow (every voice call ends with a structured summary written to Salesforce / HubSpot), we use generateObject from AI SDK 5 with a Zod schema. The structured output is type-safe end-to-end — the same Zod schema validates on the server and types on the client.
For voice we don't use AI SDK directly — OpenAI Realtime + WebRTC is its own beast — but the non-voice surfaces of CallSphere all run on AI SDK 5.
Pricing: $149 Starter / $499 Growth / $1499 Scale. 14-day trial. 22% affiliate.
Install the packages: npm install ai @ai-sdk/openai zod.
Define each tool with tool({ description, inputSchema: z.object({...}), execute: async (...) => ... }).
Call generateText (one-shot) or streamText (streaming) with your tools.
Add stopWhen to bound the loop.
Add prepareStep for per-step control.
Use useChat to render the streaming response.
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Schema for the post-call summary; it drives both validation and the inferred type.
const CallSummary = z.object({
  disposition: z.enum(["qualified", "not_interested", "callback", "voicemail"]),
  next_action: z.string(),
  mentioned_competitors: z.array(z.string()),
  buying_signals: z.array(z.string()),
  follow_up_at: z.coerce.date().nullable(),
});

// transcript and crm are assumed to be defined elsewhere.
const { object } = await generateObject({
  model: openai("gpt-5"),
  schema: CallSummary,
  prompt: `Summarize: ${transcript}`,
});

await crm.upsertCall(object); // fully typed
AI SDK 5 ships per-tool lifecycle hooks: onInputAvailable, onInputStart, onInputDelta. These let you stream UI updates as the model is building the tool call's arguments, before it actually calls the tool.
Practical use: when the model is mid-way through building a "schedule_meeting" tool call, your UI can show "scheduling..." with the partial arguments rendered as placeholders. The user sees progress instead of a blank pause. We use this in the chat widget for any tool that takes more than a couple of seconds.
import { tool } from "ai";

// scheduleSchema, scheduleMeeting, and ui are assumed to be defined elsewhere.
const schedule = tool({
  inputSchema: scheduleSchema,
  execute: scheduleMeeting,
  onInputStart: ({ toolCallId }) => ui.show(toolCallId, "Building meeting..."),
  onInputDelta: ({ toolCallId, inputTextDelta }) => ui.update(toolCallId, inputTextDelta),
  onInputAvailable: ({ toolCallId, input }) => ui.previewMeeting(toolCallId, input),
});
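The ui object in that snippet is left undefined, so here is a minimal sketch of the state it might keep. It is an assumption-laden illustration: ToolPreviews, Preview, and the method names are hypothetical, and the hard-coded deltas stand in for what the lifecycle hooks would deliver while the model streams a tool call's arguments.

```typescript
// Hypothetical in-memory stand-in for the `ui` object used above.
type Preview = { status: "building" | "ready"; rawInput: string; input?: unknown };

class ToolPreviews {
  private previews = new Map<string, Preview>();

  // Called when the model starts generating a tool call's arguments.
  start(toolCallId: string): void {
    this.previews.set(toolCallId, { status: "building", rawInput: "" });
  }

  // Called for each streamed fragment of the (still incomplete) JSON input.
  delta(toolCallId: string, inputTextDelta: string): void {
    const preview = this.previews.get(toolCallId);
    if (preview) preview.rawInput += inputTextDelta;
  }

  // Called once the complete, parsed input is available.
  available(toolCallId: string, input: unknown): void {
    const rawInput = this.previews.get(toolCallId)?.rawInput ?? "";
    this.previews.set(toolCallId, { status: "ready", rawInput, input });
  }

  get(toolCallId: string): Preview | undefined {
    return this.previews.get(toolCallId);
  }
}

// Simulated event sequence for one tool call (ids and values are made up).
const previews = new ToolPreviews();
previews.start("call_1");
previews.delta("call_1", '{"title":"Dem');
previews.delta("call_1", 'o","when":"2026-06-01"}');
previews.available("call_1", { title: "Demo", when: "2026-06-01" });
console.log(previews.get("call_1"));
```

While status is "building", rawInput is usually not valid JSON yet, so a UI would typically show it as a placeholder and only render real fields once the input is available.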
AI SDK 5 also supports provider-executed tools: tools that run on the model provider's side (OpenAI's web search, code interpreter, file search) without round-tripping to your server. Web search, which @ai-sdk/openai exposes as openai.tools.webSearch(), is the most popular: your agent gets web search results without you running a search backend.
For CallSphere this is mostly relevant in our research agents. We let OpenAI's web search hold the search loop while we focus on the synthesis prompt; saves us from operating Brave or SerpAPI integrations.
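streamObject streams typed partial objects as the model generates them: partialObjectStream yields progressively more complete versions of the object for your UI to render incrementally. The partial objects are not validated against the schema while they stream; validation applies to the final object, which is also what generateObject returns in one piece. Good for forms, tables, and other structured output that benefits from showing up as it's generated.
import { streamObject } from "ai";
import { openai } from "@ai-sdk/openai";

// CallSummary is the Zod schema defined above; transcript and ui are assumed to exist.
const { partialObjectStream } = streamObject({
  model: openai("gpt-5"),
  schema: CallSummary,
  prompt: `Summarize: ${transcript}`,
});

for await (const partial of partialObjectStream) {
  ui.render(partial); // each partial is a deep-partial CallSummary
}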
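On the rendering side, the main thing to handle is that any field may still be missing. The sketch below shows one way a render function could fill gaps with placeholders. PartialCallSummary, renderPartial, and the placeholder text are hypothetical; the field names mirror the CallSummary schema, with follow_up_at left out for brevity.

```typescript
// Mirrors the field names of the CallSummary schema above; every field is
// optional because a partial object may arrive before the model has filled it.
type PartialCallSummary = {
  disposition?: string;
  next_action?: string;
  mentioned_competitors?: (string | undefined)[];
  buying_signals?: (string | undefined)[];
};

const PLACEHOLDER = "...";

// Join the list entries that have arrived so far, or show a placeholder.
const list = (items: (string | undefined)[] = []): string => {
  const known = items.filter((s): s is string => s !== undefined);
  return known.length > 0 ? known.join(", ") : PLACEHOLDER;
};

// Turn whatever has streamed in so far into display lines.
function renderPartial(partial: PartialCallSummary): string[] {
  return [
    `Disposition: ${partial.disposition ?? PLACEHOLDER}`,
    `Next action: ${partial.next_action ?? PLACEHOLDER}`,
    `Competitors: ${list(partial.mentioned_competitors)}`,
    `Buying signals: ${list(partial.buying_signals)}`,
  ];
}

// Early in the stream only one field may be present.
console.log(renderPartial({ disposition: "callback" }).join("\n"));
```

Keeping the renderer a pure function of the latest partial means each stream update can simply replace the previous render.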
The new speech and audio APIs give you a single typed interface to OpenAI, ElevenLabs, and Deepgram for both TTS and STT. Switching providers is a config change, not a refactor. For non-realtime audio (voicemail transcription, post-call TTS summaries) this is the cleanest abstraction we've seen.
AI SDK 5 vs OpenAI Agents SDK (TypeScript)? AI SDK 5 is more ergonomic for chat UI and structured outputs; OpenAI Agents SDK is more opinionated for multi-agent topology. We use both — Agents SDK for orchestration, AI SDK for UI hooks.
Does it support MCP? Yes — experimental_createMCPClient lets you mount MCP servers as tool sources.
What models work? Anything in the AI SDK provider ecosystem — OpenAI, Anthropic, Google, Mistral, Bedrock, Azure, Ollama, OpenRouter, and dozens more.
Is v6 out yet? Vercel announced AI SDK 6 development in 2026; v5 is the production stable line as of May 2026.
Where do I see this on CallSphere? Book a demo and we'll walk through the chat widget code.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.