By Sagar Shankaran, Founder of CallSphere
CAISI announced new agreements with Google DeepMind, Microsoft, and xAI in May 2026. What gets tested, what changes for enterprise AI buyers, what to watch.
Key takeaways
In May 2026, the Center for AI Standards & Innovation (CAISI) announced new agreements with Google DeepMind, Microsoft, and xAI to evaluate frontier models pre-release. This builds on the prior 2024 agreements with OpenAI and Anthropic, which were renegotiated this cycle.
The headline: five of the largest model developers in the US are now under voluntary pre-release evaluation arrangements with a federal standards body. For enterprise buyers, this matters in three ways. We will walk through each.
CAISI's evaluation scope (as publicly described) covers:
The evaluations happen before public release. The arrangement is voluntary, not regulatory; it functions as a pre-release checkpoint that participating labs have agreed to.
Three concrete impacts:
If your AI vendor uses GPT, Claude, Gemini, or Grok (xAI), the developer of the underlying model is now covered by a CAISI pre-release evaluation agreement. That is not the same as a binding regulatory certification, but it is a floor of independent capability and safety testing.
Procurement teams asking "has this model been independently evaluated?" can now point to CAISI as the answer.
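As a rough illustration of that procurement check, the sketch below maps a vendor's declared model name to its developer and reports whether that developer is among the labs this article names as having a CAISI agreement. The prefix mapping, the function names, and the example model names are hypothetical and not part of any CAISI tooling. The check covers only the existence of a lab-level agreement, not whether a specific model release was evaluated.

```python
from typing import Optional

# Labs named in this article as having CAISI agreements (as of May 2026).
CAISI_AGREEMENT_LABS = {"OpenAI", "Anthropic", "Google DeepMind", "Microsoft", "xAI"}

# Hypothetical mapping from model-family prefix to developer, for illustration only.
MODEL_FAMILY_TO_LAB = {
    "gpt": "OpenAI",
    "claude": "Anthropic",
    "gemini": "Google DeepMind",
    "grok": "xAI",
}


def lab_for_model(model_name: str) -> Optional[str]:
    """Return the developer for a model name, or None if the family is unknown."""
    name = model_name.strip().lower()
    for prefix, lab in MODEL_FAMILY_TO_LAB.items():
        if name.startswith(prefix):
            return lab
    return None


def has_caisi_agreement(model_name: str) -> bool:
    """True if the model's developer is on the agreement list above.

    This says nothing about whether this particular release was evaluated.
    """
    lab = lab_for_model(model_name)
    return lab is not None and lab in CAISI_AGREEMENT_LABS


if __name__ == "__main__":
    for declared in ["claude-example", "Gemini-example", "unlisted-model"]:
        print(declared, "->", has_caisi_agreement(declared))
```

A model family that is not in the mapping returns `False`, so an unknown vendor model prompts a manual follow-up question rather than a silent pass.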
Up until 2025, "did the model developer evaluate safety?" was answered by the developer's own model card. CAISI evaluations provide an external reference. This is useful for procurement, for legal review, and for sectors with regulatory exposure (healthcare, finance, government contracting).
The CAISI evaluations explicitly cover cybersecurity capabilities. Combined with Anthropic's May 2026 Mythos cybersecurity model (with restricted access), the signal to enterprise buyers is clear: cybersecurity-relevant capability is now subject to pre-release scrutiny by both the lab and the standards body.
CallSphere does not develop frontier models; we run them. So CAISI's effect on us is indirect, but real.
It is worth being precise about scope. CAISI is not:
It is a standards-and-evaluation body operating through voluntary lab agreements. The value is in the independence of the evaluation, not regulatory force.
Three things will indicate whether CAISI's role grows in 2026–2027:
Each of these would meaningfully raise the bar for enterprise AI procurement.
In the same window, the Pentagon announced AI deals with 8 Big Tech companies, notably excluding Anthropic. The juxtaposition is interesting: CAISI is testing models from many of the same labs whose models the Pentagon is now deploying. The federal posture in 2026 is parallel-track: the standards body evaluates and the procurement body buys, with overlap but not identity in the lab roster.
For enterprise buyers, the takeaway is that the federal signal on AI now comes from several tracks at once (evaluation, procurement, and policy via the AI Action Plan). The lab roster is wider than any single track.
For enterprise buyers evaluating voice and chat agent platforms in 2026:
CallSphere is HIPAA-friendly, runs on models from labs with active CAISI agreements, and exposes the compliance story openly. See pricing at callsphere.ai/pricing.
Q: Is CAISI a US-only body? A: Yes, CAISI is the US center. The UK has its own AI Security Institute (formerly the AI Safety Institute); the EU has its own evaluation tracks. International coordination is in early stages.
Q: Does CAISI evaluate every release of a model, or just major versions? A: The agreements are described as pre-release evaluations. Cadence has not been publicly specified in detail; major capability-jump releases are the most likely candidates.
Q: Are CAISI evaluations published? A: Summary findings have been published for some prior evaluations. The May 2026 agreements have not yet produced public evaluation reports as of writing.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.