---
title: "CAISI Adds Google, Microsoft, and xAI: What Pre-Release Testing Covers"
description: "CAISI announced new agreements with Google DeepMind, Microsoft, and xAI in May 2026. What gets tested, what changes for enterprise AI buyers, what to watch."
canonical: https://callsphere.ai/blog/tw26w19-caisi-google-microsoft-xai-ai-model-testing-may-2026
category: "Policy & AI"
tags: ["CAISI", "AI Policy", "Enterprise AI", "Compliance", "CallSphere"]
author: "CallSphere Team"
published: 2026-05-07T00:00:00.000Z
updated: 2026-05-11T04:30:37.909Z
---

# CAISI Adds Google, Microsoft, and xAI: What Pre-Release Testing Covers

> CAISI announced new agreements with Google DeepMind, Microsoft, and xAI in May 2026. What gets tested, what changes for enterprise AI buyers, what to watch.

## What CAISI Just Did

In May 2026, the **Center for AI Standards & Innovation (CAISI)** announced new agreements with **Google DeepMind, Microsoft, and xAI** to evaluate frontier models pre-release. This builds on the prior 2024 agreements with **OpenAI and Anthropic**, which were renegotiated this cycle.

The headline: five of the largest model developers in the US are now under voluntary pre-release evaluation arrangements with a federal standards body. For enterprise buyers, this matters in three ways. We will walk through each.

## What CAISI Tests, In Practice

CAISI's evaluation scope (as publicly described) covers:

- **Capability evaluations** — what can the model do, including dual-use capabilities
- **Safety evaluations** — does the model refuse genuinely harmful requests reliably
- **Security evaluations** — robustness to jailbreaks, prompt injection, and exfiltration
- **Cybersecurity-specific** — particularly relevant given Anthropic's Mythos cybersecurity model and similar capability-heavy models
- **Bio and chem** — uplift risk for CBRN scenarios
- **Critical infrastructure** — model-driven attack scenarios

The evaluations happen **before public release**. The arrangement is voluntary, not regulatory; it functions as a pre-release checkpoint that participating labs have agreed to.

## Why This Matters for Enterprise Buyers

Three concrete impacts:

### 1. Pre-release safety signals are now standardized across labs

If your AI vendor uses GPT, Claude, Gemini, or Grok (xAI), the underlying model has been through a CAISI evaluation cycle. That is not the same as a binding regulatory certification, but it is a **floor** of independent capability and safety testing.

Procurement teams asking "has this model been independently evaluated?" can now point to CAISI as the answer.

### 2. Enterprise compliance teams have a new reference point

Until 2025, the question "did the model developer evaluate safety?" was answered by the developer's own model card. CAISI evaluations provide an **external** reference. This is useful for procurement, for legal review, and for sectors with regulatory exposure (healthcare, finance, government contracting).

### 3. Cybersecurity evaluation is becoming table stakes

The CAISI evaluations explicitly cover cybersecurity capabilities. Combined with Anthropic's May 2026 Mythos cybersecurity model (with restricted access), the signal to enterprise buyers is clear: cybersecurity-relevant capability is now subject to pre-release scrutiny by both the lab and the standards body.

## What Changes for CallSphere Customers

CallSphere does not develop frontier models; we run them. CAISI's effect on us is therefore indirect, but real.

- **Model selection rigor.** When we pick a model for a vertical (healthcare, real estate, sales, salon, IT helpdesk, after-hours), the underlying model's CAISI status is part of the evaluation. Models from labs with active CAISI agreements clear that bar by default.
- **Procurement story.** Customers in regulated industries (healthcare especially) can point to CAISI-evaluated underlying models as part of their vendor due diligence. We surface this in our security and compliance documentation.
- **HIPAA-adjacent confidence.** HIPAA does not require CAISI evaluation. But pre-release safety evaluation strengthens the case that the underlying model is fit for use in patient-facing contexts.

## What CAISI Is Not

It is worth being precise about scope. CAISI is **not**:

- A regulatory body — agreements are voluntary
- A pre-approval gate — models can ship without CAISI sign-off
- A consumer protection framework — the focus is capability and safety, not consumer harms
- A single-issue body — bio/chem/cyber/critical infrastructure are all in scope

It is a **standards-and-evaluation** body operating through voluntary lab agreements. The value is in the **independence** of the evaluation, not regulatory force.

## What to Watch Next

Three things will indicate whether CAISI's role grows in 2026–2027:

1. **Coverage** — do Meta, Mistral, and other major labs join?
2. **Disclosure** — does CAISI publish summary evaluation results, or do they stay internal?
3. **International coordination** — does CAISI align with UK AISI, EU evaluation bodies, and similar?

Each of these would meaningfully raise the bar for enterprise AI procurement.

## The Pentagon Context

In the same window, the **Pentagon announced AI deals with 8 Big Tech companies** — notably excluding Anthropic. The juxtaposition is striking: CAISI is testing the same models the Pentagon is now deploying. The federal posture in 2026 is parallel-track: the standards body evaluates, the procurement body buys, and the two lab rosters overlap but are not identical.

For enterprise buyers, the takeaway is that the federal signal on AI now comes through multiple channels (evaluation, procurement, and policy via the AI Action Plan), and the lab roster is wider than any single track.

## What CallSphere Recommends

For enterprise buyers evaluating voice and chat agent platforms in 2026:

- Ask the vendor which underlying model(s) they use
- Confirm whether those models are under a CAISI agreement
- Confirm the vendor's own compliance posture (HIPAA, SOC 2)
- For healthcare, ask about HIPAA-friendly deployment specifically
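The checklist above can be sketched as a small script. This is an illustrative sketch only: the lab list and the model-to-lab mapping below are assumptions for demonstration, not an official registry, and should be verified against current announcements.

```python
# Illustrative vendor due-diligence checklist for enterprise AI procurement.
# ASSUMPTIONS: CAISI_LABS and MODEL_LAB are hypothetical lookup tables for
# the sketch, not an authoritative source on CAISI agreement status.

# Labs reported to have CAISI agreements as of May 2026 (assumed list).
CAISI_LABS = {"OpenAI", "Anthropic", "Google DeepMind", "Microsoft", "xAI"}

# Hypothetical mapping from model family to developing lab.
MODEL_LAB = {
    "GPT": "OpenAI",
    "Claude": "Anthropic",
    "Gemini": "Google DeepMind",
    "Grok": "xAI",
}

def vendor_due_diligence(models, hipaa_friendly, soc2):
    """Return (check description, passed) pairs for a vendor questionnaire."""
    checks = []
    for model in models:
        lab = MODEL_LAB.get(model)
        # A model clears the CAISI check if its lab has an active agreement.
        checks.append((f"{model} under a CAISI agreement (lab: {lab})",
                       lab in CAISI_LABS))
    checks.append(("HIPAA-friendly deployment available", hipaa_friendly))
    checks.append(("SOC 2 attestation", soc2))
    return checks

if __name__ == "__main__":
    for desc, passed in vendor_due_diligence(["Claude", "Gemini"],
                                             hipaa_friendly=True, soc2=True):
        print("PASS" if passed else "FLAG", "-", desc)
```

In practice the answers come from vendor questionnaires and compliance documentation rather than a lookup table; the point is that each checklist item reduces to a yes/no a procurement team can record.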

CallSphere is HIPAA-friendly, runs on models from labs with active CAISI agreements, and exposes the compliance story openly. [See pricing at callsphere.ai/pricing](https://callsphere.ai/pricing).

## FAQ

**Q: Is CAISI a US-only body?**
A: Yes, CAISI is the US center. The UK has its own AI Safety Institute; the EU has its own evaluation tracks. International coordination is in early stages.

**Q: Does CAISI evaluate every release of a model, or just major versions?**
A: The agreements are described as pre-release evaluations. Cadence has not been publicly specified in detail; major capability-jump releases are the most likely candidates.

**Q: Are CAISI evaluations published?**
A: Summary findings have been published for some prior evaluations. The May 2026 agreements have not yet produced public evaluation reports as of writing.

## Sources

- CAISI agreements announcement — May 2026
- America's AI Action Plan (Commerce Secretary Howard Lutnick)
- Pentagon AI procurement announcements — May 2026
- CallSphere product and compliance surface — callsphere.ai

