By Sagar Shankaran, Founder of CallSphere
California's AB 2013 forced training-data disclosure for frontier model providers. What is now public, what is not, and what other states are following.
Key takeaways
California's AB 2013, signed in 2024 and in effect since January 1, 2026, requires developers of generative AI systems made publicly available to Californians to publish on their websites a high-level summary of the data used to train those systems. The bill was crafted narrowly compared to broader proposals (like the vetoed SB 1047) but has had outsized effect because California is where most US frontier-model developers operate.
This piece walks through what AB 2013 actually requires, what providers have published, and what remains unclear.
flowchart TB
AB[AB 2013] --> D1[Sources of data]
AB --> D2[Whether copyrighted material]
AB --> D3[Whether personal information]
AB --> D4[Whether collected via web crawl]
AB --> D5[Whether purchased or licensed]
AB --> D6[Whether synthetic]
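For each model, providers must publish on their website a summary covering the items in the flowchart above: the sources of the data, and whether it includes copyrighted material, includes personal information, was collected via web crawl, was purchased or licensed, or is synthetic.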
The level of detail is "summary," not item-by-item disclosure. Importantly, the law does not require revealing trade secrets — a key concession during drafting.
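As a rough illustration, the six flowchart items could be tracked internally as a simple record that flags which items a draft summary has not yet addressed. This is a minimal sketch, not a statutory schema: the class name, field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical field names mirroring the flowchart items; not a statutory schema.
FLAG_FIELDS = (
    "includes_copyrighted_material",
    "includes_personal_information",
    "collected_via_web_crawl",
    "purchased_or_licensed",
    "includes_synthetic_data",
)


@dataclass
class TrainingDataSummary:
    """One high-level summary per model; None means 'not yet answered'."""

    model_name: str
    data_sources: List[str] = field(default_factory=list)
    includes_copyrighted_material: Optional[bool] = None
    includes_personal_information: Optional[bool] = None
    collected_via_web_crawl: Optional[bool] = None
    purchased_or_licensed: Optional[bool] = None
    includes_synthetic_data: Optional[bool] = None

    def missing_items(self) -> List[str]:
        """Return the disclosure items that are still unanswered."""
        missing = [] if self.data_sources else ["data_sources"]
        missing += [name for name in FLAG_FIELDS if getattr(self, name) is None]
        return missing


# Placeholder values for illustration only.
draft = TrainingDataSummary(
    model_name="example-model",
    data_sources=["licensed text corpus"],
    includes_copyrighted_material=True,
    collected_via_web_crawl=False,
)
print(draft.missing_items())
```

Keeping each flag as a three-state value (True, False, or unanswered) separates "we do not use this kind of data" from "we have not answered yet", which matters when the published document is a summary rather than an item-by-item inventory.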
Frontier providers (OpenAI, Anthropic, Google, Meta, others) have updated their model documentation to include AB 2013 sections. The published summaries are typically:
These summaries are generally less detailed than the EU AI Act training-data summaries that the same providers also publish, because EU and CA expectations differ.
AB 2013 is a civil statute enforced by the California Attorney General (AG). As of April 2026 there have been no public enforcement actions, but the AG has solicited public comments on whether disclosures are adequate. Civil-society groups have filed complaints arguing that several of the published summaries are too sparse.
The interpretive direction is unsettled: if the AG decides "high-level" requires more granularity than providers currently give, future updates will need more detail.
flowchart TD
CA[California AB 2013<br/>signed 2024, enforced 2026] --> Spread
Spread --> CO[Colorado AI Act 2024]
Spread --> NY[New York AI deepfake law 2025]
Spread --> TX[Texas data-disclosure proposals]
Spread --> WA[Washington follow-on bill]
Several states have passed or are considering similar transparency provisions. The patchwork is real; most providers have chosen to publish a single global disclosure that satisfies the strictest applicable jurisdiction.
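One way to reason about a single global disclosure is to treat each jurisdiction as a set of required items and publish the union. The sketch below assumes that model; the California set reuses the flowchart items from this article, while `hypothetical_state_x` and its `data_collection_period` item are invented placeholders, not a description of any real law.

```python
from typing import Dict, FrozenSet, List, Set

# California items follow the flowchart in this article.
# "hypothetical_state_x" is an invented placeholder jurisdiction.
REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "california_ab_2013": frozenset({
        "data_sources",
        "copyrighted_material",
        "personal_information",
        "web_crawl",
        "purchased_or_licensed",
        "synthetic_data",
    }),
    "hypothetical_state_x": frozenset({
        "data_sources",
        "synthetic_data",
        "data_collection_period",
    }),
}


def global_disclosure_items(requirements: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Union of every jurisdiction's items: the 'strictest applicable' superset."""
    return set().union(*requirements.values())


def coverage_gaps(
    published: Set[str], requirements: Dict[str, FrozenSet[str]]
) -> Dict[str, List[str]]:
    """Map each jurisdiction to the required items the published summary lacks."""
    gaps = {}
    for jurisdiction, required in requirements.items():
        missing = sorted(required - published)
        if missing:
            gaps[jurisdiction] = missing
    return gaps


# A summary written only against the California items leaves one gap.
california_only = set(REQUIREMENTS["california_ab_2013"])
print(coverage_gaps(california_only, REQUIREMENTS))
print(coverage_gaps(global_disclosure_items(REQUIREMENTS), REQUIREMENTS))
```

A set union only works when jurisdictions differ in which items they ask for. Where they differ in the level of detail expected for the same item, the comparison needs a human reading of each requirement.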
There is no federal training-data disclosure law as of 2026. Several proposals exist (including provisions in pending AI bills), but none has passed. Practical pressure from the EU AI Act and California AB 2013 has produced de facto nationwide transparency without explicit federal action.
Open-source frontier model providers (Llama 4, Mistral, DeepSeek for the parts shipped from CA) are in scope. Their AB 2013 disclosures tend to be substantially more detailed than those of closed-model providers, partly because their training processes are more visible anyway.
The interesting outcome: open-source providers are setting a higher transparency bar that may pressure closed-model providers in future regulatory cycles.
If you train a model in California in 2026:
For teams building voice agents on these models, the main engineering question is when to use the OpenAI Realtime API versus an async pipeline. Realtime wins on latency for live calls. Async wins on cost, retries, and structured tool reliability for callbacks and SMS flows. Most teams need both, and the routing layer between them becomes the most load-bearing piece of the stack.
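The routing decision described above can be sketched as a small pure function. This is an illustrative assumption about how such a router might look, not CallSphere's implementation: the `Interaction` type, the channel names, and the `route` function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Pipeline(Enum):
    REALTIME = "realtime"  # low-latency streaming session for live calls
    ASYNC = "async"        # queued work: cheaper, retryable, structured tools


@dataclass(frozen=True)
class Interaction:
    """Hypothetical description of an inbound or outbound customer contact."""

    channel: str   # e.g. "voice", "sms", "callback"
    is_live: bool  # True when a person is waiting on the line right now


def route(interaction: Interaction) -> Pipeline:
    """Send live voice to the realtime path and everything else to async."""
    if interaction.channel == "voice" and interaction.is_live:
        return Pipeline.REALTIME
    return Pipeline.ASYNC


print(route(Interaction(channel="voice", is_live=True)))
print(route(Interaction(channel="sms", is_live=False)))
print(route(Interaction(channel="callback", is_live=False)))
```

Defaulting to the async path means any new or unrecognized channel gets the cheaper, retryable pipeline unless it is explicitly marked as live voice.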
The protocol layer determines what's possible: WebRTC for browser-side widgets, SIP trunks (Twilio, Telnyx) for PSTN voice, WebSockets for the Realtime API streaming session. Each has its own jitter buffer, its own ICE/STUN dance, and its own failure modes when a customer's corporate firewall is hostile.
Front-end is Next.js 15 + React 19 for the marketing surface and the in-app dashboards, with server components used heavily for the SEO-critical pages. Backend splits across FastAPI for the AI worker, NestJS + Prisma for the customer-facing API, and a thin Go gateway that does auth, rate limiting, and routing — letting each service scale on its own characteristics.
Datastores: Postgres as the source of truth (per-vertical schemas like healthcare_voice, realestate_voice), ChromaDB for RAG over support docs, Redis for ephemeral session state. Postgres RLS enforces tenant isolation at the row level so a misconfigured query can't leak across customers.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.