What AB 2013 Requires

California's AB 2013, signed in September 2024 with compliance required from January 1, 2026, requires developers of generative AI systems made publicly available to Californians to publish a high-level summary of the data used to train them. The bill was drafted narrowly compared with broader proposals (like the vetoed SB 1047) but has had an outsized effect because California is where most US frontier-model developers operate.

This piece walks through what AB 2013 actually requires, what providers have published, and what remains unclear.

The Specific Disclosures

flowchart TB
    AB[AB 2013] --> D1[Sources of data]
    AB --> D2[Whether copyrighted material]
    AB --> D3[Whether personal information]
    AB --> D4[Whether collected via web crawl]
    AB --> D5[Whether purchased or licensed]
    AB --> D6[Whether synthetic]

For each model, providers must publish on their website:

High-level data sources
Whether the data set includes copyrighted, trademarked, or patented material
Whether the data set includes personal information
Whether collected via web crawling
Whether purchased or licensed from third parties
Whether synthetic
Date the data was collected
Time period the data covers
Whether modifications were made post-collection

The level of detail is "summary," not item-by-item disclosure. Importantly, the law does not require revealing trade secrets — a key concession during drafting.
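
As a rough illustration, the checklist above can be modeled as one record per model, which makes it easy to see which items a draft summary has not answered yet. This is a minimal sketch: the class name, field names, and example values are hypothetical and are not taken from the statute or from any provider's tooling.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrainingDataSummary:
    """One summary per model. Field names are illustrative, not statutory."""
    model_name: str
    data_sources: Optional[str] = None               # high-level description
    includes_ip_protected: Optional[bool] = None     # copyright, trademark, patent
    includes_personal_info: Optional[bool] = None
    collected_via_web_crawl: Optional[bool] = None
    purchased_or_licensed: Optional[bool] = None
    includes_synthetic: Optional[bool] = None
    collection_dates: Optional[str] = None           # when the data was collected
    coverage_period: Optional[str] = None            # time period the data covers
    modified_after_collection: Optional[bool] = None

    def missing_items(self) -> list[str]:
        """Return the names of disclosure items that are still unanswered."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical draft with only a few items filled in.
draft = TrainingDataSummary(
    model_name="example-model-v1",
    data_sources="Public web data, licensed datasets, synthetic data",
    includes_ip_protected=True,
    includes_personal_info=True,
)
print(draft.missing_items())
```

Leaving unanswered items as None, rather than defaulting them to False, keeps "not yet reviewed" distinct from "reviewed and the answer is no".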

What Got Published

Frontier providers (OpenAI, Anthropic, Google, Meta, others) have updated their model documentation to include AB 2013 sections. The published summaries are typically:

A few paragraphs describing categories of sources (web crawl, licensed datasets, code repositories, synthetic data)
Acknowledgment that copyrighted material is in the dataset
Statement that personal data may be in publicly-accessible web data
Date ranges
High-level filtering / curation notes

These summaries are generally less detailed than the EU AI Act training-data summaries that the same providers also publish, because EU and CA expectations differ.

What's Still Held Back

Specific dataset names (most are categorical only: "licensed news data" not "licensed news data from XYZ")
Proportion or weight of each source
Internal evaluation datasets used for training-time decisions
Synthetic data generation pipelines (treated as trade secrets)

The Enforcement Picture

AB 2013 is a civil statute, enforced by the California AG. As of April 2026 there have been no public enforcement actions, but the AG has solicited public comments on whether disclosures are adequate. Civil-society groups have filed complaints arguing that several published summaries are too sparse.

The interpretive direction is unsettled: if the AG decides "high-level" requires more granularity than providers currently give, future updates will need more detail.

What Other States Are Following

flowchart TD
    CA[California AB 2013<br/>signed 2024, enforced 2026] --> Spread
    Spread --> CO[Colorado AI Act 2024]
    Spread --> NY[New York AI deepfake law 2025]
    Spread --> TX[Texas data-disclosure proposals]
    Spread --> WA[Washington follow-on bill]

Several states have passed or are considering similar transparency provisions. The patchwork is real; most providers have chosen to publish a single global disclosure that satisfies the strictest applicable jurisdiction.

Federal Picture

As of 2026, there is no federal training-data disclosure law. Several proposals exist (including provisions in pending AI bills), but none has passed. Practical pressure from the EU AI Act and California's AB 2013 has instead produced de facto nationwide transparency without explicit federal action.

What This Means for Open-Source Providers

Open-source frontier model providers (Meta with Llama 4, Mistral, and DeepSeek, to the extent their models are offered in California) are in scope. Their AB 2013 disclosures tend to be substantially more detailed than those of closed-model providers, partly because their training process is more visible anyway.

The interesting outcome: open-source providers are setting a higher transparency bar that may pressure closed-model providers in future regulatory cycles.

Practical Steps

If you make a generative AI model available to Californians in 2026:

Identify whether you meet the statute's definition of "developer" (you design, code, produce, or substantially modify a generative AI system for public use)
Build the data-source summary using the AB 2013 categories
Publish on your website with each release
Maintain an internal data-source register for legal review
Coordinate with EU AI Act compliance — overlapping but not identical
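
The register and the published summary from the steps above can be connected with a small aggregation step. The sketch below assumes a hypothetical in-house register format; the entry names, categories, and dates are invented for illustration. It publishes only categorical labels and roll-up flags, so specific dataset names stay internal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceEntry:
    internal_name: str       # stays in the internal register, never published
    category: str            # categorical label that appears in the summary
    licensed: bool
    synthetic: bool
    has_personal_info: bool
    collected_from: date
    collected_to: date


def public_summary(register: list[SourceEntry]) -> dict:
    """Roll per-source entries up into summary-level answers."""
    if not register:
        raise ValueError("register is empty")
    return {
        "source_categories": sorted({e.category for e in register}),
        "purchased_or_licensed": any(e.licensed for e in register),
        "includes_synthetic": any(e.synthetic for e in register),
        "includes_personal_info": any(e.has_personal_info for e in register),
        "collected_from": min(e.collected_from for e in register).isoformat(),
        "collected_to": max(e.collected_to for e in register).isoformat(),
    }


# Hypothetical register entries.
register = [
    SourceEntry("vendor-a-news-2023", "licensed news data", True, False, False,
                date(2023, 1, 1), date(2023, 12, 31)),
    SourceEntry("crawl-snapshot-07", "public web data", False, False, True,
                date(2022, 6, 1), date(2024, 3, 31)),
]
summary = public_summary(register)
print(summary)
```

Generating the public text from the register, rather than writing it by hand, gives legal review a single source to check for each release.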

Sources

California AB 2013 full text — https://leginfo.legislature.ca.gov
California AG enforcement guidance — https://oag.ca.gov
OpenAI training-data disclosures — https://openai.com
Anthropic training-data disclosures — https://www.anthropic.com
"State AI laws tracker" Future of Privacy Forum — https://fpf.org
