
California AB 2013 and Frontier Model Transparency: What Changed

California's AB 2013 forced training-data disclosure for frontier model providers. What is now public, what is not, and what other states are following.

What AB 2013 Requires

California's AB 2013, signed in 2024 and in force beginning January 1, 2026, requires developers of generative AI systems made publicly available to Californians, whether trained from scratch or substantially modified from an existing model, to publish a high-level summary of the training data used to develop them. The bill was drafted narrowly compared with broader proposals (like the vetoed SB 1047) but has had outsized effect because most US frontier-model developers operate in California.

This piece walks through what AB 2013 actually requires, what providers have published, and what remains unclear.

The Specific Disclosures

```mermaid
flowchart TB
    AB[AB 2013] --> D1[Sources of data]
    AB --> D2[Whether copyrighted material]
    AB --> D3[Whether personal information]
    AB --> D4[Whether collected via web crawl]
    AB --> D5[Whether purchased or licensed]
    AB --> D6[Whether synthetic]
```

For each model, providers must publish on their website:

  • High-level data sources
  • Whether the data set includes copyrighted, trademarked, or patented material
  • Whether the data set includes personal information
  • Whether collected via web crawling
  • Whether purchased or licensed from third parties
  • Whether synthetic
  • Date the data was collected
  • Time period the data covers
  • Whether modifications were made post-collection

The level of detail is "summary," not item-by-item disclosure. Importantly, the law does not require revealing trade secrets — a key concession during drafting.
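
The required fields above map naturally onto a structured record. A minimal sketch in Python; the field and class names are our own, since the statute requires the information but does not prescribe a schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class TrainingDataSummary:
    """One AB 2013-style disclosure record (illustrative field names)."""
    sources: list[str]            # high-level source categories only
    includes_copyrighted: bool    # copyrighted/trademarked/patented material
    includes_personal_info: bool
    collected_via_web_crawl: bool
    purchased_or_licensed: bool
    includes_synthetic: bool
    collection_date: str          # when the data was collected
    coverage_period: str          # time period the data covers
    modified_post_collection: bool

    def to_publishable_dict(self) -> dict:
        """Flatten to a dict suitable for rendering on a website."""
        return asdict(self)

summary = TrainingDataSummary(
    sources=["web crawl", "licensed datasets", "synthetic data"],
    includes_copyrighted=True,
    includes_personal_info=True,
    collected_via_web_crawl=True,
    purchased_or_licensed=True,
    includes_synthetic=True,
    collection_date="2024-01 through 2025-06",
    coverage_period="1990 to 2025",
    modified_post_collection=True,
)
```

Keeping the record at this "summary" granularity mirrors the statute's trade-secret concession: sources stay categorical, and nothing item-level needs to leave the building.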


What Got Published

Frontier providers (OpenAI, Anthropic, Google, Meta, others) have updated their model documentation to include AB 2013 sections. The published summaries are typically:

  • A few paragraphs describing categories of sources (web crawl, licensed datasets, code repositories, synthetic data)
  • Acknowledgment that copyrighted material is in the dataset
  • Statement that personal data may be in publicly-accessible web data
  • Date ranges
  • High-level filtering / curation notes

These summaries are generally less detailed than the EU AI Act training-data summaries that the same providers also publish, because EU and CA expectations differ.

What's Still Held Back

  • Specific dataset names (most are categorical only: "licensed news data" not "licensed news data from XYZ")
  • Proportion or weight of each source
  • Internal evaluation datasets used for training-time decisions
  • Synthetic data generation pipelines (treated as trade secrets)

The Enforcement Picture

AB 2013 is civil; the California AG enforces it. As of April 2026 there have been no public enforcement actions, but the AG has solicited public comment on whether disclosures are adequate. Several civil-society groups have filed complaints arguing that some of the published summaries are too sparse.

The interpretive direction is unsettled: if the AG decides "high-level" requires more granularity than providers currently give, future updates will need more detail.


What Other States Are Following

```mermaid
flowchart TD
    CA[California AB 2013<br/>signed 2024, enforced 2026] --> Spread
    Spread --> CO[Colorado AI Act 2024]
    Spread --> NY[New York AI deepfake law 2025]
    Spread --> TX[Texas data-disclosure proposals]
    Spread --> WA[Washington follow-on bill]
```

Several states have passed or are considering similar transparency provisions. The patchwork is real; most providers have chosen to publish a single global disclosure that satisfies the strictest applicable jurisdiction.
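
The "single global disclosure" strategy amounts to taking the union of fields across jurisdictions. A sketch, with illustrative field lists (these are simplifications, not the statutory text):

```python
# Illustrative requirement sets per jurisdiction (assumed, not statutory).
REQUIREMENTS: dict[str, set[str]] = {
    "california_ab_2013": {
        "sources", "copyright_flag", "personal_info_flag", "crawl_flag",
        "license_flag", "synthetic_flag", "collection_date", "coverage_period",
    },
    "eu_ai_act": {"sources", "copyright_flag", "curation_notes", "coverage_period"},
    "colorado_ai_act": {"sources", "personal_info_flag"},
}

def global_disclosure_fields(applicable: list[str]) -> set[str]:
    """Union of required fields: one disclosure covering every jurisdiction."""
    fields: set[str] = set()
    for jurisdiction in applicable:
        fields |= REQUIREMENTS[jurisdiction]
    return fields
```

A disclosure built this way automatically satisfies the strictest applicable mix, which is why one global document is cheaper to maintain than per-state variants.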

Federal Picture

There is no federal training-data disclosure law in 2026. Several proposals exist (including provisions in pending AI bills), but none has passed. The practical pressure from the EU AI Act and California's AB 2013 has produced de facto federal transparency without explicit federal action.

What This Means for Open-Source Providers

Open-source frontier model providers (Llama 4, Mistral, and, for the portions developed in California, DeepSeek) are in scope. Their AB 2013 disclosures tend to be substantially more detailed than those of closed-model providers, partly because their training processes are more visible anyway.

The interesting outcome: open-source providers are setting a higher transparency bar that may pressure closed-model providers in future regulatory cycles.

Practical Steps

If you develop or substantially modify a generative AI model offered to Californians in 2026:

  1. Identify whether you cross the "developer" threshold (public availability + training or substantial modification)
  2. Build the data-source summary using the AB 2013 categories
  3. Publish on your website with each release
  4. Maintain an internal data-source register for legal review
  5. Coordinate with EU AI Act compliance — overlapping but not identical
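
Step 4 above, the internal data-source register, can start as something very simple. A minimal sketch: one row per dataset, with the AB 2013 category flags alongside so the public summary (step 2) can be derived from it, exported as CSV for legal review (column names are our own):

```python
import csv
import io

# One row per dataset in the register (illustrative schema).
FIELDS = ["name", "category", "licensed", "contains_pii", "collected"]

def export_register(rows: list[dict]) -> str:
    """Serialize the data-source register to CSV for legal review."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

register = [
    {"name": "crawl-2025q1", "category": "web crawl",
     "licensed": False, "contains_pii": True, "collected": "2025-03"},
    {"name": "news-license-a", "category": "licensed news data",
     "licensed": True, "contains_pii": False, "collected": "2024-11"},
]
```

The register stays internal and item-level; only the categorical rollup (e.g. "licensed news data", not the counterparty's name) goes into the published summary.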
