---
title: "HIPAA and AI Training Data Exclusion: The Zero-Retention BAA in 2026"
description: "Most HIPAA-eligible AI deployments depend on one operational pattern: zero-data-retention endpoints with explicit training-data exclusion. Here is the contract language, the technical pattern, and the gotchas."
canonical: https://callsphere.ai/blog/vw2f-zero-retention-baa-ai-training-exclusion
category: "AI Infrastructure"
tags: ["HIPAA", "Zero Retention", "BAA", "AI Training Data", "AI Compliance"]
author: "CallSphere Team"
published: 2026-04-23T00:00:00.000Z
updated: 2026-05-07T09:32:11.238Z
---

# HIPAA and AI Training Data Exclusion: The Zero-Retention BAA in 2026

> Most HIPAA-eligible AI deployments depend on one operational pattern: zero-data-retention endpoints with explicit training-data exclusion. Here is the contract language, the technical pattern, and the gotchas.

> "We don't train on your data" is the most common AI-vendor sales line in healthcare. The HIPAA-defensible version of that promise is in writing, in the BAA, with a specific endpoint and a specific retention number.

## What the law actually says

```mermaid
flowchart TD
  In[Patient interaction] --> MinNec{Minimum necessary?}
  MinNec -->|yes| Process[AI process]
  MinNec -->|no| Reject[Block + log]
  Process --> Encrypt[(AES-256 at rest)]
  Encrypt --> DB[(PostgreSQL)]
  Process --> Audit[(Audit trail)]
  DB --> Right[Right of access §164.524]
```

*CallSphere reference architecture*

HIPAA does not use the word "training," but the rules cover the underlying activities directly. A use or disclosure of PHI for purposes other than treatment, payment, or operations requires authorization under 45 CFR 164.508 unless an exception applies. Using PHI to train a model that benefits other customers is not treatment, payment, or operations of the originating covered entity; it is an "other purpose," and it requires authorization. The Privacy Rule's minimum-necessary standard at 45 CFR 164.502(b) further constrains the data made available even within permitted purposes.
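The two checks above, permitted purpose and minimum necessary, can be sketched as a gate in front of any inference call. This is an illustrative sketch with hypothetical field names, not CallSphere's implementation or a legal control by itself:

```python
from dataclasses import dataclass

# Hypothetical purpose codes; a real system maps these to the covered
# entity's own use-and-disclosure policy.
PERMITTED_PURPOSES = {"treatment", "payment", "operations"}

@dataclass
class PHIRequest:
    purpose: str              # asserted purpose of the use or disclosure
    fields: set               # PHI fields the caller wants to send
    authorized: bool = False  # patient authorization on file (45 CFR 164.508)

def minimum_necessary(fields: set, needed: set) -> set:
    """Strip any field not required for the stated purpose (45 CFR 164.502(b))."""
    return fields & needed

def gate(req: PHIRequest, needed: set):
    """Return the reduced field set if the use is permitted, else None (block + log)."""
    if req.purpose not in PERMITTED_PURPOSES and not req.authorized:
        return None  # "other purpose" with no authorization: block the call
    return minimum_necessary(req.fields, needed)
```

A call asserting `"treatment"` with `{"name", "dob", "ssn"}` against a need of `{"name", "dob"}` comes back trimmed to `{"name", "dob"}`; a `"research"` call with no authorization comes back `None`.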

The BAA at 45 CFR 164.504(e)(2) must "establish the permitted and required uses and disclosures" of PHI by the business associate. Silence on training is not permission, but ambiguous "improvement of services" language has been read by some vendors as permission. The defensive pattern is explicit: the BAA names training, fine-tuning, and evaluation as not permitted, with a separate de-identified-data clause that spells out which limited research uses, if any, the covered entity authorizes.

The proposed 2026 Security Rule update reinforces the technical posture: encryption at rest and in transit, MFA, and annual third-party verification of safeguards. None of that means anything if PHI persists in a training corpus.

## What this means for AI voice and chat agents

The zero-retention pattern is the operational shape of training-data exclusion. OpenAI's API offers zero-data-retention (ZDR) on eligible endpoints under a signed BAA: prompts and completions are processed in memory and not logged. Anthropic offers zero-data-retention on eligible API features, with retained data not used for training without express permission. AWS Bedrock does not log prompts or responses by default and does not use them for training; the AWS BAA covers Bedrock and listed sub-services. Google Vertex AI under the Google Cloud BAA disables data logging for HIPAA-eligible projects with the regulated-data flag enabled.

The configuration matters: the endpoint, the project flag, and the account-level setting all need to align. A regular API key on a regular OpenAI account does not get ZDR, even with a BAA. The BAA scopes to the BAA-eligible endpoints; sending PHI to a non-eligible endpoint is an impermissible disclosure, and potentially a reportable breach, even with the contract signed.
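The alignment check is mechanical enough to automate as a pre-flight gate. A minimal sketch, with an invented config shape rather than any vendor's actual API, looks like this:

```python
# Hypothetical configuration record; field names are illustrative, not any
# vendor's actual settings API. The point: all three layers (contract,
# retention flag, endpoint) must agree before PHI flows.
ELIGIBLE_ENDPOINTS = {"/v1/chat/completions"}  # per the signed BAA's scope

def zdr_problems(cfg: dict) -> list:
    """Return the misconfigurations blocking a HIPAA-eligible call (empty = go)."""
    problems = []
    if not cfg.get("baa_signed"):
        problems.append("no signed BAA on the account")
    if not cfg.get("zdr_enabled"):
        problems.append("zero-data-retention not enabled for this project")
    if cfg.get("endpoint") not in ELIGIBLE_ENDPOINTS:
        problems.append(f"endpoint {cfg.get('endpoint')} is outside the BAA scope")
    return problems
```

Run it at startup and on every deploy, and refuse to route PHI while the list is non-empty.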

## How CallSphere implements the pattern

CallSphere routes every healthcare prompt through BAA-eligible endpoints with zero-data-retention configured. Our standard BAA includes explicit non-permission for training, fine-tuning, and evaluation on customer PHI; explicit non-permission for cross-customer data pooling; and explicit return-or-destroy at termination. Sub-processor BAAs flow the same language down. The audit trail records the model provider, model name, BAA reference, ZDR status flag, and prompt token counts on every inference call across our 90+ tools and 115+ tables. Across 50+ deployed businesses, we have not had a single training-data exposure incident. Healthcare buyers can review the model-provider stance at [/industries/healthcare](/industries/healthcare), see pricing on [/pricing](/pricing), and start with a [14-day trial](/trial). Behavioral-health customers should also see [/lp/behavioral-health](/lp/behavioral-health).
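The audit record described above has a deliberately narrow shape: provenance metadata only, never prompt text. A sketch of what such a record might look like (field names are illustrative of the categories listed, not CallSphere's actual schema):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class InferenceAuditRecord:
    # Illustrative fields only; the invariant is that token *counts* are
    # logged, never prompt or completion text.
    provider: str       # e.g. "openai", "anthropic", "bedrock"
    model: str          # model identifier used for the call
    baa_reference: str  # contract ID the call is scoped under
    zdr_enabled: bool   # retention status at call time
    prompt_tokens: int  # count only; never the prompt itself
    timestamp: str      # UTC, ISO 8601

def record_call(provider: str, model: str, baa_ref: str,
                zdr: bool, tokens: int) -> dict:
    """Build one audit row for an inference call."""
    return asdict(InferenceAuditRecord(
        provider, model, baa_ref, zdr, tokens,
        datetime.now(timezone.utc).isoformat(),
    ))
```

Because the record carries the BAA reference and ZDR flag per call, a monthly audit is a query, not an investigation.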

## Compliance and build checklist

1. Add explicit "no training, fine-tuning, or evaluation on PHI" language to the BAA.
2. Add explicit "no cross-customer data pooling" language to the BAA.
3. Verify zero-data-retention status on every model endpoint that supports it.
4. Confirm the model endpoint is BAA-eligible — not just the account.
5. Sign downstream BAAs with every model and inference sub-processor.
6. For OpenAI, request ZDR through [baa@openai.com](mailto:baa@openai.com) or your enterprise contact and store the confirmation.
7. For AWS Bedrock, accept the AWS BAA in AWS Artifact and apply SCPs that deny non-eligible services.
8. For Google Vertex AI, enable the regulated-data flag and use VPC Service Controls.
9. For Anthropic, confirm ZDR eligibility per feature in the API documentation.
10. Audit inference logs monthly to confirm no PHI persists outside the BAA boundary.
11. Reconfirm BAA-eligibility lists quarterly; they shift.
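Item 7's SCP pattern can be expressed as a deny-by-default policy that blocks everything outside an allow-list of HIPAA-eligible service prefixes. The sketch below builds that policy document in Python; the allow-list is illustrative, so confirm it against the current AWS HIPAA Eligible Services Reference before attaching anything to an OU:

```python
import json

# Illustrative allow-list: add every HIPAA-eligible service prefix your
# workload actually uses before attaching this to a production OU.
HIPAA_ELIGIBLE_PREFIXES = ["bedrock:"]

# Deny every action NOT matching an eligible prefix. Applied to an OU,
# this confines accounts to the allow-listed services.
scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyNonEligibleServices",
        "Effect": "Deny",
        "NotAction": [p + "*" for p in HIPAA_ELIGIBLE_PREFIXES],
        "Resource": "*",
    }],
}
print(json.dumps(scp, indent=2))
```

Scope the attachment to a dedicated OU for PHI workloads; a blanket deny like this is far too restrictive for general-purpose accounts.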

## FAQ

**Is "zero data retention" a HIPAA term?**
No. It is a vendor configuration term. HIPAA's permitted-uses framework at 45 CFR 164.504(e)(2) is the underlying control; ZDR is the operational shape that satisfies it.

**Can I fine-tune on de-identified data?**
Yes, if the data is genuinely de-identified under 45 CFR 164.514(a) (Safe Harbor or Expert Determination). De-identified data is no longer PHI. AI conversation logs need careful de-identification; that topic is covered separately.
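As a toy illustration of why "careful" matters, here is a scan for just three of the eighteen Safe Harbor identifier classes. A real pipeline must cover all eighteen plus the "no actual knowledge" condition, so treat these regexes as a demonstration, not a de-identification tool:

```python
import re

# Three of the eighteen Safe Harbor identifier classes
# (45 CFR 164.514(b)(2)); patterns are deliberately simplistic.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def flag_identifiers(text: str) -> set:
    """Return the identifier classes detected in a transcript snippet."""
    return {name for name, pat in PATTERNS.items() if pat.search(text)}
```

Anything this toy misses (names, dates, geographic detail, voiceprints) is exactly why conversation logs are the hard case.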

**Is a vendor's "we don't train on customer data" page enough?**
No. It needs to be in the BAA. Public web pages change without notice; BAAs do not.

**What about "model improvement" language?**
"Model improvement" is ambiguous and has been used by vendors to argue training is permitted. Strike or define narrowly in the BAA.

**Does the proposed Security Rule mention training?**
The NPRM does not name training directly, but the asset inventory and risk analysis requirements force enumeration of every place PHI lands, including inference and any training pipeline.

## Sources

- 45 CFR 164.504(e), BAA permitted uses: [https://www.ecfr.gov/current/title-45/section-164.504](https://www.ecfr.gov/current/title-45/section-164.504)
- 45 CFR 164.508, Authorizations: [https://www.ecfr.gov/current/title-45/section-164.508](https://www.ecfr.gov/current/title-45/section-164.508)
- OpenAI, Enterprise privacy and HIPAA: [https://openai.com/enterprise-privacy](https://openai.com/enterprise-privacy)
- Anthropic, API data retention: [https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention](https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention)
- AWS, HIPAA Eligible Services Reference: [https://aws.amazon.com/compliance/hipaa-eligible-services-reference/](https://aws.amazon.com/compliance/hipaa-eligible-services-reference/)

---

Source: https://callsphere.ai/blog/vw2f-zero-retention-baa-ai-training-exclusion
