TL;DR — Terraform 1.10 ephemeral resources finally let you fetch a Vault-issued AWS token during an apply without it ever touching the state file. That removes the most common reason AI infra leaks credentials.

What you'll set up

A Terraform configuration that stands up: an EKS cluster, an OpenSearch Serverless collection (vector store), an Application Load Balancer with WebSocket support, and IRSA roles for the voice-agent pod — all using ephemeral Vault credentials so no static AWS keys exist in TF state or CI.

Architecture

flowchart TD
  TF[terraform apply] --> VAULT[(Vault)]
  VAULT -->|ephemeral STS| AWS[AWS APIs]
  AWS --> EKS[EKS cluster]
  AWS --> OSS[OpenSearch Serverless]
  AWS --> ALB[ALB WebSocket]
  EKS --> IRSA[Pod IRSA]
  IRSA --> OSS

Step 1 — Configure the Vault provider with ephemeral output

```hcl
terraform {
  required_version = ">= 1.10"

  required_providers {
    vault = {
      source  = "hashicorp/vault"
      version = "~> 4.4"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.70"
    }
  }
}

provider "vault" {
  address = "https://vault.example.com"
}

# Opened on each plan and apply; its attributes are never persisted.
# Check that the pinned Vault provider release ships this ephemeral resource.
ephemeral "vault_aws_access_credentials" "tf_role" {
  backend = "aws"
  role    = "terraform-deployer"
}

provider "aws" {
  region     = "us-east-1"
  access_key = ephemeral.vault_aws_access_credentials.tf_role.access_key
  secret_key = ephemeral.vault_aws_access_credentials.tf_role.secret_key
  token      = ephemeral.vault_aws_access_credentials.tf_role.security_token
}
```

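Terraform opens the ephemeral block fresh in every phase that needs it (plan and apply), so the credentials never land in terraform.tfstate or in a saved plan file. Once the run finishes, nothing holds the credentials any longer. STS credentials cannot be recalled early, so they simply expire at the TTL configured on the Vault role.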
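The `terraform-deployer` role read in Step 1 has to exist in Vault's AWS secrets engine before the first plan. As a minimal sketch, assuming the role is managed from a separate bootstrap configuration run by a Vault admin, it could look like the block below. The account ID, role ARN, and TTL values are placeholders, not values from this setup.

```hcl
# Hypothetical bootstrap configuration, applied separately from the voice stack.
resource "vault_aws_secret_backend_role" "terraform_deployer" {
  backend         = "aws"                # mount path referenced in Step 1
  name            = "terraform-deployer" # role name referenced in Step 1
  credential_type = "assumed_role"

  # Placeholder ARN: the IAM role Vault assumes on Terraform's behalf.
  role_arns = ["arn:aws:iam::123456789012:role/terraform-deployer"]

  # TTLs in seconds; keep them longer than your slowest apply.
  default_sts_ttl = 1800
  max_sts_ttl     = 3600
}
```

Keeping this role in its own configuration avoids a circular dependency: the voice stack needs the role to obtain credentials, so it cannot also be the thing that creates it.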

Step 2 — Stand up EKS with OIDC

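```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.30"

  cluster_name    = "voice-prod"
  cluster_version = "1.31"

  # aws_vpc.voice and aws_subnet.private are defined elsewhere in the configuration
  vpc_id     = aws_vpc.voice.id
  subnet_ids = aws_subnet.private[*].id

  enable_irsa = true # creates the IAM OIDC provider that IRSA relies on

  eks_managed_node_groups = {
    voice = {
      # Graviton instance types need an arm64 AMI type
      ami_type       = "AL2023_ARM_64_STANDARD"
      instance_types = ["m7g.large"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3
    }
  }
}
```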

ARM Graviton (m7g) cuts our voice-agent compute spend ~30% vs x86, and the OpenAI Realtime client and LiveKit both run cleanly on arm64 in 2026.

Step 3 — OpenSearch Serverless vector store

```hcl
resource "aws_opensearchserverless_collection" "vectors" {
  name = "voice-vectors"
  type = "VECTORSEARCH"

  # A matching encryption policy must exist before the collection is created
  depends_on = [aws_opensearchserverless_security_policy.encr]
}

resource "aws_opensearchserverless_security_policy" "encr" {
  name = "voice-vectors-encr"
  type = "encryption"
  policy = jsonencode({
    Rules = [{
      ResourceType = "collection"
      Resource     = ["collection/voice-vectors"]
    }]
    AWSOwnedKey = true
  })
}
```

We use this for tool-result caching and for the per-tenant FAQ embeddings. VECTORSEARCH is the right type — SEARCH will silently bill more.

Step 4 — IRSA for the voice-agent pod

```hcl
data "aws_iam_policy_document" "agent_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [module.eks.oidc_provider_arn]
    }

    # Only the voice-agent service account in the voice namespace may assume the role
    condition {
      test     = "StringEquals"
      variable = "${replace(module.eks.cluster_oidc_issuer_url, "https://", "")}:sub"
      values   = ["system:serviceaccount:voice:voice-agent"]
    }
  }
}

resource "aws_iam_role" "agent" {
  name               = "voice-agent"
  assume_role_policy = data.aws_iam_policy_document.agent_assume.json
}
```

Now the voice-agent pod assumes the voice-agent IAM role through the service-account annotation eks.amazonaws.com/role-arn, with no static IAM keys at any point.
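The annotation has to sit on a service account whose namespace and name match the `sub` condition in the trust policy. A minimal sketch, assuming you manage the service account from Terraform with a `kubernetes` provider already configured against the voice-prod cluster (that provider setup is not part of the original configuration):

```hcl
# Assumes a configured "kubernetes" provider; not shown in the steps above.
resource "kubernetes_service_account_v1" "voice_agent" {
  metadata {
    # Must match system:serviceaccount:voice:voice-agent in the trust policy
    name      = "voice-agent"
    namespace = "voice"

    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.agent.arn
    }
  }
}
```

If the service account is shipped by a Helm chart or plain manifest instead, the same annotation key and role ARN apply there.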

Step 5 — ALB with WebSocket idle timeout tuning

```hcl
resource "aws_lb" "voice" {
  name               = "voice-alb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id
  idle_timeout       = 3600 # seconds
  enable_http2       = true
}
```

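idle_timeout = 3600 (1 hour) is critical. The 60 s default drops long voice sessions during quiet stretches. Pair the 3600 s setting with a keepalive from the agent every 30 s. A WebSocket ping is the safer choice, because the ALB idle timer resets only when data is sent or received on the connection.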
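The load balancer on its own does not route anything; it still needs a listener and a target group. The sketch below is one way to wire that up, assuming TLS terminates at the ALB and the agent pods are registered by IP. The port `8080`, the `/healthz` path, and `var.certificate_arn` are hypothetical and should be replaced with your own values.

```hcl
# Hypothetical input: ARN of an ACM certificate for the voice endpoint.
variable "certificate_arn" {
  type = string
}

resource "aws_lb_target_group" "voice" {
  name        = "voice-agent"
  port        = 8080 # assumed container port
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.voice.id

  health_check {
    path = "/healthz" # assumed health endpoint on the agent
  }
}

# WebSocket upgrades pass through a normal HTTPS listener; no extra flag is needed.
resource "aws_lb_listener" "wss" {
  load_balancer_arn = aws_lb.voice.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.voice.arn
  }
}
```

On EKS, many teams let the AWS Load Balancer Controller create these resources from an Ingress instead, in which case the idle timeout is set through the controller's annotations rather than in Terraform.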

Step 6 — Run `terraform plan` with a generated lockfile in CI

```yaml
# .github/workflows/terraform.yml
steps:
  - run: terraform init -backend-config=bucket=tf-state -backend-config=key=voice/prod
  - run: terraform validate
  - run: terraform plan -out=tfplan
  - run: terraform apply -auto-approve tfplan
```

The state goes to S3 with versioning and SSE-KMS; PRs only get plan permissions, main gets apply.
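The `-backend-config` flags in the workflow imply a partially configured S3 backend in the Terraform code. A minimal sketch of what that block might contain, with a placeholder KMS key ARN and a hypothetical DynamoDB lock table name:

```hcl
terraform {
  backend "s3" {
    # bucket and key are supplied at init time via -backend-config
    region  = "us-east-1"
    encrypt = true

    # Placeholder ARN: the KMS key used for SSE-KMS on state objects
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE"

    # Hypothetical lock table; prevents two applies from running at once
    dynamodb_table = "tf-locks"
  }
}
```

Versioning is a property of the bucket itself, so it is enabled on the bucket rather than in this block.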

Step 7 — Drift-detection cron

```yaml
on:
  schedule:
    - cron: '0 3 * * *'
jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      # checkout and terraform init steps omitted for brevity
      - run: terraform plan -detailed-exitcode || (gh issue create --title "Drift detected" --body "see logs"; exit 1)
```

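With -detailed-exitcode, terraform plan exits 0 when nothing changed, 1 on error, and 2 when the plan contains changes. The job opens a GitHub issue on any non-zero exit, so drift surfaces before a "fixed it in the console" change causes a 3am page.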

Pitfalls

Ephemeral resources in output: root-module outputs cannot reference ephemeral values (a child-module output can, if it is marked ephemeral = true). The error message is misleading; just don't output ephemeral values.
OpenSearch Serverless eventual consistency: collection ARNs aren't ready immediately; add a time_sleep of ~30s before granting policies.
ALB idle_timeout vs upstream: set server_side_timeout on the agent ≥ ALB idle_timeout, otherwise the upstream closes mid-call.
EKS OIDC issuer URL formatting: the replace(...,"https://","") is required for the sub condition; many tutorials get this wrong.
Vault role TTL too short: if max_ttl < apply duration, API calls made late in the apply fail with expired credentials. Set 30 min minimum.
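The OpenSearch Serverless pitfall can be handled with the `time_sleep` resource from the hashicorp/time provider. A sketch under stated assumptions: the policy name and the two permissions are illustrative, and the data access policy shown here is not part of the original steps.

```hcl
# Requires the hashicorp/time provider in required_providers.
resource "time_sleep" "wait_for_collection" {
  depends_on      = [aws_opensearchserverless_collection.vectors]
  create_duration = "30s"
}

# Illustrative data access policy granting the agent role index access.
resource "aws_opensearchserverless_access_policy" "agent" {
  name = "voice-vectors-agent"
  type = "data"
  policy = jsonencode([{
    Rules = [{
      ResourceType = "index"
      Resource     = ["index/voice-vectors/*"]
      Permission   = ["aoss:ReadDocument", "aoss:WriteDocument"]
    }]
    Principal = [aws_iam_role.agent.arn]
  }])

  # Wait out the delay before the policy is created
  depends_on = [time_sleep.wait_for_collection]
}
```

The role also needs an IAM policy allowing `aoss:APIAccessAll` on the collection before data-plane calls succeed; that attachment is left out here for brevity.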

How CallSphere does this in production

CallSphere's primary infra is a self-hosted k3s cluster (not EKS) with Postgres at 72.62.162.83 behind Cloudflare Tunnel, but tenant-isolated HIPAA installs use exactly this Terraform pattern on customer AWS accounts. We never store long-lived AWS keys in CI; Vault issues 30-minute STS credentials for every apply. The platform spans 37 agents, 90+ tools, and 115+ DB tables; plans are $149/$499/$1499 with a 14-day trial and a 22% affiliate program.

FAQ

Q: Terraform vs OpenTofu in 2026? OpenTofu is now wire-compatible with TF 1.10's ephemeral feature; pick OpenTofu if license matters, TF if you need HCP integrations.

Q: How do I share state across teams? S3 + DynamoDB lock with SSE-KMS, or HCP Terraform Workspaces with VCS-driven runs.

Q: Why not store AWS keys in CI secrets and skip Vault? You can, but you're now committed to rotating those keys. OIDC + Vault is set-and-forget.

Q: Where do model API keys go? External Secrets Operator pulls them from Vault into the pod at runtime. Never in TF state.
