Terraform for AI Voice Infrastructure: Ephemeral Resources + Vault (2026)
Provision an AI voice agent stack with Terraform 1.10+: ephemeral Vault credentials, AWS LB + EKS, OpenSearch for vector store, and OIDC trust without long-lived keys.
TL;DR — Terraform 1.10 ephemeral resources finally let you fetch a Vault-issued AWS token during an apply without it ever touching the state file. That removes the most common reason AI infra leaks credentials.
What you'll set up
A Terraform configuration that stands up: an EKS cluster, an OpenSearch Serverless collection (vector store), an Application Load Balancer with WebSocket support, and IRSA roles for the voice-agent pod — all using ephemeral Vault credentials so no static AWS keys exist in TF state or CI.
Architecture
```mermaid
flowchart TD
  TF[terraform apply] --> VAULT[(Vault)]
  VAULT -->|ephemeral STS| AWS[AWS APIs]
  AWS --> EKS[EKS cluster]
  AWS --> OSS[OpenSearch Serverless]
  AWS --> ALB[ALB WebSocket]
  EKS --> IRSA[Pod IRSA]
  IRSA --> OSS
```
Step 1 — Configure the Vault provider with ephemeral output
```hcl
terraform {
  required_version = ">= 1.10"
  required_providers {
    vault = { source = "hashicorp/vault", version = "> 4.4" }
    aws   = { source = "hashicorp/aws", version = "> 5.70" }
  }
}

provider "vault" {
  address = "https://vault.example.com"
}

# Opened fresh on each plan and apply; its values are never written to state.
ephemeral "vault_aws_access_credentials" "tf_role" {
  backend = "aws"
  role    = "terraform-deployer"
}

provider "aws" {
  region     = "us-east-1"
  access_key = ephemeral.vault_aws_access_credentials.tf_role.access_key
  secret_key = ephemeral.vault_aws_access_credentials.tf_role.secret_key
  token      = ephemeral.vault_aws_access_credentials.tf_role.security_token
}
```
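Terraform opens the ephemeral block fresh in each phase that needs it (plan, apply, refresh), so nothing lands in terraform.tfstate or in a saved plan file. When the apply finishes, Terraform closes the ephemeral resource and the short-lived STS credentials expire at the end of their TTL.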
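The `terraform-deployer` role referenced in the ephemeral block has to exist on the Vault side first. A minimal sketch of that role follows, assuming it is managed in a separate bootstrap configuration, that the AWS secrets engine is mounted at `aws`, and that the IAM role ARN (a placeholder) is the one Vault should assume for deployments. The TTL values are illustrative.

```hcl
# Assumption: this lives in a separate bootstrap configuration run by a Vault admin,
# not in the voice stack itself.
resource "vault_aws_secret_backend_role" "terraform_deployer" {
  backend         = "aws"                # mount path of the AWS secrets engine
  name            = "terraform-deployer" # must match `role` in the ephemeral block
  credential_type = "assumed_role"       # Vault calls sts:AssumeRole and hands back STS credentials

  # Placeholder ARN: replace with the IAM role your deployments should assume.
  role_arns = ["arn:aws:iam::123456789012:role/terraform-deployer"]

  default_sts_ttl = 1800 # seconds (30 min)
  max_sts_ttl     = 3600 # seconds (1 hour)
}
```

Keeping the role definition out of the main stack avoids a chicken-and-egg problem: the stack needs the role to exist before its first plan.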
Step 2 — Stand up EKS with OIDC
```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.30"

  cluster_name    = "voice-prod"
  cluster_version = "1.31"
  vpc_id          = aws_vpc.voice.id
  subnet_ids      = aws_subnet.private[*].id
  enable_irsa     = true

  eks_managed_node_groups = {
    voice = {
      instance_types = ["m7g.large"]
      # Graviton instances need an arm64 AMI type.
      ami_type     = "AL2023_ARM_64_STANDARD"
      min_size     = 2
      max_size     = 10
      desired_size = 3
    }
  }
}
```
ARM Graviton (m7g) cuts our voice-agent compute spend ~30% vs x86, and the OpenAI Realtime client and LiveKit both run cleanly on arm64 in 2026.
Step 3 — OpenSearch Serverless vector store
```hcl
resource "aws_opensearchserverless_collection" "vectors" {
  name = "voice-vectors"
  type = "VECTORSEARCH"

  # Collection creation fails unless a matching encryption policy already exists.
  depends_on = [aws_opensearchserverless_security_policy.encr]
}

resource "aws_opensearchserverless_security_policy" "encr" {
  name = "voice-vectors-encr"
  type = "encryption"
  policy = jsonencode({
    Rules = [{
      ResourceType = "collection"
      Resource     = ["collection/voice-vectors"]
    }]
    AWSOwnedKey = true
  })
}
```
We use this for tool-result caching and for the per-tenant FAQ embeddings. VECTORSEARCH is the right type — SEARCH will silently bill more.
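A collection is not usable by the pod until a data access policy names the principals allowed to touch it. The sketch below assumes the `aws_iam_role.agent` role from Step 4 is the only caller. The policy names and the permission list are illustrative choices, not taken from the original setup.

```hcl
# Data access policy: which principal may do what inside the collection.
resource "aws_opensearchserverless_access_policy" "agent" {
  name = "voice-vectors-agent" # hypothetical name
  type = "data"
  policy = jsonencode([{
    Rules = [{
      ResourceType = "index"
      Resource     = ["index/voice-vectors/*"]
      Permission = [
        "aoss:CreateIndex",
        "aoss:DescribeIndex",
        "aoss:ReadDocument",
        "aoss:WriteDocument",
      ]
    }]
    Principal = [aws_iam_role.agent.arn]
  }])
}

# The role also needs IAM permission to call the collection's data-plane API.
resource "aws_iam_role_policy" "agent_aoss" {
  name = "voice-agent-aoss" # hypothetical name
  role = aws_iam_role.agent.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "aoss:APIAccessAll"
      Resource = aws_opensearchserverless_collection.vectors.arn
    }]
  })
}
```

A network policy (public or VPC endpoint access) is also required before the endpoint answers requests; it is omitted here because the right choice depends on how the cluster reaches the collection.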
Step 4 — IRSA for the voice-agent pod
```hcl
data "aws_iam_policy_document" "agent_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [module.eks.oidc_provider_arn]
    }

    # Only the voice-agent service account in the voice namespace may assume the role.
    condition {
      test     = "StringEquals"
      variable = "${replace(module.eks.cluster_oidc_issuer_url, "https://", "")}:sub"
      values   = ["system:serviceaccount:voice:voice-agent"]
    }
  }
}

resource "aws_iam_role" "agent" {
  name               = "voice-agent"
  assume_role_policy = data.aws_iam_policy_document.agent_assume.json
}
```
Now the voice-agent pod assumes voice-agent via service-account annotation eks.amazonaws.com/role-arn — no static IAM keys, ever.
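The annotation itself can be managed from the same configuration. This is a sketch that assumes a `kubernetes` provider is already configured against the new cluster and that the `voice` namespace exists. The resource label is arbitrary, but the name and namespace must match the `sub` condition in the trust policy.

```hcl
# Assumption: the kubernetes provider is configured for the voice-prod cluster.
resource "kubernetes_service_account_v1" "voice_agent" {
  metadata {
    name      = "voice-agent" # must match system:serviceaccount:voice:voice-agent
    namespace = "voice"
    annotations = {
      # EKS injects a web identity token for pods using this service account.
      "eks.amazonaws.com/role-arn" = aws_iam_role.agent.arn
    }
  }
}
```

Pods then only need `serviceAccountName: voice-agent` in their spec; the AWS SDK picks up the projected token without any extra configuration.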
Step 5 — ALB with WebSocket idle timeout tuning
```hcl
resource "aws_lb" "voice" {
  name               = "voice-alb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id
  idle_timeout       = 3600
  enable_http2       = true
}
```
idle_timeout = 3600 (1 hour) is critical. Default 60s drops long voice sessions; 3600s plus a TCP keepalive at 30s on the agent keeps WebSockets alive.
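The load balancer still needs a listener and a target group before it carries traffic. ALB listeners handle the WebSocket upgrade on ordinary HTTP/HTTPS listeners, so no WebSocket-specific setting is involved. The sketch below assumes the agent listens on port 8080, exposes `/healthz`, and that an ACM certificate ARN is passed in through a hypothetical `acm_certificate_arn` variable. Target registration is left to whatever controller manages the pods.

```hcl
# Hypothetical input: ARN of an ACM certificate for the voice endpoint.
variable "acm_certificate_arn" {
  type = string
}

resource "aws_lb_target_group" "voice" {
  name        = "voice-agent"
  port        = 8080 # assumed container port
  protocol    = "HTTP"
  target_type = "ip" # pod IPs, as registered by a load balancer controller
  vpc_id      = aws_vpc.voice.id

  health_check {
    path    = "/healthz" # assumed health endpoint
    matcher = "200"
  }
}

resource "aws_lb_listener" "wss" {
  load_balancer_arn = aws_lb.voice.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.voice.arn
  }
}
```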
Step 6 — Run terraform plan with a generated lockfile in CI
```yaml
# .github/workflows/terraform.yml
- run: terraform init -backend-config=bucket=tf-state -backend-config=key=voice/prod
- run: terraform validate
- run: terraform plan -out=tfplan
- run: terraform apply -auto-approve tfplan
```
The state goes to S3 with versioning and SSE-KMS; PRs only get plan permissions, main gets apply.
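The `-backend-config` flags in the workflow imply a partial backend block in the configuration. A sketch of what that block might look like, assuming the region matches the provider and using a placeholder KMS key ARN:

```hcl
terraform {
  backend "s3" {
    # bucket and key are supplied by -backend-config in CI (partial configuration).
    region  = "us-east-1"
    encrypt = true

    # Placeholder ARN: replace with the KMS key used for SSE-KMS on the state bucket.
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000"

    # S3-native state locking, available from Terraform 1.10.
    use_lockfile = true
  }
}
```

Bucket versioning is a property of the bucket itself and is configured where the bucket is created, not in the backend block.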
Step 7 — Drift-detection cron
```yaml
on:
  schedule:
    - cron: '0 3 * * *'
jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      # Exit code 2 means the plan found changes; any non-zero exit opens an issue.
      - run: terraform plan -detailed-exitcode || (gh issue create --title "Drift detected" --body "see logs"; exit 1)
```
With -detailed-exitcode, terraform plan exits with code 2 when the plan contains changes, so drift turns into a GitHub issue before a "fixed it in the console" change causes a 3am page.
Pitfalls
- Ephemeral resources in `output`: root module outputs cannot reference ephemeral values. The provider error message is misleading; just don't output ephemeral values.
- OpenSearch Serverless eventual consistency: collection ARNs aren't ready immediately; add a `time_sleep` of ~30s before granting policies.
- ALB idle_timeout vs upstream: set `server_side_timeout` on the agent ≥ ALB idle_timeout, otherwise upstream closes mid-call.
- EKS OIDC issuer URL formatting: the `replace(...,"https://","")` is required for the `sub` condition; many tutorials get this wrong.
- Vault role TTL too short: if `max_ttl` < apply duration, mid-apply refreshes fail. Set 30 min minimum.
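The `time_sleep` workaround from the OpenSearch pitfall can be sketched as follows, assuming the `hashicorp/time` provider is added to `required_providers`. The resource label is hypothetical.

```hcl
# Pause after the collection is created so dependent grants don't race it.
resource "time_sleep" "wait_for_collection" {
  depends_on      = [aws_opensearchserverless_collection.vectors]
  create_duration = "30s"
}

# Any resource that grants access to the collection then waits on the sleep, e.g.:
#   depends_on = [time_sleep.wait_for_collection]
```

A fixed sleep is a blunt tool; 30s follows the estimate in the list above and may need tuning per account and region.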
How CallSphere does this in production
CallSphere's primary infra is a self-hosted k3s cluster (not EKS) with Postgres at 72.62.162.83 behind Cloudflare Tunnel — but tenant-isolated HIPAA installs use exactly this Terraform pattern on customer AWS accounts. We never store long-lived AWS keys in CI; Vault issues 30-min STS for every apply. 37 agents, 90+ tools, 115+ DB tables, $149/$499/$1499, 14-day trial, 22% affiliate, see pricing.
FAQ
Q: Terraform vs OpenTofu in 2026? OpenTofu is now wire-compatible with TF 1.10's ephemeral feature; pick OpenTofu if license matters, TF if you need HCP integrations.
Q: How do I share state across teams? S3 + DynamoDB lock with SSE-KMS, or HCP Terraform Workspaces with VCS-driven runs.
Q: Why not store AWS keys in CI secrets and skip Vault? You can, but you're now committed to rotating those keys. OIDC + Vault is set-and-forget.
Q: Where do model API keys go? External Secrets Operator pulls them from Vault into the pod at runtime. Never in TF state.
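On the CI keys question above, the "OIDC + Vault" pairing means the CI runner proves its identity to Vault with a workload token instead of a stored secret. A sketch for GitHub Actions follows, assuming a hypothetical `example-org/voice-infra` repository and a Vault policy named `terraform-deployer` that allows reading credentials from the AWS secrets engine. The mount path and role name are also illustrative.

```hcl
# JWT auth mount that trusts GitHub's OIDC issuer.
resource "vault_jwt_auth_backend" "github" {
  path               = "github-actions" # hypothetical mount path
  oidc_discovery_url = "https://token.actions.githubusercontent.com"
  bound_issuer       = "https://token.actions.githubusercontent.com"
}

# Role that only workflows on main in the named repository can log in to.
resource "vault_jwt_auth_backend_role" "terraform_ci" {
  backend    = vault_jwt_auth_backend.github.path
  role_name  = "terraform-ci" # hypothetical role name
  role_type  = "jwt"
  user_claim = "repository"

  # GitHub's default audience is the URL of the repository owner.
  bound_audiences = ["https://github.com/example-org"]
  bound_claims = {
    repository = "example-org/voice-infra"
    ref        = "refs/heads/main"
  }

  token_policies = ["terraform-deployer"] # assumed Vault policy name
  token_ttl      = 1800                   # seconds
}
```

Binding on `ref` mirrors the rule that only main gets apply permissions; a second role without the AWS policy could serve plan-only PR runs.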