By Sagar Shankaran, Founder of CallSphere
Provision an AI voice agent stack with Terraform 1.10+: ephemeral Vault credentials, AWS LB + EKS, OpenSearch for vector store, and OIDC trust without long-lived keys.
Key takeaways
TL;DR — Terraform 1.10 ephemeral resources finally let you fetch a Vault-issued AWS token during an apply without it ever touching the state file. That removes the most common reason AI infra leaks credentials.
This walkthrough builds a Terraform configuration that stands up an EKS cluster, an OpenSearch Serverless collection (the vector store), an Application Load Balancer with WebSocket support, and IRSA roles for the voice-agent pod, all using ephemeral Vault credentials so that no static AWS keys exist in TF state or CI.
```mermaid
flowchart TD
  TF[terraform apply] --> VAULT[(Vault)]
  VAULT -->|ephemeral STS| AWS[AWS APIs]
  AWS --> EKS[EKS cluster]
  AWS --> OSS[OpenSearch Serverless]
  AWS --> ALB[ALB WebSocket]
  EKS --> IRSA[Pod IRSA]
  IRSA --> OSS
```
```hcl
terraform {
  required_version = ">= 1.10"

  required_providers {
    vault = { source = "hashicorp/vault", version = "> 4.4" }
    aws   = { source = "hashicorp/aws", version = "> 5.70" }
  }
}

provider "vault" {
  address = "https://vault.example.com"
}

# Fetched fresh in each phase; never written to state.
ephemeral "vault_aws_access_credentials" "tf_role" {
  backend = "aws"
  role    = "terraform-deployer"
}

# The AWS provider is configured entirely from the ephemeral credentials.
provider "aws" {
  region     = "us-east-1"
  access_key = ephemeral.vault_aws_access_credentials.tf_role.access_key
  secret_key = ephemeral.vault_aws_access_credentials.tf_role.secret_key
  token      = ephemeral.vault_aws_access_credentials.tf_role.security_token
}
```
The ephemeral block is read fresh each phase (plan, apply, refresh). Nothing lands in terraform.tfstate. When the apply finishes, Vault revokes the STS lease.
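The ephemeral block above depends on a Vault role named terraform-deployer in the aws secrets backend. A minimal sketch of how that role might be defined follows, assuming it lives in a separate Vault bootstrap configuration. The IAM role ARN and both TTL values are illustrative assumptions, not values from this setup.

```hcl
# Assumption: managed in a separate Vault bootstrap configuration,
# not in the same root module that consumes the credentials.
resource "vault_aws_secret_backend_role" "terraform_deployer" {
  backend         = "aws"
  name            = "terraform-deployer"
  credential_type = "assumed_role"

  # Hypothetical IAM role that Vault assumes on Terraform's behalf.
  role_arns = ["arn:aws:iam::123456789012:role/terraform-deployer"]

  # TTLs in seconds; keep the maximum longer than your slowest apply.
  default_sts_ttl = 1800
  max_sts_ttl     = 3600
}
```

Short TTLs limit how long a leaked credential stays useful, but a TTL shorter than the apply itself makes later API calls fail, so size it against your real apply times.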
```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.30"

  cluster_name    = "voice-prod"
  cluster_version = "1.31"
  vpc_id          = aws_vpc.voice.id
  subnet_ids      = aws_subnet.private[*].id
  enable_irsa     = true

  eks_managed_node_groups = {
    voice = {
      # Graviton instances need an arm64 AMI type.
      ami_type       = "AL2023_ARM_64_STANDARD"
      instance_types = ["m7g.large"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3
    }
  }
}
```
ARM Graviton (m7g) cuts our voice-agent compute spend ~30% vs x86, and the OpenAI Realtime client and LiveKit both run cleanly on arm64 in 2026.
```hcl
resource "aws_opensearchserverless_collection" "vectors" {
  name = "voice-vectors"
  type = "VECTORSEARCH"

  # The encryption policy must exist before the collection is created.
  depends_on = [aws_opensearchserverless_security_policy.encr]
}

resource "aws_opensearchserverless_security_policy" "encr" {
  name = "voice-vectors-encr"
  type = "encryption"
  policy = jsonencode({
    Rules = [{
      ResourceType = "collection"
      Resource     = ["collection/voice-vectors"]
    }]
    AWSOwnedKey = true
  })
}
```
We use this for tool-result caching and for the per-tenant FAQ embeddings. VECTORSEARCH is the right type — SEARCH will silently bill more.
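The collection on its own is not reachable by the agent: OpenSearch Serverless also expects a data access policy naming the caller, and the caller's IAM role needs permission to call the data-plane API. The sketch below shows one way to wire that to the aws_iam_role.agent role defined in the IRSA section. The policy names and the exact permission list are assumptions; a network policy is also required and is not shown here.

```hcl
# Data access policy: lets the agent role read and write vector indexes.
resource "aws_opensearchserverless_access_policy" "agent" {
  name = "voice-vectors-agent" # hypothetical name
  type = "data"
  policy = jsonencode([{
    Rules = [{
      ResourceType = "index"
      Resource     = ["index/voice-vectors/*"]
      Permission   = ["aoss:ReadDocument", "aoss:WriteDocument", "aoss:DescribeIndex"]
    }]
    Principal = [aws_iam_role.agent.arn]
  }])
}

# IAM side: allow the role to call the collection's data-plane API.
resource "aws_iam_role_policy" "agent_aoss" {
  name = "voice-agent-aoss" # hypothetical name
  role = aws_iam_role.agent.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "aoss:APIAccessAll"
      Resource = aws_opensearchserverless_collection.vectors.arn
    }]
  })
}
```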
```hcl
data "aws_iam_policy_document" "agent_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [module.eks.oidc_provider_arn]
    }

    # Only the voice-agent service account in the voice namespace may assume the role.
    condition {
      test     = "StringEquals"
      variable = "${replace(module.eks.cluster_oidc_issuer_url, "https://", "")}:sub"
      values   = ["system:serviceaccount:voice:voice-agent"]
    }
  }
}

resource "aws_iam_role" "agent" {
  name               = "voice-agent"
  assume_role_policy = data.aws_iam_policy_document.agent_assume.json
}
```
Now the voice-agent pod assumes the voice-agent IAM role via the service-account annotation eks.amazonaws.com/role-arn, with no static IAM keys, ever.
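The annotation itself can also be managed from Terraform. This is a minimal sketch that assumes a hashicorp/kubernetes provider is already configured against the cluster and that the voice namespace exists; neither appears in the configuration above.

```hcl
# Assumption: the hashicorp/kubernetes provider is configured elsewhere
# and the "voice" namespace already exists.
resource "kubernetes_service_account_v1" "voice_agent" {
  metadata {
    name      = "voice-agent"
    namespace = "voice"

    # EKS reads this annotation to decide which IAM role the pod may assume.
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.agent.arn
    }
  }
}
```

The name and namespace must match the system:serviceaccount:voice:voice-agent subject in the trust policy, otherwise AssumeRoleWithWebIdentity is denied.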
```hcl
resource "aws_lb" "voice" {
  name               = "voice-alb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id
  idle_timeout       = 3600
  enable_http2       = true
}
```
idle_timeout = 3600 (1 hour) is critical. Default 60s drops long voice sessions; 3600s plus a TCP keepalive at 30s on the agent keeps WebSockets alive.
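The load balancer still needs a listener and a target group before it can carry WebSocket traffic; an ALB upgrades WebSocket connections on an ordinary HTTP or HTTPS listener. The following is a sketch only: the certificate_arn variable, the container port 8080, and the /healthz path are assumptions for illustration.

```hcl
variable "certificate_arn" {
  type        = string
  description = "ACM certificate for the voice endpoint (hypothetical input)"
}

resource "aws_lb_target_group" "voice" {
  name        = "voice-agent"
  port        = 8080 # assumed container port
  protocol    = "HTTP"
  target_type = "ip" # register pod IPs directly
  vpc_id      = aws_vpc.voice.id

  health_check {
    path = "/healthz" # assumed health endpoint
  }
}

resource "aws_lb_listener" "wss" {
  load_balancer_arn = aws_lb.voice.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.voice.arn
  }
}
```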
In CI, run terraform plan with a generated lockfile.
The state goes to S3 with versioning and SSE-KMS; PRs only get plan permissions, main gets apply.
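A backend block along these lines would match that description. The bucket, KMS alias, and lock table names are hypothetical, and the versioning resource is assumed to live in a separate bootstrap configuration, since a backend cannot create its own bucket.

```hcl
terraform {
  backend "s3" {
    bucket         = "voice-tfstate" # hypothetical bucket
    key            = "voice-prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "alias/voice-tfstate" # hypothetical KMS alias (SSE-KMS)
    dynamodb_table = "voice-tf-locks"      # hypothetical lock table
  }
}

# Assumption: defined in a separate bootstrap configuration.
resource "aws_s3_bucket_versioning" "state" {
  bucket = "voice-tfstate"

  versioning_configuration {
    status = "Enabled"
  }
}
```

Versioning lets you roll back to an earlier state file after a bad apply, which is worth the small storage cost.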
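```yaml
on:
  schedule:
    - cron: '0 3 * * *'
jobs:
  drift:
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ github.token }} # gh CLI needs a token to open the issue
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      # Any non-zero exit (1 = error, 2 = drift) opens an issue and fails the job.
      - run: terraform plan -detailed-exitcode || (gh issue create --title "Drift detected" --body "see logs"; exit 1)
```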
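-detailed-exitcode makes terraform plan exit with 2 when there's drift (0 means no changes, 1 means an error), so the job opens an issue and fails. Route that to a Slack alert and you hear about it before someone's "fixed it in the console" change causes a 3am page.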
Outputs cannot reference ephemeral values. The provider error message is misleading; just don't output ephemeral values.

Add a time_sleep of ~30s before granting policies.

Keep server_side_timeout on the agent ≥ the ALB idle_timeout, otherwise the upstream closes mid-call.

replace(...,"https://","") is required for the sub condition; many tutorials get this wrong.

If max_ttl < apply duration, mid-apply refreshes fail. Set 30 min minimum.

CallSphere's primary infra is a self-hosted k3s cluster (not EKS) with Postgres at 72.62.162.83 behind Cloudflare Tunnel, but tenant-isolated HIPAA installs use exactly this Terraform pattern on customer AWS accounts. We never store long-lived AWS keys in CI; Vault issues 30-min STS credentials for every apply. CallSphere today: 37 agents, 90+ tools, 115+ DB tables; plans at $149/$499/$1499 with a 14-day trial and a 22% affiliate program.
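The time_sleep advice above could look roughly like this. It is a sketch assuming the hashicorp/time provider is added to required_providers; the managed policy ARN is hypothetical.

```hcl
# Assumption: hashicorp/time is declared in required_providers.
# Pause after the role is created so IAM has time to propagate it.
resource "time_sleep" "wait_for_iam" {
  depends_on      = [aws_iam_role.agent]
  create_duration = "30s"
}

resource "aws_iam_role_policy_attachment" "agent" {
  role       = aws_iam_role.agent.name
  policy_arn = "arn:aws:iam::123456789012:policy/voice-agent" # hypothetical policy

  # Only attach once the sleep has finished.
  depends_on = [time_sleep.wait_for_iam]
}
```

The delay applies only on create, so later plans that do not touch the role are not slowed down.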
Q: Terraform vs OpenTofu in 2026? OpenTofu is now wire-compatible with TF 1.10's ephemeral feature; pick OpenTofu if license matters, TF if you need HCP integrations.
Q: How do I share state across teams? S3 + DynamoDB lock with SSE-KMS, or HCP Terraform Workspaces with VCS-driven runs.
Q: Why not store AWS keys in CI secrets and skip Vault? You can, but you're now committed to rotating those keys. OIDC + Vault is set-and-forget.
Q: Where do model API keys go? External Secrets Operator pulls them from Vault into the pod at runtime. Never in TF state.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.