By Sagar Shankaran, Founder of CallSphere
Deploy a HIPAA-eligible voice agent on AWS App Runner: FastAPI WebSocket bridge, Bedrock Claude for reasoning, ECR auto-deploy from GitHub, VPC connector for private RDS.
Key takeaways
TL;DR — App Runner is AWS's fully-managed container service: point it at an ECR image, set CPU/memory, and it autoscales 1-25 instances behind a public HTTPS URL with WebSocket support. Pair with Bedrock Claude 4.7 Sonnet, an RDS Postgres in a VPC, and Twilio Media Streams for a HIPAA-eligible voice agent without managing EKS or ECS.
A FastAPI service, containerized and pushed to a private Amazon ECR repository, deployed via App Runner with a VPC connector to a private RDS for Postgres instance. The service bridges Twilio Media Streams to Bedrock Nova Sonic (Amazon's speech-to-speech model) for sub-second voice latency. CI: GitHub Actions builds the image, App Runner auto-deploys.
Bedrock model IDs: amazon.nova-sonic-v1:0 and/or anthropic.claude-sonnet-4-7-20250620-v1:0.
flowchart TD
GH[GitHub Actions] -->|push image| ECR[Amazon ECR]
ECR -->|auto-deploy| AR[AWS App Runner]
AR -->|VPC connector| RDS[(RDS Postgres private)]
T[Twilio] -->|wss| AR
AR <-->|InvokeModelWithBidirectionalStream| BS[Bedrock Nova Sonic]
AR -->|InvokeModel fallback| CL[Bedrock Claude 4.7]
```dockerfile
FROM public.ecr.aws/docker/library/python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080", "--workers", "1"]
```
App Runner only forwards traffic to one port; use a single uvicorn worker per container and let App Runner scale instances.
```bash
aws ecr create-repository --repository-name voice-agent
aws ecr get-login-password | docker login --username AWS --password-stdin $ACCOUNT.dkr.ecr.us-east-1.amazonaws.com
docker build -t voice-agent .
docker tag voice-agent:latest $ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/voice-agent:latest
docker push $ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/voice-agent:latest
```
```bash
aws apprunner create-service \
  --service-name voice-agent \
  --source-configuration '{
    "ImageRepository": {
      "ImageIdentifier": "'$ACCOUNT'.dkr.ecr.us-east-1.amazonaws.com/voice-agent:latest",
      "ImageRepositoryType": "ECR",
      "ImageConfiguration": {"Port": "8080"}
    },
    "AutoDeploymentsEnabled": true,
    "AuthenticationConfiguration": {
      "AccessRoleArn": "arn:aws:iam::'$ACCOUNT':role/AppRunnerECRAccessRole"
    }
  }' \
  --instance-configuration '{
    "Cpu": "1 vCPU",
    "Memory": "2 GB",
    "InstanceRoleArn": "arn:aws:iam::'$ACCOUNT':role/AppRunnerVoiceAgentRole"
  }' \
  --network-configuration '{
    "EgressConfiguration": {"EgressType":"VPC","VpcConnectorArn":"arn:aws:apprunner:us-east-1:'$ACCOUNT':vpcconnector/voice-vpc/1/...."}
  }'
```
AutoDeploymentsEnabled: true makes App Runner pull a new image whenever ECR has a fresh :latest tag.
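Hand-quoting JSON inside a shell string, as in the `create-service` command, is easy to get wrong. One option is to generate the payload with a short script and pass the result to the CLI with `file://`. The sketch below reuses the repository name, role name, and port from the command; `build_source_configuration` is a hypothetical helper, not part of any AWS SDK.

```python
import json


def build_source_configuration(
    account_id: str,
    region: str = "us-east-1",
    repo: str = "voice-agent",
    tag: str = "latest",
    port: int = 8080,
) -> dict:
    """Build the --source-configuration payload for `aws apprunner create-service`.

    Hypothetical helper: the role name below mirrors the one used in the
    article's command and must already exist in the account.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return {
        "ImageRepository": {
            "ImageIdentifier": f"{registry}/{repo}:{tag}",
            "ImageRepositoryType": "ECR",
            # App Runner expects the port as a string.
            "ImageConfiguration": {"Port": str(port)},
        },
        "AutoDeploymentsEnabled": True,
        "AuthenticationConfiguration": {
            "AccessRoleArn": f"arn:aws:iam::{account_id}:role/AppRunnerECRAccessRole",
        },
    }


def to_cli_json(config: dict) -> str:
    """Serialize the payload so it can be written to a file and passed via file://."""
    return json.dumps(config, indent=2)
```

Writing `to_cli_json(build_source_configuration(account_id))` to `source.json` lets the command take `--source-configuration file://source.json`, which sidesteps the nested single-quote splicing of `$ACCOUNT`.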
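```python
import json

import boto3

br = boto3.client("bedrock-runtime", region_name="us-east-1")


async def nova_sonic_session(twilio_ws, stream_sid):
    # stream_sid is the Twilio streamSid taken from the Media Streams "start" event.
    # streaming_body_iter() is a helper not shown here; it is expected to yield
    # model input events built from the inbound Twilio audio frames.
    # NOTE: confirm that your SDK version exposes this operation. AWS's Nova Sonic
    # documentation directs Python users to the experimental
    # aws_sdk_bedrock_runtime package for bidirectional streaming.
    response = br.invoke_model_with_bidirectional_stream(
        modelId="amazon.nova-sonic-v1:0",
        body=streaming_body_iter(twilio_ws),
    )
    async for chunk in response["body"]:
        ev = json.loads(chunk["chunk"]["bytes"])
        if ev["type"] == "audioOutput":
            # Relay the model's audio back to the caller as a Twilio media frame.
            await twilio_ws.send_text(json.dumps({
                "event": "media",
                "streamSid": stream_sid,
                "media": {"payload": ev["audioOutput"]["content"]},
            }))
```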
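Each outbound `media` frame carries a `streamSid`, which Twilio supplies in the `start` message at the beginning of a Media Streams connection. A minimal sketch of the inbound side, assuming the standard Media Streams JSON messages (`connected`, `start`, `media`, `stop`); `TwilioFrame` and `parse_twilio_message` are illustrative names, not part of Twilio's SDK.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwilioFrame:
    event: str                        # "connected", "start", "media", "stop", ...
    stream_sid: Optional[str] = None  # present from the "start" event onward
    audio: bytes = b""                # decoded audio bytes for "media" events


def parse_twilio_message(raw: str) -> TwilioFrame:
    """Parse one text message received on the Twilio Media Streams WebSocket."""
    msg = json.loads(raw)
    event = msg.get("event", "")
    stream_sid = msg.get("streamSid")
    if event == "start":
        # The start payload repeats the streamSid alongside call metadata.
        stream_sid = msg.get("start", {}).get("streamSid", stream_sid)
        return TwilioFrame(event, stream_sid)
    if event == "media":
        # Audio arrives base64-encoded in media.payload.
        payload = msg.get("media", {}).get("payload", "")
        return TwilioFrame(event, stream_sid, base64.b64decode(payload))
    return TwilioFrame(event, stream_sid)
```

In a bridge loop, the handler would remember `stream_sid` from the first `start` frame and forward the `audio` bytes of each `media` frame to the model's input stream.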
Nova Sonic is Amazon's speech-to-speech model: a single streaming call replaces the separate STT, LLM, and TTS stages.
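Twilio Media Streams exchanges 8 kHz mu-law audio, base64-encoded. If the speech model is configured to return 16-bit linear PCM instead (an assumption about your output settings, not something stated above), each payload needs transcoding before it goes back to Twilio. The standard library's `audioop` module was removed in Python 3.13, so the sketch below is a pure-Python G.711 mu-law encoder. It assumes the PCM is already 8 kHz, mono, little-endian; resampling is out of scope, and the function names are illustrative.

```python
import base64
import struct

BIAS = 0x84   # G.711 mu-law bias added before segment lookup
CLIP = 32635  # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_twilio_payload(pcm: bytes) -> str:
    """Convert little-endian 16-bit PCM bytes to a base64 mu-law payload string."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return base64.b64encode(bytes(linear_to_ulaw(s) for s in samples)).decode("ascii")
```

The returned string can be used as the `media.payload` value of an outbound frame. For high call volumes a per-sample Python loop may be too slow, and a lookup table or a native library would be the usual next step.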
```bash
aws apprunner create-vpc-connector \
  --vpc-connector-name voice-vpc \
  --subnets subnet-aaa subnet-bbb \
  --security-groups sg-rds-client
```
App Runner egress goes through this connector, so RDS can be private (no public IP). Inbound (Twilio → App Runner) still hits the public HTTPS URL — no change needed.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithBidirectionalStream"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "rds-db:connect",
      "Resource": "arn:aws:rds-db:us-east-1:*:dbuser:db-*/voice_user"
    }
  ]
}
```
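A statement with an empty `Resource` or `Action` is not a valid IAM policy, and copy-pasted JSON sometimes loses characters such as `*`. A small pre-flight check can catch that before the policy is attached to the instance role. This is a sketch only: `find_empty_fields` is a hypothetical helper, and it assumes statements use `Action`/`Resource` rather than `NotAction`/`NotResource`.

```python
import json


def find_empty_fields(policy_json: str) -> list:
    """Return a description of every statement whose Action or Resource is empty."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # IAM allows a single statement object instead of a list.
        statements = [statements]
    problems = []
    for index, statement in enumerate(statements):
        for key in ("Action", "Resource"):
            value = statement.get(key)
            values = value if isinstance(value, list) else [value]
            # Flag missing keys, empty lists, and blank strings.
            if not values or any(v is None or not str(v).strip() for v in values):
                problems.append(f"Statement[{index}].{key} is empty")
    return problems
```

An empty return list only means nothing is blank; it does not validate ARNs or action names.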
```yaml
name: ci
on: { push: { branches: [main] } }
permissions:
  id-token: write   # needed so configure-aws-credentials can assume the role via OIDC
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::ACCT:role/GhaDeployer
          aws-region: us-east-1
      - run: aws ecr get-login-password | docker login --username AWS --password-stdin ACCT.dkr.ecr.us-east-1.amazonaws.com
      - run: docker build -t voice-agent . && docker tag voice-agent ACCT.dkr.ecr.us-east-1.amazonaws.com/voice-agent:latest && docker push ACCT.dkr.ecr.us-east-1.amazonaws.com/voice-agent:latest
```
App Runner picks up the new image automatically.
min instances = 2.
CallSphere's HIPAA Healthcare vertical runs on EKS in a private VPC with FastAPI on :8084. App Runner was tempting, but the cost crossed over at our scale (37 agents, 90+ tools, 115+ DB tables, 6 verticals). For teams under ~50k call-min/day, App Runner is cheaper than EKS and takes far less ops work. We use Pion Go + NATS for the SIP layer of the OneRoof multi-family vertical. Pricing: $149/$499/$1499, 14-day trial, 22% affiliate.
Q: App Runner vs ECS Fargate vs EKS?
App Runner: zero ops, $$$ per call-min beyond ~50k/day. Fargate: middle. EKS: best cost at scale, most ops. Start with App Runner.
Q: Does App Runner support Bedrock streaming?
Yes — InvokeModelWithBidirectionalStream works fine with App Runner's WS support.
Q: Multi-region?
App Runner is regional; for global, deploy in 2-3 regions and use Route 53 latency routing.
Q: Cost at 1k call-min/day?
A 1 vCPU / 2 GB instance at $0.064/hour x 24 h x ~3 instances comes to about $4.61/day of compute. Bedrock and Twilio charges dominate.
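The compute estimate can be reproduced, and re-run for other instance counts, with a few lines. The rate and instance count are the figures quoted in the answer, not a current price list; check the App Runner pricing page, including whether memory is billed on top of the vCPU rate. The function names are illustrative.

```python
def daily_compute_cost(rate_per_instance_hour: float, instances: float, hours: float = 24.0) -> float:
    """Daily compute cost for a fleet of always-on instances."""
    return rate_per_instance_hour * hours * instances


def compute_cost_per_call_minute(daily_cost: float, call_minutes_per_day: float) -> float:
    """Spread the daily compute cost across the day's call minutes."""
    return daily_cost / call_minutes_per_day


# Figures from the text: $0.064/hour, ~3 instances, 1,000 call-minutes per day.
cost = daily_compute_cost(0.064, 3)
per_minute = compute_cost_per_call_minute(cost, 1_000)
print(f"compute: ${cost:.2f}/day, ${per_minute:.4f} per call-minute")
```

At these inputs compute works out to roughly half a cent per call-minute, which is why the model and telephony charges end up dominating the bill.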
Q: Can I deploy from GitHub source (no Docker)?
Yes. App Runner supports source-code mode for Python/Node, but for voice agents Docker gives you better control over runtime versions.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.