---
title: "Agentic AI Development Environment: VS Code, Docker, and GPU Setup Guide"
description: "Step-by-step guide to setting up your agentic AI dev environment — VS Code extensions, Docker Compose for LLM services, GPU passthrough, and debugging config."
canonical: https://callsphere.ai/blog/agentic-ai-dev-environment-setup-vscode-docker-gpu
category: "Learn Agentic AI"
tags: ["Development Environment", "VS Code", "Docker", "GPU", "Setup Guide"]
author: "CallSphere Team"
published: 2026-03-14T00:00:00.000Z
updated: 2026-05-06T01:02:41.504Z
---

# Agentic AI Development Environment: VS Code, Docker, and GPU Setup Guide

> Step-by-step guide to setting up your agentic AI dev environment — VS Code extensions, Docker Compose for LLM services, GPU passthrough, and debugging config.

## Why Your Dev Environment Matters for Agentic AI

Agentic AI development has unique requirements that a standard web development setup does not cover. You need to manage API keys for multiple LLM providers, run local model servers for testing, handle streaming responses, debug non-deterministic agent behavior, and sometimes leverage GPU hardware for local inference or embedding generation.

A well-configured development environment reduces the friction between writing code and testing agent behavior. This guide walks you through setting up a complete agentic AI development environment using VS Code, Docker, and optional GPU support.

## VS Code Configuration

### Essential Extensions

Install these extensions for an optimal agentic AI development experience:

**Python Development**:

- **Python** (ms-python.python) — Core Python support
- **Pylance** (ms-python.vscode-pylance) — Fast, feature-rich Python language server
- **Ruff** (charliermarsh.ruff) — Extremely fast Python linter and formatter (replaces Black, isort, and Flake8)

**AI and Agent Development**:

- **Continue** (continue.continue) — AI code assistant that works with Claude and other models
- **REST Client** (humao.rest-client) — Test API endpoints directly from VS Code
- **Thunder Client** (rangav.vscode-thunder-client) — GUI-based API testing

**Infrastructure**:

- **Docker** (ms-azuretools.vscode-docker) — Docker file support and container management
- **YAML** (redhat.vscode-yaml) — YAML validation for Docker Compose and Kubernetes configs
- **Remote - SSH** (ms-vscode-remote.remote-ssh) — Develop on remote GPU machines seamlessly

### VS Code Settings

Configure VS Code for Python agentic AI development:

```json
{
  "python.defaultInterpreterPath": "./venv/bin/python",
  "python.analysis.typeCheckingMode": "basic",
  "editor.formatOnSave": true,
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.codeActionsOnSave": {
      "source.fixAll": "explicit",
      "source.organizeImports": "explicit"
    }
  },
  "files.associations": {
    "*.env": "dotenv",
    "*.env.*": "dotenv"
  },
  "editor.rulers": [88],
  "files.exclude": {
    "**/__pycache__": true,
    "**/.pytest_cache": true,
    "**/node_modules": true
  }
}
```

### Launch Configuration for Debugging

Debugging agentic AI code requires special configuration because agent loops are often async and involve external API calls. Create a `.vscode/launch.json`:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug Agent Server",
      "type": "debugpy",
      "request": "launch",
      "module": "uvicorn",
      "args": [
        "app.main:app",
        "--reload",
        "--port", "8000"
      ],
      "env": {
        "ANTHROPIC_API_KEY": "${env:ANTHROPIC_API_KEY}",
        "OPENAI_API_KEY": "${env:OPENAI_API_KEY}",
        "LOG_LEVEL": "DEBUG"
      },
      "console": "integratedTerminal",
      "justMyCode": false
    },
    {
      "name": "Debug Agent Script",
      "type": "debugpy",
      "request": "launch",
      "program": "${file}",
      "env": {
        "ANTHROPIC_API_KEY": "${env:ANTHROPIC_API_KEY}",
        "OPENAI_API_KEY": "${env:OPENAI_API_KEY}"
      },
      "console": "integratedTerminal"
    },
    {
      "name": "Debug Tests",
      "type": "debugpy",
      "request": "launch",
      "module": "pytest",
      "args": [
        "${file}",
        "-v",
        "--tb=short"
      ],
      "console": "integratedTerminal"
    }
  ]
}
```

The `justMyCode: false` setting is important — it lets you step into framework code (Anthropic SDK, OpenAI SDK) when debugging agent behavior.

## Environment Variable Management

### The .env File Structure

Agentic AI projects typically need many environment variables. Organize them clearly:

```bash
# .env

# ── LLM Providers ──
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-proj-your-key-here
GOOGLE_API_KEY=your-google-key-here

# ── Database ──
DATABASE_URL=postgresql://user:pass@localhost:5432/agents
REDIS_URL=redis://localhost:6379/0

# ── Vector Database ──
QDRANT_URL=http://localhost:6333
PINECONE_API_KEY=your-pinecone-key

# ── Observability ──
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=my-agent-project

# ── Application ──
APP_ENV=development
LOG_LEVEL=DEBUG
AGENT_MAX_ITERATIONS=10
AGENT_TIMEOUT_SECONDS=30
```

### Security Best Practices

Never commit API keys to version control. Create a `.env.example` file with placeholder values and add `.env` to `.gitignore`:

```bash
# .gitignore
.env
.env.local
.env.*.local
```

For team development, use a shared secrets manager. Options include:

- **1Password CLI** — `op run -- python main.py` injects secrets at runtime
- **Doppler** — Syncs secrets across environments and team members
- **AWS Secrets Manager** — Good for teams already on AWS
- **HashiCorp Vault** — Self-hosted, enterprise-grade

Load environment variables in your Python code with `python-dotenv`:

```python
from dotenv import load_dotenv
import os

load_dotenv()  # loads .env file

anthropic_key = os.getenv("ANTHROPIC_API_KEY")
if not anthropic_key:
    raise ValueError("ANTHROPIC_API_KEY not set")
```
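Beyond loading raw strings, it helps to centralize parsing and validation in one typed settings object — the role `core/config.py` plays in the project structure later in this guide. Here is a stdlib-only sketch (the `Settings` and `load_settings` names are illustrative; libraries like pydantic-settings add richer validation on top of the same idea):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed snapshot of the variables from the .env example above."""

    anthropic_api_key: str
    app_env: str = "development"
    log_level: str = "DEBUG"
    agent_max_iterations: int = 10
    agent_timeout_seconds: int = 30


def load_settings() -> Settings:
    """Read and validate environment variables once, at startup."""
    key = os.getenv("ANTHROPIC_API_KEY")
    if not key:
        raise ValueError("ANTHROPIC_API_KEY not set")
    return Settings(
        anthropic_api_key=key,
        app_env=os.getenv("APP_ENV", "development"),
        log_level=os.getenv("LOG_LEVEL", "DEBUG"),
        agent_max_iterations=int(os.getenv("AGENT_MAX_ITERATIONS", "10")),
        agent_timeout_seconds=int(os.getenv("AGENT_TIMEOUT_SECONDS", "30")),
    )
```

Failing fast at startup on a missing key beats a cryptic 401 deep inside an agent loop, and the frozen dataclass prevents accidental mutation of config at runtime.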

## Docker Compose for Local Services

A Docker Compose file lets you spin up all the services your agent needs with one command. Here is a complete local setup:

```yaml
# docker-compose.yml

services:
  # ── PostgreSQL with pgvector ──
  postgres:
    image: pgvector/pgvector:pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: agents
      POSTGRES_PASSWORD: localdev
      POSTGRES_DB: agents_dev
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U agents"]
      interval: 5s
      timeout: 3s
      retries: 5

  # ── Redis for caching and sessions ──
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  # ── Qdrant vector database ──
  qdrant:
    image: qdrant/qdrant:v1.12.0
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      QDRANT__SERVICE__GRPC_PORT: 6334

  # ── Local LLM server (Ollama) ──
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Uncomment for GPU support:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  # ── Observability: Jaeger for tracing ──
  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686"  # UI
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP

volumes:
  pgdata:
  redisdata:
  qdrant_data:
  ollama_data:
```

Start all services:

```bash
docker compose up -d
```

Verify everything is running:

```bash
docker compose ps
docker compose logs postgres --tail 20
```
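Beyond `docker compose ps`, a quick stdlib-only smoke test can confirm each published port actually accepts connections (the port numbers mirror the compose file above; the helper name is illustrative):

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Published ports from docker-compose.yml
SERVICES = {
    "postgres": 5432,
    "redis": 6379,
    "qdrant": 6333,
    "ollama": 11434,
    "jaeger-ui": 16686,
}

for name, port in SERVICES.items():
    status = "up" if port_open("localhost", port) else "DOWN"
    print(f"{name:12} :{port}  {status}")
```

This only verifies the port is reachable, not that the service is healthy — but it catches the most common failure (a container that never started) in a second.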

## GPU Setup for Local Inference

If you run local models (via Ollama, vLLM, or text-generation-inference), GPU acceleration dramatically improves inference speed.

### NVIDIA GPU Setup on Ubuntu

```bash
# Install NVIDIA drivers
sudo apt update
sudo apt install -y nvidia-driver-550

# Verify driver installation
nvidia-smi

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update
sudo apt install -y nvidia-container-toolkit

# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```

### Running Local Models with Ollama

Once Docker GPU support is configured, run Ollama with GPU acceleration:

```bash
# Pull a model (Llama 3.3 ships only as 70B; use Llama 3.1 for an 8B model)
docker compose exec ollama ollama pull llama3.1:8b

# Test inference
docker compose exec ollama ollama run llama3.1:8b \
  "Explain agentic AI in one paragraph"
```

Use the local model in your agent code by pointing to the Ollama API:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API
local_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Required but not validated
)

response = local_client.chat.completions.create(
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
```

This is useful for development and testing where you do not want to burn API credits for every debug iteration.
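One way to make this switch automatic is a small client-config factory keyed off the `APP_ENV` variable from the `.env` section — local Ollama in development, the real API otherwise. A sketch (the `llm_client_config` name and the `APP_ENV` convention are this guide's own, not any SDK's):

```python
import os


def llm_client_config() -> dict:
    """Return kwargs for openai.OpenAI(...): local Ollama in development,
    the hosted OpenAI API in any other environment."""
    if os.getenv("APP_ENV", "development") == "development":
        return {
            "base_url": "http://localhost:11434/v1",
            "api_key": "ollama",  # required by the client, ignored by Ollama
        }
    return {"api_key": os.environ["OPENAI_API_KEY"]}


# Usage:
# client = OpenAI(**llm_client_config())
```

Centralizing this in one factory (the `core/llm.py` role in the project structure below) means no scattered `if dev:` checks in agent code.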

## Project Structure

A well-organized project structure makes navigation intuitive and testing straightforward:

```
my-agent-project/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI entry point
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── base.py           # Agent base class
│   │   ├── triage.py         # Triage agent
│   │   └── specialists/
│   │       ├── support.py
│   │       └── billing.py
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── database.py       # Database query tools
│   │   ├── email.py          # Email sending tools
│   │   └── search.py         # Knowledge base search
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py        # Pydantic models
│   └── core/
│       ├── config.py          # Settings management
│       ├── database.py        # DB connection
│       └── llm.py             # LLM client factory
├── tests/
│   ├── conftest.py
│   ├── test_tools/
│   ├── test_agents/
│   └── fixtures/
│       └── conversations.json
├── db/
│   ├── init.sql
│   └── migrations/
├── docker-compose.yml
├── Dockerfile
├── pyproject.toml
├── .env.example
├── .gitignore
└── README.md
```

## Debugging Tips for Agent Development

### Log Every LLM Interaction

The single most useful debugging technique for agentic AI is logging every LLM request and response. Use a middleware or wrapper:

```python
import structlog
import json

logger = structlog.get_logger()

def log_llm_call(messages, response, duration_ms):
    logger.info(
        "llm_call",
        model=response.model,
        input_tokens=response.usage.input_tokens,
        output_tokens=response.usage.output_tokens,
        stop_reason=response.stop_reason,
        duration_ms=duration_ms,
        tool_calls=[
            b.name for b in response.content
            if hasattr(b, "name")
        ],
    )
```
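To feed `duration_ms` into that logger, a tiny timing context manager keeps the measurement out of your agent logic (the `timed` helper is illustrative, not part of any SDK):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed():
    """Yield a dict that receives wall-clock duration in ms on exit."""
    box = {}
    start = time.perf_counter()
    try:
        yield box
    finally:
        box["duration_ms"] = (time.perf_counter() - start) * 1000


# Usage:
# with timed() as t:
#     response = client.messages.create(...)
# log_llm_call(messages, response, t["duration_ms"])
```

Using `try/finally` means the duration is recorded even when the LLM call raises, so failed calls still show up with timing in your logs.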

### Use Breakpoints in the Agent Loop

Set breakpoints at key points in the agent loop: after the LLM response, before tool execution, and after tool results are formatted. This lets you inspect the agent's reasoning at each step.
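Those three spots are easiest to see in a skeleton of the loop itself. This is a minimal sketch — `call_model` and `execute_tool` are stand-ins for your LLM client and tool dispatcher, and the dict shapes are illustrative, not any SDK's:

```python
def run_agent(call_model, execute_tool, messages, max_iterations=10):
    """Minimal agent loop with the three breakpoint spots marked."""
    for _ in range(max_iterations):
        response = call_model(messages)
        # Breakpoint 1: inspect the raw LLM response here
        if not response.get("tool_calls"):
            return response["text"]
        for call in response["tool_calls"]:
            # Breakpoint 2: inspect tool name and args before execution
            result = execute_tool(call["name"], call["args"])
            messages.append(
                {"role": "tool", "name": call["name"], "content": result}
            )
            # Breakpoint 3: inspect the formatted tool result
    raise RuntimeError("agent exceeded max_iterations")
```

At breakpoint 2 you can even edit `call["args"]` in the debug console to test how the agent handles a different tool input, without re-running the whole conversation.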

### Replay Conversations

Save conversation histories as JSON fixtures. When you encounter a bug, save the conversation state and replay it deterministically in tests. This is far more effective than trying to reproduce non-deterministic agent behavior manually.
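A pair of small helpers makes this a habit rather than a chore. The sketch below follows the `tests/fixtures/` convention from the project structure above; the function names and default path are this guide's own:

```python
import json
from pathlib import Path

FIXTURES = Path("tests/fixtures")


def save_conversation(name: str, messages: list, base: Path = FIXTURES) -> Path:
    """Snapshot a conversation history as a JSON fixture."""
    base.mkdir(parents=True, exist_ok=True)
    path = base / f"{name}.json"
    path.write_text(json.dumps(messages, indent=2))
    return path


def load_conversation(name: str, base: Path = FIXTURES) -> list:
    """Load a saved conversation for deterministic replay in tests."""
    return json.loads((base / f"{name}.json").read_text())
```

When a bug surfaces, call `save_conversation("bug-123", messages)` at the failure point, then write a test that loads the fixture and asserts on the agent's next action.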

## Frequently Asked Questions

### Do I need a GPU for agentic AI development?

No. Most agentic AI development uses cloud-hosted models (Claude, GPT-4o) via API calls, which require no local GPU. A GPU is only needed if you want to run local models (Llama, Mistral) for development and testing without API costs, or if you generate embeddings locally for RAG. A modern laptop with 16GB RAM is sufficient for most agentic AI development work. Consider using a cloud GPU instance (Lambda, RunPod, or a cloud provider) for occasional local model testing rather than investing in a dedicated GPU machine.

### What Python version should I use?

Use Python 3.11 or 3.12. Both the Anthropic and OpenAI SDKs require Python 3.9+, but 3.11 and 3.12 offer significant performance improvements and better error messages. Avoid Python 3.13 if you rely on libraries that have not yet updated their C extensions. Use `pyenv` to manage multiple Python versions and create virtual environments per project.

### Should I use virtual environments or Docker for Python dependencies?

Use both. Virtual environments (`venv` or `uv`) for local development give you fast iteration with IDE integration. Docker for running services (databases, vector stores, local models) that your agent depends on. Your agent code runs locally in the virtual environment and connects to Dockerized services. For deployment, package everything in Docker. This approach gives you the best developer experience while maintaining production parity.

### How do I manage multiple LLM API keys across projects?

Use a `.env` file per project with `python-dotenv` for loading. For shared keys across projects, use `direnv` with a `~/.envrc` file that exports common variables, or use a secrets manager like 1Password CLI. Never set API keys as global environment variables in your shell profile — this makes them available to every process on your machine, which is a security risk.

### How do I debug streaming agent responses?

Streaming complicates debugging because you cannot inspect the full response at a breakpoint. Two strategies: (1) Add a debug mode flag that disables streaming and uses the synchronous API instead, making the full response available for inspection. (2) Accumulate streamed chunks into a buffer and log the complete response after streaming finishes. Use the VS Code debug console to inspect the accumulated buffer at breakpoints after the stream completes.

---

Source: https://callsphere.ai/blog/agentic-ai-dev-environment-setup-vscode-docker-gpu
