---
title: "TypeScript AI Agent Testing: Vitest, Mock LLMs, and Snapshot Testing"
description: "Learn how to test AI agent applications in TypeScript. Covers Vitest setup, strategies for mocking LLM responses, snapshot testing for agent outputs, deterministic tool testing, and CI integration for reliable agent test suites."
canonical: https://callsphere.ai/blog/typescript-ai-agent-testing-vitest-mock-llms-snapshot-testing
category: "Learn Agentic AI"
tags: ["Testing", "Vitest", "TypeScript", "AI Agents", "Mocking", "CI/CD"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:44.082Z
---

# TypeScript AI Agent Testing: Vitest, Mock LLMs, and Snapshot Testing

> Learn how to test AI agent applications in TypeScript. Covers Vitest setup, strategies for mocking LLM responses, snapshot testing for agent outputs, deterministic tool testing, and CI integration for reliable agent test suites.

## The Testing Challenge with AI Agents

AI agents are inherently non-deterministic. The same prompt can produce different responses across runs, making traditional assertion-based testing unreliable. A robust agent testing strategy separates what you can test deterministically — tool execution, input validation, state management, routing logic — from what requires fuzzy evaluation — the quality and correctness of LLM-generated text.

This guide walks through practical patterns for testing TypeScript AI agents using Vitest.

## Setting Up Vitest

This guide's unit tests sit inside a larger quality pipeline: fast, fully mocked tests run on every pull request, followed by an eval harness that scores a golden set with fuzzy graders and gates the merge (the FAQ below covers when to test against the real API):

```mermaid
flowchart LR
    PR(["PR opened"])
    UNIT["Unit tests"]
    EVAL["Eval harness<br/>PromptFoo or Braintrust"]
    GOLD[("Golden set<br/>200 tagged cases")]
    JUDGE["LLM-as-judge<br/>plus regex graders"]
    SCORE["Aggregate score,<br/>overall and per slice"]
    GATE{"Score regressed<br/>more than 2%?"}
    BLOCK(["Block merge"])
    MERGE(["Merge to main"])
    PR --> UNIT --> EVAL --> GOLD --> JUDGE --> SCORE --> GATE
    GATE -->|Yes| BLOCK
    GATE -->|No| MERGE
    style EVAL fill:#4f46e5,stroke:#4338ca,color:#fff
    style GATE fill:#f59e0b,stroke:#d97706,color:#1f2937
    style BLOCK fill:#dc2626,stroke:#b91c1c,color:#fff
    style MERGE fill:#059669,stroke:#047857,color:#fff
```

Install Vitest and configure it for a TypeScript project:

```bash
npm install -D vitest @vitest/coverage-v8
```

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    globals: true,
    environment: "node",
    coverage: {
      provider: "v8",
      include: ["src/**/*.ts"],
      exclude: ["src/**/*.test.ts"],
    },
    testTimeout: 30_000, // Agent tests may be slow
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "src"),
    },
  },
});
```

## Mocking LLM Responses

The most important testing pattern is replacing the LLM client with a mock that returns predetermined responses:

```typescript
// src/lib/__mocks__/openai-client.ts
import { vi } from "vitest";

export function createMockOpenAI() {
  return {
    chat: {
      completions: {
        create: vi.fn(),
      },
    },
  };
}

export function mockChatResponse(content: string | null, toolCalls?: any[]) {
  return {
    choices: [
      {
        message: {
          role: "assistant",
          content,
          tool_calls: toolCalls ?? null,
        },
        finish_reason: toolCalls ? "tool_calls" : "stop",
      },
    ],
    usage: { prompt_tokens: 100, completion_tokens: 50, total_tokens: 150 },
  };
}

export function mockToolCallResponse(name: string, args: object) {
  // Tool-call turns have null content, which the signature above allows.
  return mockChatResponse(null, [
    {
      id: "call_mock_123",
      type: "function",
      function: {
        name,
        arguments: JSON.stringify(args),
      },
    },
  ]);
}
```

## Testing Tool Execution Deterministically

Tools have well-defined inputs and outputs; with their external dependencies mocked, you can test them deterministically and directly:

```typescript
// src/tools/weather.test.ts
import { describe, it, expect, vi } from "vitest";
import { weatherTool } from "./weather";

// Mock the external API
vi.mock("./weather-api", () => ({
  fetchWeather: vi.fn().mockResolvedValue({
    temperature: 22,
    condition: "sunny",
    humidity: 45,
  }),
}));

describe("weatherTool", () => {
  it("returns formatted weather data for valid city", async () => {
    const result = await weatherTool.execute({
      city: "San Francisco",
      units: "celsius",
    });

    expect(result).toEqual({
      temperature: 22,
      condition: "sunny",
      humidity: 45,
    });
  });

  it("validates input schema rejects empty city", () => {
    const parsed = weatherTool.inputSchema.safeParse({ city: "" });
    expect(parsed.success).toBe(false);
  });

  it("applies default units when not specified", () => {
    const parsed = weatherTool.inputSchema.safeParse({ city: "Tokyo" });
    expect(parsed.success).toBe(true);
    if (parsed.success) {
      expect(parsed.data.units).toBe("celsius");
    }
  });
});
```
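These tests imply a tool shaped roughly like the sketch below: an input schema that rejects empty cities and defaults `units` to `"celsius"`, plus an `execute` that forwards to the weather API. The shapes and the hand-rolled `safeParse` are illustrative; a real project would likely use Zod:

```typescript
// Hypothetical src/tools/weather.ts matching the tests above.
// Hand-rolled schema keeps the sketch dependency-free; Zod would replace it in practice.

type WeatherInput = { city: string; units: "celsius" | "fahrenheit" };

// Stand-in for the external module the tests mock (./weather-api).
async function fetchWeather(city: string, units: string) {
  return { temperature: 22, condition: "sunny", humidity: 45 };
}

export const weatherTool = {
  inputSchema: {
    safeParse(raw: { city?: unknown; units?: unknown }):
      | { success: true; data: WeatherInput }
      | { success: false; error: string } {
      if (typeof raw.city !== "string" || raw.city.length === 0) {
        return { success: false, error: "city must be a non-empty string" };
      }
      if (raw.units !== undefined && raw.units !== "celsius" && raw.units !== "fahrenheit") {
        return { success: false, error: "units must be celsius or fahrenheit" };
      }
      // Default applied when units is omitted, which the third test asserts.
      const units = (raw.units as WeatherInput["units"]) ?? "celsius";
      return { success: true, data: { city: raw.city, units } };
    },
  },
  async execute(input: WeatherInput) {
    const { temperature, condition, humidity } = await fetchWeather(input.city, input.units);
    return { temperature, condition, humidity };
  },
};
```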

## Testing the Agent Loop

Test that the agent correctly orchestrates tool calls and handles multi-step conversations:

```typescript
// src/agent/support-agent.test.ts
import { describe, it, expect, vi, beforeEach } from "vitest";
import { runAgent } from "./support-agent";
import { createMockOpenAI, mockChatResponse, mockToolCallResponse } from "../lib/__mocks__/openai-client";

describe("Support Agent", () => {
  let mockClient: ReturnType<typeof createMockOpenAI>;


  beforeEach(() => {
    mockClient = createMockOpenAI();
  });

  it("calls search tool when user asks a question", async () => {
    // First call: model decides to search
    mockClient.chat.completions.create
      .mockResolvedValueOnce(
        mockToolCallResponse("search_docs", { query: "reset password" })
      )
      // Second call: model responds with answer
      .mockResolvedValueOnce(
        mockChatResponse("To reset your password, go to Settings > Security.")
      );

    const result = await runAgent(mockClient as any, "How do I reset my password?");

    expect(result.text).toContain("reset your password");
    expect(mockClient.chat.completions.create).toHaveBeenCalledTimes(2);
  });

  it("respects maximum iteration limit", async () => {
    // Model keeps calling tools indefinitely
    mockClient.chat.completions.create.mockResolvedValue(
      mockToolCallResponse("search_docs", { query: "something" })
    );

    const result = await runAgent(mockClient as any, "loop forever", { maxIterations: 3 });

    expect(result.text).toContain("maximum iterations");
    expect(mockClient.chat.completions.create).toHaveBeenCalledTimes(3);
  });
});
```
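A `runAgent` compatible with these tests could look roughly like the following. The message shapes, the inline tool registry, and the exact "maximum iterations" wording are assumptions made for the sketch, not the article's actual implementation:

```typescript
// Minimal agent-loop sketch (assumed shapes, illustrative tool registry).
type ToolCall = { id: string; function: { name: string; arguments: string } };
type ChatResponse = {
  choices: { message: { content: string | null; tool_calls?: ToolCall[] | null } }[];
};
type Client = { chat: { completions: { create: (req: unknown) => Promise<ChatResponse> } } };

// Hypothetical tool registry; real tools would live in their own modules.
const tools: Record<string, (args: any) => Promise<unknown>> = {
  search_docs: async ({ query }) => ({ results: [`doc about ${query}`] }),
};

export async function runAgent(
  client: Client,
  userMessage: string,
  opts: { maxIterations?: number } = {}
): Promise<{ text: string }> {
  const maxIterations = opts.maxIterations ?? 10;
  const messages: unknown[] = [{ role: "user", content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {
    const res = await client.chat.completions.create({ messages });
    const msg = res.choices[0].message;

    // No tool calls means the model produced a final answer.
    if (!msg.tool_calls?.length) {
      return { text: msg.content ?? "" };
    }

    // Execute each requested tool and feed the result back as a tool message.
    messages.push(msg);
    for (const call of msg.tool_calls) {
      const handler = tools[call.function.name];
      const output = handler
        ? await handler(JSON.parse(call.function.arguments))
        : { error: `unknown tool ${call.function.name}` };
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(output) });
    }
  }

  // Hit the cap: surface it to the caller, as the second test expects.
  return { text: "Stopped after reaching maximum iterations." };
}
```

The iteration cap is the safety net the second test exercises: without it, a model that keeps requesting tools would loop (and bill) forever.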

## Snapshot Testing for Agent Outputs

When you want to catch unexpected changes in agent behavior without brittle exact-match assertions, use snapshots on structured outputs:

```typescript
it("produces expected structured analysis", async () => {
  mockClient.chat.completions.create.mockResolvedValueOnce(
    mockChatResponse(JSON.stringify({
      sentiment: "positive",
      confidence: 0.92,
      topics: ["product", "pricing"],
    }))
  );

  const result = await analyzeText(mockClient as any, "Great product, fair price!");

  expect(result).toMatchSnapshot();
});
```

Run `vitest --update` to update snapshots when behavior intentionally changes. Review snapshot diffs in pull requests to catch unintended regressions.
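One caveat: snapshots turn brittle if volatile fields (token usage, latency, request IDs, timestamps) end up in the snapshotted object. Strip them with a small normalizer first; the helper name and field names below are illustrative:

```typescript
// Strip volatile fields before snapshotting so diffs only show behavioral changes.
// (Field names are illustrative; adapt the defaults to your result shape.)
export function stableForSnapshot<T extends Record<string, unknown>>(
  result: T,
  volatileKeys: string[] = ["usage", "latencyMs", "requestId", "timestamp"]
): Partial<T> {
  const copy: Record<string, unknown> = { ...result };
  for (const key of volatileKeys) delete copy[key];
  return copy as Partial<T>;
}
```

Then snapshot `stableForSnapshot(result)` instead of `result`, so a token-count fluctuation never shows up as a failed snapshot.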

## CI Integration

Add agent tests to your CI pipeline:

```yaml
# .github/workflows/test.yml
name: Agent Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx vitest run --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/
```

Because all LLM calls are mocked, these tests are fast, deterministic, and free — no API keys needed in CI.

## FAQ

### Should I ever test with real LLM API calls?

Yes, but separately from your main test suite. Run a small set of "smoke tests" or "evaluation tests" against the real API on a schedule (daily or pre-release). These tests use fuzzy assertions — checking that responses contain expected keywords or pass a rubric — rather than exact matches. Keep them in a separate test file with a longer timeout.
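The fuzzy grading step can be as simple as a keyword rubric. A small helper like this (the name and threshold are illustrative) keeps those scheduled tests readable:

```typescript
// Fuzzy grader for scheduled smoke tests against the real API (illustrative).
// Passes when the response mentions at least `minHits` of the required keywords.
export function passesKeywordRubric(
  response: string,
  required: string[],
  minHits = 1
): boolean {
  const lower = response.toLowerCase();
  const hits = required.filter((k) => lower.includes(k.toLowerCase())).length;
  return hits >= minHits;
}
```

A smoke test then asserts intent without demanding exact wording: `expect(passesKeywordRubric(answer, ["password", "settings"], 2)).toBe(true)`.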

### How do I test streaming responses?

Mock the streaming response as an async iterable. Create a helper that yields chunks with simulated delays. Test that your stream processing code correctly accumulates deltas, handles tool call fragments, and emits the final assembled message.
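As a sketch, the mock can be an async generator and the accumulator a plain `for await` loop. The chunk shape below is an assumption modeled on the OpenAI streaming delta format:

```typescript
// Mock a streaming chat response as an async iterable of deltas (assumed chunk shape).
type StreamChunk = { choices: { delta: { content?: string } }[] };

async function* mockStream(chunks: string[]): AsyncIterable<StreamChunk> {
  for (const piece of chunks) {
    // Simulate network latency between chunks.
    await new Promise((resolve) => setTimeout(resolve, 1));
    yield { choices: [{ delta: { content: piece } }] };
  }
}

// Accumulate deltas into the final message, as real stream-processing code must.
export async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return text;
}
```

In a test, feed `collectStream(mockStream(["Hel", "lo"]))` and assert on the assembled string; the same pattern extends to tool-call fragments by accumulating per call ID.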

### What code coverage target should I aim for?

Focus on 90%+ coverage for tool implementations, input validation, and routing logic. The agent loop orchestration should be covered by integration tests with mocked LLM responses. Do not chase coverage on thin wrapper code that just forwards calls to the LLM SDK.
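Vitest can enforce targets like these via `coverage.thresholds` in the config; recent versions also accept glob keys for per-directory targets. Check your version's documentation and treat the globs below as illustrative:

```typescript
// vitest.config.ts (excerpt): fail the run when coverage drops below target.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8",
      thresholds: {
        // Global floor for the whole suite.
        statements: 80,
        branches: 75,
        // Stricter targets for the deterministic core (illustrative glob).
        "src/tools/**": { statements: 90, branches: 90 },
      },
    },
  },
});
```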

---

#Testing #Vitest #TypeScript #AIAgents #Mocking #CICD #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/typescript-ai-agent-testing-vitest-mock-llms-snapshot-testing
