
TypeScript AI Agent Testing: Vitest, Mock LLMs, and Snapshot Testing

Learn how to test AI agent applications in TypeScript. Covers Vitest setup, strategies for mocking LLM responses, snapshot testing for agent outputs, deterministic tool testing, and CI integration for reliable agent test suites.

The Testing Challenge with AI Agents

AI agents are inherently non-deterministic. The same prompt can produce different responses across runs, making traditional assertion-based testing unreliable. A robust agent testing strategy separates what you can test deterministically — tool execution, input validation, state management, routing logic — from what requires fuzzy evaluation — the quality and correctness of LLM-generated text.

This guide walks through practical patterns for testing TypeScript AI agents using Vitest.

Setting Up Vitest

Install Vitest and configure it for a TypeScript project:

npm install -D vitest @vitest/coverage-v8
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    globals: true,
    environment: "node",
    coverage: {
      provider: "v8",
      include: ["src/**/*.ts"],
      exclude: ["src/**/*.test.ts"],
    },
    testTimeout: 30_000, // Agent tests may be slow
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "src"),
    },
  },
});

Mocking LLM Responses

The most important testing pattern is replacing the LLM client with a mock that returns predetermined responses:

// src/lib/__mocks__/openai-client.ts
import { vi } from "vitest";

export function createMockOpenAI() {
  return {
    chat: {
      completions: {
        create: vi.fn(),
      },
    },
  };
}

export function mockChatResponse(content: string | null, toolCalls?: any[]) {
  return {
    choices: [
      {
        message: {
          role: "assistant",
          content,
          tool_calls: toolCalls ?? null,
        },
        finish_reason: toolCalls ? "tool_calls" : "stop",
      },
    ],
    usage: { prompt_tokens: 100, completion_tokens: 50, total_tokens: 150 },
  };
}

export function mockToolCallResponse(name: string, args: object) {
  return mockChatResponse(null, [
    {
      id: "call_mock_123",
      type: "function",
      function: {
        name,
        arguments: JSON.stringify(args),
      },
    },
  ]);
}

Testing Tool Execution Deterministically

Tools have well-defined inputs and outputs; once their external dependencies are mocked, you can test them directly and deterministically:

// src/tools/weather.test.ts
import { describe, it, expect, vi } from "vitest";
import { weatherTool } from "./weather";

// Mock the external API
vi.mock("./weather-api", () => ({
  fetchWeather: vi.fn().mockResolvedValue({
    temperature: 22,
    condition: "sunny",
    humidity: 45,
  }),
}));

describe("weatherTool", () => {
  it("returns formatted weather data for valid city", async () => {
    const result = await weatherTool.execute({
      city: "San Francisco",
      units: "celsius",
    });

    expect(result).toEqual({
      temperature: 22,
      condition: "sunny",
      humidity: 45,
    });
  });

  it("validates input schema rejects empty city", () => {
    const parsed = weatherTool.inputSchema.safeParse({ city: "" });
    expect(parsed.success).toBe(false);
  });

  it("applies default units when not specified", () => {
    const parsed = weatherTool.inputSchema.safeParse({ city: "Tokyo" });
    expect(parsed.success).toBe(true);
    if (parsed.success) {
      expect(parsed.data.units).toBe("celsius");
    }
  });
});
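The tool under test isn't shown above. A minimal sketch of what these tests assume might look like the following — the fetchWeather stand-in and the hand-rolled inputSchema (which mimics the safeParse shape a zod schema would provide) are assumptions for illustration:

```typescript
// src/tools/weather.ts — hypothetical sketch of the tool the tests exercise.
// A real project would likely define inputSchema with zod; this hand-rolled
// validator just mirrors the safeParse contract the tests rely on.

type WeatherInput = { city: string; units: "celsius" | "fahrenheit" };
type ParseResult =
  | { success: true; data: WeatherInput }
  | { success: false; error: string };

// Stand-in for the external API wrapper that the tests mock out.
async function fetchWeather(city: string, units: string) {
  const temperature = units === "fahrenheit" ? 72 : 22;
  return { temperature, condition: "sunny", humidity: 45 };
}

export const weatherTool = {
  name: "get_weather",
  inputSchema: {
    safeParse(raw: { city?: unknown; units?: unknown }): ParseResult {
      if (typeof raw.city !== "string" || raw.city.trim() === "") {
        return { success: false, error: "city must be a non-empty string" };
      }
      // Default to celsius when units are not specified.
      const units = raw.units === "fahrenheit" ? "fahrenheit" : "celsius";
      return { success: true, data: { city: raw.city, units } };
    },
  },
  async execute(input: WeatherInput) {
    return fetchWeather(input.city, input.units);
  },
};
```

Keeping validation in a schema object separate from execute is what makes the schema tests above possible without touching the network at all.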

Testing the Agent Loop

Test that the agent correctly orchestrates tool calls and handles multi-step conversations:


// src/agent/support-agent.test.ts
import { describe, it, expect, vi, beforeEach } from "vitest";
import { runAgent } from "./support-agent";
import { createMockOpenAI, mockChatResponse, mockToolCallResponse } from "../lib/__mocks__/openai-client";

describe("Support Agent", () => {
  let mockClient: ReturnType<typeof createMockOpenAI>;

  beforeEach(() => {
    mockClient = createMockOpenAI();
  });

  it("calls search tool when user asks a question", async () => {
    // First call: model decides to search
    mockClient.chat.completions.create
      .mockResolvedValueOnce(
        mockToolCallResponse("search_docs", { query: "reset password" })
      )
      // Second call: model responds with answer
      .mockResolvedValueOnce(
        mockChatResponse("To reset your password, go to Settings > Security.")
      );

    const result = await runAgent(mockClient as any, "How do I reset my password?");

    expect(result.text).toContain("reset your password");
    expect(mockClient.chat.completions.create).toHaveBeenCalledTimes(2);
  });

  it("respects maximum iteration limit", async () => {
    // Model keeps calling tools indefinitely
    mockClient.chat.completions.create.mockResolvedValue(
      mockToolCallResponse("search_docs", { query: "something" })
    );

    const result = await runAgent(mockClient as any, "loop forever", { maxIterations: 3 });

    expect(result.text).toContain("maximum iterations");
    expect(mockClient.chat.completions.create).toHaveBeenCalledTimes(3);
  });
});
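For reference, a runAgent loop compatible with the tests above might look like the following sketch. The tools registry, the message shapes, and the exact termination message are assumptions; a real implementation would dispatch to actual tool implementations and use the SDK's own types:

```typescript
// Hypothetical sketch of the agent loop exercised by the tests above.
type ToolCall = { id: string; function: { name: string; arguments: string } };
type Message = {
  role: string;
  content: string | null;
  tool_calls?: ToolCall[] | null;
  tool_call_id?: string;
};
interface ChatClient {
  chat: {
    completions: {
      create: (req: { messages: Message[] }) => Promise<{
        choices: { message: Message; finish_reason: string }[];
      }>;
    };
  };
}

// Hypothetical tool registry; a real agent would register actual tools here.
const tools: Record<string, (args: object) => Promise<string>> = {
  search_docs: async () => JSON.stringify({ results: ["..."] }),
};

export async function runAgent(
  client: ChatClient,
  userInput: string,
  opts: { maxIterations?: number } = {}
): Promise<{ text: string }> {
  const maxIterations = opts.maxIterations ?? 10;
  const messages: Message[] = [{ role: "user", content: userInput }];

  for (let i = 0; i < maxIterations; i++) {
    const res = await client.chat.completions.create({ messages });
    const msg = res.choices[0].message;
    messages.push(msg);

    // No tool calls means the model produced its final answer.
    if (!msg.tool_calls || msg.tool_calls.length === 0) {
      return { text: msg.content ?? "" };
    }
    // Execute each requested tool and feed the result back to the model.
    for (const call of msg.tool_calls) {
      const handler = tools[call.function.name];
      const output = handler
        ? await handler(JSON.parse(call.function.arguments))
        : `Unknown tool: ${call.function.name}`;
      messages.push({ role: "tool", content: output, tool_call_id: call.id });
    }
  }
  return { text: "Stopped: maximum iterations reached." };
}
```

Because the client is injected as a parameter, the tests can pass the mock directly — no module-level patching of the SDK is required.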

Snapshot Testing for Agent Outputs

When you want to catch unexpected changes in agent behavior without brittle exact-match assertions, use snapshots on structured outputs:

it("produces expected structured analysis", async () => {
  mockClient.chat.completions.create.mockResolvedValueOnce(
    mockChatResponse(JSON.stringify({
      sentiment: "positive",
      confidence: 0.92,
      topics: ["product", "pricing"],
    }))
  );

  const result = await analyzeText(mockClient as any, "Great product, fair price!");

  expect(result).toMatchSnapshot();
});

Run vitest --update (alias -u) to regenerate snapshots when behavior intentionally changes. Review snapshot diffs in pull requests to catch unintended regressions.

CI Integration

Add agent tests to your CI pipeline:

# .github/workflows/test.yml
name: Agent Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx vitest run --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/

Because all LLM calls are mocked, these tests are fast, deterministic, and free — no API keys needed in CI.

FAQ

Should I ever test with real LLM API calls?

Yes, but separately from your main test suite. Run a small set of "smoke tests" or "evaluation tests" against the real API on a schedule (daily or pre-release). These tests use fuzzy assertions — checking that responses contain expected keywords or pass a rubric — rather than exact matches. Keep them in a separate test file with a longer timeout.
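A fuzzy assertion for such smoke tests can be as simple as a keyword rubric. The helper name and threshold below are illustrative, not from any library:

```typescript
// Hypothetical fuzzy-assertion helper for evaluation tests against the real API.
// Passes when the response mentions at least minHits of the required keywords,
// instead of demanding an exact string match.
export function passesKeywordRubric(
  text: string,
  required: string[],
  minHits = 1
): boolean {
  const lower = text.toLowerCase();
  const hits = required.filter((k) => lower.includes(k.toLowerCase())).length;
  return hits >= minHits;
}
```

A scheduled smoke test would then assert passesKeywordRubric(result.text, ["password", "settings"]) rather than comparing full response strings.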

How do I test streaming responses?

Mock the streaming response as an async iterable. Create a helper that yields chunks with simulated delays. Test that your stream processing code correctly accumulates deltas, handles tool call fragments, and emits the final assembled message.
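A minimal sketch of that helper, assuming an OpenAI-style chunk shape with content deltas (tool-call fragments omitted for brevity):

```typescript
// Hypothetical streaming mock: yields OpenAI-style chunks as an async iterable.
type StreamChunk = { choices: { delta: { content?: string } }[] };

export async function* mockStream(
  parts: string[],
  delayMs = 0
): AsyncGenerator<StreamChunk> {
  for (const part of parts) {
    // Simulated network delay between chunks (0 keeps tests fast).
    if (delayMs > 0) await new Promise((r) => setTimeout(r, delayMs));
    yield { choices: [{ delta: { content: part } }] };
  }
}

// The accumulation logic under test: joins deltas into the final message.
export async function collectStream(
  stream: AsyncIterable<StreamChunk>
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return text;
}
```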

What code coverage target should I aim for?

Focus on 90%+ coverage for tool implementations, input validation, and routing logic. The agent loop orchestration should be covered by integration tests with mocked LLM responses. Do not chase coverage on thin wrapper code that just forwards calls to the LLM SDK.


#Testing #Vitest #TypeScript #AIAgents #Mocking #CICD #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
