---
title: "Unit Testing AI Agents: Mocking LLM Calls for Fast, Deterministic Tests"
description: "Learn how to mock LLM API calls in your AI agent tests using FakeLLM objects, response fixtures, and assertion patterns for fast, deterministic, cost-free unit tests."
canonical: https://callsphere.ai/blog/unit-testing-ai-agents-mocking-llm-calls-deterministic-tests
category: "Learn Agentic AI"
tags: ["Unit Testing", "AI Agents", "Mocking", "pytest", "Python", "Testing"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-07T01:29:46.110Z
---

# Unit Testing AI Agents: Mocking LLM Calls for Fast, Deterministic Tests

> Learn how to mock LLM API calls in your AI agent tests using FakeLLM objects, response fixtures, and assertion patterns for fast, deterministic, cost-free unit tests.

## Why Unit Testing Agents Requires Special Patterns

AI agents depend on LLM calls that are non-deterministic, slow, and expensive. A single GPT-4 call takes 2-10 seconds and costs real money per token, which makes running hundreds of live calls on every commit impractical. Unit tests must be fast, free, and repeatable, which means you need a strategy for replacing real LLM calls with controlled substitutes.

The core challenge is that LLM outputs vary between calls even with `temperature=0`. Your tests need to verify your agent's logic — tool selection, state management, output parsing — without coupling to the exact wording an LLM produces.

## Strategy 1: FakeLLM Classes

Create a drop-in replacement for your LLM client that returns predetermined responses.

```mermaid
flowchart LR
    PR(["PR opened"])
    UNIT["Unit tests"]
    EVAL["Eval harness
PromptFoo or Braintrust"]
    GOLD[("Golden set
200 tagged cases")]
    JUDGE["LLM as judge
plus regex graders"]
    SCORE["Aggregate score
and per slice"]
    GATE{"Score regress
more than 2 percent?"}
    BLOCK(["Block merge"])
    MERGE(["Merge to main"])
    PR --> UNIT --> EVAL --> GOLD --> JUDGE --> SCORE --> GATE
    GATE -->|Yes| BLOCK
    GATE -->|No| MERGE
    style EVAL fill:#4f46e5,stroke:#4338ca,color:#fff
    style GATE fill:#f59e0b,stroke:#d97706,color:#1f2937
    style BLOCK fill:#dc2626,stroke:#b91c1c,color:#fff
    style MERGE fill:#059669,stroke:#047857,color:#fff
```

```python
from dataclasses import dataclass, field

@dataclass
class FakeLLM:
    """A deterministic LLM replacement for unit tests."""
    responses: list[str] = field(default_factory=list)
    call_log: list[dict] = field(default_factory=list)
    _call_index: int = 0

    def chat(self, messages: list[dict], **kwargs) -> dict:
        # Record every request so tests can assert on exactly what the agent sent.
        self.call_log.append({"messages": messages, **kwargs})
        if self._call_index >= len(self.responses):
            raise AssertionError("FakeLLM ran out of scripted responses")
        response = self.responses[self._call_index]
        self._call_index += 1
        return {"role": "assistant", "content": response}
```

This pattern lets you pre-load a sequence of responses and later inspect exactly what your agent sent to the LLM.
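
As a quick illustration, a test can construct the fake with a scripted reply, inject it into the agent, and then assert on the recorded request. The `Agent(llm=...)` constructor, the `run` method, and the system-prompt assertion are placeholders for whatever injection seam and prompt layout your own agent uses.

```python
from my_agent.core import Agent  # hypothetical agent that accepts an injected LLM

def test_agent_includes_system_prompt():
    # Script the single reply the "LLM" should return.
    fake = FakeLLM(responses=['{"action": "search", "query": "open tickets"}'])
    agent = Agent(llm=fake)

    agent.run("Show me my open tickets")

    # The call log records exactly what the agent sent.
    first_call = fake.call_log[0]
    assert first_call["messages"][0]["role"] == "system"
    assert "open tickets" in first_call["messages"][-1]["content"]
```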

## Strategy 2: Response Fixtures with pytest

Store realistic LLM responses as fixtures so multiple tests can share them.

```python
import pytest
import json
from pathlib import Path

@pytest.fixture
def tool_call_response():
    """Fixture simulating an LLM response that invokes a tool."""
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "search_database",
                    "arguments": json.dumps({"query": "open tickets", "limit": 10}),
                },
            }
        ],
    }

@pytest.fixture
def fixture_dir():
    return Path(__file__).parent / "fixtures"

def load_fixture(fixture_dir: Path, name: str) -> dict:
    return json.loads((fixture_dir / f"{name}.json").read_text())
```

Storing fixtures as JSON files in a `tests/fixtures/` directory keeps tests clean and makes it easy to update expected responses when your prompts change.
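
For instance, a couple of tests might consume the fixtures above, one from memory and one from disk. The `parse_tool_calls` helper and the `refusal.json` file are illustrative names rather than part of any library.

```python
import json

def parse_tool_calls(message: dict) -> list[dict]:
    """Illustrative helper: pull tool names and decoded arguments out of a reply."""
    return [
        {"name": c["function"]["name"], "args": json.loads(c["function"]["arguments"])}
        for c in message.get("tool_calls") or []
    ]

def test_tool_call_fixture_parses(tool_call_response):
    calls = parse_tool_calls(tool_call_response)
    assert calls[0]["name"] == "search_database"
    assert calls[0]["args"]["limit"] == 10

def test_refusal_fixture_loads(fixture_dir):
    # Assumes tests/fixtures/refusal.json exists with a canned refusal reply.
    refusal = load_fixture(fixture_dir, "refusal")
    assert refusal["role"] == "assistant"
```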

## Strategy 3: Patching with unittest.mock

Use `unittest.mock.patch` to intercept LLM calls at the boundary.

```python
from unittest.mock import patch, MagicMock
from my_agent.core import Agent

def test_agent_extracts_entities():
    fake_response = MagicMock()
    fake_response.choices = [
        MagicMock(message=MagicMock(
            content='{"entities": ["Acme Corp", "Jane Doe"]}',
            tool_calls=None,
        ))
    ]

    with patch("my_agent.core.openai_client.chat.completions.create") as mock_create:
        mock_create.return_value = fake_response
        agent = Agent()
        result = agent.extract_entities("Contact Jane Doe at Acme Corp")

    assert result == ["Acme Corp", "Jane Doe"]
    mock_create.assert_called_once()
    call_args = mock_create.call_args
    assert any("extract" in str(m) for m in call_args.kwargs["messages"])
```

## Assertion Patterns for Agent Tests

Focus your assertions on what your code controls, not on LLM output text.

```python
def test_agent_selects_correct_tool(fake_llm):
    """Verify the agent passes the right tools to the LLM."""
    fake_llm.responses = ['{"action": "search", "query": "test"}']
    agent = Agent(llm=fake_llm)

    agent.run("Find recent orders")

    call = fake_llm.call_log[0]
    tool_names = [t["function"]["name"] for t in call["tools"]]
    assert "search_orders" in tool_names
    assert "delete_account" not in tool_names  # safety check

def test_agent_retries_on_parse_failure(fake_llm):
    """Verify retry logic when LLM returns malformed JSON."""
    fake_llm.responses = ["not json", '{"action": "search"}']
    agent = Agent(llm=fake_llm, max_retries=2)

    result = agent.run("Find orders")

    assert len(fake_llm.call_log) == 2  # retried once
    assert result["action"] == "search"
```

## FAQ

### How do I handle streaming responses in unit tests?

Create an async generator fixture that yields predetermined chunks. Replace the streaming client method with this generator using `patch`. This lets you test your chunk-assembly logic without a real stream.
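
A minimal sketch, assuming the agent consumes an async `stream_chat` method on a module-level client and that `pytest-asyncio` is installed; the dotted patch path, chunk shape, and `stream_reply` method are illustrative:

```python
import pytest
from unittest.mock import patch

from my_agent.core import Agent

@pytest.fixture
def fake_stream():
    """Factory that mimics a streaming client method by returning an async generator."""
    def _factory(*args, **kwargs):
        async def _chunks():
            # Predetermined chunks in whatever shape your real client yields.
            for piece in ["Hel", "lo ", "world"]:
                yield {"delta": piece}
        return _chunks()
    return _factory

@pytest.mark.asyncio
async def test_streaming_assembly(fake_stream):
    # Patch the streaming call at the boundary so no real stream is opened.
    with patch("my_agent.core.openai_client.stream_chat", new=fake_stream):
        agent = Agent()
        text = await agent.stream_reply("Say hello")
    assert text == "Hello world"
```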

### Should I use `temperature=0` instead of mocking?

Setting `temperature=0` reduces variance but does not eliminate it — model updates can still change outputs. It also still costs tokens and takes seconds per call. Use `temperature=0` for integration tests, but always mock for unit tests.

### How many response fixtures should I maintain?

Keep a small, representative set: one normal response, one tool-call response, one refusal, one malformed response, and one empty response. Five to ten fixtures cover most agent logic paths without becoming a maintenance burden.
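
One cheap way to keep that set honest is a smoke test parametrized over the fixture files, so a stale or unreadable file fails fast after a prompt change; the fixture names below are illustrative.

```python
import pytest

@pytest.mark.parametrize("name", [
    "normal_reply",
    "tool_call",
    "refusal",
    "malformed_json",
    "empty_reply",
])
def test_fixture_loads_cleanly(fixture_dir, name):
    # The "malformed" case keeps its broken JSON as a string inside a valid
    # wrapper, so every file on this list should still parse and carry a role.
    fixture = load_fixture(fixture_dir, name)
    assert fixture.get("role") == "assistant"
```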

---

#UnitTesting #AIAgents #Mocking #Pytest #Python #Testing #AgenticAI #LearnAI #AIEngineering

