FastAPI Testing for AI Agent APIs: pytest, httpx, and Mock Strategies

The Testing Challenge for AI Agent APIs

Testing AI agent APIs is harder than testing typical CRUD endpoints because of external dependencies. Your endpoints call LLM APIs that are non-deterministic, expensive, and rate-limited. They read from vector databases, write to conversation stores, and may trigger background processing. A good test strategy mocks the expensive external calls while keeping everything else as real as possible.

The goal is a test suite that runs in seconds, costs nothing in API fees, and catches real bugs in your request handling, validation, error handling, and business logic.

Setting Up pytest for FastAPI

flowchart LR
    PR(["PR opened"])
    UNIT["Unit tests"]
    EVAL["Eval harness<br/>PromptFoo or Braintrust"]
    GOLD[("Golden set<br/>200 tagged cases")]
    JUDGE["LLM as judge<br/>plus regex graders"]
    SCORE["Aggregate score<br/>and per slice"]
    GATE{"Score regress<br/>more than 2 percent?"}
    BLOCK(["Block merge"])
    MERGE(["Merge to main"])
    PR --> UNIT --> EVAL --> GOLD --> JUDGE --> SCORE --> GATE
    GATE -->|Yes| BLOCK
    GATE -->|No| MERGE
    style EVAL fill:#4f46e5,stroke:#4338ca,color:#fff
    style GATE fill:#f59e0b,stroke:#d97706,color:#1f2937
    style BLOCK fill:#dc2626,stroke:#b91c1c,color:#fff
    style MERGE fill:#059669,stroke:#047857,color:#fff

Install the testing dependencies. The aiosqlite driver is needed because the test database URL used in conftest.py is sqlite+aiosqlite:

pip install pytest pytest-asyncio httpx aiosqlite

Configure pytest in your pyproject.toml:

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]

Create your test fixtures in tests/conftest.py:

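import pytest
from httpx import AsyncClient, ASGITransport
from sqlalchemy.ext.asyncio import (
    create_async_engine,
    async_sessionmaker,
)

from app.main import app
from app.dependencies import get_db, get_llm_service
# Assumed locations; adjust to wherever your project defines these
from app.models import Base
from tests.mocks import MockLLMService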

# Test database
TEST_DB_URL = "sqlite+aiosqlite:///./test.db"
test_engine = create_async_engine(TEST_DB_URL)
test_session_factory = async_sessionmaker(
    test_engine, expire_on_commit=False
)

async def get_test_db():
    async with test_session_factory() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise

@pytest.fixture(autouse=True)
async def setup_database():
    async with test_engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    yield
    async with test_engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)

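@pytest.fixture
def mock_llm():
    # One shared instance per test, so tests can inspect mock_llm.calls
    return MockLLMService()

@pytest.fixture
async def client(mock_llm):
    app.dependency_overrides[get_db] = get_test_db
    # Return the shared instance instead of building a new mock per request
    app.dependency_overrides[get_llm_service] = lambda: mock_llm
    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
    ) as ac:
        yield ac
    # Remove the overrides so they do not leak into other tests
    app.dependency_overrides.clear()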

Mock LLM Service

Create a deterministic mock that replaces real LLM calls:

class MockLLMService:
    def __init__(self):
        self.calls = []
        self.response_text = "This is a mock agent response."
        self._error = None

    async def generate(self, messages: list[dict]) -> str:
        self.calls.append(messages)
        # Raise the configured error, if any, so endpoint error paths run
        if self._error is not None:
            raise self._error
        return self.response_text

    async def stream_generate(self, message: str):
        self.calls.append(message)
        if self._error is not None:
            raise self._error
        # Emit the canned response one word at a time
        for word in self.response_text.split():
            yield word + " "

    def set_response(self, text: str):
        self.response_text = text

    def set_error(self, error: Exception):
        self._error = error

This mock records every call for assertion and lets tests configure specific responses or errors.

Testing Basic Endpoints

Write tests for your agent chat endpoint:

async def test_chat_returns_response(client):
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
            "session_id": "test-123",
        },
    )
    assert response.status_code == 200
    data = response.json()
    assert "response" in data
    assert len(data["response"]) > 0

async def test_chat_validates_empty_messages(client):
    response = await client.post(
        "/agents/chat",
        json={"messages": [], "session_id": "test-123"},
    )
    assert response.status_code == 422

async def test_chat_validates_message_format(client):
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "invalid_role", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 422

async def test_chat_rejects_missing_auth(client):
    # Remove default auth header if set
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
        headers={"Authorization": ""},
    )
    assert response.status_code == 401

Testing Streaming Endpoints

For a streaming endpoint, the simplest check is to let the test client collect the full response body and then parse the server-sent events out of it:

async def test_stream_chat_returns_tokens(client):
    response = await client.post(
        "/agents/chat/stream",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 200

    # For SSE, parse the event stream
    body = response.text
    assert "data:" in body

    # Extract all data lines
    data_lines = [
        line.split("data: ", 1)[1]
        for line in body.split("\n")
        if line.startswith("data: ")
    ]
    assert len(data_lines) > 0
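
The list comprehension above is enough for one data line per event. As a slightly more careful sketch, the helper below groups lines into events the way the SSE format describes: a blank line ends an event, lines starting with a colon are comments, and several data lines in one event are joined with newlines. The name parse_sse_events is hypothetical, and the helper is deliberately lenient about a final event that lacks a trailing blank line.

```python
def parse_sse_events(body: str) -> list[str]:
    """Return the data payload of each server-sent event in body."""
    events: list[str] = []
    data_parts: list[str] = []
    for line in body.splitlines():
        if line == "":
            # A blank line terminates the current event
            if data_parts:
                events.append("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value
            data_parts.append(value[1:] if value.startswith(" ") else value)
    # Lenient: keep a trailing event even without a closing blank line
    if data_parts:
        events.append("\n".join(data_parts))
    return events


sample = "data: Hello \n\n: keep-alive\n\ndata: line one\ndata: line two\n\n"
tokens = parse_sse_events(sample)
```

In a test, passing response.text to this helper gives a list you can compare against the words the mock is configured to emit.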

Testing with Database State

Tests that depend on existing data should set up state through fixtures or helper functions:

async def test_get_conversation_history(client):
    # Create a conversation first
    create_response = await client.post(
        "/conversations",
        json={"agent_type": "assistant"},
    )
    conversation_id = create_response.json()["id"]

    # Send some messages
    await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "First message"}
            ],
            "session_id": conversation_id,
        },
    )

    # Fetch history
    history_response = await client.get(
        f"/conversations/{conversation_id}/history"
    )
    assert history_response.status_code == 200
    messages = history_response.json()["messages"]
    assert len(messages) >= 2  # user + assistant

async def test_conversation_not_found(client):
    response = await client.get(
        "/conversations/nonexistent-id/history"
    )
    assert response.status_code == 404

Testing Error Scenarios

Deliberately trigger error conditions to verify your error handling:

async def test_llm_timeout_returns_503(client):
    import asyncio

    class TimeoutLLMService:
        async def generate(self, messages):
            raise asyncio.TimeoutError("LLM request timed out")

    app.dependency_overrides[get_llm_service] = (
        lambda: TimeoutLLMService()
    )

    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 503
    assert "timeout" in response.json()["error"].lower()

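async def test_rate_limit_returns_429(client):
    import httpx
    from openai import RateLimitError

    class RateLimitedLLMService:
        async def generate(self, messages):
            # openai 1.x reads response.request and response.status_code,
            # so pass a real httpx.Response instead of None
            request = httpx.Request(
                "POST", "https://api.openai.com/v1/chat/completions"
            )
            raise RateLimitError(
                "Rate limit exceeded",
                response=httpx.Response(429, request=request),
                body=None,
            )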

    app.dependency_overrides[get_llm_service] = (
        lambda: RateLimitedLLMService()
    )

    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 429
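
These tests assume the application translates LLM failures into specific status codes and a JSON body with an "error" key, but the article does not show that translation. The following is only a sketch of one way it might look, written as a plain function so it runs without FastAPI. The names error_to_response and LLMRateLimitError are hypothetical; in a real app the second would be your provider's exception class, and the function would be called from an exception handler.

```python
import asyncio


class LLMRateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


def error_to_response(exc: Exception) -> tuple[int, dict]:
    """Map an exception raised by the LLM layer to (status_code, body)."""
    if isinstance(exc, asyncio.TimeoutError):
        # 503: the upstream model is temporarily unavailable
        return 503, {"error": "LLM timeout: the model did not respond in time"}
    if isinstance(exc, LLMRateLimitError):
        # 429: the caller should back off and retry later
        return 429, {"error": "Rate limit exceeded, retry later"}
    # Anything unexpected becomes a generic 500 without leaking details
    return 500, {"error": "Internal server error"}


timeout_status, timeout_body = error_to_response(asyncio.TimeoutError())
rate_status, rate_body = error_to_response(LLMRateLimitError())
other_status, other_body = error_to_response(ValueError("boom"))
```

Keeping the mapping in one function means the 503 and 429 tests above exercise a single code path, and a new failure mode only needs one new branch plus one new test.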

Parameterized Tests for Agent Types

Use pytest parametrize to test multiple agent configurations with the same test logic:

@pytest.mark.parametrize("agent_type", [
    "assistant", "researcher", "coder",
])
async def test_all_agent_types_respond(client, agent_type):
    response = await client.post(
        f"/agents/{agent_type}/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 200
    assert "response" in response.json()

FAQ

Should I test with a real database or mock it?

Use a real test database, not a mock. Mocking the database hides SQL errors, missing columns, constraint violations, and query logic bugs. Use a SQLite test database, as in the conftest.py above, for fast tests, or a dedicated PostgreSQL test database for integration tests. The setup_database fixture creates and drops all tables around each test, which keeps tests isolated from one another.
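
To make the point about constraint violations concrete, here is a small sketch using the standard library's sqlite3 module and an in-memory database. The conversations table and its columns are invented for the illustration and are not the article's schema. A mocked session would accept both bad inserts silently, while the real database rejects them.

```python
import sqlite3

# In-memory database with a hypothetical schema for the illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversations ("
    " id TEXT PRIMARY KEY,"
    " agent_type TEXT NOT NULL)"
)
conn.execute("INSERT INTO conversations VALUES ('c1', 'assistant')")


def try_insert(row):
    """Return the database's error message, or None if the insert succeeds."""
    try:
        conn.execute("INSERT INTO conversations VALUES (?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_error = try_insert(("c1", "researcher"))  # primary key reused
missing_error = try_insert(("c2", None))            # NOT NULL violated
ok_result = try_insert(("c3", "coder"))             # valid row
```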

How do I test that my mock LLM service was called with the correct prompt?

Record calls in your mock service and assert against them. The MockLLMService shown above stores every call in a self.calls list. This only works if every request in the test receives the same mock instance, so register one shared instance in the dependency override rather than constructing a new MockLLMService per request. After your test makes a request, check mock_llm.calls[-1] to verify the messages passed to the LLM. This lets you confirm that your endpoint builds the prompt correctly from conversation history, system prompts, and context.
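
As a minimal sketch of that assertion pattern, the example below uses a compact stand-in for the mock and a hypothetical handle_chat function in place of a real endpoint. SYSTEM_PROMPT and handle_chat are assumptions made for the illustration; what carries over to a real test is the final block of checks against calls[-1].

```python
import asyncio


class RecordingLLM:
    """Compact stand-in for the article's MockLLMService."""

    def __init__(self):
        self.calls = []
        self.response_text = "This is a mock agent response."

    async def generate(self, messages: list[dict]) -> str:
        self.calls.append(messages)
        return self.response_text


SYSTEM_PROMPT = "You are a helpful assistant."  # hypothetical


async def handle_chat(llm, history: list[dict], user_message: str) -> str:
    """Hypothetical endpoint logic: system prompt, then history, then new message."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {"role": "user", "content": user_message},
    ]
    return await llm.generate(messages)


mock_llm = RecordingLLM()  # one shared instance that the test can inspect
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
reply = asyncio.run(handle_chat(mock_llm, history, "What can you do?"))

# Inspect what was actually sent to the LLM
sent = mock_llm.calls[-1]
assert sent[0] == {"role": "system", "content": SYSTEM_PROMPT}
assert sent[1:-1] == history
assert sent[-1] == {"role": "user", "content": "What can you do?"}
```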

How do I run only async tests with pytest?

With pytest-asyncio and asyncio_mode = "auto" in your config, any async def test_* function is automatically treated as an async test. You do not need the @pytest.mark.asyncio decorator when using auto mode. Run all tests with pytest tests/ and they will execute correctly whether sync or async.
