FastAPI Testing for AI Agent APIs: pytest, httpx, and Mock Strategies

Write comprehensive tests for AI agent APIs using pytest and httpx. Covers httpx AsyncClient usage, async test patterns, fixture design for database and LLM mocking, and strategies for testing streaming endpoints.

The Testing Challenge for AI Agent APIs

Testing AI agent APIs is harder than testing typical CRUD endpoints because of external dependencies. Your endpoints call LLM APIs that are non-deterministic, expensive, and rate-limited. They read from vector databases, write to conversation stores, and may trigger background processing. A good test strategy mocks the expensive external calls while keeping everything else as real as possible.

The goal is a test suite that runs in seconds, costs nothing in API fees, and catches real bugs in your request handling, validation, error handling, and business logic.

Setting Up pytest for FastAPI

Install the testing dependencies:

pip install pytest pytest-asyncio httpx aiosqlite

Configure pytest in your pyproject.toml:

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]

Create your test fixtures in tests/conftest.py:

import pytest
from httpx import AsyncClient, ASGITransport
from sqlalchemy.ext.asyncio import (
    create_async_engine,
    async_sessionmaker,
)

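from app.main import app
from app.dependencies import get_db, get_llm_service
# Assumed locations: adjust these two imports to wherever your project
# defines the declarative Base and the MockLLMService from the next section.
from app.models import Base
from tests.mocks import MockLLMService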

# Test database
TEST_DB_URL = "sqlite+aiosqlite:///./test.db"
test_engine = create_async_engine(TEST_DB_URL)
test_session_factory = async_sessionmaker(
    test_engine, expire_on_commit=False
)

async def get_test_db():
    async with test_session_factory() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise

@pytest.fixture(autouse=True)
async def setup_database():
    async with test_engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    yield
    async with test_engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)

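@pytest.fixture
def mock_llm():
    # One instance per test, shared with the app, so tests can
    # inspect mock_llm.calls or call set_response() / set_error().
    return MockLLMService()

@pytest.fixture
async def client(mock_llm):
    app.dependency_overrides[get_db] = get_test_db
    app.dependency_overrides[get_llm_service] = lambda: mock_llm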
    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
    ) as ac:
        yield ac
    app.dependency_overrides.clear()

Mock LLM Service

Create a deterministic mock that replaces real LLM calls:

class MockLLMService:
    """Deterministic stand-in for the real LLM service."""

    def __init__(self):
        self.calls = []  # every messages payload the service received
        self.response_text = "This is a mock agent response."
        self._error = None  # exception to raise instead of responding

    async def generate(self, messages: list[dict]) -> str:
        self.calls.append(messages)
        if self._error is not None:
            raise self._error
        return self.response_text

    async def stream_generate(self, messages: list[dict]):
        self.calls.append(messages)
        if self._error is not None:
            raise self._error
        # Emit the canned response one word at a time
        for word in self.response_text.split():
            yield word + " "

    def set_response(self, text: str):
        self.response_text = text

    def set_error(self, error: Exception):
        # After this call, generate() and stream_generate() raise `error`
        self._error = error

This mock records every call for assertion and lets tests configure specific responses or errors.
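
As an alternative to a hand-written class, the standard library's unittest.mock.AsyncMock offers the same two capabilities: it records every await and lets a test configure a return value or an exception. The sketch below is not the article's approach, and the make_llm_mock helper is a hypothetical name used only for illustration.

```python
import asyncio
from unittest.mock import AsyncMock


def make_llm_mock(response_text: str = "This is a mock agent response.") -> AsyncMock:
    """Build a mock whose awaited generate() returns a fixed string."""
    llm = AsyncMock()
    llm.generate.return_value = response_text
    return llm


async def demo() -> str:
    llm = make_llm_mock("Hi there")
    messages = [{"role": "user", "content": "Hello"}]

    reply = await llm.generate(messages)
    # AsyncMock records awaits, so the prompt can be asserted afterwards.
    llm.generate.assert_awaited_once_with(messages)

    # side_effect makes the next await raise instead of returning.
    llm.generate.side_effect = asyncio.TimeoutError("LLM request timed out")
    try:
        await llm.generate(messages)
    except asyncio.TimeoutError:
        pass
    else:
        raise AssertionError("expected a timeout")
    return reply


result = asyncio.run(demo())
```

AsyncMock suits single-shot methods like generate. Async generators such as stream_generate are usually simpler to fake with a small hand-written class like the one above.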

Testing Basic Endpoints

Write tests for your agent chat endpoint:

async def test_chat_returns_response(client):
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
            "session_id": "test-123",
        },
    )
    assert response.status_code == 200
    data = response.json()
    assert "response" in data
    assert len(data["response"]) > 0

async def test_chat_validates_empty_messages(client):
    response = await client.post(
        "/agents/chat",
        json={"messages": [], "session_id": "test-123"},
    )
    assert response.status_code == 422

async def test_chat_validates_message_format(client):
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "invalid_role", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 422

async def test_chat_rejects_missing_auth(client):
    # Remove default auth header if set
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
        headers={"Authorization": ""},
    )
    assert response.status_code == 401

Testing Streaming Endpoints

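A streaming endpoint can be tested by letting the response finish and then parsing the event stream from the body. With httpx's ASGITransport the full response is buffered before it is returned, so response.text contains every event: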

async def test_stream_chat_returns_tokens(client):
    response = await client.post(
        "/agents/chat/stream",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 200

    # For SSE, parse the event stream
    body = response.text
    assert "data:" in body

    # Extract all data lines
    data_lines = [
        line.split("data: ", 1)[1]
        for line in body.split("\n")
        if line.startswith("data: ")
    ]
    assert len(data_lines) > 0
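
The data-line extraction in the test above can be pulled into a small reusable helper. This is a minimal sketch that assumes each token arrives as its own `data:` line. The parse_sse_data name is hypothetical, and multi-line events, `event:` fields and comments are not handled.

```python
import re


def parse_sse_data(body: str) -> list[str]:
    """Return the payload of every `data:` line in an SSE response body."""
    payloads = []
    # SSE lines end with CRLF, LF, or CR.
    for line in re.split(r"\r\n|\n|\r", body):
        if line.startswith("data:"):
            # The format allows one optional space after the colon.
            payloads.append(line[len("data:"):].removeprefix(" "))
    return payloads


# Example body in the shape the mock's stream_generate would produce.
body = "data: This \n\ndata: is \n\ndata: a \n\n"
tokens = parse_sse_data(body)
full_text = "".join(tokens).strip()
```

Joining the tokens lets a test assert on the complete streamed text, not just on the presence of `data:` lines.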

Testing with Database State

Tests that depend on existing data should set up state through fixtures or helper functions:

async def test_get_conversation_history(client):
    # Create a conversation first
    create_response = await client.post(
        "/conversations",
        json={"agent_type": "assistant"},
    )
    conversation_id = create_response.json()["id"]

    # Send some messages
    await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "First message"}
            ],
            "session_id": conversation_id,
        },
    )

    # Fetch history
    history_response = await client.get(
        f"/conversations/{conversation_id}/history"
    )
    assert history_response.status_code == 200
    messages = history_response.json()["messages"]
    assert len(messages) >= 2  # user + assistant

async def test_conversation_not_found(client):
    response = await client.get(
        "/conversations/nonexistent-id/history"
    )
    assert response.status_code == 404
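
When several tests need an existing conversation, the creation step can move into a helper function. The sketch below assumes the same POST /conversations endpoint and "id" field used in the test above. The create_conversation name is hypothetical, and `client` is whatever async client the fixture yields.

```python
from typing import Any


async def create_conversation(client: Any, agent_type: str = "assistant") -> str:
    """Create a conversation through the API and return its id."""
    response = await client.post(
        "/conversations",
        json={"agent_type": agent_type},
    )
    # Fail here with a readable message, not later with a KeyError.
    assert response.status_code < 300, response.text
    return response.json()["id"]
```

A test would then start with `conversation_id = await create_conversation(client)`, which keeps the setup to one line and reports a failed setup at the point where it happens.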

Testing Error Scenarios

Deliberately trigger error conditions to verify your error handling:

async def test_llm_timeout_returns_503(client):
    import asyncio

    class TimeoutLLMService:
        async def generate(self, messages):
            raise asyncio.TimeoutError("LLM request timed out")

    app.dependency_overrides[get_llm_service] = (
        lambda: TimeoutLLMService()
    )

    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 503
    assert "timeout" in response.json()["error"].lower()

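async def test_rate_limit_returns_429(client):
    import httpx
    from openai import RateLimitError

    class RateLimitedLLMService:
        async def generate(self, messages):
            # RateLimitError reads response.request, so it needs a real
            # httpx.Response; passing response=None raises AttributeError.
            request = httpx.Request("POST", "https://api.openai.com/v1/chat/completions")
            raise RateLimitError(
                "Rate limit exceeded",
                response=httpx.Response(429, request=request),
                body=None,
            )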

    app.dependency_overrides[get_llm_service] = (
        lambda: RateLimitedLLMService()
    )

    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 429
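
Both tests above assume the application translates LLM failures into HTTP status codes, but the handler itself is not shown. The following is a framework-agnostic sketch of such a mapping. LLMRateLimited and status_for_llm_error are hypothetical names, and a real application might match the provider's own exception classes instead.

```python
import asyncio


class LLMRateLimited(Exception):
    """Hypothetical app-level error raised when the provider returns 429."""


def status_for_llm_error(exc: Exception) -> tuple[int, str]:
    """Map an LLM failure to an (HTTP status, error message) pair."""
    # Before Python 3.11, asyncio.TimeoutError and the builtin differ.
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError)):
        return 503, "LLM request timeout"
    if isinstance(exc, LLMRateLimited):
        return 429, "LLM rate limit exceeded"
    return 500, "Unexpected LLM error"


status, message = status_for_llm_error(asyncio.TimeoutError("LLM request timed out"))
```

The timeout test checks for "timeout" in the response's error field, and the raised exception only says "timed out". That substring therefore has to come from the handler's own message, as it does in this sketch. In FastAPI, a mapping like this would typically be called from an exception handler that builds the JSON response.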

Parameterized Tests for Agent Types

Use pytest.mark.parametrize to run the same test logic against multiple agent configurations:

@pytest.mark.parametrize("agent_type", [
    "assistant", "researcher", "coder",
])
async def test_all_agent_types_respond(client, agent_type):
    response = await client.post(
        f"/agents/{agent_type}/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 200
    assert "response" in response.json()

FAQ

Should I test with a real database or mock it?

Use a real test database, not a mock. Mocking the database hides SQL errors, missing columns, constraint violations, and query logic bugs. Use SQLite for fast tests (the fixtures above use a local test.db file) or a dedicated PostgreSQL test database for integration tests. The setup_database fixture creates and drops all tables around each test, which keeps tests isolated from one another.

How do I test that my mock LLM service was called with the correct prompt?

Record calls in your mock service and assert against them. The MockLLMService shown above stores every call in a self.calls list. The test needs a reference to the same mock instance the endpoint used, so have the dependency override return one shared instance per test rather than constructing a new mock on every request. After the test makes a request, check calls[-1] on that instance to verify the messages passed to the LLM. This confirms that your endpoint constructs the prompt correctly from conversation history, system prompts, and context.
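
A minimal sketch of that assertion pattern, runnable without FastAPI. RecordingLLM is a cut-down version of the mock. The answer function and SYSTEM_PROMPT are hypothetical stand-ins for whatever prompt construction the real endpoint performs.

```python
import asyncio

SYSTEM_PROMPT = "You are a helpful assistant."  # hypothetical system prompt


class RecordingLLM:
    """Cut-down mock that only records the messages it receives."""

    def __init__(self) -> None:
        self.calls: list[list[dict]] = []

    async def generate(self, messages: list[dict]) -> str:
        self.calls.append(messages)
        return "This is a mock agent response."


async def answer(llm: RecordingLLM, history: list[dict], user_text: str) -> str:
    """Hypothetical stand-in for the endpoint's prompt construction."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {"role": "user", "content": user_text},
    ]
    return await llm.generate(messages)


async def main() -> list[dict]:
    llm = RecordingLLM()  # one shared instance, so the test can inspect it
    history = [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ]
    await answer(llm, history, "What can you do?")
    return llm.calls[-1]


sent = asyncio.run(main())
```

In a real test the shared instance would come from a fixture, and the request would go through the HTTP client instead of calling answer directly.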

How do I run async tests with pytest?

With pytest-asyncio and asyncio_mode = "auto" in your config, any async def test_* function is automatically treated as an async test. You do not need the @pytest.mark.asyncio decorator when using auto mode. Run all tests with pytest tests/ and they will execute correctly whether sync or async.

