---
title: "FastAPI Dependency Injection for AI Agents: Managing LLM Clients and Sessions"
description: "Master FastAPI's Depends system to inject LLM clients, database sessions, and agent configurations into your AI agent endpoints. Covers scoped dependencies, sub-dependencies, and testing with overrides."
canonical: https://callsphere.ai/blog/fastapi-dependency-injection-ai-agents-llm-clients-sessions
category: "Learn Agentic AI"
tags: ["FastAPI", "Dependency Injection", "AI Agents", "Python", "Testing"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:45.174Z
---

# FastAPI Dependency Injection for AI Agents: Managing LLM Clients and Sessions

> Master FastAPI's Depends system to inject LLM clients, database sessions, and agent configurations into your AI agent endpoints. Covers scoped dependencies, sub-dependencies, and testing with overrides.

## Why Dependency Injection Matters for AI Agents

AI agent backends have several dependencies that need careful lifecycle management: LLM clients that should be shared across requests, database sessions that must be scoped to a single request and properly closed, and agent configurations that vary by environment. FastAPI's `Depends` system solves all of these by letting you declare what each endpoint needs, while the framework handles instantiation, sharing, and cleanup.

Without dependency injection, you end up with global variables, manual resource cleanup in try/finally blocks, and test suites that cannot swap real LLM calls for mocks. With `Depends`, your endpoints declare their dependencies explicitly, making the code readable, testable, and maintainable.
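As a minimal sketch of the pattern, here is a cached settings dependency (the field name and default are illustrative) that the later snippets reuse:

```python
from functools import lru_cache

from fastapi import Depends, FastAPI
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Illustrative field; real values come from the environment
    openai_model: str = "gpt-4o-mini"

@lru_cache
def get_settings() -> Settings:
    # Cached so the environment is parsed only once per process
    return Settings()

app = FastAPI()

@app.get("/health")
async def health(settings: Settings = Depends(get_settings)):
    # The endpoint declares what it needs; FastAPI supplies it
    return {"model": settings.openai_model}
```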

## Database Session Dependencies

The most common dependency pattern is a database session that is created per request and closed afterward:

```mermaid
flowchart LR
    REQ(["Request arrives"])
    SETUP["get_db runs
up to yield"]
    SESSION[("AsyncSession
per request")]
    EP["Endpoint handler
uses the session"]
    EXC{"Endpoint raised
an exception?"}
    COMMIT["session.commit()"]
    ROLLBACK["session.rollback()"]
    DONE(["Session closed"])
    REQ --> SETUP --> SESSION --> EP --> EXC
    EXC -->|No| COMMIT --> DONE
    EXC -->|Yes| ROLLBACK --> DONE
    style SETUP fill:#4f46e5,stroke:#4338ca,color:#fff
    style EXC fill:#f59e0b,stroke:#d97706,color:#1f2937
    style ROLLBACK fill:#dc2626,stroke:#b91c1c,color:#fff
    style COMMIT fill:#059669,stroke:#047857,color:#fff
```

```python
from typing import AsyncGenerator

from fastapi import APIRouter, Depends
from sqlalchemy.ext.asyncio import (
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)

router = APIRouter()

engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/agents_db",
    pool_size=20,
    max_overflow=10,
)
async_session_factory = async_sessionmaker(
    engine, expire_on_commit=False
)

async def get_db() -> AsyncGenerator[AsyncSession, None]:
    async with async_session_factory() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise

# Usage in an endpoint
@router.post("/conversations")
async def create_conversation(
    request: CreateConversationRequest,
    db: AsyncSession = Depends(get_db),
):
    conversation = Conversation(
        user_id=request.user_id,
        agent_type=request.agent_type,
    )
    db.add(conversation)
    # flush so column defaults (like the primary key) are assigned;
    # the commit itself happens automatically when the dependency closes
    await db.flush()
    return {"id": str(conversation.id)}
```

The `yield` in `get_db` separates setup from cleanup. Everything before the `yield` runs before the endpoint; everything after runs once the request is finished, even if the endpoint raises an exception. In `get_db`, that cleanup phase is where the commit or rollback happens.

## LLM Client Injection

LLM clients should be created once and shared across all requests. Combine lifespan events with a dependency that retrieves the shared client:

```python
from fastapi import Depends, Request
from openai import AsyncOpenAI

# Retrieves the client created once in the lifespan (sketch below)
async def get_llm_client(request: Request) -> AsyncOpenAI:
    return request.app.state.llm_client

# Higher-level service dependency
class LLMService:
    def __init__(self, client: AsyncOpenAI, model: str):
        self.client = client
        self.model = model

    async def generate(self, messages: list[dict]) -> str:
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
        )
        return response.choices[0].message.content

async def get_llm_service(
    client: AsyncOpenAI = Depends(get_llm_client),
    settings: Settings = Depends(get_settings),
) -> LLMService:
    return LLMService(
        client=client,
        model=settings.openai_model,
    )
```
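For completeness, here is a minimal lifespan sketch that creates the shared client at startup and closes it at shutdown. Reading the API key from the `OPENAI_API_KEY` environment variable is the SDK's default behavior; the rest of the layout is an assumption to adapt to your project:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from openai import AsyncOpenAI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # One shared client for the whole process; AsyncOpenAI reads
    # OPENAI_API_KEY from the environment by default
    app.state.llm_client = AsyncOpenAI()
    try:
        yield
    finally:
        # Close the underlying HTTP connection pool at shutdown
        await app.state.llm_client.close()

app = FastAPI(lifespan=lifespan)
```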

Notice how `get_llm_service` depends on both `get_llm_client` and `get_settings`. FastAPI resolves this dependency chain automatically, building the `LLMService` with all the pieces it needs.

## Agent Factory Pattern

When you have multiple specialized agents, use a factory dependency that returns the right agent based on the request:

```python
from enum import Enum

class AgentType(str, Enum):
    RESEARCH = "research"
    SUPPORT = "support"
    CODING = "coding"

class AgentFactory:
    def __init__(self, llm_service: LLMService, db: AsyncSession):
        self.llm_service = llm_service
        self.db = db

    def create(self, agent_type: AgentType) -> BaseAgent:
        agents = {
            AgentType.RESEARCH: ResearchAgent,
            AgentType.SUPPORT: SupportAgent,
            AgentType.CODING: CodingAgent,
        }
        agent_class = agents.get(agent_type)
        if not agent_class:
            raise ValueError(f"Unknown agent type: {agent_type}")

        return agent_class(
            llm_service=self.llm_service,
            db=self.db,
        )

async def get_agent_factory(
    llm_service: LLMService = Depends(get_llm_service),
    db: AsyncSession = Depends(get_db),
) -> AgentFactory:
    return AgentFactory(llm_service=llm_service, db=db)

@router.post("/agents/{agent_type}/chat")
async def chat(
    agent_type: AgentType,
    request: ChatRequest,
    factory: AgentFactory = Depends(get_agent_factory),
):
    agent = factory.create(agent_type)
    response = await agent.process(request.message)
    return {"response": response}
```
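The snippet above assumes a shared agent interface. A minimal sketch of what `BaseAgent` and one subclass might look like (the `process` signature is an assumption):

```python
from abc import ABC, abstractmethod

from sqlalchemy.ext.asyncio import AsyncSession

class BaseAgent(ABC):
    """Assumed common interface for the specialized agents."""

    def __init__(self, llm_service: LLMService, db: AsyncSession):
        self.llm_service = llm_service
        self.db = db

    @abstractmethod
    async def process(self, message: str) -> str:
        """Handle one user message and return the reply."""

class ResearchAgent(BaseAgent):
    async def process(self, message: str) -> str:
        return await self.llm_service.generate(
            [{"role": "user", "content": message}]
        )
```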

## Testing with Dependency Overrides

The real power of dependency injection shines in testing. Override any dependency to swap real services for mocks:

```python
import pytest
from httpx import AsyncClient, ASGITransport

class MockLLMService:
    async def generate(self, messages):
        return "This is a mock response"

    async def stream_generate(self, message):
        for word in ["Hello", " from", " mock"]:
            yield word

@pytest.fixture
def app_with_mocks():
    # `app`, `get_llm_service`, `get_db`, and `get_test_db` come from
    # your application and test helpers
    app.dependency_overrides[get_llm_service] = (
        lambda: MockLLMService()
    )
    app.dependency_overrides[get_db] = get_test_db
    yield app
    app.dependency_overrides.clear()

@pytest.mark.anyio
async def test_chat_endpoint(app_with_mocks):
    transport = ASGITransport(app=app_with_mocks)
    async with AsyncClient(
        transport=transport, base_url="http://test"
    ) as client:
        response = await client.post(
            "/agents/research/chat",
            json={"message": "test query"},
        )
        assert response.status_code == 200
        assert "mock response" in response.json()["response"]
```

No real LLM calls, no real database connections. The test runs in milliseconds and is completely deterministic.

## FAQ

### Can I use class-based dependencies in FastAPI?

Yes. You can pass a class directly, as in `Depends(MyClass)`, and FastAPI instantiates it per request, resolving its `__init__` parameters as sub-dependencies. You can also pass a pre-configured instance with a `__call__` method, which FastAPI invokes on each request. For async cleanup, implement `__aenter__` and `__aexit__` and use the instance as an async context manager inside a generator dependency. However, function-based dependencies with `yield` are more common and usually simpler to understand.
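A minimal sketch of the instance-with-`__call__` form, using a hypothetical `RateLimiter`:

```python
from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()

class RateLimiter:
    """Parameterized dependency: FastAPI calls the instance per request."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests
        self.seen = 0  # toy in-memory counter; use Redis in production

    async def __call__(self) -> None:
        self.seen += 1
        if self.seen > self.max_requests:
            raise HTTPException(status_code=429, detail="Too many requests")

limiter = RateLimiter(max_requests=100)

@app.get("/limited", dependencies=[Depends(limiter)])
async def limited():
    return {"ok": True}
```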

### How do I share a single dependency instance within one request?

FastAPI caches dependency results within a single request. If the endpoint and one of its sub-dependencies both depend on `get_db`, they receive the same session instance for that request. This is the default behavior and requires no configuration. If you explicitly want a fresh instance each time, use `Depends(get_db, use_cache=False)`.
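A sketch of both behaviors, reusing `router` and `get_db` from earlier; `ConversationRepo` is a hypothetical repository wrapper:

```python
from fastapi import Depends
from sqlalchemy.ext.asyncio import AsyncSession

class ConversationRepo:  # hypothetical repository wrapper
    def __init__(self, db: AsyncSession):
        self.db = db

async def get_repo(db: AsyncSession = Depends(get_db)) -> ConversationRepo:
    return ConversationRepo(db)

@router.get("/conversations")
async def list_conversations(
    db: AsyncSession = Depends(get_db),  # cached: same session as in get_repo
    repo: ConversationRepo = Depends(get_repo),
    fresh: AsyncSession = Depends(get_db, use_cache=False),  # separate session
):
    assert repo.db is db    # shared within the request
    assert fresh is not db  # use_cache=False opts out
    return []
```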

### What happens if a dependency raises an exception?

If a dependency raises an exception before the `yield`, the endpoint never runs, and FastAPI returns an error response (a 500 unless you raise an `HTTPException`). If an exception occurs after the `yield` during cleanup, the response has usually already been sent, so the error surfaces in server logs rather than in the client's response. Always put critical cleanup logic in a `finally` block inside your generator dependency to ensure it runs regardless of exceptions.
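For instance, a minimal sketch with a hypothetical `acquire_resource` helper:

```python
async def get_resource():
    resource = await acquire_resource()  # hypothetical setup helper
    try:
        yield resource
    finally:
        # Runs whether the endpoint succeeded or raised
        await resource.close()
```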

---

#FastAPI #DependencyInjection #AIAgents #Python #Testing #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/fastapi-dependency-injection-ai-agents-llm-clients-sessions
