Learn Agentic AI

FastAPI Middleware for AI Agents: Logging, Auth, and Rate Limiting

Build a production middleware stack for AI agent APIs in FastAPI. Covers structured request logging, Bearer token authentication, sliding window rate limiting, and CORS configuration for agent frontends.

The Middleware Stack for AI Agent APIs

Middleware sits between the incoming HTTP request and your endpoint handler. For AI agent backends, a proper middleware stack handles cross-cutting concerns: logging every request for debugging, authenticating callers before they reach agent endpoints, rate limiting to prevent LLM cost overruns, and adding CORS headers for browser-based agent frontends.

FastAPI middleware executes in reverse order of registration, wrapping your endpoint like layers of an onion. The last middleware added is the outermost layer, meaning it sees the request first and the response last.
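A toy sketch in plain Python (no FastAPI required; all names here are illustrative) shows how each later-added layer wraps the earlier ones, which is why the last middleware added sees the request first:

```python
# Toy model of the middleware onion: each "middleware" wraps the current app.
def handler(req):
    return f"handled:{req}"

def make_layer(name, nxt, log):
    def layer(req):
        log.append(f"{name}:in")    # seen on the way in
        resp = nxt(req)
        log.append(f"{name}:out")   # seen on the way out
        return resp
    return layer

log = []
stack = handler
# add_middleware puts each new layer OUTSIDE the existing stack,
# so the last one added runs first.
for name in ["auth", "logging", "cors"]:   # registration order
    stack = make_layer(name, stack, log)

result = stack("r1")
```

Calling `stack("r1")` logs `cors:in, logging:in, auth:in, auth:out, logging:out, cors:out`: the last-registered layer (`cors`) is outermost.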

Structured Request Logging

Every AI agent request should be logged with enough context to debug issues in production. This middleware captures timing, status codes, and request metadata:

import time
import uuid
import logging

from fastapi import FastAPI, Request

app = FastAPI()
logger = logging.getLogger("agent_api")
@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    request_id = str(uuid.uuid4())[:8]
    request.state.request_id = request_id

    start_time = time.monotonic()

    # Log request
    logger.info(
        "request_started",
        extra={
            "request_id": request_id,
            "method": request.method,
            "path": request.url.path,
            "client_ip": request.client.host,
        },
    )

    try:
        response = await call_next(request)
        duration_ms = (time.monotonic() - start_time) * 1000

        logger.info(
            "request_completed",
            extra={
                "request_id": request_id,
                "status_code": response.status_code,
                "duration_ms": round(duration_ms, 2),
                "path": request.url.path,
            },
        )

        response.headers["X-Request-ID"] = request_id
        response.headers["X-Response-Time"] = f"{duration_ms:.0f}ms"
        return response

    except Exception as e:
        duration_ms = (time.monotonic() - start_time) * 1000
        logger.error(
            "request_failed",
            extra={
                "request_id": request_id,
                "error": str(e),
                "duration_ms": round(duration_ms, 2),
            },
        )
        raise

The X-Request-ID header lets clients and support teams correlate frontend errors with backend logs.
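One caveat worth knowing: Python's default log formatters ignore the `extra` fields shown above, so they never reach your output unless the formatter emits them. A minimal JSON formatter sketch (the `JsonFormatter` class and its field list are illustrative, not a library API):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record, plus selected extra fields, as one JSON line."""
    FIELDS = ("request_id", "method", "path", "status_code",
              "duration_ms", "client_ip", "error")

    def format(self, record):
        entry = {"event": record.getMessage(), "level": record.levelname}
        # `extra` kwargs become attributes on the record; copy known ones.
        for field in self.FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                entry[field] = value
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("agent_api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

With this attached, each `logger.info("request_started", extra={...})` call becomes one machine-parseable JSON line.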

Token-Based Authentication Middleware

AI agent APIs should authenticate every request. This middleware validates Bearer tokens and attaches user context to the request:

from fastapi import Request
from fastapi.responses import JSONResponse
import jwt  # PyJWT

SKIP_AUTH_PATHS = {"/health", "/docs", "/openapi.json"}

@app.middleware("http")
async def auth_middleware(request: Request, call_next):
    if request.url.path in SKIP_AUTH_PATHS:
        return await call_next(request)

    auth_header = request.headers.get("Authorization")
    if not auth_header or not auth_header.startswith("Bearer "):
        return JSONResponse(
            status_code=401,
            content={"error": "Missing or invalid auth token"},
        )

    token = auth_header.split(" ", 1)[1]

    try:
        payload = jwt.decode(
            token,
            settings.jwt_secret,
            algorithms=["HS256"],
        )
        request.state.user_id = payload["sub"]
        request.state.user_tier = payload.get("tier", "free")
    except jwt.ExpiredSignatureError:
        return JSONResponse(
            status_code=401,
            content={"error": "Token expired"},
        )
    except jwt.InvalidTokenError:
        return JSONResponse(
            status_code=401,
            content={"error": "Invalid token"},
        )

    return await call_next(request)

Notice this uses JSONResponse instead of raising HTTPException. FastAPI's exception handlers live inside the middleware stack, so an exception raised in middleware is not converted into a clean HTTP response and surfaces as a 500. Returning a response directly is safer.


Sliding Window Rate Limiting

AI agent APIs are expensive because every request triggers LLM calls. Rate limiting prevents abuse and cost overruns. This implementation uses Redis for a sliding window algorithm:

import redis.asyncio as redis

redis_client = redis.from_url("redis://localhost:6379/2")

RATE_LIMITS = {
    "free": {"requests": 20, "window_seconds": 3600},
    "pro": {"requests": 200, "window_seconds": 3600},
    "enterprise": {"requests": 2000, "window_seconds": 3600},
}

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    if request.url.path in SKIP_AUTH_PATHS:
        return await call_next(request)

    user_id = getattr(request.state, "user_id", "anonymous")
    user_tier = getattr(request.state, "user_tier", "free")
    limits = RATE_LIMITS.get(user_tier, RATE_LIMITS["free"])

    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - limits["window_seconds"]

    pipe = redis_client.pipeline()
    # Remove old entries outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count remaining entries
    pipe.zcard(key)
    # Add current request
    pipe.zadd(key, {str(now): now})
    # Set expiry on the key
    pipe.expire(key, limits["window_seconds"])
    results = await pipe.execute()

    request_count = results[1]

    if request_count >= limits["requests"]:
        retry_after = int(limits["window_seconds"])
        return JSONResponse(
            status_code=429,
            content={
                "error": "Rate limit exceeded",
                "limit": limits["requests"],
                "window": f"{limits['window_seconds']}s",
                "retry_after": retry_after,
            },
            headers={"Retry-After": str(retry_after)},
        )

    response = await call_next(request)
    remaining = limits["requests"] - request_count - 1
    response.headers["X-RateLimit-Limit"] = str(limits["requests"])
    response.headers["X-RateLimit-Remaining"] = str(max(0, remaining))
    return response

The Redis sorted set tracks one timestamp per request. On each new request the pipeline prunes entries outside the window, reads the remaining count, and adds the new timestamp. This gives an accurate sliding window rather than a fixed window that resets on the hour. One subtlety: because the pipeline adds the timestamp before the count is checked, rejected requests also occupy the window, so a client that keeps retrying against a 429 keeps its own limit pinned.

CORS Configuration

Browser-based agent frontends need proper CORS headers:

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "https://app.yourdomain.com",
        "http://localhost:3000",
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Authorization", "Content-Type"],
    expose_headers=[
        "X-Request-ID",
        "X-RateLimit-Remaining",
    ],
)

Add CORS middleware last so it is the outermost layer and properly handles preflight OPTIONS requests before any other middleware runs.

FAQ

What is the correct order for middleware in a FastAPI AI agent API?

Add middleware in this order: CORS (outermost, handles preflight), logging (captures all requests including rejected ones), authentication (rejects unauthenticated requests early), rate limiting (checks limits for authenticated users). Since FastAPI middleware wraps in reverse order of addition, add CORS last in your code so it executes first. This ensures OPTIONS preflight requests get CORS headers without triggering auth or rate limiting.

Should I use middleware or Dependencies for authentication?

Middleware is better when every endpoint needs authentication because it runs automatically without any per-endpoint configuration. Dependencies are better when only some endpoints need auth, or when different endpoints need different auth levels. A common pattern is using middleware for basic token validation and a dependency for fine-grained permission checks on specific endpoints.
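A sketch of that combined pattern: the auth middleware has already stashed `user_tier` on `request.state`, and a hypothetical `require_tier` factory enforces a per-endpoint minimum (plain Python here; in a real app the checker would raise FastAPI's `HTTPException(status_code=403)` instead of `PermissionError`):

```python
# Hypothetical per-endpoint tier check, layered on top of the auth
# middleware, which has already set request.state.user_tier.
TIER_RANK = {"free": 0, "pro": 1, "enterprise": 2}

def require_tier(minimum: str):
    """Return a checker usable as a FastAPI dependency via Depends(...)."""
    def checker(request):
        tier = getattr(request.state, "user_tier", "free")
        if TIER_RANK.get(tier, 0) < TIER_RANK[minimum]:
            # In FastAPI: raise HTTPException(status_code=403, detail=...)
            raise PermissionError(f"requires {minimum} tier, got {tier}")
        return tier
    return checker
```

Registered on an endpoint as `Depends(require_tier("pro"))`, the check runs after the middleware stack, so token validation and fine-grained permissions stay in separate layers.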

How do I handle rate limiting for streaming endpoints?

Count the initial request, not individual streamed chunks. A streaming response that sends 500 tokens is still one API request from a rate limiting perspective. However, you may want to track token usage separately for billing purposes. Use the logging middleware to record total tokens consumed per request and apply token-based quotas as a separate check from request-count rate limiting.
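A minimal sketch of such a token quota, tracked separately from request counts (the class name and per-day keying are assumptions; in production this state would live in Redis or a database, not process memory):

```python
class TokenQuota:
    """Hypothetical daily token budget, checked independently of request-count limits."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.used = {}  # (user_id, day) -> tokens consumed

    def record(self, user_id: str, day: str, tokens: int) -> None:
        # Called after each LLM response with the tokens it consumed.
        key = (user_id, day)
        self.used[key] = self.used.get(key, 0) + tokens

    def remaining(self, user_id: str, day: str) -> int:
        return max(0, self.daily_limit - self.used.get((user_id, day), 0))
```

The rate limiter gates whether a request starts; the token quota gates how much LLM spend a user can accumulate, and a middleware or endpoint can reject new requests once `remaining()` hits zero.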


#FastAPI #Middleware #Authentication #RateLimiting #AIAgents #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
