API Security Headers for AI Agent Services: CORS, CSP, and Rate Limit Headers

Security Headers: Your API's First Line of Defense

HTTP security headers protect your AI agent API from common attack vectors: cross-origin abuse, content injection, information leakage, and protocol downgrade attacks. Unlike authentication and authorization (which establish who is making the request and what they are allowed to do), security headers tell browsers, proxies, and clients how to handle the request and response.

For AI agent APIs, security headers serve a dual purpose. They protect browser-based agent interfaces from XSS and clickjacking, and they communicate rate limiting information so agents can self-throttle instead of running into 429 responses.

CORS Configuration

Cross-Origin Resource Sharing (CORS) controls which origins a browser allows to call your API and read its responses. It does not restrict server-to-server callers, which are not subject to CORS at all. For AI agent APIs, you need to balance accessibility (agents running on various domains) with security (preventing unauthorized cross-origin requests).

flowchart LR
    CLIENT(["Client SDK"])
    GW["API Gateway<br/>auth plus rate limit"]
    APP["FastAPI app<br/>handlers and DI"]
    VAL["Pydantic validation"]
    SVC["Service layer<br/>business logic"]
    DB[(Database)]
    QUEUE[(Background queue)]
    OBS[(Tracing)]
    CLIENT --> GW --> APP --> VAL --> SVC
    SVC --> DB
    SVC --> QUEUE
    SVC --> OBS
    SVC --> CLIENT
    style GW fill:#4f46e5,stroke:#4338ca,color:#fff
    style APP fill:#f59e0b,stroke:#d97706,color:#1f2937
    style DB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Production CORS: restrict to known origins
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "https://app.example.com",
        "https://dashboard.example.com",
        "https://playground.example.com",
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
    allow_headers=[
        "Authorization",
        "Content-Type",
        "X-API-Key",
        "X-Request-ID",
        "Idempotency-Key",
    ],
    expose_headers=[
        "X-Request-ID",
        "X-RateLimit-Limit",
        "X-RateLimit-Remaining",
        "X-RateLimit-Reset",
        "Retry-After",
    ],
    max_age=3600,
)

The expose_headers setting is easy to overlook. For cross-origin requests, browsers let JavaScript read only the CORS-safelisted response headers (Cache-Control, Content-Language, Content-Length, Content-Type, Expires, Last-Modified, and Pragma). Unless your rate limit headers are listed here, browser-based agents cannot read them, even though server-to-server agents can.
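To make the effect of expose_headers concrete, here is a simplified model of the browser-side filtering. It is only a sketch: `readable_headers` is a hypothetical helper, not part of FastAPI or any browser API, and it ignores details such as the `*` wildcard form of Access-Control-Expose-Headers.

```python
# Simplified model of which response headers a cross-origin script can read.
# The safelist mirrors the CORS-safelisted response headers.
CORS_SAFELISTED = {
    "cache-control", "content-language", "content-length",
    "content-type", "expires", "last-modified", "pragma",
}


def readable_headers(response_headers: dict[str, str], exposed: list[str]) -> dict[str, str]:
    """Return the subset of headers visible to cross-origin JavaScript (hypothetical helper)."""
    allowed = CORS_SAFELISTED | {name.lower() for name in exposed}
    return {k: v for k, v in response_headers.items() if k.lower() in allowed}


response_headers = {
    "Content-Type": "application/json",
    "X-RateLimit-Remaining": "42",
    "X-Request-ID": "abc-123",
}

# Nothing exposed: only the safelisted Content-Type is readable.
hidden = readable_headers(response_headers, exposed=[])

# Exposing the rate limit header makes it readable; X-Request-ID stays hidden.
visible = readable_headers(response_headers, exposed=["X-RateLimit-Remaining"])
```

In this model, a browser-based agent would see the remaining quota only in the second case, which is why the configuration above lists every header the agent is expected to act on.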

Rate Limit Headers

Rate limiting is essential for AI agent APIs, where a single agent can generate hundreds of requests per minute. Communicate limits clearly using the widely adopted X-RateLimit-* headers, together with the standard Retry-After header, so agents can self-regulate.

from starlette.middleware.base import BaseHTTPMiddleware
from fastapi import Request
from fastapi.responses import JSONResponse
import time

class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, requests_per_minute: int = 60):
        super().__init__(app)
        self.rpm = requests_per_minute
        # In production, use Redis with sliding window
        self.buckets: dict[str, dict] = {}

    async def dispatch(self, request: Request, call_next):
        client_id = self._get_client_id(request)
        now = time.time()

        bucket = self.buckets.get(client_id, {
            "count": 0, "reset_at": now + 60,
        })

        if now > bucket["reset_at"]:
            bucket = {"count": 0, "reset_at": now + 60}

        bucket["count"] += 1
        self.buckets[client_id] = bucket

        remaining = max(0, self.rpm - bucket["count"])
        reset_at = int(bucket["reset_at"])

        rate_headers = {
            "X-RateLimit-Limit": str(self.rpm),
            "X-RateLimit-Remaining": str(remaining),
            "X-RateLimit-Reset": str(reset_at),
        }

        if bucket["count"] > self.rpm:
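            # Never advertise 0 seconds: the window has not reset yet, so an
            # immediate retry would be rejected again.
            retry_after = max(1, int(bucket["reset_at"] - now))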
            return JSONResponse(
                status_code=429,
                content={
                    "type": "https://api.example.com/errors/rate-limit",
                    "title": "Rate Limit Exceeded",
                    "detail": f"Limit: {self.rpm} requests/minute",
                    "retryable": True,
                    "retry_after_seconds": retry_after,
                },
                headers={
                    **rate_headers,
                    "Retry-After": str(retry_after),
                },
            )

        response = await call_next(request)
        for key, value in rate_headers.items():
            response.headers[key] = value
        return response

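    def _get_client_id(self, request: Request) -> str:
        api_key = request.headers.get("X-API-Key", "")
        if api_key:
            return f"key:{api_key}"
        # X-Forwarded-For may hold a comma-separated proxy chain; use the first
        # entry. Only trust it when a proxy you control sets or overwrites it,
        # because clients can send arbitrary values.
        forwarded = request.headers.get("X-Forwarded-For", "")
        if forwarded:
            return f"ip:{forwarded.split(',')[0].strip()}"
        # request.client can be None, so guard before reading .host
        host = request.client.host if request.client else "unknown"
        return f"ip:{host}"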

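# Starlette runs the most recently added middleware outermost. Because this is
# added after CORSMiddleware, 429 responses returned here bypass it and carry no
# CORS headers; add CORSMiddleware last if browser agents must read them.
app.add_middleware(RateLimitMiddleware, requests_per_minute=100)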
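The middleware above uses a fixed window, which can let a client send close to twice the limit across a window boundary. The code comment suggests a sliding window for production; the following is a minimal in-memory sketch of that idea, not a Redis implementation. `SlidingWindowLimiter` and its `allow` method are hypothetical names, and the timestamp is passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window log; a stand-in for a shared store such as Redis."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> tuple[bool, int]:
        """Return (allowed, remaining) for a request arriving at time `now`."""
        hits = self.hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False, 0
        hits.append(now)
        return True, self.limit - len(hits)


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
first = limiter.allow("key:demo", now=0.0)    # allowed, one request left
second = limiter.allow("key:demo", now=1.0)   # allowed, none left
third = limiter.allow("key:demo", now=2.0)    # rejected
later = limiter.allow("key:demo", now=61.5)   # both earlier hits have expired
```

A per-process dictionary like this only works for a single worker. With multiple workers or hosts, the same logic would need a shared store, which is where Redis usually comes in.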

Comprehensive Security Headers Middleware

Beyond CORS and rate limiting, add headers that prevent common web attacks and information leakage.

class SecurityHeadersMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        response = await call_next(request)

        # Prevent MIME type sniffing
        response.headers["X-Content-Type-Options"] = "nosniff"

        # Prevent clickjacking
        response.headers["X-Frame-Options"] = "DENY"

        # Control referrer information
        response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"

        # Force HTTPS
        response.headers["Strict-Transport-Security"] = (
            "max-age=31536000; includeSubDomains; preload"
        )

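        # Remove server identification. Starlette's MutableHeaders has no pop(),
        # so delete by key. ASGI servers such as Uvicorn add their own Server
        # header after middleware runs (Uvicorn: --no-server-header).
        if "server" in response.headers:
            del response.headers["server"]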

        # Permissions Policy - disable unused browser features
        response.headers["Permissions-Policy"] = (
            "camera=(), microphone=(), geolocation=(), "
            "payment=(), usb=(), magnetometer=()"
        )

        # Content Security Policy, applied only to HTML responses (docs pages, agent UIs)
        if "text/html" in response.headers.get("content-type", ""):
            response.headers["Content-Security-Policy"] = (
                "default-src 'none'; "
                "script-src 'self'; "
                "style-src 'self' 'unsafe-inline'; "
                "img-src 'self' data:; "
                "font-src 'self'; "
                "connect-src 'self'"
            )

        return response

app.add_middleware(SecurityHeadersMiddleware)
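As the policy grows, a long concatenated string becomes hard to review. One option, shown here as a sketch, is to keep the directives in a dictionary and join them when building the header. `build_csp` and `HTML_CSP` are hypothetical names; the directives are the same ones used in the middleware above.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directive names and source lists into a Content-Security-Policy value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Same policy as the middleware above, expressed as data.
HTML_CSP = build_csp({
    "default-src": ["'none'"],
    "script-src": ["'self'"],
    "style-src": ["'self'", "'unsafe-inline'"],
    "img-src": ["'self'", "data:"],
    "font-src": ["'self'"],
    "connect-src": ["'self'"],
})
```

The middleware could then assign `HTML_CSP` to the Content-Security-Policy header instead of the inline string.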

Request ID Tracking

Assign a unique ID to every request for distributed tracing. If the client sends an X-Request-ID header, propagate it; otherwise, generate one. This is invaluable for debugging agent interactions across multiple services.

import uuid

class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        request_id = request.headers.get(
            "X-Request-ID", str(uuid.uuid4())
        )
        request.state.request_id = request_id

        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id
        return response

app.add_middleware(RequestIDMiddleware)
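One caveat with propagating a client-supplied ID: the value ends up in logs and response headers, so it is worth validating before trusting it. The sketch below assumes an allowed format of 8 to 128 characters drawn from letters, digits, dots, underscores, and hyphens; that format and the `resolve_request_id` helper are illustrative assumptions, not a standard.

```python
import re
import uuid
from typing import Optional

# Assumed format: 8-128 characters of letters, digits, dot, underscore, hyphen.
_REQUEST_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]{8,128}")


def resolve_request_id(incoming: Optional[str]) -> str:
    """Reuse a well-formed client ID; otherwise generate a fresh UUID."""
    if incoming and _REQUEST_ID_PATTERN.fullmatch(incoming):
        return incoming
    return str(uuid.uuid4())


kept = resolve_request_id("agent-run-42_abc")        # well-formed, reused as-is
replaced = resolve_request_id("bad\r\nX-Evil: 1")    # contains CR/LF, replaced
generated = resolve_request_id(None)                 # header absent, generated
```

Inside the middleware, this would replace the `request.headers.get("X-Request-ID", str(uuid.uuid4()))` expression with `resolve_request_id(request.headers.get("X-Request-ID"))`.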

FAQ

Should I use wildcard CORS (`*`) for my AI agent API?

Avoid wildcard CORS in production for APIs that authenticate with cookies or bearer tokens. Browsers reject a response that combines Access-Control-Allow-Origin: * with Access-Control-Allow-Credentials: true, so a wildcard origin cannot serve credentialed requests. For public APIs that use API keys in headers rather than cookies, a wildcard origin is acceptable but still not recommended. Prefer listing specific allowed origins, and use environment variables to configure them per deployment environment.
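A minimal sketch of the environment-variable approach follows. The variable name `CORS_ALLOWED_ORIGINS`, its comma-separated format, and the `load_allowed_origins` helper are assumptions made for illustration.

```python
import os


def load_allowed_origins(env_var: str = "CORS_ALLOWED_ORIGINS") -> list[str]:
    """Parse a comma-separated origin list from the environment (hypothetical helper)."""
    raw = os.environ.get(env_var, "")
    # Trim whitespace and trailing slashes so entries match the browser's Origin header.
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard origin is not allowed in this configuration")
    return origins


# Set here only to make the example self-contained; normally the deployment sets it.
os.environ["CORS_ALLOWED_ORIGINS"] = (
    "https://app.example.com, https://dashboard.example.com/"
)
allowed_origins = load_allowed_origins()
```

The resulting list can be passed as `allow_origins=allowed_origins` to CORSMiddleware, so staging and production differ only in configuration.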

What is the difference between X-RateLimit headers and the standard Retry-After header?

They serve complementary purposes. The X-RateLimit-* headers are a de facto convention rather than a formal standard; they are informational and sent on every response, telling the client its current quota status (limit, remaining, reset time). Retry-After is a standard HTTP header; it is directive and typically sent with 429 or 503 responses, telling the client how long to wait before retrying, either as a number of seconds or as an HTTP date. Always include both: the rate limit headers for proactive throttling and Retry-After for reactive recovery.
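On the client side, an agent can combine both signals. The sketch below is one possible policy, not a prescribed algorithm: `seconds_to_wait` and `low_water_mark` are hypothetical names, X-RateLimit-Reset is assumed to be a Unix timestamp (as in the middleware above), and the headers mapping is assumed to use these exact header names.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(
    status_code: int,
    headers: Mapping[str, str],
    now: Optional[float] = None,
    low_water_mark: int = 5,
) -> float:
    """Decide how long an agent should pause before its next request."""
    now = time.time() if now is None else now

    # Reactive recovery: the server told us to back off.
    if status_code in (429, 503) and "Retry-After" in headers:
        try:
            return max(0.0, float(headers["Retry-After"]))
        except ValueError:
            # The HTTP-date form is not handled in this sketch; pause briefly.
            return 1.0

    # Proactive throttling: slow down as the quota runs low.
    remaining = headers.get("X-RateLimit-Remaining")
    reset_at = headers.get("X-RateLimit-Reset")
    if remaining is None or reset_at is None:
        return 0.0
    if int(remaining) > low_water_mark:
        return 0.0
    # Spread the remaining requests evenly over the time left in the window.
    time_left = max(0.0, float(reset_at) - now)
    return time_left / (int(remaining) + 1)
```

For example, with 2 requests remaining and 60 seconds until reset, this policy spaces calls about 20 seconds apart instead of spending the quota immediately and then hitting a 429.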

Should I apply rate limiting per API key or per IP address?

Apply rate limiting per API key for authenticated requests and per IP for unauthenticated requests. API key-based limiting is more accurate because multiple users may share an IP (corporate NATs, VPNs). Consider tiered rate limits based on the subscription plan: a free tier might get 10 requests per minute while an enterprise tier gets 1000. Always communicate the current tier's limits in the rate limit response headers.
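The key-or-IP decision and the tiered limits can be sketched together. Everything named here is illustrative: `PLAN_LIMITS`, `ANONYMOUS_LIMIT`, `resolve_limit`, and the `key_plans` lookup are hypothetical, the "pro" tier and the anonymous limit are made-up values, and a real service would look up plans in its own datastore.

```python
from typing import Optional

# Free and enterprise values echo the figures above; "pro" is made up.
PLAN_LIMITS = {"free": 10, "pro": 100, "enterprise": 1000}
ANONYMOUS_LIMIT = 5  # assumed limit for unauthenticated, per-IP traffic


def resolve_limit(
    api_key: Optional[str], client_ip: str, key_plans: dict[str, str]
) -> tuple[str, int]:
    """Return (bucket_id, requests_per_minute) for a request."""
    plan = key_plans.get(api_key) if api_key else None
    if plan in PLAN_LIMITS:
        # Known key: limit per key, using the plan's quota.
        return f"key:{api_key}", PLAN_LIMITS[plan]
    # Missing or unknown key: fall back to a per-IP bucket.
    return f"ip:{client_ip}", ANONYMOUS_LIMIT


key_plans = {"k_free": "free", "k_ent": "enterprise"}
free_bucket = resolve_limit("k_free", "203.0.113.7", key_plans)
enterprise_bucket = resolve_limit("k_ent", "203.0.113.7", key_plans)
anonymous_bucket = resolve_limit(None, "203.0.113.7", key_plans)
```

The returned limit is the value to report in X-RateLimit-Limit, so each client sees the quota of its own tier.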

