
Secure API Gateway for AI Agents: Kong, Traefik, and Custom Gateway Patterns

Set up a secure API gateway for AI agent systems using Kong, Traefik, and custom FastAPI patterns. Covers authentication plugins, rate limiting, request transformation, and routing strategies.

Why AI Agent Platforms Need an API Gateway

An API gateway is a single entry point that sits in front of your AI agent services and handles cross-cutting concerns: authentication, rate limiting, request routing, logging, and protocol translation. Without a gateway, every agent service must independently implement these concerns, leading to inconsistency and duplicated security logic.

For AI agent platforms specifically, a gateway provides three critical capabilities: it enforces rate limits to prevent a single tenant from exhausting GPU resources, it routes requests to different agent versions for A/B testing, and it transforms requests between the public API format and the internal service format.

Gateway Architecture for Multi-Agent Systems

A typical architecture places the gateway between the public internet and your internal agent services:

Client --> API Gateway --> Triage Agent --> Research Agent
                      --> Tool Executor
                      --> Conversation Service
                      --> Billing Service

The gateway handles TLS termination, authentication, rate limiting, and routing. Internal services communicate via mTLS or service tokens as discussed in previous posts.

Kong Gateway Configuration

Kong is a widely deployed API gateway with a rich plugin ecosystem. Configure it for an AI agent platform using its declarative YAML format:

# kong.yml
_format_version: "3.0"

services:
  - name: agent-api
    url: http://agent-service:8000
    routes:
      - name: agent-routes
        paths:
          - /api/agents
        strip_path: false
    plugins:
      - name: jwt
        config:
          claims_to_verify:
            - exp
          header_names:
            - Authorization
      - name: rate-limiting
        config:
          minute: 60
          hour: 1000
          policy: redis
          redis_host: redis
          redis_port: 6379
      - name: request-transformer
        config:
          add:
            headers:
              - "X-Gateway-Request-Id:$(uuid())"
              - "X-Gateway-Timestamp:$(now())"
      - name: cors
        config:
          origins:
            - "https://app.example.com"
          methods:
            - GET
            - POST
            - PUT
            - DELETE
          headers:
            - Authorization
            - Content-Type
            - X-Session-Id
          max_age: 3600
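When the rate-limiting plugin rejects a request, the gateway answers with HTTP 429, and recent Kong versions include a Retry-After header. Clients should honor it rather than retrying immediately. A small helper for parsing the header might look like this (a sketch with no Kong-specific assumptions; Retry-After may be delta-seconds or an HTTP-date per the HTTP spec):

```python
import email.utils
import time


def retry_delay(headers: dict, default: float = 1.0) -> float:
    """Seconds to wait before retrying after a 429, based on Retry-After.

    The header value can be either delta-seconds ("60") or an HTTP-date.
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        # Most servers send delta-seconds
        return max(0.0, float(value))
    except ValueError:
        # Fall back to the HTTP-date form
        when = email.utils.parsedate_to_datetime(value)
        return max(0.0, when.timestamp() - time.time())
```

A client retry loop would sleep for `retry_delay(response.headers)` before reissuing the request.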

Traefik Configuration for Kubernetes

Traefik integrates natively with Kubernetes through IngressRoute custom resources, making it a natural choice for agent platforms running on K8s:


# traefik-ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: agent-api
  namespace: ai-agents
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.agents.example.com`) && PathPrefix(`/api/agents`)
      kind: Rule
      services:
        - name: agent-service
          port: 8000
      middlewares:
        - name: agent-auth
        - name: agent-rate-limit
        - name: agent-headers
  tls:
    certResolver: letsencrypt
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-rate-limit
  namespace: ai-agents
spec:
  rateLimit:
    average: 60
    burst: 20
    period: 1m
    sourceCriterion:
      requestHeaderName: X-API-Key
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-headers
  namespace: ai-agents
spec:
  headers:
    customRequestHeaders:
      X-Gateway: "traefik"
    customResponseHeaders:
      X-Content-Type-Options: "nosniff"
      X-Frame-Options: "DENY"
      Strict-Transport-Security: "max-age=31536000; includeSubDomains"
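The IngressRoute above also references an agent-auth middleware that is not shown. One common way to implement it is Traefik's forwardAuth middleware, which delegates each request to an authentication service before forwarding it; a sketch, where the auth-service address is a placeholder for whatever verifies your tokens:

```yaml
# traefik-forward-auth.yaml (illustrative -- the auth service URL is a placeholder)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-auth
  namespace: ai-agents
spec:
  forwardAuth:
    address: http://auth-service.ai-agents.svc:9000/verify
    # Copy identity headers set by the auth service onto the forwarded request
    authResponseHeaders:
      - X-Org-Id
      - X-User-Id
```

If the auth service returns a 2xx response, Traefik forwards the request with the listed headers attached; any other status is returned to the client.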

Building a Custom FastAPI Gateway

For full control, build a lightweight gateway directly in FastAPI. This is ideal when your routing logic depends on request content (like routing to different agent versions based on the model parameter):

# gateway/main.py
import time
import uuid
import httpx
from fastapi import FastAPI, Request, HTTPException, Depends
from fastapi.responses import StreamingResponse

app = FastAPI(title="Agent API Gateway")

# Service registry
SERVICES = {
    "agents": "http://agent-service:8000",
    "tools": "http://tool-service:8001",
    "conversations": "http://conversation-service:8002",
}


@app.middleware("http")
async def gateway_middleware(request: Request, call_next):
    # Generate a request id and expose it so route handlers can forward it downstream
    request_id = str(uuid.uuid4())
    request.state.request_id = request_id
    start_time = time.time()

    response = await call_next(request)

    # Attach tracking headers to the response
    duration_ms = (time.time() - start_time) * 1000
    response.headers["X-Request-Id"] = request_id
    response.headers["X-Response-Time-Ms"] = f"{duration_ms:.2f}"
    return response

Content-Based Routing

Route requests to different backend services based on the request body. This is useful for directing agent execution requests to specialized model servers:

from fastapi.responses import JSONResponse


@app.post("/api/agents/execute")
async def route_agent_execution(
    request: Request,
    user: TokenPayload = Depends(get_current_user),  # auth dependency from earlier posts
):
    body = await request.json()
    model = body.get("model", "default")

    # Route to different backends based on the requested model
    routing_table = {
        "gpt-4": "http://openai-agent-service:8000",
        "claude-3": "http://anthropic-agent-service:8000",
        "local-llama": "http://local-agent-service:8000",
        "default": SERVICES["agents"],
    }

    target_url = routing_table.get(model, routing_table["default"])

    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{target_url}/api/agents/execute",
            json=body,
            headers={
                "Authorization": request.headers.get("Authorization", ""),
                "X-Org-Id": user.org_id,
                "X-User-Id": user.sub,
            },
            timeout=120.0,
        )

    # Propagate the backend's status code instead of always returning 200
    return JSONResponse(content=response.json(), status_code=response.status_code)

Gateway-Level Rate Limiting with Redis

Implement tiered rate limiting based on the user's subscription plan:

import redis.asyncio as redis

redis_client = redis.from_url("redis://redis:6379/0")

PLAN_LIMITS = {
    "free": {"rpm": 10, "rpd": 100},
    "pro": {"rpm": 60, "rpd": 5000},
    "enterprise": {"rpm": 300, "rpd": 50000},
}


async def check_rate_limit(user: TokenPayload = Depends(get_current_user)):
    # get_user_plan is your plan lookup (e.g., billing service or a cached DB read)
    plan = await get_user_plan(user.sub)
    limits = PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])

    minute_key = f"rl:{user.sub}:minute:{int(time.time()) // 60}"
    day_key = f"rl:{user.sub}:day:{int(time.time()) // 86400}"

    pipe = redis_client.pipeline()
    pipe.incr(minute_key)
    pipe.expire(minute_key, 60)
    pipe.incr(day_key)
    pipe.expire(day_key, 86400)
    results = await pipe.execute()

    minute_count = results[0]
    day_count = results[2]

    if minute_count > limits["rpm"]:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded (per minute)",
            headers={"Retry-After": "60"},
        )
    if day_count > limits["rpd"]:
        raise HTTPException(
            status_code=429,
            detail="Daily rate limit exceeded",
            headers={"Retry-After": "3600"},
        )
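The key scheme above implements fixed-window counting: the window index is baked into the key name, so counters roll over automatically when the minute or day changes. A pure-Python sketch of the same bucket math, with an in-memory dict standing in for the Redis INCR counters:

```python
counters: dict[str, int] = {}  # in-memory stand-in for the Redis counters


def minute_key(user_id: str, now: float) -> str:
    # Same scheme as the Redis key: the window index is part of the key name
    return f"rl:{user_id}:minute:{int(now) // 60}"


def allow(user_id: str, now: float, rpm: int) -> bool:
    """Fixed-window check: increment this window's counter, compare to the limit."""
    key = minute_key(user_id, now)
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= rpm
```

Note that fixed windows permit up to twice the nominal rate across a window boundary (a burst at the end of one minute plus a burst at the start of the next); sliding-window or token-bucket algorithms smooth this out at the cost of more Redis state.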

FAQ

Should I use Kong, Traefik, or a custom gateway?

Use Kong if you need a mature plugin ecosystem with built-in support for JWT, OAuth2, OIDC, and advanced rate limiting. Use Traefik if you are on Kubernetes and want automatic service discovery through its IngressRoute resources. Build a custom FastAPI gateway when you need content-based routing, complex request transformation, or business logic in the gateway layer. Many teams start with Traefik for basic routing and add a thin FastAPI gateway behind it for application-specific logic.

How do I handle streaming responses through a gateway?

AI agent responses often stream via SSE (Server-Sent Events). Your gateway must proxy the response as a stream without buffering the entire body. In a custom FastAPI gateway, use httpx.AsyncClient.stream() and return a StreamingResponse. In Kong and Traefik, disable response buffering for streaming endpoints. Test latency carefully — gateways that buffer before forwarding add significant time-to-first-token latency.

How should I version my AI agent API through the gateway?

Use URL path versioning (/v1/agents, /v2/agents) routed to different backend services. The gateway maintains a routing table that maps version prefixes to the appropriate service version. Support a Sunset response header on deprecated versions to give clients advance notice. Allow enterprise customers to pin to specific versions while gradually migrating the default version for new users.


#APIGateway #Kong #Traefik #FastAPI #AIAgents #RateLimiting #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team
