API Error Design for AI Agent Services: Problem Details, Error Codes, and Retry Hints

Design machine-readable API error responses for AI agents using RFC 7807 Problem Details, structured error codes, and retry hints. Build error responses that agents can parse and act on programmatically.

Why Error Design Matters More for AI Agents

When a human encounters an API error, they read the message, understand the context, and decide what to do. An AI agent has none of that intuition. It needs structured, machine-readable error responses that tell it exactly what went wrong, whether to retry, and how long to wait. Poor error design turns every transient failure into a hard failure for autonomous agents.

A solid baseline for AI agent APIs is RFC 7807 (Problem Details for HTTP APIs, since obsoleted by RFC 9457, which keeps the same core format), augmented with agent-specific fields such as retry hints and a documented error taxonomy.

RFC 7807 Problem Details Format

RFC 7807 defines a standard JSON structure for API errors. It includes a type URI for machine identification, a human-readable title and detail, the HTTP status code, and an optional instance URI pointing to the specific occurrence.
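As a minimal sketch of the structure just described, the following standard-library Python assembles one problem document and serializes it. The `build_problem` helper name, the example URIs, and the instance path are illustrative assumptions, not values defined by the RFC.

```python
import json


def build_problem(type_uri, title, status, detail, instance=None):
    """Assemble the core Problem Details members into a dict.

    Hypothetical helper for illustration; the optional 'instance'
    member is omitted when it is not provided.
    """
    problem = {
        "type": type_uri,
        "title": title,
        "status": status,
        "detail": detail,
    }
    if instance is not None:
        problem["instance"] = instance
    return problem


example = build_problem(
    type_uri="https://api.example.com/errors/rate-limit-exceeded",  # assumed URI
    title="Rate Limit Exceeded",
    status=429,
    detail="You have exceeded 100 requests per minute.",
    instance="/v1/chat/completions/requests/abc123",  # assumed identifier
)

# A server would send this body with Content-Type: application/problem+json
print(json.dumps(example, indent=2))
```

The `type` URI is the member an agent should branch on; `title` and `detail` are meant for humans and may change wording over time.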

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from fastapi.exceptions import RequestValidationError
from pydantic import BaseModel

app = FastAPI()

class ProblemDetail(BaseModel):
    type: str
    title: str
    status: int
    detail: str
    instance: str | None = None
    # Agent-specific extensions
    error_code: str | None = None
    retryable: bool = False
    retry_after_seconds: int | None = None

def problem_response(
    status: int,
    error_type: str,
    title: str,
    detail: str,
    error_code: str | None = None,
    retryable: bool = False,
    retry_after: int | None = None,
    instance: str | None = None,
) -> JSONResponse:
    body = ProblemDetail(
        type=f"https://api.example.com/errors/{error_type}",
        title=title,
        status=status,
        detail=detail,
        instance=instance,
        error_code=error_code,
        retryable=retryable,
        retry_after_seconds=retry_after,
    )
    headers = {}
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)

    return JSONResponse(
        status_code=status,
        content=body.model_dump(exclude_none=True),
        media_type="application/problem+json",
        headers=headers,
    )

Error Taxonomy for AI Agent Services

Define a clear error taxonomy so agents can programmatically classify errors and decide on the appropriate recovery strategy.

class ErrorCodes:
    # Authentication & Authorization
    AUTH_TOKEN_EXPIRED = "auth.token_expired"
    AUTH_TOKEN_INVALID = "auth.token_invalid"
    AUTH_INSUFFICIENT_SCOPE = "auth.insufficient_scope"

    # Rate Limiting
    RATE_LIMIT_EXCEEDED = "rate_limit.exceeded"
    RATE_LIMIT_TOKENS = "rate_limit.token_budget_exceeded"

    # Model Errors
    MODEL_OVERLOADED = "model.overloaded"
    MODEL_NOT_FOUND = "model.not_found"
    MODEL_CONTEXT_LENGTH = "model.context_length_exceeded"

    # Validation
    VALIDATION_FAILED = "validation.failed"
    VALIDATION_CONTENT_FILTER = "validation.content_filter"

    # Resource Errors
    RESOURCE_NOT_FOUND = "resource.not_found"
    RESOURCE_CONFLICT = "resource.conflict"
    RESOURCE_QUOTA_EXCEEDED = "resource.quota_exceeded"

    # Internal
    INTERNAL_ERROR = "internal.error"
    INTERNAL_TIMEOUT = "internal.timeout"
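One way an agent might turn these dotted codes into a recovery decision is sketched below. The `Recovery` enum, the two lookup tables, and the specific strategy assigned to each category are assumptions made for illustration; a real service would document its own mapping.

```python
from enum import Enum


class Recovery(Enum):
    REFRESH_CREDENTIALS = "refresh_credentials"
    WAIT_AND_RETRY = "wait_and_retry"
    FIX_REQUEST = "fix_request"
    GIVE_UP = "give_up"


# Exact-code overrides take precedence over the category default.
CODE_OVERRIDES = {
    "auth.token_expired": Recovery.REFRESH_CREDENTIALS,
    "model.overloaded": Recovery.WAIT_AND_RETRY,
    "model.context_length_exceeded": Recovery.FIX_REQUEST,
}

# Fallback keyed on the category prefix before the first dot.
CATEGORY_DEFAULTS = {
    "auth": Recovery.GIVE_UP,
    "rate_limit": Recovery.WAIT_AND_RETRY,
    "model": Recovery.GIVE_UP,
    "validation": Recovery.FIX_REQUEST,
    "resource": Recovery.GIVE_UP,
    "internal": Recovery.WAIT_AND_RETRY,
}


def classify(error_code):
    """Map an error code such as 'rate_limit.exceeded' to a Recovery."""
    if error_code in CODE_OVERRIDES:
        return CODE_OVERRIDES[error_code]
    category = (error_code or "").split(".", 1)[0]
    # Unknown or missing codes fall back to the most conservative option.
    return CATEGORY_DEFAULTS.get(category, Recovery.GIVE_UP)
```

Because the category is encoded in the prefix, an agent can still act sensibly on a code it has never seen, as long as the category is known.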

Applying the Error Pattern to Endpoints

Here is how these error responses look in practice across common failure scenarios in an AI agent API.

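# estimate_tokens, is_rate_limited, call_llm, and ModelOverloadedError are
# application-specific helpers assumed to be defined elsewhere.
@app.post("/v1/chat/completions")
async def chat_completions(request: dict):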
    model = request.get("model")
    messages = request.get("messages", [])

    if not model:
        return problem_response(
            status=422,
            error_type="validation-error",
            title="Validation Failed",
            detail="The 'model' field is required.",
            error_code=ErrorCodes.VALIDATION_FAILED,
        )

    token_count = estimate_tokens(messages)
    if token_count > 128000:
        return problem_response(
            status=400,
            error_type="context-length-exceeded",
            title="Context Length Exceeded",
            detail=(
                f"Request contains {token_count} tokens, "
                f"exceeding the model maximum of 128000."
            ),
            error_code=ErrorCodes.MODEL_CONTEXT_LENGTH,
        )

    if is_rate_limited(request):
        return problem_response(
            status=429,
            error_type="rate-limit-exceeded",
            title="Rate Limit Exceeded",
            detail="You have exceeded 100 requests per minute.",
            error_code=ErrorCodes.RATE_LIMIT_EXCEEDED,
            retryable=True,
            retry_after=30,
        )

    try:
        result = await call_llm(model, messages)
        return result
    except ModelOverloadedError:
        return problem_response(
            status=503,
            error_type="model-overloaded",
            title="Model Overloaded",
            detail="The model is currently at capacity. Please retry.",
            error_code=ErrorCodes.MODEL_OVERLOADED,
            retryable=True,
            retry_after=5,
        )

Global Exception Handlers

Register global exception handlers to ensure every error follows the Problem Details format, even unhandled exceptions.

@app.exception_handler(RequestValidationError)
async def validation_exception_handler(
    request: Request, exc: RequestValidationError
):
    errors = exc.errors()
    detail_parts = []
    for err in errors:
        field = " -> ".join(str(loc) for loc in err["loc"])
        detail_parts.append(f"{field}: {err['msg']}")

    return problem_response(
        status=422,
        error_type="validation-error",
        title="Request Validation Failed",
        detail="; ".join(detail_parts),
        error_code=ErrorCodes.VALIDATION_FAILED,
    )

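import logging
import uuid

@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
    # Log the full exception server-side, tagged with a correlation ID
    correlation_id = uuid.uuid4()
    logging.error(
        "Unhandled exception (correlation_id=%s)", correlation_id, exc_info=exc
    )

    return problem_response(
        status=500,
        error_type="internal-error",
        title="Internal Server Error",
        detail="An unexpected error occurred. Please retry or contact support.",
        error_code=ErrorCodes.INTERNAL_ERROR,
        # Retrying an unknown failure is only safe for idempotent requests
        retryable=True,
        retry_after=10,
        # Expose only the correlation ID so support can find the logs
        instance=correlation_id.urn,
    )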

Client-Side Error Handling for Agents

On the agent side, the structured error format enables intelligent retry logic.

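import asyncio

import httpx


class AgentAPIError(Exception):
    """Raised when a request fails and will not be retried again."""

    def __init__(self, code, detail, status):
        super().__init__(f"{status} {code}: {detail}")
        self.code = code
        self.detail = detail
        self.status = status


async def call_with_retry(url: str, body: dict, max_retries: int = 3):
    # Reuse one client so connections are pooled and closed on exit
    async with httpx.AsyncClient() as client:
        for attempt in range(max_retries + 1):
            response = await client.post(url, json=body)

            if response.status_code < 400:
                return response.json()

            error = response.json()
            retryable = error.get("retryable", False)
            # Prefer the server's hint; otherwise back off exponentially
            retry_after = error.get("retry_after_seconds", 2 ** attempt)

            if not retryable or attempt == max_retries:
                raise AgentAPIError(
                    code=error.get("error_code"),
                    detail=error.get("detail"),
                    status=response.status_code,
                )

            await asyncio.sleep(retry_after)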
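The client above calls `response.json()` on every error response, which raises if a proxy or gateway returns HTML or an empty body instead of a problem document. A defensive parser along the following lines could sit in front of that call. The `parse_problem` name, its signature, and the fallback title are assumptions for this sketch.

```python
import json


def parse_problem(status, content_type, body):
    """Return a problem dict, tolerating non-JSON error bodies.

    status: HTTP status code; content_type: raw Content-Type header value;
    body: response body as bytes.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/problem+json", "application/json"):
        try:
            parsed = json.loads(body)
        except ValueError:  # covers JSONDecodeError and bad encodings
            parsed = None
        if isinstance(parsed, dict):
            # Trust the HTTP status if the body did not repeat it.
            parsed.setdefault("status", status)
            return parsed

    # Fallback: synthesize a minimal problem. 'retryable' is left unset
    # so the caller can apply its own default for this status code.
    snippet = body.decode("utf-8", errors="replace")[:200]
    return {
        "status": status,
        "title": "Unparseable error response",
        "detail": snippet,
    }
```

With this in place, a 502 HTML page from a load balancer becomes an ordinary problem dict rather than an unexpected parsing exception inside the retry loop.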

FAQ

Why use RFC 7807 instead of a custom error format?

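RFC 7807 is an IETF specification (since superseded by RFC 9457, which keeps the same core members) that a range of frameworks, client libraries, and API gateways already support. Using it lets your errors work with tooling that understands the format instead of requiring a custom parser for every API. The application/problem+json media type signals to clients that the response follows a known structure. The specification explicitly allows extension members, so you can add fields like retryable and error_code without breaking compatibility; clients are expected to ignore members they do not recognize.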

How should AI agents decide whether to retry an error?

Agents should check the retryable field first. If true, use the retry_after_seconds value as the delay. If the field is absent, use HTTP status code heuristics: 429 (rate limit) and 503 (service unavailable) are generally retryable; 400, 401, 403, 404, and 422 are not. Always cap retries with a maximum attempt count and total timeout to prevent infinite retry loops.
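The decision procedure in this answer might be sketched as a single function. The `retry_delay` name, the 60-second default budget, and the choice to treat unlisted status codes (such as 500) as non-retryable when no `retryable` field is present are assumptions for illustration.

```python
# Status codes treated as retryable when the body carries no explicit hint.
RETRYABLE_STATUSES = {429, 503}


def retry_delay(problem, status, attempt, max_attempts=3,
                elapsed=0.0, budget=60.0):
    """Return the seconds to wait before retrying, or None to stop.

    problem: parsed error body (dict); attempt: retries already made
    (0-based); elapsed/budget: total time spent and allowed, in seconds.
    """
    if attempt >= max_attempts:
        return None  # attempt cap reached

    retryable = problem.get("retryable")
    if retryable is None:
        # Field absent: fall back to HTTP status code heuristics.
        retryable = status in RETRYABLE_STATUSES
    if not retryable:
        return None

    delay = problem.get("retry_after_seconds")
    if delay is None:
        delay = 2 ** attempt  # exponential backoff when no hint is given

    if elapsed + delay > budget:
        return None  # waiting would exceed the total timeout
    return float(delay)
```

Returning `None` for "stop" keeps the calling loop simple: sleep for the returned value if there is one, otherwise raise the error to the agent's planner.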

Should I include stack traces in error responses?

Never in production. Stack traces expose internal implementation details, file paths, library versions, and potentially sensitive data. Log the full stack trace server-side with a correlation ID, and include that correlation ID in the instance field of the Problem Details response so your support team can locate the relevant logs.
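A minimal sketch of that pattern, using only the standard library, is shown below. The `internal_error_problem` helper, the logger name, and the use of a `urn:uuid:` value for `instance` are assumptions; any stable identifier your log search can find would serve the same purpose.

```python
import logging
import uuid

logger = logging.getLogger("api.errors")  # assumed logger name


def internal_error_problem(exc):
    """Log the full traceback server-side and return a sanitized body."""
    correlation_id = uuid.uuid4()

    # exc_info attaches the traceback to the log record only; none of it
    # is copied into the response body below.
    logger.error(
        "Unhandled exception (correlation_id=%s)", correlation_id,
        exc_info=exc,
    )

    return {
        "type": "https://api.example.com/errors/internal-error",
        "title": "Internal Server Error",
        "status": 500,
        "detail": "An unexpected error occurred.",
        # e.g. "urn:uuid:1b4e28ba-..." ; support searches logs for this ID
        "instance": correlation_id.urn,
    }
```

The response carries a generic `detail` and the correlation ID, while the exception message and stack trace stay in the server logs.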

