Learn Agentic AI

Building AI Agents with Next.js API Routes: Full-Stack Agent Applications

Learn how to build full-stack AI agent applications using Next.js API routes. Covers streaming responses, middleware for authentication, edge runtime considerations, conversation persistence, and production patterns for server-side agent logic.

Why Next.js for AI Agent Applications

Next.js provides the rare combination of a React frontend, a server-side API layer, and deployment infrastructure in a single framework. For AI agent applications, this means you can define your agent logic in API routes, stream responses to React components, and deploy everything as one unit — no separate backend service required.

The App Router's route handlers, combined with the Vercel AI SDK or raw streaming APIs, make Next.js one of the fastest paths from idea to deployed agent application.

Basic Agent API Route

Create a route handler that processes messages and returns agent responses:

// app/api/agent/route.ts
import { NextRequest, NextResponse } from "next/server";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: NextRequest) {
  const { messages, threadId } = await req.json();

  if (!messages || !Array.isArray(messages)) {
    return NextResponse.json(
      { error: "messages array is required" },
      { status: 400 }
    );
  }

  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      ...messages,
    ],
  });

  return NextResponse.json({
    message: completion.choices[0].message,
    usage: completion.usage,
  });
}

Streaming Responses from API Routes

For real-time UIs, stream tokens instead of waiting for the full response:

// app/api/agent/stream/route.ts
import { NextRequest } from "next/server";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: NextRequest) {
  const { messages } = await req.json();

  const stream = await client.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content;
        if (text) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ text })}\n\n`)
          );
        }
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}

This implements Server-Sent Events (SSE) manually. The client connects to this endpoint and receives tokens as they arrive from the LLM.
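On the client side, the stream can be consumed with fetch and a ReadableStream reader. The sketch below is illustrative rather than part of any SDK (parseSSEChunk and streamAgentReply are names introduced here); a production client would also buffer partial lines across chunk boundaries, which this version skips for brevity:

```typescript
// Extract the text payloads from a raw SSE chunk. A chunk may carry
// several "data: ..." events; "[DONE]" marks the end of the stream.
export function parseSSEChunk(chunk: string): string[] {
  const texts: string[] = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6).trim();
    if (payload === "[DONE]") continue;
    texts.push(JSON.parse(payload).text);
  }
  return texts;
}

// POST the conversation to the streaming route and invoke onToken
// for each text delta as it arrives.
export async function streamAgentReply(
  messages: { role: string; content: string }[],
  onToken: (text: string) => void
): Promise<void> {
  const res = await fetch("/api/agent/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const text of parseSSEChunk(decoder.decode(value, { stream: true }))) {
      onToken(text);
    }
  }
}
```

In a React component, onToken would typically append each delta to a state variable so the reply renders incrementally.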

Authentication Middleware

Protect your agent endpoints with middleware that validates session tokens:

// middleware.ts
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith("/api/agent")) {
    const authHeader = request.headers.get("authorization");

    if (!authHeader?.startsWith("Bearer ")) {
      return NextResponse.json(
        { error: "Authentication required" },
        { status: 401 }
      );
    }

    // Validate the token (JWT verification, database lookup, etc.)
    const token = authHeader.slice(7);
    // Add your token validation logic here
  }

  return NextResponse.next();
}

export const config = {
  matcher: "/api/agent/:path*",
};
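What "validate the token" means depends on your auth scheme. As one illustration, here is a minimal HS256 JWT check built on Node's crypto module (verifyHs256Jwt is a name introduced here, not a library function; note that Vercel middleware runs on the Edge Runtime, where you would use a Web Crypto-based library such as jose instead):

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Decode a base64url JWT segment into a UTF-8 string.
function b64urlDecode(segment: string): string {
  return Buffer.from(segment, "base64url").toString("utf8");
}

// Verify an HS256-signed JWT and return its claims, or null if invalid.
export function verifyHs256Jwt(
  token: string,
  secret: string
): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;

  // Recompute the signature over header.payload and compare in constant time.
  const expected = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;

  const claims = JSON.parse(b64urlDecode(payload));
  // Reject expired tokens (exp is in seconds since the epoch).
  if (typeof claims.exp === "number" && claims.exp * 1000 < Date.now()) {
    return null;
  }
  return claims;
}
```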

Conversation Persistence

Store conversation history so users can resume sessions:

// app/api/agent/route.ts
import { NextRequest, NextResponse } from "next/server";
import OpenAI from "openai";
import { prisma } from "@/lib/prisma";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: NextRequest) {
  const { message, conversationId } = await req.json();
  // In production, derive userId from the verified session rather than
  // trusting a client-supplied header.
  const userId = req.headers.get("x-user-id")!;

  // Load or create conversation
  let conversation = conversationId
    ? await prisma.conversation.findUnique({
        where: { id: conversationId, userId },
        include: { messages: { orderBy: { createdAt: "asc" } } },
      })
    : await prisma.conversation.create({
        data: { userId },
        include: { messages: true },
      });

  if (!conversation) {
    return NextResponse.json({ error: "Not found" }, { status: 404 });
  }

  // Build messages array from history
  const chatMessages = conversation.messages.map((m) => ({
    role: m.role as "user" | "assistant",
    content: m.content,
  }));
  chatMessages.push({ role: "user", content: message });

  // Call LLM
  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      ...chatMessages,
    ],
  });

  const reply = completion.choices[0].message.content ?? "";

  // Persist both messages
  await prisma.message.createMany({
    data: [
      { conversationId: conversation.id, role: "user", content: message },
      { conversationId: conversation.id, role: "assistant", content: reply },
    ],
  });

  return NextResponse.json({
    conversationId: conversation.id,
    reply,
  });
}
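The queries above assume Conversation and Message models roughly like the following sketch (field names match the code; everything else, such as ID and timestamp defaults, is illustrative):

```prisma
model Conversation {
  id        String    @id @default(cuid())
  userId    String
  createdAt DateTime  @default(now())
  messages  Message[]
}

model Message {
  id             String       @id @default(cuid())
  conversationId String
  conversation   Conversation @relation(fields: [conversationId], references: [id])
  role           String
  content        String
  createdAt      DateTime     @default(now())
}
```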

Rate Limiting

Protect your agent endpoint from abuse. The in-memory counter below works for a single long-running server; on serverless platforms, each instance has its own memory, so accurate limits require a shared store such as Redis:

// lib/rate-limit.ts
const rateLimitMap = new Map<string, { count: number; resetTime: number }>();

export function checkRateLimit(
  userId: string,
  maxRequests: number = 20,
  windowMs: number = 60_000
): boolean {
  const now = Date.now();
  const entry = rateLimitMap.get(userId);

  if (!entry || now > entry.resetTime) {
    rateLimitMap.set(userId, { count: 1, resetTime: now + windowMs });
    return true;
  }

  if (entry.count >= maxRequests) {
    return false;
  }

  entry.count++;
  return true;
}

Use it in your route handler:

if (!checkRateLimit(userId)) {
  return NextResponse.json(
    { error: "Rate limit exceeded. Try again in a minute." },
    { status: 429 }
  );
}
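For serverless deployments, the same fixed-window check can run against a shared counter store instead of a local Map. In this sketch, the CounterStore interface and key scheme are illustrative; a Redis client such as ioredis exposes incr and expire methods compatible with this shape:

```typescript
// Minimal counter-store interface; Redis clients such as ioredis
// expose incr/expire methods compatible with this shape.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

// Fixed-window rate limit backed by shared storage: increment a
// per-user counter and expire it when the window rolls over.
export async function checkRateLimitShared(
  store: CounterStore,
  userId: string,
  maxRequests = 20,
  windowSeconds = 60
): Promise<boolean> {
  const key = `rate:${userId}`;
  const count = await store.incr(key);
  if (count === 1) {
    // First request in this window: start the expiry clock.
    await store.expire(key, windowSeconds);
  }
  return count <= maxRequests;
}
```

Because the store is injected, the handler code stays the same whether the counter lives in Redis, Upstash, or an in-memory fake used in tests.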

Edge Runtime Considerations

Next.js route handlers can run on the Edge Runtime for lower latency. However, agents often need Node.js APIs (database drivers, file system access). Use edge selectively:

// This route can run on edge — it only calls external APIs
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const runtime = "edge";

export async function POST(req: Request) {
  const { messages } = await req.json();

  // The OpenAI SDK works on the Edge Runtime
  const stream = await client.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });
  // ...stream response
}

For routes that need Prisma, Redis, or other Node.js-dependent libraries, keep the default Node.js runtime.

FAQ

Should I use API routes or Server Actions for AI agents?

Use API routes for agent interactions. Server Actions are designed for form mutations and do not support streaming responses. API route handlers give you full control over the response format, headers, and streaming behavior that AI agents require.

How do I handle long-running agent tasks that exceed the serverless timeout?

For tasks longer than the default timeout (60 seconds on Vercel Hobby, 300 seconds on Pro), use the maxDuration export in your route handler: export const maxDuration = 300;. For even longer tasks, offload to a background job queue (Inngest, Trigger.dev) and poll for results from the client.

Can I deploy a Next.js agent app to platforms other than Vercel?

Yes. Next.js deploys to any platform that supports Node.js: Railway, Fly.io, AWS (via SST or standalone mode), Docker containers, or a traditional VPS. The only features that are Vercel-specific are edge middleware optimizations and some caching behaviors.


#Nextjs #APIRoutes #FullStack #AIAgents #Streaming #EdgeRuntime #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
