Production LangGraph: Deploying Stateful Agents with LangGraph Cloud

Deploy LangGraph agents to production using LangGraph Cloud with API endpoints, cron triggers, monitoring, scaling strategies, and operational best practices for stateful agent workflows.

From Development to Production

Building a LangGraph agent locally is straightforward. Running one in production requires careful architecture: you have to handle concurrent users, persist state across restarts, monitor execution, recover from failures, and scale under load. LangGraph Cloud provides managed infrastructure for deploying stateful agents, but you can also self-host with the right patterns.

Structuring Your Project for Deployment

LangGraph Cloud expects a specific project layout:

flowchart TD
    USER(["User input"])
    SUPER["Supervisor node<br/>routes by state"]
    A["Specialist node A<br/>research"]
    B["Specialist node B<br/>writing"]
    TOOL{"Tool call<br/>needed?"}
    EXEC["Tool executor<br/>ToolNode"]
    CHK[("Postgres<br/>checkpointer")]
    INT{"interrupt for<br/>human approval?"}
    HUMAN(["Human reviewer"])
    OUT(["Final response"])
    USER --> SUPER
    SUPER --> A
    SUPER --> B
    A --> TOOL
    B --> TOOL
    TOOL -->|Yes| EXEC --> SUPER
    TOOL -->|No| INT
    INT -->|Yes| HUMAN --> SUPER
    INT -->|No| OUT
    SUPER <--> CHK
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CHK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
    style HUMAN fill:#f59e0b,stroke:#d97706,color:#1f2937
# langgraph.json — deployment configuration
{
    "dependencies": ["."],
    "graphs": {
        "my_agent": "./agent/graph.py:graph"
    },
    "env": ".env"
}

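The graphs field maps each graph name (the name you later pass as assistant_id in API calls) to the file path and variable name of a compiled graph, written as path/to/file.py:variable. Your graph module exports the compiled graph: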

# agent/graph.py
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

@tool
def lookup_order(order_id: str) -> str:
    """Look up order details by ID."""
    # Production implementation here
    return f"Order {order_id}: shipped, arriving March 20"

tools = [lookup_order]
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
tool_node = ToolNode(tools)

def agent(state: AgentState) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}

def should_continue(state: AgentState):
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"

builder = StateGraph(AgentState)
builder.add_node("agent", agent)
builder.add_node("tools", tool_node)
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END,
})
builder.add_edge("tools", "agent")

graph = builder.compile()
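
Because a typo in the graphs field only surfaces at deploy time, it can help to check the path:variable entries beforehand. The following is a minimal sketch using only the standard library; parse_graph_specs is a hypothetical helper written for this article, not part of the LangGraph CLI, and it checks the string format only (it does not import the module).

```python
import json


def parse_graph_specs(config_text: str) -> dict[str, tuple[str, str]]:
    """Split each "path:variable" entry of the graphs field into its two parts."""
    config = json.loads(config_text)
    specs = {}
    for name, target in config.get("graphs", {}).items():
        # Split on the last colon so the file path is kept whole.
        path, sep, variable = target.rpartition(":")
        if not sep or not path or not variable:
            raise ValueError(
                f"graph {name!r}: expected 'path/to/file.py:variable', got {target!r}"
            )
        specs[name] = (path, variable)
    return specs


# Same content as the langgraph.json shown earlier.
CONFIG = """
{
    "dependencies": ["."],
    "graphs": {"my_agent": "./agent/graph.py:graph"},
    "env": ".env"
}
"""

specs = parse_graph_specs(CONFIG)
print(specs)  # {'my_agent': ('./agent/graph.py', 'graph')}
```

Running a check like this in CI is one way to catch a missing variable name before the deployment build starts.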

Deploying to LangGraph Cloud

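Install the LangGraph CLI and start the agent locally:

pip install langgraph-cli
langgraph up

This starts a local server for the graphs listed in langgraph.json. The langgraph up command runs the server in Docker, so Docker must be installed and running. For cloud deployment: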

langgraph deploy --project my-agent

The deployment creates API endpoints for your graph with built-in persistence, streaming, and thread management.

API Endpoints

Once deployed, LangGraph Cloud exposes REST endpoints:

# Create a new thread
curl -X POST https://your-deployment.langgraph.app/threads \
  -H "Content-Type: application/json" \
  -d '{}'

# Run the agent on a thread
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "Track order 12345"}]
    }
  }'

# Stream responses
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs/stream \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "What is the status?"}]
    },
    "stream_mode": "messages"
  }'
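
The same run request can be built from Python without the SDK. The sketch below uses only the standard library and constructs the request without sending it. The X-Api-Key header is an assumption about how your deployment authenticates callers (the curl examples above omit authentication), and build_run_request, "thread-123", and "example-key" are hypothetical names used for illustration.

```python
import json
import urllib.request


def build_run_request(
    base_url: str, thread_id: str, assistant_id: str, text: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to /threads/{thread_id}/runs."""
    body = {
        "assistant_id": assistant_id,
        "input": {"messages": [{"role": "human", "content": text}]},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/threads/{thread_id}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # assumption: deployment expects an API key header
        },
        method="POST",
    )


req = build_run_request(
    "https://your-deployment.langgraph.app",
    "thread-123",  # hypothetical thread ID returned by POST /threads
    "my_agent",
    "Track order 12345",
    "example-key",  # placeholder; read the real key from the environment
)
print(req.get_method(), req.full_url)

# To actually send it:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```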

Using the Python SDK

The LangGraph SDK provides a typed client for interacting with deployed agents:

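import asyncio

from langgraph_sdk import get_client


async def main():
    client = get_client(url="https://your-deployment.langgraph.app")

    # Create a thread
    thread = await client.threads.create()

    # Start a run; this returns the run's metadata while the run
    # itself executes in the background
    run = await client.runs.create(
        thread_id=thread["thread_id"],
        assistant_id="my_agent",
        input={"messages": [{"role": "human", "content": "Track order 12345"}]},
    )

    # Wait for the background run to finish
    await client.runs.join(thread["thread_id"], run["run_id"])

    # Stream responses
    async for chunk in client.runs.stream(
        thread_id=thread["thread_id"],
        assistant_id="my_agent",
        input={"messages": [{"role": "human", "content": "Any updates?"}]},
        stream_mode="messages",
    ):
        print(chunk)


# The SDK client is async, so the calls must run inside an event loop
asyncio.run(main())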

Cron Triggers for Scheduled Agents

Run agents on a schedule for monitoring, reporting, or maintenance tasks:

# langgraph.json
{
    "dependencies": ["."],
    "graphs": {
        "monitor": "./agent/monitor.py:graph"
    },
    "crons": {
        "daily_report": {
            "graph": "monitor",
            "schedule": "0 9 * * *",
            "input": {
                "messages": [{"role": "human", "content": "Generate daily status report"}]
            }
        }
    }
}

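The schedule uses standard five-field cron syntax, so 0 9 * * * runs once a day at 09:00 (cron schedules are interpreted in UTC). The cron trigger creates a new thread for each execution, runs the graph, and stores the result. You can query past cron runs through the API.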
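
Cron jobs can also be created programmatically through the SDK's crons client. The sketch below assumes client.crons.create accepts an assistant ID plus schedule and input keyword arguments; check the SDK reference for your installed version before relying on it. schedule_daily_report is a hypothetical helper, and the client is passed in as a parameter so the function can be exercised without a live deployment.

```python
import asyncio  # used by the usage example at the bottom


async def schedule_daily_report(client, assistant_id: str = "monitor") -> dict:
    """Create a cron job that runs the given assistant every day at 09:00 UTC.

    `client` is expected to be a LangGraph SDK client (assumption: it exposes
    `crons.create(assistant_id, schedule=..., input=...)`).
    """
    return await client.crons.create(
        assistant_id,
        schedule="0 9 * * *",
        input={
            "messages": [
                {"role": "human", "content": "Generate daily status report"}
            ]
        },
    )


# Usage against a real deployment (requires `pip install langgraph-sdk`):
#
#     from langgraph_sdk import get_client
#
#     async def main():
#         client = get_client(url="https://your-deployment.langgraph.app")
#         cron = await schedule_daily_report(client)
#         print(cron)
#
#     asyncio.run(main())
```

Cron jobs keep running until they are deleted, so remove any you no longer need to avoid paying for unwanted runs.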

Monitoring and Observability

LangGraph integrates with LangSmith for tracing and monitoring:

# Set environment variables for tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "production-agent"

Every graph execution is traced end-to-end, showing node timings, LLM calls, tool invocations, and state transitions. Set up alerts for error rates, latency spikes, and token usage.

Self-Hosted Production Patterns

If you prefer to self-host rather than use LangGraph Cloud, here are the essential patterns:

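# Use PostgreSQL for production checkpointing
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

from agent.graph import builder  # the StateGraph builder defined in agent/graph.py

DB_URI = os.environ["DATABASE_URL"]


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Keep the checkpointer connection open for the lifetime of the app.
    # The async saver is required because the endpoint calls graph.ainvoke.
    async with AsyncPostgresSaver.from_conn_string(DB_URI) as checkpointer:
        await checkpointer.setup()  # creates the checkpoint tables on first run
        app.state.graph = builder.compile(checkpointer=checkpointer)
        yield


# Wrap in FastAPI for HTTP access
app = FastAPI(lifespan=lifespan)


@app.post("/chat/{thread_id}")
async def chat(thread_id: str, message: str, request: Request):
    # thread_id selects which conversation's checkpoints to load and update
    config = {"configurable": {"thread_id": thread_id}}
    result = await request.app.state.graph.ainvoke(
        {"messages": [{"role": "human", "content": message}]},
        config,
    )
    return {"response": result["messages"][-1].content}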

Use PostgreSQL for state persistence, Redis for caching, and a process manager like Gunicorn with Uvicorn workers for concurrency.

Scaling Considerations

Stateful agents require careful scaling. Each thread is independent, so you can distribute threads across workers. But a single thread's execution must happen on one worker since the in-progress state is in memory. Use sticky sessions or a queue-based architecture where each run is claimed by exactly one worker.
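
The queue-based option can be sketched as a run table where a worker claims a run only if no other run on the same thread is in progress. This is a minimal single-process illustration using an in-memory SQLite database as a stand-in for a shared store; the table layout and the make_queue, enqueue, claim_next, and finish helpers are hypothetical, and a real multi-worker setup on PostgreSQL would typically do the claim inside a transaction (for example with SELECT ... FOR UPDATE SKIP LOCKED).

```python
import sqlite3


def make_queue() -> sqlite3.Connection:
    """Create an in-memory run queue (a stand-in for a shared database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        " id INTEGER PRIMARY KEY,"
        " thread_id TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, thread_id: str) -> int:
    """Add a pending run for a thread and return its ID."""
    cur = conn.execute("INSERT INTO runs (thread_id) VALUES (?)", (thread_id,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection, worker: str):
    """Claim the oldest pending run whose thread has no run in progress."""
    row = conn.execute(
        "SELECT r.id, r.thread_id FROM runs AS r"
        " WHERE r.status = 'pending'"
        " AND NOT EXISTS (SELECT 1 FROM runs AS busy"
        "  WHERE busy.thread_id = r.thread_id AND busy.status = 'running')"
        " ORDER BY r.id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The status check makes the claim a no-op if the run was already taken.
    cur = conn.execute(
        "UPDATE runs SET status = 'running', worker = ?"
        " WHERE id = ? AND status = 'pending'",
        (worker, row[0]),
    )
    conn.commit()
    return row if cur.rowcount == 1 else None


def finish(conn: sqlite3.Connection, run_id: int) -> None:
    """Mark a run as done so later runs on the same thread become claimable."""
    conn.execute("UPDATE runs SET status = 'done' WHERE id = ?", (run_id,))
    conn.commit()


queue = make_queue()
for thread in ("t1", "t1", "t2"):
    enqueue(queue, thread)

first = claim_next(queue, "worker-a")   # oldest pending run, on thread t1
second = claim_next(queue, "worker-b")  # skips t1 (busy) and takes the t2 run
third = claim_next(queue, "worker-c")   # nothing claimable until t1 finishes
finish(queue, first[0])
fourth = claim_next(queue, "worker-c")  # the second t1 run is now claimable
print(first, second, third, fourth)
```

Compared with sticky sessions, this design lets any worker pick up any thread, at the cost of one extra query per claim.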

FAQ

How much does LangGraph Cloud cost?

LangGraph Cloud pricing is based on compute time and storage. Check the LangSmith pricing page for current rates. For high-volume deployments, self-hosting with PostgreSQL and your own compute is typically more cost-effective.

Can I deploy multiple graph versions simultaneously?

Yes. LangGraph Cloud supports versioned deployments. You can route traffic between versions using assistant IDs, enabling canary deployments and A/B testing of different agent configurations.

How do I handle secrets and API keys in production?

Never hardcode secrets. Use environment variables configured through the .env file referenced in langgraph.json or through your cloud provider's secrets management. LangGraph Cloud encrypts environment variables at rest and injects them at runtime.
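
One way to apply this is to read every required secret from the environment at startup and fail immediately if any is missing, rather than failing on the first request that needs it. The sketch below is illustrative: load_secrets is a hypothetical helper, and the variable names are assumptions based on the examples in this article (OPENAI_API_KEY for ChatOpenAI, DATABASE_URL for the PostgreSQL checkpointer).

```python
import os

# Assumed names; adjust to whatever your agent actually needs.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "DATABASE_URL")


def load_secrets(required=REQUIRED_SECRETS, environ=None) -> dict[str, str]:
    """Return the required secrets, raising if any is missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in required}


# Example with an explicit mapping; in production, call load_secrets()
# with no arguments so it reads os.environ.
example = load_secrets(
    environ={"OPENAI_API_KEY": "placeholder", "DATABASE_URL": "postgresql://example"}
)
print(sorted(example))
```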

