Production LangGraph: Deploying Stateful Agents with LangGraph Cloud

From Development to Production

Building a LangGraph agent locally is straightforward. Running one in production — handling concurrent users, persisting state across restarts, monitoring execution, recovering from failures, and scaling under load — requires careful architecture. LangGraph Cloud provides managed infrastructure for deploying stateful agents, but you can also self-host with the right patterns.

Structuring Your Project for Deployment

LangGraph Cloud expects a specific project layout:

flowchart TD
    USER(["User input"])
    SUPER["Supervisor node<br/>routes by state"]
    A["Specialist node A<br/>research"]
    B["Specialist node B<br/>writing"]
    TOOL{"Tool call<br/>needed?"}
    EXEC["Tool executor<br/>ToolNode"]
    CHK[("Postgres<br/>checkpointer")]
    INT{"interrupt for<br/>human approval?"}
    HUMAN(["Human reviewer"])
    OUT(["Final response"])
    USER --> SUPER
    SUPER --> A
    SUPER --> B
    A --> TOOL
    B --> TOOL
    TOOL -->|Yes| EXEC --> SUPER
    TOOL -->|No| INT
    INT -->|Yes| HUMAN --> SUPER
    INT -->|No| OUT
    SUPER <--> CHK
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CHK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
    style HUMAN fill:#f59e0b,stroke:#d97706,color:#1f2937

# langgraph.json — deployment configuration
{
    "dependencies": ["."],
    "graphs": {
        "my_agent": "./agent/graph.py:graph"
    },
    "env": ".env"
}

The graphs field maps endpoint names to compiled graph objects. Your graph module exports the compiled graph:

# agent/graph.py
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

@tool
def lookup_order(order_id: str) -> str:
    """Look up order details by ID."""
    # Production implementation here
    return f"Order {order_id}: shipped, arriving March 20"

tools = [lookup_order]
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
tool_node = ToolNode(tools)

def agent(state: AgentState) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}

def should_continue(state: AgentState):
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"

builder = StateGraph(AgentState)
builder.add_node("agent", agent)
builder.add_node("tools", tool_node)
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END,
})
builder.add_edge("tools", "agent")

graph = builder.compile()

Deploying to LangGraph Cloud

Deploy using the LangGraph CLI:

pip install langgraph-cli
langgraph up

This starts a local development server. For cloud deployment:

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

langgraph deploy --project my-agent

The deployment creates API endpoints for your graph with built-in persistence, streaming, and thread management.

API Endpoints

Once deployed, LangGraph Cloud exposes REST endpoints:

# Create a new thread
curl -X POST https://your-deployment.langgraph.app/threads \
  -H "Content-Type: application/json" \
  -d '{}'

# Run the agent on a thread
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "Track order 12345"}]
    }
  }'

# Stream responses
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs/stream \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "What is the status?"}]
    },
    "stream_mode": "messages"
  }'

Using the Python SDK

The LangGraph SDK provides a typed client for interacting with deployed agents:

from langgraph_sdk import get_client

client = get_client(url="https://your-deployment.langgraph.app")

# Create a thread
thread = await client.threads.create()

# Run the agent
result = await client.runs.create(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Track order 12345"}]},
)

# Stream responses
async for chunk in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Any updates?"}]},
    stream_mode="messages",
):
    print(chunk)

Cron Triggers for Scheduled Agents

Run agents on a schedule for monitoring, reporting, or maintenance tasks:

# langgraph.json
{
    "dependencies": ["."],
    "graphs": {
        "monitor": "./agent/monitor.py:graph"
    },
    "crons": {
        "daily_report": {
            "graph": "monitor",
            "schedule": "0 9 * * *",
            "input": {
                "messages": [{"role": "human", "content": "Generate daily status report"}]
            }
        }
    }
}

The cron trigger creates a new thread for each execution, runs the graph, and stores the result. You can query past cron runs through the API.

Monitoring and Observability

LangGraph integrates with LangSmith for tracing and monitoring:

# Set environment variables for tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "production-agent"

Every graph execution is traced end-to-end, showing node timings, LLM calls, tool invocations, and state transitions. Set up alerts for error rates, latency spikes, and token usage.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Self-Hosted Production Patterns

If you prefer to self-host rather than use LangGraph Cloud, here are the essential patterns:

# Use PostgreSQL for production checkpointing
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = os.environ["DATABASE_URL"]

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()
    graph = builder.compile(checkpointer=checkpointer)

    # Wrap in FastAPI for HTTP access
    from fastapi import FastAPI
    app = FastAPI()

    @app.post("/chat/{thread_id}")
    async def chat(thread_id: str, message: str):
        config = {"configurable": {"thread_id": thread_id}}
        result = await graph.ainvoke(
            {"messages": [{"role": "human", "content": message}]},
            config,
        )
        return {"response": result["messages"][-1].content}

Use PostgreSQL for state persistence, Redis for caching, and a process manager like Gunicorn with Uvicorn workers for concurrency.

Scaling Considerations

Stateful agents require careful scaling. Each thread is independent, so you can distribute threads across workers. But a single thread's execution must happen on one worker since the in-progress state is in memory. Use sticky sessions or a queue-based architecture where each run is claimed by exactly one worker.

FAQ

How much does LangGraph Cloud cost?

LangGraph Cloud pricing is based on compute time and storage. Check the LangSmith pricing page for current rates. For high-volume deployments, self-hosting with PostgreSQL and your own compute is typically more cost-effective.

Can I deploy multiple graph versions simultaneously?

Yes. LangGraph Cloud supports versioned deployments. You can route traffic between versions using assistant IDs, enabling canary deployments and A/B testing of different agent configurations.

How do I handle secrets and API keys in production?

Never hardcode secrets. Use environment variables configured through the .env file referenced in langgraph.json or through your cloud provider's secrets management. LangGraph Cloud encrypts environment variables at rest and injects them at runtime.

#LangGraph #Production #Deployment #LangGraphCloud #Python #AgenticAI #LearnAI #AIEngineering

Production LangGraph: Deploying Stateful Agents with LangGraph Cloud

From Development to Production

Structuring Your Project for Deployment

Deploying to LangGraph Cloud

API Endpoints

Using the Python SDK

Cron Triggers for Scheduled Agents

Monitoring and Observability

Self-Hosted Production Patterns

Scaling Considerations

FAQ

How much does LangGraph Cloud cost?

Can I deploy multiple graph versions simultaneously?

How do I handle secrets and API keys in production?

Try CallSphere AI Voice Agents

Related Articles You May Like

Building an HVAC After-Hours Emergency Escalation System: A Complete Engineering Guide

Azure AI Foundry + GPT-Realtime-2: Practical Deployment Guide

GPT-Realtime-2 128K Context: What It Unlocks for Voice Agents

Human-in-the-Loop Hybrid Agents: 73% Fewer Errors in 2026

Evaluating Agent Memory: Recall, Precision, and the Eval Pipeline Most Teams Don't Build

Streaming Agent Responses with OpenAI Agents SDK and LangChain in 2026