---
title: "Production LangGraph: Deploying Stateful Agents with LangGraph Cloud"
description: "Deploy LangGraph agents to production using LangGraph Cloud with API endpoints, cron triggers, monitoring, scaling strategies, and operational best practices for stateful agent workflows."
canonical: https://callsphere.ai/blog/production-langgraph-deploying-stateful-agents-langgraph-cloud
category: "Learn Agentic AI"
tags: ["LangGraph", "Production", "Deployment", "LangGraph Cloud", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T22:40:58.679Z
---

# Production LangGraph: Deploying Stateful Agents with LangGraph Cloud

> Deploy LangGraph agents to production using LangGraph Cloud with API endpoints, cron triggers, monitoring, scaling strategies, and operational best practices for stateful agent workflows.

## From Development to Production

Building a LangGraph agent locally is straightforward. Running one in production — handling concurrent users, persisting state across restarts, monitoring execution, recovering from failures, and scaling under load — requires careful architecture. LangGraph Cloud provides managed infrastructure for deploying stateful agents, but you can also self-host with the right patterns.

## Structuring Your Project for Deployment

Before the configuration files, here is the shape of the agent we'll deploy: a supervisor that routes between specialist nodes, a tool executor, a Postgres checkpointer for persistence, and an optional human-approval interrupt:

```mermaid
flowchart TD
    USER(["User input"])
    SUPER["Supervisor node
routes by state"]
    A["Specialist node A
research"]
    B["Specialist node B
writing"]
    TOOL{"Tool call
needed?"}
    EXEC["Tool executor
ToolNode"]
    CHK[("Postgres
checkpointer")]
    INT{"interrupt for
human approval?"}
    HUMAN(["Human reviewer"])
    OUT(["Final response"])
    USER --> SUPER
    SUPER --> A
    SUPER --> B
    A --> TOOL
    B --> TOOL
    TOOL -->|Yes| EXEC --> SUPER
    TOOL -->|No| INT
    INT -->|Yes| HUMAN --> SUPER
    INT -->|No| OUT
    SUPER  CHK
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CHK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
    style HUMAN fill:#f59e0b,stroke:#d97706,color:#1f2937
```

LangGraph Cloud expects a specific project layout, anchored by a `langgraph.json` configuration file at the project root:

```json
{
    "dependencies": ["."],
    "graphs": {
        "my_agent": "./agent/graph.py:graph"
    },
    "env": ".env"
}
```

The `graphs` field maps endpoint names to compiled graph objects. Your graph module exports the compiled graph:

```python
# agent/graph.py
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

@tool
def lookup_order(order_id: str) -> str:
    """Look up order details by ID."""
    # Production implementation here
    return f"Order {order_id}: shipped, arriving March 20"

tools = [lookup_order]
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
tool_node = ToolNode(tools)

def agent(state: AgentState) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}

def should_continue(state: AgentState):
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"

builder = StateGraph(AgentState)
builder.add_node("agent", agent)
builder.add_node("tools", tool_node)
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END,
})
builder.add_edge("tools", "agent")

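# Compile without a checkpointer here: when deployed, LangGraph Cloud
# attaches its own persistence layer to the graph automatically.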
graph = builder.compile()
```
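
Before deploying, it's worth a quick in-process smoke test of the compiled graph (requires `OPENAI_API_KEY` in the environment; the order ID is just sample input):

```python
# Invoke the compiled graph once and print the assistant's reply
from agent.graph import graph

result = graph.invoke(
    {"messages": [{"role": "human", "content": "Track order 12345"}]}
)
print(result["messages"][-1].content)
```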

## Deploying to LangGraph Cloud

Deploy using the LangGraph CLI:

```bash
pip install langgraph-cli
langgraph up
```

This builds and runs the API server locally in Docker, so you can exercise the deployment image before shipping. For cloud deployment:

```bash
langgraph deploy --project my-agent
```

The deployment creates API endpoints for your graph with built-in persistence, streaming, and thread management.

## API Endpoints

Once deployed, LangGraph Cloud exposes REST endpoints:

```bash
# Create a new thread
curl -X POST https://your-deployment.langgraph.app/threads \
  -H "Content-Type: application/json" \
  -d '{}'

# Run the agent on a thread
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "Track order 12345"}]
    }
  }'

# Stream responses
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs/stream \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "What is the status?"}]
    },
    "stream_mode": "messages"
  }'
```

## Using the Python SDK

The LangGraph SDK provides a typed client for interacting with deployed agents:

```python
import asyncio

from langgraph_sdk import get_client

client = get_client(url="https://your-deployment.langgraph.app")

async def main() -> None:
    # Create a thread to hold conversation state
    thread = await client.threads.create()

    # Run the agent; runs.wait blocks until the run finishes and
    # returns the final graph state (runs.create would only schedule it)
    result = await client.runs.wait(
        thread["thread_id"],
        assistant_id="my_agent",
        input={"messages": [{"role": "human", "content": "Track order 12345"}]},
    )
    print(result["messages"][-1]["content"])

    # Stream responses as they are produced
    async for chunk in client.runs.stream(
        thread["thread_id"],
        assistant_id="my_agent",
        input={"messages": [{"role": "human", "content": "Any updates?"}]},
        stream_mode="messages",
    ):
        print(chunk)

asyncio.run(main())
```

## Cron Triggers for Scheduled Agents

Run agents on a schedule for monitoring, reporting, or maintenance tasks by declaring cron jobs in `langgraph.json`:

```json
{
    "dependencies": ["."],
    "graphs": {
        "monitor": "./agent/monitor.py:graph"
    },
    "crons": {
        "daily_report": {
            "graph": "monitor",
            "schedule": "0 9 * * *",
            "input": {
                "messages": [{"role": "human", "content": "Generate daily status report"}]
            }
        }
    }
}
```

The cron trigger creates a new thread for each execution, runs the graph, and stores the result. You can query past cron runs through the API.
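
You can also manage crons at runtime through the SDK's crons client rather than static configuration. A minimal sketch, assuming the `monitor` assistant from the deployment above (run inside an async context):

```python
from langgraph_sdk import get_client

client = get_client(url="https://your-deployment.langgraph.app")

# Register a cron that runs the monitor graph daily at 09:00 UTC
cron = await client.crons.create(
    assistant_id="monitor",
    schedule="0 9 * * *",
    input={"messages": [{"role": "human", "content": "Generate daily status report"}]},
)

# Inspect registered crons to verify the schedule took effect
existing = await client.crons.search()
```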

## Monitoring and Observability

LangGraph integrates with LangSmith for tracing and monitoring:

```python
# Enable LangSmith tracing (in production, prefer setting these in the
# deployment environment or secrets manager instead of in code)
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "production-agent"
```

Every graph execution is traced end-to-end, showing node timings, LLM calls, tool invocations, and state transitions. Set up alerts for error rates, latency spikes, and token usage.
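
Tags and metadata attached to each run make those alerts and dashboards sliceable by deployment, customer, or experiment. A sketch using the standard `config` fields (the key names here are illustrative):

```python
config = {
    "configurable": {"thread_id": "thread-42"},
    # Tags and metadata are indexed by LangSmith for filtering and alerting
    "tags": ["production", "v2"],
    "metadata": {"customer_id": "acme", "release": "2026-03-17"},
}

result = graph.invoke(
    {"messages": [{"role": "human", "content": "Track order 12345"}]},
    config,
)
```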

## Self-Hosted Production Patterns

If you prefer to self-host rather than use LangGraph Cloud, here are the essential patterns:

```python
# Use PostgreSQL for production checkpointing
# (pip install langgraph-checkpoint-postgres psycopg)
import os

from fastapi import FastAPI
from langgraph.checkpoint.postgres import PostgresSaver

from agent.graph import builder  # the StateGraph builder defined earlier

DB_URI = os.environ["DATABASE_URL"]

# The context manager owns the Postgres connection pool, so it must
# stay open for the lifetime of the server process.
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create checkpoint tables on first run
    graph = builder.compile(checkpointer=checkpointer)

    app = FastAPI()

    # Sync endpoint: FastAPI runs it in a worker thread, which matches
    # the sync PostgresSaver. For a fully async stack, use
    # AsyncPostgresSaver with graph.ainvoke instead.
    @app.post("/chat/{thread_id}")
    def chat(thread_id: str, message: str):
        # thread_id scopes the conversation state in Postgres
        config = {"configurable": {"thread_id": thread_id}}
        result = graph.invoke(
            {"messages": [{"role": "human", "content": message}]},
            config,
        )
        return {"response": result["messages"][-1].content}
```

Use PostgreSQL for state persistence, Redis for caching, and a process manager like Gunicorn with Uvicorn workers for concurrency.
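
A minimal `gunicorn.conf.py` sketch for the service above, assuming the FastAPI app is importable as `app:app` (the worker count is a starting point, not a rule):

```python
# gunicorn.conf.py: start with `gunicorn -c gunicorn.conf.py app:app`
bind = "0.0.0.0:8000"
workers = 4  # tune against real load; a common baseline is 2 x CPU cores + 1
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker for FastAPI
timeout = 120  # agent turns can run long; avoid killing in-flight requests
```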

## Scaling Considerations

Stateful agents require careful scaling. Each thread is independent, so you can distribute threads across workers. But a single thread's execution must happen on one worker since the in-progress state is in memory. Use sticky sessions or a queue-based architecture where each run is claimed by exactly one worker.
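
As a sketch of the queue-based option, each worker atomically claims a run before executing it, so exactly one worker ever processes a given run. Here the claim is a Redis `SET NX` key (the key names, TTL, and `WORKER_ID` are illustrative):

```python
import os

import redis

r = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
WORKER_ID = os.environ.get("WORKER_ID", "worker-1")

def try_claim_run(run_id: str, ttl_seconds: int = 300) -> bool:
    """Atomically claim a run; returns False if another worker holds it."""
    # SET ... NX succeeds only if the key does not already exist, so
    # exactly one worker wins the claim. The TTL releases the claim
    # automatically if the worker dies mid-run.
    return bool(r.set(f"run-claim:{run_id}", WORKER_ID, nx=True, ex=ttl_seconds))

if try_claim_run("run-123"):
    ...  # safe to execute the run on this worker
```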

## FAQ

### How much does LangGraph Cloud cost?

LangGraph Cloud pricing is based on compute time and storage. Check the LangSmith pricing page for current rates. For high-volume deployments, self-hosting with PostgreSQL and your own compute is typically more cost-effective.

### Can I deploy multiple graph versions simultaneously?

Yes. LangGraph Cloud supports versioned deployments. You can route traffic between versions using assistant IDs, enabling canary deployments and A/B testing of different agent configurations.
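
A sketch of that routing, assuming the SDK client from earlier and two configurations of the same graph (the `model` config key is illustrative; it only matters if your graph reads it):

```python
# Two assistants backed by the same deployed graph
stable = await client.assistants.create(
    graph_id="my_agent",
    config={"configurable": {"model": "gpt-4o-mini"}},
    name="my-agent-stable",
)
canary = await client.assistants.create(
    graph_id="my_agent",
    config={"configurable": {"model": "gpt-4o"}},
    name="my-agent-canary",
)

# Send a slice of traffic to the canary by picking its assistant_id per run
```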

### How do I handle secrets and API keys in production?

Never hardcode secrets. Use environment variables configured through the `.env` file referenced in `langgraph.json` or through your cloud provider's secrets management. LangGraph Cloud encrypts environment variables at rest and injects them at runtime.
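
A small fail-fast pattern catches missing secrets at startup rather than mid-conversation (the variable name is just an example):

```python
import os

def require_env(name: str) -> str:
    """Read a required secret from the environment, failing loudly at startup."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```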

---

#LangGraph #Production #Deployment #LangGraphCloud #Python #AgenticAI #LearnAI #AIEngineering

