LangGraph Checkpointing: Persistence and Time Travel for Agent Workflows

Why Checkpointing Matters

Without checkpointing, a LangGraph workflow is ephemeral. If the process crashes mid-execution, all state is lost and you must start over. Checkpointing solves this by saving the graph state after every node execution. This enables three critical capabilities: crash recovery, conversation memory across sessions, and time travel to inspect or replay past states.

MemorySaver: In-Memory Checkpointing

The simplest checkpointer stores state in a Python dictionary. It is perfect for development and testing:

flowchart TD
    USER(["User input"])
    SUPER["Supervisor node<br/>routes by state"]
    A["Specialist node A<br/>research"]
    B["Specialist node B<br/>writing"]
    TOOL{"Tool call<br/>needed?"}
    EXEC["Tool executor<br/>ToolNode"]
    CHK[("Postgres<br/>checkpointer")]
    INT{"interrupt for<br/>human approval?"}
    HUMAN(["Human reviewer"])
    OUT(["Final response"])
    USER --> SUPER
    SUPER --> A
    SUPER --> B
    A --> TOOL
    B --> TOOL
    TOOL -->|Yes| EXEC --> SUPER
    TOOL -->|No| INT
    INT -->|Yes| HUMAN --> SUPER
    INT -->|No| OUT
    SUPER <--> CHK
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CHK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
    style HUMAN fill:#f59e0b,stroke:#d97706,color:#1f2937

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

def echo(state: State) -> dict:
    last = state["messages"][-1].content
    return {"messages": [{"role": "assistant", "content": f"Echo: {last}"}]}

builder = StateGraph(State)
builder.add_node("echo", echo)
builder.add_edge(START, "echo")
builder.add_edge("echo", END)

memory = MemorySaver()
graph = builder.compile(checkpointer=memory)

All state is lost when the process exits. Use this only for development.

Thread IDs for Conversation Isolation

Each conversation gets its own thread ID. This lets multiple users share the same graph instance:

from langchain_core.messages import HumanMessage

# Conversation 1
config1 = {"configurable": {"thread_id": "user-alice"}}
graph.invoke({"messages": [HumanMessage(content="Hi, I'm Alice")]}, config1)
graph.invoke({"messages": [HumanMessage(content="What's my name?")]}, config1)

# Conversation 2 — completely isolated
config2 = {"configurable": {"thread_id": "user-bob"}}
graph.invoke({"messages": [HumanMessage(content="Hi, I'm Bob")]}, config2)

Each thread maintains its own state history. Alice and Bob never see each other's messages.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

SqliteSaver: Persistent Local Storage

For persistence that survives process restarts, use the SQLite checkpointer:

from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3

conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
sqlite_saver = SqliteSaver(conn)

graph = builder.compile(checkpointer=sqlite_saver)

# State persists to disk
config = {"configurable": {"thread_id": "persistent-thread"}}
graph.invoke({"messages": [HumanMessage(content="Remember this")]}, config)

# Later, even after restart, the conversation continues
result = graph.invoke(
    {"messages": [HumanMessage(content="What did I say?")]},
    config,
)

The SQLite file contains the full state history for every thread, including all intermediate checkpoints.

PostgresSaver: Production Persistence

For production deployments, use PostgreSQL:

from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:password@localhost:5432/langgraph_db"

with PostgresSaver.from_conn_string(DB_URI) as pg_saver:
    pg_saver.setup()  # Creates tables on first run
    graph = builder.compile(checkpointer=pg_saver)

    config = {"configurable": {"thread_id": "prod-session-123"}}
    result = graph.invoke(
        {"messages": [HumanMessage(content="Process this order")]},
        config,
    )

PostgresSaver handles concurrent access, transactions, and connection pooling. Call setup() once to create the required checkpoint tables.

Time Travel: Inspecting Past States

Every node execution creates a checkpoint. You can list and inspect all checkpoints for a thread:

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

config = {"configurable": {"thread_id": "my-thread"}}

# Get current state
current = graph.get_state(config)
print("Current messages:", len(current.values["messages"]))

# List all checkpoints (state history)
history = list(graph.get_state_history(config))
for i, state in enumerate(history):
    print(f"Checkpoint {i}: {len(state.values['messages'])} messages")
    print(f"  Created by node: {state.metadata.get('source', 'unknown')}")

Replaying from a Past Checkpoint

You can resume execution from any historical checkpoint by providing its ID:

# Get the second-to-last checkpoint
history = list(graph.get_state_history(config))
past_state = history[2]  # Go back two steps

# Resume from that point with new input
past_config = {
    "configurable": {
        "thread_id": "my-thread",
        "checkpoint_id": past_state.config["configurable"]["checkpoint_id"],
    }
}

result = graph.invoke(
    {"messages": [HumanMessage(content="Try a different approach")]},
    past_config,
)

This creates a new branch in the state history. The original checkpoints remain untouched, giving you a full audit trail of every execution path.

FAQ

Does checkpointing add significant overhead?

MemorySaver adds negligible overhead. SqliteSaver and PostgresSaver add serialization and I/O time proportional to state size. For typical chat agents with dozens of messages, each checkpoint takes a few milliseconds. For agents with very large state objects, consider keeping state lean and storing bulk data externally.

Can I delete old checkpoints to save storage?

There is no built-in pruning API in the core library. For PostgresSaver, you can write SQL queries to delete checkpoints older than a retention period. For SqliteSaver, you can run a cleanup job against the database file directly.

Is the checkpoint format portable between saver backends?

No. Each saver serializes state in its own format. You cannot migrate checkpoints from SQLite to PostgreSQL directly. If you need to migrate, you would read state from one saver and write it to another programmatically.

#LangGraph #Checkpointing #Persistence #TimeTravel #Python #AgenticAI #LearnAI #AIEngineering

LangGraph Checkpointing: Persistence and Time Travel for Agent Workflows

Why Checkpointing Matters

MemorySaver: In-Memory Checkpointing

Thread IDs for Conversation Isolation

SqliteSaver: Persistent Local Storage

PostgresSaver: Production Persistence

Time Travel: Inspecting Past States

Replaying from a Past Checkpoint

FAQ

Does checkpointing add significant overhead?

Can I delete old checkpoints to save storage?

Is the checkpoint format portable between saver backends?

Try CallSphere AI Voice Agents

Related Articles You May Like

Building an HVAC After-Hours Emergency Escalation System: A Complete Engineering Guide

Human-in-the-Loop Hybrid Agents: 73% Fewer Errors in 2026

Evaluating Agent Memory: Recall, Precision, and the Eval Pipeline Most Teams Don't Build

Streaming Agent Responses with OpenAI Agents SDK and LangChain in 2026

Agentic RAG with LangGraph: Iterative Retrieval, Self-Correction, and Eval Pipelines

Token-Level Evaluation of Streaming Agents: TTFT, Stream Smoothness, and Mid-Stream Hallucination Detection