---
title: "LangChain Memory: ConversationBufferMemory, Summary, and Vector Store Memory"
description: "Explore LangChain's memory types for building conversational AI — from simple buffer memory to summarization and vector-store-backed long-term memory with persistence strategies."
canonical: https://callsphere.ai/blog/langchain-memory-conversation-buffer-summary-vector-store
category: "Learn Agentic AI"
tags: ["LangChain", "Memory", "Conversational AI", "Vector Store", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T19:20:37.685Z
---

# LangChain Memory: ConversationBufferMemory, Summary, and Vector Store Memory

> Explore LangChain's memory types for building conversational AI — from simple buffer memory to summarization and vector-store-backed long-term memory with persistence strategies.

## Why Agents Need Memory

Large language models are stateless. Each API call starts fresh with no knowledge of previous interactions. For multi-turn conversations or agents that need to reference past information, you must explicitly manage state. LangChain provides memory abstractions that handle this — storing conversation history, summarizing it, or persisting it in a vector store for semantic retrieval.

Understanding the tradeoffs between memory types is essential. Too much context fills your token window and increases costs. Too little context makes the assistant forget important details mid-conversation.

## ConversationBufferMemory

The simplest memory type stores every message verbatim. (Note: in recent LangChain releases the classic `langchain.memory` classes shown in this section are deprecated in favor of `RunnableWithMessageHistory` and LangGraph persistence — see the LCEL section below — but they still work and remain the clearest way to learn the concepts.)

```mermaid
flowchart TD
    U(["User message"])
    BUF[("Message buffer
full history, verbatim")]
    PROMPT["Prompt assembly
history plus new input"]
    LLM["Chat model"]
    RESP(["AI response"])
    U --> BUF
    BUF --> PROMPT --> LLM --> RESP
    RESP --> BUF
    style BUF fill:#4f46e5,stroke:#4338ca,color:#fff
    style RESP fill:#059669,stroke:#047857,color:#fff
```

```python
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

memory = ConversationBufferMemory(return_messages=True)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = ConversationChain(llm=llm, memory=memory, verbose=True)

chain.invoke({"input": "My name is Alice."})
chain.invoke({"input": "What is my name?"})
# The model correctly responds "Alice" because it sees the full history
```

`return_messages=True` stores history as message objects rather than a single string, which is preferred for chat models. The downside is obvious: as the conversation grows, you eventually exceed the model's context window.
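The mechanism itself is simple enough to sketch in a few lines of plain Python. This is an illustrative toy, not LangChain's internals — `BufferMemory` and its methods are hypothetical names:

```python
class BufferMemory:
    """Minimal sketch of verbatim buffer memory: every turn is kept."""

    def __init__(self):
        self.messages = []  # list of (role, content) tuples

    def save_context(self, user_input, ai_output):
        self.messages.append(("human", user_input))
        self.messages.append(("ai", ai_output))

    def load_history(self):
        # Everything is returned verbatim; token usage grows without bound.
        return list(self.messages)

memory = BufferMemory()
memory.save_context("My name is Alice.", "Nice to meet you, Alice!")
memory.save_context("What is my name?", "Your name is Alice.")
print(len(memory.load_history()))  # 4 messages: 2 human, 2 ai
```

Because every turn is appended and nothing is ever dropped, the prompt the model sees grows linearly with conversation length — which is exactly why the variants below exist.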

## ConversationBufferWindowMemory

This variant keeps only the last `k` turns, discarding older messages.

```python
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5, return_messages=True)
```

Setting `k=5` retains the most recent 5 exchanges. This bounds token usage but means the agent will forget information from earlier in the conversation.
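The windowing logic amounts to list slicing. A sketch, where `trim_window` is a hypothetical helper rather than a LangChain function:

```python
def trim_window(messages, k):
    """Keep only the last k exchanges (2*k messages: one human + one ai per turn)."""
    return messages[-2 * k:]

# 10 turns of alternating human/ai messages
history = [("human" if i % 2 == 0 else "ai", f"msg{i}") for i in range(20)]
recent = trim_window(history, k=5)
print(len(recent))  # 10 — everything earlier is forgotten
```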

## ConversationSummaryMemory

Instead of dropping old messages, this memory type summarizes the conversation history using an LLM. The summary is updated after each turn.

```python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

memory = ConversationSummaryMemory(
    llm=llm,
    return_messages=True,
)

# After many turns, instead of storing all messages,
# the memory holds a running summary like:
# "The user's name is Alice. She asked about Python decorators
#  and was interested in async patterns."
```

The tradeoff is that summarization costs extra LLM calls and may lose nuance. It works well for long conversations where the gist matters more than exact wording.
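The update loop can be sketched with a stub in place of the LLM. Here `fake_summarize` just concatenates text; a real implementation would prompt a model with the previous summary and the new turns:

```python
def fake_summarize(previous_summary, new_turns):
    """Stand-in for the LLM call that folds new turns into the running summary."""
    lines = [f"{role}: {text}" for role, text in new_turns]
    return (previous_summary + " " + " | ".join(lines)).strip()

summary = ""
for turn in [("human", "My name is Alice"), ("ai", "Hi Alice!")]:
    summary = fake_summarize(summary, [turn])

# The prompt carries one compact summary string instead of the full transcript
print(summary)
```

Each turn costs one extra summarization call, which is the overhead the paragraph above refers to.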

## ConversationSummaryBufferMemory

This hybrid keeps recent messages in full while summarizing older ones. You set a `max_token_limit` — once the buffer exceeds that limit, the oldest messages are summarized.

```python
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,
    return_messages=True,
)
```

This gives you the best of both worlds: precise recent context and compressed long-term context.
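The trimming logic can be sketched in plain Python. A toy `count_tokens` and an injected `summarize` callable stand in for the real tokenizer and LLM call:

```python
def count_tokens(text):
    """Crude estimate: ~1 token per whitespace-separated word.
    Real implementations use the model's tokenizer."""
    return len(text.split())

def enforce_limit(summary, messages, max_tokens, summarize):
    """Pop the oldest messages into the summary until the buffer fits."""
    overflow = []
    while messages and sum(count_tokens(m[1]) for m in messages) > max_tokens:
        overflow.append(messages.pop(0))
    if overflow:
        summary = summarize(summary, overflow)
    return summary, messages

summary, buffer = enforce_limit(
    summary="",
    messages=[("human", "one two three"), ("ai", "four five"), ("human", "six")],
    max_tokens=3,
    summarize=lambda s, msgs: (s + " " + " ".join(m[1] for m in msgs)).strip(),
)
print(summary)  # "one two three" — the oldest message was compressed
print(buffer)   # the two recent messages survive verbatim
```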

## Vector Store Memory

For agents that need to recall specific facts from potentially thousands of past interactions, vector store memory embeds conversation snippets and retrieves them via semantic search.

```python
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Create or load a vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["placeholder"], embeddings)  # FAISS cannot be built from an empty list
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

memory = VectorStoreRetrieverMemory(retriever=retriever)

# Save facts
memory.save_context(
    {"input": "I prefer Python over JavaScript"},
    {"output": "Noted, you prefer Python."},
)
memory.save_context(
    {"input": "My project deadline is March 30th"},
    {"output": "Got it, your deadline is March 30th."},
)

# Later, only semantically relevant memories are retrieved
relevant = memory.load_memory_variables(
    {"input": "What programming language should we use?"}
)
print(relevant)
# Returns the Python preference memory, not the deadline memory
```

Vector store memory scales to thousands of interactions because retrieval is based on relevance, not recency.
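The retrieval step is cosine similarity over embedding vectors. A toy sketch with hand-picked 3-dimensional vectors (real embeddings have ~1536 dimensions, and a real store uses an ANN index rather than a linear scan):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical stored memories with toy embedding vectors
memories = {
    "prefers Python over JavaScript": [0.9, 0.1, 0.0],
    "deadline is March 30th":         [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    ranked = sorted(memories, key=lambda m: cosine(memories[m], query_vec), reverse=True)
    return ranked[:k]

# A "which language?" query embeds near the first memory, not the second
print(retrieve([0.8, 0.2, 0.1]))  # ['prefers Python over JavaScript']
```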

## Memory with LCEL Chains

In modern LCEL-based chains, you typically manage history explicitly using `RunnableWithMessageHistory`.

```python
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),
    ("human", "{input}"),
])

chain = prompt | ChatOpenAI(model="gpt-4o-mini")

with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Each session maintains its own history
response = with_history.invoke(
    {"input": "My name is Bob"},
    config={"configurable": {"session_id": "user-123"}},
)
```

This approach gives you full control over where history is stored — in memory, Redis, a database, or any custom backend.

## FAQ

### Which memory type should I use for a production chatbot?

For most production chatbots, start with `ConversationSummaryBufferMemory` or the LCEL `RunnableWithMessageHistory` with a persistent backend like Redis or PostgreSQL. The summary buffer approach balances cost, context window usage, and information retention. For applications that need to recall specific facts across many sessions, add vector store memory.

### Can I combine multiple memory types?

Yes. A common pattern is to use buffer memory for the current conversation and vector store memory for cross-session recall. You can inject both into the prompt — recent messages from the buffer and relevant past facts from the vector store.
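Combining the two sources ultimately means assembling one prompt from both. A minimal sketch, where `build_prompt` is a hypothetical helper and the inputs come from whichever memory objects you use:

```python
def build_prompt(system, recent_messages, recalled_facts, user_input):
    """Merge buffer history and vector-store recall into a single prompt string."""
    facts = "\n".join(f"- {f}" for f in recalled_facts)
    history = "\n".join(f"{role}: {text}" for role, text in recent_messages)
    return (
        f"{system}\n\n"
        f"Relevant facts from past sessions:\n{facts}\n\n"
        f"Recent conversation:\n{history}\n\n"
        f"human: {user_input}"
    )

prompt = build_prompt(
    system="You are a helpful assistant.",
    recent_messages=[("human", "Hi"), ("ai", "Hello!")],
    recalled_facts=["User prefers Python over JavaScript"],
    user_input="What language should we use?",
)
print(prompt)
```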

### How do I persist memory across server restarts?

In-memory stores like `ChatMessageHistory` are lost on restart. Use persistent backends: `RedisChatMessageHistory`, `SQLChatMessageHistory`, or implement a custom `BaseChatMessageHistory` class that reads from and writes to your database.
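The shape of such a backend can be sketched with SQLite and the standard library. This class is illustrative — a real LangChain integration would subclass `BaseChatMessageHistory` and implement its interface, but the storage logic is the same:

```python
import json
import sqlite3

class SQLiteMessageHistory:
    """Sketch of a persistent chat history keyed by session id."""

    def __init__(self, path, session_id):
        self.conn = sqlite3.connect(path)
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS history (session TEXT, message TEXT)"
        )

    def add_message(self, role, content):
        self.conn.execute(
            "INSERT INTO history VALUES (?, ?)",
            (self.session_id, json.dumps({"role": role, "content": content})),
        )
        self.conn.commit()

    def messages(self):
        rows = self.conn.execute(
            "SELECT message FROM history WHERE session = ?", (self.session_id,)
        ).fetchall()
        return [json.loads(r[0]) for r in rows]

h = SQLiteMessageHistory(":memory:", "user-123")
h.add_message("human", "My name is Bob")
# With a file path instead of ":memory:", rows survive process restarts
```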

---

#LangChain #Memory #ConversationalAI #VectorStore #Python #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/langchain-memory-conversation-buffer-summary-vector-store
