
mem0 in 2026: The Open-Source Memory Layer for Any Agent Stack

mem0 hit 37k+ GitHub stars and ships v1.0.4 with metadata filtering, project-level config, and timestamp backfills. Here is how to wire it as a drop-in memory bolt-on.

TL;DR — mem0 ("mem-zero") is the lightest agent memory layer that works. You import it, call memory.add() and memory.search(), and your agent now has long-term memory. The v1.0.x line (through v1.0.4, Feb 2026) adds metadata filtering, scoped per-project config, and backfill timestamps. 37k+ GitHub stars, framework-agnostic.

The pitch

flowchart TD
  Agent[Your agent · any framework] -->|memory.add| Extractor[LLM extractor]
  Extractor -->|atomic facts| Store[(Vector store)]
  Agent -->|memory.search| Store
  Store -->|relevant memories| Prompt[System prompt]
mem0 memory flow: extract on write, retrieve on read

mem0 is a memory library, not an agent runtime. You keep your existing agent stack — OpenAI Agents SDK, LangGraph, CrewAI, smolagents, whatever — and bolt on memory in two function calls:

from mem0 import Memory
m = Memory()

# After a turn, store what was learned
m.add("User prefers Modal over Docker for sandboxes", user_id="sagar")

# Before next turn, recall
related = m.search("which sandbox does the user prefer?", user_id="sagar")

That's the whole API surface. Behind it: an LLM extracts memorable facts from raw text, a vector store indexes them, retrieval finds the relevant ones at recall time. The library handles deduplication, conflict resolution (new fact contradicts old fact → update), and decay.

What's in mem0 in 2026

  • v1.0.0 brought metadata filtering — write structured metadata alongside memories and filter at search time. Scoped queries like "retrieve only memories tagged with this project" or "retrieve only memories from this time range" became first-class.
  • v1.0.3 (Jan 2026) added inclusion/exclusion prompts, memory depth, and use-case settings as project-level config. You can now configure mem0's behavior per project rather than globally.
  • v1.0.4 (Feb 2026) added a timestamp parameter on update() for backfilling memory updates with accurate creation times — important for migrations (sketch after this list).
  • Two ways to run: as a library inside your app (Python or Node), or as a self-hosted server with a dashboard, per-user API keys, and request audit logs.
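
A minimal sketch of the v1.0.4 backfill. The release note only names a timestamp parameter on update(); the keyword, accepted format, and return shape below are assumptions, so check your installed version.

from mem0 import Memory

m = Memory()

# Store a migrated memory, then re-date it to when it was actually learned.
result = m.add("User migrated from Heroku to Fly.io", user_id="sagar")
memory_id = result["results"][0]["id"]  # assumes the v1.x {"results": [...]} shape

m.update(
    memory_id=memory_id,
    data="User migrated from Heroku to Fly.io",
    timestamp=1735689600,  # assumed Unix epoch; your version may expect ISO 8601
)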

Where mem0 fits

Pick mem0 when:

  • You already have an agent stack and want to add memory without rewriting.
  • You need per-user memory isolation at scale (mem0 has user_id, agent_id, run_id partitions out of the box).
  • You want OSS-first — running on your infra with your vector DB.
  • You're integrating with AWS — recent integrations with ElastiCache for Valkey and Neptune Analytics make mem0 a natural fit on AWS.

Skip mem0 when:

  • Your agent needs temporal knowledge graphs (use Zep / Graphiti).
  • Your agent needs to edit its own context as part of the loop (use Letta).
  • Your workflow is stateless — adding memory adds latency and cost for no benefit.

How CallSphere uses it

mem0 powers our per-prospect outbound research memory. When CallSphere's GTM engine reaches out to a prospect, it stores everything it learned (LinkedIn role, company funding stage, tech-stack signals, prior conversation snippets) under user_id=<prospect_email>. The next outbound touch retrieves that memory before drafting the email, so the second touch never asks "what does your company do?" — it references the first touch's context.
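The pattern, sketched with hypothetical values (the prospect email and metadata fields are illustrative):

# Illustrative sketch; prospect email and metadata fields are hypothetical.
prospect = "jane@acme.example"

# First touch: store everything the research step learned.
m.add(
    "Series B fintech, ~80 headcount; VP Eng mentioned migrating off Heroku",
    user_id=prospect,
    metadata={"workflow": "outbound_research", "touch": 1},
)

# Second touch: recall before drafting, so the email references prior context.
context = m.search("company stage, role, and tech-stack signals", user_id=prospect)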


For OneRoof, our real-estate deployment, mem0 stores buyer preferences across the months-long buyer journey: school-district priorities, must-haves, deal-breakers, family stage. The agent searches by buyer ID before each conversation.

For UrackIT, our IT-services deployment, mem0 sits next to the ChromaDB RAG layer — ChromaDB holds the company's ticket corpus; mem0 holds the per-customer learnings the agent picks up during live troubleshooting.
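A sketch of that dual-retrieval shape, assuming a local Chroma collection named tickets; all names here are illustrative:

import chromadb
from mem0 import Memory

m = Memory()
tickets = chromadb.PersistentClient(path="./chroma").get_or_create_collection("tickets")

def troubleshooting_context(query: str, customer_id: str) -> str:
    # Corpus knowledge: the company's ticket history, via plain vector RAG.
    corpus_hits = tickets.query(query_texts=[query], n_results=3)["documents"][0]
    # Per-customer knowledge: what the agent learned in past troubleshooting.
    # Assumes the v1.x {"results": [...]} return shape.
    memory_hits = [h["memory"] for h in m.search(query, user_id=customer_id).get("results", [])]
    return "\n".join(corpus_hits + memory_hits)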


Build steps — drop-in memory in 10 minutes

  1. pip install mem0ai (or npm i mem0ai).
  2. Configure a vector store (Pinecone, Qdrant, pgvector, Chroma).
  3. Configure an LLM (the extractor) — OpenAI, Anthropic, or any LiteLLM provider.
  4. Initialize: m = Memory.from_config({...}) (sketched after this list).
  5. After each agent turn, call m.add(turn_text, user_id=...).
  6. Before each agent turn, call m.search(user_query, user_id=...) and prepend results to the system prompt.
  7. Add metadata filters once you have multiple workflows: m.search(..., filters={"workflow": "outbound_research"}).
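
Steps 2 through 6 in one place. A minimal sketch assuming Qdrant running locally and OpenAI as extractor and embedder; the config keys follow mem0's documented from_config shape, but verify against your installed version. call_your_llm is a hypothetical stand-in for whatever your agent stack already uses.

from mem0 import Memory

# Steps 2-4: vector store + extractor LLM + embedder, then initialize.
config = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o-mini", "temperature": 0.1},
    },
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"collection_name": "agent_memory", "host": "localhost", "port": 6333},
    },
}
m = Memory.from_config(config)

def agent_turn(user_query: str, user_id: str) -> str:
    # Step 6: recall first, prepend to the system prompt.
    hits = m.search(user_query, user_id=user_id)
    memories = "\n".join(h["memory"] for h in hits.get("results", []))
    reply = call_your_llm(  # hypothetical: your existing agent stack goes here
        system=f"Relevant user memories:\n{memories}",
        user=user_query,
    )
    # Step 5: store what was learned after the turn.
    m.add(f"User: {user_query}\nAssistant: {reply}", user_id=user_id)
    return reply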

Code: mem0 with metadata filters (v1.0+)

from mem0 import Memory

m = Memory()

m.add(
    "Prefers async meetings, EST 9am-2pm, no Mondays",
    user_id="sagar",
    metadata={"workflow": "scheduling", "source": "email"},
)

# Targeted retrieval
results = m.search(
    "when can we schedule the call?",
    user_id="sagar",
    filters={"metadata.workflow": "scheduling"},
)

Build steps — self-hosted server

  1. docker run -p 8000:8000 mem0ai/mem0:latest (or use the docker-compose).
  2. Configure Postgres for persistence.
  3. Mount your vector DB credentials.
  4. Create per-user API keys via the dashboard.
  5. Point your apps at the self-hosted endpoint (client sketch after this list).
  6. Wire audit logs to your SIEM.
  7. Back up the Postgres + vector index daily.
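
Step 5's client side, sketched under a strong assumption: that the self-hosted server is API-compatible with mem0's hosted platform, so MemoryClient can target it via its host parameter. Verify against your server version before relying on this.

from mem0 import MemoryClient

# Assumption: the self-hosted server speaks the hosted-platform REST API,
# so MemoryClient's host parameter can point at it.
client = MemoryClient(
    api_key="per-user-key-from-dashboard",  # step 4
    host="http://localhost:8000",           # step 1's endpoint
)

client.add("Tenant admin prefers Sunday maintenance windows", user_id="tenant-42")
hits = client.search("when can we schedule maintenance?", user_id="tenant-42")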

Memory extraction — what mem0 actually stores

The internal LLM extractor doesn't store raw conversation text. It distills each input into atomic facts ("user prefers async meetings," "user works in EST timezone"). These atoms are what get vectorized and indexed. On retrieval the agent receives a list of relevant atoms, not raw turns.

The benefits compound:

  • Storage stays bounded — atoms compress conversation into facts.
  • Retrieval is more focused — semantic similarity matches atoms, not noisy chat turns.
  • Conflicts are explicit — when "user prefers EST" meets "user moved to PST," the framework can see and reconcile.

The cost: you pay an LLM call per write. Budget this in your cost model. For high-write workloads (chatbots with many turns per session), batch writes or downsample.
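
You can watch extraction and conflict resolution happen by inspecting what add() returns. A sketch assuming the v1.x results shape, where each item carries an event field (ADD, UPDATE, DELETE):

from mem0 import Memory

m = Memory()

# First write: the extractor distills an atom and reports an ADD event.
m.add("I work East Coast hours, usually EST", user_id="sagar")

# Contradicting write: the extractor reconciles against the earlier atom
# instead of storing both, surfacing UPDATE/DELETE events.
result = m.add("I just moved to San Francisco, so I'm on PST now", user_id="sagar")

for item in result.get("results", []):  # assumed v1.x return shape
    print(item.get("event"), "->", item.get("memory"))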

Mem0 vs vector DB — when do you not need mem0?

If your "memory" is a static knowledge base (product docs, support articles, past tickets), you don't need mem0 — you need a regular vector DB with a retrieval layer. mem0's value kicks in when memories are generated during conversations and need extraction, conflict resolution, and per-user partitioning.


The simple test: if you're storing things you typed, use a vector DB. If you're storing things the user said, use mem0.

Self-hosted server vs library mode

Two operational profiles:

  • Library mode: import mem0 directly; the vector DB is your responsibility. Best for single-tenant apps and full control.
  • Server mode: run the mem0 server; your apps call it over HTTP. Best for multi-tenant apps where you want a central memory service with per-user API keys, audit logs, and a dashboard.

CallSphere runs server mode behind our existing API gateway. Each tenant gets their own API key and namespace; the dashboard is for ops to debug "why didn't the agent remember X?" by viewing the actual stored atoms.

FAQ

mem0 vs Letta vs Zep? mem0 is a library you import; Letta is a runtime your agent lives in; Zep is a managed temporal-graph platform. Pick by integration depth.

What vector store should I use? pgvector if you already run Postgres. Pinecone if you want managed. Chroma for local dev.

Does mem0 work with MCP? It can be exposed as an MCP server (community implementations exist) so your agents can read/write memories as tool calls.

Is the OSS version production-ready? Yes — 37k+ stars, AWS integrations, well-tested API. We run it in production.

How do I demo this on CallSphere? Book a demo; we'll show the per-prospect memory feeding our outbound engine.
