mem0 in 2026: The Open-Source Memory Layer for Any Agent Stack
mem0 has passed 37k GitHub stars, and its v1.x releases (v1.0.0 through v1.0.4) add metadata filtering, project-level config, and timestamp backfills. Here is how to wire it in as a drop-in memory bolt-on.
TL;DR — mem0 ("mem-zero") is the lightest agent memory layer that works. You import it, call memory.add() and memory.search(), and your agent now has long-term memory. The v1.x line (through v1.0.4, Feb 2026) adds metadata filtering, scoped config, and backfill timestamps. 37k+ GitHub stars, framework-agnostic.
The pitch
mem0 is a memory library, not an agent runtime. You keep your existing agent stack — OpenAI Agents SDK, LangGraph, CrewAI, smolagents, whatever — and bolt on memory in two function calls:
from mem0 import Memory
m = Memory()
# After a turn, store what was learned
m.add("User prefers Modal over Docker for sandboxes", user_id="sagar")
# Before next turn, recall
related = m.search("which sandbox does the user prefer?", user_id="sagar")
That's the whole API surface. Behind it: an LLM extracts memorable facts from raw text, a vector store indexes them, retrieval finds the relevant ones at recall time. The library handles deduplication, conflict resolution (new fact contradicts old fact → update), and decay.
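The conflict-resolution step can be pictured as a small reducer over atomic facts. This is an illustrative sketch, not mem0's actual implementation: facts are keyed by a hypothetical "slot" describing what they are about, and each incoming fact produces an ADD, UPDATE, or NOOP event.

```python
# Illustrative sketch of memory reconciliation, NOT mem0's internal code.
# Facts are keyed by a "slot" (what the fact is about): a new slot is
# stored (ADD), an identical fact is dropped (NOOP), and a contradicting
# fact replaces the old one (UPDATE).

def reconcile(store: dict, slot: str, value: str) -> str:
    """Apply one extracted fact to the store; return the event type."""
    if slot not in store:
        store[slot] = value
        return "ADD"
    if store[slot] == value:
        return "NOOP"          # duplicate fact, nothing to do
    store[slot] = value        # contradiction: new fact wins
    return "UPDATE"

store = {}
print(reconcile(store, "sandbox_preference", "Modal"))   # ADD
print(reconcile(store, "sandbox_preference", "Modal"))   # NOOP
print(reconcile(store, "sandbox_preference", "Docker"))  # UPDATE
```

The real extractor makes this decision with an LLM over semantically similar atoms rather than exact slot keys, but the event model is the same idea.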
What's in mem0 in 2026
- v1.0.0 brought metadata filtering — write structured metadata alongside memories and filter at search time. Scoped queries like "retrieve only memories tagged with this project" or "retrieve only memories from this time range" became first-class.
- v1.0.3 (Jan 2026) added inclusion/exclusion prompts, memory depth, and use-case settings as project-level config. You can now configure mem0's behavior per project rather than globally.
- v1.0.4 (Feb 2026) added a timestamp parameter on update() for backfilling memory updates with accurate creation times — important for migrations.
- Two ways to run: as a library inside your app (Python or Node), or as a self-hosted server with a dashboard, per-user API keys, and request audit logs.
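The point of the timestamp parameter is to preserve original creation times during a migration instead of stamping every memory with "now". Here is a hedged sketch of the prep step; the legacy-record shape and field names are hypothetical, and only the conversion of stored ISO strings to Unix timestamps carries over:

```python
# Hypothetical migration prep: convert legacy created_at strings to Unix
# timestamps so each migrated memory can be backfilled with its original
# creation time (e.g. via the v1.0.4 `timestamp` parameter on update()).
from datetime import datetime, timezone

legacy_rows = [  # placeholder records; your source schema will differ
    {"id": "m1", "text": "Prefers async meetings", "created_at": "2024-07-01T09:30:00Z"},
    {"id": "m2", "text": "Works in EST",           "created_at": "2025-01-15T14:00:00Z"},
]

def to_unix(iso: str) -> int:
    """Parse an ISO-8601 UTC string into an integer Unix timestamp."""
    return int(datetime.fromisoformat(iso.replace("Z", "+00:00"))
               .astimezone(timezone.utc).timestamp())

backfill = [{"id": r["id"], "timestamp": to_unix(r["created_at"])} for r in legacy_rows]
print(backfill)
```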
Where mem0 fits
Pick mem0 when:
- You already have an agent stack and want to add memory without rewriting.
- You need per-user memory isolation at scale (mem0 has user_id, agent_id, and run_id partitions out of the box).
- You want OSS-first — running on your infra with your vector DB.
- You're integrating with AWS — recent integrations with ElastiCache for Valkey and Neptune Analytics make mem0 a natural fit on AWS.
Skip mem0 when:
- Your agent needs temporal knowledge graphs (use Zep / Graphiti).
- Your agent needs to edit its own context as part of the loop (use Letta).
- Your workflow is stateless — adding memory adds latency and cost for no benefit.
How CallSphere uses it
mem0 powers our per-prospect outbound research memory. When CallSphere's GTM engine reaches out to a prospect, it stores everything it learned (LinkedIn role, company funding stage, tech-stack signals, prior conversation snippets) under user_id=<prospect_email>. The next outbound touch retrieves that memory before drafting the email, so the second touch never asks "what does your company do?" — it references the first touch's context.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
For our Real Estate OneRoof, mem0 stores buyer preferences across the months-long buyer journey: school district priorities, must-haves, deal-breakers, family stage. The agent searches by buyer ID before each conversation.
For our IT Services UrackIT deployment, mem0 sits next to the ChromaDB RAG layer — ChromaDB has the company's ticket corpus; mem0 has the per-customer learnings the agent picks up during live troubleshooting.
Build steps — drop-in memory in 10 minutes
- pip install mem0ai (or npm i mem0ai).
- Configure a vector store (Pinecone, Qdrant, pgvector, Chroma).
- Configure an LLM (the extractor) — OpenAI, Anthropic, or any LiteLLM provider.
- Initialize: m = Memory.from_config({...}).
- After each agent turn, call m.add(turn_text, user_id=...).
- Before each agent turn, call m.search(user_query, user_id=...) and prepend results to the system prompt.
- Add metadata filters once you have multiple workflows: m.search(..., filters={"workflow": "outbound_research"}).
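The per-turn wiring in the steps above can be sketched end to end. StubMemory below is an in-memory stand-in for mem0's Memory (naive keyword overlap instead of vector search) so the control flow is runnable anywhere; in production you would pass a real Memory instance instead.

```python
# Sketch of the recall -> respond -> store loop from the steps above.
# StubMemory is a stand-in for mem0's Memory (keyword overlap instead of
# vector search) so the wiring itself runs without any services.

class StubMemory:
    def __init__(self):
        self.facts = []                      # [(user_id, text)]

    def add(self, text, user_id):
        self.facts.append((user_id, text))

    def search(self, query, user_id):
        q = set(query.lower().split())
        return [t for uid, t in self.facts
                if uid == user_id and q & set(t.lower().split())]

def agent_turn(m, user_id, user_query, llm):
    recalled = m.search(user_query, user_id=user_id)       # recall first
    system = "Known about user:\n" + "\n".join(recalled)   # prepend to prompt
    reply = llm(system, user_query)
    m.add(f"User asked: {user_query}", user_id=user_id)    # store after
    return reply

m = StubMemory()
m.add("User prefers Modal over Docker for sandboxes", user_id="sagar")
reply = agent_turn(m, "sagar", "which sandboxes does the user prefer?",
                   llm=lambda sys, q: sys)  # echo the prompt for the demo
print(reply)
```

Swapping StubMemory for Memory.from_config({...}) leaves agent_turn unchanged, which is the point of the two-call API surface.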
Code: mem0 with metadata filters (v1.0+)
from mem0 import Memory
m = Memory()
m.add(
"Prefers async meetings, EST 9am-2pm, no Mondays",
user_id="sagar",
metadata={"workflow": "scheduling", "source": "email"},
)
# Targeted retrieval
results = m.search(
"when can we schedule the call?",
user_id="sagar",
filters={"metadata.workflow": "scheduling"},
)
Build steps — self-hosted server
- docker run -p 8000:8000 mem0ai/mem0:latest (or use the docker-compose).
- Configure Postgres for persistence.
- Mount your vector DB credentials.
- Create per-user API keys via the dashboard.
- Point your apps at the self-hosted endpoint.
- Wire audit logs to your SIEM.
- Back up the Postgres + vector index daily.
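The last step can be as small as a cron entry. A sketch only, assuming Postgres is reachable as a host named mem0-db with a mem0 database and role; every name here is a placeholder for your deployment:

```shell
# Hypothetical daily backup at 02:00 — host, role, and paths are placeholders.
# Note: % must be escaped as \% inside crontab entries.
0 2 * * * pg_dump -h mem0-db -U mem0 -Fc mem0 > /backups/mem0-$(date +\%F).dump
```

Back up the vector index with whatever snapshot mechanism your chosen store provides (e.g. pgvector rides along in the pg_dump above; managed stores have their own snapshot APIs).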
Memory extraction — what mem0 actually stores
The internal LLM extractor doesn't store raw conversation text. It distills each input into atomic facts ("user prefers async meetings," "user works in EST timezone"). These atoms are what get vectorized and indexed. On retrieval the agent receives a list of relevant atoms, not raw turns.
The benefits compound:
- Storage stays bounded — atoms compress conversation into facts.
- Retrieval is more focused — semantic similarity matches atoms, not noisy chat turns.
- Conflicts are explicit — when "user prefers EST" meets "user moved to PST," the framework can see and reconcile.
The cost: you pay an LLM call per write. Budget this in your cost model. For high-write workloads (chatbots with many turns per session), batch writes or downsample.
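One way to bound that cost is to buffer turns and flush them as a single write, so the extractor runs once per batch instead of once per turn. A minimal sketch; flush_fn stands in for a call like m.add(text, user_id=...), and the batch size is arbitrary:

```python
# Sketch of write batching: buffer turns, flush every N as one payload
# so the extractor LLM runs once per batch instead of once per turn.

class BatchedWriter:
    def __init__(self, flush_fn, batch_size=5):
        self.flush_fn = flush_fn      # e.g. lambda text: m.add(text, user_id=...)
        self.batch_size = batch_size
        self.buffer = []

    def write(self, turn_text):
        self.buffer.append(turn_text)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn("\n".join(self.buffer))  # one extractor call per batch
            self.buffer = []

calls = []
w = BatchedWriter(calls.append, batch_size=3)
for turn in ["turn 1", "turn 2", "turn 3", "turn 4"]:
    w.write(turn)
w.flush()          # flush the tail at session end
print(len(calls))  # 2 batches: turns 1-3, then turn 4
```

The trade-off is recall lag: facts from buffered turns are not searchable until the flush, so keep batches within a single session.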
Mem0 vs vector DB — when do you not need mem0?
If your "memory" is a static knowledge base (product docs, support articles, past tickets), you don't need mem0 — you need a regular vector DB with a retrieval layer. mem0's value kicks in when memories are generated during conversations and need extraction, conflict resolution, and per-user partitioning.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
The simple test: if you're storing things you typed, use a vector DB. If you're storing things the user said, use mem0.
Self-hosted server vs library mode
Two operational profiles:
- Library mode: import mem0 directly; the vector DB is your responsibility. Best for single-tenant apps and full control.
- Server mode: run the mem0 server; your apps call it over HTTP. Best for multi-tenant apps where you want a central memory service with per-user API keys, audit logs, and a dashboard.
CallSphere runs server mode behind our existing API gateway. Each tenant gets their own API key and namespace; the dashboard is for ops to debug "why didn't the agent remember X?" by viewing the actual stored atoms.
FAQ
mem0 vs Letta vs Zep? mem0 is a library you import; Letta is a runtime your agent lives in; Zep is a managed temporal-graph platform. Pick by integration depth.
What vector store should I use? pgvector if you already run Postgres. Pinecone if you want managed. Chroma for local dev.
Does mem0 work with MCP? It can be exposed as an MCP server (community implementations exist) so your agents can read/write memories as tool calls.
Is the OSS version production-ready? Yes — 37k+ stars, AWS integrations, well-tested API. We run it in production.
How do I demo this on CallSphere? Book a demo; we'll show the per-prospect memory feeding our outbound engine.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.