
Letta (formerly MemGPT) in 2026: The OS for Stateful Agent Memory

Letta treats the LLM like an OS that manages its own RAM, recall, and archival memory. Here is when this paradigm beats simple vector stores.

TL;DR — Letta (formerly MemGPT) is an "LLM-as-OS" runtime where the agent manages its own memory tiers like an operating system manages RAM and disk. If your agent needs to learn across sessions and edit its own context, Letta is the most mature option in 2026.

The mental model

flowchart TD
  Agent[Letta agent · LLM loop] --> Core[Core memory · in-context block]
  Agent -->|tool calls| Recall[(Recall memory · conversation history)]
  Agent -->|tool calls| Archival[(Archival memory · vector-indexed)]
  Core -->|evict / summarize| Recall
  Recall -->|distill| Archival
Letta's memory tiers, managed by the agent itself

Traditional LLM apps treat memory as something the application layer fetches and stuffs into the prompt. Letta inverts that: the agent decides what to keep in context, what to push to recall, and what to archive. The model has tools to read and write its own memory tiers.

Three tiers:

  1. Core Memory — a small block that lives in the context window, like RAM. The agent reads and writes it directly each turn. Holds the agent's persona and the most important facts about the user.
  2. Recall Memory — searchable conversation history outside context, like a disk cache. The agent queries it via tool calls when needed.
  3. Archival Memory — long-term storage the agent queries via tool calls. Cold storage. Vector-indexed.

When the context is about to overflow, the agent receives a system message ("you are running out of context") and must decide what to evict to recall, what to summarize into core, and what to archive. This is the OS analogy made literal.
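The eviction flow can be sketched in plain Python. This is an illustrative model of the behavior described above, not the Letta API — the class and function names are invented for the sketch:

```python
from dataclasses import dataclass, field

# Hypothetical model of Letta's three tiers and what happens under
# context pressure. Names here are illustrative, not the Letta API.
@dataclass
class MemoryTiers:
    core: dict = field(default_factory=dict)      # in-context, like RAM
    recall: list = field(default_factory=list)    # searchable conversation history
    archival: list = field(default_factory=list)  # cold storage, vector-indexed

def evict_on_pressure(mem: MemoryTiers, transcript: list) -> MemoryTiers:
    """On a 'running out of context' signal: push raw turns to recall,
    keep a small summary in core, archive the distilled fact."""
    mem.recall.extend(transcript)                    # evict raw turns out of context
    summary = f"summary of {len(transcript)} turns"  # stand-in for LLM summarization
    mem.core["conversation_summary"] = summary       # compact summary stays in context
    mem.archival.append(summary)                     # durable distilled knowledge
    return mem
```

The point of the sketch: nothing is deleted under pressure — content moves down a tier, and only a compact summary stays resident.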

What changed in 2026

The MemGPT open-source project was absorbed into Letta. The platform now ships:

  • Letta Code — a memory-first coding agent that ranks #1 on the Terminal-Bench leaderboard for model-agnostic OSS coding agents.
  • Conversations API — agents share memory across parallel user experiences.
  • A rearchitected agent loop that draws lessons from ReAct, MemGPT, and Claude Code, with cleaner tool dispatch and better long-running task handling.

When to pick Letta

Pick Letta when:

  • Your agent must remember things across sessions without an external app layer fetching memory.
  • You want the agent to edit its own persona and facts as it learns about the user.
  • You need a first-class agent runtime, not just a memory bolt-on.
  • You're building an assistant that runs for days, weeks, or indefinitely.

Skip Letta when:

  • Your workflow is stateless (one-shot tool calls).
  • You only need a vector store with metadata — that's simpler and cheaper.
  • You're already deeply invested in another agent framework and just need a memory plugin (use mem0 or Zep instead).

How CallSphere thinks about this

CallSphere's voice agents are mostly session-bounded — a single inbound or outbound call is the unit of work. We don't need Letta for that.

But our after-hours product (7 agents with explicit escalation) is exactly Letta-shaped. When a customer's caretaker calls at 11 PM about a recurring issue, the agent benefits from remembering the prior week's escalations, the family member's preferences, the on-call doctor's instructions. That state lives in our Postgres today; we've prototyped a Letta-backed version that lets the agent edit its own "what I know about this household" core memory after each call.

For our Real Estate OneRoof deployment (10 specialist agents), the buyer-journey use case is similar — a buyer searches for 6 months, talks to the agent dozens of times, and the agent should learn their preferences over that span. That's a Letta-shaped problem.


Build steps — your first Letta agent

  1. pip install letta or run the Letta server: docker run -p 8283:8283 letta/letta.
  2. Create an agent with a persona and human profile in core memory.
  3. Add tools for whatever the agent does (DB queries, web search, internal APIs).
  4. Connect via the Letta SDK from your application.
  5. Send messages; the agent's core memory updates automatically as it learns.
  6. Inspect memory via the dashboard or agent.memory.get().
  7. Persist with the Postgres-backed deployment for production.

Code: a Letta agent that learns about the user

from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

agent = client.agents.create(
    name="callsphere-after-hours",
    memory_blocks=[
        {"label": "persona", "value": "I am a calm, careful after-hours support agent."},
        {"label": "human", "value": "Unknown caller. I will learn as we talk."},
    ],
    tools=["lookup_account", "page_on_call_doctor"],
    model="openai/gpt-5",
    embedding="openai/text-embedding-3-large",
)

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "It's about my mom again, the breathing thing"}],
)

# Internally the agent updated its 'human' core memory block to record the
# caller's relationship and the recurring concern. Next call benefits.

Memory tier sizing — what to put where

Sizing the three tiers correctly is the difference between a useful Letta agent and a confused one:

  • Core memory should be small and curated. A few hundred tokens at most. Persona, the most important user facts, current task. Anything else competes with the user's input for space.
  • Recall is your conversation log. Store everything; the agent searches when needed. Don't manually prune unless you have a privacy reason.
  • Archival is for learned knowledge — distilled facts and summaries the agent generated. The agent decides what to write here through tool calls.

Letta's automatic eviction logic moves content between tiers as context fills. The default behavior is reasonable; we override it only when we want a specific fact (like "this household speaks Spanish primarily") to stay in core forever.
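The "small and curated core" rule can be sketched as a per-block budget, with character counts standing in for tokens. The helper below is illustrative only — in Letta, block limits are enforced by the runtime, not client code:

```python
# Illustrative per-block budgets for core memory. Keeping these small
# ensures persona and user facts never crowd out the user's input.
CORE_LIMITS = {"persona": 400, "human": 600}  # chars, a stand-in for tokens

def write_core_block(core: dict, label: str, value: str) -> dict:
    """Write to a named core block, truncating anything over its budget."""
    cap = CORE_LIMITS.get(label, 300)  # unknown blocks get the tightest cap
    core[label] = value[:cap]
    return core
```

Recall and archival get no such cap in this sketch — they live outside the context window, so size there costs storage, not attention.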

The "agent eats its own context" failure mode

A common Letta failure is the agent over-writing core memory until it loses sight of its persona. Mitigations:


  1. Mark persona blocks as read-only so the agent can't accidentally overwrite them.
  2. Cap the size of writeable core blocks — Letta lets you set per-block max tokens.
  3. Run a periodic "memory health" eval — feed the agent a question it should know based on prior sessions; if it doesn't, your memory pipeline is broken.

We hit this in early prototyping; the read-only persona block fix made the agent's identity stable across hundreds of turns.
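The read-only fix amounts to a write gate in front of the agent's memory-edit tool. A minimal sketch of that gate (illustrative — in Letta this is a block-level setting, not code you write yourself):

```python
# Illustrative guard: the memory-edit tool refuses writes to protected
# blocks, so the agent's persona stays stable across hundreds of turns.
READ_ONLY_BLOCKS = {"persona"}

def memory_replace(core: dict, label: str, new_value: str) -> dict:
    """Agent-facing memory edit; raises on protected blocks."""
    if label in READ_ONLY_BLOCKS:
        raise PermissionError(f"core block '{label}' is read-only")
    core[label] = new_value
    return core
```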

Conversations API — shared memory across users

The Conversations API lets multiple users interact with the same agent and share memory. For a household agent (mom, dad, kids all calling about the same elderly relative's care), this is exactly the model. The agent knows it's a household, not three individual relationships, and reasons accordingly.

We're prototyping this for our after-hours product where caretakers and family members both interact with the same support agent.
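The shape of shared memory is one block written by many callers. A toy sketch of that idea (names invented for illustration; the real Conversations API manages this server-side):

```python
# Illustrative household block shared by every family member's session.
household = {"label": "household", "facts": []}

def record_fact(shared_block: dict, caller: str, fact: str) -> None:
    """Any caller's session appends; every later session sees it."""
    shared_block["facts"].append({"from": caller, "fact": fact})

record_fact(household, "daughter", "Mom's oxygen machine alarms at night")
record_fact(household, "son", "Family prefers Spanish on calls")
# The agent now reasons over one household, not two separate relationships.
```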

FAQ

Letta vs mem0 vs Zep? Letta is a runtime (the agent lives in Letta). mem0 is a memory library you wire into your existing agent. Zep is a managed memory platform with temporal knowledge graphs. Pick by where you want the agent to live.

Is Letta production-ready? Yes. The Letta Code agent ranks #1 on Terminal-Bench among OSS model-agnostic coding agents — that's a strong production signal.

Does it work with MCP? Yes — Letta agents can mount MCP servers as toolsets.

What's the licensing? Apache 2.0 for the OSS server. Letta also offers a managed cloud.

Where do I see this on CallSphere? Book a demo and we'll walk through our after-hours Letta prototype.
