Letta (formerly MemGPT) in 2026: The OS for Stateful Agent Memory
Letta treats the LLM like an OS that manages its own RAM, recall, and archival memory. Here is when this paradigm beats simple vector stores.
TL;DR — Letta (formerly MemGPT) is an "LLM-as-OS" runtime where the agent manages its own memory tiers like an operating system manages RAM and disk. If your agent needs to learn across sessions and edit its own context, Letta is the most mature option in 2026.
The mental model
flowchart TD
Agent[LLM agent] --> Core[Core memory · in-context block]
Agent --> Recall[Recall memory · searchable conversation history]
Agent --> Archival[Archival memory · vector-indexed cold storage]
Core -. evict / summarize .-> Recall
Recall -. distill .-> Archival
Traditional LLM apps treat memory as something the application layer fetches and stuffs into the prompt. Letta inverts that: the agent decides what to keep in context, what to push to recall, and what to archive. The model has tools to read and write its own memory tiers.
Three tiers:
- Core Memory — a small block that lives in the context window, like RAM. The agent reads and writes it directly each turn. Holds the agent's persona and the most important facts about the user.
- Recall Memory — searchable conversation history outside context, like a disk cache. The agent queries it via tool calls when needed.
- Archival Memory — long-term storage the agent queries via tool calls. Cold storage. Vector-indexed.
When the context is about to overflow, the agent receives a system message ("you are running out of context") and must decide what to evict to recall, what to summarize into core, and what to archive. This is the OS analogy made literal.
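The eviction flow above can be sketched as a toy simulation. This is our own illustration of the mechanics, not Letta's actual runtime code: when the context budget is exceeded, the oldest messages are evicted to recall.

```python
# Toy simulation of tiered-memory eviction (illustrative only, not Letta's
# implementation): context is the "RAM", recall and archival are the disk tiers.
from dataclasses import dataclass, field

@dataclass
class TieredMemory:
    budget: int                                    # max "tokens" allowed in context
    core: str = ""                                 # in-context block, like RAM
    context: list = field(default_factory=list)    # live conversation
    recall: list = field(default_factory=list)     # searchable history
    archival: list = field(default_factory=list)   # cold storage

    def tokens(self):
        # Crude token proxy: word count across the live context.
        return sum(len(m.split()) for m in self.context)

    def add(self, message):
        self.context.append(message)
        # Over budget? Evict oldest messages to recall until we fit again.
        while self.tokens() > self.budget and len(self.context) > 1:
            self.recall.append(self.context.pop(0))

mem = TieredMemory(budget=10)
for msg in ["hello there", "my mom has a breathing issue",
            "she is 82", "call Dr. Lee after 9pm only"]:
    mem.add(msg)
print(len(mem.recall), mem.context)
```

In real Letta the eviction step also involves the model deciding what to summarize into core rather than a blind FIFO, but the tier movement is the same shape.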
What changed in 2026
The MemGPT open-source project was absorbed into Letta. The platform now ships:
- Letta Code — a memory-first coding agent that ranks #1 on the Terminal-Bench leaderboard for model-agnostic OSS coding agents.
- Conversations API — agents share memory across parallel user experiences.
- A rearchitected agent loop that draws lessons from ReAct, MemGPT, and Claude Code, with cleaner tool dispatch and better long-running task handling.
When to pick Letta
Pick Letta when:
- Your agent must remember things across sessions without an external app layer fetching memory.
- You want the agent to edit its own persona and facts as it learns about the user.
- You need a first-class agent runtime, not just a memory bolt-on.
- You're building an assistant that runs for days, weeks, or indefinitely.
Skip Letta when:
- Your workflow is stateless (one-shot tool calls).
- You only need a vector store with metadata — that's simpler and cheaper.
- You're already deeply invested in another agent framework and just need a memory plugin (use mem0 or Zep instead).
How CallSphere thinks about this
CallSphere's voice agents are mostly session-bounded — a single inbound or outbound call is the unit of work. We don't need Letta for that.
But our after-hours product (7 agents with explicit escalation) is exactly Letta-shaped. When a customer's caretaker calls at 11 PM about a recurring issue, the agent benefits from remembering the prior week's escalations, the family member's preferences, the on-call doctor's instructions. That state lives in our Postgres today; we've prototyped a Letta-backed version that lets the agent edit its own "what I know about this household" core memory after each call.
For our Real Estate OneRoof deployment (10 specialist agents), the buyer-journey use case is similar — a buyer searches for 6 months, talks to the agent dozens of times, and the agent should learn their preferences over that span. That's a Letta-shaped problem.
Build steps — your first Letta agent
- Install the client (pip install letta) or run the Letta server: docker run -p 8283:8283 letta/letta.
- Create an agent with a persona and human profile in core memory.
- Add tools for whatever the agent does (DB queries, web search, internal APIs).
- Connect via the Letta SDK from your application.
- Send messages; the agent's core memory updates automatically as it learns.
- Inspect memory via the dashboard or agent.memory.get().
- Persist with the Postgres-backed deployment for production.
Code: a Letta agent that learns about the user
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

agent = client.agents.create(
    name="callsphere-after-hours",
    memory_blocks=[  # seed core memory; the agent edits these blocks as it learns
        {"label": "persona", "value": "I am a calm, careful after-hours support agent."},
        {"label": "human", "value": "Unknown caller. I will learn as we talk."},
    ],
    tools=["lookup_account", "page_on_call_doctor"],  # tools registered with the server
    model="openai/gpt-5",
    embedding="openai/text-embedding-3-large",  # embeds recall and archival memory for search
)

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "It's about my mom again, the breathing thing"}],
)

# Internally the agent updated its 'human' core memory block to record the
# caller's relationship and the recurring concern. The next call benefits.
Memory tier sizing — what to put where
Sizing the three tiers correctly is the difference between a useful Letta agent and a confused one:
- Core memory should be small and curated. A few hundred tokens at most. Persona, the most important user facts, current task. Anything else competes with the user's input for space.
- Recall is your conversation log. Store everything; the agent searches when needed. Don't manually prune unless you have a privacy reason.
- Archival is for learned knowledge — distilled facts and summaries the agent generated. The agent decides what to write here through tool calls.
Letta's automatic eviction logic moves content between tiers as context fills. The default behavior is reasonable; we override it only when we want a specific fact (like "this household speaks Spanish primarily") to stay in core forever.
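The sizing guidance above can be condensed into a routing rule. This is our own heuristic for deciding where information belongs, not anything Letta ships:

```python
# Heuristic tier routing (our own rule of thumb, not Letta's): small curated
# user facts go to core while it has room, raw messages go to recall,
# everything else (distilled knowledge, overflow) goes to archival.
def route(kind, tokens, core_used, core_budget=300):
    if kind == "user_fact" and core_used + tokens <= core_budget:
        return "core"        # small, curated, always in context
    if kind == "message":
        return "recall"      # full conversation log, searched on demand
    return "archival"        # distilled knowledge, vector-indexed

assert route("user_fact", 20, 100) == "core"
assert route("message", 500, 100) == "recall"
assert route("user_fact", 200, 250) == "archival"  # core is full, so archive
```

The 300-token core budget is an assumption for illustration; tune it to your model's context size and how chatty your persona block is.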
The "agent eats its own context" failure mode
A common Letta failure is the agent over-writing core memory until it loses sight of its persona. Mitigations:
- Mark persona blocks as read-only so the agent can't accidentally overwrite them.
- Cap the size of writeable core blocks — Letta lets you set per-block max tokens.
- Run a periodic "memory health" eval — feed the agent a question it should know based on prior sessions; if it doesn't, your memory pipeline is broken.
We hit this in early prototyping; the read-only persona block fix made the agent's identity stable across hundreds of turns.
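The "memory health" eval from the list above can be sketched as a tiny harness. Here `ask` is a stand-in for a real call into your agent, and the probe questions are hypothetical:

```python
# Memory-health probe: ask the agent questions it should be able to answer
# from prior sessions; any probe whose expected fact is missing from the
# answer signals a broken memory pipeline.
def memory_health(ask, probes):
    return [q for q, fact in probes if fact.lower() not in ask(q).lower()]

# Stub agent that "remembers" one fact and has forgotten another.
known = {"what language does the household speak?": "They speak Spanish."}
ask = lambda q: known.get(q, "I'm not sure.")

probes = [
    ("what language does the household speak?", "Spanish"),
    ("who is the on-call doctor?", "Dr. Lee"),
]
print(memory_health(ask, probes))  # probes the agent failed
```

Run this on a schedule against a staging copy of the agent; a non-empty failure list is a pager-worthy signal that eviction or archival writes are dropping facts.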
Conversations API — shared memory across users
The Conversations API lets multiple users interact with the same agent and share memory. For a household agent (mom, dad, kids all calling about the same elderly relative's care), this is exactly the model. The agent knows it's a household, not three individual relationships, and reasons accordingly.
We're prototyping this for our after-hours product where caretakers and family members both interact with the same support agent.
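The household model reduces to one shared memory that every caller's session reads and writes. A toy illustration of that shape (a simulation, not the Conversations API itself):

```python
# Toy shared-household memory (not the Conversations API): three family
# members write into one shared block, so each caller's session sees what
# the others reported.
shared = {"household": []}

def record(caller, fact):
    shared["household"].append(f"{caller}: {fact}")

record("mom", "oxygen tank refilled Tuesday")
record("daughter", "prefers evening calls")

# A new caller's agent is given the whole household history as context:
context = "\n".join(shared["household"])
```

The point is the keying: memory is attached to the household, not to each caller, which is exactly what the Conversations API provides at the agent level.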
FAQ
Letta vs mem0 vs Zep? Letta is a runtime (the agent lives in Letta). mem0 is a memory library you wire into your existing agent. Zep is a managed memory platform with temporal knowledge graphs. Pick by where you want the agent to live.
Is Letta production-ready? Yes. The Letta Code agent ranks #1 on Terminal-Bench among OSS model-agnostic coding agents — that's a strong production signal.
Does it work with MCP? Yes — Letta agents can mount MCP servers as toolsets.
What's the licensing? Apache 2.0 for the OSS server. Letta also offers a managed cloud.
Where do I see this on CallSphere? Book a demo and we'll walk through our after-hours Letta prototype.