Response Compaction: Managing Long Agent Conversations

Master OpenAIResponsesCompactionSession for automatic and manual compaction of long agent conversations, including token management, custom triggers, and compaction strategies.

The Long Conversation Problem

Every AI agent faces a fundamental constraint: the context window. A conversation that starts with a simple question and evolves over dozens of turns accumulates history. At some point, the raw history exceeds the model's context limit — or the input token cost becomes untenable.

Naive solutions (truncating the oldest messages, using a sliding window) throw away potentially important context. The user might reference something from the beginning of the conversation, and if you dropped it, the agent hallucinates or asks the user to repeat themselves.

Response compaction is a smarter approach: instead of dropping old messages, the system summarizes them — compressing the history into a shorter representation that preserves the essential information.

OpenAIResponsesCompactionSession

The OpenAI Agents SDK provides OpenAIResponsesCompactionSession — a session wrapper that automatically compacts conversation history when it gets too long.

from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

base_session = SQLiteSession(db_path="./conversations.db")

compaction_session = OpenAIResponsesCompactionSession(
    session=base_session,
)

This wraps any base session with compaction capabilities. When the conversation history crosses a token threshold, the session automatically summarizes older turns before they are sent to the model.

How Auto-Compaction Works

The compaction session monitors the token count of the conversation history. When it crosses the configured threshold, it triggers compaction automatically:

  1. The session estimates the token count of all stored items.
  2. If the count exceeds the threshold, compaction is triggered.
  3. The older portion of the conversation is sent to the model for summarization.
  4. The summary replaces the detailed history.
  5. Recent messages are preserved in full detail.
from agents import Agent, Runner
from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

base = SQLiteSession(db_path="./compact_demo.db")
session = OpenAIResponsesCompactionSession(session=base)

agent = Agent(
    name="LongConversationAgent",
    instructions="You are a research assistant helping with a long project.",
)

# This conversation can run for hundreds of turns
# Compaction kicks in automatically when history gets too long
async def research_session(session_id: str):
    questions = [
        "Let's research quantum computing applications.",
        "What about quantum error correction?",
        "How does surface code work?",
        # ... hundreds more turns
        "Summarize everything we've discussed about error correction.",
    ]

    for q in questions:
        result = await Runner.run(
            agent, q, session=session, session_id=session_id
        )
        print(result.final_output)

The agent can handle arbitrarily long conversations without hitting context limits or accumulating unbounded costs.
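The decision logic behind steps 1 to 5 above can be sketched in plain Python. Everything in this sketch is illustrative: `estimate_tokens`, `TOKEN_THRESHOLD`, `KEEP_RECENT`, and `compact` are hypothetical names, not SDK internals, and the characters-divided-by-four estimate is only a rough heuristic.

```python
# Illustrative sketch of an auto-compaction check; names and thresholds
# are hypothetical, not the SDK's internals.
TOKEN_THRESHOLD = 50_000   # assumed trigger point
KEEP_RECENT = 10           # recent items kept in full detail

def estimate_tokens(items: list[dict]) -> int:
    """Rough heuristic: roughly 4 characters per token."""
    return sum(len(str(item.get("content", ""))) for item in items) // 4

def compact(items: list[dict], summarize) -> list[dict]:
    """Replace older items with one summary item, keep recent ones verbatim."""
    if estimate_tokens(items) <= TOKEN_THRESHOLD:
        return items  # under threshold: nothing to do
    old, recent = items[:-KEEP_RECENT], items[-KEEP_RECENT:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent
```

In the real session the summarize step is a model call; here it can be any callable that maps the older items to a summary string.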

Manual Compaction with run_compaction()

Sometimes you want to trigger compaction explicitly — for example, at the end of a logical section of conversation, or before a handoff to another agent.

from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

base = SQLiteSession(db_path="./sessions.db")
session = OpenAIResponsesCompactionSession(session=base)

# After a long discussion, manually compact
await session.run_compaction(session_id="project-alpha")

# Now the history is summarized and shorter
items = await session.get_items("project-alpha")
print(f"Items after compaction: {len(items)}")

Manual compaction is useful at natural conversation boundaries:


async def handle_conversation_phase(
    session: OpenAIResponsesCompactionSession,
    session_id: str,
    agent: Agent,
    messages: list[str],
):
    """Process a phase of conversation, then compact."""
    for msg in messages:
        await Runner.run(agent, msg, session=session, session_id=session_id)

    # Compact after each phase to keep history manageable
    await session.run_compaction(session_id)
    print(f"Phase complete, history compacted for {session_id}")

Disabling Auto-Compaction

If you want full control over when compaction happens, disable the automatic trigger:

session = OpenAIResponsesCompactionSession(
    session=base_session,
    auto_compact=False,  # Disable automatic compaction
)

# Now compaction only happens when you call it explicitly
await session.run_compaction(session_id)

This is useful when:

  • You have custom logic for when compaction should occur
  • You want to compact only at specific conversation milestones
  • You need to ensure compaction does not interrupt time-sensitive interactions
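One pattern with auto-compaction disabled is to compact whenever the user signals the end of a conversation phase. The sketch below is an assumption, not SDK behavior: `is_milestone` and the marker phrases are hypothetical, and the commented wiring relies on the `run_compaction()` call shown earlier.

```python
# Hypothetical milestone detector for deciding when to call run_compaction()
# manually; the marker phrases are assumptions, not part of the SDK.
MILESTONE_MARKERS = (
    "let's move on",
    "next topic",
    "that wraps up",
)

def is_milestone(user_message: str) -> bool:
    """True when the user signals the end of a conversation phase."""
    lowered = user_message.lower()
    return any(marker in lowered for marker in MILESTONE_MARKERS)

# Wiring sketch (inside an async handler):
#     result = await Runner.run(agent, msg, session=session, session_id=sid)
#     if is_milestone(msg):
#         await session.run_compaction(sid)
```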

Custom Compaction Triggers with should_trigger_compaction

For fine-grained control, implement a custom callback that decides when compaction should fire:

from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

def custom_trigger(items: list, token_estimate: int) -> bool:
    """Custom logic for when to trigger compaction."""
    # Compact if over 50,000 tokens
    if token_estimate > 50_000:
        return True

    # Compact if over 100 items regardless of token count
    if len(items) > 100:
        return True

    # Don't compact small conversations
    return False

base = SQLiteSession(db_path="./sessions.db")
session = OpenAIResponsesCompactionSession(
    session=base,
    should_trigger_compaction=custom_trigger,
)

Advanced: Time-Based Compaction

Compact history that is older than a certain threshold:

from datetime import datetime, timedelta

def time_based_trigger(items: list, token_estimate: int) -> bool:
    """Compact if the oldest item is more than 2 hours old."""
    if not items:
        return False

    # Assumes items carry a naive UTC ISO-8601 "created_at" timestamp
    oldest_timestamp = items[0].get("created_at")
    if oldest_timestamp:
        age = datetime.utcnow() - datetime.fromisoformat(oldest_timestamp)
        if age > timedelta(hours=2) and token_estimate > 10_000:
            return True

    return False

Token Management in Long Conversations

Compaction is one part of a broader token management strategy. Here is a complete approach:

Layer 1: Session Limits

Cap the number of items loaded from the session:

from agents.extensions.sessions import SessionSettings

# Load at most the 50 most recent items when building the model input
settings = SessionSettings(limit=50)

Layer 2: Compaction

Summarize older history to reduce token usage:

session = OpenAIResponsesCompactionSession(session=base)

Layer 3: Token Budgeting

Track and budget token usage across the conversation:

class TokenBudgetManager:
    def __init__(self, max_input_tokens: int = 100_000):
        self.max_input_tokens = max_input_tokens
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def track_usage(self, result):
        """Track token usage from a run result."""
        usage = result.raw_responses[-1].usage
        self.total_input_tokens += usage.input_tokens
        self.total_output_tokens += usage.output_tokens

    def should_compact(self) -> bool:
        """Signal compaction when approaching budget."""
        return self.total_input_tokens > self.max_input_tokens * 0.8

    def get_report(self) -> dict:
        return {
            "total_input": self.total_input_tokens,
            "total_output": self.total_output_tokens,
            "budget_remaining": self.max_input_tokens - self.total_input_tokens,
        }

Combining All Layers

budget = TokenBudgetManager(max_input_tokens=200_000)

async def managed_conversation(session_id: str, message: str):
    result = await Runner.run(
        agent,
        message,
        session=compaction_session,
        session_id=session_id,
        session_settings=SessionSettings(limit=80),
    )

    budget.track_usage(result)

    if budget.should_compact():
        await compaction_session.run_compaction(session_id)
        print("Compacted due to token budget pressure")

    return result.final_output

What Gets Preserved During Compaction

Compaction is lossy by design, but it is guided summarization rather than blind truncation. The model that performs the compaction is instructed to preserve:

  • Key facts and decisions made during the conversation
  • User preferences and stated requirements
  • Action items and commitments
  • Names, dates, numbers, and other specific details
  • The overall trajectory and context of the conversation

What gets compressed:

  • Verbose explanations that can be summarized
  • Back-and-forth clarification exchanges
  • Redundant information repeated across turns
  • Tool call details (replaced with outcome summaries)

The result is a compact representation that captures the essence of the conversation while using far fewer tokens.
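As an illustration of the preserve/compress split described above, a compaction instruction might look like the following. This is not the SDK's actual internal prompt, only a sketch of what such an instruction typically contains.

```python
# Illustrative compaction prompt; NOT the SDK's actual internal prompt.
COMPACTION_PROMPT = """\
Summarize the conversation so far into a compact brief. Preserve:
- key facts and decisions
- user preferences and stated requirements
- action items and commitments
- names, dates, and numbers exactly as given
Compress verbose explanations, clarification exchanges, and repeated
information. Replace tool-call details with one-line outcome summaries.
"""
```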

Written by

CallSphere Team
