Response Compaction: Managing Long Agent Conversations
Master OpenAIResponsesCompactionSession for automatic and manual compaction of long agent conversations, including token management, custom triggers, and compaction strategies.
The Long Conversation Problem
Every AI agent faces a fundamental constraint: the context window. A conversation that starts with a simple question and evolves over dozens of turns accumulates history. At some point, the raw history exceeds the model's context limit — or the input token cost becomes untenable.
Naive solutions (truncating the oldest messages, using a sliding window) throw away potentially important context. The user might reference something from the beginning of the conversation, and if you dropped it, the agent hallucinates or asks the user to repeat themselves.
Response compaction is a smarter approach: instead of dropping old messages, the system summarizes them — compressing the history into a shorter representation that preserves the essential information.
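The difference can be made concrete with a toy example. The sketch below is plain Python with no SDK involved: it contrasts a sliding window, which silently drops the earliest turns, with a compactor that keeps a stand-in for them. The string-slicing "summary" is purely illustrative; a real system would call a model here.

```python
# Toy comparison: sliding-window truncation vs. summarizing compaction.
# The fake summary below stands in for a real model call.

def sliding_window(history: list[str], keep: int) -> list[str]:
    """Keep only the most recent `keep` turns; older turns are lost."""
    return history[-keep:]

def compact(history: list[str], keep: int) -> list[str]:
    """Replace older turns with a single summary item, keep the rest verbatim."""
    older, recent = history[:-keep], history[-keep:]
    summary = "SUMMARY: " + "; ".join(t[:20] for t in older)  # fake summary
    return [summary] + recent

history = [f"turn {i}: ..." for i in range(10)]
print(len(sliding_window(history, 3)))  # 3 items; turns 0-6 are gone entirely
print(len(compact(history, 3)))         # 4 items; turns 0-6 survive as a summary
```

With the sliding window, any reference back to turns 0 through 6 fails; with compaction, the summary item keeps at least a trace of them in context.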
OpenAIResponsesCompactionSession
The OpenAI Agents SDK provides OpenAIResponsesCompactionSession — a session wrapper that automatically compacts conversation history when it gets too long.
from agents.extensions.sessions import (
SQLiteSession,
OpenAIResponsesCompactionSession,
)
base_session = SQLiteSession(db_path="./conversations.db")
compaction_session = OpenAIResponsesCompactionSession(
session=base_session,
)
This wraps any base session with compaction capabilities. When the conversation history crosses a token threshold, the session automatically summarizes older turns before they are sent to the model.
How Auto-Compaction Works
The compaction session monitors the token count of the conversation history. When it crosses the configured threshold, it triggers compaction automatically:
- The session estimates the token count of all stored items.
- If the count exceeds the threshold, compaction is triggered.
- The older portion of the conversation is sent to the model for summarization.
- The summary replaces the detailed history.
- Recent messages are preserved in full detail.
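The steps above can be sketched as a short plain-Python loop. This is an illustration of the flow, not the SDK's internals: the 4-characters-per-token estimate and the `summarize_items` helper are assumptions made for the sketch.

```python
# Illustrative auto-compaction flow (not the SDK's actual implementation).

def estimate_tokens(items: list[str]) -> int:
    # Rough heuristic: roughly 4 characters per token (assumption).
    return sum(len(item) for item in items) // 4

def summarize_items(items: list[str]) -> str:
    # Stand-in for the model call that produces the summary.
    return f"SUMMARY of {len(items)} older items"

def maybe_compact(items: list[str], threshold: int, keep_recent: int) -> list[str]:
    """Steps 1-5: estimate, compare to threshold, summarize older, keep recent."""
    if estimate_tokens(items) <= threshold:   # steps 1-2
        return items
    older, recent = items[:-keep_recent], items[-keep_recent:]
    return [summarize_items(older)] + recent  # steps 3-5

history = ["user: " + "x" * 400 for _ in range(20)]
compacted = maybe_compact(history, threshold=1000, keep_recent=5)
print(len(compacted))  # 6: one summary item plus the 5 most recent turns
```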
from agents import Agent, Runner
from agents.extensions.sessions import (
SQLiteSession,
OpenAIResponsesCompactionSession,
)
base = SQLiteSession(db_path="./compact_demo.db")
session = OpenAIResponsesCompactionSession(session=base)
agent = Agent(
name="LongConversationAgent",
instructions="You are a research assistant helping with a long project.",
)
# This conversation can run for hundreds of turns
# Compaction kicks in automatically when history gets too long
async def research_session(session_id: str):
questions = [
"Let's research quantum computing applications.",
"What about quantum error correction?",
"How does surface code work?",
# ... hundreds more turns
"Summarize everything we've discussed about error correction.",
]
for q in questions:
result = await Runner.run(
agent, q, session=session, session_id=session_id
)
print(result.final_output)
The agent can handle arbitrarily long conversations without hitting context limits or accumulating unbounded costs.
Manual Compaction with run_compaction()
Sometimes you want to trigger compaction explicitly — for example, at the end of a logical section of conversation, or before a handoff to another agent.
from agents.extensions.sessions import (
SQLiteSession,
OpenAIResponsesCompactionSession,
)
base = SQLiteSession(db_path="./sessions.db")
session = OpenAIResponsesCompactionSession(session=base)
# After a long discussion, manually compact
await session.run_compaction(session_id="project-alpha")
# Now the history is summarized and shorter
items = await session.get_items("project-alpha")
print(f"Items after compaction: {len(items)}")
Manual compaction is useful at natural conversation boundaries:
async def handle_conversation_phase(
session: OpenAIResponsesCompactionSession,
session_id: str,
agent: Agent,
messages: list[str],
):
"""Process a phase of conversation, then compact."""
for msg in messages:
await Runner.run(agent, msg, session=session, session_id=session_id)
# Compact after each phase to keep history manageable
await session.run_compaction(session_id)
print(f"Phase complete, history compacted for {session_id}")
Disabling Auto-Compaction
If you want full control over when compaction happens, disable the automatic trigger:
session = OpenAIResponsesCompactionSession(
session=base_session,
auto_compact=False, # Disable automatic compaction
)
# Now compaction only happens when you call it explicitly
await session.run_compaction(session_id)
This is useful when:
- You have custom logic for when compaction should occur
- You want to compact only at specific conversation milestones
- You need to ensure compaction does not interrupt time-sensitive interactions
Custom Compaction Triggers with should_trigger_compaction
For fine-grained control, implement a custom callback that decides when compaction should fire:
from agents.extensions.sessions import (
SQLiteSession,
OpenAIResponsesCompactionSession,
)
def custom_trigger(items: list, token_estimate: int) -> bool:
"""Custom logic for when to trigger compaction."""
# Compact if over 50,000 tokens
if token_estimate > 50_000:
return True
# Compact if over 100 items regardless of token count
if len(items) > 100:
return True
# Don't compact small conversations
return False
base = SQLiteSession(db_path="./sessions.db")
session = OpenAIResponsesCompactionSession(
session=base,
should_trigger_compaction=custom_trigger,
)
Advanced: Time-Based Compaction
Compact history that is older than a certain threshold:
from datetime import datetime, timedelta
def time_based_trigger(items: list, token_estimate: int) -> bool:
"""Compact if the oldest item is more than 2 hours old."""
if not items:
return False
oldest_timestamp = items[0].get("created_at")
if oldest_timestamp:
age = datetime.utcnow() - datetime.fromisoformat(oldest_timestamp)
if age > timedelta(hours=2) and token_estimate > 10_000:
return True
return False
Token Management in Long Conversations
Compaction is one part of a broader token management strategy. Here is a layered approach:
Layer 1: Session Limits
Cap the number of items loaded from the session:
from agents.extensions.sessions import SessionSettings
settings = SessionSettings(limit=50)
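Independently of the SDK, the effect of an item limit is just a tail slice over the stored items. A minimal sketch:

```python
# What a session item limit amounts to: load only the newest N items.
def load_with_limit(stored_items: list[dict], limit: int) -> list[dict]:
    """Return the most recent `limit` items; a falsy limit loads everything."""
    return stored_items[-limit:] if limit else stored_items

items = [{"role": "user", "content": f"msg {i}"} for i in range(200)]
window = load_with_limit(items, limit=50)
print(len(window), window[0]["content"])  # 50 items, starting at "msg 150"
```

Note that a plain limit on its own is a sliding window, with the same loss of old context described earlier; it works best combined with the compaction layer below.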
Layer 2: Compaction
Summarize older history to reduce token usage:
session = OpenAIResponsesCompactionSession(session=base)
Layer 3: Token Budgeting
Track and budget token usage across the conversation:
class TokenBudgetManager:
def __init__(self, max_input_tokens: int = 100_000):
self.max_input_tokens = max_input_tokens
self.total_input_tokens = 0
self.total_output_tokens = 0
def track_usage(self, result):
"""Track token usage from a run result."""
usage = result.raw_responses[-1].usage
self.total_input_tokens += usage.input_tokens
self.total_output_tokens += usage.output_tokens
def should_compact(self) -> bool:
"""Signal compaction when approaching budget."""
return self.total_input_tokens > self.max_input_tokens * 0.8
def get_report(self) -> dict:
return {
"total_input": self.total_input_tokens,
"total_output": self.total_output_tokens,
"budget_remaining": self.max_input_tokens - self.total_input_tokens,
}
Combining All Layers
budget = TokenBudgetManager(max_input_tokens=200_000)
async def managed_conversation(session_id: str, message: str):
result = await Runner.run(
agent,
message,
session=compaction_session,
session_id=session_id,
session_settings=SessionSettings(limit=80),
)
budget.track_usage(result)
if budget.should_compact():
await compaction_session.run_compaction(session_id)
print("Compacted due to token budget pressure")
return result.final_output
What Gets Preserved During Compaction
Compaction is not blind truncation: it is guided summarization. Some detail is inevitably lost, but the model that performs compaction is instructed to preserve:
- Key facts and decisions made during the conversation
- User preferences and stated requirements
- Action items and commitments
- Names, dates, numbers, and other specific details
- The overall trajectory and context of the conversation
What gets compressed:
- Verbose explanations that can be summarized
- Back-and-forth clarification exchanges
- Redundant information repeated across turns
- Tool call details (replaced with outcome summaries)
The result is a compact representation that captures the essence of the conversation while using far fewer tokens.
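Conceptually, compaction maps a long item list onto a short one in which the older detail is folded into a single summary item while recent turns stay verbatim. A schematic before/after (the item shapes here are illustrative, not the SDK's storage format):

```python
# Schematic before/after of compaction (item format is illustrative only).
before = [
    {"role": "user", "content": "Research quantum computing applications."},
    {"role": "assistant", "content": "Long explanation ... (2,000 tokens)"},
    {"role": "assistant", "content": "tool_call: search('surface codes') -> 40 results"},
    {"role": "user", "content": "How does surface code work?"},
    {"role": "assistant", "content": "Detailed answer ... (3,000 tokens)"},
]
after = [
    {"role": "system", "content": (
        "Summary: user is researching quantum computing applications; a search "
        "on surface codes returned 40 results; focus is on error correction."
    )},
    {"role": "user", "content": "How does surface code work?"},
    {"role": "assistant", "content": "Detailed answer ... (3,000 tokens)"},
]
print(len(before), "->", len(after))  # 5 -> 3
```

The tool call survives only as its outcome, the verbose early explanation becomes one line of summary, and the most recent exchange is untouched.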
Written by
CallSphere Team