Multi-Turn Chat with Context Management and Sessions
Master multi-turn chat agent context management using to_input_list(), session-based state, context compaction strategies, and persistent chat storage for production deployments.
The Context Management Challenge
Every chat agent faces the same fundamental problem: the conversation grows with each turn, but the model's context window is finite. A simple customer support chat might accumulate 50 turns with tool calls, system messages, and lengthy responses. Without context management, the agent either runs out of context space, becomes prohibitively expensive, or starts losing track of earlier conversation details.
The OpenAI Agents SDK provides to_input_list() on run results, which captures the full conversation state — including tool calls and their outputs — in a format ready for the next turn. But in production, you need more than just passing the full history forward. You need session persistence, context compaction, and strategies for long-running conversations.
Understanding to_input_list()
When you run an agent and get a result, result.to_input_list() returns the complete conversation history in the format the agent expects for the next turn. This includes user messages, assistant messages, tool call requests, and tool call results.
# basic_multi_turn.py
from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 72F, partly cloudy"

agent = Agent(
    name="assistant",
    model="gpt-4o",
    instructions="You are a helpful assistant.",
    tools=[get_weather],
)

async def multi_turn_example():
    # Turn 1
    result1 = await Runner.run(agent, input="What is the weather in Austin?")
    print(f"Turn 1: {result1.final_output}")

    # Turn 2 — pass full context from turn 1
    input_list = result1.to_input_list()
    input_list.append({"role": "user", "content": "How about Seattle?"})
    result2 = await Runner.run(agent, input=input_list)
    print(f"Turn 2: {result2.final_output}")

    # Turn 3 — context now includes both previous turns
    input_list = result2.to_input_list()
    input_list.append({"role": "user", "content": "Which city is warmer?"})
    result3 = await Runner.run(agent, input=input_list)
    print(f"Turn 3: {result3.final_output}")
    # The agent can compare because it has both weather results in context
The critical detail: to_input_list() preserves tool call items, not just the text. When the agent retrieved weather for Austin in turn 1, that tool call and its result are included in the context for turn 2. This is why the agent can answer "Which city is warmer?" in turn 3 — it has both tool results in its context.
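To make that concrete, the items below sketch what a to_input_list() result can look like after the Austin turn. The exact key names depend on the SDK version, so treat the shapes as illustrative rather than authoritative:

```python
# Illustrative shape of result.to_input_list() after a tool-calling turn.
# Key names are approximate; the point is that the tool call and its
# output travel with the history, not just the text messages.
example_history = [
    {"role": "user", "content": "What is the weather in Austin?"},
    {"type": "function_call", "name": "get_weather",
     "arguments": '{"city": "Austin"}', "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1",
     "output": "Weather in Austin: 72F, partly cloudy"},
    {"role": "assistant", "content": "It is 72F and partly cloudy in Austin."},
]

# The next turn simply appends a new user message to this list
example_history.append({"role": "user", "content": "How about Seattle?"})
```

A plain `{"role": ..., "content": ...}` transcript would drop the two middle items, and with them the data the agent needs for follow-up questions.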
Session-Based Context Storage
In a multi-user server, each user has their own conversation. The session manager maps session IDs to conversation state and manages lifecycle.
# session_store.py
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Session:
    session_id: str
    user_id: str
    messages: list[dict] = field(default_factory=list)
    pending: list[dict] = field(default_factory=list)
    last_result: Optional[object] = None
    created_at: float = field(default_factory=time.time)
    last_active: float = field(default_factory=time.time)
    turn_count: int = 0
    total_tokens_estimate: int = 0

    def get_input_list(self) -> list[dict]:
        """Get the conversation history for the next agent run."""
        if self.last_result is not None:
            # to_input_list() covers everything up to the last run;
            # append user messages added since then
            return self.last_result.to_input_list() + self.pending
        return self.messages + self.pending

    def add_user_message(self, content: str):
        self.pending.append({"role": "user", "content": content})
        self.last_active = time.time()
        self.turn_count += 1

    def update_result(self, result):
        self.messages.extend(self.pending)
        self.pending.clear()
        self.last_result = result
        self.messages.append({
            "role": "assistant",
            "content": result.final_output,
        })
        # Rough token estimate: 4 chars per token
        self.total_tokens_estimate += len(result.final_output) // 4

class SessionStore:
    def __init__(self, ttl_seconds: int = 1800):
        self._sessions: dict[str, Session] = {}
        self._ttl = ttl_seconds

    def get(self, session_id: str) -> Session | None:
        session = self._sessions.get(session_id)
        if session and (time.time() - session.last_active > self._ttl):
            del self._sessions[session_id]
            return None
        return session

    def create(self, session_id: str, user_id: str) -> Session:
        session = Session(session_id=session_id, user_id=user_id)
        self._sessions[session_id] = session
        return session

    def delete(self, session_id: str):
        self._sessions.pop(session_id, None)

    def cleanup_expired(self):
        now = time.time()
        expired = [
            sid for sid, s in self._sessions.items()
            if now - s.last_active > self._ttl
        ]
        for sid in expired:
            del self._sessions[sid]
Context Compaction Strategies
As conversations grow long, you need strategies to keep the context within the model's window while preserving the information the agent needs. There are three main approaches.
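All three strategies need a cheap way to gauge context size before deciding whether to compact. The snippets below inline the same rough heuristic, roughly 4 characters per token for English text, which is worth pulling into a helper (this helper is our addition, not part of the SDK):

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough context-size estimate: ~4 characters per token for English.

    Deliberately crude: good enough for a compaction threshold, not for
    billing. Use a real tokenizer (e.g. tiktoken) if you need accuracy.
    """
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return total_chars // 4

history = [
    {"role": "user", "content": "a" * 200},
    {"role": "assistant", "content": "b" * 200},
]
print(estimate_tokens(history))  # 100
```

Tune the divisor per language and content mix; code-heavy or non-English conversations pack fewer characters per token.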
Strategy 1: Sliding Window
Keep only the most recent N turns. Simple but loses early context.
def sliding_window_compact(input_list: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep only the most recent max_turns exchanges."""
    # Always keep system messages
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]
    # Each "turn" is roughly a user message + assistant response. If tool
    # calls are present, prefer cutting at a user-message boundary so a
    # tool result is never separated from its call.
    if len(non_system) > max_turns * 2:
        non_system = non_system[-(max_turns * 2):]
    return system_msgs + non_system
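Applied to a 30-turn history (the function is repeated here so the sketch runs on its own), the window keeps the system prompt plus the last 20 exchanges:

```python
def sliding_window_compact(input_list: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep only the most recent max_turns exchanges (as defined above)."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]
    if len(non_system) > max_turns * 2:
        non_system = non_system[-(max_turns * 2):]
    return system_msgs + non_system

# 1 system message + 30 user/assistant exchanges = 61 messages
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(30):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

compacted = sliding_window_compact(history, max_turns=20)
print(len(compacted))           # 41: system prompt + last 20 exchanges
print(compacted[1]["content"])  # "question 10" -- questions 0-9 are gone
```

The dropped turns are simply lost, which is why this strategy suits casual chat but not conversations where early details matter later.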
Strategy 2: Summarization
Use the model to summarize older conversation portions, then prefix the summary to the recent context.
from agents import Agent, Runner

summarizer = Agent(
    name="summarizer",
    model="gpt-4o-mini",
    instructions="""Summarize the following conversation concisely.
Preserve: key facts, decisions made, tool results, and unresolved questions.
Omit: greetings, filler, and redundant exchanges.""",
)

async def summarize_and_compact(
    input_list: list[dict],
    keep_recent: int = 10,
) -> list[dict]:
    """Summarize older turns and keep recent ones intact."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]
    if len(non_system) <= keep_recent * 2:
        return input_list  # no compaction needed

    # Split into old (to summarize) and recent (to keep)
    old_messages = non_system[:-(keep_recent * 2)]
    recent_messages = non_system[-(keep_recent * 2):]

    # Summarize old messages
    old_text = "\n".join(
        f"{m['role']}: {m.get('content', '[tool call]')}"
        for m in old_messages
        if m.get("content")
    )
    summary_result = await Runner.run(
        summarizer,
        input=f"Summarize this conversation:\n\n{old_text}",
    )

    # Build compacted context
    summary_msg = {
        "role": "system",
        "content": (
            f"Summary of earlier conversation:\n"
            f"{summary_result.final_output}"
        ),
    }
    return system_msgs + [summary_msg] + recent_messages
Strategy 3: Hybrid — Summarize + Preserve Key Items
The most effective approach combines summarization with selective preservation of important items like tool results and decisions.
async def hybrid_compact(
    input_list: list[dict],
    keep_recent: int = 8,
    max_total_tokens: int = 50000,
) -> list[dict]:
    """Hybrid compaction: summarize old context, preserve key tool results."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]

    # Estimate current token count
    total_chars = sum(len(str(m.get("content", ""))) for m in input_list)
    estimated_tokens = total_chars // 4
    if estimated_tokens < max_total_tokens:
        return input_list  # no compaction needed

    old_messages = non_system[:-(keep_recent * 2)]
    recent_messages = non_system[-(keep_recent * 2):]

    # Extract key items to preserve verbatim
    key_items = []
    summarize_items = []
    for msg in old_messages:
        # str() guards against non-string content (e.g. structured parts)
        content = str(msg.get("content", ""))
        # Preserve tool results and decisions
        if msg.get("role") == "tool" or "decision:" in content.lower():
            key_items.append(msg)
        else:
            summarize_items.append(msg)

    # Summarize the non-key items
    if summarize_items:
        text = "\n".join(
            f"{m['role']}: {m.get('content', '')}"
            for m in summarize_items
            if m.get("content")
        )
        summary_result = await Runner.run(
            summarizer,
            input=f"Summarize this conversation concisely:\n\n{text}",
        )
        summary_msg = {
            "role": "system",
            "content": f"Earlier conversation summary:\n{summary_result.final_output}",
        }
        return system_msgs + [summary_msg] + key_items + recent_messages
    return system_msgs + key_items + recent_messages
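The preservation filter is worth checking on its own, without the model call. This sketch extracts it into a standalone helper (a refactoring for illustration, not part of the code above):

```python
def split_key_items(old_messages: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate messages preserved verbatim from those sent to the summarizer."""
    key_items, summarize_items = [], []
    for msg in old_messages:
        content = str(msg.get("content", ""))
        # Same rule as hybrid_compact: tool results and explicit decisions
        # survive compaction word-for-word
        if msg.get("role") == "tool" or "decision:" in content.lower():
            key_items.append(msg)
        else:
            summarize_items.append(msg)
    return key_items, summarize_items

old = [
    {"role": "user", "content": "What's the refund status?"},
    {"role": "tool", "content": "Refund #4411: approved, $120.00"},
    {"role": "assistant", "content": "Decision: issue the refund in full."},
    {"role": "assistant", "content": "Anything else I can help with?"},
]
key, rest = split_key_items(old)
print(len(key), len(rest))  # 2 2 -- tool result and decision survive verbatim
```

The "decision:" marker is a convention you enforce via the agent's instructions; any tagging scheme works as long as the filter and the prompt agree on it.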
Persistent Chat Storage
In-memory sessions are lost when the server restarts. For production chat agents, persist sessions to a database so conversations survive deployments and can be resumed.
# persistent_store.py
import json
import asyncpg

class PostgresSessionStore:
    def __init__(self, pool: asyncpg.Pool):
        self.pool = pool

    async def initialize(self):
        await self.pool.execute("""
            CREATE TABLE IF NOT EXISTS chat_sessions (
                session_id TEXT PRIMARY KEY,
                user_id TEXT NOT NULL,
                messages JSONB NOT NULL DEFAULT '[]',
                turn_count INTEGER NOT NULL DEFAULT 0,
                created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
                last_active TIMESTAMPTZ NOT NULL DEFAULT NOW()
            )
        """)
        await self.pool.execute("""
            CREATE INDEX IF NOT EXISTS idx_sessions_user
            ON chat_sessions (user_id)
        """)
        await self.pool.execute("""
            CREATE INDEX IF NOT EXISTS idx_sessions_active
            ON chat_sessions (last_active)
        """)

    async def save_session(
        self, session_id: str, user_id: str, messages: list[dict], turn_count: int
    ):
        await self.pool.execute(
            """
            INSERT INTO chat_sessions (session_id, user_id, messages, turn_count, last_active)
            VALUES ($1, $2, $3::jsonb, $4, NOW())
            ON CONFLICT (session_id) DO UPDATE SET
                messages = $3::jsonb,
                turn_count = $4,
                last_active = NOW()
            """,
            session_id,
            user_id,
            json.dumps(messages),
            turn_count,
        )

    async def load_session(self, session_id: str) -> dict | None:
        row = await self.pool.fetchrow(
            "SELECT * FROM chat_sessions WHERE session_id = $1",
            session_id,
        )
        if not row:
            return None
        return {
            "session_id": row["session_id"],
            "user_id": row["user_id"],
            "messages": json.loads(row["messages"]),
            "turn_count": row["turn_count"],
        }

    async def list_user_sessions(self, user_id: str) -> list[dict]:
        rows = await self.pool.fetch(
            """
            SELECT session_id, turn_count, created_at, last_active
            FROM chat_sessions
            WHERE user_id = $1
            ORDER BY last_active DESC
            LIMIT 50
            """,
            user_id,
        )
        return [dict(row) for row in rows]

    async def cleanup_old_sessions(self, days: int = 30):
        deleted = await self.pool.execute(
            "DELETE FROM chat_sessions WHERE last_active < NOW() - $1::interval",
            f"{days} days",
        )
        return deleted
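The same save/load interface maps cleanly onto SQLite for local development or single-node deployments. A minimal synchronous sketch using the stdlib sqlite3 module (the table mirrors the Postgres schema, with JSON stored as TEXT):

```python
import json
import sqlite3

class SqliteSessionStore:
    """Single-node variant of the Postgres store; same shape of data."""
    def __init__(self, path: str = "sessions.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS chat_sessions (
                session_id TEXT PRIMARY KEY,
                user_id TEXT NOT NULL,
                messages TEXT NOT NULL DEFAULT '[]',
                turn_count INTEGER NOT NULL DEFAULT 0
            )
        """)

    def save_session(self, session_id, user_id, messages, turn_count):
        # Upsert, like the Postgres ON CONFLICT clause
        self.conn.execute(
            """
            INSERT INTO chat_sessions (session_id, user_id, messages, turn_count)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (session_id) DO UPDATE SET
                messages = excluded.messages,
                turn_count = excluded.turn_count
            """,
            (session_id, user_id, json.dumps(messages), turn_count),
        )
        self.conn.commit()

    def load_session(self, session_id):
        row = self.conn.execute(
            "SELECT user_id, messages, turn_count FROM chat_sessions"
            " WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        if row is None:
            return None
        return {
            "session_id": session_id,
            "user_id": row[0],
            "messages": json.loads(row[1]),
            "turn_count": row[2],
        }

store = SqliteSessionStore(":memory:")
store.save_session("s1", "u1", [{"role": "user", "content": "hi"}], 1)
store.save_session("s1", "u1", [{"role": "user", "content": "hi"},
                                {"role": "assistant", "content": "hello"}], 1)
loaded = store.load_session("s1")
print(loaded["turn_count"], len(loaded["messages"]))  # 1 2
```

SQLite handles one writer at a time, so move to Postgres (or another server database) once multiple app processes share session state.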
Integrating Compaction with Persistent Storage
The final piece ties compaction into the chat loop. Before each agent run, check if the context needs compaction. After the run, persist the updated session.
# chat_service.py
from agents import Runner
from support_agent import support_agent  # your own agent module (avoid naming it
                                         # "agents", which is the SDK package)
from compaction import hybrid_compact    # hybrid_compact from the section above;
                                         # module path is illustrative
from persistent_store import PostgresSessionStore

class ChatService:
    def __init__(self, store: PostgresSessionStore):
        self.store = store
        self.max_context_tokens = 60000

    async def handle_message(
        self, session_id: str, user_id: str, message: str
    ) -> str:
        # Load or create session
        session_data = await self.store.load_session(session_id)
        if session_data is None:
            messages = []
            turn_count = 0
        else:
            messages = session_data["messages"]
            turn_count = session_data["turn_count"]

        # Add user message
        messages.append({"role": "user", "content": message})
        turn_count += 1

        # Compact if needed
        input_list = await self._maybe_compact(messages)

        # Run agent
        result = await Runner.run(support_agent, input=input_list)

        # Persist the full context from the result. to_input_list()
        # keeps tool calls and their outputs, which a plain text-only
        # message list would lose.
        updated_messages = result.to_input_list()
        await self.store.save_session(
            session_id, user_id, updated_messages, turn_count
        )
        return result.final_output

    async def _maybe_compact(self, messages: list[dict]) -> list[dict]:
        total_chars = sum(len(str(m.get("content", ""))) for m in messages)
        estimated_tokens = total_chars // 4
        if estimated_tokens > self.max_context_tokens:
            return await hybrid_compact(
                messages,
                keep_recent=10,
                max_total_tokens=self.max_context_tokens,
            )
        return messages
Key Takeaways
Always use to_input_list() to carry context between turns. It preserves tool calls and their results, which plain message lists lose.
Implement compaction early. Do not wait until users hit context limits. Build compaction into the session manager from the start, even if you set the threshold high initially.
Choose your compaction strategy based on the use case. Sliding window works for casual chat. Summarization works for support conversations. Hybrid compaction works for analytical sessions where tool results must be preserved.
Persist sessions to a database. In-memory sessions are acceptable for prototypes but unacceptable for production. Users expect to resume conversations after page refreshes and server deployments.
Monitor context size per session. Track the token count at each turn so you can tune compaction thresholds based on real usage patterns rather than guesses.
Multi-turn context management is the invisible infrastructure that makes chat agents feel intelligent. Users do not see the compaction, persistence, or session routing — they just experience a coherent conversation that remembers what was said and builds on it turn after turn.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.