Multi-Turn Chat with Context Management and Sessions
Master multi-turn chat agent context management using to_input_list(), session-based state, context compaction strategies, and persistent chat storage for production deployments.
The Context Management Challenge
Every chat agent faces the same fundamental problem: the conversation grows with each turn, but the model's context window is finite. A simple customer support chat might accumulate 50 turns with tool calls, system messages, and lengthy responses. Without context management, the agent either runs out of context space, becomes prohibitively expensive, or starts losing track of earlier conversation details.
The OpenAI Agents SDK provides to_input_list() on run results, which captures the full conversation state — including tool calls and their outputs — in a format ready for the next turn. But in production, you need more than just passing the full history forward. You need session persistence, context compaction, and strategies for long-running conversations.
Understanding to_input_list()
When you run an agent and get a result, result.to_input_list() returns the complete conversation history in the format the agent expects for the next turn. This includes user messages, assistant messages, tool call requests, and tool call results.
# basic_multi_turn.py
from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 72F, partly cloudy"

agent = Agent(
    name="assistant",
    model="gpt-4o",
    instructions="You are a helpful assistant.",
    tools=[get_weather],
)

async def multi_turn_example():
    # Turn 1
    result1 = await Runner.run(agent, input="What is the weather in Austin?")
    print(f"Turn 1: {result1.final_output}")

    # Turn 2 — pass full context from turn 1
    input_list = result1.to_input_list()
    input_list.append({"role": "user", "content": "How about Seattle?"})
    result2 = await Runner.run(agent, input=input_list)
    print(f"Turn 2: {result2.final_output}")

    # Turn 3 — context now includes both previous turns
    input_list = result2.to_input_list()
    input_list.append({"role": "user", "content": "Which city is warmer?"})
    result3 = await Runner.run(agent, input=input_list)
    print(f"Turn 3: {result3.final_output}")
    # The agent can compare because it has both weather results in context
The critical detail: to_input_list() preserves tool call items, not just the text. When the agent retrieved weather for Austin in turn 1, that tool call and its result are included in the context for turn 2. This is why the agent can answer "Which city is warmer?" in turn 3 — it has both tool results in its context.
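To make that concrete, the items below sketch what a to_input_list() result can look like after the Austin turn. The exact key names depend on the SDK version, so treat the shapes as illustrative rather than authoritative:

```python
# Illustrative shape of result.to_input_list() after a tool-calling turn.
# Key names are approximate; the point is that the tool call and its
# output travel with the history, not just the text messages.
example_history = [
    {"role": "user", "content": "What is the weather in Austin?"},
    {"type": "function_call", "name": "get_weather",
     "arguments": '{"city": "Austin"}', "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1",
     "output": "Weather in Austin: 72F, partly cloudy"},
    {"role": "assistant", "content": "It is 72F and partly cloudy in Austin."},
]

# The next turn simply appends a new user message to this list
example_history.append({"role": "user", "content": "How about Seattle?"})
```

A plain `{"role": ..., "content": ...}` transcript would drop the two middle items, and with them the data the agent needs for follow-up questions.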
Session-Based Context Storage
In a multi-user server, each user has their own conversation. The session manager maps session IDs to conversation state and manages lifecycle.
# session_store.py
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Session:
    session_id: str
    user_id: str
    messages: list[dict] = field(default_factory=list)
    pending: list[dict] = field(default_factory=list)
    last_result: Optional[object] = None
    created_at: float = field(default_factory=time.time)
    last_active: float = field(default_factory=time.time)
    turn_count: int = 0
    total_tokens_estimate: int = 0

    def get_input_list(self) -> list[dict]:
        """Get the conversation history for the next agent run."""
        if self.last_result is not None:
            # to_input_list() covers everything up to the last run;
            # append user messages added since then
            return self.last_result.to_input_list() + self.pending
        return self.messages + self.pending

    def add_user_message(self, content: str):
        self.pending.append({"role": "user", "content": content})
        self.last_active = time.time()
        self.turn_count += 1

    def update_result(self, result):
        self.messages.extend(self.pending)
        self.pending.clear()
        self.last_result = result
        self.messages.append({
            "role": "assistant",
            "content": result.final_output,
        })
        # Rough token estimate: 4 chars per token
        self.total_tokens_estimate += len(result.final_output) // 4

class SessionStore:
    def __init__(self, ttl_seconds: int = 1800):
        self._sessions: dict[str, Session] = {}
        self._ttl = ttl_seconds

    def get(self, session_id: str) -> Session | None:
        session = self._sessions.get(session_id)
        if session and (time.time() - session.last_active > self._ttl):
            del self._sessions[session_id]
            return None
        return session

    def create(self, session_id: str, user_id: str) -> Session:
        session = Session(session_id=session_id, user_id=user_id)
        self._sessions[session_id] = session
        return session

    def delete(self, session_id: str):
        self._sessions.pop(session_id, None)

    def cleanup_expired(self):
        now = time.time()
        expired = [
            sid for sid, s in self._sessions.items()
            if now - s.last_active > self._ttl
        ]
        for sid in expired:
            del self._sessions[sid]
Context Compaction Strategies
As conversations grow long, you need strategies to keep the context within the model's window while preserving the information the agent needs. There are three main approaches.
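All three strategies need a cheap way to gauge context size before deciding whether to compact. The snippets below inline the same rough heuristic, roughly 4 characters per token for English text, which is worth pulling into a helper (this helper is our addition, not part of the SDK):

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough context-size estimate: ~4 characters per token for English.

    Deliberately crude: good enough for a compaction threshold, not for
    billing. Use a real tokenizer (e.g. tiktoken) if you need accuracy.
    """
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return total_chars // 4

history = [
    {"role": "user", "content": "a" * 200},
    {"role": "assistant", "content": "b" * 200},
]
print(estimate_tokens(history))  # 100
```

Tune the divisor per language and content mix; code-heavy or non-English conversations pack fewer characters per token.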
Strategy 1: Sliding Window
Keep only the most recent N turns. Simple but loses early context.
def sliding_window_compact(input_list: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep only the most recent max_turns exchanges."""
    # Always keep system messages
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]
    # Each "turn" is roughly a user message + assistant response. If tool
    # calls are present, prefer cutting at a user-message boundary so a
    # tool result is never separated from its call.
    if len(non_system) > max_turns * 2:
        non_system = non_system[-(max_turns * 2):]
    return system_msgs + non_system
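Applied to a 30-turn history (the function is repeated here so the sketch runs on its own), the window keeps the system prompt plus the last 20 exchanges:

```python
def sliding_window_compact(input_list: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep only the most recent max_turns exchanges (as defined above)."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]
    if len(non_system) > max_turns * 2:
        non_system = non_system[-(max_turns * 2):]
    return system_msgs + non_system

# 1 system message + 30 user/assistant exchanges = 61 messages
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(30):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

compacted = sliding_window_compact(history, max_turns=20)
print(len(compacted))           # 41: system prompt + last 20 exchanges
print(compacted[1]["content"])  # "question 10" -- questions 0-9 are gone
```

The dropped turns are simply lost, which is why this strategy suits casual chat but not conversations where early details matter later.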
Strategy 2: Summarization
Use the model to summarize older conversation portions, then prefix the summary to the recent context.
from agents import Agent, Runner

summarizer = Agent(
    name="summarizer",
    model="gpt-4o-mini",
    instructions="""Summarize the following conversation concisely.
Preserve: key facts, decisions made, tool results, and unresolved questions.
Omit: greetings, filler, and redundant exchanges.""",
)

async def summarize_and_compact(
    input_list: list[dict],
    keep_recent: int = 10,
) -> list[dict]:
    """Summarize older turns and keep recent ones intact."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]
    if len(non_system) <= keep_recent * 2:
        return input_list  # no compaction needed

    # Split into old (to summarize) and recent (to keep)
    old_messages = non_system[:-(keep_recent * 2)]
    recent_messages = non_system[-(keep_recent * 2):]

    # Summarize old messages
    old_text = "\n".join(
        f"{m['role']}: {m.get('content', '[tool call]')}"
        for m in old_messages
        if m.get("content")
    )
    summary_result = await Runner.run(
        summarizer,
        input=f"Summarize this conversation:\n\n{old_text}",
    )

    # Build compacted context
    summary_msg = {
        "role": "system",
        "content": (
            f"Summary of earlier conversation:\n"
            f"{summary_result.final_output}"
        ),
    }
    return system_msgs + [summary_msg] + recent_messages
Strategy 3: Hybrid — Summarize + Preserve Key Items
The most effective approach combines summarization with selective preservation of important items like tool results and decisions.
async def hybrid_compact(
    input_list: list[dict],
    keep_recent: int = 8,
    max_total_tokens: int = 50000,
) -> list[dict]:
    """Hybrid compaction: summarize old context, preserve key tool results."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]

    # Estimate current token count
    total_chars = sum(len(str(m.get("content", ""))) for m in input_list)
    estimated_tokens = total_chars // 4
    if estimated_tokens < max_total_tokens:
        return input_list  # no compaction needed

    old_messages = non_system[:-(keep_recent * 2)]
    recent_messages = non_system[-(keep_recent * 2):]

    # Extract key items to preserve verbatim
    key_items = []
    summarize_items = []
    for msg in old_messages:
        # str() guards against non-string content (e.g. structured parts)
        content = str(msg.get("content", ""))
        # Preserve tool results and decisions
        if msg.get("role") == "tool" or "decision:" in content.lower():
            key_items.append(msg)
        else:
            summarize_items.append(msg)

    # Summarize the non-key items
    if summarize_items:
        text = "\n".join(
            f"{m['role']}: {m.get('content', '')}"
            for m in summarize_items
            if m.get("content")
        )
        summary_result = await Runner.run(
            summarizer,
            input=f"Summarize this conversation concisely:\n\n{text}",
        )
        summary_msg = {
            "role": "system",
            "content": f"Earlier conversation summary:\n{summary_result.final_output}",
        }
        return system_msgs + [summary_msg] + key_items + recent_messages
    return system_msgs + key_items + recent_messages
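The preservation filter is worth checking on its own, without the model call. This sketch extracts it into a standalone helper (a refactoring for illustration, not part of the code above):

```python
def split_key_items(old_messages: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate messages preserved verbatim from those sent to the summarizer."""
    key_items, summarize_items = [], []
    for msg in old_messages:
        content = str(msg.get("content", ""))
        # Same rule as hybrid_compact: tool results and explicit decisions
        # survive compaction word-for-word
        if msg.get("role") == "tool" or "decision:" in content.lower():
            key_items.append(msg)
        else:
            summarize_items.append(msg)
    return key_items, summarize_items

old = [
    {"role": "user", "content": "What's the refund status?"},
    {"role": "tool", "content": "Refund #4411: approved, $120.00"},
    {"role": "assistant", "content": "Decision: issue the refund in full."},
    {"role": "assistant", "content": "Anything else I can help with?"},
]
key, rest = split_key_items(old)
print(len(key), len(rest))  # 2 2 -- tool result and decision survive verbatim
```

The "decision:" marker is a convention you enforce via the agent's instructions; any tagging scheme works as long as the filter and the prompt agree on it.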
Persistent Chat Storage
In-memory sessions are lost when the server restarts. For production chat agents, persist sessions to a database so conversations survive deployments and can be resumed.
# persistent_store.py
import json
import asyncpg

class PostgresSessionStore:
    def __init__(self, pool: asyncpg.Pool):
        self.pool = pool

    async def initialize(self):
        await self.pool.execute("""
            CREATE TABLE IF NOT EXISTS chat_sessions (
                session_id TEXT PRIMARY KEY,
                user_id TEXT NOT NULL,
                messages JSONB NOT NULL DEFAULT '[]',
                turn_count INTEGER NOT NULL DEFAULT 0,
                created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
                last_active TIMESTAMPTZ NOT NULL DEFAULT NOW()
            )
        """)
        await self.pool.execute("""
            CREATE INDEX IF NOT EXISTS idx_sessions_user
            ON chat_sessions (user_id)
        """)
        await self.pool.execute("""
            CREATE INDEX IF NOT EXISTS idx_sessions_active
            ON chat_sessions (last_active)
        """)

    async def save_session(
        self, session_id: str, user_id: str, messages: list[dict], turn_count: int
    ):
        await self.pool.execute(
            """
            INSERT INTO chat_sessions (session_id, user_id, messages, turn_count, last_active)
            VALUES ($1, $2, $3::jsonb, $4, NOW())
            ON CONFLICT (session_id) DO UPDATE SET
                messages = $3::jsonb,
                turn_count = $4,
                last_active = NOW()
            """,
            session_id,
            user_id,
            json.dumps(messages),
            turn_count,
        )

    async def load_session(self, session_id: str) -> dict | None:
        row = await self.pool.fetchrow(
            "SELECT * FROM chat_sessions WHERE session_id = $1",
            session_id,
        )
        if not row:
            return None
        return {
            "session_id": row["session_id"],
            "user_id": row["user_id"],
            "messages": json.loads(row["messages"]),
            "turn_count": row["turn_count"],
        }

    async def list_user_sessions(self, user_id: str) -> list[dict]:
        rows = await self.pool.fetch(
            """
            SELECT session_id, turn_count, created_at, last_active
            FROM chat_sessions
            WHERE user_id = $1
            ORDER BY last_active DESC
            LIMIT 50
            """,
            user_id,
        )
        return [dict(row) for row in rows]

    async def cleanup_old_sessions(self, days: int = 30):
        deleted = await self.pool.execute(
            "DELETE FROM chat_sessions WHERE last_active < NOW() - $1::interval",
            f"{days} days",
        )
        return deleted
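The same save/load interface maps cleanly onto SQLite for local development or single-node deployments. A minimal synchronous sketch using the stdlib sqlite3 module (the table mirrors the Postgres schema, with JSON stored as TEXT):

```python
import json
import sqlite3

class SqliteSessionStore:
    """Single-node variant of the Postgres store; same shape of data."""
    def __init__(self, path: str = "sessions.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS chat_sessions (
                session_id TEXT PRIMARY KEY,
                user_id TEXT NOT NULL,
                messages TEXT NOT NULL DEFAULT '[]',
                turn_count INTEGER NOT NULL DEFAULT 0
            )
        """)

    def save_session(self, session_id, user_id, messages, turn_count):
        # Upsert, like the Postgres ON CONFLICT clause
        self.conn.execute(
            """
            INSERT INTO chat_sessions (session_id, user_id, messages, turn_count)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (session_id) DO UPDATE SET
                messages = excluded.messages,
                turn_count = excluded.turn_count
            """,
            (session_id, user_id, json.dumps(messages), turn_count),
        )
        self.conn.commit()

    def load_session(self, session_id):
        row = self.conn.execute(
            "SELECT user_id, messages, turn_count FROM chat_sessions"
            " WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        if row is None:
            return None
        return {
            "session_id": session_id,
            "user_id": row[0],
            "messages": json.loads(row[1]),
            "turn_count": row[2],
        }

store = SqliteSessionStore(":memory:")
store.save_session("s1", "u1", [{"role": "user", "content": "hi"}], 1)
store.save_session("s1", "u1", [{"role": "user", "content": "hi"},
                                {"role": "assistant", "content": "hello"}], 1)
loaded = store.load_session("s1")
print(loaded["turn_count"], len(loaded["messages"]))  # 1 2
```

SQLite handles one writer at a time, so move to Postgres (or another server database) once multiple app processes share session state.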
Integrating Compaction with Persistent Storage
The final piece ties compaction into the chat loop. Before each agent run, check if the context needs compaction. After the run, persist the updated session.
# chat_service.py
from agents import Runner
from support_agent import support_agent  # your own agent module (avoid naming it
                                         # "agents", which is the SDK package)
from compaction import hybrid_compact    # hybrid_compact from the section above;
                                         # module path is illustrative
from persistent_store import PostgresSessionStore

class ChatService:
    def __init__(self, store: PostgresSessionStore):
        self.store = store
        self.max_context_tokens = 60000

    async def handle_message(
        self, session_id: str, user_id: str, message: str
    ) -> str:
        # Load or create session
        session_data = await self.store.load_session(session_id)
        if session_data is None:
            messages = []
            turn_count = 0
        else:
            messages = session_data["messages"]
            turn_count = session_data["turn_count"]

        # Add user message
        messages.append({"role": "user", "content": message})
        turn_count += 1

        # Compact if needed
        input_list = await self._maybe_compact(messages)

        # Run agent
        result = await Runner.run(support_agent, input=input_list)

        # Persist the full context from the result. to_input_list()
        # keeps tool calls and their outputs, which a plain text-only
        # message list would lose.
        updated_messages = result.to_input_list()
        await self.store.save_session(
            session_id, user_id, updated_messages, turn_count
        )
        return result.final_output

    async def _maybe_compact(self, messages: list[dict]) -> list[dict]:
        total_chars = sum(len(str(m.get("content", ""))) for m in messages)
        estimated_tokens = total_chars // 4
        if estimated_tokens > self.max_context_tokens:
            return await hybrid_compact(
                messages,
                keep_recent=10,
                max_total_tokens=self.max_context_tokens,
            )
        return messages
Key Takeaways
Always use to_input_list() to carry context between turns. It preserves tool calls and their results, which plain message lists lose.
Implement compaction early. Do not wait until users hit context limits. Build compaction into the session manager from the start, even if you set the threshold high initially.
Choose your compaction strategy based on the use case. Sliding window works for casual chat. Summarization works for support conversations. Hybrid compaction works for analytical sessions where tool results must be preserved.
Persist sessions to a database. In-memory sessions are acceptable for prototypes but unacceptable for production. Users expect to resume conversations after page refreshes and server deployments.
Monitor context size per session. Track the token count at each turn so you can tune compaction thresholds based on real usage patterns rather than guesses.
Multi-turn context management is the invisible infrastructure that makes chat agents feel intelligent. Users do not see the compaction, persistence, or session routing — they just experience a coherent conversation that remembers what was said and builds on it turn after turn.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.