---
title: "Multi-Turn Chat with Context Management and Sessions"
description: "Master multi-turn chat agent context management using to_input_list(), session-based state, context compaction strategies, and persistent chat storage for production deployments."
canonical: https://callsphere.ai/blog/multi-turn-chat-context-management-sessions
category: "Learn Agentic AI"
tags: ["OpenAI", "Multi-Turn", "Chat", "Context", "Sessions"]
author: "CallSphere Team"
published: 2026-03-14T00:00:00.000Z
updated: 2026-05-06T01:02:41.640Z
---

# Multi-Turn Chat with Context Management and Sessions

> Master multi-turn chat agent context management using to_input_list(), session-based state, context compaction strategies, and persistent chat storage for production deployments.

## The Context Management Challenge

Every chat agent faces the same fundamental problem: the conversation grows with each turn, but the model's context window is finite. A simple customer support chat might accumulate 50 turns with tool calls, system messages, and lengthy responses. Without context management, the agent either runs out of context space, becomes prohibitively expensive, or starts losing track of earlier conversation details.

The OpenAI Agents SDK provides `to_input_list()` on run results, which captures the full conversation state — including tool calls and their outputs — in a format ready for the next turn. But in production, you need more than just passing the full history forward. You need session persistence, context compaction, and strategies for long-running conversations.

## Understanding to_input_list()

When you run an agent and get a result, `result.to_input_list()` returns the complete conversation history in the format the agent expects for the next turn. This includes user messages, assistant messages, tool call requests, and tool call results.

```mermaid
flowchart LR
    T1(["Turn 1:
user message"])
    RUN1["Runner.run
agent loop"]
    LIST["to_input_list()
messages, tool calls,
tool results"]
    T2(["Turn 2:
append user message"])
    RUN2["Runner.run
with full context"]
    OUT(["Response builds on
earlier turns"])
    T1 --> RUN1 --> LIST --> T2 --> RUN2 --> OUT
    style RUN1 fill:#4f46e5,stroke:#4338ca,color:#fff
    style RUN2 fill:#4f46e5,stroke:#4338ca,color:#fff
    style LIST fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
# basic_multi_turn.py
from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 72F, partly cloudy"

agent = Agent(
    name="assistant",
    model="gpt-4o",
    instructions="You are a helpful assistant.",
    tools=[get_weather],
)

async def multi_turn_example():
    # Turn 1
    result1 = await Runner.run(agent, input="What is the weather in Austin?")
    print(f"Turn 1: {result1.final_output}")

    # Turn 2 — pass full context from turn 1
    input_list = result1.to_input_list()
    input_list.append({"role": "user", "content": "How about Seattle?"})

    result2 = await Runner.run(agent, input=input_list)
    print(f"Turn 2: {result2.final_output}")

    # Turn 3 — context now includes both previous turns
    input_list = result2.to_input_list()
    input_list.append({"role": "user", "content": "Which city is warmer?"})

    result3 = await Runner.run(agent, input=input_list)
    print(f"Turn 3: {result3.final_output}")
    # The agent can compare because it has both weather results in context
```

The critical detail: `to_input_list()` preserves tool call items, not just the text. When the agent retrieved weather for Austin in turn 1, that tool call and its result are included in the context for turn 2. This is why the agent can answer "Which city is warmer?" in turn 3 — it has both tool results in its context.
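To make that concrete, here is an illustrative, SDK-free sketch of the kinds of items a conversation input list can carry. The exact keys and type names below are assumptions for illustration, not the SDK's precise schema:

```python
# Illustrative item shapes only; real items come from result.to_input_list()
turn1_items = [
    {"role": "user", "content": "What is the weather in Austin?"},
    {"type": "function_call", "name": "get_weather",
     "arguments": '{"city": "Austin"}', "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1",
     "output": "Weather in Austin: 72F, partly cloudy"},
    {"role": "assistant", "content": "It is 72F and partly cloudy in Austin."},
]

def has_tool_context(items: list[dict]) -> bool:
    """Check whether tool results survived into the next turn's input."""
    return any(i.get("type") == "function_call_output" for i in items)

# Appending the next user turn keeps every earlier item intact
turn2_input = turn1_items + [{"role": "user", "content": "How about Seattle?"}]
print(has_tool_context(turn2_input))  # True
```

If you persist only the plain assistant text and drop the tool items, the agent loses exactly the data it needs for follow-up comparisons like "Which city is warmer?"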

## Session-Based Context Storage

In a multi-user server, each user has their own conversation. The session manager maps session IDs to conversation state and manages lifecycle.

```python
# session_store.py
import time
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Session:
    session_id: str
    user_id: str
    messages: list[dict] = field(default_factory=list)
    last_result: Optional[object] = None
    created_at: float = field(default_factory=time.time)
    last_active: float = field(default_factory=time.time)
    turn_count: int = 0
    total_tokens_estimate: int = 0

    def get_input_list(self) -> list[dict]:
        """Get the conversation history for the next agent run."""
        if self.last_result is not None:
            return self.last_result.to_input_list()
        return self.messages

    def add_user_message(self, content: str):
        self.messages.append({"role": "user", "content": content})
        self.last_active = time.time()
        self.turn_count += 1

    def update_result(self, result):
        self.last_result = result
        self.messages.append({
            "role": "assistant",
            "content": result.final_output,
        })
        # Rough token estimate: 4 chars per token
        self.total_tokens_estimate += len(result.final_output) // 4

class SessionStore:
    def __init__(self, ttl_seconds: int = 1800):
        self._sessions: dict[str, Session] = {}
        self._ttl = ttl_seconds

    def get(self, session_id: str) -> Session | None:
        session = self._sessions.get(session_id)
        if session and (time.time() - session.last_active > self._ttl):
            del self._sessions[session_id]
            return None
        return session

    def create(self, session_id: str, user_id: str) -> Session:
        session = Session(session_id=session_id, user_id=user_id)
        self._sessions[session_id] = session
        return session

    def delete(self, session_id: str):
        self._sessions.pop(session_id, None)

    def cleanup_expired(self):
        now = time.time()
        expired = [
            sid for sid, s in self._sessions.items()
            if now - s.last_active > self._ttl
        ]
        for sid in expired:
            del self._sessions[sid]
```
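Note that `SessionStore.get()` uses lazy expiry: a stale entry is evicted the next time it is accessed, and `cleanup_expired()` only exists to bound memory for sessions nobody touches again. A minimal standalone sketch of the same pattern (`TTLDict` is a hypothetical helper, not part of the SDK):

```python
import time

class TTLDict:
    """Evict-on-access TTL store: no background sweeper required."""

    def __init__(self, ttl_seconds: float):
        self._data: dict[str, tuple[float, object]] = {}
        self._ttl = ttl_seconds

    def set(self, key: str, value: object):
        self._data[key] = (time.time(), value)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.time() - stored_at > self._ttl:
            del self._data[key]  # evict stale entry on access
            return None
        return value

store = TTLDict(ttl_seconds=0.05)
store.set("sess-1", {"user": "u1"})
print(store.get("sess-1") is not None)  # True while fresh
time.sleep(0.1)
print(store.get("sess-1"))  # None after the TTL elapses
```

The trade-off is that an untouched expired session lingers in memory until a periodic `cleanup_expired()` pass removes it.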

## Context Compaction Strategies

As conversations grow long, you need strategies to keep the context within the model's window while preserving the information the agent needs. There are three main approaches.

### Strategy 1: Sliding Window

Keep only the most recent N turns. Simple, but it loses early context, and a naive slice can separate a tool call from its paired result.

```python
def sliding_window_compact(input_list: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep only the most recent max_turns exchanges."""
    # Always keep system messages
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]

    # Each "turn" is a user message + assistant response
    if len(non_system) > max_turns * 2:
        non_system = non_system[-(max_turns * 2):]

    return system_msgs + non_system
```

### Strategy 2: Summarization

Use the model to summarize older conversation portions, then prefix the summary to the recent context.

```python
from agents import Agent, Runner

summarizer = Agent(
    name="summarizer",
    model="gpt-4o-mini",
    instructions="""Summarize the following conversation concisely.
Preserve: key facts, decisions made, tool results, and unresolved questions.
Omit: greetings, filler, and redundant exchanges.""",
)

async def summarize_and_compact(
    input_list: list[dict],
    keep_recent: int = 10,
) -> list[dict]:
    """Summarize older turns and keep recent ones intact."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]

    if len(non_system) <= keep_recent * 2:
        return input_list  # short enough, nothing to summarize

    old_portion = non_system[:-(keep_recent * 2)]
    recent = non_system[-(keep_recent * 2):]

    # Render the older turns as plain text for the summarizer agent
    transcript = "\n".join(
        f"{m.get('role', 'tool')}: {m.get('content', '')}" for m in old_portion
    )
    summary = await Runner.run(summarizer, input=transcript)

    summary_msg = {
        "role": "assistant",
        "content": f"[Summary of earlier conversation] {summary.final_output}",
    }
    return system_msgs + [summary_msg] + recent
```

### Strategy 3: Hybrid Compaction

Combine both approaches: keep recent turns intact, summarize the rest, preserve tool results explicitly, and only compact once the estimated token count exceeds a budget.

```python
async def hybrid_compact(
    input_list: list[dict],
    keep_recent: int = 10,
    max_total_tokens: int = 50_000,
) -> list[dict]:
    """Hybrid compaction: summarize old context, preserve key tool results."""
    system_msgs = [m for m in input_list if m.get("role") == "system"]
    non_system = [m for m in input_list if m.get("role") != "system"]

    # Estimate current token count
    total_chars = sum(len(str(m.get("content", ""))) for m in input_list)
    estimated_tokens = total_chars // 4

    if estimated_tokens <= max_total_tokens:
        return input_list  # within budget, no compaction needed

    old_portion = non_system[:-(keep_recent * 2)]
    recent = non_system[-(keep_recent * 2):]

    # Preserve tool call items from the old portion verbatim so the
    # agent can still reference earlier data after compaction
    preserved_tools = [
        m for m in old_portion
        if m.get("type") in ("function_call", "function_call_output")
    ]
    to_summarize = [m for m in old_portion if m not in preserved_tools]

    transcript = "\n".join(
        f"{m.get('role', 'tool')}: {m.get('content', '')}" for m in to_summarize
    )
    summary = await Runner.run(summarizer, input=transcript)
    summary_msg = {
        "role": "assistant",
        "content": f"[Summary of earlier conversation] {summary.final_output}",
    }
    return system_msgs + [summary_msg] + preserved_tools + recent
```

## Persistent Session Storage

In-memory sessions vanish on restart. For production, persist conversation state to a database so users can resume after page refreshes and server deployments.

```python
# postgres_store.py
import json

import asyncpg

class PostgresSessionStore:
    """Persistence backed by a chat_sessions table with columns:
    session_id TEXT PRIMARY KEY, user_id TEXT, messages JSONB,
    turn_count INT, created_at TIMESTAMPTZ, last_active TIMESTAMPTZ."""

    def __init__(self, pool: asyncpg.Pool):
        self.pool = pool

    async def save_session(
        self,
        session_id: str,
        user_id: str,
        messages: list[dict],
        turn_count: int,
    ):
        await self.pool.execute(
            """
            INSERT INTO chat_sessions
                (session_id, user_id, messages, turn_count, created_at, last_active)
            VALUES ($1, $2, $3, $4, NOW(), NOW())
            ON CONFLICT (session_id) DO UPDATE SET
                messages = EXCLUDED.messages,
                turn_count = EXCLUDED.turn_count,
                last_active = NOW()
            """,
            session_id,
            user_id,
            json.dumps(messages),
            turn_count,
        )

    async def load_session(self, session_id: str) -> dict | None:
        row = await self.pool.fetchrow(
            "SELECT * FROM chat_sessions WHERE session_id = $1",
            session_id,
        )
        if not row:
            return None
        return {
            "session_id": row["session_id"],
            "user_id": row["user_id"],
            "messages": json.loads(row["messages"]),
            "turn_count": row["turn_count"],
        }

    async def list_user_sessions(self, user_id: str) -> list[dict]:
        rows = await self.pool.fetch(
            """
            SELECT session_id, turn_count, created_at, last_active
            FROM chat_sessions
            WHERE user_id = $1
            ORDER BY last_active DESC
            LIMIT 50
            """,
            user_id,
        )
        return [dict(row) for row in rows]

    async def cleanup_old_sessions(self, days: int = 30):
        deleted = await self.pool.execute(
            "DELETE FROM chat_sessions WHERE last_active  str:
        # Load or create session
        session_data = await self.store.load_session(session_id)
        if session_data is None:
            messages = []
            turn_count = 0
        else:
            messages = session_data["messages"]
            turn_count = session_data["turn_count"]

        # Add user message
        messages.append({"role": "user", "content": message})
        turn_count += 1

        # Compact if needed
        input_list = await self._maybe_compact(messages)

        # Run agent
        result = await Runner.run(support_agent, input=input_list)

        # Replace the tracking list with the full context from the result;
        # to_input_list() preserves tool calls and outputs that a plain
        # assistant-text append would lose
        messages = result.to_input_list()

        # Persist
        await self.store.save_session(
            session_id, user_id, messages, turn_count
        )

        return result.final_output

    async def _maybe_compact(self, messages: list[dict]) -> list[dict]:
        total_chars = sum(len(str(m.get("content", ""))) for m in messages)
        estimated_tokens = total_chars // 4

        if estimated_tokens > self.max_context_tokens:
            return await hybrid_compact(
                messages,
                keep_recent=10,
                max_total_tokens=self.max_context_tokens,
            )
        return messages
```

## Key Takeaways

**Always use `to_input_list()`** to carry context between turns. It preserves tool calls and their results, which plain message lists lose.

**Implement compaction early.** Do not wait until users hit context limits. Build compaction into the session manager from the start, even if you set the threshold high initially.

**Choose your compaction strategy based on the use case.** Sliding window works for casual chat. Summarization works for support conversations. Hybrid compaction works for analytical sessions where tool results must be preserved.

**Persist sessions to a database.** In-memory sessions are acceptable for prototypes but unacceptable for production. Users expect to resume conversations after page refreshes and server deployments.

**Monitor context size per session.** Track the token count at each turn so you can tune compaction thresholds based on real usage patterns rather than guesses.
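A minimal per-turn estimator, using the same rough 4-characters-per-token heuristic the code above relies on (a real deployment might swap in a proper tokenizer such as tiktoken):

```python
import json

def estimate_tokens(items: list[dict]) -> int:
    """Rough context-size estimate: ~4 characters per token.
    Serializing each item counts tool-call payloads too, not just text."""
    return sum(len(json.dumps(m)) for m in items) // 4

history = [
    {"role": "user", "content": "What is the weather in Austin?"},
    {"role": "assistant", "content": "72F and partly cloudy."},
]
# Log this value at every turn, keyed by session_id, and chart it
# to pick a compaction threshold from real usage
print(estimate_tokens(history))
```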

Multi-turn context management is the invisible infrastructure that makes chat agents feel intelligent. Users do not see the compaction, persistence, or session routing — they just experience a coherent conversation that remembers what was said and builds on it turn after turn.

---

Source: https://callsphere.ai/blog/multi-turn-chat-context-management-sessions
