---
title: "Chat Agent Context Management: Maintaining Coherent Multi-Turn Conversations"
description: "Master the techniques for managing conversation context in chat agents, including context window optimization, message pruning strategies, summarization, and topic tracking for coherent multi-turn interactions."
canonical: https://callsphere.ai/blog/chat-agent-context-management-multi-turn-conversations
category: "Learn Agentic AI"
tags: ["Context Management", "Conversation Memory", "Multi-Turn", "LLM", "Chat Agent"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T19:10:50.839Z
---

# Chat Agent Context Management: Maintaining Coherent Multi-Turn Conversations

> Master the techniques for managing conversation context in chat agents, including context window optimization, message pruning strategies, summarization, and topic tracking for coherent multi-turn interactions.

## The Context Window Problem

Every LLM has a finite context window. GPT-4o supports 128K tokens and Claude supports up to 200K, but even these generous limits get consumed quickly in production chat agents. A busy customer support conversation with tool calls, system prompts, and previous messages can easily hit 50K tokens within 20 turns. Without active context management, your agent either crashes with a token-limit error or starts losing track of earlier conversation details.

Context management is the discipline of deciding what information the model sees at each turn. Get it right, and your agent maintains coherent conversations across dozens of turns. Get it wrong, and users experience an agent that forgets what they said three messages ago.

## Strategy 1: Sliding Window with Priority

The simplest approach is a sliding window: keep the last N messages and drop everything else. But naive truncation throws away important context. A better approach assigns priority levels. The diagram below shows where a prioritized working window sits in a broader memory architecture; the code that follows implements the priority window itself:

```mermaid
flowchart TD
    MSG(["New message"])
    WORKING["Working memory
rolling window"]
    EPISODIC[("Episodic memory
past sessions")]
    SEMANTIC[("Semantic memory
facts and preferences")]
    SUM["Summarizer
compresses old turns"]
    ROUTER{"Retrieve
needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater
writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
```

```python
from dataclasses import dataclass
from enum import IntEnum

class Priority(IntEnum):
    SYSTEM = 0      # Always keep
    PINNED = 1      # User-critical context
    RECENT = 2      # Last N messages
    HISTORICAL = 3  # Older messages, drop first

@dataclass
class ContextMessage:
    role: str
    content: str
    priority: Priority
    token_count: int

class ContextManager:
    def __init__(self, max_tokens: int = 8000):
        self.max_tokens = max_tokens
        self.messages: list[ContextMessage] = []

    def add_message(self, role: str, content: str, priority: Priority = Priority.RECENT):
        tokens = len(content.split()) * 1.3  # Rough estimate
        self.messages.append(ContextMessage(role, content, priority, int(tokens)))

    def build_context(self) -> list[dict]:
        # Sort by priority (system first, historical last)
        by_priority = sorted(
            range(len(self.messages)), key=lambda i: self.messages[i].priority
        )
        kept: set[int] = set()
        used_tokens = 0

        for idx in by_priority:
            msg = self.messages[idx]
            if used_tokens + msg.token_count <= self.max_tokens:
                kept.add(idx)
                used_tokens += msg.token_count

        # Return survivors in their original chronological order
        return [
            {"role": m.role, "content": m.content}
            for i, m in enumerate(self.messages)
            if i in kept
        ]
```

Because selection runs in priority order, the system prompt and pinned messages always survive truncation, while historical messages are the first to go when the budget runs out.

## Strategy 2: Summarization

Dropping old messages loses their information outright. A better option is to compress them into a short summary that preserves the facts the agent still needs:

```python
from openai import AsyncOpenAI

openai = AsyncOpenAI()  # Async client; reads OPENAI_API_KEY from the environment

async def summarize_conversation(messages: list[dict]) -> str:
    summary_prompt = (
        "Summarize the following conversation history in 2-3 sentences. "
        "Focus on: the user's main issue, any decisions made, "
        "and any pending actions. Be factual and concise."
    )
    response = await openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": summary_prompt},
            *messages,
        ],
        max_tokens=200,
    )
    return response.choices[0].message.content

class SummarizingContextManager:
    def __init__(self, max_tokens: int = 8000, summarize_threshold: int = 6000):
        self.max_tokens = max_tokens
        self.summarize_threshold = summarize_threshold
        self.messages: list[dict] = []
        self.summary: str | None = None

    async def add_and_manage(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        total_words = sum(len(m["content"].split()) for m in self.messages)

        if total_words * 1.3 > self.summarize_threshold:  # ~1.3 tokens per word
            # Summarize older messages, folding in any prior summary so
            # earlier context survives repeated compaction; keep last 4
            old_messages = self.messages[:-4]
            if self.summary:
                old_messages.insert(
                    0,
                    {"role": "system", "content": f"Earlier summary: {self.summary}"},
                )
            self.summary = await summarize_conversation(old_messages)
            self.messages = self.messages[-4:]

    def build_context(self, system_prompt: str) -> list[dict]:
        context = [{"role": "system", "content": system_prompt}]
        if self.summary:
            context.append({
                "role": "system",
                "content": f"Previous conversation summary: {self.summary}",
            })
        context.extend(self.messages)
        return context
```

The trick is choosing when to summarize. Set a threshold at roughly 75% of your token budget. When the conversation crosses that line, summarize everything except the last few messages.
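
Here is how the manager looks in use. The message content is hypothetical, chosen purely for illustration:

```python
manager = SummarizingContextManager(max_tokens=8000, summarize_threshold=6000)

# Each turn flows through add_and_manage, which compacts automatically
# once the rough token estimate crosses the 6000-token threshold
await manager.add_and_manage("user", "My order #1234 arrived damaged.")
await manager.add_and_manage("assistant", "Sorry about that. Could you share a photo?")

# build_context injects the rolling summary (if one exists) ahead of
# the retained recent messages
messages = manager.build_context("You are a helpful support agent.")
```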

## Strategy 3: Topic Tracking

Track what topics have been discussed so the agent can reference earlier context without keeping every message:

```python
from collections import defaultdict

class TopicTracker:
    def __init__(self):
        self.topics: dict[str, list[str]] = defaultdict(list)
        self.current_topic: str | None = None

    async def classify_topic(self, message: str) -> str:
        response = await openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "system",
                "content": (
                    "Classify this message into one topic category. "
                    "Return only the category name. Examples: "
                    "billing, technical_support, account, shipping, general"
                ),
            }, {
                "role": "user",
                "content": message,
            }],
            max_tokens=20,
        )
        return response.choices[0].message.content.strip().lower()

    async def track(self, role: str, content: str):
        topic = await self.classify_topic(content)
        self.topics[topic].append(f"{role}: {content}")
        self.current_topic = topic

    def get_relevant_context(self) -> str:
        if not self.current_topic:
            return ""
        relevant = self.topics[self.current_topic][-6:]
        return "\n".join(relevant)
```

Topic tracking is especially powerful for support agents where users switch between issues mid-conversation. The agent can pull in context about billing when the user returns to a billing question, even if several technical support messages intervened.
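
A quick sketch of that flow, with hypothetical messages. The topic labels in the comments are assumptions; the actual labels depend on what the classifier model returns:

```python
tracker = TopicTracker()

# Hypothetical turns; expected classifications shown in comments
await tracker.track("user", "Why was I charged twice this month?")        # billing
await tracker.track("user", "Separately, the app crashes when I log in.") # technical_support
await tracker.track("user", "Any update on that double charge?")          # billing

# current_topic is now "billing", so only the billing thread is returned,
# skipping the intervening technical_support exchange
print(tracker.get_relevant_context())
```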

## Combining Strategies in TypeScript

Here is a TypeScript implementation that combines sliding window with summarization:

```typescript
interface ManagedMessage {
  role: "user" | "assistant" | "system";
  content: string;
  timestamp: number;
  pinned: boolean;
}

class ConversationContext {
  private messages: ManagedMessage[] = [];
  private summary: string | null = null;
  private readonly maxTokens = 8000;

  addMessage(role: ManagedMessage["role"], content: string, pinned = false) {
    this.messages.push({
      role, content, timestamp: Date.now(), pinned,
    });
  }

  async compact(summarizer: (msgs: ManagedMessage[]) => Promise<string>) {
    const tokenEstimate = this.messages
      .reduce((sum, m) => sum + m.content.split(" ").length * 1.3, 0);

    if (tokenEstimate > this.maxTokens * 0.75) {
      const pinned = this.messages.filter((m) => m.pinned);
      const recent = this.messages.filter((m) => !m.pinned).slice(-4);
      const old = this.messages.filter(
        (m) => !m.pinned && !recent.includes(m)
      );
      // Fold any prior summary in so earlier context survives recompaction
      const toSummarize: ManagedMessage[] = this.summary
        ? [
            {
              role: "system",
              content: `Earlier summary: ${this.summary}`,
              timestamp: 0,
              pinned: false,
            },
            ...old,
          ]
        : old;
      this.summary = await summarizer(toSummarize);
      this.messages = [...pinned, ...recent];
    }
  }

  build(systemPrompt: string): Array<{ role: string; content: string }> {
    const ctx: Array<{ role: string; content: string }> = [
      { role: "system", content: systemPrompt },
    ];
    if (this.summary) {
      ctx.push({ role: "system", content: `Prior context: ${this.summary}` });
    }
    this.messages.forEach((m) => ctx.push({ role: m.role, content: m.content }));
    return ctx;
  }
}
```

## FAQ

### How do I count tokens accurately instead of estimating?

Use the `tiktoken` library for OpenAI models. Call `tiktoken.encoding_for_model("gpt-4o")` to get the tokenizer, then `len(encoding.encode(text))` for exact counts. For Claude, use Anthropic's token counting API endpoint. Accurate counting prevents both wasted context space and unexpected truncation errors.
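
A minimal sketch for the OpenAI side, using tiktoken's `encoding_for_model` helper:

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    # Exact count from the model's own tokenizer, rather than
    # the ~1.3-tokens-per-word estimate used in the examples above
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))
```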

### When should I summarize versus just truncate old messages?

Summarize when the conversation involves ongoing state — like a support ticket where the user described their problem early on and is now troubleshooting. Truncate when messages are mostly independent exchanges, like a FAQ bot where each question stands alone. The cost of a summarization call (latency and tokens) only pays off when the summary carries information the agent genuinely needs.

### How do I handle tool call results in context management?

Tool call results can be verbose. Store the full result in your database but inject only a condensed version into the context. For example, if a database query returns 50 rows, summarize it as "Query returned 50 orders, most recent from March 15, total value $4,230." This preserves the key facts while saving thousands of tokens.
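
A condenser along these lines produces that kind of digest. The helper and its field names (`date`, `total`) are hypothetical, standing in for whatever your query actually returns:

```python
def condense_order_query(rows: list[dict]) -> str:
    # Hypothetical condenser: keep the aggregates, drop the raw rows.
    # Assumes each row has a "date" (ISO string) and a "total" (float).
    total_value = sum(r["total"] for r in rows)
    most_recent = max(r["date"] for r in rows)
    return (
        f"Query returned {len(rows)} orders, most recent from "
        f"{most_recent}, total value ${total_value:,.0f}."
    )
```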

---

#ContextManagement #ConversationMemory #MultiTurn #LLM #ChatAgent #AgenticAI #LearnAI #AIEngineering
