
Server-Managed Conversations with OpenAI Conversations API

Use the OpenAI Conversations API with conversations.create, previous_response_id chaining, and auto_previous_response_id for server-side history management in AI agents.

Two Approaches to Conversation History

The OpenAI Agents SDK supports two fundamentally different approaches to managing conversation history:

  1. Client-side sessions: Your application stores and retrieves history using a session backend (SQLite, Redis, SQLAlchemy). The full history is sent with each API request.

  2. Server-managed conversations: OpenAI's servers store the history. You reference it with an ID, and the server reconstructs the context. Your application only sends the new message.

Each approach has distinct tradeoffs. This post explores server-managed conversations and when they are the right choice.

How Server-Managed Conversations Work

With client-side sessions, every API call includes the full conversation history in the request payload. For a 50-turn conversation, you are sending all 50 turns every time.

With server-managed conversations, OpenAI stores the conversation on their servers. Your API call includes only:

  • The new user message
  • A reference to the previous response (previous_response_id)

The server reconstructs the full context internally. This reduces your request payload size dramatically and simplifies your client code.

conversations.create() and conversation_id

The Conversations API lets you create a named conversation container:

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Create a conversation
conversation = await client.conversations.create()
print(f"Conversation ID: {conversation.id}")
# Output: conv_abc123...

The conversation_id is a persistent handle for the conversation. You can use it across multiple requests to maintain continuity.
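A follow-up request can then reference the container instead of resending history. This is a minimal sketch, assuming the `conversation` request parameter documented for the Responses API; the `follow_up_request` helper is my own illustration, not part of the SDK:

```python
# Hypothetical helper: build kwargs for a request that reuses a
# stored conversation. The `conversation` parameter name follows the
# OpenAI Responses API docs; check the current reference to confirm.

def follow_up_request(conversation_id: str, message: str) -> dict:
    """Return kwargs for a request that continues a stored conversation."""
    return {
        "model": "gpt-4o",
        "input": message,
        "conversation": conversation_id,  # the server loads prior turns itself
    }

kwargs = follow_up_request("conv_abc123", "Where were we?")
print(kwargs)
# In a real application:
# response = await client.responses.create(**kwargs)
```

Note that the request body is the same size no matter how many turns the conversation already holds.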

previous_response_id Chaining

The core mechanism for server-managed multi-turn conversations is previous_response_id. Each response has a unique ID, and you pass it to the next request to chain them together.

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Turn 1
response1 = await client.responses.create(
    model="gpt-4o",
    input="My name is Alice and I'm planning a trip to Japan.",
)
print(f"Turn 1: {response1.output_text}")
print(f"Response ID: {response1.id}")

# Turn 2 — chain to Turn 1
response2 = await client.responses.create(
    model="gpt-4o",
    input="What month should I visit?",
    previous_response_id=response1.id,
)
print(f"Turn 2: {response2.output_text}")
# The model knows Alice is planning a Japan trip

# Turn 3 — chain to Turn 2 (which chains to Turn 1)
response3 = await client.responses.create(
    model="gpt-4o",
    input="And what's my name?",
    previous_response_id=response2.id,
)
print(f"Turn 3: {response3.output_text}")
# Output: "Your name is Alice."

The chain is cumulative — response3 has context from all three turns because each response links back to its predecessor.
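Conceptually, the server can rebuild the full context by walking the ID chain backwards. This toy model (plain Python, no API; the `stored` dict stands in for OpenAI's storage) illustrates the idea:

```python
# Toy model of previous_response_id chaining: each stored response
# remembers its predecessor, so walking the links back to the root
# recovers the whole conversation in order.

stored = {
    "resp_1": {"previous": None, "turn": "My name is Alice and I'm planning a trip to Japan."},
    "resp_2": {"previous": "resp_1", "turn": "What month should I visit?"},
    "resp_3": {"previous": "resp_2", "turn": "And what's my name?"},
}

def rebuild_context(response_id: str) -> list[str]:
    """Follow `previous` links back to the root, then reverse into turn order."""
    turns = []
    current: str | None = response_id
    while current is not None:
        record = stored[current]
        turns.append(record["turn"])
        current = record["previous"]
    return list(reversed(turns))

print(rebuild_context("resp_3"))
# all three turns, oldest first
```

This is why passing only `response2.id` in Turn 3 is enough: the link structure carries the rest.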

Using Server-Managed Conversations with the Agents SDK

The Agents SDK integrates server-managed conversations through the auto_previous_response_id setting:

from agents import Agent, Runner

agent = Agent(
    name="ServerMemoryAgent",
    instructions="You are a helpful assistant with server-managed memory.",
)

# Turn 1
result1 = await Runner.run(agent, "I live in Tokyo and work as an engineer.")
response_id = result1.last_response_id

# Turn 2 — pass previous_response_id
result2 = await Runner.run(
    agent,
    "What city do I live in?",
    previous_response_id=response_id,
)
print(result2.final_output)  # "You live in Tokyo."

auto_previous_response_id=True

To avoid manually tracking response IDs, enable automatic chaining:

from agents import Agent, Runner, RunConfig

agent = Agent(
    name="AutoChainAgent",
    instructions="You are a helpful assistant.",
)

config = RunConfig(auto_previous_response_id=True)

# The runner automatically chains responses
result1 = await Runner.run(agent, "My favorite language is Python.", run_config=config)
result2 = await Runner.run(agent, "What's my favorite language?", run_config=config)
print(result2.final_output)  # "Your favorite language is Python."

With auto_previous_response_id=True, the runner tracks the last response ID and passes it automatically on the next call. No session backend needed, no history management code.
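Under the hood, automatic chaining amounts to the runner remembering the last response ID and attaching it to the next request. A minimal stand-in, using a fake runner so it stays runnable offline (`FakeRunner` and its IDs are my own illustration, not the real SDK internals):

```python
# Minimal sketch of automatic response-ID chaining. FakeRunner is a
# stand-in for the real runner, which would call the OpenAI API.

class FakeRunner:
    def __init__(self):
        self._counter = 0
        self._last_response_id: str | None = None  # what auto chaining tracks

    def run(self, message: str) -> str:
        previous = self._last_response_id  # attached to the outgoing request
        self._counter += 1
        self._last_response_id = f"resp_{self._counter}"
        return f"handled {message!r} (previous_response_id={previous})"

runner = FakeRunner()
print(runner.run("My favorite language is Python."))  # previous is None
print(runner.run("What's my favorite language?"))     # previous is resp_1
```

The state lives entirely in one string field, which is why no session backend is required.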

Building a Chat Application with Server-Managed Memory

Here is a complete chatbot using server-managed conversations:

import asyncio
from agents import Agent, Runner

agent = Agent(
    name="ChatBot",
    instructions="""You are a friendly conversational assistant.
    Remember everything the user tells you across the conversation.""",
)

class ServerManagedChat:
    def __init__(self):
        self.last_response_id: str | None = None

    async def send_message(self, message: str) -> str:
        """Send a message and get a response, with automatic chaining."""
        kwargs = {}
        if self.last_response_id:
            kwargs["previous_response_id"] = self.last_response_id

        result = await Runner.run(agent, message, **kwargs)

        # Store the response ID for the next turn
        self.last_response_id = result.last_response_id

        return result.final_output

    def reset(self):
        """Start a new conversation."""
        self.last_response_id = None

async def main():
    chat = ServerManagedChat()
    print("Chat started. Type 'quit' to exit, 'reset' for new conversation.\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "quit":
            break
        if user_input.lower() == "reset":
            chat.reset()
            print("Conversation reset.\n")
            continue

        response = await chat.send_message(user_input)
        print(f"Bot: {response}\n")

asyncio.run(main())

Combining Server and Client-Side Approaches

You can use both approaches together. Server-managed conversations handle the immediate multi-turn context, while client-side sessions store long-term user data.

from agents import Agent, Runner, SQLiteSession

agent = Agent(
    name="HybridAgent",
    instructions="You are an assistant with both short-term and long-term memory.",
)

class HybridMemoryChat:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.last_response_id: str | None = None
        # Client-side: one SQLite-backed session per user for persistent preferences
        self.user_session = SQLiteSession(f"profile:{user_id}", "./user_profiles.db")

    async def load_user_context(self) -> str:
        """Load persistent user context from the client-side session."""
        items = await self.user_session.get_items()
        if items:
            return "User context: " + str(items[-1].get("content", ""))
        return ""

    async def send_message(self, message: str) -> str:
        # Prepend long-term context, then chain on the last response ID
        context = await self.load_user_context()
        full_message = f"{context}\n\nUser: {message}" if context else message

        kwargs = {}
        if self.last_response_id:
            kwargs["previous_response_id"] = self.last_response_id

        result = await Runner.run(agent, full_message, **kwargs)
        self.last_response_id = result.last_response_id

        return result.final_output

    async def save_preference(self, preference: str):
        """Save a long-term preference to the client-side session."""
        await self.user_session.add_items(
            [{"role": "system", "content": preference}]
        )

Server vs Client-Side: When to Use Each

Use Server-Managed When:

  • Simplicity is a priority: No session backend to manage, no history storage code.
  • You trust OpenAI with conversation data: The data lives on OpenAI's servers.
  • Conversations are short to medium length: Server-managed history works well for typical chat interactions.
  • You want minimal client-side code: Just track one ID instead of managing a full history store.

Use Client-Side Sessions When:

  • Data sovereignty matters: You need conversation data in your own infrastructure.
  • You need custom storage: DynamoDB, MongoDB, encrypted storage, etc.
  • Conversations are very long: Compaction and custom pruning strategies require client-side control.
  • Multi-agent sharing: Multiple agents reading from the same session is easier with client-side sessions.
  • Offline or air-gapped environments: Client-side sessions work without internet connectivity to OpenAI.
  • Audit and compliance: Full control over data retention, encryption, and access logging.

The Hybrid Approach

For many production systems, the best approach is hybrid:

  • Immediate conversation context: server-managed (previous_response_id)
  • Long-term user preferences: client-side session (SQLite/Redis)
  • Cross-conversation memory: client-side session
  • Compliance and auditing: client-side session
  • Quick prototyping: server-managed

The two approaches are not mutually exclusive. Use server-managed conversations for the easy case and layer in client-side sessions where you need more control.
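The comparison above can be read as a simple decision rule. This illustrative helper (the flag names are my own, and real systems weigh more factors) encodes it:

```python
# Illustrative rule of thumb from the comparison above: any hard
# requirement for control pushes you to client-side sessions;
# otherwise server-managed is the simpler default.

def choose_history_backend(
    needs_data_sovereignty: bool,
    needs_custom_storage: bool,
    very_long_conversations: bool,
) -> str:
    """Pick a conversation-history approach from a few hard requirements."""
    if needs_data_sovereignty or needs_custom_storage or very_long_conversations:
        return "client-side session"
    return "server-managed"

print(choose_history_backend(False, False, False))  # server-managed
print(choose_history_backend(True, False, False))   # client-side session
```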

Written by

CallSphere Team
