
Multi-Turn Reasoning: Building Agents That Think Across Multiple LLM Calls

Learn how to architect agents that maintain reasoning chains across multiple LLM invocations, accumulate state progressively, and refine their analysis through iterative thinking.

Why Single-Call Reasoning Falls Short

A single LLM call operates within a fixed context window and produces output in a single forward pass. For simple tasks this is fine, but complex problems — analyzing a 50-page contract, debugging a multi-file codebase, or planning a multi-step research process — exceed what any model can reliably handle in one shot.

Multi-turn reasoning breaks complex problems into a sequence of focused LLM calls where each call builds on the accumulated understanding from previous calls. This mirrors how human experts work: they read, reflect, revise, and refine iteratively rather than attempting to produce a perfect answer on the first try.

The Core Pattern: Reason-Accumulate-Refine

The fundamental architecture for multi-turn reasoning involves three components: a reasoning step that analyzes a specific aspect of the problem, a state accumulator that captures key findings, and a refinement step that integrates new information with prior conclusions.

import json
from dataclasses import dataclass, field

from openai import OpenAI

@dataclass
class ReasoningState:
    """Accumulated state across reasoning turns."""
    findings: list[str] = field(default_factory=list)
    uncertainties: list[str] = field(default_factory=list)
    conclusions: list[str] = field(default_factory=list)
    turn_count: int = 0

    def summary(self) -> str:
        parts = []
        if self.findings:
            parts.append("Findings:\n" + "\n".join(f"- {f}" for f in self.findings))
        if self.uncertainties:
            parts.append("Open questions:\n" + "\n".join(f"- {u}" for u in self.uncertainties))
        if self.conclusions:
            parts.append("Conclusions so far:\n" + "\n".join(f"- {c}" for c in self.conclusions))
        return "\n\n".join(parts)


def split_into_sections(document: str) -> list[str]:
    """Split a document into blank-line-separated sections."""
    return [s.strip() for s in document.split("\n\n") if s.strip()]


def multi_turn_analyze(document: str, client: OpenAI, max_turns: int = 5) -> ReasoningState:
    """Analyze a document through multiple reasoning turns."""
    state = ReasoningState()
    chunks = split_into_sections(document)

    for chunk in chunks[:max_turns]:
        state.turn_count += 1

        prompt = f"""You are analyzing a document section by section.

Previous analysis:
{state.summary() or "No prior analysis yet."}

Current section:
{chunk}

Provide: (1) new findings, (2) any uncertainties, (3) updated conclusions.
Return as JSON with keys: findings, uncertainties, conclusions (each a list of strings)."""

        response = client.chat.completions.create(
            model="gpt-4o",  # JSON mode requires a model that supports response_format
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)

        state.findings.extend(result.get("findings", []))
        state.uncertainties.extend(result.get("uncertainties", []))
        # Conclusions are replaced, not appended: each turn returns the updated full set.
        state.conclusions = result.get("conclusions", state.conclusions)

    return state

Progressive Refinement: The Self-Critique Loop

The most powerful multi-turn pattern is self-critique, where the agent reviews its own output and iteratively improves it. Each turn receives both the original task and the previous attempt, allowing the model to identify gaps, correct errors, and add nuance:

def refine_with_critique(
    task: str, client: OpenAI, max_refinements: int = 3
) -> str:
    """Generate an answer and refine it through self-critique."""
    # Initial generation
    messages = [{"role": "user", "content": task}]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    current_answer = response.choices[0].message.content

    for turn in range(max_refinements):
        critique_prompt = f"""Review this answer for accuracy, completeness, and clarity.

Original task: {task}

Current answer:
{current_answer}

If the answer is already excellent, respond with exactly: SATISFACTORY
Otherwise, identify any issues and respond with only the improved answer."""

        critique_response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": critique_prompt}],
        )
        critique = critique_response.choices[0].message.content

        # Exact-match check: a substring test would false-positive if the word
        # "SATISFACTORY" ever appeared inside an improved answer.
        if critique.strip() == "SATISFACTORY":
            break
        current_answer = critique  # the response is the improved answer itself

    return current_answer

State Accumulation Strategies

How you accumulate state across turns significantly affects reasoning quality. Three common strategies:

Full history passes all previous LLM outputs into each subsequent call. This preserves maximum context but consumes tokens rapidly and may hit context limits.


Summary compression periodically summarizes accumulated findings into a compact representation. This scales to many turns but risks losing nuanced details during summarization.
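As a minimal sketch of summary compression, the helper below collapses older findings into a single summary line once a budget is exceeded, keeping the most recent findings verbatim. The summarizer is injected as a callable (in practice an LLM call); the function name and the keep-half budget policy are this example's assumptions, not a standard API:

```python
from collections.abc import Callable

def compress_findings(
    findings: list[str],
    summarize: Callable[[str], str],
    max_items: int = 10,
) -> list[str]:
    """Collapse older findings into one summary line once the budget is exceeded.

    The most recent findings are kept verbatim, since they are the most
    likely to matter for the next reasoning turn.
    """
    if len(findings) <= max_items:
        return findings
    keep = max_items // 2
    older, recent = findings[:-keep], findings[-keep:]
    return [summarize("\n".join(older))] + recent
```

Injecting the summarizer keeps the compression policy unit-testable without live API calls.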

Structured extraction parses each LLM response into structured data (facts, entities, relationships) and reconstructs the context from this structured state. This is the most token-efficient and supports the most reasoning turns.
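A minimal sketch of the structured-extraction approach: extracted facts are stored as triples and the prompt context is rebuilt from them, deduplicated. The `Fact` type and rendering helper are illustrative assumptions, not from any library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """One extracted (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str

def context_from_facts(facts: list[Fact]) -> str:
    """Rebuild a compact prompt context from structured facts.

    Duplicates are dropped while preserving extraction order, which is what
    makes this strategy more token-efficient than replaying raw LLM outputs.
    """
    seen: set[Fact] = set()
    lines: list[str] = []
    for fact in facts:
        if fact not in seen:
            seen.add(fact)
            lines.append(f"- {fact.subject} {fact.relation} {fact.obj}")
    if not lines:
        return ""
    return "Known facts:\n" + "\n".join(lines)
```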

Knowing When to Stop

Multi-turn reasoning needs termination conditions. Without them, agents waste tokens refining already-good answers or loop indefinitely. Effective stopping criteria include convergence detection (consecutive turns produce no new findings), confidence thresholds (the model reports high confidence), and budget limits (maximum turns or token spend).
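The convergence and budget criteria can be sketched as a single stopping check (the function name and the set-based novelty test are this example's assumptions; a confidence threshold would additionally require the model to self-report a confidence score):

```python
def should_stop(
    prior_findings: list[str],
    turn_findings: list[str],
    turn: int,
    max_turns: int = 10,
) -> bool:
    """Stop on budget exhaustion or convergence.

    Convergence means the latest turn produced no finding that was not
    already in the accumulated state.
    """
    if turn >= max_turns:  # budget limit
        return True
    # Convergence: every new finding was already known (or the turn was empty)
    return set(turn_findings) <= set(prior_findings)
```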

FAQ

How many reasoning turns should an agent use?

It depends on task complexity. Simple classification tasks rarely benefit from more than 2-3 turns. Complex analysis tasks like contract review or code audit may need 5-10 turns. Use convergence detection rather than a fixed turn count — stop when turns stop producing new insights.

Does multi-turn reasoning increase costs significantly?

Yes, each turn is a separate API call. However, the cost is often justified: a 3-turn refinement that produces a correct answer is cheaper than a single-turn answer that requires human correction. Use summary compression to keep per-turn token counts manageable.

How do I prevent the agent from contradicting its earlier reasoning?

Include a structured summary of prior conclusions in each turn's prompt and explicitly instruct the model to either build on or explicitly revise (with justification) its previous conclusions. The structured state approach makes contradictions easier to detect programmatically.
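A sketch of such a turn-prompt builder (the helper name and exact wording are illustrative):

```python
def build_turn_prompt(section: str, prior_conclusions: list[str]) -> str:
    """Embed prior conclusions plus an explicit revise-or-build instruction."""
    prior = "\n".join(f"- {c}" for c in prior_conclusions) or "(none yet)"
    return (
        "Prior conclusions:\n"
        f"{prior}\n\n"
        "Analyze the section below. For each prior conclusion, either build "
        "on it or explicitly revise it with a one-line justification. "
        "Do not silently contradict it.\n\n"
        f"Section:\n{section}"
    )
```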


#MultiTurnReasoning #ReasoningChains #AgentArchitecture #StateManagement #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

