
Multi-Turn Reasoning: Building Agents That Think Across Multiple LLM Calls

Learn how to architect agents that maintain reasoning chains across multiple LLM invocations, accumulate state progressively, and refine their analysis through iterative thinking.

Why Single-Call Reasoning Falls Short

A single LLM call operates within a fixed context window and produces output in a single forward pass. For simple tasks this is fine, but complex problems — analyzing a 50-page contract, debugging a multi-file codebase, or planning a multi-step research process — exceed what any model can reliably handle in one shot.

Multi-turn reasoning breaks complex problems into a sequence of focused LLM calls where each call builds on the accumulated understanding from previous calls. This mirrors how human experts work: they read, reflect, revise, and refine iteratively rather than attempting to produce a perfect answer on the first try.

The Core Pattern: Reason-Accumulate-Refine

The fundamental architecture for multi-turn reasoning involves three components: a reasoning step that analyzes a specific aspect of the problem, a state accumulator that captures key findings, and a refinement step that integrates new information with prior conclusions.

import json
from dataclasses import dataclass, field

from openai import OpenAI

@dataclass
class ReasoningState:
    """Accumulated state across reasoning turns."""
    findings: list[str] = field(default_factory=list)
    uncertainties: list[str] = field(default_factory=list)
    conclusions: list[str] = field(default_factory=list)
    turn_count: int = 0

    def summary(self) -> str:
        parts = []
        if self.findings:
            parts.append("Findings:\n" + "\n".join(f"- {f}" for f in self.findings))
        if self.uncertainties:
            parts.append("Open questions:\n" + "\n".join(f"- {u}" for u in self.uncertainties))
        if self.conclusions:
            parts.append("Conclusions so far:\n" + "\n".join(f"- {c}" for c in self.conclusions))
        return "\n\n".join(parts)


def split_into_sections(document: str) -> list[str]:
    """Split a document into blank-line-separated sections."""
    return [s.strip() for s in document.split("\n\n") if s.strip()]


def multi_turn_analyze(document: str, client: OpenAI, max_turns: int = 5) -> ReasoningState:
    """Analyze a document through multiple reasoning turns."""
    state = ReasoningState()
    chunks = split_into_sections(document)

    for chunk in chunks[:max_turns]:
        state.turn_count += 1

        prompt = f"""You are analyzing a document section by section.

Previous analysis:
{state.summary() or "No prior analysis yet."}

Current section:
{chunk}

Provide: (1) new findings, (2) any uncertainties, (3) updated conclusions.
Return as JSON with keys: findings, uncertainties, conclusions (each a list of strings)."""

        response = client.chat.completions.create(
            model="gpt-4o",  # JSON mode requires a model that supports response_format
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)

        state.findings.extend(result.get("findings", []))
        state.uncertainties.extend(result.get("uncertainties", []))
        # Conclusions are replaced, not appended: each turn returns the updated full set.
        state.conclusions = result.get("conclusions", state.conclusions)

    return state

Progressive Refinement: The Self-Critique Loop

The most powerful multi-turn pattern is self-critique, where the agent reviews its own output and iteratively improves it. Each turn receives both the original task and the previous attempt, allowing the model to identify gaps, correct errors, and add nuance:

def refine_with_critique(
    task: str, client: OpenAI, max_refinements: int = 3
) -> str:
    """Generate an answer and refine it through self-critique."""
    # Initial generation
    messages = [{"role": "user", "content": task}]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    current_answer = response.choices[0].message.content

    for turn in range(max_refinements):
        critique_prompt = f"""Review this answer for accuracy, completeness, and clarity.

Original task: {task}

Current answer:
{current_answer}

If the answer is already excellent, respond with exactly: SATISFACTORY
Otherwise, identify any issues and respond with only the improved answer."""

        critique_response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": critique_prompt}],
        )
        critique = critique_response.choices[0].message.content

        # Exact-match check: a substring test would false-positive if the word
        # "SATISFACTORY" ever appeared inside an improved answer.
        if critique.strip() == "SATISFACTORY":
            break
        current_answer = critique  # the response is the improved answer itself

    return current_answer

State Accumulation Strategies

How you accumulate state across turns significantly affects reasoning quality. Three common strategies:

Full history passes all previous LLM outputs into each subsequent call. This preserves maximum context but consumes tokens rapidly and may hit context limits.


Summary compression periodically summarizes accumulated findings into a compact representation. This scales to many turns but risks losing nuanced details during summarization.
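As a minimal sketch of summary compression, the helper below collapses older findings into a single summary line once a budget is exceeded, keeping the most recent findings verbatim. The summarizer is injected as a callable (in practice an LLM call); the function name and the keep-half budget policy are this example's assumptions, not a standard API:

```python
from collections.abc import Callable

def compress_findings(
    findings: list[str],
    summarize: Callable[[str], str],
    max_items: int = 10,
) -> list[str]:
    """Collapse older findings into one summary line once the budget is exceeded.

    The most recent findings are kept verbatim, since they are the most
    likely to matter for the next reasoning turn.
    """
    if len(findings) <= max_items:
        return findings
    keep = max_items // 2
    older, recent = findings[:-keep], findings[-keep:]
    return [summarize("\n".join(older))] + recent
```

Injecting the summarizer keeps the compression policy unit-testable without live API calls.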

Structured extraction parses each LLM response into structured data (facts, entities, relationships) and reconstructs the context from this structured state. This is the most token-efficient and supports the most reasoning turns.
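A minimal sketch of the structured-extraction approach: extracted facts are stored as triples and the prompt context is rebuilt from them, deduplicated. The `Fact` type and rendering helper are illustrative assumptions, not from any library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """One extracted (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str

def context_from_facts(facts: list[Fact]) -> str:
    """Rebuild a compact prompt context from structured facts.

    Duplicates are dropped while preserving extraction order, which is what
    makes this strategy more token-efficient than replaying raw LLM outputs.
    """
    seen: set[Fact] = set()
    lines: list[str] = []
    for fact in facts:
        if fact not in seen:
            seen.add(fact)
            lines.append(f"- {fact.subject} {fact.relation} {fact.obj}")
    if not lines:
        return ""
    return "Known facts:\n" + "\n".join(lines)
```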

Knowing When to Stop

Multi-turn reasoning needs termination conditions. Without them, agents waste tokens refining already-good answers or loop indefinitely. Effective stopping criteria include convergence detection (consecutive turns produce no new findings), confidence thresholds (the model reports high confidence), and budget limits (maximum turns or token spend).
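The convergence and budget criteria can be sketched as a single stopping check (the function name and the set-based novelty test are this example's assumptions; a confidence threshold would additionally require the model to self-report a confidence score):

```python
def should_stop(
    prior_findings: list[str],
    turn_findings: list[str],
    turn: int,
    max_turns: int = 10,
) -> bool:
    """Stop on budget exhaustion or convergence.

    Convergence means the latest turn produced no finding that was not
    already in the accumulated state.
    """
    if turn >= max_turns:  # budget limit
        return True
    # Convergence: every new finding was already known (or the turn was empty)
    return set(turn_findings) <= set(prior_findings)
```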

FAQ

How many reasoning turns should an agent use?

It depends on task complexity. Simple classification tasks rarely benefit from more than 2-3 turns. Complex analysis tasks like contract review or code audit may need 5-10 turns. Use convergence detection rather than a fixed turn count — stop when turns stop producing new insights.

Does multi-turn reasoning increase costs significantly?

Yes, each turn is a separate API call. However, the cost is often justified: a 3-turn refinement that produces a correct answer is cheaper than a single-turn answer that requires human correction. Use summary compression to keep per-turn token counts manageable.

How do I prevent the agent from contradicting its earlier reasoning?

Include a structured summary of prior conclusions in each turn's prompt and explicitly instruct the model to either build on or explicitly revise (with justification) its previous conclusions. The structured state approach makes contradictions easier to detect programmatically.
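A sketch of such a turn-prompt builder (the helper name and exact wording are illustrative):

```python
def build_turn_prompt(section: str, prior_conclusions: list[str]) -> str:
    """Embed prior conclusions plus an explicit revise-or-build instruction."""
    prior = "\n".join(f"- {c}" for c in prior_conclusions) or "(none yet)"
    return (
        "Prior conclusions:\n"
        f"{prior}\n\n"
        "Analyze the section below. For each prior conclusion, either build "
        "on it or explicitly revise it with a one-line justification. "
        "Do not silently contradict it.\n\n"
        f"Section:\n{section}"
    )
```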


#MultiTurnReasoning #ReasoningChains #AgentArchitecture #StateManagement #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

