---
title: "Building Multi-Step Reasoning Agents with Claude Extended Thinking"
description: "Learn how to use Claude's extended thinking feature to build agents that solve complex reasoning problems, showing internal thought processes for math, code analysis, and multi-step decision making."
canonical: https://callsphere.ai/blog/claude-extended-thinking-multi-step-reasoning-agents
category: "Learn Agentic AI"
tags: ["Claude", "Extended Thinking", "Reasoning", "Chain of Thought", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T22:46:34.403Z
---

# Building Multi-Step Reasoning Agents with Claude Extended Thinking

> Learn how to use Claude's extended thinking feature to build agents that solve complex reasoning problems, showing internal thought processes for math, code analysis, and multi-step decision making.

## What Is Extended Thinking?

Claude's extended thinking feature gives the model a dedicated space to reason through problems before producing a response. When enabled, Claude generates internal "thinking" tokens that are visible to the developer but are clearly separated from the final output. This is not prompt engineering — it is a model-level feature that allocates compute specifically to reasoning.

Extended thinking dramatically improves performance on tasks requiring multi-step logic: mathematical proofs, complex code analysis, strategic planning, and any scenario where the first intuition might be wrong.

## Enabling Extended Thinking

Enable extended thinking by adding a `thinking` parameter to your API call:

```mermaid
flowchart TD
    Q(["User message"])
    THINK["Thinking block
budget_tokens cap"]
    DECIDE{"Tool call
needed?"}
    TOOL["Tool execution"]
    RESULT["tool_result appended
to messages"]
    ANSWER(["Text block
final answer"])
    Q --> THINK --> DECIDE
    DECIDE -->|Yes| TOOL --> RESULT --> THINK
    DECIDE -->|No| ANSWER
    style THINK fill:#4f46e5,stroke:#4338ca,color:#fff
    style TOOL fill:#f59e0b,stroke:#d97706,color:#1f2937
    style ANSWER fill:#059669,stroke:#047857,color:#fff
```

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # Max tokens for thinking
    },
    messages=[{
        "role": "user",
        "content": "Solve this step by step: If a train leaves Station A at 60 mph and another leaves Station B (300 miles away) at 40 mph heading toward each other, when and where do they meet?"
    }]
)

# Response contains both thinking and text blocks
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)
```

The `budget_tokens` parameter sets the maximum number of tokens Claude can spend on thinking; the API requires it to be at least 1,024 and less than `max_tokens`. Set it higher for harder problems. Claude will not always use the full budget — it stops thinking when it has enough clarity to answer.
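To see how much of the budget a request actually consumed, you can split the returned content into its thinking and text parts. A small helper (hypothetical, not part of the SDK) that works on any objects shaped like the SDK's content blocks:

```python
def summarize_response(content_blocks):
    """Split a Messages API content list into thinking text and answer text.

    Works on any objects exposing .type plus .thinking / .text attributes,
    mirroring the block shapes the Anthropic SDK returns.
    """
    thinking_parts = [b.thinking for b in content_blocks if b.type == "thinking"]
    text_parts = [b.text for b in content_blocks if b.type == "text"]
    return {
        "thinking": "\n".join(thinking_parts),
        "answer": "\n".join(text_parts),
        # Rough size estimate: roughly 4 characters per token for English text
        "approx_thinking_tokens": sum(len(t) for t in thinking_parts) // 4,
    }
```

Call it as `summarize_response(response.content)` after a request; the approximate token count tells you whether your budget is being exhausted or barely touched.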

## Building a Reasoning Agent with Tools

Extended thinking combines naturally with tool use. Claude thinks through the problem, decides which tools to call, and then reasons about the results:

```python
tools = [
    {
        "name": "execute_python",
        "description": "Execute Python code and return the output. Use for calculations, data processing, or verification.",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python code to execute"}
            },
            "required": ["code"]
        }
    },
    {
        "name": "query_knowledge_base",
        "description": "Search an internal knowledge base for facts and reference data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    }
]

def run_reasoning_agent(question: str) -> dict:
    messages = [{"role": "user", "content": question}]
    thinking_log = []

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=16000,
            thinking={
                "type": "enabled",
                "budget_tokens": 8000,
            },
            tools=tools,
            messages=messages,
        )

        # Capture thinking blocks
        for block in response.content:
            if block.type == "thinking":
                thinking_log.append(block.thinking)

        if response.stop_reason != "tool_use":
            # "end_turn" (or any other non-tool stop) means Claude is done
            final_text = [b.text for b in response.content if b.type == "text"]
            return {
                "answer": "\n".join(final_text),
                "thinking_steps": thinking_log,
            }

        # Process tool calls. Pass the full content back: thinking blocks must
        # be preserved across turns when thinking is combined with tool use
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})
```
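The loop above assumes an `execute_tool` dispatcher. A minimal sketch of one, matching the two tool schemas defined earlier — note that the `exec` call here is illustrative only, and running model-generated code in production requires real sandboxing (a subprocess, container, or dedicated runtime):

```python
import contextlib
import io

# Hypothetical stand-in for a real knowledge base backend
_KNOWLEDGE_BASE = {
    "rate limits": "The Messages API enforces per-minute token limits.",
}

def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool_use block to its implementation and return a string result."""
    if name == "execute_python":
        # Capture stdout from the generated code. NOT a sandbox.
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(tool_input["code"], {})
        except Exception as exc:
            return f"Error: {exc!r}"
        return buffer.getvalue()
    if name == "query_knowledge_base":
        query = tool_input["query"].lower()
        hits = [v for k, v in _KNOWLEDGE_BASE.items() if k in query]
        return "\n".join(hits) or "No results found."
    return f"Unknown tool: {name}"
```

Returning error text instead of raising lets Claude see the failure and reason about a retry, which is usually what you want in an agent loop.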

## When Extended Thinking Makes a Difference

Extended thinking is not always necessary. It adds latency and token cost. Use it selectively for tasks where reasoning quality matters more than speed.

**High-value use cases:**

```python
# Complex code analysis
result = run_reasoning_agent(
    "Review this function for concurrency bugs, edge cases, and "
    "performance issues. The function handles concurrent database "
    "writes with optimistic locking:\n\n" + code_snippet
)

# Multi-step math and logic
result = run_reasoning_agent(
    "A company's revenue follows R(t) = 100e^(0.05t) - 20t^2 + 500t. "
    "Find when revenue is maximized and the maximum value."
)

# Strategic decision making
result = run_reasoning_agent(
    "Given these three architecture options for our payment system, "
    "analyze tradeoffs for latency, consistency, cost, and operational "
    "complexity:\n\n" + options_description
)
```

**Skip extended thinking for:** Simple lookups, straightforward text generation, translation, and tasks where Claude already performs well without extra reasoning time.
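One way to apply this selectively is a routing helper that enables thinking only for task categories known to benefit. A sketch, where the category names and budget values are illustrative choices to tune against your own evals, not Anthropic recommendations:

```python
# Illustrative thinking budgets per task category
THINKING_BUDGETS = {
    "code_review": 8000,
    "math": 10000,
    "architecture": 12000,
}

def thinking_config(category: str) -> dict:
    """Return keyword arguments for messages.create based on task category.

    Categories not listed get thinking disabled and a small max_tokens.
    """
    budget = THINKING_BUDGETS.get(category)
    if budget is None:
        return {"max_tokens": 2000}  # simple task: no extra reasoning
    return {
        # max_tokens must exceed budget_tokens, leaving room for the answer
        "max_tokens": budget + 4000,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }
```

Usage: `client.messages.create(model="claude-sonnet-4-20250514", messages=messages, **thinking_config("math"))`.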

## Controlling Thinking Budget

The `budget_tokens` parameter gives you fine-grained control over reasoning depth:

```python
# Quick analysis — 2K thinking tokens
quick_response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "What are the main pros and cons of microservices?"}]
)

# Deep analysis — 16K thinking tokens
deep_response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=20000,  # must exceed budget_tokens to leave room for the answer
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": complex_code_review_prompt}]
)
```

Start with a modest budget (4,000-8,000 tokens) and increase it if you notice Claude's thinking being cut short on difficult problems. You can inspect the thinking output to calibrate.
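If you log approximate thinking sizes per run, calibration can be as simple as sizing the budget off a high percentile with some headroom. A sketch, with the floor reflecting the API's 1,024-token minimum and the defaults being illustrative:

```python
import math

def recommend_budget(observed_thinking_tokens: list[int],
                     headroom: float = 1.25,
                     floor: int = 1024,
                     ceiling: int = 32000) -> int:
    """Pick a budget_tokens value from observed thinking usage.

    Uses the 95th percentile of past runs plus headroom, clamped to
    [floor, ceiling]. With no history, returns a modest default.
    """
    if not observed_thinking_tokens:
        return 4000  # reasonable starting point with no history
    ranked = sorted(observed_thinking_tokens)
    idx = min(len(ranked) - 1, math.ceil(0.95 * len(ranked)) - 1)
    p95 = ranked[idx]
    return max(floor, min(ceiling, int(p95 * headroom)))
```

The headroom keeps occasional hard problems from having their thinking cut short while avoiding paying for an oversized budget on every request.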

## Streaming Thinking Tokens

For long-running reasoning tasks, stream the response so you can display thinking in real time:

```python
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": hard_problem}]
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                print("[Thinking...]", end="", flush=True)
            elif event.content_block.type == "text":
                print("\n[Answer] ", end="", flush=True)
        elif event.type == "content_block_delta":
            if hasattr(event.delta, "thinking"):
                print(event.delta.thinking, end="", flush=True)
            elif hasattr(event.delta, "text"):
                print(event.delta.text, end="", flush=True)
```

## FAQ

### Does extended thinking work with all Claude models?

Extended thinking is available on Claude Sonnet and Claude Opus. The thinking budget limits and capabilities may vary between models. Check the Anthropic documentation for the latest model support details.

### Can I use extended thinking with tool use simultaneously?

Yes. When both are enabled, Claude thinks before deciding whether to call tools, and thinks again after receiving tool results. The thinking tokens from all turns accumulate in the conversation, providing a full reasoning trace across the entire agent loop.

### How much do thinking tokens cost?

Thinking tokens are billed at the same rate as output tokens for the model you are using. A `budget_tokens` of 10,000 means up to 10,000 additional output tokens charged at the model's per-token output rate. Monitor your thinking token usage to balance reasoning quality against cost.
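As a worked example, the cost impact of a thinking budget is straightforward arithmetic. The rate below is hypothetical; check current Anthropic pricing for your model:

```python
def thinking_cost_usd(thinking_tokens: int, output_rate_per_mtok: float) -> float:
    """Cost of thinking tokens, billed at the model's output-token rate."""
    return thinking_tokens / 1_000_000 * output_rate_per_mtok

# At a hypothetical $15 per million output tokens, a fully used
# 10,000-token thinking budget adds 10_000 / 1e6 * 15 = $0.15 per request
```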

---

#Claude #ExtendedThinking #Reasoning #ChainOfThought #Python #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/claude-extended-thinking-multi-step-reasoning-agents
