---
title: "Claude Extended Thinking: Leveraging Chain-of-Thought for Complex Reasoning"
description: "Learn how to use Claude's extended thinking feature to unlock deeper reasoning for complex agent tasks. Understand thinking blocks, budget tokens, and when extended thinking outperforms standard responses."
canonical: https://callsphere.ai/blog/claude-extended-thinking-chain-of-thought-complex-reasoning
category: "Learn Agentic AI"
tags: ["Anthropic", "Claude", "Extended Thinking", "Chain of Thought", "Reasoning"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:45.034Z
---

# Claude Extended Thinking: Leveraging Chain-of-Thought for Complex Reasoning

> Learn how to use Claude's extended thinking feature to unlock deeper reasoning for complex agent tasks. Understand thinking blocks, budget tokens, and when extended thinking outperforms standard responses.

## What Is Extended Thinking

Extended thinking is a Claude feature that lets the model "think out loud" before producing its final answer. When enabled, Claude generates an internal chain-of-thought reasoning trace — a thinking block — that works through the problem step by step before committing to a response.

This is not the same as asking Claude to "think step by step" in a prompt. Extended thinking is a model-level feature where Claude allocates dedicated compute to reasoning. The thinking happens in a structured `thinking` content block that is returned alongside the final `text` block, giving you visibility into the model's reasoning process.

## Enabling Extended Thinking

Extended thinking requires a `thinking` configuration with a `budget_tokens` parameter that controls how many tokens Claude can spend on reasoning:

```mermaid
flowchart TD
    REQ(["API request
thinking enabled"])
    BUDGET["budget_tokens
caps reasoning spend"]
    THINK["thinking block
chain-of-thought trace"]
    TEXT["text block
final answer"]
    REQ --> BUDGET --> THINK --> TEXT
    style THINK fill:#4f46e5,stroke:#4338ca,color:#fff
    style TEXT fill:#059669,stroke:#047857,color:#fff
```

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {"role": "user", "content": "Analyze the trade-offs between microservices and monolithic architecture for a startup with 5 engineers building a fintech product."}
    ]
)

# The thinking block contains the reasoning trace
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)
```

The `budget_tokens` value sets the maximum tokens Claude can spend on thinking; the API enforces a minimum of 1,024. The model may use fewer tokens if it reaches a conclusion early. `max_tokens` must be larger than `budget_tokens` to leave room for the actual response.
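These two constraints are easy to encode in a small helper. The sketch below builds the keyword arguments for `messages.create`; `thinking_params` is a hypothetical convenience function, not part of the SDK:

```python
def thinking_params(budget_tokens: int, answer_headroom: int = 4000) -> dict:
    """Build keyword arguments for messages.create with thinking enabled."""
    # The API enforces a minimum thinking budget of 1,024 tokens.
    budget = max(budget_tokens, 1024)
    return {
        "max_tokens": budget + answer_headroom,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }

params = thinking_params(500)
print(params["thinking"]["budget_tokens"])  # 1024 (clamped up to the minimum)
print(params["max_tokens"])                 # 5024
```

Splatting these into the call (`client.messages.create(model=..., messages=..., **params)`) keeps the budget-to-headroom relationship in one place.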

## Understanding the Response Structure

With extended thinking enabled, the response contains multiple content blocks:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=12000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": "Write a Python function that finds the longest palindromic substring in O(n) time using Manacher's algorithm."}
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking used approximately {len(block.thinking.split())} words")
    elif block.type == "text":
        print(block.text)

# Token usage shows thinking tokens separately
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
```

The thinking block is visible to you as the developer but is not included in conversation history for subsequent turns. This means thinking does not accumulate context window usage across multi-turn conversations.
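One way to exploit this is to append only the text blocks to your running message history. A minimal sketch, using plain dicts rather than SDK response objects; `append_assistant_turn` is a hypothetical helper:

```python
def append_assistant_turn(history: list, content_blocks: list) -> list:
    """Extend the conversation with an assistant turn, keeping only text blocks."""
    # Thinking blocks from earlier turns are ignored by the API, so there is
    # no need to carry them forward across multi-turn conversations.
    text_blocks = [b for b in content_blocks if b["type"] == "text"]
    return history + [{"role": "assistant", "content": text_blocks}]

history = [{"role": "user", "content": "Compare two rate limiter designs."}]
blocks = [
    {"type": "thinking", "thinking": "Token bucket vs sliding window..."},
    {"type": "text", "text": "Token bucket is simpler to operate."},
]
history = append_assistant_turn(history, blocks)
print(len(history[-1]["content"]))  # 1 -- only the text block survives
```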

## When to Use Extended Thinking

Extended thinking is most valuable for tasks that require multi-step reasoning:

```python
import anthropic

client = anthropic.Anthropic()

# Complex analysis task - good candidate for extended thinking
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    system="You are a code review agent. Analyze code for bugs, security issues, and performance problems.",
    messages=[
        {"role": "user", "content": """Review this authentication function:

def authenticate(username, password):
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    result = db.execute(query)
    if result:
        token = base64.b64encode(f"{username}:{time.time()}".encode()).decode()
        session['token'] = token
        return {"status": "ok", "token": token}
    return {"status": "fail"}
"""}
    ]
)

for block in response.content:
    if block.type == "text":
        print(block.text)
```

This is ideal for extended thinking because the model needs to evaluate SQL injection risks, password storage issues, token generation weaknesses, and session management problems — multiple distinct analyses that benefit from structured reasoning.

## Budget Token Strategies

The budget allocation depends on task complexity:

```python
import anthropic

client = anthropic.Anthropic()

def smart_query(prompt: str, complexity: str = "medium") -> str:
    budgets = {
        "low": 2000,     # Simple factual questions
        "medium": 6000,  # Analysis and comparison tasks
        "high": 12000,   # Complex reasoning, code generation, math
    }

    budget = budgets.get(complexity, 6000)

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=budget + 4000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": prompt}]
    )

    return "".join(
        block.text for block in response.content if block.type == "text"
    )

# Low complexity - fast, cheap
answer = smart_query("What is the capital of France?", "low")

# High complexity - deep reasoning
answer = smart_query(
    "Design a rate limiting system that handles 100K requests/second with geographic distribution",
    "high"
)
```

Start with lower budgets and increase only when you observe the model cutting its reasoning short. Oversized budgets waste tokens (and money) without improving quality on simple tasks.
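When you do need to escalate, `stop_reason` is the signal to watch: a response cut off with `"max_tokens"` suggests the budget was too tight. The sketch below injects the API call as a plain function so the retry logic stands alone; `query_with_escalation` and `fake_call` are illustrative names, not SDK APIs:

```python
def query_with_escalation(call, prompt, budgets=(2000, 6000, 12000)):
    """Retry with a larger thinking budget while responses are truncated."""
    # `call` wraps client.messages.create and returns (text, stop_reason).
    text, budget = "", budgets[0]
    for budget in budgets:
        text, stop_reason = call(prompt, budget)
        if stop_reason != "max_tokens":  # finished cleanly -- stop escalating
            break
    return text, budget

# Fake backend: succeeds only once the budget reaches 6000.
def fake_call(prompt, budget):
    return ("done", "end_turn") if budget >= 6000 else ("", "max_tokens")

text, used = query_with_escalation(fake_call, "hard question")
print(text, used)  # done 6000
```

In production, `call` would invoke `client.messages.create` with `thinking={"type": "enabled", "budget_tokens": budget}` and return the joined text blocks plus `response.stop_reason`.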

## Extended Thinking in Agent Loops

When combining extended thinking with tool use, thinking happens before each tool call decision:

```python
import anthropic

client = anthropic.Anthropic()

# Extended thinking works alongside tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    tools=[{
        "name": "run_sql",
        "description": "Execute a SQL query and return results.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }],
    messages=[
        {"role": "user", "content": "Find the top 5 customers by lifetime revenue, excluding test accounts."}
    ]
)

# Response may contain: thinking -> text -> tool_use
for block in response.content:
    print(f"Block type: {block.type}")
```

The thinking block reveals how Claude reasons about which tool to call and what arguments to provide, which is invaluable for debugging agent behavior.
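One wrinkle when combining thinking with tools: when you send the tool result back, the assistant turn you echo into `messages` must include its thinking block unmodified so Claude can resume reasoning where it left off. A sketch with plain dicts; `build_tool_result_turn` is a hypothetical helper:

```python
def build_tool_result_turn(history, assistant_content, tool_use_id, result):
    """Append the assistant turn (thinking included) plus the tool result."""
    return history + [
        # Echo the assistant content untouched -- dropping or editing the
        # thinking block here would break the tool-use continuation.
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": result},
        ]},
    ]

assistant_content = [
    {"type": "thinking", "thinking": "I should exclude accounts flagged as test..."},
    {"type": "tool_use", "id": "toolu_01", "name": "run_sql",
     "input": {"query": "SELECT ..."}},
]
turns = build_tool_result_turn([], assistant_content, "toolu_01", "5 rows")
print(turns[0]["content"][0]["type"])  # thinking
```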

## FAQ

### Does extended thinking increase costs?

Yes. Thinking tokens are billed as output tokens, which are more expensive than input tokens. A 10,000 token thinking budget could add significant cost per request. Use extended thinking selectively for tasks where the quality improvement justifies the cost, not for every API call.

### Can I use extended thinking with streaming?

Yes. When streaming with extended thinking, thinking arrives as `content_block_delta` events whose delta type is `thinking_delta`, followed by deltas of type `text_delta` for the final answer. This lets you show a "reasoning" indicator to users while Claude thinks, then stream the final answer in real time.
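A small dispatcher over raw stream events illustrates the shapes involved. The event dicts mirror the wire format; `classify_delta` is an illustrative helper, not an SDK function:

```python
def classify_delta(event: dict) -> str:
    """Route a streaming event to a UI channel."""
    # Both thinking and text arrive as content_block_delta events;
    # the nested delta "type" field tells them apart.
    if event.get("type") != "content_block_delta":
        return "other"
    delta_type = event["delta"]["type"]
    if delta_type == "thinking_delta":
        return "reasoning"   # show a "thinking..." indicator
    if delta_type == "text_delta":
        return "answer"      # stream into the visible response
    return "other"

event = {"type": "content_block_delta", "index": 0,
         "delta": {"type": "thinking_delta", "thinking": "First, consider..."}}
print(classify_delta(event))  # reasoning
```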

### Should I include the thinking block in conversation history?

No, not for ordinary multi-turn conversations: the API ignores thinking blocks from previous turns, so passing them back adds no value. The exception is tool use — when you return a `tool_result`, the assistant turn you send back must include its thinking block unmodified so Claude can resume its reasoning. If you need to reference Claude's reasoning in later turns, extract the relevant parts from the thinking block and include them as regular text content in your messages.

---

#Anthropic #Claude #ExtendedThinking #ChainOfThought #Reasoning #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/claude-extended-thinking-chain-of-thought-complex-reasoning
