Learn Agentic AI

Claude Extended Thinking: Leveraging Chain-of-Thought for Complex Reasoning

Learn how to use Claude's extended thinking feature to unlock deeper reasoning for complex agent tasks. Understand thinking blocks, budget tokens, and when extended thinking outperforms standard responses.

What Is Extended Thinking

Extended thinking is a Claude feature that lets the model "think out loud" before producing its final answer. When enabled, Claude generates an internal chain-of-thought reasoning trace — a thinking block — that works through the problem step by step before committing to a response.

This is not the same as asking Claude to "think step by step" in a prompt. Extended thinking is a model-level feature where Claude allocates dedicated compute to reasoning. The thinking happens in a structured thinking content block that is returned alongside the final text block, giving you visibility into the model's reasoning process.

Enabling Extended Thinking

Extended thinking requires a thinking configuration with a budget_tokens parameter that controls how many tokens Claude can spend on reasoning:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {"role": "user", "content": "Analyze the trade-offs between microservices and monolithic architecture for a startup with 5 engineers building a fintech product."}
    ]
)

# The thinking block contains the reasoning trace
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

The budget_tokens sets the maximum tokens Claude can use for thinking. The model may use fewer tokens if it reaches a conclusion early. The max_tokens must be larger than budget_tokens to leave room for the actual response.
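These constraints can be captured in a small helper that builds valid request parameters. This is a sketch: the 1024-token minimum and the 4000-token response headroom are assumptions for illustration, so check the API documentation for current limits.

```python
def thinking_params(budget_tokens: int, response_headroom: int = 4000) -> dict:
    """Build kwargs for messages.create with a valid thinking budget.

    Assumes budget_tokens must be at least 1024 and max_tokens must
    exceed budget_tokens; the headroom default is illustrative.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    return {
        "max_tokens": budget_tokens + response_headroom,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }
```

Calling `client.messages.create(model=..., messages=..., **thinking_params(8000))` then guarantees the budget and response ceiling stay consistent.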

Understanding the Response Structure

With extended thinking enabled, the response contains multiple content blocks:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=12000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": "Write a Python function that finds the longest palindromic substring in O(n) time using Manacher's algorithm."}
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking used approximately {len(block.thinking.split())} words")
    elif block.type == "text":
        print(block.text)

# Token usage shows thinking tokens separately
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

The thinking block is visible to you as the developer, but it is not carried into subsequent turns: when you send prior assistant turns back, the API strips their thinking blocks. This means thinking does not accumulate context window usage across multi-turn conversations.
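Since prior thinking is dropped anyway, one approach is to append only the text blocks when building the next turn's history. A minimal sketch, assuming SDK-style content block objects that expose `.type` and `.text`:

```python
def history_safe_content(content_blocks) -> list:
    """Keep only text blocks when appending an assistant turn to history.

    Thinking blocks are dropped; each text block becomes a plain
    text content dict suitable for the messages list.
    """
    return [
        {"type": "text", "text": block.text}
        for block in content_blocks
        if block.type == "text"
    ]
```

Usage: `messages.append({"role": "assistant", "content": history_safe_content(response.content)})` before sending the next user turn.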

When to Use Extended Thinking

Extended thinking is most valuable for tasks that require multi-step reasoning:


import anthropic

client = anthropic.Anthropic()

# Complex analysis task - good candidate for extended thinking
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    system="You are a code review agent. Analyze code for bugs, security issues, and performance problems.",
    messages=[
        {"role": "user", "content": """Review this authentication function:

def authenticate(username, password):
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    result = db.execute(query)
    if result:
        token = base64.b64encode(f"{username}:{time.time()}".encode()).decode()
        session['token'] = token
        return {"status": "ok", "token": token}
    return {"status": "fail"}
"""}
    ]
)

for block in response.content:
    if block.type == "text":
        print(block.text)

This is ideal for extended thinking because the model needs to evaluate SQL injection risks, password storage issues, token generation weaknesses, and session management problems — multiple distinct analyses that benefit from structured reasoning.

Budget Token Strategies

The budget allocation depends on task complexity:

import anthropic

client = anthropic.Anthropic()

def smart_query(prompt: str, complexity: str = "medium") -> str:
    budgets = {
        "low": 2000,     # Simple factual questions
        "medium": 6000,  # Analysis and comparison tasks
        "high": 12000,   # Complex reasoning, code generation, math
    }

    budget = budgets.get(complexity, 6000)

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=budget + 4000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": prompt}]
    )

    return "".join(
        block.text for block in response.content if block.type == "text"
    )

# Low complexity - fast, cheap
answer = smart_query("What is the capital of France?", "low")

# High complexity - deep reasoning
answer = smart_query(
    "Design a rate limiting system that handles 100K requests/second with geographic distribution",
    "high"
)

Start with lower budgets and increase only when you observe the model cutting its reasoning short. Oversized budgets waste tokens (and money) without improving quality on simple tasks.

Extended Thinking in Agent Loops

When combining extended thinking with tool use, thinking happens before each tool call decision:

import anthropic

client = anthropic.Anthropic()

# Extended thinking works alongside tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    tools=[{
        "name": "run_sql",
        "description": "Execute a SQL query and return results.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }],
    messages=[
        {"role": "user", "content": "Find the top 5 customers by lifetime revenue, excluding test accounts."}
    ]
)

# Response may contain: thinking -> text -> tool_use
for block in response.content:
    print(f"Block type: {block.type}")

The thinking block reveals how Claude reasons about which tool to call and what arguments to provide, which is invaluable for debugging agent behavior.
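Continuing the loop after executing a tool has one wrinkle: per Anthropic's tool-use guidance, the assistant turn that requested the tool should be passed back verbatim, thinking block included, before the tool results. A sketch of that continuation step, with illustrative helper names:

```python
def pending_tool_calls(content_blocks) -> list:
    """Return the tool_use blocks from a response, in order."""
    return [b for b in content_blocks if b.type == "tool_use"]


def continue_with_results(messages, response, results_by_id) -> list:
    """Extend the history after executing the requested tools.

    The assistant turn is passed back as response.content verbatim
    (which preserves the thinking block); results_by_id maps each
    tool_use id to its string output.
    """
    return messages + [
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": tool_id, "content": output}
            for tool_id, output in results_by_id.items()
        ]},
    ]
```

You would then call `client.messages.create` again with the extended list so Claude can reason over the tool output.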

FAQ

Does extended thinking increase costs?

Yes. Thinking tokens are billed as output tokens, which are more expensive than input tokens. A 10,000 token thinking budget could add significant cost per request. Use extended thinking selectively for tasks where the quality improvement justifies the cost, not for every API call.
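A rough per-request estimate can be computed from the usage counts the API returns. The per-million-token prices below are placeholders, not current pricing; substitute the rates for your model.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 3.00,   # placeholder $/1M input tokens
    output_price_per_m: float = 15.00, # placeholder $/1M output tokens
) -> float:
    """Rough request cost; thinking tokens are counted in output_tokens."""
    return (
        input_tokens * input_price_per_m
        + output_tokens * output_price_per_m
    ) / 1_000_000
```

For example, `estimate_cost_usd(response.usage.input_tokens, response.usage.output_tokens)` makes the cost of a large thinking budget concrete before you roll it out broadly.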

Can I use extended thinking with streaming?

Yes. When streaming with extended thinking, the thinking block arrives first as content_block_delta events carrying thinking_delta payloads, followed by text_delta payloads for the final answer. This lets you show a "reasoning" indicator to users while Claude thinks, then stream the final answer in real time.
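One way to structure this is a small event handler plus a streaming wrapper. The event and delta field names below follow the SDK's streaming events as described here and are assumptions to verify against the current SDK:

```python
def render_stream_event(event) -> str:
    """Map a streaming event to display text.

    Assumes content_block_delta events carry a delta with type
    "thinking_delta" (field .thinking) or "text_delta" (field .text);
    all other event types are ignored.
    """
    if getattr(event, "type", None) != "content_block_delta":
        return ""
    if event.delta.type == "thinking_delta":
        return "."  # reasoning-indicator tick while Claude thinks
    if event.delta.type == "text_delta":
        return event.delta.text
    return ""


def stream_with_thinking(client, prompt: str) -> str:
    """Stream a response, printing ticks during thinking, then the answer."""
    parts = []
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=12000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for event in stream:
            chunk = render_stream_event(event)
            print(chunk, end="", flush=True)
            parts.append(chunk)
    return "".join(parts)
```

Separating the handler from the network loop also makes the display logic easy to unit test without an API key.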

Should I include the thinking block in conversation history?

No, not for ordinary turns: the API strips thinking blocks from prior assistant turns, so resending them has no effect. The one exception is tool use, where the assistant turn that requested a tool must be passed back intact, thinking block included, alongside the tool results. If you need to reference Claude's reasoning in follow-up turns, extract the relevant parts from the thinking block and include them as regular text content in your messages.
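A small helper can lift that reasoning into plain text for reuse in later turns; the truncation length is an arbitrary illustration choice.

```python
def thinking_summary(content_blocks, max_chars: int = 500) -> str:
    """Concatenate thinking text so it can be quoted as regular content.

    Assumes SDK-style blocks with .type and .thinking attributes;
    output is truncated to max_chars to keep follow-up prompts small.
    """
    parts = [b.thinking for b in content_blocks if b.type == "thinking"]
    return " ".join(parts)[:max_chars]
```

The returned string can then be embedded in a later user message, e.g. "Earlier you reasoned: ...", without resending the thinking block itself.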


#Anthropic #Claude #ExtendedThinking #ChainOfThought #Reasoning #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

