
Building a Claude Web Scraper: Extracting Data Using Vision Instead of Selectors

Learn how to use Claude Computer Use for visual data extraction — reading HTML tables, parsing charts, extracting structured data from complex layouts, and converting visual information to JSON without any CSS selectors.

Why Vision-Based Scraping?

Traditional web scraping with BeautifulSoup or Scrapy relies on parsing HTML and navigating the DOM tree. This works well for simple, well-structured pages. But much of the modern web's content is not exposed in the DOM in any straightforward way: data rendered in canvas elements, charts built with D3 or Chart.js, information embedded in images, PDF viewers rendered in the browser, and dynamically loaded content hidden behind JavaScript frameworks.

Claude's vision capability lets you skip all of that complexity. Instead of parsing HTML, you take a screenshot and ask Claude to read what it sees. The data extraction happens at the visual level, making it resilient to DOM changes, anti-scraping measures, and complex rendering pipelines.

Basic Visual Extraction

The simplest form of visual scraping sends a screenshot to Claude with structured output instructions:

import anthropic
import json

client = anthropic.Anthropic()

def extract_table_data(screenshot_b64: str, description: str) -> list[dict]:
    """Extract tabular data from a screenshot using Claude vision."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Extract all data from the table visible in this screenshot.

Context: {description}

Return the data as a JSON array of objects where each object represents
a row and the keys are the column headers. Use exact values as shown.
Return ONLY valid JSON, no other text.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)

This function handles any visible table — HTML tables, tables rendered inside canvas, tables in embedded PDFs, even tables in images. Claude reads the visual content and returns structured JSON.
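One caveat: the `json.loads` call above assumes Claude returns bare JSON, but models sometimes wrap output in a markdown code fence despite the "ONLY valid JSON" instruction. A defensive parsing sketch — `parse_json_response` is a hypothetical helper, not part of the SDK:

```python
import json

def parse_json_response(raw: str):
    """Parse model output that may be wrapped in a ``` code fence.

    Hypothetical helper: strips an opening fence line (``` or ```json)
    and a trailing ``` before handing the body to json.loads.
    """
    text = raw.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the closing fence if present, then the opening fence line
        if lines[-1].strip() == "```":
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        text = "\n".join(lines)
    return json.loads(text)
```

Swapping `json.loads(response.content[0].text)` for `parse_json_response(response.content[0].text)` makes the extractor tolerant of both response styles.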

Extracting Data from Charts

Charts are a prime use case for vision-based scraping because the data in a chart is rendered as pixels, not accessible DOM elements. Claude can read bar charts, line charts, pie charts, and more:

def extract_chart_data(screenshot_b64: str, chart_type: str) -> dict:
    """Extract data points from a chart in a screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Analyze this {chart_type} chart and extract all data points.

For each data series, provide:
- series_name: the label of the series
- data_points: array of {{label, value}} objects

Also extract:
- chart_title: the title of the chart
- x_axis_label: the x-axis label
- y_axis_label: the y-axis label

Return as JSON. Estimate numeric values from the chart's axis scale
as precisely as possible.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)
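The nested series/data_points structure is convenient for analysis but awkward for spreadsheets. A small flattening sketch — it assumes the response nests series under a top-level `series` key, which is worth naming explicitly in the prompt rather than leaving to the model:

```python
def chart_to_rows(chart: dict) -> list[dict]:
    """Flatten extracted chart JSON into one flat row per data point,
    e.g. for CSV export. Assumes the schema requested in the prompt
    above, plus a top-level "series" key (an assumption to pin down
    in the prompt text itself).
    """
    rows = []
    for series in chart.get("series", []):
        for point in series.get("data_points", []):
            rows.append({
                "series": series.get("series_name"),
                "label": point.get("label"),
                "value": point.get("value"),
            })
    return rows
```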

Full-Page Scraping with Scrolling

Real-world scraping often requires scrolling through a page to capture all content. Here is a complete scraper that handles pagination through scrolling:


from playwright.async_api import async_playwright
import asyncio
import base64

class VisualScraper:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.all_data = []

    async def scrape_full_page(self, url: str, extraction_prompt: str) -> list:
        self.all_data = []  # reset so the scraper can be reused across pages
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            page = await browser.new_page(viewport={"width": 1280, "height": 800})
            await page.goto(url, wait_until="networkidle")

            prev_screenshot = None
            scroll_count = 0
            max_scrolls = 20

            while scroll_count < max_scrolls:
                screenshot = await page.screenshot()
                screenshot_b64 = base64.standard_b64encode(screenshot).decode()

                # Stop when the screenshot is byte-identical to the previous
                # one, i.e. scrolling no longer changes the viewport. Note:
                # exact comparison can loop forever on pages with animations;
                # a perceptual hash is more robust there.
                if screenshot_b64 == prev_screenshot:
                    break  # Reached bottom of page

                prev_screenshot = screenshot_b64

                # Extract data from current viewport
                data = await self._extract(screenshot_b64, extraction_prompt)
                self.all_data.extend(data)

                # Scroll down
                await page.mouse.wheel(0, 600)
                await asyncio.sleep(1)
                scroll_count += 1

            await browser.close()

        return self._deduplicate(self.all_data)

    async def _extract(self, screenshot_b64: str, prompt: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": prompt + "\nReturn as JSON array."},
                ],
            }],
        )
        try:
            return json.loads(response.content[0].text)
        except json.JSONDecodeError:
            return []

    def _deduplicate(self, items: list) -> list:
        seen = set()
        unique = []
        for item in items:
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(item)
        return unique

Handling Complex Layouts

Some pages have data spread across cards, tiles, or non-tabular layouts. Claude handles these naturally:

extraction_prompt = """Extract all product listings visible on this page.
For each product, return:
- name: product name
- price: price as shown (include currency symbol)
- rating: star rating if visible
- review_count: number of reviews if shown
- availability: in stock or out of stock
- image_description: brief description of the product image

If any field is not visible for a product, use null."""

scraper = VisualScraper()
products = asyncio.run(
    scraper.scrape_full_page(
        "https://example.com/products",
        extraction_prompt
    )
)

The key advantage here is that Claude understands layout semantics. It knows that a price displayed below a product name belongs to that product, even if the HTML structure groups them in unexpected ways.

Accuracy Considerations

Vision-based extraction is not pixel-perfect for numeric values read from charts. Claude estimates values based on axis scales and visual position. For bar charts, expect accuracy within 2-5% of the actual value. For precise numeric extraction from tables, accuracy is typically above 99% since Claude reads the actual rendered text.

Always validate extracted data against known reference points when possible. For critical applications, extract the same data multiple times and compare results, flagging any discrepancies for human review.
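The extract-twice-and-compare check above can be sketched as a row-by-row diff. `cross_check` and its 5% numeric tolerance are illustrative choices, not a library API, and the sketch assumes both runs return rows in the same order with the same keys:

```python
def cross_check(run_a: list[dict], run_b: list[dict],
                numeric_tolerance: float = 0.05) -> list[dict]:
    """Compare two extraction passes and flag fields that disagree.

    Numeric fields are compared with a relative tolerance (charts yield
    estimates); everything else must match exactly. Assumes both runs
    return rows in the same order.
    """
    flagged = []
    for a, b in zip(run_a, run_b):
        for key in a:
            va, vb = a.get(key), b.get(key)
            if isinstance(va, (int, float)) and isinstance(vb, (int, float)):
                base = max(abs(va), abs(vb), 1e-9)
                if abs(va - vb) / base > numeric_tolerance:
                    flagged.append({"key": key, "run_a": va, "run_b": vb})
            elif va != vb:
                flagged.append({"key": key, "run_a": va, "run_b": vb})
    return flagged
```

Any non-empty return value is a candidate for human review.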

FAQ

How does vision-based scraping handle anti-bot protection?

Since Claude works from screenshots rather than making HTTP requests, it is invisible to server-side anti-bot systems. The browser session itself still needs to avoid detection, but the extraction step happens entirely on the client side through image analysis.

Can Claude extract data from screenshots of mobile layouts?

Yes. Set your browser viewport to a mobile resolution (e.g., 375x812 for iPhone) and Claude will interpret the mobile layout correctly. It understands responsive design patterns like hamburger menus, stacked cards, and collapsible sections.

What is the cost of scraping a 20-page website?

With one screenshot per viewport and an average of 3-5 scrolls per page, that is roughly 60-100 API calls. At Claude Sonnet pricing with image inputs, expect approximately $1-3 for the full scrape. This is significantly more expensive than HTML parsing, so reserve vision-based scraping for pages where traditional methods fail.
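The arithmetic above can be parameterized for your own site. The per-call cost below is a placeholder to plug current Anthropic pricing and your actual image/prompt token counts into, not a quoted rate:

```python
def estimate_scrape_calls(pages: int, avg_scrolls_per_page: int) -> int:
    """One API call per viewport screenshot: pages x scrolls."""
    return pages * avg_scrolls_per_page

def estimate_cost(calls: int, cost_per_call_usd: float) -> float:
    """cost_per_call_usd is a placeholder -- derive it from current
    pricing and your measured image + prompt token counts."""
    return calls * cost_per_call_usd

# Example: 20 pages, ~4 scrolls each, at an assumed $0.025/call
calls = estimate_scrape_calls(20, 4)   # 80 calls
cost = estimate_cost(calls, 0.025)     # $2.00
```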


#ClaudeWebScraper #VisionAI #DataExtraction #WebScraping #StructuredOutput #AIDataParsing #ComputerUse

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

Learn Agentic AI

Computer Use in GPT-5.4: Building AI Agents That Navigate Desktop Applications

Technical guide to GPT-5.4's computer use capabilities for building AI agents that interact with desktop UIs, browser automation, and real-world application workflows.

Learn Agentic AI

How to Build an AI Coding Assistant with Claude and MCP: Step-by-Step Guide

Build a powerful AI coding assistant that reads files, runs tests, and fixes bugs using the Claude API and Model Context Protocol servers in TypeScript.

Learn Agentic AI

Building Your First MCP Server: Connect AI Agents to Any External Tool

Step-by-step tutorial on building an MCP server in TypeScript, registering tools and resources, handling requests, and connecting to Claude and other LLM clients.

Learn Agentic AI

Computer Use Agents 2026: How Claude, GPT-5.4, and Gemini Navigate Desktop Applications

Comparison of computer use capabilities across Claude, GPT-5.4, and Gemini including accuracy benchmarks, speed tests, supported applications, and real-world limitations.

Learn Agentic AI

Using GPT-4 Vision to Understand Web Pages: Screenshot Analysis for AI Agents

Learn how to capture web page screenshots and send them to GPT-4 Vision for element identification, layout understanding, and structured analysis that powers browser automation agents.

Learn Agentic AI

Building a Claude Browser Agent: Automated Web Navigation with Anthropic SDK

Step-by-step guide to building a browser automation agent with Claude Computer Use — from SDK setup and screenshot capture to executing click, type, and scroll actions for real web navigation tasks.