---
title: "Building a Claude Web Scraper: Extracting Data Using Vision Instead of Selectors"
description: "Learn how to use Claude Computer Use for visual data extraction — reading HTML tables, parsing charts, extracting structured data from complex layouts, and converting visual information to JSON without any CSS selectors."
canonical: https://callsphere.ai/blog/building-claude-web-scraper-extracting-data-vision-not-selectors
category: "Learn Agentic AI"
tags: ["Claude", "Web Scraping", "Data Extraction", "Vision AI", "Computer Use", "Structured Output"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-07T11:16:05.514Z
---

# Building a Claude Web Scraper: Extracting Data Using Vision Instead of Selectors

> Learn how to use Claude Computer Use for visual data extraction — reading HTML tables, parsing charts, extracting structured data from complex layouts, and converting visual information to JSON without any CSS selectors.

## Why Vision-Based Scraping?

Traditional web scraping with BeautifulSoup or Scrapy relies on parsing HTML and navigating the DOM tree. This works well for simple, well-structured pages. But the modern web is full of content that isn't reachable through the DOM in any straightforward way: data rendered in canvas elements, charts built with D3 or Chart.js, information embedded in images, PDF viewers rendered in the browser, and dynamically loaded content hidden behind JavaScript frameworks.

Claude's vision capability lets you skip all of that complexity. Instead of parsing HTML, you take a screenshot and ask Claude to read what it sees. The data extraction happens at the visual level, making it resilient to DOM changes, anti-scraping measures, and complex rendering pipelines.

## Basic Visual Extraction

The simplest form of visual scraping sends a single screenshot to Claude along with structured output instructions. The diagram below situates that extraction step inside the broader computer-use loop; for pure scraping, only the screen-capture and vision-model stages matter:

```mermaid
flowchart LR
    GOAL(["High level goal"])
    PLAN["Planner LLM"]
    SCREEN["Screen capture
every step"]
    VLM["Vision LLM
reads UI state"]
    ACT{"Action type"}
    CLICK["Click coordinate"]
    TYPE["Type text"]
    KEY["Keyboard shortcut"]
    GUARD["Safety filter
allow lists"]
    OS[("OS sandbox
ephemeral VM")]
    DONE(["Goal verified"])
    GOAL --> PLAN --> SCREEN --> VLM --> ACT
    ACT --> CLICK --> GUARD
    ACT --> TYPE --> GUARD
    ACT --> KEY --> GUARD
    GUARD --> OS --> SCREEN
    OS --> DONE
    style PLAN fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style DONE fill:#059669,stroke:#047857,color:#fff
```

```python
import anthropic
import json

client = anthropic.Anthropic()

def extract_table_data(screenshot_b64: str, description: str) -> list[dict]:
    """Extract tabular data from a screenshot using Claude vision."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Extract all data from the table visible in this screenshot.

Context: {description}

Return the data as a JSON array of objects where each object represents
a row and the keys are the column headers. Use exact values as shown.
Return ONLY valid JSON, no other text.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)
```

This function handles any visible table — HTML tables, tables rendered inside canvas, tables in embedded PDFs, even tables in images. Claude reads the visual content and returns structured JSON.
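One practical wrinkle: despite the "ONLY valid JSON" instruction, models occasionally wrap the reply in a markdown fence, which makes the bare `json.loads` call throw. A small defensive parser (a sketch using only the standard library) tolerates that case:

```python
import json
import re

def parse_json_reply(text: str):
    """Parse a model reply that should be JSON, tolerating markdown fences."""
    cleaned = text.strip()
    # Strip an optional ```json ... ``` wrapper before parsing.
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", cleaned, re.DOTALL)
    if fenced:
        cleaned = fenced.group(1)
    return json.loads(cleaned)
```

Swapping this in for the bare `json.loads(response.content[0].text)` call keeps the happy path unchanged while surviving formatting drift.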

## Extracting Data from Charts

Charts are a prime use case for vision-based scraping because the data in a chart is rendered as pixels, not accessible DOM elements. Claude can read bar charts, line charts, pie charts, and more:

```python
def extract_chart_data(screenshot_b64: str, chart_type: str) -> dict:
    """Extract data points from a chart in a screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Analyze this {chart_type} chart and extract all data points.

For each data series, provide:
- series_name: the label of the series
- data_points: array of {{label, value}} objects

Also extract:
- chart_title: the title of the chart
- x_axis_label: the x-axis label
- y_axis_label: the y-axis label

Return as JSON. Estimate numeric values from the chart's axis scale
as precisely as possible.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)
```
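Before feeding extracted chart data downstream, it is worth sanity-checking that every data point actually carries a numeric value rather than a string like "n/a". A minimal check (note: the top-level `series` key is an assumption, since the prompt above does not pin down a wrapper key; adjust it to match your actual output):

```python
def chart_values_numeric(payload: dict) -> bool:
    """Check that every extracted data point carries a numeric value.

    Assumes series live under a top-level "series" key, which the
    extraction prompt does not pin down; adjust to your output shape.
    """
    for series in payload.get("series", []):
        for point in series.get("data_points", []):
            if not isinstance(point.get("value"), (int, float)):
                return False
    return True
```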

## Full-Page Scraping with Scrolling

Real-world scraping often requires scrolling through a page to capture all content. Here is a complete scraper that handles pagination through scrolling:

```python
import asyncio
import base64
import json

import anthropic
from playwright.async_api import async_playwright

class VisualScraper:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.all_data = []

    async def scrape_full_page(self, url: str, extraction_prompt: str) -> list:
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            page = await browser.new_page(viewport={"width": 1280, "height": 800})
            await page.goto(url, wait_until="networkidle")

            prev_screenshot = None
            scroll_count = 0
            max_scrolls = 20

            while scroll_count < max_scrolls:
                screenshot = await page.screenshot()
                screenshot_b64 = base64.b64encode(screenshot).decode()

                # Stop when the viewport no longer changes -- we have
                # reached the bottom of the page.
                if screenshot_b64 == prev_screenshot:
                    break
                prev_screenshot = screenshot_b64

                data = self._extract_from_screenshot(
                    screenshot_b64, extraction_prompt
                )
                self.all_data.extend(data)

                await page.mouse.wheel(0, 800)
                await page.wait_for_timeout(1500)
                scroll_count += 1

            await browser.close()
            return self._deduplicate(self.all_data)

    def _extract_from_screenshot(self, screenshot_b64: str, prompt: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": prompt + "\nReturn as JSON array."},
                ],
            }],
        )
        try:
            return json.loads(response.content[0].text)
        except json.JSONDecodeError:
            return []

    def _deduplicate(self, items: list) -> list:
        seen = set()
        unique = []
        for item in items:
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(item)
        return unique
```

## Handling Complex Layouts

Some pages have data spread across cards, tiles, or non-tabular layouts. Claude handles these naturally:

```python
extraction_prompt = """Extract all product listings visible on this page.
For each product, return:
- name: product name
- price: price as shown (include currency symbol)
- rating: star rating if visible
- review_count: number of reviews if shown
- availability: in stock or out of stock
- image_description: brief description of the product image

If any field is not visible for a product, use null."""

scraper = VisualScraper()
products = asyncio.run(
    scraper.scrape_full_page(
        "https://example.com/products",
        extraction_prompt
    )
)
```

The key advantage here is that Claude understands layout semantics. It knows that a price displayed below a product name belongs to that product, even if the HTML structure groups them in unexpected ways.

## Accuracy Considerations

Vision-based extraction is not pixel-perfect for numeric values read from charts. Claude estimates values based on axis scales and visual position. For bar charts, expect accuracy within 2-5% of the actual value. For precise numeric extraction from tables, accuracy is typically above 99% since Claude reads the actual rendered text.

Always validate extracted data against known reference points when possible. For critical applications, extract the same data multiple times and compare results, flagging any discrepancies for human review.
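One way to operationalize that advice is a small consensus wrapper. Here `extract_fn` stands in for any single-argument extraction callable (for example, `extract_table_data` with its description argument fixed via `functools.partial`); this is a sketch, not a prescribed API:

```python
import json

def extract_with_consensus(extract_fn, screenshot_b64: str, runs: int = 3):
    """Run the same extraction several times and flag disagreements.

    Identical runs build confidence in the result; any mismatch is
    returned with a "needs_review" status for human inspection.
    """
    results = [extract_fn(screenshot_b64) for _ in range(runs)]
    # Canonicalize each result so structurally equal outputs compare equal.
    serialized = {json.dumps(r, sort_keys=True) for r in results}
    if len(serialized) == 1:
        return results[0], "agreed"
    return results, "needs_review"
```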

## FAQ

### How does vision-based scraping handle anti-bot protection?

Vision-based extraction does not bypass anti-bot protection on its own: the browser session still makes HTTP requests and can be fingerprinted like any other automated browser. What it does sidestep is client-side obfuscation. Randomized class names, shadow DOM, canvas rendering, and markup deliberately structured to break selector-based scrapers have no effect, because the extraction step reads rendered pixels rather than the DOM.

### Can Claude extract data from screenshots of mobile layouts?

Yes. Set your browser viewport to a mobile resolution (e.g., 375x812 for iPhone) and Claude will interpret the mobile layout correctly. It understands responsive design patterns like hamburger menus, stacked cards, and collapsible sections.
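A minimal helper for this, assuming Playwright is installed (the user agent string is illustrative; some sites sniff UA rather than viewport width to pick a layout):

```python
import base64

def mobile_screenshot(url: str) -> str:
    """Capture a base64 screenshot at an iPhone-sized viewport (375x812)."""
    # Deferred import so this helper can sit alongside code paths that
    # do not require Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(
            viewport={"width": 375, "height": 812},
            user_agent=(
                "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                "Version/17.0 Mobile/15E148 Safari/604.1"
            ),
        )
        page.goto(url, wait_until="networkidle")
        png = page.screenshot(full_page=True)
        browser.close()
    return base64.b64encode(png).decode()
```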

### What is the cost of scraping a 20-page website?

With one screenshot per viewport and an average of 3-5 scrolls per page, that is roughly 60-100 API calls. At Claude Sonnet pricing with image inputs, expect approximately $1-3 for the full scrape. This is significantly more expensive than HTML parsing, so reserve vision-based scraping for pages where traditional methods fail.
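That arithmetic is easy to parameterize. The token counts and per-million-token prices below are placeholder assumptions, not published rates; plug in current pricing before trusting the output:

```python
def estimate_scrape_cost(
    pages: int,
    scrolls_per_page: float,
    tokens_per_screenshot: int = 1600,    # rough figure for a 1280x800 image
    output_tokens_per_call: int = 1000,
    input_price_per_mtok: float = 3.0,    # assumed $/M input tokens
    output_price_per_mtok: float = 15.0,  # assumed $/M output tokens
) -> float:
    """Back-of-envelope cost for a vision scrape: one API call per scroll."""
    calls = pages * scrolls_per_page
    input_cost = calls * tokens_per_screenshot * input_price_per_mtok / 1_000_000
    output_cost = calls * output_tokens_per_call * output_price_per_mtok / 1_000_000
    return round(input_cost + output_cost, 2)
```

With the default assumptions, 20 pages at 3-5 scrolls each lands in the $1-2 range, consistent with the estimate above.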

---

#ClaudeWebScraper #VisionAI #DataExtraction #WebScraping #StructuredOutput #AIDataParsing #ComputerUse

