
Building a Claude Web Scraper: Extracting Data Using Vision Instead of Selectors

Learn how to use Claude Computer Use for visual data extraction — reading HTML tables, parsing charts, extracting structured data from complex layouts, and converting visual information to JSON without any CSS selectors.

Why Vision-Based Scraping?

Traditional web scraping with BeautifulSoup or Scrapy relies on parsing HTML and navigating the DOM tree. This works well for simple, well-structured pages. But much of the modern web's content is not exposed in the DOM in any straightforward way: data rendered in canvas elements, charts built with D3 or Chart.js, information embedded in images, PDF viewers rendered in the browser, and dynamically loaded content hidden behind JavaScript frameworks.

Claude's vision capability lets you skip all of that complexity. Instead of parsing HTML, you take a screenshot and ask Claude to read what it sees. The data extraction happens at the visual level, making it resilient to DOM changes, anti-scraping measures, and complex rendering pipelines.

Basic Visual Extraction

The simplest form of visual scraping sends a screenshot to Claude with structured output instructions:

import anthropic
import json

client = anthropic.Anthropic()

def extract_table_data(screenshot_b64: str, description: str) -> list[dict]:
    """Extract tabular data from a screenshot using Claude vision."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Extract all data from the table visible in this screenshot.

Context: {description}

Return the data as a JSON array of objects where each object represents
a row and the keys are the column headers. Use exact values as shown.
Return ONLY valid JSON, no other text.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)

This function handles any visible table — HTML tables, tables rendered inside canvas, tables in embedded PDFs, even tables in images. Claude reads the visual content and returns structured JSON.
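One caveat: the `json.loads` call above assumes Claude returns bare JSON, but models sometimes wrap output in a markdown code fence despite the "ONLY valid JSON" instruction. A defensive parsing sketch — `parse_json_response` is a hypothetical helper, not part of the SDK:

```python
import json

def parse_json_response(raw: str):
    """Parse model output that may be wrapped in a ``` code fence.

    Hypothetical helper: strips an opening fence line (``` or ```json)
    and a trailing ``` before handing the body to json.loads.
    """
    text = raw.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the closing fence if present, then the opening fence line
        if lines[-1].strip() == "```":
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        text = "\n".join(lines)
    return json.loads(text)
```

Swapping `json.loads(response.content[0].text)` for `parse_json_response(response.content[0].text)` makes the extractor tolerant of both response styles.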

Extracting Data from Charts

Charts are a prime use case for vision-based scraping because the data in a chart is rendered as pixels, not accessible DOM elements. Claude can read bar charts, line charts, pie charts, and more:

def extract_chart_data(screenshot_b64: str, chart_type: str) -> dict:
    """Extract data points from a chart in a screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"""Analyze this {chart_type} chart and extract all data points.

For each data series, provide:
- series_name: the label of the series
- data_points: array of {{label, value}} objects

Also extract:
- chart_title: the title of the chart
- x_axis_label: the x-axis label
- y_axis_label: the y-axis label

Return as JSON. Estimate numeric values from the chart's axis scale
as precisely as possible.""",
                },
            ],
        }],
    )

    return json.loads(response.content[0].text)
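The nested series/data_points structure is convenient for analysis but awkward for spreadsheets. A small flattening sketch — it assumes the response nests series under a top-level `series` key, which is worth naming explicitly in the prompt rather than leaving to the model:

```python
def chart_to_rows(chart: dict) -> list[dict]:
    """Flatten extracted chart JSON into one flat row per data point,
    e.g. for CSV export. Assumes the schema requested in the prompt
    above, plus a top-level "series" key (an assumption to pin down
    in the prompt text itself).
    """
    rows = []
    for series in chart.get("series", []):
        for point in series.get("data_points", []):
            rows.append({
                "series": series.get("series_name"),
                "label": point.get("label"),
                "value": point.get("value"),
            })
    return rows
```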

Full-Page Scraping with Scrolling

Real-world scraping often requires scrolling through a page to capture all content. Here is a complete scraper that handles pagination through scrolling:


from playwright.async_api import async_playwright
import asyncio
import base64

class VisualScraper:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.all_data = []

    async def scrape_full_page(self, url: str, extraction_prompt: str) -> list:
        self.all_data = []  # reset so the scraper can be reused across pages
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            page = await browser.new_page(viewport={"width": 1280, "height": 800})
            await page.goto(url, wait_until="networkidle")

            prev_screenshot = None
            scroll_count = 0
            max_scrolls = 20

            while scroll_count < max_scrolls:
                screenshot = await page.screenshot()
                screenshot_b64 = base64.standard_b64encode(screenshot).decode()

                # Stop when the screenshot is byte-identical to the previous
                # one, i.e. scrolling no longer changes the viewport. Note:
                # exact comparison can loop forever on pages with animations;
                # a perceptual hash is more robust there.
                if screenshot_b64 == prev_screenshot:
                    break  # Reached bottom of page

                prev_screenshot = screenshot_b64

                # Extract data from current viewport
                data = await self._extract(screenshot_b64, extraction_prompt)
                self.all_data.extend(data)

                # Scroll down
                await page.mouse.wheel(0, 600)
                await asyncio.sleep(1)
                scroll_count += 1

            await browser.close()

        return self._deduplicate(self.all_data)

    async def _extract(self, screenshot_b64: str, prompt: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": prompt + "\nReturn as JSON array."},
                ],
            }],
        )
        try:
            return json.loads(response.content[0].text)
        except json.JSONDecodeError:
            return []

    def _deduplicate(self, items: list) -> list:
        seen = set()
        unique = []
        for item in items:
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(item)
        return unique

Handling Complex Layouts

Some pages have data spread across cards, tiles, or non-tabular layouts. Claude handles these naturally:

extraction_prompt = """Extract all product listings visible on this page.
For each product, return:
- name: product name
- price: price as shown (include currency symbol)
- rating: star rating if visible
- review_count: number of reviews if shown
- availability: in stock or out of stock
- image_description: brief description of the product image

If any field is not visible for a product, use null."""

scraper = VisualScraper()
products = asyncio.run(
    scraper.scrape_full_page(
        "https://example.com/products",
        extraction_prompt
    )
)

The key advantage here is that Claude understands layout semantics. It knows that a price displayed below a product name belongs to that product, even if the HTML structure groups them in unexpected ways.

Accuracy Considerations

Vision-based extraction is not pixel-perfect for numeric values read from charts. Claude estimates values based on axis scales and visual position. For bar charts, expect accuracy within 2-5% of the actual value. For precise numeric extraction from tables, accuracy is typically above 99% since Claude reads the actual rendered text.

Always validate extracted data against known reference points when possible. For critical applications, extract the same data multiple times and compare results, flagging any discrepancies for human review.
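The extract-twice-and-compare check above can be sketched as a row-by-row diff. `cross_check` and its 5% numeric tolerance are illustrative choices, not a library API, and the sketch assumes both runs return rows in the same order with the same keys:

```python
def cross_check(run_a: list[dict], run_b: list[dict],
                numeric_tolerance: float = 0.05) -> list[dict]:
    """Compare two extraction passes and flag fields that disagree.

    Numeric fields are compared with a relative tolerance (charts yield
    estimates); everything else must match exactly. Assumes both runs
    return rows in the same order.
    """
    flagged = []
    for a, b in zip(run_a, run_b):
        for key in a:
            va, vb = a.get(key), b.get(key)
            if isinstance(va, (int, float)) and isinstance(vb, (int, float)):
                base = max(abs(va), abs(vb), 1e-9)
                if abs(va - vb) / base > numeric_tolerance:
                    flagged.append({"key": key, "run_a": va, "run_b": vb})
            elif va != vb:
                flagged.append({"key": key, "run_a": va, "run_b": vb})
    return flagged
```

Any non-empty return value is a candidate for human review.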

FAQ

How does vision-based scraping handle anti-bot protection?

Since Claude works from screenshots rather than making HTTP requests, it is invisible to server-side anti-bot systems. The browser session itself still needs to avoid detection, but the extraction step happens entirely on the client side through image analysis.

Can Claude extract data from screenshots of mobile layouts?

Yes. Set your browser viewport to a mobile resolution (e.g., 375x812 for iPhone) and Claude will interpret the mobile layout correctly. It understands responsive design patterns like hamburger menus, stacked cards, and collapsible sections.

What is the cost of scraping a 20-page website?

With one screenshot per viewport and an average of 3-5 scrolls per page, that is roughly 60-100 API calls. At Claude Sonnet pricing with image inputs, expect approximately $1-3 for the full scrape. This is significantly more expensive than HTML parsing, so reserve vision-based scraping for pages where traditional methods fail.
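The arithmetic above can be parameterized for your own site. The per-call cost below is a placeholder to plug current Anthropic pricing and your actual image/prompt token counts into, not a quoted rate:

```python
def estimate_scrape_calls(pages: int, avg_scrolls_per_page: int) -> int:
    """One API call per viewport screenshot: pages x scrolls."""
    return pages * avg_scrolls_per_page

def estimate_cost(calls: int, cost_per_call_usd: float) -> float:
    """cost_per_call_usd is a placeholder -- derive it from current
    pricing and your measured image + prompt token counts."""
    return calls * cost_per_call_usd

# Example: 20 pages, ~4 scrolls each, at an assumed $0.025/call
calls = estimate_scrape_calls(20, 4)   # 80 calls
cost = estimate_cost(calls, 0.025)     # $2.00
```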


#ClaudeWebScraper #VisionAI #DataExtraction #WebScraping #StructuredOutput #AIDataParsing #ComputerUse

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

Learn Agentic AI

Computer Use in GPT-5.4: Building AI Agents That Navigate Desktop Applications

Technical guide to GPT-5.4's computer use capabilities for building AI agents that interact with desktop UIs, browser automation, and real-world application workflows.

Learn Agentic AI

How to Build an AI Coding Assistant with Claude and MCP: Step-by-Step Guide

Build a powerful AI coding assistant that reads files, runs tests, and fixes bugs using the Claude API and Model Context Protocol servers in TypeScript.

Learn Agentic AI

Building Your First MCP Server: Connect AI Agents to Any External Tool

Step-by-step tutorial on building an MCP server in TypeScript, registering tools and resources, handling requests, and connecting to Claude and other LLM clients.

Learn Agentic AI

Computer Use Agents 2026: How Claude, GPT-5.4, and Gemini Navigate Desktop Applications

Comparison of computer use capabilities across Claude, GPT-5.4, and Gemini including accuracy benchmarks, speed tests, supported applications, and real-world limitations.

Learn Agentic AI

Using GPT-4 Vision to Understand Web Pages: Screenshot Analysis for AI Agents

Learn how to capture web page screenshots and send them to GPT-4 Vision for element identification, layout understanding, and structured analysis that powers browser automation agents.

Learn Agentic AI

Building a Claude Browser Agent: Automated Web Navigation with Anthropic SDK

Step-by-step guide to building a browser automation agent with Claude Computer Use — from SDK setup and screenshot capture to executing click, type, and scroll actions for real web navigation tasks.