---
title: "Claude Computer Use for Form Automation: Auto-Filling Complex Multi-Step Forms"
description: "Build a Claude-powered form automation agent that detects fields, maps data intelligently, handles validation errors, and navigates multi-step form wizards — all through visual understanding instead of DOM selectors."
canonical: https://callsphere.ai/blog/claude-computer-use-form-automation-auto-filling-complex-multi-step-forms
category: "Learn Agentic AI"
tags: ["Claude", "Form Automation", "Computer Use", "Browser Automation", "Data Entry", "RPA"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-06T01:02:45.992Z
---

# Claude Computer Use for Form Automation: Auto-Filling Complex Multi-Step Forms

> Build a Claude-powered form automation agent that detects fields, maps data intelligently, handles validation errors, and navigates multi-step form wizards — all through visual understanding instead of DOM selectors.

## The Form Automation Challenge

Automating form filling sounds simple until you encounter real-world forms. Government applications with 50+ fields across 10 pages. Insurance claim forms with conditional sections that appear based on previous answers. Healthcare intake forms with dropdown menus that load dynamically. CRM data entry screens with custom field types.

Traditional automation with Playwright or Selenium handles forms by targeting specific selectors — `page.fill("#firstName", "John")`. This works until the form changes its field IDs, switches from a text input to a dropdown, or adds a new required field. Claude Computer Use takes a fundamentally different approach: it looks at the form, reads the labels, and fills in the appropriate values.

## Form Field Detection and Mapping

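The diagram below shows the overall computer-use loop: capture the screen, let the vision model read the UI state, pick an action, and pass it through a safety filter before it reaches the sandbox. Within that loop, the first step for form filling is to have Claude analyze the form and list the visible fields, which the agent later maps to your data: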

```mermaid
flowchart LR
    GOAL(["High level goal"])
    PLAN["Planner LLM"]
    SCREEN["Screen capture
every step"]
    VLM["Vision LLM
reads UI state"]
    ACT{"Action type"}
    CLICK["Click coordinate"]
    TYPE["Type text"]
    KEY["Keyboard shortcut"]
    GUARD["Safety filter
allow lists"]
    OS[("OS sandbox
ephemeral VM")]
    DONE(["Goal verified"])
    GOAL --> PLAN --> SCREEN --> VLM --> ACT
    ACT --> CLICK --> GUARD
    ACT --> TYPE --> GUARD
    ACT --> KEY --> GUARD
    GUARD --> OS --> SCREEN
    OS --> DONE
    style PLAN fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style DONE fill:#059669,stroke:#047857,color:#fff
```

```python
import anthropic
import json

client = anthropic.Anthropic()

def analyze_form(screenshot_b64: str) -> list[dict]:
    """Detect all form fields visible in the screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64,
                }},
                {"type": "text", "text": """Analyze this form and list every input field visible.

For each field, return:
- label: the field's label text
- field_type: text, dropdown, checkbox, radio, date, textarea, file_upload
- required: true if marked as required (asterisk or "required" label)
- current_value: any pre-filled value, or null
- options: for dropdowns/radios, list the visible options if any
- approximate_position: {x, y} coordinates of the center of the input

Return as a JSON array."""},
            ],
        }],
    )
    return json.loads(response.content[0].text)
```
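
The `json.loads(response.content[0].text)` call above assumes the reply is bare JSON. In practice a model reply may wrap the JSON in a Markdown code fence or add a sentence around it. The helper below is a minimal sketch of a more tolerant parser; `parse_json_reply` is a hypothetical name, not part of the Anthropic SDK.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this listing simple
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_reply(text: str):
    """Parse JSON from a model reply that may include a code fence or prose."""
    # Fast path: the reply is already bare JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Second attempt: take the contents of the first fenced block.
    fenced = FENCED_JSON.search(text)
    if fenced:
        return json.loads(fenced.group(1))

    # Last resort: decode from the first "[" or "{" that starts valid JSON.
    decoder = json.JSONDecoder()
    for match in re.finditer(r"[\[{]", text):
        try:
            value, _end = decoder.raw_decode(text, match.start())
            return value
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON found in model reply")
```

Swapping this in for the bare `json.loads` calls is optional; asking for JSON only in the prompt remains the first line of defense.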

## The Form-Filling Agent

With field detection in place, we build an agent that maps your data to detected fields and fills them in sequence:

```python
import asyncio


class FormFillingAgent:
    def __init__(self, browser_manager):
        self.browser = browser_manager
        self.client = anthropic.Anthropic()

    async def fill_form(self, form_data: dict, context: str = ""):
        """Fill a form using Claude vision to identify and interact with fields."""
        screenshot_b64 = await self.browser.screenshot()

        # Step 1: Create a filling plan
        plan = self._create_plan(screenshot_b64, form_data, context)

        # Step 2: Execute each field fill
        for field in plan:
            await self._fill_field(field)
            # Brief pause for UI updates
            await asyncio.sleep(0.5)

        # Step 3: Verify filled values
        verification = await self._verify_form(form_data)
        return verification

    def _create_plan(self, screenshot_b64: str, form_data: dict, context: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""I need to fill this form with the following data:

{json.dumps(form_data, indent=2)}

Context: {context}

Create a step-by-step plan to fill each field. For each step:
- field_label: which field to fill
- data_key: which key from my data maps to this field
- action: click, type, select_dropdown, check_checkbox, select_radio
- coordinate: approximate {{x, y}} of the input element
- value: the value to enter

Order the steps top-to-bottom, left-to-right as fields appear on screen.
Return as a JSON array."""},
                ],
            }],
        )
        return json.loads(response.content[0].text)

    async def _fill_field(self, field: dict):
        """Fill a single field based on the plan."""
        x = field["coordinate"]["x"]
        y = field["coordinate"]["y"]
        action = field["action"]
        value = str(field["value"])

        if action == "type":
            await self.browser.click(x, y)
            await asyncio.sleep(0.3)
            # Select any existing content so the typed value replaces it
            # (on macOS the select-all shortcut is Meta+a rather than Control+a)
            await self.browser.press_key("Control+a")
            await self.browser.type_text(value)

        elif action == "select_dropdown":
            await self.browser.click(x, y)
            await asyncio.sleep(0.5)
            # Use Claude to find and click the right option
            await self._select_option_visually(value)

        elif action in ("click", "check_checkbox", "select_radio"):
            # Plain clicks, checkboxes, and radio buttons all need a single click
            await self.browser.click(x, y)

        else:
            raise ValueError(f"Unsupported action in plan: {action!r}")
```
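
Because the plan comes from a model reading pixels, it can help to sanity-check each step before any click is sent. The sketch below is one way to do that, under two assumptions not stated in the article: the viewport is 1280x800, and a step may carry a null `data_key` for fields that map to no input data. `validate_plan` is a hypothetical helper.

```python
ALLOWED_ACTIONS = {"click", "type", "select_dropdown", "check_checkbox", "select_radio"}


def validate_plan(plan, form_data, width=1280, height=800):
    """Split plan steps into (valid, rejected) before anything is clicked."""
    valid, rejected = [], []
    for step in plan:
        problems = []

        # The action must be one the agent knows how to execute.
        if step.get("action") not in ALLOWED_ACTIONS:
            problems.append("unknown action")

        # Coordinates must be numeric and inside the assumed viewport.
        coord = step.get("coordinate") or {}
        x, y = coord.get("x"), coord.get("y")
        if not (isinstance(x, (int, float)) and isinstance(y, (int, float))):
            problems.append("missing coordinate")
        elif not (0 <= x < width and 0 <= y < height):
            problems.append("coordinate outside viewport")

        # A named data key must actually exist in the input data.
        key = step.get("data_key")
        if key is not None and key not in form_data:
            problems.append("data_key not in form_data")

        if problems:
            rejected.append({"step": step, "problems": problems})
        else:
            valid.append(step)
    return valid, rejected
```

Rejected steps can be logged or sent back to Claude with a fresh screenshot rather than executed blindly.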

## Handling Dropdown Menus

Dropdowns are notoriously difficult for visual automation because clicking them reveals a new set of options that must be located and clicked. Here is a robust approach:

```python
    async def _select_option_visually(self, target_value: str, attempts_left: int = 5):
        """After opening a dropdown, find and click the target option."""
        import asyncio
        await asyncio.sleep(0.5)  # Wait for dropdown to open
        screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""A dropdown menu is open on screen.
Find the option that best matches: "{target_value}"
Return the exact coordinate to click as JSON: {{"x": number, "y": number}}
If the option is not visible, return {{"scroll": "down"}} to indicate
I need to scroll within the dropdown."""},
                ],
            }],
        )

        result = json.loads(response.content[0].text)
        if "scroll" in result:
            # Bound the retries so a missing option cannot recurse forever
            if attempts_left <= 1:
                raise ValueError(f"Option not found in dropdown: {target_value!r}")
            # A scroll reply carries no coordinates, so fall back to a default point
            await self.browser.scroll(result.get("x", 640), result.get("y", 400), "down")
            await self._select_option_visually(target_value, attempts_left - 1)  # Retry
        else:
            await self.browser.click(result["x"], result["y"])
```
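
When the option labels are already known, for example from the `options` list returned by `analyze_form`, the target value can be normalized before the dropdown is opened, so the prompt asks for a label that actually exists. A minimal sketch using the standard library's `difflib`; `best_option` and its 0.6 cutoff are illustrative choices, not part of the agent above.

```python
import difflib
from typing import Optional


def best_option(target: str, options: list, cutoff: float = 0.6) -> Optional[str]:
    """Return the option closest to target, or None if nothing is close enough."""
    # Compare case-insensitively but return the label exactly as displayed.
    normalized = {opt.strip().lower(): opt for opt in options}
    wanted = target.strip().lower()
    if wanted in normalized:
        return normalized[wanted]
    close = difflib.get_close_matches(wanted, list(normalized), n=1, cutoff=cutoff)
    return normalized[close[0]] if close else None
```

A `None` result is a signal to stop and report the mismatch instead of letting the agent click the nearest-looking option.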

## Multi-Step Form Navigation

Many forms span multiple pages. The agent needs to handle "Next" buttons, progress indicators, and conditional sections:

```python
    async def fill_multi_step_form(self, all_data: dict, max_pages: int = 10):
        """Fill a multi-page form wizard."""
        for page_num in range(max_pages):
            screenshot_b64 = await self.browser.screenshot()

            # Analyze current page
            page_info = self._analyze_page(screenshot_b64)

            if page_info.get("is_confirmation_page"):
                return {"status": "complete", "page": page_num + 1}

            # Fill visible fields on this page
            await self.fill_form(all_data, context=f"Page {page_num + 1} of the form")

            # Check for validation errors on a fresh screenshot; the one captured
            # at the top of the loop predates the fill and cannot show new errors
            validation = await self._check_validation()
            if validation.get("has_errors"):
                await self._fix_validation_errors(validation["errors"])

            # Click Next/Continue button
            await self._click_next_button()
            import asyncio
            await asyncio.sleep(1)

        return {"status": "max_pages_reached"}
```
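
The loop above keeps clicking "Next" until `max_pages` is reached, even if the wizard never advances. One way to notice that earlier is to fingerprint each page's screenshot and stop when the same one keeps coming back. The sketch below assumes byte-identical screenshots for an unchanged page, which a blinking cursor or an animation would break, so treat it as a starting point. `PageTracker` is a hypothetical helper.

```python
import hashlib


class PageTracker:
    """Detect when clicking Next leaves the wizard on the same page."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self._last = None
        self._repeats = 0

    def is_stuck(self, screenshot_b64: str) -> bool:
        """Record a screenshot; return True once it has repeated too often."""
        fingerprint = hashlib.sha256(screenshot_b64.encode("ascii")).hexdigest()
        if fingerprint == self._last:
            self._repeats += 1
        else:
            # A new page was reached, so reset the repeat counter.
            self._last = fingerprint
            self._repeats = 0
        return self._repeats >= self.max_repeats
```

Calling `is_stuck(screenshot_b64)` at the top of each iteration and returning a distinct status when it fires makes a blocked wizard easier to tell apart from a long one.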

## Validation Error Handling

After filling fields and before clicking "Next," the agent should check for validation errors:

```python
    async def _check_validation(self, screenshot_b64: str = None) -> dict:
        if not screenshot_b64:
            screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": """Check this form for validation errors.
Look for: red borders, error messages, warning icons, tooltips.
Return JSON: {"has_errors": bool, "errors": [{"field": str, "message": str}]}"""},
                ],
            }],
        )
        return json.loads(response.content[0].text)
```
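
The multi-step loop calls `_fix_validation_errors`, which the article does not show. One possible first step for such a method is to match each reported error back to the plan step that filled the field, so only those fields are revisited. The sketch below assumes the error's `field` text resembles the plan's `field_label`; `triage_errors` is a hypothetical name.

```python
def _norm(label: str) -> str:
    """Normalize a label: ignore case, required markers, and trailing colons."""
    return label.lower().replace("*", "").strip().rstrip(":").strip()


def triage_errors(errors, plan):
    """Return (steps_to_retry, unmatched_errors) for a list of validation errors."""
    by_label = {_norm(step["field_label"]): step for step in plan}
    retry, unmatched = [], []
    for err in errors:
        step = by_label.get(_norm(err["field"]))
        if step is not None:
            # Carry the message along so a re-plan can take it into account.
            retry.append({**step, "error": err["message"]})
        else:
            unmatched.append(err)
    return retry, unmatched
```

Retrying helps when a value was mistyped or the field never received focus. If the data itself is invalid, the message is better surfaced to the caller than retried.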

## FAQ

### How does Claude handle date picker widgets?

Claude can interact with date pickers visually — clicking the calendar icon, navigating months, and selecting dates. For complex date pickers, it often works better to click the text input first, clear it, and type the date in the expected format (MM/DD/YYYY, etc.) rather than navigating the calendar widget.
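
Typing the date only works if it matches the format the field expects. A minimal sketch, assuming the placeholder text (such as MM/DD/YYYY) has already been read from the field and uses only the tokens YYYY, YY, MM, and DD; `format_for_placeholder` is a hypothetical helper.

```python
from datetime import date

# Placeholder tokens and their strftime directives, longest year token first
# so that "YYYY" is not consumed as two "YY" tokens.
_TOKENS = [("YYYY", "%Y"), ("YY", "%y"), ("MM", "%m"), ("DD", "%d")]


def format_for_placeholder(value: date, placeholder: str) -> str:
    """Format a date to match a placeholder such as 'MM/DD/YYYY'."""
    fmt = placeholder.upper()
    for token, directive in _TOKENS:
        fmt = fmt.replace(token, directive)
    return value.strftime(fmt)


print(format_for_placeholder(date(2026, 3, 5), "MM/DD/YYYY"))  # 03/05/2026
```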

### Can Claude handle file upload fields?

Claude can identify file upload fields and click the "Choose File" button, but in a browser-driven setup like this one the operating system's file dialog opens outside the page, so screenshots taken through the browser do not show it. For file uploads, use a hybrid approach: let Claude identify the upload field, then use Playwright's `set_input_files()` method to attach the file programmatically.

### What about CAPTCHA or anti-automation fields on forms?

Claude can visually interpret some CAPTCHA types, but bypassing them is restricted by most websites' terms of service and Anthropic's usage policies. For legitimate automation of your own forms, disable CAPTCHA in development/staging environments or use authenticated sessions that skip the challenge.

---

Source: https://callsphere.ai/blog/claude-computer-use-form-automation-auto-filling-complex-multi-step-forms
