
Claude Computer Use for Form Automation: Auto-Filling Complex Multi-Step Forms

Build a Claude-powered form automation agent that detects fields, maps data intelligently, handles validation errors, and navigates multi-step form wizards — all through visual understanding instead of DOM selectors.

The Form Automation Challenge

Automating form filling sounds simple until you encounter real-world forms. Government applications with 50+ fields across 10 pages. Insurance claim forms with conditional sections that appear based on previous answers. Healthcare intake forms with dropdown menus that load dynamically. CRM data entry screens with custom field types.

Traditional automation with Playwright or Selenium handles forms by targeting specific selectors — page.fill("#firstName", "John"). This works until the form changes its field IDs, switches from a text input to a dropdown, or adds a new required field. Claude Computer Use takes a fundamentally different approach: it looks at the form, reads the labels, and fills in the appropriate values.

Form Field Detection and Mapping

The first step is to have Claude analyze the form and create a mapping between your data and the visible fields:

import anthropic
import json

client = anthropic.Anthropic()

def analyze_form(screenshot_b64: str) -> list[dict]:
    """Detect all form fields visible in the screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64,
                }},
                {"type": "text", "text": """Analyze this form and list every input field visible.

For each field, return:
- label: the field's label text
- field_type: text, dropdown, checkbox, radio, date, textarea, file_upload
- required: true if marked as required (asterisk or "required" label)
- current_value: any pre-filled value, or null
- options: for dropdowns/radios, list the visible options if any
- approximate_position: {x, y} coordinates of the center of the input

Return ONLY a JSON array, with no surrounding prose or markdown fences."""},
            ],
        }],
    )
    return json.loads(response.content[0].text)
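In practice the model may wrap its answer in a markdown code fence, which makes the bare `json.loads(response.content[0].text)` call above brittle. A small helper (our own addition, not part of the Anthropic SDK) can strip fences before parsing:

```python
import json
import re

def extract_json(raw: str):
    """Parse JSON from a model reply, tolerating ```json ... ``` fences."""
    # Strip an optional markdown fence wrapped around the payload
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = match.group(1) if match else raw.strip()
    return json.loads(payload)
```

Swapping `json.loads(response.content[0].text)` for `extract_json(response.content[0].text)` in `analyze_form` lets parsing survive fenced replies as well as bare JSON.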

The Form-Filling Agent

With field detection in place, we build an agent that maps your data to detected fields and fills them in sequence:

import asyncio

class FormFillingAgent:
    def __init__(self, browser_manager):
        self.browser = browser_manager
        self.client = anthropic.Anthropic()

    async def fill_form(self, form_data: dict, context: str = ""):
        """Fill a form using Claude vision to identify and interact with fields."""
        screenshot_b64 = await self.browser.screenshot()

        # Step 1: Create a filling plan
        plan = self._create_plan(screenshot_b64, form_data, context)

        # Step 2: Execute each field fill
        for field in plan:
            await self._fill_field(field)
            await asyncio.sleep(0.5)  # brief pause for UI updates

        # Step 3: Verify filled values
        verification = await self._verify_form(form_data)
        return verification

    def _create_plan(self, screenshot_b64: str, form_data: dict, context: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""I need to fill this form with the following data:

{json.dumps(form_data, indent=2)}

Context: {context}

Create a step-by-step plan to fill each field. For each step:
- field_label: which field to fill
- data_key: which key from my data maps to this field
- action: click, type, select_dropdown, check_checkbox, select_radio
- coordinate: approximate {{x, y}} of the input element
- value: the value to enter

Order the steps top-to-bottom, left-to-right as fields appear on screen.
Return ONLY a JSON array, with no surrounding prose or markdown fences."""},
                ],
            }],
        )
        return json.loads(response.content[0].text)

    async def _fill_field(self, field: dict):
        """Fill a single field based on the plan."""
        x = field["coordinate"]["x"]
        y = field["coordinate"]["y"]
        action = field["action"]
        value = str(field["value"])

        if action == "type":
            await self.browser.click(x, y)
            await asyncio.sleep(0.3)
            # Select any existing content so the typed value replaces it
            await self.browser.press_key("Control+a")
            await self.browser.type_text(value)

        elif action == "select_dropdown":
            await self.browser.click(x, y)
            await asyncio.sleep(0.5)
            # Use Claude to find and click the right option
            await self._select_option_visually(value)

        elif action in ("check_checkbox", "select_radio"):
            await self.browser.click(x, y)
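On-screen labels rarely match your data keys exactly ("E-mail Address" vs. `email`). As a deterministic safety net alongside Claude's own mapping, a fuzzy matcher can pair detected labels with data keys before the plan is executed. This is a stdlib-only sketch; the function name and the similarity cutoff are our own choices:

```python
import difflib

def map_data_to_fields(form_data: dict, labels: list[str], cutoff: float = 0.4) -> dict:
    """Map each detected field label to the closest-matching data key."""
    def normalize(s: str) -> str:
        # Lowercase and drop punctuation/whitespace so "E-mail" ~ "email"
        return "".join(ch for ch in s.lower() if ch.isalnum())

    keys = {normalize(k): k for k in form_data}
    mapping = {}
    for label in labels:
        hits = difflib.get_close_matches(normalize(label), keys, n=1, cutoff=cutoff)
        if hits:
            mapping[label] = keys[hits[0]]
    return mapping
```

Running the plan returned by `_create_plan` through a check like this catches mappings where the model paired a label with a data key that does not exist.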

Handling Dropdown Menus

Dropdowns are notoriously difficult for visual automation because clicking them reveals a new set of options that must be located and clicked. Here is a robust approach:

    async def _select_option_visually(self, target_value: str, max_scrolls: int = 5):
        """After opening a dropdown, find and click the target option."""
        import asyncio
        await asyncio.sleep(0.5)  # Wait for dropdown to open
        screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""A dropdown menu is open on screen.
Find the option that best matches: "{target_value}"
Return the exact coordinate to click as JSON: {{"x": number, "y": number}}
If the option is not visible, return {{"scroll": "down"}} to indicate
I need to scroll within the dropdown."""},
                ],
            }],
        )

        result = json.loads(response.content[0].text)
        if "scroll" in result:
            # Bound the retries so a missing option cannot recurse forever
            if max_scrolls <= 0:
                raise RuntimeError(f"Dropdown option not found: {target_value}")
            await self.browser.scroll(result.get("x", 640), result.get("y", 400), "down")
            await self._select_option_visually(target_value, max_scrolls - 1)  # Retry
        else:
            await self.browser.click(result["x"], result["y"])

Multi-Step Form Navigation

Many forms span multiple pages. The agent needs to handle "Next" buttons, progress indicators, and conditional sections:

    async def fill_multi_step_form(self, all_data: dict, max_pages: int = 10):
        """Fill a multi-page form wizard."""
        import asyncio
        for page_num in range(max_pages):
            screenshot_b64 = await self.browser.screenshot()

            # Analyze current page
            page_info = self._analyze_page(screenshot_b64)

            if page_info.get("is_confirmation_page"):
                return {"status": "complete", "page": page_num + 1}

            # Fill visible fields on this page
            await self.fill_form(all_data, context=f"Page {page_num + 1} of the form")

            # Check for validation errors before proceeding. Take a fresh
            # screenshot here: the pre-fill one would miss errors triggered
            # by the values we just entered.
            validation = await self._check_validation()
            if validation.get("has_errors"):
                await self._fix_validation_errors(validation["errors"])

            # Click Next/Continue button
            await self._click_next_button()
            await asyncio.sleep(1)

        return {"status": "max_pages_reached"}

Validation Error Handling

After filling fields and before clicking "Next," the agent should check for validation errors:

    async def _check_validation(self, screenshot_b64: str | None = None) -> dict:
        if not screenshot_b64:
            screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": """Check this form for validation errors.
Look for: red borders, error messages, warning icons, tooltips.
Return JSON: {"has_errors": bool, "errors": [{"field": str, "message": str}]}"""},
                ],
            }],
        )
        return json.loads(response.content[0].text)
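`_fix_validation_errors`, referenced in the multi-step loop, can simply reuse `fill_form` with the reported errors turned into corrective context. Building that corrective instruction is a pure string operation (a sketch; the function name is ours):

```python
def build_correction_context(errors: list[dict]) -> str:
    """Turn validation errors into instructions for a corrective fill pass."""
    lines = ["Fix ONLY the fields with validation errors:"]
    for err in errors:
        lines.append(f"- {err['field']}: {err['message']}")
    return "\n".join(lines)
```

`_fix_validation_errors` could then call `await self.fill_form(form_data, context=build_correction_context(errors))`, so the model re-enters only the flagged fields instead of refilling the whole page.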

FAQ

How does Claude handle date picker widgets?

Claude can interact with date pickers visually — clicking the calendar icon, navigating months, and selecting dates. For complex date pickers, it often works better to click the text input first, clear it, and type the date in the expected format (MM/DD/YYYY, etc.) rather than navigating the calendar widget.
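The click-clear-type approach works only if the typed value is in the exact format the form expects. Normalizing whatever is in your data to the target format first avoids format-mismatch validation errors. A small stdlib-only helper (the set of accepted input formats is our assumption):

```python
from datetime import datetime

def format_date_for_form(value: str, target: str = "%m/%d/%Y") -> str:
    """Normalize common date strings to the format a form expects."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(value, fmt).strftime(target)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {value}")
```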

Can Claude handle file upload fields?

Claude can identify file upload fields and click the "Choose File" button, but it cannot interact with the operating system's file dialog. For file uploads, use a hybrid approach: let Claude identify the upload field, then use Playwright's set_input_files() method to attach the file programmatically.

What about CAPTCHA or anti-automation fields on forms?

Claude can visually interpret some CAPTCHA types, but bypassing them is restricted by most websites' terms of service and Anthropic's usage policies. For legitimate automation of your own forms, disable CAPTCHA in development/staging environments or use authenticated sessions that skip the challenge.


#FormAutomation #ClaudeComputerUse #RPA #DataEntry #BrowserAutomation #AIFormFilling #AutomatedForms


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

