
Claude Computer Use for Form Automation: Auto-Filling Complex Multi-Step Forms

Build a Claude-powered form automation agent that detects fields, maps data intelligently, handles validation errors, and navigates multi-step form wizards — all through visual understanding instead of DOM selectors.

The Form Automation Challenge

Automating form filling sounds simple until you encounter real-world forms. Government applications with 50+ fields across 10 pages. Insurance claim forms with conditional sections that appear based on previous answers. Healthcare intake forms with dropdown menus that load dynamically. CRM data entry screens with custom field types.

Traditional automation with Playwright or Selenium handles forms by targeting specific selectors — page.fill("#firstName", "John"). This works until the form changes its field IDs, switches from a text input to a dropdown, or adds a new required field. Claude Computer Use takes a fundamentally different approach: it looks at the form, reads the labels, and fills in the appropriate values.

Form Field Detection and Mapping

The first step is to have Claude analyze the form and create a mapping between your data and the visible fields:

import anthropic
import json

client = anthropic.Anthropic()

def analyze_form(screenshot_b64: str) -> list[dict]:
    """Detect all form fields visible in the screenshot."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64,
                }},
                {"type": "text", "text": """Analyze this form and list every input field visible.

For each field, return:
- label: the field's label text
- field_type: text, dropdown, checkbox, radio, date, textarea, file_upload
- required: true if marked as required (asterisk or "required" label)
- current_value: any pre-filled value, or null
- options: for dropdowns/radios, list the visible options if any
- approximate_position: {x, y} coordinates of the center of the input

Return ONLY a JSON array, with no surrounding prose or markdown fences."""},
            ],
        }],
    )
    return json.loads(response.content[0].text)
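In practice the model may wrap its answer in a markdown code fence, which makes the bare `json.loads(response.content[0].text)` call above brittle. A small helper (our own addition, not part of the Anthropic SDK) can strip fences before parsing:

```python
import json
import re

def extract_json(raw: str):
    """Parse JSON from a model reply, tolerating ```json ... ``` fences."""
    # Strip an optional markdown fence wrapped around the payload
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = match.group(1) if match else raw.strip()
    return json.loads(payload)
```

Swapping `json.loads(response.content[0].text)` for `extract_json(response.content[0].text)` in `analyze_form` lets parsing survive fenced replies as well as bare JSON.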

The Form-Filling Agent

With field detection in place, we build an agent that maps your data to detected fields and fills them in sequence:

import asyncio

class FormFillingAgent:
    def __init__(self, browser_manager):
        self.browser = browser_manager
        self.client = anthropic.Anthropic()

    async def fill_form(self, form_data: dict, context: str = ""):
        """Fill a form using Claude vision to identify and interact with fields."""
        screenshot_b64 = await self.browser.screenshot()

        # Step 1: Create a filling plan
        plan = self._create_plan(screenshot_b64, form_data, context)

        # Step 2: Execute each field fill
        for field in plan:
            await self._fill_field(field)
            await asyncio.sleep(0.5)  # brief pause for UI updates

        # Step 3: Verify filled values
        verification = await self._verify_form(form_data)
        return verification

    def _create_plan(self, screenshot_b64: str, form_data: dict, context: str) -> list:
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""I need to fill this form with the following data:

{json.dumps(form_data, indent=2)}

Context: {context}

Create a step-by-step plan to fill each field. For each step:
- field_label: which field to fill
- data_key: which key from my data maps to this field
- action: click, type, select_dropdown, check_checkbox, select_radio
- coordinate: approximate {{x, y}} of the input element
- value: the value to enter

Order the steps top-to-bottom, left-to-right as fields appear on screen.
Return ONLY a JSON array, with no surrounding prose or markdown fences."""},
                ],
            }],
        )
        return json.loads(response.content[0].text)

    async def _fill_field(self, field: dict):
        """Fill a single field based on the plan."""
        x = field["coordinate"]["x"]
        y = field["coordinate"]["y"]
        action = field["action"]
        value = str(field["value"])

        if action == "type":
            await self.browser.click(x, y)
            await asyncio.sleep(0.3)
            # Select any existing content so the typed value replaces it
            await self.browser.press_key("Control+a")
            await self.browser.type_text(value)

        elif action == "select_dropdown":
            await self.browser.click(x, y)
            await asyncio.sleep(0.5)
            # Use Claude to find and click the right option
            await self._select_option_visually(value)

        elif action in ("check_checkbox", "select_radio"):
            await self.browser.click(x, y)
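On-screen labels rarely match your data keys exactly ("E-mail Address" vs. `email`). As a deterministic safety net alongside Claude's own mapping, a fuzzy matcher can pair detected labels with data keys before the plan is executed. This is a stdlib-only sketch; the function name and the similarity cutoff are our own choices:

```python
import difflib

def map_data_to_fields(form_data: dict, labels: list[str], cutoff: float = 0.4) -> dict:
    """Map each detected field label to the closest-matching data key."""
    def normalize(s: str) -> str:
        # Lowercase and drop punctuation/whitespace so "E-mail" ~ "email"
        return "".join(ch for ch in s.lower() if ch.isalnum())

    keys = {normalize(k): k for k in form_data}
    mapping = {}
    for label in labels:
        hits = difflib.get_close_matches(normalize(label), keys, n=1, cutoff=cutoff)
        if hits:
            mapping[label] = keys[hits[0]]
    return mapping
```

Running the plan returned by `_create_plan` through a check like this catches mappings where the model paired a label with a data key that does not exist.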

Handling Dropdown Menus

Dropdowns are notoriously difficult for visual automation because clicking them reveals a new set of options that must be located and clicked. Here is a robust approach:

    async def _select_option_visually(self, target_value: str, max_scrolls: int = 5):
        """After opening a dropdown, find and click the target option."""
        import asyncio
        await asyncio.sleep(0.5)  # Wait for dropdown to open
        screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": f"""A dropdown menu is open on screen.
Find the option that best matches: "{target_value}"
Return the exact coordinate to click as JSON: {{"x": number, "y": number}}
If the option is not visible, return {{"scroll": "down"}} to indicate
I need to scroll within the dropdown."""},
                ],
            }],
        )

        result = json.loads(response.content[0].text)
        if "scroll" in result:
            # Bound the retries so a missing option cannot recurse forever
            if max_scrolls <= 0:
                raise RuntimeError(f"Dropdown option not found: {target_value}")
            await self.browser.scroll(result.get("x", 640), result.get("y", 400), "down")
            await self._select_option_visually(target_value, max_scrolls - 1)  # Retry
        else:
            await self.browser.click(result["x"], result["y"])

Multi-Step Form Navigation

Many forms span multiple pages. The agent needs to handle "Next" buttons, progress indicators, and conditional sections:

    async def fill_multi_step_form(self, all_data: dict, max_pages: int = 10):
        """Fill a multi-page form wizard."""
        import asyncio
        for page_num in range(max_pages):
            screenshot_b64 = await self.browser.screenshot()

            # Analyze current page
            page_info = self._analyze_page(screenshot_b64)

            if page_info.get("is_confirmation_page"):
                return {"status": "complete", "page": page_num + 1}

            # Fill visible fields on this page
            await self.fill_form(all_data, context=f"Page {page_num + 1} of the form")

            # Check for validation errors before proceeding. Take a fresh
            # screenshot here: the pre-fill one would miss errors triggered
            # by the values we just entered.
            validation = await self._check_validation()
            if validation.get("has_errors"):
                await self._fix_validation_errors(validation["errors"])

            # Click Next/Continue button
            await self._click_next_button()
            await asyncio.sleep(1)

        return {"status": "max_pages_reached"}

Validation Error Handling

After filling fields and before clicking "Next," the agent should check for validation errors:

    async def _check_validation(self, screenshot_b64: str | None = None) -> dict:
        if not screenshot_b64:
            screenshot_b64 = await self.browser.screenshot()

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": screenshot_b64,
                    }},
                    {"type": "text", "text": """Check this form for validation errors.
Look for: red borders, error messages, warning icons, tooltips.
Return JSON: {"has_errors": bool, "errors": [{"field": str, "message": str}]}"""},
                ],
            }],
        )
        return json.loads(response.content[0].text)
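`_fix_validation_errors`, referenced in the multi-step loop, can simply reuse `fill_form` with the reported errors turned into corrective context. Building that corrective instruction is a pure string operation (a sketch; the function name is ours):

```python
def build_correction_context(errors: list[dict]) -> str:
    """Turn validation errors into instructions for a corrective fill pass."""
    lines = ["Fix ONLY the fields with validation errors:"]
    for err in errors:
        lines.append(f"- {err['field']}: {err['message']}")
    return "\n".join(lines)
```

`_fix_validation_errors` could then call `await self.fill_form(form_data, context=build_correction_context(errors))`, so the model re-enters only the flagged fields instead of refilling the whole page.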

FAQ

How does Claude handle date picker widgets?

Claude can interact with date pickers visually — clicking the calendar icon, navigating months, and selecting dates. For complex date pickers, it often works better to click the text input first, clear it, and type the date in the expected format (MM/DD/YYYY, etc.) rather than navigating the calendar widget.
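The click-clear-type approach works only if the typed value is in the exact format the form expects. Normalizing whatever is in your data to the target format first avoids format-mismatch validation errors. A small stdlib-only helper (the set of accepted input formats is our assumption):

```python
from datetime import datetime

def format_date_for_form(value: str, target: str = "%m/%d/%Y") -> str:
    """Normalize common date strings to the format a form expects."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(value, fmt).strftime(target)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {value}")
```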

Can Claude handle file upload fields?

Claude can identify file upload fields and click the "Choose File" button, but it cannot interact with the operating system's file dialog. For file uploads, use a hybrid approach: let Claude identify the upload field, then use Playwright's set_input_files() method to attach the file programmatically.

What about CAPTCHA or anti-automation fields on forms?

Claude can visually interpret some CAPTCHA types, but bypassing them is restricted by most websites' terms of service and Anthropic's usage policies. For legitimate automation of your own forms, disable CAPTCHA in development/staging environments or use authenticated sessions that skip the challenge.


#FormAutomation #ClaudeComputerUse #RPA #DataEntry #BrowserAutomation #AIFormFilling #AutomatedForms


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

