
UFO Action Types: Click, Type, Scroll, and Application-Specific Controls

Comprehensive guide to every action type UFO can perform — from basic clicks and keyboard input to scroll operations, UIA element interactions, and application-specific control manipulation.

The Action Space

Every step UFO takes involves selecting and executing an action from a defined set. Understanding these actions is essential for debugging UFO behavior, extending its capabilities, and knowing what tasks it can and cannot handle.

UFO's action space is divided into universal actions that work across all applications and application-specific actions that leverage unique control types in particular apps.
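One way to picture this split is a registry that maps each action_type to a handler, tagged as either universal or tied to one application. The sketch below is hypothetical: register_action, dispatch, and the handler bodies are illustrative and are not UFO's actual code. It shows how an unknown action, or an app-specific action requested in the wrong application, can be rejected before anything touches the UI.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry: action_type -> (scope, handler).
# scope is "universal" or the name of the single app the action applies to.
_REGISTRY: Dict[str, Tuple[str, Callable[[dict], str]]] = {}


def register_action(name: str, scope: str = "universal"):
    """Decorator that records a handler under an action_type name."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _REGISTRY[name] = (scope, fn)
        return fn
    return wrap


@register_action("click")
def _click(action: dict) -> str:
    return f"click label {action['control_label']}"


@register_action("excel_cell", scope="excel")
def _excel_cell(action: dict) -> str:
    return f"set {action['parameters']['cell']}"


def dispatch(action: dict, current_app: Optional[str] = None) -> str:
    """Route an action to its handler, refusing app-specific actions elsewhere."""
    name = action["action_type"]
    if name not in _REGISTRY:
        raise ValueError(f"Unknown action_type: {name!r}")
    scope, handler = _REGISTRY[name]
    if scope != "universal" and scope != current_app:
        raise ValueError(f"{name!r} is only available in {scope!r}")
    return handler(action)
```

For example, dispatch({"action_type": "click", "control_label": 7}) works in any application, while an excel_cell action only dispatches when current_app is "excel".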

Universal Actions

Click Actions

The most fundamental action. UFO identifies a numbered UI element from its annotated screenshot and clicks it:

# UFO action representation for click
action = {
    "action_type": "click",
    "control_label": 7,          # The numbered label on the annotated screenshot
    "control_text": "Save",       # Human-readable description
    "parameters": {
        "button": "left",         # left, right, or middle
        "double_click": False,    # True for double-click
    }
}

# Simplified sketch of how UFO translates this into pywinauto calls.
# find_control_by_label() stands in for UFO's label-to-control lookup (not shown).
def execute_click(control, params):
    """Execute a click action on a UIA control."""
    element = find_control_by_label(control["control_label"])
    button = params.get("button", "left")  # "left", "right", or "middle"

    if params.get("double_click"):
        element.double_click_input(button=button)
    else:
        element.click_input(button=button)

UFO supports left-click, right-click, and double-click. Right-click is used for context menus, and double-click for opening files or editing cells.

Type / Input Text

After clicking on a text field or editor, UFO types text into it:

action = {
    "action_type": "set_text",
    "control_label": 12,
    "parameters": {
        "text": "Quarterly Sales Report - Q1 2026",
        "clear_first": True,    # Clear existing text before typing
    }
}

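def execute_set_text(control, params):
    """Type text into a control, preferring direct value setting."""
    element = find_control_by_label(control["control_label"])
    text = params["text"]

    if params.get("clear_first"):
        try:
            # Replaces the whole value in one call (ValuePattern on the UIA backend)
            element.set_edit_text(text)
            return
        except Exception:
            # No direct value setting: clear with select-all + Backspace instead
            element.type_keys("^a{BACKSPACE}")

    # Keyboard simulation. Wrap characters pywinauto treats as special
    # (modifiers, groups, braces, ~ for Enter) in braces so they type literally.
    escaped = "".join("{" + ch + "}" if ch in "+^%~(){}" else ch for ch in text)
    element.type_keys(escaped, with_spaces=True)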

The set_text action uses the UIA ValuePattern when available (faster, more reliable) and falls back to keyboard simulation when the control does not support direct value setting.

Keyboard Shortcuts

Many Windows tasks are faster with keyboard shortcuts than mouse clicks:

action = {
    "action_type": "keyboard",
    "parameters": {
        "keys": "^s",            # pywinauto key format: ^ = Ctrl, + = Shift, % = Alt
        "description": "Save the current document"
    }
}

# Common keyboard patterns UFO uses
COMMON_SHORTCUTS = {
    "save": "^s",
    "copy": "^c",
    "paste": "^v",
    "undo": "^z",
    "select_all": "^a",
    "find": "^f",
    "new": "^n",
    "close_tab": "^w",
    "switch_app": "%{TAB}",
}

def execute_keyboard(params):
    """Send keyboard shortcuts to the active window."""
    from pywinauto.keyboard import send_keys
    send_keys(params["keys"])
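Vision models often describe shortcuts in human notation such as "Ctrl+Shift+S" rather than pywinauto syntax. A small converter can normalize that output before it reaches send_keys. The function below is a hypothetical helper (its name and the set of supported keys are assumptions, not part of UFO or pywinauto); it handles the Ctrl, Shift, and Alt modifiers, function keys, a few named keys, and single characters.

```python
# Modifier prefixes used by pywinauto's send_keys syntax
MODIFIERS = {"ctrl": "^", "control": "^", "shift": "+", "alt": "%"}

# A few named keys; pywinauto writes these in braces, e.g. {TAB}
NAMED_KEYS = {
    "tab": "{TAB}", "enter": "{ENTER}", "esc": "{ESC}", "escape": "{ESC}",
    "delete": "{DELETE}", "backspace": "{BACKSPACE}", "home": "{HOME}",
    "end": "{END}", "up": "{UP}", "down": "{DOWN}", "left": "{LEFT}",
    "right": "{RIGHT}",
}


def to_pywinauto_keys(combo: str) -> str:
    """Convert 'Ctrl+Shift+S' style notation into pywinauto syntax ('^+s')."""
    parts = [p.strip() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty key combination")
    *mods, key = parts

    prefix = ""
    for mod in mods:
        try:
            prefix += MODIFIERS[mod.lower()]
        except KeyError:
            raise ValueError(f"unsupported modifier: {mod!r}") from None

    k = key.lower()
    if k in NAMED_KEYS:
        body = NAMED_KEYS[k]
    elif k.startswith("f") and k[1:].isdigit() and 1 <= int(k[1:]) <= 24:
        body = "{" + k.upper() + "}"          # function keys, e.g. {F5}
    elif len(key) == 1:
        # Single character; brace-escape pywinauto's special characters
        body = "{" + k + "}" if k in "+^%~(){}" else k
    else:
        raise ValueError(f"unsupported key: {key!r}")
    return prefix + body
```

With this in place, to_pywinauto_keys("Alt+Tab") yields "%{TAB}", matching the switch_app entry above, and anything outside the supported set fails loudly instead of sending stray keystrokes.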

Scroll Actions

For content that extends beyond the visible area:

action = {
    "action_type": "scroll",
    "control_label": 3,
    "parameters": {
        "direction": "down",     # up, down, left, right
        "amount": 5,             # Number of scroll steps
    }
}

def execute_scroll(control, params):
    """Scroll within a control."""
    element = find_control_by_label(control["control_label"])
    direction = params["direction"]
    amount = params["amount"]

    if direction not in ("up", "down", "left", "right"):
        raise ValueError(f"Unsupported scroll direction: {direction!r}")
    # pywinauto's scroll(direction, amount, count): each step here is one page
    element.scroll(direction, "page", count=amount)
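Not every control implements the UIA ScrollPattern that pywinauto's scroll() relies on. A common workaround, shown here as a sketch rather than UFO's confirmed behavior, is to fall back to mouse-wheel input for vertical scrolling. wheel_mouse_input is a real pywinauto wrapper method (positive wheel_dist scrolls up); the function name, the "line" unit, and the choice to re-raise for horizontal scrolling are assumptions.

```python
def scroll_with_fallback(element, direction: str, amount: int) -> str:
    """Scroll via the control's scroll support, falling back to the mouse wheel.

    Returns the mechanism that was used, which is handy for logging.
    """
    try:
        element.scroll(direction, "line", count=amount)
        return "scroll_pattern"
    except Exception:
        if direction not in ("up", "down"):
            raise  # the mouse-wheel fallback only covers vertical scrolling
        # One wheel notch per step: positive scrolls up, negative scrolls down
        notches = amount if direction == "up" else -amount
        element.wheel_mouse_input(wheel_dist=notches)
        return "mouse_wheel"
```

The wheel fallback moves the mouse over the control, so it is less precise than ScrollPattern, but it covers custom list and canvas controls that would otherwise not scroll at all.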

Application-Specific Control Types

Windows applications expose different control types through the UI Automation framework. UFO works with the standard UIA control types; the mapping below pairs common ones with the interaction each typically calls for:

# UIA Control Types that UFO can interact with
UIA_CONTROL_TYPES = {
    "Button": "click",           # Standard buttons
    "CheckBox": "toggle",        # Check/uncheck
    "ComboBox": "select",        # Dropdown selection
    "DataGrid": "cell_select",   # Table/grid navigation
    "Edit": "set_text",          # Text input fields
    "Hyperlink": "click",        # Clickable links
    "ListItem": "click",         # Items in a list
    "Menu": "click",             # Menu items
    "MenuItem": "click",         # Sub-menu items
    "RadioButton": "select",     # Radio button selection
    "Slider": "set_value",       # Slider controls
    "Spinner": "set_value",      # Numeric up/down
    "TabItem": "click",          # Tab switching (Tab is only the container)
    "Text": "read",              # Static text (read-only)
    "Tree": "expand_collapse",   # Tree view navigation
    "TreeItem": "click",         # Tree node selection
}
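The numbered labels in the annotated screenshot have to come from somewhere, and a mapping like the one above decides which controls are worth labeling. The sketch below assumes a pywinauto UIA-backend window wrapper: descendants(), element_info.control_type, and window_text() are real pywinauto calls, while label_controls and the rule of numbering only mapped types are illustrative assumptions, not UFO's exact annotation logic. The resulting dictionary is one way a find_control_by_label-style lookup could be backed.

```python
def label_controls(window, action_map: dict) -> dict:
    """Number the interactive descendants of a window, skipping unmapped types.

    Returns {label: (control, control_type, text, default_action)}.
    """
    labeled = {}
    for ctrl in window.descendants():
        ctype = ctrl.element_info.control_type   # e.g. "Button", "Edit"
        if ctype not in action_map:
            continue                             # panes, groups, etc. get no label
        label = len(labeled) + 1
        labeled[label] = (ctrl, ctype, ctrl.window_text(), action_map[ctype])
    return labeled
```

On Windows, window could come from Application(backend="uia").connect(title_re=".*Excel").top_window(), with UIA_CONTROL_TYPES passed as action_map; looking up labeled[7] then plays the role of find_control_by_label(7).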

Excel-Specific Actions

Excel cells support unique patterns like range selection and formula entry:

# Excel cell interaction
excel_actions = {
    "action_type": "excel_cell",
    "parameters": {
        "cell": "B5",
        "value": "=SUM(B2:B4)",
        "action": "set_formula"
    }
}

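# When UFO detects Excel it can use COM automation; the UI-level
# approach below reaches a cell through the Name Box instead
def excel_set_cell(cell_ref: str, value: str):
    """Build the UI action steps that set an Excel cell value via the Name Box."""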
    # UFO navigates to the Name Box, types the cell reference,
    # presses Enter to navigate, then types the value
    steps = [
        {"action": "click", "target": "Name Box"},
        {"action": "set_text", "text": cell_ref},
        {"action": "keyboard", "keys": "{Enter}"},
        {"action": "set_text", "text": value},
        {"action": "keyboard", "keys": "{Enter}"},
    ]
    return steps
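The Name Box accepts defined names as well as cell references, so a malformed cell_ref can create a stray named range or trigger an error dialog instead of navigating. Validating the reference first avoids that. The parser below is a hypothetical helper (not part of UFO) that accepts single-cell A1 references, optionally with $ anchors, and converts them to 1-based (row, column) numbers; the bounds are Excel's worksheet limits of 1,048,576 rows and 16,384 columns (last column XFD).

```python
import re

# Optional $ anchors, 1-3 column letters, then a row number without leading zeros
_A1 = re.compile(r"^\$?([A-Za-z]{1,3})\$?([1-9][0-9]*)$")
MAX_ROWS, MAX_COLS = 1_048_576, 16_384   # Excel worksheet limits (last column XFD)


def parse_a1(ref: str) -> tuple:
    """Convert an A1-style reference like 'B5' or '$AA$10' to (row, col), 1-based."""
    m = _A1.match(ref.strip())
    if not m:
        raise ValueError(f"not a single-cell A1 reference: {ref!r}")
    letters, digits = m.group(1).upper(), m.group(2)

    col = 0
    for ch in letters:                    # bijective base 26: A=1 ... Z=26, AA=27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    row = int(digits)

    if row > MAX_ROWS or col > MAX_COLS:
        raise ValueError(f"reference outside the worksheet: {ref!r}")
    return row, col
```

Calling parse_a1(cell_ref) at the top of excel_set_cell would turn a typo such as "Total" or "B0" into an immediate error rather than an unintended Name Box entry.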

Outlook-Specific Actions

Email composition involves interacting with rich text editors and address fields:

# Composing an email through UFO actions
outlook_compose_steps = [
    {"action": "click", "target": "New Email"},
    {"action": "click", "target": "To field"},
    {"action": "set_text", "text": "recipient@example.com"},  # placeholder address
    {"action": "keyboard", "keys": "{Tab}"},     # Move to CC
    {"action": "keyboard", "keys": "{Tab}"},     # Move to Subject
    {"action": "set_text", "text": "Q1 Sales Report"},
    {"action": "keyboard", "keys": "{Tab}"},     # Move to body
    {"action": "set_text", "text": "Please find the Q1 numbers attached."},
    {"action": "click", "target": "Send"},
]
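The hard-coded list above always presses Tab twice between To and Subject, which leaves Cc empty. A hypothetical builder such as compose_email_steps below keeps the same step format but takes the recipients, subject, and body as arguments, and fills Cc when one is given. It assumes the same Tab order as the example (To, Cc, Subject, body); that order depends on the Outlook version and compose layout and should be checked before relying on it.

```python
from typing import List, Optional


def compose_email_steps(to: str, subject: str, body: str,
                        cc: Optional[str] = None) -> List[dict]:
    """Build UFO-style action steps for composing and sending an email.

    Assumes the Tab order To -> Cc -> Subject -> body, as in the example above.
    """
    if not to.strip():
        raise ValueError("at least one recipient is required")
    steps = [
        {"action": "click", "target": "New Email"},
        {"action": "click", "target": "To field"},
        {"action": "set_text", "text": to},
        {"action": "keyboard", "keys": "{TAB}"},      # move to Cc
    ]
    if cc:
        steps.append({"action": "set_text", "text": cc})
    steps += [
        {"action": "keyboard", "keys": "{TAB}"},      # move to Subject
        {"action": "set_text", "text": subject},
        {"action": "keyboard", "keys": "{TAB}"},      # move to body
        {"action": "set_text", "text": body},
        {"action": "click", "target": "Send"},
    ]
    return steps
```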

The Action Selection Prompt

UFO sends the vision model a structured prompt that includes the available actions. The model must choose from this constrained set:

ACTION_PROMPT = """You are a Windows UI automation agent. Based on the
annotated screenshot, select the next action.

Available actions:
- click(label): Click on the UI element with the given label number
- set_text(label, text): Type text into the labeled control
- keyboard(keys): Send keyboard shortcut
- scroll(label, direction, amount): Scroll within a control
- finish(status): Mark task as complete or failed

Respond in JSON format:
{
  "thought": "What I observe and why I chose this action",
  "action_type": "click|set_text|keyboard|scroll|finish",
  "control_label": 5,
  "parameters": {}
}"""
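Because the model's reply is free text, it is worth checking it against this constrained set before executing anything. The sketch below is an assumption about how such a check could look: parse_action and the REQUIRED table are illustrative, derived from the action list in the prompt above rather than taken from UFO's own parser, and placing finish's status inside "parameters" is likewise assumed. It rejects malformed JSON, unknown action types, non-integer labels, and missing parameters.

```python
import json

# Per action type: (needs an integer control_label?, required "parameters" keys)
REQUIRED = {
    "click": (True, set()),
    "set_text": (True, {"text"}),
    "keyboard": (False, {"keys"}),
    "scroll": (True, {"direction", "amount"}),
    "finish": (False, {"status"}),
}


def parse_action(reply: str) -> dict:
    """Parse the model's JSON reply and check it against the allowed actions."""
    try:
        action = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from None
    if not isinstance(action, dict):
        raise ValueError("reply must be a JSON object")

    kind = action.get("action_type")
    if not isinstance(kind, str) or kind not in REQUIRED:
        raise ValueError(f"unknown action_type: {kind!r}")
    needs_label, needed_params = REQUIRED[kind]

    label = action.get("control_label")
    if needs_label and (not isinstance(label, int) or isinstance(label, bool)):
        raise ValueError(f"{kind} needs an integer control_label")

    params = action.get("parameters") or {}
    if not isinstance(params, dict):
        raise ValueError("parameters must be a JSON object")
    missing = needed_params - set(params)
    if missing:
        raise ValueError(f"{kind} is missing parameters: {sorted(missing)}")
    return action
```

A rejected reply can be fed back to the model as an error message on the next turn instead of being executed blindly.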

FAQ

Can UFO interact with custom-drawn controls that are not standard UIA elements?

Custom-drawn controls without UIA support are UFO's biggest challenge. In these cases, UFO falls back to coordinate-based clicking using the vision model's understanding of the screenshot. This is less reliable but often works for simple buttons and text areas rendered without standard controls.
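Coordinate-based clicking has one step that is easy to get wrong: if the screenshot was resized before being sent to the model, a point the model names must be mapped back to screen pixels before clicking. The helper below is a hypothetical sketch that assumes the screenshot covers exactly one window whose on-screen rectangle is known (high-DPI scaling is not handled).

```python
def screenshot_to_screen(x: float, y: float,
                         shot_size: tuple, window_rect: tuple) -> tuple:
    """Map a point in a (possibly resized) window screenshot to screen pixels.

    shot_size   -- (width, height) of the image the model saw
    window_rect -- (left, top, right, bottom) of the window on screen
    """
    shot_w, shot_h = shot_size
    left, top, right, bottom = window_rect
    if not (0 <= x <= shot_w and 0 <= y <= shot_h):
        raise ValueError("point lies outside the screenshot")
    # Scale each axis independently, then offset by the window's position
    scale_x = (right - left) / shot_w
    scale_y = (bottom - top) / shot_h
    return round(left + x * scale_x), round(top + y * scale_y)
```

On Windows, the returned pair can be passed to pywinauto.mouse.click(button="left", coords=...) to perform the actual click.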

How does UFO handle pop-up dialogs and confirmation boxes?

UFO's observation-action loop naturally handles unexpected dialogs. When a dialog appears, the next screenshot capture will show it, and the vision model will recognize it as a dialog requiring interaction (clicking OK, Cancel, or filling in fields) before continuing with the main task.
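The loop described above can be sketched in a few lines. Everything here is hypothetical scaffolding: capture, decide, and execute stand in for screenshot capture, the vision-model call, and the action executor. The point is that a dialog needs no special-case code, because it simply appears in the next observation; the step cap is an added assumption that keeps a confused model from looping forever.

```python
from typing import Callable


def run_task(capture: Callable[[], object],
             decide: Callable[[object], dict],
             execute: Callable[[dict], None],
             max_steps: int = 20) -> str:
    """Observe, decide, act until the model emits a finish action or steps run out."""
    for _ in range(max_steps):
        observation = capture()          # e.g. annotated screenshot of the app
        action = decide(observation)     # model picks the next action
        if action["action_type"] == "finish":
            return action["parameters"].get("status", "unknown")
        execute(action)                  # a dialog shows up in the next capture()
    return "step_limit_reached"
```

In a quick test with fake functions, a save dialog that appears after pressing Ctrl+S is just another observation: the model clicks its button on the next step and the loop carries on.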




Written by CallSphere Team
