---
title: "UFO Action Types: Click, Type, Scroll, and Application-Specific Controls"
description: "Comprehensive guide to every action type UFO can perform — from basic clicks and keyboard input to scroll operations, UIA element interactions, and application-specific control manipulation."
canonical: https://callsphere.ai/blog/ufo-action-types-click-type-scroll-application-specific-controls
category: "Learn Agentic AI"
tags: ["Microsoft UFO", "UI Actions", "UIA Controls", "Keyboard Automation", "Click Actions", "Windows Controls"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-06T01:02:45.952Z
---

# UFO Action Types: Click, Type, Scroll, and Application-Specific Controls

> Comprehensive guide to every action type UFO can perform — from basic clicks and keyboard input to scroll operations, UIA element interactions, and application-specific control manipulation.

## The Action Space

Every step UFO takes involves selecting and executing an action from a defined set. Understanding these actions is essential for debugging UFO behavior, extending its capabilities, and knowing what tasks it can and cannot handle.

UFO's action space is divided into **universal actions** that work across all applications and **application-specific actions** that leverage unique control types in particular apps.

## Universal Actions

### Click Actions

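Clicking is the most fundamental action. UFO identifies a numbered UI element from its annotated screenshot and clicks it. The diagram shows the general agent loop in which each action is chosen and executed; the code after it shows how a click is represented: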

```mermaid
flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus
classify"]
    PLAN["Plan and tool
selection"]
    AGENT["Agent loop
LLM plus tools"]
    GUARD{"Guardrails
and policy"}
    EXEC["Execute and
verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus
next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
# UFO action representation for click
action = {
    "action_type": "click",
    "control_label": 7,          # The numbered label on the annotated screenshot
    "control_text": "Save",       # Human-readable description
    "parameters": {
        "button": "left",         # left, right, or middle
        "double_click": False,    # True for double-click
    }
}

# Conceptually, UFO translates this to pywinauto calls.
# find_control_by_label stands in for UFO's label-to-control lookup.
def execute_click(control, params):
    """Execute a click action on a UIA control."""
    element = find_control_by_label(control["control_label"])
    button = params.get("button", "left")   # "left", "right", or "middle"

    if params.get("double_click"):
        element.double_click_input(button=button)
    else:
        element.click_input(button=button)
```

UFO supports left-click, right-click, and double-click. Right-click is used for context menus, and double-click for opening files or editing cells.

### Type / Input Text

After clicking on a text field or editor, UFO types text into it:

```python
action = {
    "action_type": "set_text",
    "control_label": 12,
    "parameters": {
        "text": "Quarterly Sales Report - Q1 2026",
        "clear_first": True,    # Clear existing text before typing
    }
}

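def execute_set_text(control, params):
    """Type text into a control."""
    element = find_control_by_label(control["control_label"])

    if params.get("clear_first"):
        element.set_edit_text("")   # Clear the field by setting its value directly

    # Note: type_keys interprets + ^ % ~ ( ) { } as key-sequence syntax, so
    # text containing them must be escaped first (e.g. "(" becomes "{(}").
    element.type_keys(params["text"], with_spaces=True)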
```

The `set_text` action uses the UIA `ValuePattern` when available (faster, more reliable) and falls back to keyboard simulation when the control does not support direct value setting.
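
The fallback described above can be sketched as follows. This is a minimal illustration, not UFO's actual implementation: `escape_type_keys` and `set_text_with_fallback` are hypothetical helper names, and the sketch assumes a pywinauto-style wrapper that exposes `set_edit_text` and `type_keys`. The escaping step is there because `type_keys` treats `+`, `^`, `%`, `~`, parentheses, and braces as key-sequence syntax.

```python
# Characters that pywinauto's key-sequence syntax treats as special.
SPECIAL_KEY_CHARS = set("+^%~(){}")


def escape_type_keys(text: str) -> str:
    """Wrap special characters in braces so they are typed literally."""
    return "".join(f"{{{ch}}}" if ch in SPECIAL_KEY_CHARS else ch for ch in text)


def set_text_with_fallback(element, text: str) -> str:
    """Try direct value setting first, then fall back to simulated typing.

    Returns the name of the strategy that succeeded.
    """
    try:
        # Direct path: sets the value in one call, no keystrokes involved.
        element.set_edit_text(text)
        return "direct"
    except Exception:
        # Assumption: any failure means the control cannot be set directly.
        element.type_keys(escape_type_keys(text), with_spaces=True)
        return "keyboard"
```

Returning the strategy name makes it easy to log how often the slower keyboard path is taken for a given application.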

### Keyboard Shortcuts

Many Windows tasks are faster with keyboard shortcuts than mouse clicks:

```python
action = {
    "action_type": "keyboard",
    "parameters": {
        "keys": "^s",            # pywinauto key format: ^ = Ctrl, + = Shift, % = Alt
        "description": "Save the current document"
    }
}

# Common keyboard patterns UFO uses
COMMON_SHORTCUTS = {
    "save": "^s",
    "copy": "^c",
    "paste": "^v",
    "undo": "^z",
    "select_all": "^a",
    "find": "^f",
    "new": "^n",
    "close_tab": "^w",
    "switch_app": "%{TAB}",
}

def execute_keyboard(params):
    """Send keyboard shortcuts to the active window."""
    from pywinauto.keyboard import send_keys
    send_keys(params["keys"])
```
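
pywinauto encodes modifiers as prefix characters (`^` for Ctrl, `+` for Shift, `%` for Alt) and writes named keys in braces, such as `{TAB}`. A small helper can build these strings from readable names. `build_shortcut` is a hypothetical function for illustration only; it is not part of UFO or pywinauto.

```python
# pywinauto modifier prefixes: ^ = Ctrl, + = Shift, % = Alt
MODIFIER_PREFIXES = {"ctrl": "^", "shift": "+", "alt": "%"}


def build_shortcut(key: str, *modifiers: str) -> str:
    """Build a pywinauto key string such as '^s' or '%{TAB}'."""
    prefix = ""
    for name in modifiers:
        try:
            prefix += MODIFIER_PREFIXES[name.lower()]
        except KeyError:
            raise ValueError(f"Unknown modifier: {name!r}") from None
    # Single characters are sent as-is; named keys go in braces, upper-cased.
    body = key if len(key) == 1 else "{" + key.upper() + "}"
    return prefix + body
```

For example, `build_shortcut("s", "ctrl")` produces the string for Ctrl+S, which can then be passed to `send_keys`.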

### Scroll Actions

For content that extends beyond the visible area:

```python
action = {
    "action_type": "scroll",
    "control_label": 3,
    "parameters": {
        "direction": "down",     # up, down, left, right
        "amount": 5,             # Number of scroll units
    }
}

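def execute_scroll(control, params):
    """Scroll within a control."""
    element = find_control_by_label(control["control_label"])
    direction = params["direction"]
    amount = params["amount"]

    if direction not in ("up", "down", "left", "right"):
        raise ValueError(f"Unsupported scroll direction: {direction!r}")

    # pywinauto's scroll takes a direction, a unit ("line" or "page"),
    # and a repeat count; here each scroll unit is one page.
    element.scroll(direction, "page", count=amount)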
```
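
Because every action dictionary above carries an `action_type` key, a single dispatcher can route actions to the matching executor. The sketch below is one possible arrangement under that assumption; `dispatch_action` and the `handlers` registry are hypothetical names, not UFO's internal API.

```python
from typing import Any, Callable, Dict

# A handler receives the full action dict and performs the action.
Handler = Callable[[Dict[str, Any]], Any]


def dispatch_action(action: Dict[str, Any], handlers: Dict[str, Handler]) -> Any:
    """Route an action dict to the handler registered for its action_type."""
    action_type = action.get("action_type")
    handler = handlers.get(action_type)
    if handler is None:
        raise ValueError(f"Unsupported action_type: {action_type!r}")
    return handler(action)
```

A registry entry would typically be a thin adapter, for example `{"click": lambda a: execute_click(a, a["parameters"])}`, so that unknown action types fail loudly instead of being silently ignored.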

## Application-Specific Control Types

Windows applications expose different control types through the UI Automation (UIA) framework. UFO recognizes the standard UIA control types and pairs each with a default interaction. The mapping below covers the most common ones:

```python
# UIA Control Types that UFO can interact with
UIA_CONTROL_TYPES = {
    "Button": "click",           # Standard buttons
    "CheckBox": "toggle",        # Check/uncheck
    "ComboBox": "select",        # Dropdown selection
    "DataGrid": "cell_select",   # Table/grid navigation
    "Edit": "set_text",          # Text input fields
    "Hyperlink": "click",        # Clickable links
    "ListItem": "click",         # Items in a list
    "Menu": "click",             # Menu items
    "MenuItem": "click",         # Sub-menu items
    "RadioButton": "select",     # Radio button selection
    "Slider": "set_value",       # Slider controls
    "Spinner": "set_value",      # Numeric up/down
    "Tab": "click",              # Tab switching
    "Text": "read",              # Static text (read-only)
    "Tree": "expand_collapse",   # Tree view navigation
    "TreeItem": "click",         # Tree node selection
}
```

### Excel-Specific Actions

Excel cells support unique patterns like range selection and formula entry:

```python
# Excel cell interaction
excel_actions = {
    "action_type": "excel_cell",
    "parameters": {
        "cell": "B5",
        "value": "=SUM(B2:B4)",
        "action": "set_formula"
    }
}

# UFO can also drive Excel through COM automation; this sketch shows
# the UI-driven equivalent.
def excel_set_cell(cell_ref: str, value: str):
    """Return the UI steps that set an Excel cell via the Name Box."""
    # UFO navigates to the Name Box, types the cell reference,
    # presses Enter to navigate, then types the value
    steps = [
        {"action": "click", "target": "Name Box"},
        {"action": "set_text", "text": cell_ref},
        {"action": "keyboard", "keys": "{ENTER}"},
        {"action": "set_text", "text": value},
        {"action": "keyboard", "keys": "{ENTER}"},
    ]
    return steps
```

### Outlook-Specific Actions

Email composition involves interacting with rich text editors and address fields:

```python
# Composing an email through UFO actions
outlook_compose_steps = [
    {"action": "click", "target": "New Email"},
    {"action": "click", "target": "To field"},
    {"action": "set_text", "text": "finance@company.com"},
    {"action": "keyboard", "keys": "{TAB}"},     # Move to CC
    {"action": "keyboard", "keys": "{TAB}"},     # Move to Subject
    {"action": "set_text", "text": "Q1 Sales Report"},
    {"action": "keyboard", "keys": "{TAB}"},     # Move to body
    {"action": "set_text", "text": "Please find the Q1 numbers attached."},
    {"action": "click", "target": "Send"},
]
```

## The Action Selection Prompt

UFO sends the vision model a structured prompt that includes the available actions. The model must choose from this constrained set:

```python
ACTION_PROMPT = """You are a Windows UI automation agent. Based on the
annotated screenshot, select the next action.

Available actions:
- click(label): Click on the UI element with the given label number
- set_text(label, text): Type text into the labeled control
- keyboard(keys): Send keyboard shortcut
- scroll(label, direction, amount): Scroll within a control
- finish(status): Mark task as complete or failed

Respond in JSON format:
{
  "thought": "What I observe and why I chose this action",
  "action_type": "click|set_text|keyboard|scroll|finish",
  "control_label": 5,
  "parameters": {}
}"""
```
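
Since the model is asked to choose from a constrained set, it helps to validate its reply before executing anything. The following is a minimal sketch of such a check, assuming the JSON shape from the prompt above; `parse_action_response` and the two constant sets are hypothetical names introduced for illustration.

```python
import json

ALLOWED_ACTIONS = {"click", "set_text", "keyboard", "scroll", "finish"}
# Actions that target a labeled control and therefore need control_label.
NEEDS_LABEL = {"click", "set_text", "scroll"}


def parse_action_response(raw: str) -> dict:
    """Parse the model's JSON reply and reject anything outside the action set."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc

    if not isinstance(action, dict):
        raise ValueError("Response must be a JSON object")

    action_type = action.get("action_type")
    if not isinstance(action_type, str) or action_type not in ALLOWED_ACTIONS:
        raise ValueError(f"Unknown action_type: {action_type!r}")

    if action_type in NEEDS_LABEL and not isinstance(action.get("control_label"), int):
        raise ValueError(f"{action_type} requires an integer control_label")

    # Normalize so downstream code can always read action["parameters"].
    action.setdefault("parameters", {})
    return action
```

Rejecting malformed replies at this boundary keeps invalid actions from reaching the executors, and the error message can be fed back to the model on the next turn.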

## FAQ

### Can UFO interact with custom-drawn controls that are not standard UIA elements?

Custom-drawn controls without UIA support are UFO's biggest challenge. In these cases, UFO falls back to coordinate-based clicking using the vision model's understanding of the screenshot. This is less reliable but often works for simple buttons and text areas rendered without standard controls.
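
As a rough illustration of the coordinate fallback, the sketch below converts a position expressed as fractions of the window size into absolute screen pixels. It assumes the vision model returns fractional coordinates between 0 and 1, which is an assumption made for this example; `to_screen_point` is a hypothetical helper name.

```python
from typing import Tuple


def to_screen_point(
    rel_x: float, rel_y: float, window_rect: Tuple[int, int, int, int]
) -> Tuple[int, int]:
    """Convert fractional (0-1) coordinates inside a window to screen pixels.

    window_rect is (left, top, right, bottom) in screen coordinates.
    """
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("Relative coordinates must be within [0, 1]")
    left, top, right, bottom = window_rect
    x = left + round(rel_x * (right - left))
    y = top + round(rel_y * (bottom - top))
    return x, y
```

The resulting point could then be passed to `pywinauto.mouse.click(coords=(x, y))`. Working in fractions keeps the model's output independent of the screenshot's resolution.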

### How does UFO handle pop-up dialogs and confirmation boxes?

UFO's observation-action loop naturally handles unexpected dialogs. When a dialog appears, the next screenshot capture will show it, and the vision model will recognize it as a dialog requiring interaction (clicking OK, Cancel, or filling in fields) before continuing with the main task.

---

Source: https://callsphere.ai/blog/ufo-action-types-click-type-scroll-application-specific-controls
