---
title: "Installing and Configuring Microsoft UFO: Getting Started with Windows Automation"
description: "Step-by-step guide to installing Microsoft UFO, configuring API keys, setting up the configuration files, and running your first automated Windows task with natural language."
canonical: https://callsphere.ai/blog/installing-configuring-microsoft-ufo-windows-automation-setup
category: "Learn Agentic AI"
tags: ["Microsoft UFO", "Installation", "Windows Automation", "Setup Guide", "Python", "Configuration"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-06T22:27:35.756Z
---

# Installing and Configuring Microsoft UFO: Getting Started with Windows Automation

> Step-by-step guide to installing Microsoft UFO, configuring API keys, setting up the configuration files, and running your first automated Windows task with natural language.

## Prerequisites

Before installing UFO, ensure your system meets the following requirements:

- **Windows 10 or 11** (UFO uses Windows UI Automation APIs that are not available on macOS or Linux)
- **Python 3.10 or later** installed and added to PATH
- **An OpenAI API key** with access to GPT-4V or GPT-4o (vision-capable models)
- **Git** for cloning the repository
- At least **8 GB of RAM** (screenshots and vision model calls are memory-intensive)

UFO depends on the Windows UI Automation COM interfaces, so it must run on a Windows machine — not WSL, not a Linux VM. If you are developing on macOS or Linux, you will need a Windows machine or a cloud Windows instance.
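A quick way to confirm the two hard prerequisites (OS and Python version) before going further is a standalone sanity check like the sketch below; it is not part of UFO, just a few lines you can run in any interpreter:

```python
import platform
import sys

def preflight_ok(system: str, version_info) -> list:
    """Return a list of problems; an empty list means the basics are met."""
    problems = []
    if system != "Windows":
        problems.append("UFO needs Windows (UI Automation COM interfaces)")
    if tuple(version_info)[:2] < (3, 10):
        problems.append("Python 3.10 or later is required")
    return problems

# Check the current interpreter and OS
for problem in preflight_ok(platform.system(), sys.version_info):
    print("FAIL:", problem)
```

If this prints nothing, the machine meets the baseline and you can proceed to cloning the repository.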

## Step 1: Clone the Repository

UFO is distributed as a GitHub repository, not a PyPI package. Clone it and enter the project directory:

```bash
git clone https://github.com/microsoft/UFO.git
cd UFO
```

## Step 2: Create a Virtual Environment and Install Dependencies

Set up an isolated Python environment:

```bash
python -m venv .venv
.venv\Scripts\activate

pip install -r requirements.txt
```

The requirements include `openai`, `Pillow` for screenshot handling, `pywinauto` for Windows UI Automation, and several other dependencies for image processing and control interaction.
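To confirm the key dependencies installed correctly, you can probe for them from Python. This is a standalone sketch; note that an import name can differ from the PyPI package name (Pillow installs as `PIL`):

```python
import importlib.util

# Import names of key dependencies from requirements.txt, mapped to the
# PyPI package name you would pass to pip.
REQUIRED = {"openai": "openai", "PIL": "Pillow", "pywinauto": "pywinauto"}

def missing_modules(required: dict) -> list:
    """Return the PyPI package names whose module cannot be found."""
    return [pkg for mod, pkg in required.items()
            if importlib.util.find_spec(mod) is None]

# Print anything that still needs installing
for pkg in missing_modules(REQUIRED):
    print(f"Missing: pip install {pkg}")
```

An empty result means the environment is ready for the configuration step.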

## Step 3: Configure API Keys

UFO reads its configuration from YAML files in the `ufo/config/` directory. The primary file you need to edit is `config.yaml`. Create it from the template:

```bash
copy ufo\config\config.yaml.template ufo\config\config.yaml
```

Open the file and set your API credentials:

```yaml
# ufo/config/config.yaml

# OpenAI API configuration
OPENAI_API_TYPE: "openai"
OPENAI_API_KEY: "sk-proj-your-api-key-here"
OPENAI_API_BASE: "https://api.openai.com/v1"
OPENAI_API_VERSION: "2024-02-15-preview"  # Used by Azure OpenAI; ignored by the public API

# Model selection
HOST_AGENT:
  API_MODEL: "gpt-4o"

APP_AGENT:
  API_MODEL: "gpt-4o"

# Screenshot settings
SCREENSHOT_BACKEND: "uia"  # Options: uia, win32
ANNOTATION_COLORS:
  - "#FF0000"
  - "#00FF00"
  - "#0000FF"
```

The configuration separates model settings for the HostAgent and AppAgent. You can use different models for each — for example, a cheaper model for host-level routing and a more capable model for in-app actions.
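As an illustration of that split, here is a minimal sketch of reading per-agent model settings from the parsed config. The dictionary mimics what `yaml.safe_load` would return, and the cheaper `gpt-4o-mini` for the HostAgent is a hypothetical choice, not a UFO default:

```python
# What yaml.safe_load would return for a config that routes host-level
# decisions to a cheaper model (hypothetical model choice for illustration).
config = {
    "HOST_AGENT": {"API_MODEL": "gpt-4o-mini"},
    "APP_AGENT": {"API_MODEL": "gpt-4o"},
}

def model_for(cfg: dict, agent: str, default: str = "gpt-4o") -> str:
    """Look up the model configured for one agent, with a fallback."""
    return cfg.get(agent, {}).get("API_MODEL", default)
```

Because each agent section is looked up independently, mixing models requires no extra wiring beyond the two YAML blocks.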

## Step 4: Configure Azure OpenAI (Optional)

If your organization uses Azure OpenAI Service instead of the public OpenAI API, update the configuration accordingly:

```yaml
# Azure OpenAI configuration
OPENAI_API_TYPE: "azure"
OPENAI_API_KEY: "your-azure-api-key"
OPENAI_API_BASE: "https://your-resource.openai.azure.com/"
OPENAI_API_VERSION: "2024-02-15-preview"

HOST_AGENT:
  API_MODEL: "your-gpt4o-deployment-name"

APP_AGENT:
  API_MODEL: "your-gpt4o-deployment-name"
```

Note that you provide the **deployment name**, not the model name, when using Azure.

## Step 5: Run Your First Task

With everything configured, launch UFO:

```bash
python -m ufo --task "Open Notepad and type Hello World"
```

UFO will:

1. Launch or find the Notepad application
2. Capture a screenshot and annotate UI elements
3. Send the annotated screenshot to the configured vision model (GPT-4o in this setup)
4. Execute the returned actions (click in the text area, type the text)
5. Repeat until the task is complete

You will see step-by-step output in the console showing what the agent observes and what actions it takes.
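Conceptually, the loop above can be sketched in a few lines of Python. This is a toy model of the observe-decide-act cycle, not UFO's actual implementation; in the real system each step's action comes from the vision model rather than a pre-scripted plan:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    action: str   # what the model decided to do this step
    done: bool    # did the model report the task complete?

def run_task(plan: list, max_step: int = 50) -> list:
    """Toy observe -> decide -> act loop: consume decisions until one
    reports completion or the step budget (MAX_STEP) runs out."""
    executed = []
    for i, step in enumerate(plan):
        if i >= max_step:             # abort runaway tasks
            break
        executed.append(step.action)  # in UFO this is a real click/keystroke
        if step.done:
            break
    return executed
```

The `max_step` cap mirrors the `MAX_STEP` setting shown in the full configuration reference: without it, a confused agent could loop indefinitely.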

## Understanding the Configuration File

Here is a more complete configuration with explanations:

```yaml
# ufo/config/config.yaml - Full reference

# API Provider: "openai" or "azure"
OPENAI_API_TYPE: "openai"
OPENAI_API_KEY: "sk-proj-..."
OPENAI_API_BASE: "https://api.openai.com/v1"

# Agent model configuration
HOST_AGENT:
  API_MODEL: "gpt-4o"
  MAX_TOKENS: 2048
  TEMPERATURE: 0.1      # Low temperature for deterministic actions

APP_AGENT:
  API_MODEL: "gpt-4o"
  MAX_TOKENS: 4096       # Higher token limit for complex UI analysis
  TEMPERATURE: 0.1

# Execution settings
MAX_STEP: 50             # Maximum steps before aborting a task
SLEEP_TIME: 2            # Seconds to wait between actions (UI settling)
SAFE_GUARD: true         # Require confirmation before destructive actions

# Screenshot configuration
SCREENSHOT_BACKEND: "uia"
INCLUDE_LAST_SCREENSHOTS: 3   # Number of previous screenshots for context
CONCAT_SCREENSHOTS: false      # Whether to tile screenshots side by side

# Logging
LOG_LEVEL: "INFO"
SAVE_SCREENSHOTS: true         # Save annotated screenshots for debugging
LOG_DIR: "logs/"
```
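Once `config.yaml` is in place, it can be worth sanity-checking the parsed values before launching a task. Below is a minimal validator sketch; the key names follow the reference above, but the specific checks are illustrative, not UFO's own validation:

```python
# Required top-level keys, per the reference config above.
REQUIRED_KEYS = ["OPENAI_API_TYPE", "OPENAI_API_KEY", "HOST_AGENT", "APP_AGENT"]

def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems in a parsed config dict."""
    problems = ["missing key: " + k for k in REQUIRED_KEYS if k not in cfg]
    if cfg.get("OPENAI_API_TYPE") not in ("openai", "azure"):
        problems.append("OPENAI_API_TYPE must be 'openai' or 'azure'")
    if not 0 < cfg.get("MAX_STEP", 50) <= 200:
        problems.append("MAX_STEP should be between 1 and 200")
    return problems
```

Running this against `yaml.safe_load(open("ufo/config/config.yaml"))` catches the most common mistake: a copied template with placeholder values left in place.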

## Step 6: Verify With a Multi-Step Task

Test a more complex workflow to confirm everything works end to end:

```bash
python -m ufo --task "Open File Explorer, navigate to Documents, and create a new folder called TestUFO"
```

Watch the console output as the HostAgent identifies File Explorer as the target application, the AppAgent navigates the folder tree, and the folder creation sequence executes.

## Environment Variables as an Alternative

Instead of editing the YAML file directly, you can set configuration values via environment variables. This is useful for CI pipelines or any setup where you would rather keep API keys out of files on disk:

```bash
set OPENAI_API_KEY=sk-proj-your-key
set UFO_HOST_MODEL=gpt-4o
set UFO_APP_MODEL=gpt-4o
set UFO_MAX_STEP=30

python -m ufo --task "Your task here"
```
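If both mechanisms are in play, you need a precedence rule. A common convention, sketched here as an assumption rather than UFO's documented behavior, is environment variable over YAML value over built-in default:

```python
import os

def setting(name: str, yaml_value=None, default=None, env=os.environ):
    """Resolve one setting: environment variable first, then the YAML
    value, then the built-in default."""
    if name in env:
        return env[name]
    return yaml_value if yaml_value is not None else default
```

Note that environment variables are always strings, so numeric settings like a step limit need an explicit `int(...)` conversion after resolution.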

## Troubleshooting Common Setup Issues

**"No module named pywinauto"**: Make sure you activated the virtual environment before running pip install. Run `.venv\Scripts\activate` again and reinstall.

**"Access denied" on screenshot capture**: Run your terminal as Administrator. UFO needs elevated permissions to capture screenshots of some applications.

**"Model not found" errors**: Verify your API key has access to the vision model specified in config. Try `gpt-4o` as a fallback.

**Slow execution**: Increase `SLEEP_TIME` if actions are executing before the UI finishes rendering. Windows animations can cause the agent to see transitional states.

## FAQ

### Can I use UFO without an OpenAI API key?

UFO requires a vision-capable LLM to interpret screenshots. You can use Azure OpenAI as an alternative, or configure a local model endpoint that supports the OpenAI vision API format, but some form of multimodal model access is required.

### Does UFO support multiple monitors?

UFO captures the screen where the target application window is located. Multi-monitor setups work as long as the target application is fully visible on one screen; a window that spans two monitors may be captured only partially.

### How much does it cost to run UFO tasks?

Each step involves sending an annotated screenshot (roughly 1000-2000 tokens for the image) plus prompt tokens to GPT-4o. A simple 5-step task costs approximately $0.05-0.15 USD. Complex multi-application tasks with 30+ steps can cost $0.50-1.00 USD.
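Those figures can be approximated with simple arithmetic. The token counts and per-million-token prices below are assumptions for illustration only (roughly GPT-4o's public list prices at the time of writing); check current pricing before budgeting:

```python
def estimate_cost_usd(steps: int,
                      image_tokens: int = 1500,    # annotated screenshot
                      prompt_tokens: int = 1000,   # instructions + history
                      output_tokens: int = 300,    # returned action plan
                      in_price_per_m: float = 2.50,   # assumed $/1M input tokens
                      out_price_per_m: float = 10.00  # assumed $/1M output tokens
                      ) -> float:
    """Rough per-task cost: each step sends one screenshot plus prompt
    text and receives a short action plan."""
    input_total = steps * (image_tokens + prompt_tokens)
    output_total = steps * output_tokens
    return (input_total * in_price_per_m + output_total * out_price_per_m) / 1_000_000
```

With these assumptions a 5-step task lands near the bottom of the $0.05-0.15 range quoted above; per-step image token counts vary with screen resolution, which is the main source of spread.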

---

#MicrosoftUFO #WindowsSetup #AIAgent #DesktopAutomation #GPT4Vision #PythonAutomation #UIAutomation
