
Installing and Configuring Microsoft UFO: Getting Started with Windows Automation

Step-by-step guide to installing Microsoft UFO, configuring API keys, setting up the configuration files, and running your first automated Windows task with natural language.

Prerequisites

Before installing UFO, ensure your system meets the following requirements:

  • Windows 10 or 11 (UFO uses Windows UI Automation APIs that are not available on macOS or Linux)
  • Python 3.10 or later installed and added to PATH
  • An OpenAI API key with access to GPT-4V or GPT-4o (vision-capable models)
  • Git for cloning the repository
  • At least 8 GB of RAM (screenshots and vision model calls are memory-intensive)

UFO depends on the Windows UI Automation COM interfaces, so it must run on a Windows machine — not WSL, not a Linux VM. If you are developing on macOS or Linux, you will need a Windows machine or a cloud Windows instance.
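The two hard requirements above (native Windows, Python 3.10+) can be checked up front. This helper is ours, not part of the UFO codebase; it simply encodes the prerequisite list:

```python
import sys

# Preflight check before cloning UFO. The function name is hypothetical;
# the requirements it encodes come from the prerequisite list above.
def check_ufo_prerequisites(os_name: str = sys.platform,
                            py_version: tuple = tuple(sys.version_info[:2])) -> list:
    """Return a list of problems; an empty list means the host looks OK."""
    problems = []
    if os_name != "win32":  # UI Automation COM interfaces exist only on native Windows
        problems.append("Windows 10/11 required (WSL and Linux VMs will not work)")
    if py_version < (3, 10):
        problems.append("Python 3.10 or later required")
    return problems

if __name__ == "__main__":
    for issue in check_ufo_prerequisites():
        print("WARNING:", issue)
```

Run it with your system defaults, or pass explicit values to see how it behaves on an unsupported host.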

Step 1: Clone the Repository

UFO is distributed as a GitHub repository, not a PyPI package. Clone it and enter the project directory:

git clone https://github.com/microsoft/UFO.git
cd UFO

Step 2: Create a Virtual Environment and Install Dependencies

Set up an isolated Python environment:

python -m venv .venv
.venv\Scripts\activate

pip install -r requirements.txt

The requirements include openai, Pillow for screenshot handling, pywinauto for Windows UI Automation, and several other dependencies for image processing and control interaction.
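A common failure mode is installing into the wrong environment. You can confirm the key packages resolved by probing their import names (Pillow imports as `PIL`, PyYAML as `yaml`). This check is our suggestion, not part of UFO:

```python
import importlib.util

# Import names for the packages mentioned above. If any are missing,
# `pip install -r requirements.txt` likely ran outside the activated venv.
REQUIRED_IMPORTS = ["openai", "PIL", "pywinauto", "yaml"]

def missing_packages(names=REQUIRED_IMPORTS):
    """Return the import names that cannot be resolved in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    gaps = missing_packages()
    print("All set" if not gaps else "Missing: " + ", ".join(gaps))
```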

Step 3: Configure API Keys

UFO reads its configuration from YAML files in the ufo/config/ directory. The primary file you need to edit is config.yaml. Create it from the template:

copy ufo\config\config.yaml.template ufo\config\config.yaml

Open the file and set your API credentials:

# ufo/config/config.yaml

# OpenAI API configuration
OPENAI_API_TYPE: "openai"
OPENAI_API_KEY: "sk-proj-your-api-key-here"
OPENAI_API_BASE: "https://api.openai.com/v1"
OPENAI_API_VERSION: "2024-02-15-preview"  # Used with Azure; ignored by the public OpenAI API

# Model selection
HOST_AGENT:
  API_MODEL: "gpt-4o"

APP_AGENT:
  API_MODEL: "gpt-4o"

# Screenshot settings
SCREENSHOT_BACKEND: "uia"  # Options: uia, win32
ANNOTATION_COLORS:
  - "#FF0000"
  - "#00FF00"
  - "#0000FF"

The configuration separates model settings for the HostAgent and AppAgent. You can use different models for each — for example, a cheaper model for host-level routing and a more capable model for in-app actions.
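Before running anything, it is worth sanity-checking the structure. The helper below is ours (UFO ships no such function); the key names mirror the YAML keys shown above. Load `ufo/config/config.yaml` with PyYAML (already in the requirements) and pass the resulting dict:

```python
# Required top-level keys, mirroring the config.yaml shown above.
REQUIRED_KEYS = ("OPENAI_API_TYPE", "OPENAI_API_KEY", "OPENAI_API_BASE")

def config_errors(cfg: dict) -> list:
    """Return a list of problems with a parsed config dict; empty means OK."""
    errors = [k for k in REQUIRED_KEYS if not cfg.get(k)]
    if cfg.get("OPENAI_API_TYPE") not in ("openai", "azure"):
        errors.append("OPENAI_API_TYPE must be 'openai' or 'azure'")
    for agent in ("HOST_AGENT", "APP_AGENT"):
        if not cfg.get(agent, {}).get("API_MODEL"):
            errors.append(agent + ".API_MODEL is missing")
    return errors
```

An empty return value does not prove the key is valid, only that the file has the expected shape.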

Step 4: Configure Azure OpenAI (Optional)

If your organization uses Azure OpenAI Service instead of the public OpenAI API, update the configuration accordingly:

# Azure OpenAI configuration
OPENAI_API_TYPE: "azure"
OPENAI_API_KEY: "your-azure-api-key"
OPENAI_API_BASE: "https://your-resource.openai.azure.com/"
OPENAI_API_VERSION: "2024-02-15-preview"

HOST_AGENT:
  API_MODEL: "your-gpt4o-deployment-name"

APP_AGENT:
  API_MODEL: "your-gpt4o-deployment-name"

Note that you provide the deployment name, not the model name, when using Azure.

Step 5: Run Your First Task

With everything configured, launch UFO:

python -m ufo --task "Open Notepad and type Hello World"

UFO will:

  1. Launch or find the Notepad application
  2. Capture a screenshot and annotate UI elements
  3. Send the annotated screenshot to the configured vision model (GPT-4o or GPT-4V)
  4. Execute the returned actions (click in the text area, type the text)
  5. Repeat until the task is complete

You will see step-by-step output in the console showing what the agent observes and what actions it takes.
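The five steps above amount to an observe-decide-act loop. The sketch below shows that control flow only; the function names and action format are illustrative, not UFO's real API, and the observe/decide/act callables are injected so the loop itself is testable:

```python
# Simplified sketch of the loop described above (not UFO's actual code).
def run_task(task, observe, decide, act, max_steps=50):
    """Drive the loop until the model says 'finish'; return steps taken."""
    for step in range(1, max_steps + 1):
        screenshot = observe()             # capture + annotate UI elements
        action = decide(task, screenshot)  # vision model picks the next action
        if action.get("type") == "finish":
            return step
        act(action)                        # click / type via UI Automation
    raise RuntimeError("MAX_STEP reached before the task completed")
```

The `max_steps` cap corresponds to the `MAX_STEP` setting covered below: without it, a confused agent could loop indefinitely.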

Understanding the Configuration File

Here is a more complete configuration with explanations:

# ufo/config/config.yaml - Full reference

# API Provider: "openai" or "azure"
OPENAI_API_TYPE: "openai"
OPENAI_API_KEY: "sk-proj-..."
OPENAI_API_BASE: "https://api.openai.com/v1"

# Agent model configuration
HOST_AGENT:
  API_MODEL: "gpt-4o"
  MAX_TOKENS: 2048
  TEMPERATURE: 0.1      # Low temperature for deterministic actions

APP_AGENT:
  API_MODEL: "gpt-4o"
  MAX_TOKENS: 4096       # Higher token limit for complex UI analysis
  TEMPERATURE: 0.1

# Execution settings
MAX_STEP: 50             # Maximum steps before aborting a task
SLEEP_TIME: 2            # Seconds to wait between actions (UI settling)
SAFE_GUARD: true         # Require confirmation before destructive actions

# Screenshot configuration
SCREENSHOT_BACKEND: "uia"
INCLUDE_LAST_SCREENSHOTS: 3   # Number of previous screenshots for context
CONCAT_SCREENSHOTS: false      # Whether to tile screenshots side by side

# Logging
LOG_LEVEL: "INFO"
SAVE_SCREENSHOTS: true         # Save annotated screenshots for debugging
LOG_DIR: "logs/"
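`MAX_STEP` and `SLEEP_TIME` together bound how long a runaway task can take. A rough worst-case estimate, assuming a hypothetical 5 seconds of model latency per step (an assumption, not a measured figure):

```python
# Worst-case wall-clock bound from the execution settings above.
# model_latency is an assumed per-step LLM round-trip time.
def worst_case_seconds(max_step=50, sleep_time=2, model_latency=5.0):
    return max_step * (sleep_time + model_latency)
```

With the defaults shown (`MAX_STEP: 50`, `SLEEP_TIME: 2`), a task that never finishes would run for up to about 350 seconds before UFO aborts it.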

Step 6: Verify With a Multi-Step Task

Test a more complex workflow to confirm everything works end to end:

python -m ufo --task "Open File Explorer, navigate to Documents, and create a new folder called TestUFO"

Watch the console output as the HostAgent identifies File Explorer as the target application, the AppAgent navigates the folder tree, and the folder creation sequence executes.

Environment Variables as an Alternative

Instead of editing the YAML file directly, you can set configuration values via environment variables. This is useful for CI/CD or containerized setups:

set OPENAI_API_KEY=sk-proj-your-key
set UFO_HOST_MODEL=gpt-4o
set UFO_APP_MODEL=gpt-4o
set UFO_MAX_STEP=30

python -m ufo --task "Your task here"
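One plausible precedence scheme is "environment variable wins over YAML". UFO's actual resolution order may differ, so treat this helper as a sketch of the pattern rather than its implementation:

```python
import os

# Hypothetical helper: env var overrides the YAML value when set.
def effective_setting(env_name, yaml_value, cast=str):
    raw = os.environ.get(env_name)
    return cast(raw) if raw is not None else yaml_value

# Example: fall back to the YAML MAX_STEP of 50 unless UFO_MAX_STEP is set.
max_step = effective_setting("UFO_MAX_STEP", 50, int)
```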

Troubleshooting Common Setup Issues

"No module named pywinauto": Make sure you activated the virtual environment before running pip install. Run .venv\Scripts\activate again and reinstall.

"Access denied" on screenshot capture: Run your terminal as Administrator. UFO needs elevated permissions to capture screenshots of some applications.

"Model not found" errors: Verify your API key has access to the vision model specified in config. Try gpt-4o as a fallback.

Slow execution: Increase SLEEP_TIME if actions are executing before the UI finishes rendering. Windows animations can cause the agent to see transitional states.

FAQ

Can I use UFO without an OpenAI API key?

UFO requires a vision-capable LLM to interpret screenshots. You can use Azure OpenAI as an alternative, or configure a local model endpoint that supports the OpenAI vision API format, but some form of multimodal model access is required.

Does UFO support multiple monitors?

UFO captures the screen where the target application window is located. Multi-monitor setups work as long as the target application is fully visible on one screen. A window spanning two monitors may produce partial screenshots.

How much does it cost to run UFO tasks?

Each step involves sending an annotated screenshot (roughly 1000-2000 tokens for the image) plus prompt tokens to GPT-4o. A simple 5-step task costs approximately $0.05-0.15 USD. Complex multi-application tasks with 30+ steps can cost $0.50-1.00 USD.
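These figures can be turned into a back-of-envelope cost model. The token counts below come from the estimate above; the prices assume roughly GPT-4o-class rates ($2.50 per million input tokens, $10 per million output tokens), so check current pricing before budgeting:

```python
# Back-of-envelope cost model; token counts and prices are estimates.
def estimate_task_cost_usd(steps, image_tokens=1500, prompt_tokens=1000,
                           output_tokens=300,
                           input_price=2.50 / 1_000_000,
                           output_price=10.00 / 1_000_000):
    input_cost = steps * (image_tokens + prompt_tokens) * input_price
    output_cost = steps * output_tokens * output_price
    return input_cost + output_cost
```

With these assumptions, a 5-step task lands near the low end of the $0.05-0.15 range quoted above, and cost scales roughly linearly with step count.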


#MicrosoftUFO #WindowsSetup #AIAgent #DesktopAutomation #GPT4Vision #PythonAutomation #UIAutomation

Written by

CallSphere Team
