
Basic Information

UFO² is an open source Desktop AgentOS and research platform that turns natural‑language goals into reliable multi‑application workflows on Windows. The repository provides a multi‑agent architecture centered on a HostAgent that parses user goals and coordinates AppAgents running per application. AppAgents execute ReAct loops with multimodal perception, hybrid control detection and a Puppeteer executor that chooses between native APIs and GUI actions. The project includes a Knowledge Substrate for retrieval‑augmented reasoning, a Speculative Executor to predict and validate batches of actions, and tooling for sandboxed execution and logging. The codebase targets researchers and developers building and evaluating desktop agents, requires Python 3.10+ on Windows 10+, and is intended for research use rather than production deployment. The README also documents installation, LLM configuration for Host and App agents, optional RAG settings and evaluation benchmarks used to measure agent performance.

App Details

Features
The README highlights strong OS integration and multi‑agent orchestration as primary features. Deep control detection uses Windows UIA, Win32 and WinCOM plus a hybrid vision pipeline for custom controls. Hybrid GUI+API Actions let the Puppeteer prefer native APIs when available and fall back to clicks and keystrokes when necessary. The Speculative Executor reduces LLM latency by predicting multiple likely actions and validating them in a single pass. The Knowledge Substrate combines offline docs, web search, demonstrations and execution traces for retrieval‑augmented behavior. AppAgents run ReAct loops with multimodal perception and a per‑application executor. Additional features include sandboxed virtual desktop execution, configurable LLM support for Host and App agents, execution logging with screenshots for debugging, and benchmark integration for evaluation.
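The speculative execution idea described above can be illustrated with a minimal sketch. It is not UFO²'s implementation: the `Action` type, `run_speculative_batch`, and the `is_valid` and `execute` callbacks are hypothetical names chosen for illustration. The sketch assumes a model has already predicted a batch of actions in one call, and that each step is checked against the current UI state before it runs, so the batch stops at the first prediction that no longer holds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    """A hypothetical predicted step: which control to act on, and how."""
    control: str  # name of the UI control the action targets
    verb: str     # e.g. "click" or "type"


def run_speculative_batch(
    actions: list[Action],
    is_valid: Callable[[Action], bool],
    execute: Callable[[Action], None],
) -> int:
    """Run predicted actions in order and return how many were executed.

    Execution stops at the first action that fails validation, so a wrong
    prediction never runs. The caller would then ask the model for a new
    batch starting from the current state.
    """
    executed = 0
    for action in actions:
        if not is_valid(action):
            break
        execute(action)
        executed += 1
    return executed


# Illustrative use with a fake UI state (assumption: validity means the
# target control is currently visible).
visible_controls = {"File", "Save As"}
executed_log: list[Action] = []
batch = [
    Action("File", "click"),
    Action("Save As", "click"),
    Action("Export", "click"),  # not visible, so the batch stops here
]
count = run_speculative_batch(
    batch,
    is_valid=lambda a: a.control in visible_controls,
    execute=executed_log.append,
)
```

The latency benefit in this sketch comes from issuing one model call for several steps; the validation step keeps that shortcut from executing stale predictions.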
Use Cases
UFO² helps researchers and developers build agents that can automate complex, cross‑application tasks on Windows by providing an end‑to‑end AgentOS architecture. It simplifies orchestrating multiple agents, selecting robust action strategies, and integrating knowledge sources so agents can reason with documentation, search results and past experience. The Speculative Executor and hybrid control strategy improve responsiveness and reliability compared to naive click‑based automation. Built‑in logging and screenshot capture aid debugging and replay of agent behavior. Configurable LLM settings and optional RAG enable experimentation with different models and memory sources. The included benchmarks and technical reports make it straightforward to measure progress and reproduce research results. The project is presented for research use and is not positioned as a third‑party production product.
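The API-first action strategy can be sketched in the same spirit. This is an illustrative sketch only, not the Puppeteer's actual interface: `perform`, `ApiUnavailable`, and the handler and fallback callables are hypothetical names. It assumes each task may have a registered native-API handler, and that any task without one, or whose handler reports failure, is routed to a GUI routine instead.

```python
from typing import Callable


class ApiUnavailable(Exception):
    """Raised by a hypothetical native-API handler that cannot serve a request."""


def perform(
    task: str,
    api_handlers: dict[str, Callable[[], str]],
    gui_fallback: Callable[[str], str],
) -> str:
    """Prefer a native-API handler for the task; otherwise use the GUI path."""
    handler = api_handlers.get(task)
    if handler is not None:
        try:
            return handler()
        except ApiUnavailable:
            # The API path exists but failed; fall through to GUI actions.
            pass
    return gui_fallback(task)
```

Keeping the fallback in one place means callers never need to know which path was taken, which is one plausible reason to prefer this design over scattering click-based code throughout an agent.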
