Basic Information

Cua is a platform and toolkit that lets developers build, run, and deploy Computer-Use Agents that control full operating systems inside virtual containers. It provides both a Computer SDK for programmatic control of Windows, Linux, and macOS virtual machines and an Agent SDK for running and composing AI agents that perform GUI interactions and OS tasks. The project standardizes agent outputs and agent-to-computer interactions so models and tools can be swapped, tested, and deployed locally or in the cloud. The repository includes multiple modules and integrations for VM lifecycle management, agent orchestration, server components, and examples showing how to automate UI actions and capture screenshots. It is open source under the MIT License and targets developers and researchers building agents that must interact with real desktop environments.

App Details

Features
- Consistent Computer SDK with a pyautogui-style API for actions such as screenshots, clicks, and typing across Windows, Linux, and macOS.
- Agent SDK that normalizes model outputs and supports composed agents, UI grounding models, and human-in-the-loop workflows.
- Multi-provider model support, including hosted and local liteLLM providers, with explicit examples of supported model families.
- Modular components: VM managers (Lume), a VM runtime interface (Lumier), Python and TypeScript libraries for Computer and Core utilities, an MCP server for desktop integrations, and the SOM library.
- CLI and Python quickstarts with example code snippets.
- Benchmarking support via HUD, plus example notebooks for OSWorld and SheetBench evaluations.
- Installable packages via pip and npm, with Docker images for VM runtimes.
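One way a single pyautogui-style surface can stay consistent across guest operating systems is to route each call through a per-OS backend. The sketch below is an assumption about the design, not the actual Cua Computer SDK: class and method names (`Backend`, `send_click`, the xdotool/SendInput strings) are illustrative.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: one uniform click() API on top of per-OS backends.
# Names here are illustrative, not the real Cua Computer SDK.

class Backend(ABC):
    @abstractmethod
    def send_click(self, x: int, y: int) -> str: ...

class LinuxBackend(Backend):
    def send_click(self, x: int, y: int) -> str:
        # A Linux guest might translate clicks to an xdotool invocation.
        return f"xdotool mousemove {x} {y} click 1"

class WindowsBackend(Backend):
    def send_click(self, x: int, y: int) -> str:
        # A Windows guest might translate clicks to SendInput events.
        return f"SendInput click at ({x}, {y})"

class Computer:
    """One API regardless of guest OS; the backend handles the specifics."""
    def __init__(self, backend: Backend):
        self._backend = backend

    def click(self, x: int, y: int) -> str:
        return self._backend.send_click(x, y)

print(Computer(LinuxBackend()).click(10, 20))
# xdotool mousemove 10 20 click 1
```

Callers program against `Computer.click` alone, so swapping the guest OS means swapping the backend object and nothing else.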
Use Cases
Cua helps developers and researchers automate GUI and OS tasks, evaluate agent performance on standardized benchmarks, and iterate on computer-use models without rebuilding infrastructure. It simplifies creating reproducible test environments by managing VMs locally or in the cloud, exposes a consistent programmatic interface for interacting with desktops, and standardizes agent output for easier integration with tooling. The project supports rapid experimentation with different LLM backends and composed agent architectures, enables human-in-the-loop data collection for training, and provides server and CLI tools for deployment and integration with MCP clients. Example code demonstrates common workflows like taking screenshots and issuing input actions, allowing teams to prototype automations and collect trajectories for model training.
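Collecting trajectories for model training, as mentioned above, usually amounts to logging (observation, action) pairs per step and serializing each episode. A minimal sketch, assuming a simple JSON Lines layout (the `Step` fields and file format here are hypothetical, not a Cua format):

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical trajectory-collection sketch; field names are illustrative,
# not a Cua on-disk format.

@dataclass
class Step:
    screenshot_path: str   # where the pre-action screenshot was saved
    action: str            # normalized action name, e.g. "click"
    args: dict             # action parameters

def to_jsonl(steps: list) -> str:
    """Serialize one episode as JSON Lines, one step per line."""
    return "\n".join(json.dumps(asdict(s)) for s in steps)

episode = [
    Step("frames/000.png", "click", {"x": 412, "y": 96}),
    Step("frames/001.png", "type", {"text": "hello"}),
]
print(to_jsonl(episode))
```

Keeping actions in the normalized form the agent already emits means the same records serve both replayable automations and supervised training data.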