OpenAI CUA Sample App

Basic Information

This repository is a sample application for developers who want to build and experiment with a Computer-Using Agent (CUA) on the OpenAI Responses API. It demonstrates how a model inspects screenshots of a computer interface, proposes actions such as clicks and typing via computer_call outputs, and receives the resulting screenshots to continue the task loop. The repo is a minimal, extensible demonstration rather than a production agent, and it stresses that computer use is in preview and should not be trusted in high-stakes or authenticated environments. It includes setup and run instructions for a local development environment, example scripts, and guidance for contributing new computer environments. The main goal is to show the end-to-end interaction pattern and practical integrations for executing model-suggested UI actions in different runtime environments.
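The task loop described above can be sketched without any network access. This is a minimal illustration, not the repository's code: FakeComputer, scripted_model, run_task, and the item field names other than computer_call are hypothetical stand-ins, and the scripted model replaces a real Responses API call.

```python
from dataclasses import dataclass, field


@dataclass
class FakeComputer:
    """Hypothetical environment: records actions instead of executing them."""
    log: list = field(default_factory=list)

    def execute(self, action: dict) -> None:
        self.log.append(action)

    def screenshot(self) -> str:
        # A real environment would return encoded image data here.
        return f"screenshot-after-{len(self.log)}-actions"


def scripted_model(history: list) -> dict:
    """Hypothetical model stub: proposes two actions, then a final message."""
    script = [
        {"type": "computer_call", "call_id": "c1",
         "action": {"type": "click", "x": 10, "y": 20}},
        {"type": "computer_call", "call_id": "c2",
         "action": {"type": "type", "text": "hello"}},
        {"type": "message", "text": "done"},
    ]
    # The number of screenshots already returned selects the next step.
    step = sum(1 for item in history if item["type"] == "computer_call_output")
    return script[step]


def run_task(computer: FakeComputer, task: str, max_steps: int = 10) -> list:
    history = [{"type": "user", "text": task}]
    for _ in range(max_steps):
        item = scripted_model(history)
        history.append(item)
        if item["type"] != "computer_call":
            break  # the model answered instead of requesting an action
        computer.execute(item["action"])
        # Send the resulting screenshot back so the model can continue.
        history.append({
            "type": "computer_call_output",
            "call_id": item["call_id"],
            "screenshot": computer.screenshot(),
        })
    return history
```

Calling run_task(FakeComputer(), "say hello") yields a history that alternates proposed actions and screenshots until the stub returns a plain message. The max_steps cap is a design choice in this sketch so that a misbehaving model cannot loop forever.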

App Details

Features
The project supplies two lightweight abstractions: a Computer interface for environments that can execute actions, and an Agent loop that implements run_full_turn to handle computer actions and function calls. A command-line interface runs agents with flags for computer selection, input, debug output, image display, and start URL. Several example Computer implementations are included, such as local-playwright, Docker, Browserbase, and Scrapybara, along with Docker build and run instructions and a sample recommendation for restricting DNS. The repo documents the supported CUA actions: click, double_click, scroll, type, wait, move, keypress, and drag. It also demonstrates function calling integration, in which tools can be routed to Computer methods, and provides example scripts and scaffolding for contributed computers.
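One way to picture the Computer contract is as a set of methods named after the documented actions, plus a dispatcher that maps a model-proposed action onto the matching method. The sketch below makes that concrete under stated assumptions: the method signatures, the dispatch helper, and RecordingComputer are hypothetical, and only the eight action names come from the description above.

```python
from typing import Protocol


class Computer(Protocol):
    """Hypothetical contract; signatures are assumptions, action names are from the text."""
    def click(self, x: int, y: int, button: str = "left") -> None: ...
    def double_click(self, x: int, y: int) -> None: ...
    def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None: ...
    def type(self, text: str) -> None: ...
    def wait(self, ms: int = 1000) -> None: ...
    def move(self, x: int, y: int) -> None: ...
    def keypress(self, keys: list) -> None: ...
    def drag(self, path: list) -> None: ...
    def screenshot(self) -> str: ...


SUPPORTED_ACTIONS = {
    "click", "double_click", "scroll", "type", "wait", "move", "keypress", "drag",
}


def dispatch(computer, action: dict) -> None:
    """Call the Computer method named by action['type'] with the remaining fields."""
    kind = action["type"]
    if kind not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {kind}")
    args = {key: value for key, value in action.items() if key != "type"}
    getattr(computer, kind)(**args)


class RecordingComputer:
    """Toy environment that implements only two actions, for brevity."""
    def __init__(self) -> None:
        self.calls = []

    def click(self, x, y, button="left"):
        self.calls.append(("click", x, y, button))

    def type(self, text):
        self.calls.append(("type", text))

    def screenshot(self) -> str:
        return ""
```

With this shape, dispatch(RecordingComputer(), {"type": "click", "x": 5, "y": 7}) ends up calling click(x=5, y=7). Adding a new environment then means writing one class with these methods, leaving the loop unchanged.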
Use Cases
This sample app helps developers prototype and test agent-driven UI automation. It provides runnable examples, reusable abstractions, and multiple environment integrations, so you can experiment locally or with hosted browsers. It reduces the boilerplate of connecting model outputs to real actions by defining a clear Computer contract and an Agent loop that repeatedly requests model actions and supplies screenshots. The Docker and remote browser examples let teams emulate isolated desktops, and function call routing helps with edge cases where screenshots omit UI elements. The debug and show flags aid interactive debugging, and the contributor guidance makes it straightforward to add new environments and test end-to-end behavior. Safety notes and recommended precautions remind users of the risks of a preview feature.
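Function call routing can be sketched as a lookup by name: when the model emits a function call, the loop checks whether the Computer exposes a method with that name and invokes it with the decoded arguments. The example is illustrative only. BrowserComputer, its goto method, handle_function_call, and the returned strings are hypothetical, chosen to show a case where the target (a URL) would not be reachable through the screenshot alone.

```python
import json


class BrowserComputer:
    """Hypothetical environment exposing one extra method beyond screenshot-driven actions."""
    def __init__(self) -> None:
        self.url = "about:blank"

    def goto(self, url: str) -> None:
        self.url = url


def handle_function_call(computer, name: str, arguments: str) -> str:
    """Route a model function call to a same-named public Computer method."""
    if name.startswith("_"):
        return f"no handler for {name}"  # never expose private attributes
    method = getattr(computer, name, None)
    if not callable(method):
        return f"no handler for {name}"
    method(**json.loads(arguments))  # arguments arrive as a JSON string
    return "success"
```

For example, handle_function_call(BrowserComputer(), "goto", '{"url": "https://example.com"}') returns "success" and updates the URL, while an unknown name returns a message the loop can pass back to the model.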