Basic Information

Clevrr Computer is a desktop automation agent that performs precise system actions on behalf of a user by combining Python automation with multimodal language models. It uses PyAutoGUI to simulate mouse movements, clicks, keyboard input, and window management while continuously capturing screenshots to interpret on-screen context. The agent builds a chain-of-thought plan for each task, queries the screen via a get_screen_info tool, and executes code-driven actions through a PythonREPLAst tool. The repository includes a runnable application with a floating Tkinter interface, command-line flags for choosing a model (gemini or openai), and configuration via environment variables. The README emphasizes safety, advising use in isolated VMs or containers and restricting internet and sensitive-data access to reduce risk from prompt injection or unintended real-world effects.
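The observe-plan-act cycle described above can be sketched roughly as follows. The tool names echo the README, but the structure, signatures, and stubs here are illustrative assumptions, not the project's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    """One planned step (hypothetical schema, not the repo's)."""
    description: str
    code: str  # Python snippet a PythonREPLAst-style tool would run

def run_agent(task: str,
              get_screen_info: Callable[[], str],
              plan: Callable[[str, str], List[Action]],
              execute: Callable[[str], None]) -> List[str]:
    """Observe the screen, plan actions for the task, execute them in order."""
    log = []
    screen = get_screen_info()          # observe: screenshot -> screen description
    for action in plan(task, screen):   # plan: model reasons over task + screen
        execute(action.code)            # act: run the model-generated code
        log.append(action.description)
    return log
```

In the real application, `get_screen_info` would be backed by a screenshot plus a multimodal model, and `execute` by a Python REPL driving PyAutoGUI; here they are injected as callables so the loop itself stays testable.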

App Details

Features
- Automates mouse movements, clicks, and keyboard input using PyAutoGUI.
- Continuously captures screenshots and uses a grid-based get_screen_info tool to map true screen coordinates for multimodal understanding.
- Provides a PythonREPLAst tool that runs programmatic actions driven by model-generated plans.
- Supports model selection between gemini and openai, plus a floating Tkinter UI with an optional disable flag.
- Includes error handling and feedback mechanisms to improve reliability and avoid unintended actions.
- Ships with examples and demo media demonstrating automation flows.
- Handles configuration via a .env file for Azure and Google API keys; the repository also offers guidance on safe deployment and prompt-injection mitigation.
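The grid-based screen mapping could work along these lines: overlay a labeled grid on the screenshot so the model can name a cell, then convert that cell back to pixel coordinates for a click. The grid size, indexing scheme, and default resolution below are assumptions for illustration, not the tool's actual parameters:

```python
def cell_to_pixels(col: int, row: int,
                   screen_w: int = 1920, screen_h: int = 1080,
                   grid_cols: int = 12, grid_rows: int = 8) -> tuple[int, int]:
    """Map a zero-indexed (col, row) grid cell to the pixel coordinates
    of its center. Grid dimensions are illustrative assumptions; the
    actual get_screen_info tool may label cells differently."""
    cell_w = screen_w / grid_cols
    cell_h = screen_h / grid_rows
    x = int((col + 0.5) * cell_w)
    y = int((row + 0.5) * cell_h)
    return x, y

# A model answering "the OK button is in cell (3, 5)" yields a click target:
# pyautogui.click(*cell_to_pixels(3, 5))   # assuming PyAutoGUI is installed
```

Translating between model-friendly grid labels and raw pixel coordinates is what lets a multimodal model, which sees only a downscaled screenshot, still drive precise cursor actions.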
Use Cases
This project automates repetitive or precise desktop workflows such as UI interactions, demonstrations, and scripted tasks by letting an AI agent observe the screen and perform coordinated actions. It reduces manual effort for tasks that require exact cursor control or sequential keyboard input, and it can be used for testing interfaces, automating routine operations, or building demo agents. The multimodal approach lets the agent interpret visual screen contents before acting, improving context-aware automation. Safety guidance and recommendations for isolated environments help mitigate risks, making the project suitable for experimental automation provided users confirm consequential actions and avoid exposing sensitive credentials.
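The recommendation to confirm consequential actions can be sketched as a gate around model-generated snippets before they reach the REPL. The keyword list and gate function are a naive stand-in assumption; a real deployment would need a far more careful policy:

```python
from typing import Callable

# Illustrative keyword list, not a real safety policy (assumption).
CONSEQUENTIAL_KEYWORDS = ("delete", "rm ", "format", "shutdown", "purchase")

def confirm_gate(code: str, ask: Callable[[str], bool]) -> bool:
    """Return True if the model-generated snippet may run.

    `ask` is any user-confirmation callback (e.g. a Tkinter dialog or a
    terminal prompt); snippets that trip a keyword run only if the user
    approves, everything else passes through unprompted.
    """
    if any(kw in code.lower() for kw in CONSEQUENTIAL_KEYWORDS):
        return ask(f"Agent wants to run: {code!r}. Proceed?")
    return True
```

A gate like this complements, rather than replaces, running the agent in an isolated VM or container as the README advises.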