Basic Information

Clevrr Computer is a desktop automation agent that performs precise system actions on behalf of a user by combining Python automation with multimodal language models. It uses PyAutoGUI to simulate mouse movements, clicks, keyboard input, and window management while continuously capturing screenshots to interpret on-screen context. The agent builds a chain-of-thought plan for each task, queries the screen via a get_screen_info tool, and executes code-driven actions through a PythonREPLAst tool. The repository includes a runnable application with a floating Tkinter interface, command-line flags for choosing a model (gemini or openai), and configuration via environment variables. The README emphasizes safety, advising use in isolated VMs or containers and restricting internet and sensitive-data access to reduce risk from prompt injection or unintended real-world effects.
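The observe-plan-act cycle described above can be sketched roughly as follows. The tool names echo the README, but the structure, signatures, and stubs here are illustrative assumptions, not the project's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    """One planned step (hypothetical schema, not the repo's)."""
    description: str
    code: str  # Python snippet a PythonREPLAst-style tool would run

def run_agent(task: str,
              get_screen_info: Callable[[], str],
              plan: Callable[[str, str], List[Action]],
              execute: Callable[[str], None]) -> List[str]:
    """Observe the screen, plan actions for the task, execute them in order."""
    log = []
    screen = get_screen_info()          # observe: screenshot -> screen description
    for action in plan(task, screen):   # plan: model reasons over task + screen
        execute(action.code)            # act: run the model-generated code
        log.append(action.description)
    return log
```

In the real application, `get_screen_info` would be backed by a screenshot plus a multimodal model, and `execute` by a Python REPL driving PyAutoGUI; here they are injected as callables so the loop itself stays testable.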

App Details

Features
- Automates mouse movements, clicks, and keyboard input using PyAutoGUI.
- Continuously captures screenshots and uses a grid-based get_screen_info tool to map true screen coordinates for multimodal understanding.
- Provides a PythonREPLAst tool that runs programmatic actions driven by model-generated plans.
- Supports model selection between gemini and openai, plus a floating Tkinter UI with an optional disable flag.
- Includes error handling and feedback mechanisms to improve reliability and avoid unintended actions.
- Ships with examples and demo media demonstrating automation flows.
- Handles configuration via a .env file for Azure and Google API keys; the repository also offers guidance on safe deployment and prompt-injection mitigation.
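The grid-based screen mapping could work along these lines: overlay a labeled grid on the screenshot so the model can name a cell, then convert that cell back to pixel coordinates for a click. The grid size, indexing scheme, and default resolution below are assumptions for illustration, not the tool's actual parameters:

```python
def cell_to_pixels(col: int, row: int,
                   screen_w: int = 1920, screen_h: int = 1080,
                   grid_cols: int = 12, grid_rows: int = 8) -> tuple[int, int]:
    """Map a zero-indexed (col, row) grid cell to the pixel coordinates
    of its center. Grid dimensions are illustrative assumptions; the
    actual get_screen_info tool may label cells differently."""
    cell_w = screen_w / grid_cols
    cell_h = screen_h / grid_rows
    x = int((col + 0.5) * cell_w)
    y = int((row + 0.5) * cell_h)
    return x, y

# A model answering "the OK button is in cell (3, 5)" yields a click target:
# pyautogui.click(*cell_to_pixels(3, 5))   # assuming PyAutoGUI is installed
```

Translating between model-friendly grid labels and raw pixel coordinates is what lets a multimodal model, which sees only a downscaled screenshot, still drive precise cursor actions.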
Use Cases
This project automates repetitive or precise desktop workflows such as UI interactions, demonstrations, and scripted tasks by letting an AI agent observe the screen and perform coordinated actions. It reduces manual effort for tasks that require exact cursor control or sequential keyboard input, and it can be used for testing interfaces, automating routine operations, or building demo agents. The multimodal approach lets the agent interpret visual screen contents before acting, improving context-aware automation. Safety guidance and recommendations for isolated environments help mitigate risks, making the project suitable for experimental automation provided users confirm consequential actions and avoid exposing sensitive credentials.
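The recommendation to confirm consequential actions can be sketched as a gate around model-generated snippets before they reach the REPL. The keyword list and gate function are a naive stand-in assumption; a real deployment would need a far more careful policy:

```python
from typing import Callable

# Illustrative keyword list, not a real safety policy (assumption).
CONSEQUENTIAL_KEYWORDS = ("delete", "rm ", "format", "shutdown", "purchase")

def confirm_gate(code: str, ask: Callable[[str], bool]) -> bool:
    """Return True if the model-generated snippet may run.

    `ask` is any user-confirmation callback (e.g. a Tkinter dialog or a
    terminal prompt); snippets that trip a keyword run only if the user
    approves, everything else passes through unprompted.
    """
    if any(kw in code.lower() for kw in CONSEQUENTIAL_KEYWORDS):
        return ask(f"Agent wants to run: {code!r}. Proceed?")
    return True
```

A gate like this complements, rather than replaces, running the agent in an isolated VM or container as the README advises.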