Basic Information

ClickClickClick is a developer-focused framework for building and running autonomous control tasks on Android devices and macOS computers using large language models. It coordinates planning and visual element detection to translate natural-language task descriptions into UI actions such as opening apps, navigating websites, composing drafts, and interacting with on-screen elements. The project supports multiple LLM providers, including OpenAI, Anthropic Claude, Google Gemini, and local Ollama models, and exposes a CLI, a Python API, a REST API, and a Gradio web interface so tasks can be executed interactively, programmatically, or via HTTP. Configuration is driven by YAML files and environment variables for model keys and executor settings. The repo includes examples for Gmail, Maps, browsing, and system tasks, troubleshooting guidance for ADB and macOS permissions, and development instructions for running tests and installing in editable mode.
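As a rough illustration of the YAML-driven configuration, a config/models.yaml could pair a cloud planner with a local finder. The sketch below is an assumption about the schema, not the project's documented format; every key name (planner, finder, executor, provider, api_key_env, the duration settings) is hypothetical.

```yaml
# Hypothetical sketch of config/models.yaml -- all key names here are
# assumed for illustration; the project's actual schema may differ.
planner:
  provider: openai              # or anthropic, gemini, ollama
  model: gpt-4o
  api_key_env: OPENAI_API_KEY   # API keys are supplied via environment variables

finder:
  provider: ollama              # local model for offline/private element detection
  model: llama3.2-vision

executor:
  platform: android             # or osx
  swipe_duration_ms: 300        # assumed names for the tunable executor settings
  press_duration_ms: 150
```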

App Details

Features
The README documents multi-platform support for Android and macOS, along with configurable executor parameters such as screen coordinates and swipe and press durations. It supports separate planner and finder LLMs and recommends model pairings by performance, cost, and privacy, including GPT-4o, Gemini Flash, and Ollama for fully offline use. Interfaces include a CLI command, click3 run; a Python API with helper functions for obtaining executors, planners, and finders; a REST API served with uvicorn; and a Gradio web UI with live screenshots and task history. Visual automation is screenshot-driven, combining element detection with UI interaction. Configuration is handled in config/models.yaml and via environment variables. The project also provides debugging guidance, performance-tuning tips, model download notes for Ollama, and example tasks and scripts.
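The Python API described above suggests a flow along these lines. This is a minimal sketch: the import path, the helper names, and the planner's task method are assumptions based on the summary, not verified signatures from the project.

```python
# Hypothetical sketch of the Python API flow described above.
# The import path, helper names, and planner method are assumptions,
# not verified signatures from the project.
from clickclickclick import get_executor, get_planner, get_finder  # assumed module layout

executor = get_executor("android")                # drives the device (e.g. via ADB)
finder = get_finder("gemini", executor=executor)  # locates on-screen elements from screenshots
planner = get_planner("openai", finder=finder, executor=executor)

# Describe the task in natural language; the planner is assumed to loop
# screenshot -> plan -> locate -> act until the task completes.
planner.execute_task("Open Gmail and compose a draft to a contact")
```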
Use Cases
The project helps developers and power users automate repetitive or complex UI workflows on phones and macOS machines by letting them describe tasks in natural language and have an agent carry them out. It enables rapid prototyping of agent-driven automation, supports both cloud and local LLMs for tradeoffs among accuracy, speed, cost, and privacy, and exposes multiple integration points: the CLI for ad-hoc commands, the Python API for embedding in scripts, the REST API for remote invocation, and Gradio for visual interaction and monitoring. Included examples and troubleshooting steps reduce setup friction for ADB and macOS accessibility permissions. Configurable model and executor settings let teams tune performance and reliability on different hardware. The project also documents a development workflow for contributors and a roadmap for expanding platform and orchestration features.
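For remote invocation, the REST API (served with uvicorn) could be called roughly as follows. The endpoint path, port, and JSON payload shape are illustrative assumptions, since the summary above does not specify them.

```python
# Hypothetical REST invocation -- the endpoint path, port, and payload
# shape are illustrative assumptions, not the project's documented API.
import requests

resp = requests.post(
    "http://localhost:8000/execute",   # uvicorn's default port; /execute is assumed
    json={"task": "Open Maps and search for coffee shops nearby", "platform": "android"},
    timeout=300,                       # UI-driven tasks can run for minutes
)
resp.raise_for_status()
print(resp.json())                     # assumed to return task status and history
```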