cerebellum
Basic Information
Cerebellum is a lightweight browser-driven agent that accomplishes user-defined goals on web pages by performing keyboard and mouse actions. It models web browsing as a directed graph in which each page is a node and each user interaction is an edge. A large language model analyzes the page content and interactive elements, plans the next action, and repeats until the agent reaches a goal node or judges the goal unachievable. The project provides setup instructions and SDK materials for Python and TypeScript and demonstrates end-to-end tasks such as product search and add-to-cart. The current implementation uses Claude 3.5 Sonnet as the ActionPlanner and drives a Selenium-supported browser to execute the planned actions.
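The graph model described above can be illustrated with a minimal sketch. Every name here (`Action`, `browse`, `SITE`, `scripted_planner`) is hypothetical and not taken from the Cerebellum SDK, and the planner is a scripted stand-in for the LLM so the example runs without a browser or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """An edge in the page graph: one keyboard or mouse interaction."""
    kind: str    # e.g. "click" or "type" (illustrative values)
    target: str  # label of the element acted on


# A planner looks at the current page and the goal, then returns the next
# action, or None when it judges the goal unachievable. In the real project
# an LLM plays this role; here it is any plain callable.
Planner = Callable[[str, str], Optional[Action]]


def browse(
    start: str,
    goal: str,
    is_goal: Callable[[str], bool],
    planner: Planner,
    step: Callable[[str, Action], str],
    max_steps: int = 10,
) -> Tuple[bool, List[Action]]:
    """Walk the page graph until a goal node is reached or the planner gives up."""
    page, path = start, []
    for _ in range(max_steps):
        if is_goal(page):
            return True, path
        action = planner(page, goal)
        if action is None:
            return False, path
        path.append(action)
        page = step(page, action)  # follow the edge to the next node
    return is_goal(page), path


# Toy site: (page, element) -> resulting page. Purely illustrative.
SITE = {
    ("home", "search box"): "results",
    ("results", "first product"): "product",
    ("product", "add to cart"): "cart",
}


def scripted_planner(page: str, goal: str) -> Optional[Action]:
    """Stand-in for the LLM: pick the only outgoing edge of the current page."""
    for src, target in SITE:
        if src == page:
            return Action("type" if target == "search box" else "click", target)
    return None


reached, path = browse(
    "home",
    "add a product to the cart",
    lambda page: page == "cart",
    scripted_planner,
    lambda page, action: SITE[(page, action.target)],
)
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused planner from looping forever on a site whose graph contains cycles.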
Stars: 775
App Details
Features
Compatible with any Selenium-supported browser for executing mouse and keyboard actions.
Integrates Claude 3.5 Sonnet as an ActionPlanner to inspect pages and decide next steps.
Fills forms using user-provided JSON data.
Accepts runtime instructions so browsing strategy can be adjusted dynamically during a session.
Represents pages and actions as nodes and edges to simplify navigation logic.
Includes examples and SDK setup guides for Python and TypeScript.
Roadmap and TODOs call out tabbed browsing handling, creating training datasets from sessions, improved scrolling and mouse marking on screenshots, and planned support for additional LLMs.
Known issues include Claude safety refusals for CAPTCHAs and some content types.
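As a rough illustration of how user-provided JSON data and runtime instructions might be combined into the text handed to a planner, consider the sketch below. `SessionContext` and its methods are assumptions made for this example, not names from the Cerebellum SDK, and the rendered layout is likewise invented.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionContext:
    """Hypothetical container for what the planner is told on each step."""
    goal: str
    user_data: Dict[str, str] = field(default_factory=dict)
    instructions: List[str] = field(default_factory=list)

    def add_instruction(self, text: str) -> None:
        """Record an instruction supplied while the session is running."""
        self.instructions.append(text.strip())

    def render(self) -> str:
        """Build the text block a planner prompt could include."""
        parts = [f"Goal: {self.goal}"]
        if self.user_data:
            # sort_keys keeps the rendered prompt stable between runs
            parts.append("User data (JSON): " + json.dumps(self.user_data, sort_keys=True))
        for number, text in enumerate(self.instructions, start=1):
            parts.append(f"Instruction {number}: {text}")
        return "\n".join(parts)


ctx = SessionContext(
    goal="Sign up for the newsletter",
    user_data={"name": "Ada Lovelace", "email": "ada@example.com"},
)
before = ctx.render()

# Mid-session adjustment: the next planning step sees the new instruction.
ctx.add_instruction("Skip any optional fields.")
after = ctx.render()
```

Because the context is re-rendered on every planning step, an instruction added mid-session takes effect on the next action without restarting the browser.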
Use Cases
Cerebellum automates complex, multi-step web tasks that would otherwise require manual browsing, such as searching for products, filling forms, and adding items to a cart. By converting page state into structured inputs for an LLM and executing the LLM-decided actions in a Selenium browser, it reduces the developer effort needed to build web automation flows and to prototype autonomous web agents. The runtime instruction feature lets users alter the agent's behavior mid-session, so it can adapt to changing requirements. Cross-language setup guidance for Python and TypeScript helps teams integrate the agent into existing tooling. The project also serves as a foundation for creating training datasets and for adding other planners in future work.
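The step of executing LLM-decided actions can be sketched as a small dispatcher. The JSON action format, the `Browser` protocol, and `RecordingBrowser` are all assumptions made for illustration; the real project defines its own action schema and drives Selenium, which this sketch replaces with a recording stand-in so it runs anywhere.

```python
import json
from typing import List, Protocol, Tuple


class Browser(Protocol):
    """Hypothetical minimal interface a browser driver would need to offer."""
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...
    def press_key(self, key: str) -> None: ...


class RecordingBrowser:
    """Stand-in that logs calls instead of controlling a real browser."""
    def __init__(self) -> None:
        self.log: List[Tuple] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))

    def press_key(self, key: str) -> None:
        self.log.append(("key", key))


def execute(browser: Browser, raw_action: str) -> bool:
    """Parse one planner reply and run it; return False if it is unusable."""
    try:
        action = json.loads(raw_action)
        kind = action["action"]
        if kind == "click":
            browser.click(int(action["x"]), int(action["y"]))
        elif kind == "type":
            browser.type_text(str(action["text"]))
        elif kind == "key":
            browser.press_key(str(action["key"]))
        else:
            return False  # unknown action: refuse rather than guess
    except (KeyError, TypeError, ValueError):
        # Malformed JSON (a ValueError subclass) or missing/invalid fields
        return False
    return True


browser = RecordingBrowser()
replies = [
    '{"action": "click", "x": 120, "y": 48}',
    '{"action": "type", "text": "wireless mouse"}',
    '{"action": "key", "key": "Enter"}',
    '{"action": "solve_captcha"}',
]
results = [execute(browser, reply) for reply in replies]
```

Rejecting unknown or malformed actions, rather than raising, lets the calling loop ask the planner for a new action instead of ending the session.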