Chrome-GPT

Basic Information

Chrome-GPT is an experimental AutoGPT agent that programmatically controls a Chrome browser session to perform web tasks. The repository provides a ready-to-run agent that uses Langchain and Selenium to let a large language model scroll, click, input text, switch tabs, and otherwise navigate and manipulate web pages. It is aimed at users who want to prototype or demo autonomous web interactions driven by natural-language prompts, such as searching for venues and filling out contact forms. The project is explicitly experimental and warns that the agent may take incorrect actions. The README includes a demo use case, basic CLI usage via python -m chromegpt, and setup instructions, including providing an OpenAI API key and installing dependencies with Poetry. The tool supports several agent types and a choice of model, can run headless, and offers a human-in-loop option for the Auto-GPT agent.

App Details

Features
The README documents built-in capabilities including Google search and both long-term and short-term memory management. Chrome-specific actions include describing a webpage, scrolling to elements, clicking buttons and links, filling in forms, and switching tabs. Three agent styles are available: zero-shot, BabyAGI, and Auto-GPT, with a human-in-loop option when using Auto-GPT. The project combines Langchain and Selenium to turn model outputs into browser actions. CLI flags set the task, agent type, model, headless mode, and verbosity. The listed requirements are Chrome, Python >3.8, Poetry, and an OpenAI API key; a docker-compose invocation is also mentioned as an alternative way to start the agent. Support for Chrome plugins is listed as a work in progress.
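The core idea of turning a model's text output into a browser action can be illustrated with a small dispatcher. This is a hypothetical sketch, not Chrome-GPT's implementation: the "action: arg | arg" output format, the FakeBrowser class, and the action names are all assumptions. The fake browser only records the calls it receives, so the example runs without Chrome or Selenium. In Chrome-GPT itself, this role is filled by its Langchain and Selenium integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a Selenium-controlled Chrome session.

    It only records the actions it receives, so the sketch runs without Chrome.
    """
    log: List[Tuple[str, ...]] = field(default_factory=list)
    open_tabs: int = 2
    active_tab: int = 0

    def click(self, target: str) -> str:
        self.log.append(("click", target))
        return f"clicked {target!r}"

    def scroll_to(self, target: str) -> str:
        self.log.append(("scroll_to", target))
        return f"scrolled to {target!r}"

    def input_text(self, field_name: str, value: str) -> str:
        self.log.append(("input", field_name, value))
        return f"typed {value!r} into {field_name!r}"

    def switch_tab(self, index: str) -> str:
        i = int(index)  # raises ValueError for non-numeric input
        if not 0 <= i < self.open_tabs:
            return f"error: no tab {i}"
        self.active_tab = i
        self.log.append(("switch_tab", index))
        return f"switched to tab {i}"


def parse_action(llm_output: str) -> Tuple[str, List[str]]:
    """Split a hypothetical 'action: arg1 | arg2' line emitted by the model."""
    name, _, rest = llm_output.partition(":")  # split at the first colon only
    args = [a.strip() for a in rest.split("|")] if rest.strip() else []
    return name.strip().lower(), args


def dispatch(browser: FakeBrowser, llm_output: str) -> str:
    """Route one model output to the matching browser method.

    The returned string is the observation the agent would see next.
    """
    tools: Dict[str, Callable[..., str]] = {
        "click": browser.click,
        "scroll": browser.scroll_to,
        "input": browser.input_text,
        "switch_tab": browser.switch_tab,
    }
    name, args = parse_action(llm_output)
    tool = tools.get(name)
    if tool is None:
        # Report the problem back instead of crashing, so the agent can retry.
        return f"error: unknown action {name!r}"
    try:
        return tool(*args)
    except (TypeError, ValueError) as exc:
        # Wrong argument count or a non-numeric tab index.
        return f"error: {exc}"


if __name__ == "__main__":
    b = FakeBrowser()
    print(dispatch(b, "input: email | me@example.com"))
```

Returning errors as observation strings, rather than raising, reflects the README's note about occasional parsing issues: a malformed model reply becomes feedback for the next step instead of ending the run.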
Use Cases
Chrome-GPT helps automate and prototype web-focused workflows by translating natural-language instructions into browser interactions, which is useful for tasks like searching for locations, filling out web forms, and navigating multi-step web processes. It demonstrates how an LLM can be connected to a real browser to carry out end-to-end tasks that require interacting with live web pages. The repository provides CLI examples and configuration for running with different models and agent behaviors, enabling experimentation with both autonomous agents and human-assisted automation. The README also lists known limitations so users can assess risk: unreliable web crawling, high latency per action, and occasional Langchain parsing issues. These make the project better suited to testing and prototyping than to production-critical automation.