Basic Information

OpenAdapt is an open source Python library that acts as an adapter between large multimodal models (LMMs) and traditional desktop and web graphical user interfaces to enable AI-first process automation. It is designed for developers and researchers who want to record human GUI interactions, convert screenshots and input into tokenized representations, learn from demonstrations, and generate synthetic input or replay actions to automate repetitive workflows. The project is model-agnostic, supports virtualized and web GUIs, and focuses on grounding model behavior in recorded processes to reduce hallucinations. It includes CLI tools, a web dashboard, a browser extension for capturing events, and a set of replay strategies and data models for recordings, action events, screenshots, and window events. The codebase is MIT licensed and intended as a development platform for building, evaluating, and iterating on LMM-driven GUI automation.
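To make the recording-to-tokens idea concrete, here is a minimal, illustrative sketch of how captured GUI action events might be represented and serialized into compact dicts for prompting a model. The class and function names are hypothetical stand-ins, not OpenAdapt's actual data models or API.

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical, simplified stand-in for the kind of action-event record
# the library describes (clicks, keystrokes, etc. tied to a timestamp);
# NOT OpenAdapt's real data model.
@dataclass
class ActionEvent:
    name: str                      # e.g. "click", "type", "scroll"
    timestamp: float
    mouse_x: Optional[int] = None
    mouse_y: Optional[int] = None
    text: Optional[str] = None

def to_tokens(events):
    """Serialize events into compact dicts, dropping unused fields."""
    return [
        {k: v for k, v in asdict(e).items() if v is not None}
        for e in events
    ]

events = [
    ActionEvent("click", time.time(), mouse_x=120, mouse_y=48),
    ActionEvent("type", time.time(), text="hello"),
]
tokens = to_tokens(events)
print(json.dumps(tokens, indent=2))
```

A replay strategy would consume a sequence like this (plus screenshots) and ask the model for the next action to execute.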

App Details

Features
- Recording and event capture: record screenshots, audio narration, and user action events for short demonstrations.
- Visualization and dashboard: generate HTML visualizations of recordings and run a local web dashboard.
- Tokenization and replay: convert screenshots and inputs into tokenized formats and replay recordings using multiple replay strategies, including Vanilla, Stateful, Visual, and browser-aware strategies.
- Model-agnostic inference: designed to work with current transformers and future LMMs.
- GUI understanding: integrates segmentation tools for GUI element understanding.
- Privacy and security: PII/PHI scrubbing integrations and decentralized data transfer via Magic Wormhole.
- Performance tooling: memory and performance monitoring using pympler and tracemalloc.
- Developer tooling: CLI commands, installation scripts, pre-commit hooks, migrations, and testing guidance.
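The performance tooling mentioned above builds on Python's standard `tracemalloc` module. As a standalone illustration of the kind of snapshotting involved (this is generic `tracemalloc` usage, not OpenAdapt's own instrumentation):

```python
import tracemalloc

tracemalloc.start()

# Allocate something measurable so the snapshot has content.
data = [bytes(1000) for _ in range(1000)]

# Take a snapshot and group allocations by source line.
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

# Report the top allocation sites, as a monitoring hook might.
for stat in top_stats[:3]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()
print(f"current={current} bytes, peak={peak} bytes")
tracemalloc.stop()
```

`pympler` adds complementary object-level summaries on top of this kind of tracing.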
Use Cases
OpenAdapt helps teams automate repetitive GUI workflows by capturing real human demonstrations and using large multimodal models to generate grounded automation actions. By learning from recorded sequences rather than relying solely on hand-written prompts, it reduces model hallucinations and increases task reliability across varying screen sizes and application behaviors. Developers can prototype and iterate on replay strategies, visualize recordings, and test replays locally through the dashboard and CLI. Built-in privacy scrubbing and secure data transfer protect sensitive data during development and sharing. The framework supports browser integration for web workflows and aims to accelerate RPA-like automation for software engineers, machine learning researchers, and automation teams who want to build LMM-driven agents that interact reliably with desktop and web GUIs.
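The privacy scrubbing described above is handled by dedicated PII/PHI integrations; purely to illustrate the general idea, here is a toy regex-based scrubber (hypothetical patterns and function names, far less robust than the real integrations):

```python
import re

# Toy patterns for illustration only; real PII/PHI scrubbing uses
# dedicated libraries, not hand-written regexes like these.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace each pattern match with a labeled redaction placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(scrub(sample))  # Contact <EMAIL>, SSN <SSN>.
```

In a recording pipeline, a scrubber like this would run over captured text (and, for screenshots, an OCR/vision pass) before data is stored or shared.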
