Basic Information

Nova Act is a Python SDK and research-preview model for building agents that reliably take actions inside web browsers. It provides a programmatic interface for driving Chrome/Chromium via Playwright and an act() API that accepts natural-language prompts and carries out browser interactions step by step. The SDK is intended for developers prototyping browser automation workflows: prompts can be split into small, repeatable steps and interleaved with direct Python code, tests, and parallelization. The repository includes installation and authentication instructions, sample scripts such as ordering a product or searching listings, guidance for handling sensitive input and captchas, and notes on limitations. It also describes production integration options, including work with customers on features such as AWS IAM and S3 storage, and an integration path with a managed browser tool for scaling.
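The step-by-step pattern described above can be sketched as follows. This is a minimal illustration, not the repository's own sample: the prompts in STEPS, the starting URL, and the helper names run_steps and main are hypothetical, and the NovaAct(starting_page=...) context manager is assumed to behave as shown in the project's README. Running main() requires the SDK to be installed and an API key to be configured.

```python
# Hypothetical step prompts; each one is small enough to retry or test alone.
STEPS = [
    "search for a coffee maker",
    "select the first result",
    "add the item to the cart",
]


def run_steps(session, steps):
    """Send each prompt as its own act() call and return the results in order.

    `session` is any object with an act(prompt) method, such as a started
    NovaAct instance. Keeping it as a parameter lets plain Python code
    (checks, logging, retries) sit between the steps.
    """
    return [session.act(prompt) for prompt in steps]


def main():
    # Imported here so the helpers above can be loaded without the SDK.
    from nova_act import NovaAct  # assumed import path, per the README

    # The starting page is an illustrative choice, not a requirement.
    with NovaAct(starting_page="https://www.amazon.com") as nova:
        for result in run_steps(nova, STEPS):
            print(result)
```

Calling main() drives a real browser session. Passing the session in as an argument, rather than creating it inside run_steps, is a design choice that keeps the step list easy to exercise with a stand-in object.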

App Details

Features
Provides an act() method that translates natural-language step prompts into browser actions and returns structured ActResult objects. Supports script and interactive modes, pydantic schema parsing (including a BOOL_SCHEMA helper), and direct access to the Playwright Page for manual interactions, screenshots, DOM access, and keyboard typing. Supports parallel sessions by running multiple lightweight NovaAct instances, and persistent authenticated sessions via user_data_dir and cloning options. Also offers proxy configuration, logging and HTML trace output, optional video recording, and an S3Writer utility that uploads run artifacts to Amazon S3. Constructor options include headless, logs_directory, record_video, proxy, API key auth, and go_to_url timeout tuning. Includes a samples folder and guidance on prompt design and error handling.
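Parallel sessions combined with BOOL_SCHEMA parsing might look like the sketch below. It rests on several assumptions: the helper names run_parallel and has_results, the example.com URL, and the prompts are hypothetical, and the schema= argument with the matches_schema and parsed_response attributes is assumed to work as in the project's README. Only the standard-library thread pool is certain here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(worker, items, max_workers=4):
    """Call worker(item) for every item on a thread pool.

    Results come back in the same order as `items`, because
    Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))


def has_results(query):
    """Open a separate browser session per query and return True/False/None."""
    # Imported here so run_parallel can be used without the SDK installed.
    from nova_act import BOOL_SCHEMA, NovaAct  # assumed names, per the README

    # Hypothetical site; headless is one of the listed constructor options.
    with NovaAct(starting_page="https://example.com", headless=True) as nova:
        nova.act(f"search for {query}")
        result = nova.act("Are any results shown on this page?", schema=BOOL_SCHEMA)
        # Fall back to None when the response does not match the schema.
        return result.parsed_response if result.matches_schema else None
```

A call such as run_parallel(has_results, ["standing desk", "desk lamp"]) would start one session per query. Each worker owns its NovaAct instance, so no browser state is shared between threads.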
Use Cases
Helps developers automate complex web workflows by breaking goals into smaller, controllable browser steps, which improves repeatability and debuggability. It simplifies common tasks such as searching, form filling, file upload and download, date selection, and structured data extraction, where schemas let results be parsed into typed models. Built-in logging, per-act HTML traces, and optional session video make failures easier to inspect and reproduce. Persistent user profiles and Playwright access enable authenticated flows and secure entry of sensitive values, while S3 integration provides centralized storage of session artifacts for auditing. The SDK targets prototyping and offers a path to production through integration options for managed browser services. It also documents limitations such as captchas, prompt-injection risk, and constraints on interaction outside the browser.
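Secure entry of a sensitive value through the Playwright page can be sketched as below. The helper name sign_in, the username, and the prompt wording are hypothetical. The sketch assumes the session exposes the Playwright Page as a page attribute, as the feature list describes. Playwright's keyboard.type call and the standard-library getpass are real APIs.

```python
from getpass import getpass


def sign_in(session, username, read_secret=getpass):
    """Log in without placing the password in any natural-language prompt.

    The prompt only moves focus to the password field. The secret itself is
    typed through Playwright's keyboard API, so it never reaches the model.
    """
    session.act(f"enter username {username} and click on the password field")
    # read_secret defaults to getpass(), which prompts on the terminal
    # without echoing the typed characters.
    session.page.keyboard.type(read_secret())
    session.act("sign in")
```

With a started session, sign_in(nova, "janedoe") would prompt for the password on the terminal. Passing a different read_secret callable makes it possible to pull the value from a secrets manager instead.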