OSWorld
Basic Information
OSWorld is an open research environment and benchmark for evaluating multimodal agents on open-ended tasks in real computer environments. It provides a programmatic DesktopEnv for running agent policies against virtual machines or containerized desktops, along with the interfaces and baseline agents used in the accompanying paper. The repository bundles task definitions, evaluation examples, scripts for running single and parallel experiments, and guidance for local and public verification of results. It targets researchers and developers who want to test agent behavior on GUI and web tasks inside reproducible VM- or Docker-based environments, compare performance against published baselines, and submit results to the OSWorld-Verified leaderboard. The project also supplies documentation, a citation for academic use, and downloadable init-state files to speed up setup.
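To make the idea of running an agent policy against a programmatic desktop environment concrete, here is a minimal, self-contained sketch of a reset/step interaction loop. Everything in it is an assumption made for illustration: StubDesktopEnv, run_episode, scripted_policy, the action strings, and the return shapes are hypothetical and are not OSWorld's DesktopEnv API. The stub only mimics the general pattern of an agent observing a screen state, issuing an action, and receiving a score. Consult the repository documentation for the real class and its signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Observation = Dict[str, str]


@dataclass
class StubDesktopEnv:
    """Hypothetical stand-in for a desktop environment (not OSWorld's DesktopEnv)."""

    target_action: str = "click_ok"  # assumed action that completes the toy task
    max_steps: int = 5               # assumed step budget per episode
    steps_taken: int = field(default=0, init=False)

    def reset(self) -> Observation:
        # Start a fresh episode with the toy dialog showing.
        self.steps_taken = 0
        return {"screen": "dialog_open"}

    def step(self, action: str) -> Tuple[Observation, float, bool]:
        # Apply one action, then report the new screen, a score, and a done flag.
        self.steps_taken += 1
        solved = action == self.target_action
        done = solved or self.steps_taken >= self.max_steps
        screen = "dialog_closed" if solved else "dialog_open"
        return {"screen": screen}, (1.0 if solved else 0.0), done


def run_episode(
    env: StubDesktopEnv, policy: Callable[[Observation], str]
) -> Tuple[float, int]:
    """Run one episode and return (final score, number of steps used)."""
    obs = env.reset()
    reward, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
    return reward, env.steps_taken


def scripted_policy(obs: Observation) -> str:
    # Toy policy: dismiss the dialog if it is open, otherwise do nothing.
    return "click_ok" if obs["screen"] == "dialog_open" else "noop"


if __name__ == "__main__":
    print(run_episode(StubDesktopEnv(), scripted_policy))
```

Keeping the policy as a plain callable from observation to action means a scripted baseline and a model-backed agent can be swapped without changing the loop, which is the property that makes comparisons against baselines straightforward in this kind of setup.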