WindowsAgentArena
Basic Information
Windows Agent Arena (WAA) is a reproducible, scalable platform for testing and benchmarking multimodal desktop AI agents in a realistic Windows OS environment. The repository provides the infrastructure, scripts, images, and example agents needed to deploy, run, and evaluate agentic workflows that interact with a Windows 11 virtual machine. It is intended for researchers and developers who want to measure agent performance across many GUI-driven tasks, compare screen-understanding pipelines, and run experiments locally or at scale on Azure ML. WAA includes automation to prepare a golden Windows VM image, Docker images that host the server components, configuration files for OpenAI or Azure OpenAI keys, and orchestration scripts that run baseline agents, customize agent parameters, and collect benchmark outputs.
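As a rough illustration of the key configuration mentioned above, the sketch below chooses between OpenAI and Azure OpenAI credentials in a JSON settings object. The key names (OPENAI_API_KEY, AZURE_API_KEY, AZURE_ENDPOINT) and the helper select_provider are assumptions made for this example, not confirmed details of the repository; the project's own documentation is the authority on the actual file name and fields.

```python
import json


def select_provider(config: dict) -> str:
    """Return "openai" or "azure" depending on which credentials are present.

    The key names checked here are hypothetical placeholders.
    """
    if config.get("OPENAI_API_KEY"):
        return "openai"
    if config.get("AZURE_API_KEY") and config.get("AZURE_ENDPOINT"):
        return "azure"
    raise ValueError("no usable API credentials found in config")


# Example settings as they might appear in a JSON config file (placeholder values).
example = json.loads(
    '{"AZURE_API_KEY": "<key>", "AZURE_ENDPOINT": "https://example.invalid/"}'
)
provider = select_provider(example)
print(provider)
```

Failing early when neither set of credentials is present is a deliberate choice in this sketch: a missing key is cheaper to catch before a VM or container is started than partway through a benchmark run.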