AppAgent
Basic Information
AppAgent is an open-source research framework that turns large multimodal language models into agents that can operate smartphone applications. It controls Android apps through a simplified, human-like action space (for example, taps and swipes), so it does not need back-end access to the target apps. The repository implements a two-phase method. In the exploration phase, the agent either explores apps autonomously or learns from human demonstrations to build a documentation base describing UI elements. In the deployment phase, the agent consults that documentation to complete user-specified tasks. The project includes Python scripts for learning and running agents, a YAML configuration file for choosing the model and setting request parameters, and support for real devices or Android emulators connected through adb. The code and benchmark were released alongside a CHI paper and are provided under the MIT license.
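To make the idea of a "simplified, human-like action space" concrete, the sketch below maps a few abstract actions (tap, swipe, type text, press back) onto standard `adb shell input` commands. This is not AppAgent's own code. The class names `Tap`, `Swipe`, `Text`, `Back` and the helpers `to_adb_args` and `execute` are hypothetical. The sketch assumes `adb` is on the PATH and that a device or emulator is connected. It only builds argument lists, so it can be inspected without a device; `execute` actually sends the command.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical action types: a small, human-like action space.
@dataclass
class Tap:
    x: int
    y: int


@dataclass
class Swipe:
    x1: int
    y1: int
    x2: int
    y2: int
    duration_ms: int = 300  # swipe duration passed to `input swipe`


@dataclass
class Text:
    text: str


@dataclass
class Back:
    pass


Action = Union[Tap, Swipe, Text, Back]


def to_adb_args(action: Action, serial: Optional[str] = None) -> List[str]:
    """Translate an abstract action into an `adb shell input ...` argument list."""
    prefix = ["adb"]
    if serial:
        # Target a specific device or emulator when several are connected.
        prefix += ["-s", serial]
    prefix += ["shell", "input"]

    if isinstance(action, Tap):
        return prefix + ["tap", str(action.x), str(action.y)]
    if isinstance(action, Swipe):
        return prefix + [
            "swipe",
            str(action.x1), str(action.y1),
            str(action.x2), str(action.y2),
            str(action.duration_ms),
        ]
    if isinstance(action, Text):
        # `input text` treats "%s" as a space; other shell-special
        # characters are not escaped in this simplified sketch.
        return prefix + ["text", action.text.replace(" ", "%s")]
    if isinstance(action, Back):
        return prefix + ["keyevent", "KEYCODE_BACK"]
    raise TypeError(f"unsupported action: {action!r}")


def execute(action: Action, serial: Optional[str] = None) -> None:
    """Send the action to the device (requires adb and a connected device)."""
    subprocess.run(to_adb_args(action, serial), check=True)
```

For example, `to_adb_args(Tap(100, 200))` yields `["adb", "shell", "input", "tap", "100", "200"]`. Keeping the action space this small is what lets an agent drive arbitrary apps from screenshots and UI coordinates alone, without app-specific APIs.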