Cradle
Basic Information
Cradle is a framework for researchers and developers that enables foundation language and vision models to control general desktop and game environments through the same interface a human uses: screenshots as input and keyboard and mouse actions as output. The repository packages environment-specific adapters for multiple games and desktop applications so that models can perceive screen state, plan actions, and execute skills directly in the target environment. It provides a unified runner and a modular architecture for integrating LLMs, vision modules, object detectors, OCR, and other providers, along with configuration and resource files for specific targets. The project is intended for researchers and engineers who want to prototype, run, and extend agents that perform complex computer tasks across diverse interactive software and game environments.
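The screenshot-in, keyboard-and-mouse-out loop described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: `Action`, `Environment`, `Planner`, `FakeEnvironment`, `ScriptedPlanner`, and `run_agent` are not names from the Cradle codebase, and the stand-in classes replace the real screen capture, input injection, and model calls with trivial fakes.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass(frozen=True)
class Action:
    """A single keyboard or mouse action (hypothetical structure)."""
    device: str   # "keyboard" or "mouse"
    command: str  # e.g. "press:w" or "click:100,200"


class Environment(Protocol):
    """What an environment-specific adapter might expose (assumption)."""
    def capture_screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


class Planner(Protocol):
    """What a model-backed planner might expose (assumption)."""
    def plan(self, screenshot: bytes, task: str) -> List[Action]: ...


@dataclass
class FakeEnvironment:
    """Stand-in adapter: records actions instead of sending real input."""
    frame: int = 0
    executed: List[Action] = field(default_factory=list)

    def capture_screenshot(self) -> bytes:
        # A real adapter would grab the screen; here we return a labelled frame.
        self.frame += 1
        return f"frame-{self.frame}".encode()

    def execute(self, action: Action) -> None:
        self.executed.append(action)


class ScriptedPlanner:
    """Stand-in for an LLM planner: returns the same two actions every step."""
    def plan(self, screenshot: bytes, task: str) -> List[Action]:
        return [Action("keyboard", "press:w"), Action("mouse", "click:100,200")]


def run_agent(env: Environment, planner: Planner, task: str, max_steps: int) -> int:
    """Run perceive -> plan -> act for max_steps steps; return actions executed."""
    count = 0
    for _ in range(max_steps):
        screenshot = env.capture_screenshot()          # perceive
        for action in planner.plan(screenshot, task):  # plan
            env.execute(action)                        # act
            count += 1
    return count


if __name__ == "__main__":
    env = FakeEnvironment()
    total = run_agent(env, ScriptedPlanner(), task="open the map", max_steps=3)
    print(total, env.frame)
```

Keeping the environment and planner behind small interfaces is one way to obtain the modularity the description mentions: a different game adapter or model provider could be substituted without changing the loop itself.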