
Basic Information

Cradle is a research and development framework that lets foundation language and vision models control general desktop and game environments through the same interface a human uses: screenshots as input, keyboard and mouse actions as output. The repository packages environment-specific adapters for multiple games and desktop applications so that models can perceive screen state, plan actions, and execute skills in situ. It provides a unified runner and a modular architecture for integrating LLMs, vision modules, object detectors, OCR and other providers, plus configuration and resource files for specific targets. The project is intended for researchers and engineers who want to prototype, run and extend agents that perform complex computer tasks across diverse interactive software and game environments.

App Details

Features
The codebase bundles per-environment configurations, resources and prompts for multiple games and applications, including Red Dead Redemption 2, Stardew Valley, Cities: Skylines, Dealer's Life 2 and several desktop apps. Core components include a skill registry with atomic and composite skills, environment adapters, a unified runner that orchestrates execution flow, and provider modules for LLM calls, object detection, video and image augmentation, SAM segmentation and icon replacement. It also ships OCR setup instructions, prompt templates for action planning, information gathering and self-reflection, and configuration files for OpenAI, Azure and Claude backends. The repository further includes migration guides, example prompts and saves, logging and memory modules, and tooling for pausing and unpausing real-time games.
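
To make the idea of a skill registry with atomic and composite skills more concrete, here is a minimal sketch in Python. It is not taken from the Cradle codebase: the names SKILLS, register_skill, press_key, move_mouse, click_at and execute are all hypothetical, and actions are represented as plain strings rather than real keyboard or mouse events.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping skill names to callables.
# Illustrative only; this is not Cradle's actual API.
SKILLS: Dict[str, Callable[..., List[str]]] = {}


def register_skill(name: str):
    """Decorator that stores a function in the registry under `name`."""
    def decorator(fn: Callable[..., List[str]]):
        SKILLS[name] = fn
        return fn
    return decorator


@register_skill("press_key")
def press_key(key: str) -> List[str]:
    # Atomic skill: maps to a single low-level action.
    return [f"key:{key}"]


@register_skill("move_mouse")
def move_mouse(x: int, y: int) -> List[str]:
    # Atomic skill: maps to a single low-level action.
    return [f"mouse_move:{x},{y}"]


@register_skill("click_at")
def click_at(x: int, y: int) -> List[str]:
    # Composite skill: built by chaining an atomic skill with one more action.
    return SKILLS["move_mouse"](x=x, y=y) + ["mouse_click:left"]


def execute(name: str, **kwargs) -> List[str]:
    """Look up a skill by name and return the low-level actions it produces."""
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    return SKILLS[name](**kwargs)
```

In this sketch a planner would only need to emit a skill name and its arguments, for example execute("click_at", x=10, y=20), and the registry would expand that into low-level actions.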
Use Cases
Cradle helps teams and researchers accelerate development of agents that interact with graphical interfaces by providing reusable building blocks and concrete environment implementations. It simplifies integrating large language and vision models into closed-loop control by handling screen capture, perception, skill execution and LLM invocation through a consistent API. The provided configs, prompts, and migration guidance reduce the effort required to add new games or applications. Support for multiple model providers and included demos, videos and a companion paper make it useful for experimentation, demonstrations and reproducible research into general computer control.
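
The closed loop described above (capture the screen, perceive and plan, execute an action, repeat) can be sketched roughly as follows. This is an assumption-laden illustration, not Cradle's runner: Observation, run_loop, fake_capture and fake_plan are hypothetical names, the screen is simulated with short text strings instead of screenshots, and the planner is a stub standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    step: int
    screen_text: str  # stand-in for a screenshot plus OCR output


def run_loop(
    capture: Callable[[int], Observation],
    plan: Callable[[Observation], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 5,
) -> List[str]:
    """Capture, plan and act until the planner returns None or steps run out."""
    history: List[str] = []
    for step in range(max_steps):
        obs = capture(step)   # perceive the current screen state
        action = plan(obs)    # a model call would go here
        if action is None:    # planner signals that the task is finished
            break
        act(action)           # send the keyboard or mouse action
        history.append(action)
    return history


# Simulated environment: two dialogs to dismiss, then an idle desktop.
SCREENS = ["dialog: Save changes?", "dialog: Confirm", "desktop"]


def fake_capture(step: int) -> Observation:
    return Observation(step, SCREENS[min(step, len(SCREENS) - 1)])


def fake_plan(obs: Observation) -> Optional[str]:
    return "click:OK" if obs.screen_text.startswith("dialog") else None


executed: List[str] = []
history = run_loop(fake_capture, fake_plan, executed.append)
```

Passing capture, plan and act in as callables mirrors the idea of swappable providers: a real setup would substitute a screenshot grabber, a model-backed planner and an input controller without changing the loop itself.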
