Basic Information

CRAB is a Python-centric framework and benchmark suite designed to build, run, and evaluate language-model-driven agents across multiple types of environments. It targets researchers and developers who need a reproducible way to create agent environments, define tasks, and measure agent behavior, whether the environment runs in memory, in a Docker container, in a virtual machine, or across distributed physical machines, as long as its operations are exposed as Python functions. The repository includes a benchmark collection (crab-benchmark-v0) with datasets and experiment code, example scripts that run template environments with OpenAI models, installation instructions, a demo video, and a reference paper on arXiv. CRAB emphasizes a unified interface that lets an agent access different environments concurrently, and it supports multimodal embodied language-model agents.

App Details

Features
CRAB provides cross-platform, multi-environment support that unifies access to diverse deployment backends through Python functions. Its configuration model is simple: actions are added by decorating Python functions with @action, and environments are composed by integrating those actions, as sketched below. The repository contains a Python-native benchmarking API for defining tasks and evaluators, including a novel graph evaluator for fine-grained metrics. It ships example scripts (single_env and multi_env) showing OpenAI agent integration, a benchmark directory (crab-benchmark-v0) with datasets and experiments, documentation and demo materials, and community links for collaboration.
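To make the configuration model concrete, here is a minimal sketch of the @action pattern. Only the @action decorator itself comes from the description above; the import path, function names, and stub bodies are assumptions for illustration, not excerpts from the CRAB source.

    # Minimal sketch of CRAB's decorator-based action definition.
    # The import path is an assumption; @action is the mechanism
    # described in the Features section.
    from crab import action

    @action
    def click(x: int, y: int) -> None:
        """Click the screen at pixel coordinates (x, y)."""
        # A real environment would forward this call to its backend
        # (an in-memory object, Docker container, VM, or remote machine).
        print(f"clicked at ({x}, {y})")

    @action
    def screenshot() -> str:
        """Return an encoding of the current screen contents."""
        return "screenshot-placeholder"

Because actions are ordinary Python functions, the same pattern applies no matter where the environment actually executes, which is what lets a single agent interface drive several backends concurrently.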
Use Cases
CRAB helps teams and researchers rapidly prototype agent environments and compare agent policies under consistent conditions across different execution contexts. By exposing a unified Python interface and simple action decorators, it lowers the engineering effort to add new actions, compose environments, and run experiments. The packaged benchmark datasets and evaluators make it easier to reproduce published results and to apply the provided graph evaluator for detailed performance analysis. Example scripts demonstrate end-to-end runs with external LLMs, and installation is standard via pip for Python 3.10 or newer, enabling straightforward adoption for benchmarking multimodal embodied language-model agents.
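Since the listing states only that installation is via pip under Python 3.10 or newer and that the examples use OpenAI models, a small preflight check like the one below can confirm both requirements before running the example scripts. The pip package name in the comment is an assumed name, not a verified command; consult the repository's installation instructions for the exact steps.

    # Preflight check for running the CRAB examples, based on the stated
    # requirements: Python 3.10+ and an OpenAI API key for the example agents.
    # The package name below is an assumed PyPI name for illustration:
    #     pip install crab-framework
    import os
    import sys

    assert sys.version_info >= (3, 10), "CRAB requires Python 3.10 or newer"
    assert os.environ.get("OPENAI_API_KEY"), "OpenAI examples need an API key"
    print("environment ready; try the single_env example script")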
