Basic Information

CRAB is a Python-centric framework and benchmark suite designed to build, run, and evaluate language-model-driven agents across multiple types of environments. It targets researchers and developers who need a reproducible way to create agent environments, define tasks, and measure agent behavior, whether the environment runs in memory, in a Docker container, in a virtual machine, or across distributed physical machines, as long as its operations are exposed as Python functions. The repository includes a benchmark collection (crab-benchmark-v0) with datasets and experiment code, example scripts that run template environments with OpenAI models, installation instructions, a demo video, and a reference paper on arXiv. CRAB emphasizes a unified interface that lets an agent access different environments concurrently, and it supports multimodal embodied language-model agents.

App Details

Features
CRAB provides cross-platform, multi-environment support that unifies access to diverse deployment backends through Python functions. Its configuration model is simple: actions are added by decorating Python functions with @action, and environments are composed by integrating those actions, as sketched below. The repository contains a Python-native benchmarking API for defining tasks and evaluators, including a novel graph evaluator for fine-grained metrics. It ships example scripts (single_env and multi_env) showing OpenAI agent integration, a benchmark directory (crab-benchmark-v0) with datasets and experiments, documentation and demo materials, and community links for collaboration.
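To make the configuration model concrete, here is a minimal sketch of the @action pattern. Only the @action decorator itself comes from the description above; the import path, function names, and stub bodies are assumptions for illustration, not excerpts from the CRAB source.

    # Minimal sketch of CRAB's decorator-based action definition.
    # The import path is an assumption; @action is the mechanism
    # described in the Features section.
    from crab import action

    @action
    def click(x: int, y: int) -> None:
        """Click the screen at pixel coordinates (x, y)."""
        # A real environment would forward this call to its backend
        # (an in-memory object, Docker container, VM, or remote machine).
        print(f"clicked at ({x}, {y})")

    @action
    def screenshot() -> str:
        """Return an encoding of the current screen contents."""
        return "screenshot-placeholder"

Because actions are ordinary Python functions, the same pattern applies no matter where the environment actually executes, which is what lets a single agent interface drive several backends concurrently.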
Use Cases
CRAB helps teams and researchers rapidly prototype agent environments and compare agent policies under consistent conditions across different execution contexts. By exposing a unified Python interface and simple action decorators, it lowers the engineering effort to add new actions, compose environments, and run experiments. The packaged benchmark datasets and evaluators make it easier to reproduce published results and to apply the provided graph evaluator for detailed performance analysis. Example scripts demonstrate end-to-end runs with external LLMs, and installation is standard via pip for Python 3.10 or newer, enabling straightforward adoption for benchmarking multimodal embodied language-model agents.
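Since the listing states only that installation is via pip under Python 3.10 or newer and that the examples use OpenAI models, a small preflight check like the one below can confirm both requirements before running the example scripts. The pip package name in the comment is an assumed name, not a verified command; consult the repository's installation instructions for the exact steps.

    # Preflight check for running the CRAB examples, based on the stated
    # requirements: Python 3.10+ and an OpenAI API key for the example agents.
    # The package name below is an assumed PyPI name for illustration:
    #     pip install crab-framework
    import os
    import sys

    assert sys.version_info >= (3, 10), "CRAB requires Python 3.10 or newer"
    assert os.environ.get("OPENAI_API_KEY"), "OpenAI examples need an API key"
    print("environment ready; try the single_env example script")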
