Basic Information

This repository implements and documents CodeAct, a proposal to use executable code as a unified action space for LLM agents. It accompanies an ICML paper and provides the research artifacts needed to reproduce experiments: an instruction-tuning dataset (CodeActInstruct), trained agent checkpoints (CodeActAgent variants), evaluation materials (M3ToolEval and API-Bank analysis), and deployment examples. The codebase includes a model-serving workflow that exposes models via an OpenAI-compatible API, a containerized code execution engine that runs a Jupyter kernel per chat session, a chat user interface and a simple Python demo client, plus scripts for Docker and optional Kubernetes deployment. The README bundles step-by-step instructions to serve models via vLLM or llama.cpp, start the execution engine, run the chat UI or CLI demo, and reproduce data generation and training experiments for research and development.
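As a concrete illustration of the serving workflow, the minimal Python sketch below queries a locally served model through the OpenAI-compatible API. The endpoint URL, API key, and model name are assumptions, not values taken from the README; replace them with whatever your vLLM or llama.cpp server actually exposes.

```python
# Minimal sketch: chat with a locally served CodeActAgent model through the
# OpenAI-compatible API. The base_url, api_key, and model name below are
# placeholders -- adjust them to match your vLLM / llama.cpp serving setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed address of the local server
    api_key="EMPTY",                      # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="CodeActAgent-Mistral-7b-v0.1",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Compute the mean of [3, 1, 4, 1, 5] with Python."},
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)
```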

App Details

Features
CodeAct unifies agent actions as executable Python code that a code execution engine runs and whose output the agent inspects during multi-turn interactions. The repo ships CodeActInstruct, a 7k multi-turn instruction-tuning dataset, and two released agent checkpoints: CodeActAgent-Mistral-7b-v0.1 and CodeActAgent-Llama-7b. It provides ready-to-use components: model-serving examples (vLLM and llama.cpp) that expose models behind an OpenAI-compatible API, a per-session Jupyter-based containerized code executor, a chat UI with optional MongoDB persistence, and a command-line demo script. Infrastructure tooling includes Docker scripts, Kubernetes deployment guidance, data generation and model training docs, evaluation benchmarks and scripts, and conversion/quantization guidance for running models on laptops. The repo also contains scripts to start services, reproduce experiments, and evaluate agent performance.
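The per-session executor design can be illustrated with a short sketch. The code below is not the repository's implementation; it only shows, using the standard jupyter_client package, how one kernel per chat session might run a code action and collect its textual output. The containerization layer described above is omitted, and the SessionExecutor class is a hypothetical name for illustration.

```python
# Illustrative sketch (not the repo's executor): one Jupyter kernel per chat
# session, used to run a code action and capture its textual output.
from jupyter_client.manager import KernelManager


class SessionExecutor:
    """Owns a single kernel for the lifetime of one chat session."""

    def __init__(self):
        self.km = KernelManager()
        self.km.start_kernel()
        self.kc = self.km.client()
        self.kc.start_channels()
        self.kc.wait_for_ready(timeout=30)

    def run(self, code: str, timeout: float = 30.0) -> str:
        msg_id = self.kc.execute(code)
        outputs = []
        while True:
            msg = self.kc.get_iopub_msg(timeout=timeout)
            if msg["parent_header"].get("msg_id") != msg_id:
                continue  # ignore output from unrelated requests
            msg_type, content = msg["msg_type"], msg["content"]
            if msg_type == "stream":
                outputs.append(content["text"])
            elif msg_type in ("execute_result", "display_data"):
                outputs.append(content["data"].get("text/plain", ""))
            elif msg_type == "error":
                outputs.append("\n".join(content["traceback"]))
            elif msg_type == "status" and content["execution_state"] == "idle":
                break  # execution finished
        return "".join(outputs)

    def close(self):
        self.kc.stop_channels()
        self.km.shutdown_kernel(now=True)
```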
Use Cases
CodeAct makes it practical to build LLM agents that perform reliable, observable actions by executing code and iteratively revising their behavior based on runtime results. The README and artifacts enable researchers and engineers to reproduce the paper's claims, deploy the agent stack locally or on clusters, and test models via a chat UI or CLI. The authors report success rates up to roughly 20% higher than text- or JSON-based action formats on the evaluated benchmarks, suggesting stronger tool use and robustness. The per-session containerized Jupyter executor provides isolation and ephemeral state for safe code runs. Provided datasets, checkpoints, evaluation suites, and deployment scripts reduce friction for experimentation, fine-tuning, and integration into applications.
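To make the execute-and-revise interaction pattern concrete, the sketch below outlines a simple multi-turn loop under stated assumptions: it reuses the OpenAI-compatible client and the illustrative SessionExecutor from the earlier sketches, and it assumes the model wraps code actions in <execute> tags (the tag convention is an assumption here, not the repository's specification).

```python
# Illustrative multi-turn loop: the model proposes code, the executor runs it,
# and the execution result is fed back as the next observation.
# Assumes the client / SessionExecutor sketches above and an <execute> tag
# convention for code actions (the tag name is an assumption, not the repo spec).
import re


def run_episode(client, executor, task: str, model: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    text = ""
    for _ in range(max_turns):
        reply = client.chat.completions.create(model=model, messages=messages)
        text = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": text})

        match = re.search(r"<execute>(.*?)</execute>", text, re.DOTALL)
        if match is None:
            return text  # no code action -> treat the reply as the final answer

        observation = executor.run(match.group(1))
        messages.append({"role": "user", "content": f"Execution output:\n{observation}"})
    return text
```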
