
Basic Information

MLGym is a research-oriented Gym-style environment and benchmark designed to evaluate and train AI research agents, including reinforcement learning algorithms and LLM-based agents. It provides MLGym-Bench, a collection of 13 diverse, open-ended ML research tasks spanning computer vision, natural language processing, reinforcement learning, and game theory. The project targets researchers who need a controlled environment to generate data, implement methods, train models, run experiments, analyze results, and iterate on ideas. The framework is experimental and under active development; the main branch is intended to hold the latest stable release, with breaking changes documented as they occur. It emphasizes benchmarking and advancing AI research agents rather than production end-user applications.
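As a rough illustration of what a Gym-style research environment implies, the sketch below shows a generic reset/step interaction loop. The class and method names (MLGymEnv, run_episode) are hypothetical stand-ins, not the framework's documented API.

```python
# Hypothetical sketch of a Gym-style interaction loop. MLGymEnv and its
# reset/step methods are illustrative assumptions, not MLGym's actual API.

class MLGymEnv:
    """Toy stand-in for a Gym-style research environment."""

    def reset(self):
        # Return an initial observation describing the task.
        return {"task": "train a small image classifier", "step": 0}

    def step(self, action):
        # Apply the agent's action (e.g. a command or code edit) and
        # return (observation, reward, done, info).
        obs = {"task": "train a small image classifier", "step": 1}
        return obs, 0.0, True, {}


def run_episode(env, agent_policy, max_steps=10):
    """Drive one agent episode against the environment."""
    obs = env.reset()
    reward = 0.0
    for _ in range(max_steps):
        action = agent_policy(obs)
        obs, reward, done, info = env.step(action)
        if done:
            break
    return reward


if __name__ == "__main__":
    final_reward = run_episode(MLGymEnv(), agent_policy=lambda obs: "noop")
    print("episode reward:", final_reward)
```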

App Details

Features
MLGym bundles a benchmark suite of 13 research tasks across multiple ML domains and provides configurable task and agent YAML files to define experiments. It includes a CLI entry point, run.py, with flags for model selection, container type, GPU usage, cost limits, temperature, and max steps. The repo supports containerized execution with Docker or Podman and references a published container image for running agents. Installation instructions cover creating a conda environment and performing an editable pip install. Environment variables and API keys are stored in a .env file based on a provided example. GPU support, troubleshooting steps for the NVIDIA container runtime, and Podman-specific setup are documented. A Streamlit-based trajectory visualizer and demo scripts are provided for inspecting agent trajectories. Licensing and citation information are included.
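As a hedged sketch of how such a run might be scripted, the example below loads API keys from a .env file and then invokes run.py. The flag names and values (--model, --container_type, --max_steps, --cost_limit, --temperature) and the paths are assumptions for illustration; the repository's documentation defines the actual CLI.

```python
"""Minimal sketch of scripting an MLGym experiment run.
All run.py flags below are assumed, not the documented CLI."""

import subprocess
from dotenv import load_dotenv  # pip install python-dotenv

# Load API keys and other settings from the repo's .env file (path assumed).
load_dotenv(".env")

cmd = [
    "python", "run.py",
    "--model", "gpt-4o",            # assumed flag: which model the agent uses
    "--container_type", "docker",   # assumed flag: docker or podman
    "--max_steps", "50",            # assumed flag: cap on agent steps
    "--cost_limit", "5.00",         # assumed flag: per-run spend limit
    "--temperature", "0.0",         # assumed flag: sampling temperature
]

# Launch the experiment and raise if run.py exits with a non-zero status.
subprocess.run(cmd, check=True)
```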
Use Cases
MLGym helps researchers by providing a unified, reproducible platform to benchmark and iterate on AI research agents and RL training protocols. Predefined task and agent configuration files let teams replicate experiments and compare methods consistently. Containerized workflows with Docker or Podman, backed by a published container image, simplify environment setup and let experiments use GPUs when available. The run.py interface exposes experiment controls such as model choice, timeouts, cost limits, and step caps to manage runs. The Streamlit trajectory visualizer aids analysis of agent behavior and trajectories. Documentation covering installation, troubleshooting, contributions, and citation supports adoption and academic use. The project is maintained by research groups and is intended for research and benchmarking rather than production deployment.
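As a minimal sketch of trajectory analysis outside the Streamlit visualizer, the example below loads a saved trajectory and prints a per-step summary. The file path and JSON fields ("steps", "action") are assumed for illustration and do not reflect a documented trajectory schema.

```python
"""Hypothetical post-hoc look at a saved agent trajectory."""

import json
from pathlib import Path


def summarize_trajectory(path: Path) -> None:
    # Assumes the trajectory is a JSON object with a "steps" list whose
    # entries record an "action"; adjust to the actual file format.
    trajectory = json.loads(path.read_text())
    steps = trajectory.get("steps", [])
    print(f"{path.name}: {len(steps)} steps")
    for i, step in enumerate(steps):
        action = str(step.get("action", ""))[:60]
        print(f"  step {i}: {action}")


if __name__ == "__main__":
    # Path is illustrative; point it at a trajectory file produced by a run.
    summarize_trajectory(Path("trajectories/example_run.json"))
```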
