MLE-bench
Basic Information
Code and tooling to reproduce MLE-bench, a benchmark for evaluating AI agents on machine-learning engineering tasks. The repository provides the code used to construct the dataset, the evaluation and grading logic, and the agent implementations used in the benchmark study. It packages 75 Kaggle competitions with scripts that download the raw data, split each competition's original training set into new training and test sets, and prepare both the full dataset and a smaller lite version. The repository also includes a leaderboard of evaluated agents, example usage, experimental artifacts from the paper, and extras such as rule-violation and plagiarism detectors. It supplies a base Docker environment along with guidance on recommended compute resources and evaluation procedures for reproducible benchmarking.
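For orientation, below is a minimal Python sketch of the prepare-and-grade flow described above: the original training set is split into a new training set, an unlabeled test set shown to the agent, and held-out answers used for grading. This is an illustration only, not the repository's actual code; the function names, column names, split fraction, and the accuracy metric are assumptions for the example (each competition defines its own metric).

```python
"""Illustrative sketch of a prepare/grade flow; all names are hypothetical."""
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare(train_csv: str, test_frac: float = 0.1, seed: int = 0):
    """Split the original Kaggle training set into a new train set,
    an unlabeled test set for the agent, and held-out answers."""
    df = pd.read_csv(train_csv)
    new_train, held_out = train_test_split(df, test_size=test_frac, random_state=seed)
    answers = held_out[["id", "label"]]             # hypothetical column names
    public_test = held_out.drop(columns=["label"])  # labels hidden from the agent
    return new_train, public_test, answers


def grade(submission: pd.DataFrame, answers: pd.DataFrame) -> float:
    """Score a submission against the held-out answers
    (accuracy stands in here for each competition's own metric)."""
    merged = submission.merge(answers, on="id", suffixes=("_pred", "_true"))
    return (merged["label_pred"] == merged["label_true"]).mean()
```

Holding out a locally graded test split in this way is what allows submissions to be scored without access to Kaggle's private test labels.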