
Basic Information

AIDE ML is the open-source reference implementation of the AIDE algorithm, a machine-learning engineering agent that autonomously drafts, debugs and benchmarks code to optimise a user-defined metric. It is delivered as a research-friendly Python package with a CLI, a Streamlit web UI, an HTML visualiser and configuration presets, so academics and engineer-researchers can replicate the paper, reproduce experiments, test new search heuristics or LLM back ends, and prototype end-to-end ML pipelines. The repository implements an agentic tree search over code in which each Python script becomes a node and LLM-generated patches spawn child nodes. It also provides model-neutral plumbing for OpenAI, Anthropic, Gemini or local LLMs that speak the OpenAI API, and includes example tasks, logging and artefact output for experiment analysis.
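
To make the tree-search idea concrete, the minimal sketch below shows one way such a solution tree could be represented. The class and field names are illustrative assumptions for exposition, not the package's actual internals.

    from dataclasses import dataclass, field

    # Illustrative sketch only: AIDE ML's real node structure may differ.
    @dataclass
    class SolutionNode:
        code: str                              # full Python script held at this node
        metric: "float | None" = None          # validation metric after executing the script
        parent: "SolutionNode | None" = None   # None for an initial draft
        children: "list[SolutionNode]" = field(default_factory=list)

        def spawn_child(self, patched_code: str) -> "SolutionNode":
            """Attach a child node produced by an LLM-generated patch."""
            child = SolutionNode(code=patched_code, parent=self)
            self.children.append(child)
            return child

In this picture, the search repeatedly selects a promising node, asks the LLM for a patch, executes the resulting script, and uses the measured metric to decide which branches to expand or prune.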

Links

Categorization

App Details

Features
Natural-language task specification lets users point the agent at a dataset and describe the goal and metric in plain English, without bespoke wrappers. Iterative agentic tree search in which metric feedback prunes and guides exploration and LLM-generated drafts spawn candidate patches. HTML visualiser to inspect the full solution tree and the code attached to each node. Streamlit UI for interactive runs, live logs and tree inspection. Model-neutral plumbing supporting OpenAI, Anthropic, Gemini and local LLMs via OpenAI-compatible endpoints. CLI with advanced flags (for example agent.code.model, agent.steps, agent.search.num_drafts) and a Python API for programmatic experiments. Utility outputs include logs/<id>/best_solution.py and logs/<id>/tree_plot.html. A Dockerfile and development install instructions are provided. The README cites empirical results on MLE-Bench showing strong performance.
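
As a rough illustration of the programmatic workflow, the sketch below shows how an experiment might be configured and run through the Python API. The exact class name, constructor arguments and result attributes (Experiment, data_dir, goal, eval, valid_metric) are assumptions based on the README's description and should be checked against the repository's documentation.

    import aide

    # Hedged sketch of the described Python API; names may differ in the package.
    exp = aide.Experiment(
        data_dir="example_tasks/house_prices",   # one of the bundled example tasks
        goal="Predict the sales price for each house",
        eval="RMSE between the log of predicted and observed sale prices",
    )

    best = exp.run(steps=10)                      # number of tree-search iterations
    print("Validation metric:", best.valid_metric)
    print(best.code)                              # best script, also saved under logs/<id>/best_solution.py

According to the feature list above, the same kind of run can also be launched from the CLI, with overrides such as agent.steps or agent.code.model passed as flags.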
Use Cases
The repository helps agent-architecture researchers by providing a lean, extendable implementation in which new search heuristics, evaluators or LLM back ends can be swapped in, and by making the AIDE paper and related research reproducible. It helps ML practitioners by automating iterative code generation, debugging and benchmarking, so teams can quickly optimise models and pipelines for a chosen metric. Users can run experiments via the CLI, the Streamlit UI or the Python API, store logs, and inspect the best solution and the full solution tree. Support for local LLMs and Docker enables private or reproducible runs. The package produces artefacts for analysis and citation metadata for academic use, accelerating prototyping, ablation studies and deployment-oriented research.
