Basic Information

Curie is an AI-agent framework designed to automate rigorous scientific experimentation. The repository provides tools to orchestrate an end-to-end research workflow covering hypothesis formulation, experiment implementation, execution, verification, analysis and automated reporting. It targets researchers and developers who want reproducible, methodical experiments in areas such as machine learning engineering and system analysis. Curie accepts user-provided starter code and datasets, can perform automated model and strategy searches via an AutoML feature, and produces reproducible artifacts such as experiment scripts, workspace directories, logs and result notebooks. The project is distributed as a pip package and also offers a manual developer installation that uses Docker. The README includes tutorials, example benchmarks drawn from MLE-Bench, demo materials and pointers to published papers that document the framework and its evaluation.
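For orientation, here is a minimal sketch of what a programmatic run could look like. The package name in the comment, the curie.experiment(...) entry point and its parameter names are assumptions made for illustration rather than the documented interface; the README and tutorials describe the actual installation steps and API.

    # Hypothetical usage sketch: the entry point and parameter names below are
    # assumptions, not the documented Curie API.
    # Installation (PyPI package name assumed): pip install curie-ai
    import curie

    result = curie.experiment(
        question="Does a larger batch size improve validation accuracy on my dataset?",
        workspace_name="batch_size_study",  # assumed: where generated scripts, logs and reports land
        dataset_dir="./data",               # assumed: user-provided dataset
        code_dir="./starter_code",          # assumed: user-provided starter code
    )
    print(result)  # assumed: summary describing the produced artifacts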

App Details

Features
- Automated experimentation pipeline spanning hypothesis generation, experiment construction, execution, result analysis and reflection.
- Built-in verification modules and reporting that strengthen the rigor, reliability and reproducibility of experimental procedures.
- Broad applicability to ML engineering and system analysis, with example benchmarks and use cases.
- Support for user-supplied starter code and arbitrary datasets, so experiments can run on custom codebases and data.
- AutoML-style search for promising ML models and strategies.
- Automatic generation of experiment reports, notebooks, plain-text result files, logs and reproducible workspace scripts (see the inventory sketch after this list).
- Distribution as a pip package, plus a manual developer installation that uses Docker.
- Documentation with tutorials, example experiment logs and demo videos.
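To illustrate the artifact trail a run leaves behind, the standard-library snippet below inventories a finished workspace. The directory name and the extension-to-artifact mapping are assumptions about the layout of a run, so adjust them to whatever an actual run produces.

    # Inventory the artifacts of a finished run. The workspace path and the
    # extension-to-artifact mapping are assumptions about the layout.
    from pathlib import Path

    workspace = Path("workspace/batch_size_study")  # assumed location of a finished run
    patterns = {
        "reports":   "*.md",     # experiment reports
        "notebooks": "*.ipynb",  # result notebooks
        "results":   "*.txt",    # plain-text result files
        "logs":      "*.log",    # execution logs
        "scripts":   "*.sh",     # reproducible workspace scripts
    }

    for label, pattern in patterns.items():
        files = sorted(workspace.rglob(pattern))
        print(f"{label}: {len(files)} file(s)")
        for path in files:
            print(f"  {path}")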
Use Cases
Curie reduces manual overhead in scientific experimentation by automating repetitive and error-prone steps, enabling faster iteration and more consistent procedures. It enforces methodological checks to improve experiment validity and supports reproducibility by writing runnable workspace scripts, logs and detailed reports for each run. For ML researchers, it can explore model and training configurations, run benchmarks from MLE-Bench, and output notebooks and summary reports to aid interpretation. The framework helps teams reproduce results, compare strategies systematically, and archive complete experiment artifacts. Community support channels, example benchmarks and published papers are available to help users adopt the tool and cite its methodology. The project is Apache 2.0 licensed for research and development use.
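As a sketch of how a systematic strategy comparison might be scripted on top of the framework, the loop below launches one archived run per candidate strategy. It reuses the assumed curie.experiment(...) entry point from the sketch above, so the call and its parameters remain illustrative assumptions rather than the documented API.

    # Hypothetical comparison loop: curie.experiment() and its parameters are
    # assumed for illustration, not the documented interface.
    import curie

    strategies = ["gradient_boosting", "random_forest", "linear_baseline"]

    for strategy in strategies:
        curie.experiment(
            question=f"How well does a {strategy} model predict the target in my dataset?",
            workspace_name=f"compare_{strategy}",  # assumed: one archived workspace per strategy
            dataset_dir="./data",                  # assumed: shared user-provided dataset
        )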
