Basic Information

This repository hosts ToRA, a research codebase and model suite for tool-integrated reasoning agents aimed at solving challenging mathematical problems. It provides the implementations, training pipeline, evaluation logic, example data, and released model checkpoints associated with the ICLR 2024 paper. The project focuses on agents that interleave natural language reasoning with programmatic tool calls to computation libraries and symbolic solvers. The repo includes instructions and scripts for setup, inference, evaluation, and training so researchers can reproduce experiments, run the provided models, or construct their own datasets. It also publishes model outputs and points to released models on model hubs, enabling replication and analysis of the reported results.
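
The interleaved format can be pictured as a loop in which the model alternates between free-form rationale and executable code, with each tool result appended back into the context before the next generation step. The sketch below is illustrative only and is not ToRA's actual implementation; the generate callable, the run_python helper, and the ```python / ```output block convention are assumptions made for the example.

```python
import re
import io
import contextlib

def run_python(code: str) -> str:
    """Execute a generated code snippet and capture its stdout.
    Illustrative only; a real system would sandbox this call."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # assumption: no sandboxing shown here
    return buf.getvalue().strip()

def tool_integrated_loop(generate, question: str, max_rounds: int = 3) -> str:
    """Interleave model text with tool calls until no further code is emitted.

    `generate` stands in for any LM call that continues the prompt; the
    fenced-block convention mirrors the interleaved format described
    above, but the details are assumptions, not ToRA's exact format.
    """
    trajectory = question
    for _ in range(max_rounds):
        step = generate(trajectory)
        trajectory += step
        match = re.search(r"```python\n(.*?)```", step, re.DOTALL)
        if match is None:  # no tool call: treat this step as the final rationale
            break
        result = run_python(match.group(1))
        trajectory += f"\n```output\n{result}\n```\n"
    return trajectory
```

In practice such a loop would also sandbox execution and bound the length of tool output; both are omitted here for brevity.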

App Details

Features
The codebase integrates natural language reasoning with external tools and provides a training pipeline that includes imitation learning and output-space shaping. It ships multiple open-source ToRA models spanning sizes from 7B to 70B and highlights ToRA-Code-34B as achieving over 50% pass@1 on the MATH dataset. The repository contains inference and training scripts, an evaluation grader based on Hendrycks' MATH grading system, example data in the data/tora folder, and exported model outputs. The recommended setup uses Conda, vLLM for accelerated inference, and PyTorch for training. The project also documents how to run reproducible evaluations and offers prebuilt configs for common experiments.
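
As one concrete illustration of the vLLM-based inference path, the snippet below loads a released checkpoint and generates a completion. The model identifier, prompt format, and stop marker are assumptions made for the example; consult the repository's inference scripts for the exact prompts and decoding settings.

```python
from vllm import LLM, SamplingParams

# Model ID is an assumption; the README lists the released checkpoints.
llm = LLM(model="llm-agents/tora-code-7b-v1.0")

# Greedy decoding; stopping at a tool-output marker is an assumption.
params = SamplingParams(temperature=0.0, max_tokens=1024, stop=["```output"])

prompts = ["Question: What is the sum of the first 100 positive integers?\n\nSolution:"]
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].text)
```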
Use Cases
ToRA helps researchers and engineers studying reasoning and program-aided problem solving by providing ready-to-run models, training code, and evaluation tooling. Users can reproduce published results, run inference with released checkpoints, examine model outputs, and evaluate predictions using the provided grader. The training pipeline and example data enable teams to construct custom datasets and retrain or fine-tune agents. The repo's use of vLLM and clear setup instructions simplify inference at scale, while the open-sourced models and outputs support comparative analysis, ablation studies, and further research into tool-integrated reasoning for mathematical tasks.
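
To make the grading step concrete, the following is a minimal sketch of the kind of answer-equivalence check a MATH-style grader performs: exact string match first, then a symbolic fallback. It is not the repository's grader; the answers_match helper and its fallback logic are assumptions for illustration.

```python
from sympy import simplify, sympify

def answers_match(prediction: str, reference: str) -> bool:
    """Loose equivalence check in the spirit of MATH-style grading.

    A simplified illustration, not the repository's grader: compares
    normalized strings first, then falls back to symbolic equality.
    """
    if prediction.strip() == reference.strip():
        return True
    try:
        return simplify(sympify(prediction) - sympify(reference)) == 0
    except Exception:  # unparseable answers are graded as non-matching
        return False

assert answers_match("0.5", "1/2")   # numerically equivalent forms match
assert not answers_match("3", "4")   # distinct values do not
```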
