
Basic Information

ReCall is a research and engineering repository that provides a training and evaluation framework for teaching large language models to reason by calling external tools via reinforcement learning. The project focuses on enabling LLMs to agentically use and combine arbitrary user-defined tools, without supervised trajectories for either tool use or stepwise reasoning. The implementation succeeds the earlier ReSearch project and contains code, data preparation scripts, training recipes, serving utilities and evaluation scripts for reproducing and extending the approach. The repo is aimed at developers and researchers who want to train, fine-tune, evaluate and serve models that learn tool-based multi-step reasoning. It bundles a customized reinforcement learning stack, sandboxed tool execution, retriever services and inference wrappers that orchestrate model generation and tool execution in experiments and benchmarks.
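To make that orchestration concrete, the following is a minimal, illustrative sketch (not repository code) of the generate-then-execute loop such a framework coordinates: the model is sampled, any tool call it emits is run, and the tool result is appended to the context before generation resumes. The function names, the <tool_call>/<tool_result> tag convention, and the loop structure are assumptions made for this example.

    import re

    def run_episode(generate, execute_tool, prompt, max_turns=8):
        """Alternate model generation with tool execution until a final answer.

        `generate` and `execute_tool` are assumed callables: one samples the
        model given the transcript so far, the other runs a tool call string
        and returns its output.
        """
        transcript = prompt
        for _ in range(max_turns):
            completion = generate(transcript)          # one model generation step
            transcript += completion
            match = re.search(r"<tool_call>(.*?)</tool_call>", completion, re.S)
            if match is None:                          # no tool requested: finished
                return transcript
            result = execute_tool(match.group(1))      # run the requested tool call
            transcript += f"\n<tool_result>{result}</tool_result>\n"
        return transcript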

App Details

Features
The repository customizes an RL training stack based on verl and ships the modified verl code under src/verl. Training runs on a mixture of the synthetic SynTool dataset and MuSiQue training data, with a data preparation script provided for MuSiQue. Tool execution is isolated in a basic Python sandbox service served by scripts/serving/sandbox.py. Retriever serving uses a FlashRAG-based FastAPI service, configured via retriever_config.yaml and launched with scripts/serving/retriever_serving.py. Training scripts under scripts/train cover single-node and multi-node runs, including a multi-node reproduction script. An inference wrapper class is provided at src/re_call/inference/re_call.py together with a sample use case script. Evaluation utilities cover multi-hop QA with FlashRAG, and a BFCL evaluation is listed as upcoming. The repo documents installation, environment setup and recommended model serving with SGLang.
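As a rough illustration of the retriever serving pattern described above, the sketch below shows a minimal FastAPI service exposing a /retrieve endpoint. It is not the repository's retriever_serving.py: the endpoint path, request fields and the toy keyword scorer are assumptions, and a real deployment would back the endpoint with a FlashRAG index built from the settings in retriever_config.yaml.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    # Stand-in corpus; a real service would load a FlashRAG index instead.
    CORPUS = [
        "Paris is the capital of France.",
        "MuSiQue is a multi-hop question answering dataset.",
    ]

    class Query(BaseModel):
        query: str
        top_k: int = 3

    @app.post("/retrieve")
    def retrieve(req: Query):
        # Naive keyword-overlap scoring as a placeholder for dense/sparse retrieval.
        scored = sorted(
            CORPUS,
            key=lambda doc: -sum(word in doc.lower() for word in req.query.lower().split()),
        )
        return {"documents": scored[: req.top_k]}

    # Launch with e.g.: uvicorn retriever_sketch:app --port 8000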
Use Cases
ReCall helps researchers and engineers build LLM agents that learn to call and chain tools through reinforcement learning, without annotated tool-use demonstrations. It supplies end-to-end artifacts for preparing datasets, launching sandboxed tool executors, hosting retrievers, running single-node and multi-node RL training, and performing inference with an orchestration wrapper. The repo reduces integration work by packaging the modified RL components, example training commands for several model variants, evaluation scripts for multi-hop QA, and guidance on model serving. It also points to preprocessed training data and released models to accelerate experimentation. The included sandbox and retriever services make it easier to test tool-based reasoning safely in remote setups, and the inference wrapper simplifies running trained models that coordinate generation and tool execution.
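For a sense of how the orchestration wrapper might be driven at inference time, here is a hypothetical usage sketch. The class name ReCall, its constructor argument, and the run method signature are assumptions made for illustration only; the repository's sample use case script shows the actual API.

    # Import path and class name assumed from the file location
    # src/re_call/inference/re_call.py.
    from re_call.inference.re_call import ReCall

    # Point the wrapper at the sandbox executor service (URL assumed).
    agent = ReCall(executor_url="http://localhost:8001")

    # Tool schemas the model may call during generation (shape assumed).
    tools = [{"name": "search", "description": "Query the retriever service."}]

    # `run`, its parameters, and the served-model URL (e.g. an SGLang server)
    # are placeholders for whatever the real wrapper exposes.
    answer = agent.run(
        question="Who directed the film adapted from the 1965 novel?",
        func_schemas=tools,
        model_url="http://localhost:30000",
    )
    print(answer)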
