Basic Information

Agentic RAG-R1 is an open-source research and engineering repository that implements an Agentic Retrieval-Augmented Generation system and training pipeline. It is designed to endow a base language model with autonomous search and multi-step reasoning abilities through reinforcement learning, specifically the GRPO algorithm. The project provides the architecture and code for an agent memory stack that orchestrates planning, reasoning, backtracking, summarization, tool observations, and conclusions. The repo contains training and evaluation scripts, usage examples, model checkpoints, a chat server and client, and instructions for integrating an external search tool. It is intended for developers and researchers who want to train, evaluate, and deploy agentic RAG models and reproduce the experiments reported in the README.
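The agent memory stack described above can be pictured as an ordered trace of typed steps. The sketch below is hypothetical (the class and method names are not taken from the repo); it only illustrates how planning, reasoning, backtracking, summarization, tool observations, and conclusions might be recorded in one structure:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StepType(Enum):
    # Step kinds mirroring the memory stack components named in the README
    PLAN = "plan"
    REASONING = "reasoning"
    BACKTRACK = "backtrack"
    SUMMARY = "summary"
    OBSERVATION = "observation"  # result returned by a search/tool call
    CONCLUSION = "conclusion"


@dataclass
class MemoryStep:
    kind: StepType
    content: str


@dataclass
class AgentMemory:
    """Hypothetical in-order trace of one agentic RAG episode."""
    steps: List[MemoryStep] = field(default_factory=list)

    def push(self, kind: StepType, content: str) -> None:
        self.steps.append(MemoryStep(kind, content))

    def backtrack(self) -> None:
        # Discard the most recent step and record the backtracking event,
        # so the trajectory itself shows where the agent changed course.
        if self.steps:
            self.steps.pop()
        self.push(StepType.BACKTRACK, "reverted last step")
```

A rollout generator could append to such a trace at every turn and serialize it as the trajectory that GRPO scores.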

App Details

Features
The repository includes LoRA fine-tuning support and model quantization options such as nf4. It supports distributed training with DeepSpeed ZeRO Stage 2 and Stage 3, with dedicated launch scripts for each mode. Custom agent tools and personal RAG datasets can be integrated, and external search is provided via an ArtSearch submodule. The project implements a tool-calling reward model composed of accuracy, format, and RAG accuracy terms, and the README reports the total reward formula. TCRAG is used as the rollout generator. Runtime artifacts include training configurations, evaluation utilities, a chat server/client, and resource-efficient settings that, per the README, support training models up to 32B parameters on two A100 GPUs.
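Since the reward model combines accuracy, format, and RAG accuracy terms, its total reward can be sketched as a weighted sum. The weights and term definitions below are illustrative assumptions, not the repo's actual formula (which is stated in the README):

```python
def total_reward(answer_correct: bool, format_valid: bool, rag_score: float,
                 w_acc: float = 1.0, w_fmt: float = 0.5, w_rag: float = 0.5) -> float:
    """Hypothetical total reward: weighted sum of an answer-accuracy term,
    a format-compliance term, and a RAG accuracy term in [0, 1].
    Weights w_acc/w_fmt/w_rag are placeholders, not the repo's values."""
    r_acc = w_acc * float(answer_correct)          # did the final answer match?
    r_fmt = w_fmt * float(format_valid)            # did the tool-calling format parse?
    r_rag = w_rag * max(0.0, min(1.0, rag_score))  # retrieval quality, clamped
    return r_acc + r_fmt + r_rag
```

Separating the terms this way lets training reinforce well-formed tool calls and good retrieval even on rollouts where the final answer is wrong.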
Use Cases
This repo helps researchers and engineers build and improve agentic RAG systems by supplying end-to-end code for training, rollout generation, evaluation, and inference. It provides a structured architecture for multi-step agentic reasoning and mechanisms to reinforce retrieval and reasoning decisions via GRPO. The included reward design and evaluation scripts allow measuring format and RAG accuracy and reproducing reported improvements, such as higher MedQA format accuracy after fine-tuning. Integration points for a search engine and TCRAG rollout generation let users test different retrieval stacks. Deployment scripts and a chat server enable running inference and sharing trained models for downstream testing and demonstration.
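The GRPO mechanism mentioned above scores each rollout relative to the other rollouts sampled for the same prompt. A minimal sketch of that core idea, normalizing group rewards into advantages (a simplification of GRPO, not the repo's implementation):

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: for a group of rollouts generated from the same
    prompt, subtract the group mean reward and divide by the group standard
    deviation, so above-average rollouts get positive advantage and
    below-average ones negative. `eps` guards against a zero-variance group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the baseline is the group mean rather than a learned value network, this kind of objective needs only the reward scores produced by the rollout generator.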