
Basic Information

Agent-R1 is an open-source framework for training language-model-driven agents with end-to-end reinforcement learning. It is designed for researchers and developers working at the intersection of reinforcement learning and agent design, enabling agents that learn from complete interaction trajectories rather than following hand-engineered workflows. The repository provides abstractions for defining domain-specific tools and reward functions so users can extend the framework to their own environments, and it includes tutorials, example tools and environments, data preprocessing scripts, and inference utilities so practitioners can both train and deploy models. The project emphasizes multi-modal capabilities and practical training algorithms, and it documents algorithm details, setup instructions, and community resources to help teams reproduce experiments and build custom agent applications.

App Details

Features
Agent-R1 provides multi-turn tool calling that trains on complete interaction trajectories and supports multi-tool coordination, so agents can combine tools to solve complex tasks. It implements process rewards for per-tool-call evaluation and balances them against outcome rewards using a normalization scheme inspired by PRIME. The framework exposes BaseTool and BaseToolEnv abstractions for creating custom tools and environments and includes example implementations in agent_r1/tool/tools and agent_r1/tool/envs. It supports multiple RL algorithms, including PPO, GRPO, and REINFORCE++, provides multi-modal support for vision-language models, ships inference scripts and a simple interactive chat interface, and includes tutorials such as a runnable ReTool implementation. The codebase integrates verl as a submodule and contains reward utilities and data preprocessing examples.
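
As an illustration of the tool abstraction, the sketch below defines a hypothetical custom tool by subclassing BaseTool. The import path, attribute names, and execute() signature here are assumptions made for illustration; the authoritative interface is the BaseTool class itself and the example tools in agent_r1/tool/tools.

```python
# Minimal sketch of a custom tool, assuming a BaseTool interface with
# name/description/parameters attributes and an execute() method.
# The import path below is an assumption; check agent_r1/tool for the real one.
from agent_r1.tool.base import BaseTool  # hypothetical import path


class WeatherTool(BaseTool):
    """Hypothetical tool that looks up the weather for a city."""

    name = "get_weather"
    description = "Return the current weather for a given city."
    parameters = {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
        },
        "required": ["city"],
    }

    def execute(self, args: dict) -> str:
        # A real tool would call an external API here; a stub keeps the
        # sketch self-contained.
        city = args.get("city", "unknown")
        return f"The weather in {city} is sunny."
```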
Use Cases
Agent-R1 lowers the barrier to research and development of autonomous agents by providing reusable infrastructure for training agents with RL instead of handcrafting workflows. Developers can plug in their own tools and reward functions through documented abstractions, reuse the example tools and environments, and follow the quickstart guides to run experiments such as the default search tool on HotpotQA. The repository includes training and inference workflows, multi-modal support, and a redesigned, bug-fixed tool-environment interface that improves extensibility. It also provides community channels for feedback and collaboration, tutorials for customizing tools and tool environments, and utilities for reward computation and data preprocessing that accelerate iteration and reproducible experimentation. The project currently notes support for Qwen models and plans broader model integration.
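
To make the balance between process and outcome rewards concrete, here is a minimal, self-contained sketch of blending per-tool-call process rewards with a trajectory-level outcome reward. The averaging and the 50/50 weighting are illustrative assumptions; Agent-R1's actual PRIME-inspired normalization lives in its reward utilities and may differ in detail.

```python
# Sketch: combine per-tool-call process rewards with a final outcome reward.
# The weighting scheme is an assumption for illustration, not Agent-R1's exact method.
from typing import List


def combine_rewards(process_rewards: List[float],
                    outcome_reward: float,
                    process_weight: float = 0.5) -> float:
    """Blend per-step tool rewards with the trajectory-level outcome reward."""
    if not process_rewards:
        return outcome_reward
    # Average the process rewards so their scale stays comparable to the
    # outcome reward, then mix the two terms so neither dominates the update.
    mean_process = sum(process_rewards) / len(process_rewards)
    return process_weight * mean_process + (1.0 - process_weight) * outcome_reward


# Example: three tool calls scored individually, plus a correct final answer.
total = combine_rewards(process_rewards=[1.0, 0.0, 1.0], outcome_reward=1.0)
print(total)  # ~0.833 with the default 0.5/0.5 weighting
```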
