Basic Information

OpenManus-RL is an open-source project for researching and developing reinforcement learning methods to tune large language model agents. It is a collaborative effort led by Ulab-UIUC and MetaGPT and extends the original OpenManus initiative. The repository explores RL-based paradigms that improve agent reasoning, decision-making, and tool integration by combining trajectory data, reasoning models, and RL tuning techniques. It aggregates agent trajectory datasets, outlines methodologies for reward modeling and rollout strategies, and provides a simplified library for supervised fine-tuning and generalized reward-based policy optimization built atop existing RL toolkits. The project is maintained as a live-streamed research effort with planned code and dataset releases, evaluations on standard agent benchmarks, and an open invitation for community contributions.
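
As a rough illustration of the generalized reward-based policy optimization mentioned above, the sketch below computes GRPO-style group-relative advantages for a batch of sampled trajectories. It is a minimal, self-contained example; the function name and reward values are hypothetical and not taken from the repository.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each trajectory's reward against
    the mean and standard deviation of its sampled group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for four rollouts of the same task prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 1.0])
# Trajectories above the group mean get positive advantages and are
# up-weighted in the policy-gradient loss; below-mean ones are down-weighted.
print(advantages)
```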

App Details

Features
The README documents a combined dataset hosted on Hugging Face that merges AgentInstruct, Agent-FLAN, and AgentTraj-L to produce over fifty thousand trajectories with ReAct-style reasoning. Methodological features include multiple rollout strategies such as Tree-of-Thoughts, Graph-of-Thoughts, DFSDT, and MCTS. The post-training approaches covered include SFT, GRPO, PPO, DPO, and preference-based reward modeling. The repository integrates insights from RL tuning frameworks such as Verl, TinyZero, OpenR1, and Trlx and provides environment setup and evaluation instructions for benchmarks such as WebShop and ALFWorld. Installation and quick-start scripts, example ReAct data instances, and documented supported tasks (text-generation and conversational-ai) are also included.
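
For a quick look at the combined trajectory data, the snippet below loads it with the Hugging Face datasets library. The dataset identifier and record layout are assumptions based on the README's description of ReAct-style records; the exact path and field names should be checked against the repository's Hugging Face link.

```python
# Minimal sketch for inspecting the combined trajectory dataset.
# The dataset path below is an assumption, not confirmed from the repository.
from datasets import load_dataset

ds = load_dataset("CharlieDreemur/OpenManus-RL", split="train")  # assumed path
print(ds)  # number of trajectories and column names

# ReAct-style records typically interleave thought/action/observation turns
# inside a conversation-like field; print one record to see its structure.
print(ds[0])
```
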
Use Cases
OpenManus-RL helps researchers and developers by packaging dataset resources, experimental designs, and training recipes for studying RL tuning of agentic LLMs. It standardizes trajectory data and ReAct patterns to support supervised and reward-based optimization workflows. The project describes ways to train reward models, scale trajectories at test time, and incorporate action-space awareness to improve exploration. Environment setup guides and scripts enable reproducible evaluation on established benchmarks such as WebShop, GAIA, AgentBench, and ALFWorld. By integrating multiple rollout strategies and RL frameworks, the repository serves as a platform for comparing algorithms, reproducing experiments, and extending agent training pipelines. The maintainers actively solicit community contributions and plan ongoing releases and documentation.
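
To make the test-time trajectory scaling idea concrete, here is a minimal best-of-N sketch: sample several rollouts per task, score each with a reward model, and keep the highest-scoring trajectory. The sample_rollout and reward_model callables are placeholders for whatever agent loop and scorer are plugged in, not APIs from this repository.

```python
from typing import Callable, List, Tuple

def best_of_n(task: str,
              sample_rollout: Callable[[str], List[dict]],
              reward_model: Callable[[str, List[dict]], float],
              n: int = 8) -> Tuple[List[dict], float]:
    """Sample n candidate trajectories for a task and return the one
    the reward model scores highest (test-time trajectory scaling)."""
    best_traj, best_score = None, float("-inf")
    for _ in range(n):
        traj = sample_rollout(task)       # e.g. a ReAct loop over the environment
        score = reward_model(task, traj)  # trajectory-level reward estimate
        if score > best_score:
            best_traj, best_score = traj, score
    return best_traj, best_score
```

The same skeleton extends to structured rollout strategies such as Tree-of-Thoughts or MCTS by replacing the independent sampling loop with a search over partial trajectories.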
