Basic Information

RAGEN is a research-grade codebase for training large language model (LLM) reasoning agents with reinforcement learning in interactive, stochastic environments. It implements the StarPO framework (State-Thinking-Actions-Reward Policy Optimization), which generates multi-turn reasoning-action trajectories and optimizes them as whole units through alternating rollout and update stages. The repository provides the full training loop: rollout generation, reward calculation, trajectory optimization, and evaluation. It targets sequential decision-making problems in which states and actions are token sequences, enabling LLMs to reason over environment dynamics. RAGEN is modularly organized, with explicit components for environment state management, context management, and an agent proxy. The project includes example environments and experiments (Bandit, Sokoban, FrozenLake), configuration templates, LoRA support for parameter-efficient training, and tooling for distributed runs and visualization in service of reproducible agent research.
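To make the alternating structure concrete, here is a minimal Python sketch of a StarPO-style loop. Every name in it (ToyEnv, policy_generate, update_policy) is a hypothetical stand-in for illustration, not RAGEN's actual API: the rollout stage collects whole multi-turn trajectories, and the update stage treats each trajectory as a single optimization unit.

```python
import random

# Illustrative stand-ins only -- not RAGEN's actual classes or functions.

class ToyEnv:
    """A tiny token-sequence MDP: states and actions are plain strings."""

    def reset(self):
        self.steps = 0
        return "state: start"

    def step(self, action):
        self.steps += 1
        reward = 1.0 if "good" in action else 0.0
        done = self.steps >= 3
        return f"state: {self.steps}", reward, done


def policy_generate(state):
    # Stand-in for an LLM emitting a reasoning-action turn.
    return random.choice(["<think>...</think><answer>good</answer>",
                          "<think>...</think><answer>bad</answer>"])


def collect_trajectory(env):
    """Rollout stage: play one full multi-turn episode."""
    state, trajectory, done = env.reset(), [], False
    while not done:
        action = policy_generate(state)
        state, reward, done = env.step(action)
        trajectory.append((action, reward))
    return trajectory


def update_policy(trajectories):
    """Update stage: the whole trajectory is the optimization unit.

    A real implementation would run a PPO/GRPO step here; this sketch
    just reports the mean trajectory return.
    """
    returns = [sum(r for _, r in t) for t in trajectories]
    print(f"mean trajectory return: {sum(returns) / len(returns):.2f}")


# Alternate rollout and update stages, as StarPO does.
for _ in range(2):
    batch = [collect_trajectory(ToyEnv()) for _ in range(8)]
    update_policy(batch)
```

The point of the sketch is the shape of the loop: trajectories are collected in full before any gradient step, so optimization can operate at the trajectory level rather than per single action.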

App Details

Features
RAGEN packages a set of features for developing and evaluating RL-trained LLM agents. The core algorithm is StarPO, with alternating rollout and update stages and support for multiple optimization strategies such as PPO and GRPO. Interactions are formulated as MDPs in which states and actions are token sequences, and trajectory-level importance sampling enables efficient long-horizon optimization. The codebase is modular: the Environment State Manager, Context Manager, and Agent Proxy handle environment stepping, token parsing and context windowing, and rollouts, respectively. Configuration is centralized in config/base.yaml, with environment registration and LoRA-enabled training presets. The project also provides setup and training scripts, evaluation entrypoints, wandb-compatible generation visualization, example environments, and integration examples for distributed training with dstack and Ray. The README documents how to add OpenAI Gym-compatible custom environments (see the sketch below) and reports performance and generalization results.
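As one illustration of the custom-environment workflow the README describes, the sketch below defines a minimal environment against the standard Gymnasium interface. The class name and reward scheme are invented for the example; how RAGEN registers and wraps such environments is specified in its own config and README, not here.

```python
import gymnasium as gym
from gymnasium import spaces


class CoinGuessEnv(gym.Env):
    """Hypothetical one-step environment in the standard Gymnasium
    interface. RAGEN's actual registration hooks (e.g. entries in
    config/base.yaml) may wrap environments differently."""

    def __init__(self):
        self.action_space = spaces.Discrete(2)       # guess 0 or 1
        self.observation_space = spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._target = int(self.np_random.integers(0, 2))
        return 0, {}  # (observation, info)

    def step(self, action):
        reward = 1.0 if action == self._target else -1.0
        # (observation, reward, terminated, truncated, info)
        return self._target, reward, True, False, {}
```

A quick smoke test is `env = CoinGuessEnv(); obs, info = env.reset(seed=0)` followed by `env.step(1)`; anything beyond that, such as exposing the environment to the training loop, goes through RAGEN's own registration mechanism.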
Use Cases
RAGEN helps researchers and engineers build, train, and study multi-turn LLM agents under stochastic dynamics by providing a complete, configurable RL training framework. End-to-end experiments, spanning rollout collection, reward design, and trajectory optimization, can be run with little glue code, enabling systematic comparison of algorithms and reward schemes. The modular design reduces the engineering burden of adding new environments or swapping components such as context windowing or value estimators. LoRA configuration options lower the compute and memory cost of fine-tuning large models, and the provided scripts plus dstack/Ray examples simplify distributed training. Built-in evaluation commands and wandb metrics let users inspect generated trajectories and track performance. Demonstrated generalization across Sokoban variants and FrozenLake illustrates practical transfer benefits when training reasoning-capable agents.
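For a sense of what the LoRA option buys, the following sketch shows parameter-efficient fine-tuning with the Hugging Face peft library. This is a stand-in: RAGEN drives LoRA through its YAML presets rather than this exact code, and the model name and hyperparameters here are example values, not RAGEN defaults.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Example model; RAGEN's presets select their own base model.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```

Because only the low-rank adapter matrices receive gradients, optimizer state and gradient memory shrink substantially, which is what makes RL fine-tuning of larger base models tractable on modest hardware.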
