RAGEN
Basic Information
RAGEN is a research-grade codebase for training large language model (LLM) reasoning agents with reinforcement learning in interactive, stochastic environments. It implements the StarPO framework (State-Thinking-Actions-Reward Policy Optimization), which generates multi-turn reasoning-action trajectories and optimizes entire trajectories by alternating between rollout and update stages. The repository provides the full training loop: rollout generation, reward calculation, trajectory optimization, and evaluation. It targets sequential decision-making problems in which states and actions are token sequences, so that LLMs learn to reason over environment dynamics.

RAGEN is organized into modular components for environment state management, context management, and an agent proxy. The project includes example environments and experiments (Bandit, Sokoban, FrozenLake), configuration templates, LoRA support for parameter-efficient training, and tooling for distributed runs and visualization to support reproducible agent research.
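The alternating rollout/update structure described above can be illustrated with a minimal, hypothetical sketch in plain Python. This is not RAGEN's code: the toy environment, the tabular per-turn policy, and every name (`ToyMultiTurnEnv`, `rollout`, `update`, `train`) are assumptions made for illustration, integers stand in for the token-sequence states and actions, and plain REINFORCE with a mean-reward baseline stands in for whatever optimizer StarPO actually uses. What the sketch does show is the trajectory-level idea: each sampled trajectory receives a single return, and every turn in that trajectory is updated with the same advantage.

```python
import math
import random

# Hypothetical toy setup, not taken from RAGEN.
NUM_TURNS = 3    # episode length (number of agent turns)
NUM_ACTIONS = 2  # integers stand in for token-sequence actions


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMultiTurnEnv:
    """Hypothetical environment: reward 1.0 only if every turn picks action 1."""

    def reset(self):
        self.turn = 0
        self.history = []
        return self.turn  # the state is simply the current turn index

    def step(self, action):
        self.history.append(action)
        self.turn += 1
        done = self.turn == NUM_TURNS
        # Sparse, trajectory-level reward: only granted at the end of the episode.
        reward = 1.0 if done and all(a == 1 for a in self.history) else 0.0
        return self.turn, reward, done


def rollout(env, logits, rng):
    """Rollout stage: sample one full trajectory of (state, action) pairs and its return."""
    state = env.reset()
    steps, total, done = [], 0.0, False
    while not done:
        probs = softmax(logits[state])
        action = rng.choices(range(NUM_ACTIONS), weights=probs)[0]
        steps.append((state, action))
        state, reward, done = env.step(action)
        total += reward
    return steps, total


def update(logits, batch, lr):
    """Update stage: REINFORCE over whole trajectories with a mean-reward baseline."""
    baseline = sum(ret for _, ret in batch) / len(batch)
    grads = [[0.0] * NUM_ACTIONS for _ in logits]
    for steps, ret in batch:
        advantage = ret - baseline  # one advantage shared by every turn of the trajectory
        for state, action in steps:
            probs = softmax(logits[state])
            for a in range(NUM_ACTIONS):
                indicator = 1.0 if a == action else 0.0
                # Gradient of log softmax(logits)[action] w.r.t. logits[a].
                grads[state][a] += advantage * (indicator - probs[a])
    # Apply the averaged gradient only after the whole batch is processed.
    for state in range(len(logits)):
        for a in range(NUM_ACTIONS):
            logits[state][a] += lr * grads[state][a] / len(batch)


def train(iterations=200, batch_size=16, lr=1.0, seed=0):
    rng = random.Random(seed)
    env = ToyMultiTurnEnv()
    logits = [[0.0] * NUM_ACTIONS for _ in range(NUM_TURNS)]
    for _ in range(iterations):
        batch = [rollout(env, logits, rng) for _ in range(batch_size)]  # rollout stage
        update(logits, batch, lr)                                        # update stage
    return logits
```

Calling `train()` returns per-turn logits; after training, the probability of choosing action 1 at each turn should be high, though exact values depend on the seed. Collecting a full batch of rollouts before a single update keeps the sampling policy fixed within each batch, which is the on-policy assumption plain REINFORCE relies on.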