Grounding_LLMs_with_online_RL

Basic Information

This repository contains the code and environment used to reproduce the experiments from the paper "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning." It implements the GLAM method for functionally grounding LLMs on the BabyAI-Text benchmark and integrates the Lamorel library to use and fine-tune language models. The repo bundles a custom BabyAI-Text environment, several agent implementations, training and evaluation scripts, and example configurations for running PPO-based online reinforcement learning with LLMs. It is intended for researchers and developers who want to train, evaluate, and analyze how LLMs can be grounded through interaction and reinforcement learning in a controlled, simulated environment.
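As a rough illustration of what "interaction in a simulated environment" looks like here, the sketch below drives a BabyAI-Text-style environment through the standard Gym loop. The environment ID and the "descriptions" info field are assumptions for illustration; check the repo's README for the names actually registered by the bundled environment.

```python
# Minimal interaction sketch for a BabyAI-Text-style environment.
# The environment ID and the "descriptions" info field are assumptions;
# consult the repo's README for the registered environment names.
import gym
import babyai_text  # registers the BabyAI-Text environments with gym

env = gym.make("BabyAI-MixedTrainLocal-v0")  # hypothetical environment ID
obs, infos = env.reset()
print(infos["descriptions"])  # textual description of the visible scene

done = False
while not done:
    action = env.action_space.sample()  # stand-in for an agent's policy
    obs, reward, done, infos = env.step(action)
```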

App Details

Features
The repository ships a BabyAI-Text environment implementation and an experiments directory with modular agent code. Agents include a wrapper around the BabyAI bot, a uniformly random agent, a DRRN agent, PPO agents in both symbolic (SymbolicPPO) and LLM-grounded variants, plus scripts for behavioral cloning. It provides Lamorel-compatible configs, SLURM launch scripts and campaign examples for cluster runs, and utilities for training, post-training evaluation, and results formatting. The README documents installation steps, required packages, and example config entries for PPO hyperparameters, action spaces, prompt templates, and evaluation flags. The project relies on Lamorel for LLM management and supplies example hyperparameters plus logging and model-saving hooks.
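To make the LLM-grounded PPO variant concrete: in the GLAM approach, the policy over a fixed action set is derived from the LLM's log-likelihood of each candidate action string given a textual prompt. The sketch below shows this scoring with raw Hugging Face transformers calls; the model name and prompt format are placeholders, and in the repo this scoring is routed through Lamorel rather than written out like this.

```python
# Sketch of GLAM-style action scoring: the policy is a softmax over the
# LLM's log-likelihood of each candidate action string given the prompt.
# "gpt2" and the prompt format are placeholders, not the repo's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def action_log_prob(prompt: str, action: str) -> float:
    """Sum the log-probabilities of the action tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    action_ids = tokenizer(" " + action, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, action_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits[0, :-1] predicts tokens 1..T-1; the first action token is
    # predicted from the last prompt position.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1] - 1
    token_lp = log_probs[start:start + action_ids.shape[1]]
    picked = token_lp.gather(1, action_ids[0].unsqueeze(1))
    return picked.sum().item()

prompt = "Goal: go to the red ball\nObservation: you see a red ball\nAction:"
actions = ["turn left", "turn right", "go forward"]
scores = torch.tensor([action_log_prob(prompt, a) for a in actions])
policy = torch.softmax(scores, dim=0)  # distribution over the action space
```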
Use Cases
This codebase enables reproducible research into grounding LLMs with online reinforcement learning by providing the environment, agent implementations, and end-to-end training and evaluation pipelines. Researchers can reproduce the paper's experiments, run PPO fine-tuning of language models, try behavioral cloning, and measure generalization with the provided post-training tests. The Lamorel integration simplifies using pretrained or fine-tuned LLMs, while the SLURM scripts and campaign examples ease running experiments at scale. Config-driven training parameters and documented hyperparameters let users adapt action spaces, observation windows, prompt templates, and evaluation modes to explore variants and compare agent behaviors under controlled settings.
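The sketch below illustrates the kinds of config-driven parameters described above as a plain Python dict. The key names and values are hypothetical, chosen only to show the shape of such a config; the repo's actual configs are Lamorel/Hydra files with their own schema.

```python
# Hypothetical sketch of config-driven training parameters; the key names
# are illustrative, not the repo's actual schema (the real configs are
# Lamorel/Hydra files shipped with the project).
ppo_config = {
    "rl_script_args": {
        "number_envs": 32,            # parallel BabyAI-Text instances
        "ppo_epochs": 4,
        "lr": 1e-5,
        "entropy_coef": 0.01,
        "action_space": ["turn left", "turn right", "go forward",
                         "pick up", "drop", "toggle"],
        "prompt_template": "Goal: {goal}\nObservation: {obs}\nAction:",
        "test_mode": False,           # set True for post-training evaluation
    },
    "lamorel_args": {
        "llm_args": {"model_path": "t5-small"},  # placeholder model
    },
}
```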