Grounding_LLMs_with_online_RL

Basic Information

This repository contains the code and environment used to reproduce the experiments from the paper "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning." It implements the GLAM method for functionally grounding LLMs on the BabyAI-Text benchmark and integrates the Lamorel library to use and fine-tune language models. The repo bundles a custom BabyAI-Text environment, several agent implementations, training and evaluation scripts, and example configurations for running PPO-based online reinforcement learning with LLMs. It is intended for researchers and developers who want to train, evaluate, and analyze how LLMs can be grounded through interaction and reinforcement learning in a controlled, simulated environment.
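As a rough illustration of what "interaction in a simulated environment" looks like here, the sketch below drives a BabyAI-Text-style environment through the standard Gym loop. The environment ID and the "descriptions" info field are assumptions for illustration; check the repo's README for the names actually registered by the bundled environment.

```python
# Minimal interaction sketch for a BabyAI-Text-style environment.
# The environment ID and the "descriptions" info field are assumptions;
# consult the repo's README for the registered environment names.
import gym
import babyai_text  # registers the BabyAI-Text environments with gym

env = gym.make("BabyAI-MixedTrainLocal-v0")  # hypothetical environment ID
obs, infos = env.reset()
print(infos["descriptions"])  # textual description of the visible scene

done = False
while not done:
    action = env.action_space.sample()  # stand-in for an agent's policy
    obs, reward, done, infos = env.step(action)
```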

App Details

Features
The repository ships a BabyAI-Text environment implementation and an experiments directory with modular agent code. Agents include a wrapper around the BabyAI bot, a uniformly random agent, a DRRN agent, PPO agents in both symbolic (SymbolicPPO) and LLM-grounded variants, plus scripts for behavioral cloning. It provides Lamorel-compatible configs, SLURM launch scripts and campaign examples for cluster runs, and utilities for training, post-training evaluation, and results formatting. The README documents installation steps, required packages, and example config entries for PPO hyperparameters, action spaces, prompt templates, and evaluation flags. The project relies on Lamorel for LLM management and supplies example hyperparameters plus logging and model-saving hooks.
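To make the LLM-grounded PPO variant concrete: in the GLAM approach, the policy over a fixed action set is derived from the LLM's log-likelihood of each candidate action string given a textual prompt. The sketch below shows this scoring with raw Hugging Face transformers calls; the model name and prompt format are placeholders, and in the repo this scoring is routed through Lamorel rather than written out like this.

```python
# Sketch of GLAM-style action scoring: the policy is a softmax over the
# LLM's log-likelihood of each candidate action string given the prompt.
# "gpt2" and the prompt format are placeholders, not the repo's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def action_log_prob(prompt: str, action: str) -> float:
    """Sum the log-probabilities of the action tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    action_ids = tokenizer(" " + action, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, action_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits[0, :-1] predicts tokens 1..T-1; the first action token is
    # predicted from the last prompt position.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1] - 1
    token_lp = log_probs[start:start + action_ids.shape[1]]
    picked = token_lp.gather(1, action_ids[0].unsqueeze(1))
    return picked.sum().item()

prompt = "Goal: go to the red ball\nObservation: you see a red ball\nAction:"
actions = ["turn left", "turn right", "go forward"]
scores = torch.tensor([action_log_prob(prompt, a) for a in actions])
policy = torch.softmax(scores, dim=0)  # distribution over the action space
```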
Use Cases
This codebase enables reproducible research into grounding LLMs with online reinforcement learning by providing the environment, agent implementations, and end-to-end training and evaluation pipelines. Researchers can reproduce the paper's experiments, run PPO fine-tuning of language models, try behavioral cloning, and measure generalization with the provided post-training tests. The Lamorel integration simplifies using pretrained or fine-tuned LLMs, while the SLURM scripts and campaign examples ease running experiments at scale. Config-driven training parameters and documented hyperparameters let users adapt action spaces, observation windows, prompt templates, and evaluation modes to explore variants and compare agent behaviors under controlled settings.
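The sketch below illustrates the kinds of config-driven parameters described above as a plain Python dict. The key names and values are hypothetical, chosen only to show the shape of such a config; the repo's actual configs are Lamorel/Hydra files with their own schema.

```python
# Hypothetical sketch of config-driven training parameters; the key names
# are illustrative, not the repo's actual schema (the real configs are
# Lamorel/Hydra files shipped with the project).
ppo_config = {
    "rl_script_args": {
        "number_envs": 32,            # parallel BabyAI-Text instances
        "ppo_epochs": 4,
        "lr": 1e-5,
        "entropy_coef": 0.01,
        "action_space": ["turn left", "turn right", "go forward",
                         "pick up", "drop", "toggle"],
        "prompt_template": "Goal: {goal}\nObservation: {obs}\nAction:",
        "test_mode": False,           # set True for post-training evaluation
    },
    "lamorel_args": {
        "llm_args": {"model_path": "t5-small"},  # placeholder model
    },
}
```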