Basic Information

This repository contains the code, demonstration notebooks, and logged experiment outputs for the NeurIPS 2023 paper "Reflexion: Language Agents with Verbal Reinforcement Learning." It is organized to reproduce and explore the experiments in the three domains described in the paper: reasoning (HotPotQA), decision-making (AlfWorld), and programming. The materials include notebooks that run agent variants, shell scripts that launch iterative AlfWorld trials, and recorded runs and logs from prior experiments. Setup instructions cover installing the required Python dependencies and configuring an OpenAI API key. The project exposes configurable agent types and reflexion strategies, and it stores outputs in structured log directories so that researchers and developers can inspect reasoning traces, self-reflections, and trial-level results without rerunning costly API experiments.
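
The setup described above amounts to installing the pinned dependencies and making an API key available before any notebook cell runs. A minimal sketch is shown below, assuming the standard requirements.txt at the repository root and the OPENAI_API_KEY environment variable mentioned later in this page; the snippet itself is illustrative and not part of the repository.

```python
import os
import subprocess
import sys

# Install the Python dependencies listed in requirements.txt
# (run once, from the repository root).
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    check=True,
)

# The agents call the OpenAI API, so fail fast if the key is not configured.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError(
        "Set OPENAI_API_KEY before running the notebooks, "
        "e.g. `export OPENAI_API_KEY=<your key>` in the shell."
    )
```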

App Details

Features
- Interactive notebooks for HotPotQA reasoning experiments, plus separate directories for AlfWorld decision-making and programming runs.
- Predefined agent types, such as ReAct and chain-of-thought variants, and an Enum of reflexion strategies: NONE, LAST_ATTEMPT, REFLEXION, and LAST_ATTEMPT_AND_REFLEXION (see the sketch after this list).
- Shell tooling (run_reflexion.sh) with parameters for num_trials, num_envs, run_name, use_memory, resume options, and logging locations.
- Example logs and root directories for reproducing reported runs, so users can inspect prior outputs.
- Figures illustrating the Reflexion approach and references to other implementations and related resources.
- A requirements.txt for dependency installation and instructions for setting the OPENAI_API_KEY environment variable.
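
The four strategy names above come directly from the repository; the Enum below is a minimal sketch of how such strategies could be declared and selected. Its values, comments, and module placement are illustrative rather than the repository's exact definition.

```python
from enum import Enum, auto

# Minimal sketch of a reflexion-strategy Enum; the member names match the
# ones listed above, while the values and comments are illustrative.
class ReflexionStrategy(Enum):
    NONE = auto()                        # no reflection signal between trials
    LAST_ATTEMPT = auto()                # carry over only the previous attempt
    REFLEXION = auto()                   # generate a verbal self-reflection
    LAST_ATTEMPT_AND_REFLEXION = auto()  # combine both signals

# Example: choosing a strategy for a notebook run.
strategy = ReflexionStrategy.REFLEXION
print(strategy.name)
```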
Use Cases
This repository helps researchers and developers reproduce and analyze experiments on language agents that learn through verbal self-reflection. Users can run the notebooks to sample HotPotQA questions, compare agent types and reflexion strategies, and explore recorded reasoning traces and self-reflections to understand failure modes and improvement patterns. The AlfWorld scripts support iterative decision-making trials, with options to enable persistent memory for storing reflections, resume interrupted runs, and tune trial and environment counts. Logged outputs allow offline analysis of agent behavior without rerunning costly API calls. The README also documents practical constraints, such as limited access to high-capacity models and API costs, and provides a citation and a contact for follow-up.
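
For the offline analysis mentioned above, a small script can walk a run's log directory and summarize what was recorded. The sketch below assumes outputs written as plain-text log files under a root/<run_name> directory; the directory layout, file extension, and run name are assumptions for illustration, not the repository's guaranteed structure.

```python
from pathlib import Path

# Hypothetical offline inspection of logged trial outputs. The layout
# root/<run_name>/ and the *.log extension are assumptions; adjust them
# to match the actual run directory you want to inspect.
run_dir = Path("root") / "example_run"  # illustrative run_name

for log_file in sorted(run_dir.rglob("*.log")):
    lines = log_file.read_text().splitlines()
    print(f"{log_file} ({len(lines)} lines)")
    # Show the tail of each file, where trial-level results tend to appear.
    for line in lines[-5:]:
        print("   ", line)
```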
