agent-workflow-memory

Basic Information

This repository implements Agent Workflow Memory (AWM), a research codebase and experimental toolkit for inducing, integrating, and utilizing reusable workflows inside agent systems. The project presents methods to extract common sub-routines (workflows) from annotated examples or from past agent experiences, and to attach those workflows to agent memory to improve task solving. It includes runnable pipelines and environment-specific code for two evaluation settings, WebArena and Mind2Web, with instructions for running in both offline and online modes. The repo enables reproduction of the AWM paper's experiments, ships assets and result figures, and exposes scripts to run the pipelines for different websites or setups. The primary goal is to demonstrate a memory-driven workflow-induction approach and to provide the code researchers need to evaluate and extend the method.
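
To make the core idea concrete, here is a minimal sketch of workflow induction and memory integration. All names here (Workflow, AgentMemory, induce_workflow) are hypothetical illustrations, not the repo's actual API, and the toy longest-common-prefix induction stands in for the paper's LLM-based abstraction step.

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    """A reusable sub-routine induced from one or more solved tasks."""
    name: str
    steps: list[str]

@dataclass
class AgentMemory:
    """Holds induced workflows and renders them into the agent's prompt."""
    workflows: list[Workflow] = field(default_factory=list)

    def add(self, workflow: Workflow) -> None:
        self.workflows.append(workflow)

    def as_prompt(self) -> str:
        # Serialize stored workflows as in-context guidance for the agent.
        lines = []
        for wf in self.workflows:
            lines.append(f"Workflow: {wf.name}")
            lines.extend(f"  {i + 1}. {step}" for i, step in enumerate(wf.steps))
        return "\n".join(lines)

def induce_workflow(traces: list[list[str]]) -> Workflow:
    # Toy induction: keep the longest common prefix of actions shared by
    # all traces; the actual method abstracts sub-routines with an LLM.
    common = []
    for actions in zip(*traces):
        if len(set(actions)) != 1:
            break
        common.append(actions[0])
    return Workflow(name="common_prefix", steps=common)

memory = AgentMemory()
memory.add(induce_workflow([
    ["click('search')", "type('shoes')", "press('enter')"],
    ["click('search')", "type('laptop')", "press('enter')"],
]))
print(memory.as_prompt())
```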

App Details

Features
AWM supports both offline induction from annotated training examples and online induction from an agent's past experiences, with distinct code paths for each mode. The repository contains two main experiment suites, webarena/ and mind2web/, and includes pipeline scripts to run evaluations (for example, pipeline.py with --website or --setup flags). It reports state-of-the-art results on WebArena (35.6% success rate) and strong performance on Mind2Web, with assets and result visualizations included. The codebase is organized for reproducibility, includes example commands and environment notes, and provides a citation entry for the accompanying paper. The design focuses on extracting workflow abstractions, integrating them into agent memory, and measuring downstream task success.
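
As a rough guide, the pipeline entry points might be invoked as below. The --website and --setup flags come from the description above, but the exact script locations and flag values (here "shopping" and "online") are assumptions to check against the repo's README.

```python
import subprocess

# Hypothetical invocations; exact paths and flag values may differ.
# WebArena evaluation for a single website:
subprocess.run(
    ["python", "pipeline.py", "--website", "shopping"],
    cwd="webarena", check=True,
)
# Mind2Web run selecting a setup (e.g., offline vs. online induction):
subprocess.run(
    ["python", "pipeline.py", "--setup", "online"],
    cwd="mind2web", check=True,
)
```
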
Use Cases
For researchers and developers working on agent architectures and memory, this repo provides a concrete implementation of workflow induction and integration that can be reproduced and extended. It lets users run end-to-end pipelines to evaluate how induced workflows affect agent performance in simulated web and text environments. The dual offline/online modes let experimenters compare training from annotated examples against online learning from an agent's own traces, as sketched below. Provided scripts, environment notes, and result artifacts make it straightforward to benchmark AWM against other approaches, analyze failure modes, and adapt the workflow-memory mechanism to new agent tasks or domains. The repo is useful for benchmarking, method comparison, and as a starting point for further research on agent memory and procedural abstraction.
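
The online mode in particular can be pictured as the loop below: solve a task, then induce workflows from the agent's own trace and fold them back into memory. This is a minimal sketch; agent.solve_task and the induce callable are hypothetical placeholders, not the repo's actual functions.

```python
def run_online(tasks, agent, memory, induce):
    """Online AWM loop sketch: solve, then learn from own traces.

    agent.solve_task and induce are hypothetical stand-ins for the
    repo's actual agent and induction components.
    """
    successes = 0
    for task in tasks:
        trace, success = agent.solve_task(task, memory.as_prompt())
        if success:
            # Only successful traces feed induction, so memory
            # accumulates procedures that demonstrably worked.
            memory.add(induce([trace]))
            successes += 1
    return successes / len(tasks)  # overall success rate
```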
