Basic Information

This repository provides an open-source implementation of research on advanced reasoning and learning for autonomous AI agents, focused on reliably completing tasks on the web. It bundles multiple agentic architectures inspired by the Agent Q paper and related work so that developers and researchers can build, run, and evaluate web-capable agents. The project includes setup guidance covering dependency installation with Poetry, launching a Chrome instance with remote debugging for web interaction, and configuring API keys for OpenAI and Langfuse; a sketch of these steps follows. The repo is intended as a development and experimentation platform rather than a polished end-user product: it exposes commands to start the agent, run evaluation suites, and generate reinforcement learning / DPO training pairs, enabling reproduction of the experiments described in the associated research.
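A minimal setup sketch based on the description above. The Chrome binary name and debugging port are assumptions that vary by platform; OPENAI_API_KEY is OpenAI's standard variable and the LANGFUSE_* names are Langfuse's documented defaults, but verify everything against the repository's README.

  # Install dependencies (assumes Poetry is already installed)
  poetry install

  # Launch Chrome with remote debugging enabled so the agent can drive it
  # (binary name and port are assumptions; adjust for your platform)
  google-chrome --remote-debugging-port=9222

  # API keys: OPENAI_API_KEY is OpenAI's standard variable, and the
  # LANGFUSE_* names are Langfuse's documented defaults
  export OPENAI_API_KEY=sk-...
  export LANGFUSE_PUBLIC_KEY=pk-lf-...
  export LANGFUSE_SECRET_KEY=sk-lf-...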

App Details

Features
Implements several agentic architectures documented in the README:

- A planner <-> navigator multi-agent architecture
- A solo planner-actor agent
- An actor <-> critic multi-agent setup
- An actor <-> critic configuration augmented with Monte Carlo Tree Search (MCTS) and DPO fine-tuning

Beyond the architectures, the repo provides utilities to run the agent (python -m agentq), run evaluation tests (tests_processor with configurable orchestrator types), and generate DPO training pairs via a browser MCTS module; example invocations are sketched below. It integrates with OpenAI for model access and Langfuse for tracing, with notes on how to disable tracing if desired, and includes citations to the underlying research and related work as well as instructions for launching Chrome in remote-debugging mode to enable web interactions.
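A hedged sketch of these commands: python -m agentq is quoted in the description above, but the test-runner module path, the orchestrator flag and its value, and the MCTS module path are assumptions inferred from that description, so verify them against the repository's README.

  # Start the agent (entry point named in the description)
  python -m agentq

  # Run the evaluation suite; the module path, the --orchestrator_type flag,
  # and the fsm value are assumptions based on "tests_processor with
  # orchestrator types"
  python -m test.tests_processor --orchestrator_type fsm

  # Generate DPO training pairs; this browser-MCTS module path is an assumption
  python -m agentq.core.mcts.browser_mcts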
Use Cases
The repository helps developers and researchers prototype and evaluate autonomous web agents using multiple reasoning and learning approaches from recent research. It supplies reproducible setup steps and example commands to run agents, perform evaluations, and generate training data for reinforcement learning and preference fine-tuning workflows. Integration with OpenAI makes it straightforward to plug in language models, while Langfuse tracing aids observability during runs. The multi-architecture codebase supports comparative experiments across the planner-navigator, solo planner-actor, actor-critic, and MCTS-augmented configurations, enabling study of performance and learning behavior on web tasks that require navigation and sequential decision making.
