Multi Agents Debate

Basic Information

This repository implements MAD (Multi-Agent Debate), a research framework for exploring how multiple large language model agents can debate to improve reasoning and translation outcomes. The project formalizes a debate between two opposing agents, a devil and an angel, with a judge that adjudicates between their arguments. It is intended for researchers and developers who want to reproduce the MAD experiments, run interactive demos, or study multi-agent debating strategies for LLMs. The README includes installation steps, example scripts, and instructions for setting an OpenAI API key so that debate4tran.sh and the interactive Python script can be run. The code is demonstrated on Counterintuitive QA and Commonsense Machine Translation tasks and includes case studies illustrating the debate process and the judge's adjudication.
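
As a rough illustration of the devil/angel/judge protocol described above, the sketch below wires three role prompts into a short debate loop over the OpenAI chat API. It is not the repository's actual implementation: the function names, prompts, model name, and round count are illustrative placeholders.

```python
# Illustrative sketch of a devil/angel/judge debate loop; not the repository's
# actual code. Assumes OPENAI_API_KEY is set in the environment and the
# openai>=1.x Python SDK is installed. All names and prompts are placeholders.
from openai import OpenAI

client = OpenAI()

DEVIL_PROMPT = "You are the devil. Argue forcefully for your answer and attack the angel's reasoning."
ANGEL_PROMPT = "You are the angel. Defend the correct answer and rebut the devil's argument."
JUDGE_PROMPT = "You are the judge. Read the debate and state the final answer with a brief justification."


def ask(system: str, transcript: str, model: str = "gpt-3.5-turbo") -> str:
    """Send one role's turn to the chat API and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": transcript},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content


def run_debate(question: str, rounds: int = 2) -> str:
    """Alternate devil and angel turns, then let the judge adjudicate."""
    transcript = f"Question: {question}\n"
    for i in range(rounds):
        devil_turn = ask(DEVIL_PROMPT, transcript)
        transcript += f"\nDevil (round {i + 1}): {devil_turn}\n"
        angel_turn = ask(ANGEL_PROMPT, transcript)
        transcript += f"\nAngel (round {i + 1}): {angel_turn}\n"
    return ask(JUDGE_PROMPT, transcript)


if __name__ == "__main__":
    print(run_debate("When Alice walks up a hill her speed is 1 m/s, and when "
                     "she walks back down her speed is 3 m/s. What is her "
                     "average speed for the whole journey?"))
```

In the actual framework, the prompts, number of rounds, and stopping criteria are governed by the repository's own scripts and configuration rather than the fixed values shown here.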

App Details

Features
The repo provides a debate framework that structures agents into adversarial roles (devil and angel) plus a judge that adjudicates answers. It includes runnable scripts such as debate4tran.sh and an interactive.py script for local experimentation, a requirements file for Python dependencies, and instructions for configuring an OpenAI API key. Demonstrations include animated examples and concrete case studies for arithmetic and translation puzzles, along with evaluations on the Counterintuitive QA and Commonsense-MT tasks. The README documents the debate flow, the role responsibilities, and sample dialogues that show how agents exchange arguments and how the judge reaches a conclusion. It also cites the related literature used as references for the approach.
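
The snippet below is a minimal, hypothetical launcher reflecting the setup steps the README describes: it checks that OPENAI_API_KEY is set, installs the listed dependencies, and starts interactive.py. Only the file names and the API-key convention come from the description above; the exact arguments the real scripts accept may differ.

```python
# Hypothetical launcher for the interactive demo; the real interactive.py may
# take different (or no) command-line arguments. Only the script and
# requirements file names and the OPENAI_API_KEY convention come from the
# README description above.
import os
import subprocess
import sys


def main() -> None:
    if not os.environ.get("OPENAI_API_KEY"):
        sys.exit("Set OPENAI_API_KEY before running the demo, "
                 "e.g. export OPENAI_API_KEY=sk-...")
    # Dependencies are listed in the repository's requirements file.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        check=True,
    )
    subprocess.run([sys.executable, "interactive.py"], check=True)


if __name__ == "__main__":
    main()
```
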
Use Cases
MAD helps researchers and practitioners study and use adversarial multi-agent interaction to reduce issues observed with solitary self-reflection, such as biased or degenerate reasoning. By forcing agents to counter and correct each other, MAD aims to expose blind spots, provide external feedback between agents, and encourage the divergent thinking that improves answer quality on reasoning and translation benchmarks. The repository provides ready-to-run scripts and an interactive demo so users can reproduce the experiments, test new prompts or model settings, and observe the debate dynamics. The documented case studies show how the framework leads to corrected reasoning and clearer translation choices, making it useful for experimentation and method development in LLM research.
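
For benchmark-style evaluation such as Counterintuitive QA, it helps to extract a machine-readable verdict from the judge's free-form conclusion. The helper below is a hypothetical post-processing step, not part of the repository; it assumes the judge has been prompted to end its conclusion with a line of the form "Final answer: ...".

```python
# Hypothetical post-processing of the judge's conclusion for automatic scoring;
# assumes the judge prompt asks for a closing "Final answer: ..." line.
import re


def parse_final_answer(judge_output: str) -> str | None:
    """Return the text after the last 'Final answer:' marker, if present."""
    matches = re.findall(r"Final answer:\s*(.+)", judge_output, flags=re.IGNORECASE)
    return matches[-1].strip() if matches else None


verdict = ("The devil ignores that equal distances take unequal times.\n"
           "Final answer: 1.5 m/s")
print(parse_final_answer(verdict))  # -> "1.5 m/s"
```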
