RankGPT


Basic Information

RankGPT is the research codebase accompanying the paper that investigates using large generative language models as re-ranking agents for information retrieval. It provides implementations to generate instructional permutations with LLMs, apply those permutations to re-rank candidate passages, and evaluate re-ranking performance on standard IR benchmarks. The repository includes example code showing how to call the permutation pipeline with models such as ChatGPT, a sliding window strategy for re-ranking candidate lists that exceed the model's context window, and scripts to run end-to-end retrieval and evaluation with pyserini and trec_eval. It also ships data releases, including precomputed ChatGPT permutations for MS MARCO, procedures to distill LLM outputs into compact supervised rankers, and utilities for reproducing the experiments reported in the paper.
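
As an illustration of that pipeline, here is a minimal sketch of one permutation re-ranking pass. It assumes the helper names create_permutation_instruction, run_llm, and receive_permutation and the item/hits dictionary format shown in the repository's examples; check the README for the exact signatures and supported model names.

```python
# Hypothetical single-pass re-ranking sketch based on the repository's example code.
from rank_gpt import create_permutation_instruction, run_llm, receive_permutation

item = {
    'query': 'how do large language models re-rank passages?',
    'hits': [
        {'content': 'Passage text of candidate 1 ...'},
        {'content': 'Passage text of candidate 2 ...'},
        # ... remaining first-stage retrieval candidates
    ],
}

# (1) Build the instruction prompt that asks the LLM to order the listed passages.
messages = create_permutation_instruction(item=item, rank_start=0, rank_end=20,
                                           model_name='gpt-3.5-turbo')

# (2) Call the LLM; it returns a permutation string such as "[2] > [1] > ...".
permutation = run_llm(messages, api_key='YOUR_OPENAI_KEY', model_name='gpt-3.5-turbo')

# (3) Apply the predicted permutation to re-order item['hits'][rank_start:rank_end].
item = receive_permutation(item, permutation, rank_start=0, rank_end=20)
```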

App Details

Features
The project exposes a permutation generation pipeline with helper functions to create ranking instructions, call LLMs, and apply the predicted permutations to candidate hits. It implements a sliding window strategy to re-rank large passage sets in chunks and provides evaluation utilities and example workflows for TREC, BEIR, and Mr. TyDi using pyserini and trec_eval. The repo includes data and model artifacts such as sampled MS MARCO queries and ChatGPT-predicted permutations, scripts for distilling LLM rankings into specialized DeBERTa models with a RankNet loss, multi-GPU training examples via accelerate, and support for multiple LLM backends through LiteLLM. It also bundles ready-to-run evaluation and training scripts and benchmark result visualizations.
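
To make the sliding window strategy concrete, the self-contained sketch below re-ranks a long candidate list in overlapping chunks, moving from the bottom of the list toward the top so the strongest candidates are refined last. The function and parameter names here are illustrative, not the repository's API; in the repo the per-window call would be a single LLM permutation request.

```python
from typing import Callable, List

def sliding_window_rerank(
    hits: List[dict],
    rerank_window: Callable[[List[dict], int, int], List[dict]],
    window_size: int = 20,
    step: int = 10,
) -> List[dict]:
    """Re-rank hits in overlapping windows of `window_size`, shifting the window
    toward the top of the list by `step` each iteration (assumes step < window_size)."""
    end = len(hits)
    start = max(0, end - window_size)
    while True:
        # One chunked re-ranking call, e.g. a single LLM permutation request.
        hits = rerank_window(hits, start, end)
        if start == 0:
            break
        end -= step
        start = max(0, end - window_size)
    return hits

# Toy usage: sort each window by a per-hit score to stand in for the LLM call.
def toy_window_ranker(hits, start, end):
    hits[start:end] = sorted(hits[start:end], key=lambda h: h['score'], reverse=True)
    return hits

candidates = [{'docid': i, 'score': (i * 37) % 100} for i in range(100)]
reranked = sliding_window_rerank(candidates, toy_window_ranker)
```
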
Use Cases
This codebase helps researchers and engineers reproduce and extend experiments on using LLMs for relevance re-ranking. It supplies concrete pipelines for converting LLM outputs into ranking permutations and for integrating those permutations into retrieval evaluations, lowering the barrier to experimenting with different LLMs and settings. The sliding window method makes LLM re-ranking practical on long candidate lists by processing them in overlapping chunks. Precomputed training permutations and the provided distillation scripts allow teams to compress LLM ranking behavior into smaller, deployable rankers, improving inference cost and scalability. Evaluation tooling and examples support benchmark comparisons and quantitative analysis to validate improvements in nDCG and other IR metrics.
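
For the distillation step mentioned above, a common formulation is a RankNet-style pairwise loss that trains a small cross-encoder (e.g. DeBERTa) to reproduce the teacher LLM's ordering. The sketch below is a minimal, generic version of that loss, not the repository's exact training code; it assumes the student's scores are already arranged in the teacher's ranked order.

```python
import torch
import torch.nn.functional as F

def ranknet_distillation_loss(scores: torch.Tensor) -> torch.Tensor:
    """RankNet-style pairwise loss against a teacher permutation.

    `scores` holds the student model's relevance scores for one query's
    passages, ordered by the teacher (LLM) ranking: scores[0] belongs to the
    passage the teacher ranked first. Every pair (i, j) with i < j should
    satisfy scores[i] > scores[j]; violations are penalized.
    """
    n = scores.size(0)
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)  # diff[i, j] = s_i - s_j
    mask = torch.triu(torch.ones(n, n, device=scores.device), diagonal=1).bool()
    # softplus(-(s_i - s_j)) == -log sigmoid(s_i - s_j), the RankNet pair loss.
    return F.softplus(-diff[mask]).mean()

# Toy usage: 5 passages scored by a student ranker, listed in teacher order.
student_scores = torch.tensor([2.1, 1.7, 2.4, 0.3, -0.5], requires_grad=True)
loss = ranknet_distillation_loss(student_scores)
loss.backward()
```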
