ViDoRAG
Basic Information
ViDoRAG is a research and engineering repository for building and evaluating retrieval-augmented generation (RAG) systems over visually rich documents. It provides a multi-agent RAG framework in which reasoning agents iterate in an actor-critic style to handle complex multi-hop queries against large document collections. The project includes the ViDoSeek benchmark dataset, designed for retrieval-reason-answer tasks on visually rich documents; tooling to preprocess PDFs into page images; optional text extraction via OCR or a vision-language model; and scripts to build an index and run end-to-end retrieval and generation. The README documents dependency setup, ingestion and embedding steps, dynamic retrieval options, a multi-agent generation entry point, and an evaluation pipeline. The codebase is aimed at researchers and developers who want to reproduce experiments, try out multimodal retrievers, or extend agent-based RAG for visual documents.
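
As a rough illustration of the iterative actor-critic idea described above, the following Python sketch shows one way a retrieve, draft, critique loop could be structured. It is not taken from the ViDoRAG codebase: Page, retrieve, answer_with_critique, toy_actor, and toy_critic are hypothetical names, the retriever is a toy word-overlap ranker standing in for a multimodal retriever, and the actor and critic are simple stand-ins for model-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Page:
    """Hypothetical record for one document page."""
    doc_id: str
    page_no: int
    text: str  # stands in for OCR- or VLM-extracted text of a page image


def retrieve(query: str, pages: List[Page], top_k: int) -> List[Page]:
    """Toy retriever (assumption): rank pages by shared query words."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in pages]
    scored = [(s, p) for s, p in scored if s > 0]
    # Stable sort: pages with equal scores keep their original order.
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def answer_with_critique(
    query: str,
    pages: List[Page],
    actor: Callable[[str, List[Page]], str],
    critic: Callable[[str, List[Page], str], bool],
    max_rounds: int = 3,
    top_k: int = 1,
) -> Tuple[Optional[str], int]:
    """Draft an answer, let a critic accept or reject it, and widen
    retrieval by one page after each rejection."""
    draft: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        evidence = retrieve(query, pages, top_k)
        draft = actor(query, evidence)
        if critic(query, evidence, draft):
            return draft, round_no
        top_k += 1  # fetch one more page on the next round
    return draft, max_rounds  # last draft, never accepted


def toy_actor(query: str, evidence: List[Page]) -> str:
    """Stand-in for a generation agent: cite the pages it was given."""
    return "; ".join(f"{p.doc_id}#{p.page_no}" for p in evidence)


def toy_critic(query: str, evidence: List[Page], draft: str) -> bool:
    """Stand-in for a critic agent: accept only if every query word
    appears somewhere in the retrieved evidence."""
    covered = set()
    for p in evidence:
        covered |= set(p.text.lower().split())
    return set(query.lower().split()) <= covered


# Made-up pages for demonstration only.
PAGES = [
    Page("report.pdf", 1, "revenue chart for 2023 shows growth"),
    Page("report.pdf", 2, "the ceo in 2023 was alice"),
    Page("other.pdf", 1, "weather summary"),
]

answer, rounds = answer_with_critique(
    "ceo revenue 2023", PAGES, toy_actor, toy_critic
)
print(answer, rounds)
```

With the three toy pages above, the first round retrieves a single page and the critic rejects the draft because the query word "ceo" is not covered; the second round widens retrieval to two pages and the draft is accepted. A real system would replace the word-overlap check with model-based judgment, but the control flow of drafting, critiquing, and retrying with more evidence stays the same.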