
Basic Information

Search-o1 is a research and implementation repository that augments Large Reasoning Models (LRMs) with an agentic retrieval-augmented generation pipeline and a Reason-in-Documents module to reduce knowledge gaps during long, stepwise reasoning. The project provides code, data preprocessing notebooks, inference scripts and evaluation tools so researchers can apply Search-o1 to a range of challenging reasoning and open-domain QA benchmarks. It demonstrates a batch generation mechanism with interleaved search that detects when a model needs external information, issues search queries, fetches and refines documents, and reintegrates evidence into ongoing reasoning chains. The repository includes setup instructions, example commands for different inference modes, dataset preprocessing guidance and a backoff evaluation strategy for retrieval-based outputs.
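At the heart of the pipeline is an interleaved reason-then-search loop: when the model signals a knowledge gap by emitting a search query, generation pauses, documents are fetched and refined, and the distilled evidence is folded back in before reasoning resumes. Below is a minimal sketch of that loop, assuming hypothetical helper objects and tag strings; the repository's actual special symbols, batching, and prompting logic differ.

```python
# Illustrative sketch of the interleaved reason-then-search loop. The tag
# strings, helper objects, and method names below are assumptions for
# illustration, not the repository's actual API.

SEARCH_OPEN, SEARCH_CLOSE = "<|begin_search_query|>", "<|end_search_query|>"

def reason_with_search(model, retriever, refiner, question, max_searches=5):
    """Generate a reasoning chain, pausing to search whenever the model
    emits a search-query span, then folding refined evidence back in."""
    context = question
    for _ in range(max_searches):
        output = model.generate(context)            # continue the reasoning chain
        if SEARCH_OPEN not in output:
            return output                           # no knowledge gap detected; done
        # Extract the query the model asked for.
        query = output.split(SEARCH_OPEN)[1].split(SEARCH_CLOSE)[0].strip()
        documents = retriever.search(query)         # e.g., web search + page fetching
        # Distill the documents into evidence relevant to the current step.
        evidence = refiner.reason_in_documents(query, documents, context)
        context = context + output + "\n" + evidence  # reintegrate and keep reasoning
    return model.generate(context)                  # final pass after hitting the cap
```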

App Details

Features
The repository implements an agentic search workflow that lets reasoning models dynamically generate search queries and retrieve external documents during inference. It provides a Reason-in-Documents module to analyze retrieved content and integrate it into the model's chain of thought. Search-o1 supports batch generation with interleaved search; configurable limits on searches, URL fetches, turns, and document length; and options to use Jina for document processing and Bing for web search. Included scripts cover direct generation, naive RAG, RAG with agentic search, and the full Search-o1 pipeline. Data preprocessing notebooks standardize multiple datasets into a unified JSON format, and the project ships evaluation utilities with a backoff mechanism that falls back to direct generation when retrieval-based outputs do not yield final answers.
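To make the configurable limits concrete, the following is a minimal sketch of the kind of run configuration the inference scripts expose. The parameter names, defaults, and mode labels are illustrative assumptions, not the repository's actual command-line flags; consult the README for those.

```python
# Hypothetical run configuration (names and defaults are assumptions for
# illustration; the real scripts define their own flags).
from dataclasses import dataclass

@dataclass
class RunConfig:
    mode: str = "search_o1"          # or "direct", "naive_rag", "agentic_rag"
    max_search_limit: int = 5        # cap on search queries per question
    max_url_fetch: int = 20          # cap on fetched web pages per question
    max_turn: int = 15               # cap on reasoning/search turns
    max_doc_len: int = 3000          # truncate each refined document to this length
    use_jina: bool = True            # use Jina for document processing
    bing_subscription_key: str = ""  # credential for Bing web search

config = RunConfig(mode="search_o1", max_search_limit=5)
print(config)
```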
Use Cases
Search-o1 helps researchers and developers improve the factual accuracy and reliability of large reasoning models by providing a reproducible system for integrating web search into reasoning loops. It reduces uncertainty in multi-step problem solving by detecting knowledge gaps, retrieving relevant documents, and weaving the evidence into the reasoning chain. The included preprocessing tools, example scripts, and configurable parameters make it straightforward to run experiments on benchmarks such as PhD-level science QA, math and code tasks, and single- and multi-hop open-domain QA. The evaluation utilities save inputs and outputs, apply a backoff strategy when retrieval fails to yield a final answer, and support comparisons across direct generation, naive RAG, agentic RAG, and the full Search-o1 method.
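The backoff strategy can be pictured as follows; the function names and the answer-extraction heuristic are illustrative assumptions, not the repository's actual evaluation code.

```python
# A minimal sketch of the backoff idea: if a retrieval-based run yields no
# final answer for an example, fall back to the direct-generation output
# for that example before scoring. Names here are hypothetical.

def extract_final_answer(output: str):
    """Return the boxed final answer if one is present, otherwise None."""
    marker = "\\boxed{"
    if marker in output:
        return output.split(marker)[-1].split("}")[0].strip()
    return None

def evaluate_with_backoff(retrieval_outputs, direct_outputs, references, score_fn):
    """Score each example, backing off to the direct output when needed."""
    scores = []
    for retrieved, direct, reference in zip(retrieval_outputs, direct_outputs, references):
        answer = extract_final_answer(retrieved)
        if answer is None:                     # retrieval run produced no final answer
            answer = extract_final_answer(direct) or direct
        scores.append(score_fn(answer, reference))
    return sum(scores) / len(scores)
```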
