Basic Information

Doctor is a tooling stack for discovering, crawling, indexing and exposing websites so they can be consumed by LLM agents via an MCP server. The project orchestrates web crawling, text chunking, embedding creation and vector storage, and serves a web API and MCP endpoint so language models and editor integrations can query up-to-date site content. The README documents the overall architecture, setup steps, required environment variables and how to run the stack with Docker Compose. It is intended as an infrastructure component that turns websites into searchable, hierarchical document stores that LLMs can use for improved reasoning and code generation.

App Details

Features
The repository integrates crawl4ai-based crawling with hierarchy tracking, a chunker built with LangChain, and embedding creation using OpenAI via litellm. Documents and vectors are stored in DuckDB with vector search support and managed by a unified Database class. A crawl worker processes jobs asynchronously using Redis as a message broker. A FastAPI web server exposes endpoints for starting crawls, checking job progress, searching documents and viewing indexed pages. The site map feature provides hierarchical navigation, domain grouping, breadcrumb and sibling navigation, and raw page retrieval. The project includes tests, pytest configuration, pre-commit hooks and Docker Compose deployment instructions.
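The chunk, embed, store and search flow described above can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the word-window chunker stands in for the LangChain-based chunker, the bag-of-words `embed` function stands in for OpenAI embeddings requested through litellm, and a plain Python list stands in for DuckDB vector search. All function names and the example URLs are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words (toy stand-in for a real chunker)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Chunk every page and store (url, chunk, vector) rows in memory."""
    return [
        (url, chunk, embed(chunk))
        for url, text in pages.items()
        for chunk in chunk_words(text)
    ]


def search(index, query: str, top_k: int = 3) -> list[tuple[float, str, str]]:
    """Return the top_k (score, url, chunk) rows ranked by cosine similarity."""
    q = embed(query)
    scored = [(cosine(q, vec), url, chunk) for url, chunk, vec in index]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    pages = {
        "https://example.com/queue": "Redis is used as a message broker for crawl jobs",
        "https://example.com/storage": "DuckDB stores documents and vectors",
    }
    for score, url, chunk in search(build_index(pages), "message broker", top_k=1):
        print(round(score, 3), url, chunk)
```

In a real deployment the in-memory list would be replaced by a persistent vector store and the toy `embed` by a model-backed embedding call; the ranking step stays conceptually the same.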
Use Cases
Doctor enables teams and developers to convert websites into structured, searchable knowledge bases that LLMs can query through an MCP server. By combining crawling, chunking, embeddings and vector storage, it provides more current and context-rich inputs for reasoning and code generation tasks. The exposed FastAPI endpoints and MCP integration allow editor tools like Cursor or VSCode to connect directly to the indexed content. Hierarchical sitemaps and navigation improve context discovery and browsing of large sites. The stack is packaged for local development with Docker Compose, supports asynchronous job processing with Redis, and includes testing and quality tooling to help maintain reliable indexing pipelines.
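To make the hierarchical navigation idea concrete, here is a small sketch of domain grouping, breadcrumbs and sibling lookup. It assumes the hierarchy can be derived from URL paths alone, which may differ from how the project tracks parent pages during crawling; every function name and URL is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional
from urllib.parse import urlparse


def parent_url(url: str) -> Optional[str]:
    """Return the URL one path segment up, or None for a site root."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        return None  # already at the root of the domain
    parent_path = "/".join(parts[:-1])
    return f"{parsed.scheme}://{parsed.netloc}/{parent_path}"


def breadcrumbs(url: str) -> list[str]:
    """List the trail from the site root down to the given URL."""
    trail = [url]
    while (parent := parent_url(trail[-1])) is not None:
        trail.append(parent)
    return trail[::-1]


def siblings(url: str, pages: list[str]) -> list[str]:
    """Other indexed pages that share the same parent as the given URL."""
    parent = parent_url(url)
    if parent is None:
        return []  # a site root has no siblings in this model
    return sorted(p for p in pages if p != url and parent_url(p) == parent)


def group_by_domain(pages: list[str]) -> dict[str, list[str]]:
    """Group indexed page URLs by their host name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for page in pages:
        groups[urlparse(page).netloc].append(page)
    return {domain: sorted(urls) for domain, urls in groups.items()}


if __name__ == "__main__":
    pages = [
        "https://example.com/",
        "https://example.com/docs",
        "https://example.com/docs/api",
        "https://example.com/docs/setup",
        "https://other.example.org/guide",
    ]
    print(breadcrumbs("https://example.com/docs/api"))
    print(siblings("https://example.com/docs/api", pages))
    print(group_by_domain(pages))
```

A path-based hierarchy is cheap to compute but can misplace pages whose URLs do not mirror the site's link structure, so a crawler that records the referring page for each URL would give a more faithful tree.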
