mcp crawl4ai rag

Basic Information

This repository provides an MCP server that lets AI agents and AI coding assistants crawl the web, index content into a vector database, and run retrieval-augmented generation (RAG) workflows. It integrates Crawl4AI for web crawling, Supabase for vector storage and search, and optionally Neo4j for a knowledge graph used in hallucination detection and repository analysis. The server exposes MCP tools for single-page crawling, smart site crawling, listing available sources, and running semantic RAG queries. It also includes optional agentic code extraction and code example search, cross-encoder reranking, and contextual embedding generation. The project is intended as a testbed and as a building block for a larger knowledge engine called Archon. It can run in Docker or directly with Python via uv, and configuration flags enable or disable the advanced RAG strategies.
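As an illustration of flag-driven configuration, the following is a minimal sketch of reading boolean switches from the environment. The flag names and the `read_flag` and `enabled_strategies` helpers are assumptions made for this example, not names confirmed by the listing; check the project's own .env example for the actual settings.

```python
import os

# Hypothetical flag names, chosen for illustration only.
STRATEGY_FLAGS = (
    "USE_HYBRID_SEARCH",
    "USE_RERANKING",
    "USE_CONTEXTUAL_EMBEDDINGS",
)


def read_flag(name, env=None, default=False):
    """Interpret an environment variable as a boolean switch."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def enabled_strategies(env=None):
    """Return the names of the flags that are switched on."""
    return [name for name in STRATEGY_FLAGS if read_flag(name, env)]
```

Passing a plain dictionary instead of the real environment, for example `enabled_strategies({"USE_RERANKING": "true"})`, makes the behavior easy to check without touching process state.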

App Details

Features
The server offers smart URL detection that handles sitemaps and text index files, recursive link following, parallelized crawling, and content chunking by headers and size. Core retrieval features include vector search with optional source filtering, hybrid search that merges keyword and semantic results, contextual embeddings that enrich chunk semantics, agentic RAG that extracts and summarizes large code blocks into a dedicated code search table, and reranking with a lightweight cross-encoder model. Optional knowledge graph tools parse GitHub repositories into a Neo4j schema, validate AI-generated code against the repository structure, and provide interactive graph queries. Deployment and integration features include Docker and uv workflows, SSE and stdio transports for MCP clients, and environment-driven configuration via a .env file.
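Chunking by headers and size can be sketched roughly as follows. This is a simplified illustration rather than the server's actual implementation: the function name `chunk_markdown`, the 1000-character default, and the choice to cut at paragraph breaks are all assumptions.

```python
import re


def chunk_markdown(text, max_chars=1000):
    """Split text at Markdown headers, then cap each section at max_chars."""
    # Split just before every line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Oversized sections are cut at a paragraph break where one exists.
        while len(section) > max_chars:
            cut = section.rfind("\n\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no paragraph break found: hard cut
            chunks.append(section[:cut].strip())
            section = section[cut:].strip()
        if section:
            chunks.append(section)
    return chunks
```

Splitting on headers first keeps each chunk topically coherent, and the size cap keeps chunks within whatever limit the embedding model imposes.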
Use Cases
This MCP server helps developers and AI assistants by automating the collection and indexing of web documentation so that models can answer specific queries with source-backed context. It supplies tools to add a single page or a whole site to a vector store and to run RAG queries, with optional reranking and hybrid search to improve relevance. For coding assistants, it can extract large code examples with summaries, support dedicated code searches, and optionally detect hallucinations by validating generated code against a Neo4j knowledge graph built from parsed repositories. The server integrates with MCP clients via SSE or stdio, supports Docker or direct Python execution, and includes recommended configuration presets for general RAG, code-focused workflows, and hallucination detection.
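One common way to merge keyword and semantic result lists is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustration; the listing does not say which merging method the server uses, and the function name and the constant `k = 60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Items ranked highly in any list receive a larger contribution.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Documents that appear in both lists accumulate score from each, so they tend to rise above documents found by only one search method.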
