SurfSense

Basic Information

SurfSense is an open-source AI research agent that lets users query and research across their personal knowledge base and external services. It combines document ingestion, vector embeddings, and hybrid search to provide a private NotebookLM/Perplexity-like experience that integrates with search engines and with collaboration tools and content sources such as Slack, Linear, Jira, ClickUp, Confluence, Notion, YouTube, GitHub, and Discord. The project is self-hostable and can be configured for local or cloud deployments. It provides backend services for indexing, retrieval-augmented generation (RAG), reranking, and chat, plus a Next.js frontend and a cross-browser extension for saving webpages. Installation is supported through Docker or manual setup and requires a vector-enabled database (PostgreSQL with pgvector) and an ETL/file-processing service. The repo targets researchers and knowledge workers who need privacy-aware, extensible research with cited answers over mixed personal and external content.

App Details

Features
SurfSense supports multi-format ingestion (documents, images, audio, video) through LlamaCloud, Unstructured, or Docling; it lists support for 50+ formats, with the exact set depending on the ETL service chosen. Search is hybrid: semantic and full-text results are combined with Reciprocal Rank Fusion. Retrieval uses hierarchical two-tier RAG indices, supports rerankers, and uses AutoEmbeddings/LateChunker for optimized chunking. Users can chat with saved content and receive cited answers. The platform supports 100+ LLMs, 6000+ embedding models, multiple TTS providers, and local LLMs (Ollama), and includes a fast podcast generation agent. The stack includes FastAPI, PostgreSQL with pgvector, LangChain/LangGraph, a Next.js frontend, a browser extension, and a Dockerized deployment that also bundles pgAdmin.
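To illustrate how Reciprocal Rank Fusion can merge a semantic ranking with a full-text ranking, here is a minimal sketch. It is not taken from the SurfSense codebase: the function name, the document IDs, and the smoothing constant k=60 (a commonly used default) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); the scores are summed across lists.
    k=60 is an assumed, commonly used default, not a SurfSense setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
full_text_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_hits, full_text_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and a document found by only one retriever still appears in the fused result. Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale.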
Use Cases
SurfSense helps users centralize and query personal and external knowledge with cited, context-aware answers, reducing time spent searching across siloed tools. Hybrid search and reranking improve relevance for research tasks, while hierarchical RAG and the wide choice of embedding models help it handle varied document types. Privacy controls and local LLM support enable on-premise deployments and sensitive workflows. The cross-browser extension and connectors make it easier to capture protected webpages and third-party content. The podcast generation agent converts conversations into audio, with a choice of TTS providers. Docker and manual installation options simplify deployment and maintenance. Overall, SurfSense is aimed at researchers and teams who need a customizable, self-hosted research assistant that integrates with corporate tools, handles a large number of file formats, and produces reproducible, cited results.
