OpenContracts

Basic Information

OpenContracts is a free, open-source (GPL-3) document analytics platform for enterprise use that focuses on ingesting, annotating, and extracting data from PDF and text-based documents. It provides a web application and backend that manage document collections (Corpuses), parse document layout, generate vector embeddings, and display analyses and annotations over the original documents. The project is built around a pluggable pipeline and a microservice architecture, so teams can add new parsers, embedders, and thumbnailers. It includes a LlamaIndex integration and a Django + pgvector-backed vector store for hybrid vector/metadata queries. The repository ships documentation, example parsers such as the Docling and NLM ingestors, a human annotation UI, and tooling for building custom data extractors and bespoke analytics. It is intended for organizations that need scalable, auditable document analysis and LLM-assisted querying of legal or other unstructured documents.

App Details

Features
OpenContracts supports document analytics with the following features: management of document Corpuses, an automated layout parser for PDFs, and automatic generation of vector embeddings for uploaded documents and their layout blocks. Its pluggable parsing pipeline accepts custom parsers, embedders, and thumbnailers, and ships with example integrations such as the Docling and NLM ingestors. The platform also provides a pluggable microservice analyzer architecture for automated annotations, a human annotation interface for manual multi-page labeling, and a data extraction grid for running LLM-powered queries across many documents. Backend storage is built on Django with pgvector, which holds both vectors and metadata, and an official LlamaIndex wrapper simplifies LLM queries. Documentation, quickstart guides, and walkthroughs are provided.
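The pluggable pipeline idea can be illustrated with a minimal sketch. This is not the OpenContracts API: every name here (Block, PARSERS, EMBEDDERS, register, run_pipeline) is hypothetical, and the toy parser and embedder stand in for real components. It only shows how parsers and embedders could be registered by name and swapped without touching the pipeline code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical illustrations, not OpenContracts APIs.


@dataclass
class Block:
    """One layout block extracted from a document."""
    page: int
    text: str
    embedding: List[float] = field(default_factory=list)


Parser = Callable[[str], List[Block]]    # raw text -> layout blocks
Embedder = Callable[[str], List[float]]  # block text -> vector

PARSERS: Dict[str, Parser] = {}
EMBEDDERS: Dict[str, Embedder] = {}


def register(registry: Dict[str, Callable], name: str):
    """Decorator that adds a component to a registry under `name`."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap


@register(PARSERS, "paragraphs")
def paragraph_parser(raw: str) -> List[Block]:
    # Toy parser: a form feed separates pages, blank lines separate blocks.
    blocks = []
    for page_no, page in enumerate(raw.split("\f"), start=1):
        for chunk in page.split("\n\n"):
            if chunk.strip():
                blocks.append(Block(page=page_no, text=chunk.strip()))
    return blocks


@register(EMBEDDERS, "char-count")
def char_count_embedder(text: str) -> List[float]:
    # Toy embedder: counts of letters, digits, and whitespace characters.
    return [
        float(sum(c.isalpha() for c in text)),
        float(sum(c.isdigit() for c in text)),
        float(sum(c.isspace() for c in text)),
    ]


def run_pipeline(raw: str, parser: str, embedder: str) -> List[Block]:
    """Parse a document, then embed every block, using named components."""
    blocks = PARSERS[parser](raw)
    for block in blocks:
        block.embedding = EMBEDDERS[embedder](block.text)
    return blocks
```

Under these assumptions, adding a new parser or embedder means registering one more function; run_pipeline("...", "paragraphs", "char-count") keeps working unchanged.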
Use Cases
OpenContracts helps teams and organizations analyze large collections of contracts and other unstructured documents by combining visual PDF/text parsing, vector embeddings, and LLM-powered retrieval. It enables rapid deployment of bespoke analytics and custom extractors, so users can define extraction workflows and run bulk queries across hundreds of documents. The human annotation UI supports human-in-the-loop workflows that improve extraction quality and produce training data. Its pluggable pipeline and standardized data format make document processing portable and extensible, letting engineers add new parsers or embedders without reworking the core application. Integration with LlamaIndex and a pgvector-backed store enables hybrid search, in which semantic similarity is combined with metadata filters to return more relevant, context-aware answers.
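Hybrid semantic and metadata search can be sketched in a few lines of plain Python. This is an illustration of the general technique only, not how OpenContracts or pgvector implements it: the record layout, the hybrid_search function, and the two-dimensional vectors are all assumptions made for the example. Records are first narrowed by exact metadata match, then ranked by cosine similarity to the query vector.

```python
import math
from typing import Any, Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    records: List[Dict[str, Any]],
    query_vec: List[float],
    filters: Dict[str, Any],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Filter records by metadata, then rank the survivors by similarity."""
    # Step 1 (metadata): keep records whose metadata matches every filter.
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in filters.items())
    ]
    # Step 2 (semantic): score each candidate against the query vector.
    scored = [
        (r["id"], cosine_similarity(r["embedding"], query_vec))
        for r in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample data: two NDA blocks and one lease block.
RECORDS = [
    {"id": "nda-1", "metadata": {"corpus": "ndas"}, "embedding": [1.0, 0.0]},
    {"id": "nda-2", "metadata": {"corpus": "ndas"}, "embedding": [0.6, 0.8]},
    {"id": "lease-1", "metadata": {"corpus": "leases"}, "embedding": [1.0, 0.0]},
]

hits = hybrid_search(RECORDS, [1.0, 0.0], {"corpus": "ndas"})
```

Here the lease record is excluded by the metadata filter even though its vector matches the query exactly, which is the point of combining the two kinds of criteria. In a database-backed store the same two steps would typically be expressed as a filter clause plus an ordering by vector distance.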
