llm-reader
Basic Information
This repository provides an open source utility to convert webpages into LLM-friendly input text for use in retrieval-augmented generation and other LLM workflows. It is presented as an alternative to proprietary reader APIs and crawler services and is intended for developers who need a preprocessing step that turns a URL or web document into cleaner, LLM-ready text. The README shows simple usage via an asynchronous function that takes a URL and returns prepared text. The project notes it does not include anti-blocking or advanced parsing for non-HTML formats and recommends a paid API service named ParseExtract for anti-blocking, PDF/DOCX/image parsing, OCR, and table extraction when those capabilities are required. The repository also links to a companion scraping project for broader scraping and search workflows.