docetl
Basic Information
DocETL is a developer-focused toolkit for creating, iterating on, and running data processing pipelines, with an emphasis on complex document processing tasks. It provides an interactive UI, DocWrangler, for prompt engineering and pipeline design, and a Python package for running pipelines in production from the CLI or programmatically. The project is designed to integrate LLMs into text transformation and extraction workflows. Its README documents the prerequisites and required environment variables, such as Python 3.10+ and an OpenAI API key. The README also covers local development and deployment options, including Docker and manual setup, and support for AWS Bedrock when credentials are configured. The repository includes tutorials, a playground hosted at docetl.org/playground, and community examples that help users get started and export finalized pipeline configurations for production use.
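The prerequisites mentioned above (Python 3.10+ and an OpenAI API key) can be checked before running a pipeline. The following is a minimal sketch using only the Python standard library, not part of DocETL itself. It assumes the key is supplied through an environment variable named `OPENAI_API_KEY`, the name the OpenAI SDK conventionally reads; the function `check_prerequisites` and the constants are hypothetical names chosen for illustration, so consult the project README for the exact variables it expects.

```python
import os
import sys

# Assumption: the API key is read from OPENAI_API_KEY. This name is the
# OpenAI SDK convention and is not confirmed by the text above.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY",)
MIN_PYTHON = (3, 10)  # "Python 3.10+" as stated in the description


def check_prerequisites(env=None, version=None):
    """Return a list of problems found; an empty list means nothing is missing.

    `env` and `version` default to the current process environment and
    interpreter version, and can be overridden for testing.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])

    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )
    for name in REQUIRED_ENV_VARS:
        # Treat an empty string the same as an unset variable.
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```

Returning a list of problems instead of raising on the first one lets a caller report everything that is missing in a single pass. If AWS Bedrock is used, the relevant credential variables could be added to `REQUIRED_ENV_VARS` in the same way.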