Basic Information

DocETL is a developer-focused toolkit for creating, iterating on, and running data processing pipelines, with an emphasis on complex document processing tasks. It provides both an interactive UI, DocWrangler, for prompt engineering and pipeline design, and a Python package for running pipelines in production from the CLI or programmatically. The project integrates LLMs for text transformation and extraction workflows, and its README documents the required environment variables and prerequisites, such as Python 3.10+ and an OpenAI API key. The README also covers local development and deployment options, including Docker and manual setup, plus integration support for AWS Bedrock when configured with credentials. The repository includes tutorials, a playground hosted at docetl.org/playground, and community examples to help users get started and export finalized pipeline configurations for production use.
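Getting started with the Python package might look like the following. This is a minimal sketch based on the prerequisites above; the package name `docetl` and the placeholder key value are assumptions, not verified against the README.

```shell
# Assumes Python 3.10+ is already installed (a stated prerequisite).
# Install the Python package; "docetl" is the assumed PyPI name.
pip install docetl

# An OpenAI API key is listed as a prerequisite; export it before running pipelines.
export OPENAI_API_KEY="sk-..."   # replace with your actual key
```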


App Details

Features
DocETL bundles an interactive playground, DocWrangler, that lets users iteratively develop and test pipeline steps and prompts with real-time feedback, then export the resulting configurations. It ships as a Python package installable via pip for production execution and command-line use. The project documents Docker-based quickstart commands such as make docker and make docker-clean, and manual development commands including make install and make run-ui-dev. LLM integration is explicit: an OpenAI API key is required, and AWS Bedrock is supported optionally via make test-aws and docker compose profiles. The repository also includes tutorials, example community projects, a test target for validating setup, environment variable templates for the backend and frontend, and links to documentation and educational resources.
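The make targets named above could be used roughly as follows. The target names come from this listing; the comments describing what each one does are reasonable assumptions, not verified behavior.

```shell
# Docker-based quickstart (targets named in the README)
make docker        # assumed: build and start the DocWrangler playground in Docker
make docker-clean  # assumed: tear down the Docker containers and related artifacts

# Manual development setup
make install       # assumed: install backend and frontend dependencies
make run-ui-dev    # assumed: run the DocWrangler UI in development mode

# Optional AWS Bedrock integration check (requires AWS credentials to be configured)
make test-aws
```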
Use Cases
DocETL helps teams reduce the effort required to build, refine, and operationalize document-centric ETL workflows by combining interactive experimentation with repeatable production deployment. The DocWrangler UI supports iterative prompt engineering, so developers can tune LLM-driven operations and observe results immediately before exporting a stable pipeline. The Python package and CLI then run the same exported pipelines in production environments, and Docker support enables automation. Documented support for OpenAI and AWS Bedrock broadens the choice of model backends. Documentation, tutorials, a test suite, and community examples support onboarding and reproducibility, while environment templates and make targets simplify local development and CI-friendly setup for contributors and deployers.
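An exported pipeline configuration might look something like the sketch below. The schema shown here (dataset, operation, and step fields) is purely illustrative and is an assumption about what a DocETL YAML pipeline contains; consult the project documentation for the actual format.

```yaml
# Illustrative only -- field names are assumptions, not the verified DocETL schema.
default_model: gpt-4o-mini        # assumed: model used by LLM-driven operations
datasets:
  reports:
    type: file
    path: reports.json            # hypothetical input documents
operations:
  - name: extract_findings
    type: map                     # assumed: an LLM-driven per-document transformation
    prompt: "Extract the key findings from this report: {{ input.text }}"
pipeline:
  steps:
    - name: findings_step
      input: reports
      operations:
        - extract_findings
```

The intent is that such a file, once tuned interactively in DocWrangler and exported, could be run unchanged in production (for example via the project's CLI), keeping experimentation and deployment on the same configuration.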