trulens
Basic Information
TruLens is a developer-focused toolkit for systematically evaluating and tracking LLM experiments and related components such as prompts, models, retrievers, and knowledge sources. It provides fine-grained, stack-agnostic instrumentation and logging that runs alongside an application to capture model behavior and application-level events. The project codifies evaluation concepts such as Feedback Functions, the RAG Triad, and Honest/Harmless/Helpful evaluations so that teams can define objective feedback metrics. TruLens aims to surface failure modes, support iterative improvement of LLM-based systems, and present experiment comparisons through an easy-to-use interface. The repository includes installation instructions, quickstart examples and notebooks, and links to documentation and community contribution guidance.
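To make the idea of a feedback function and the RAG Triad more concrete, here is a minimal, library-free sketch in Python. It does not use the TruLens API. The names `token_overlap`, `RagRecord`, and `rag_triad` are hypothetical, and the token-overlap scorer is only a toy stand-in for the LLM-based or model-based scorers a real evaluation would likely use. It assumes the RAG Triad consists of context relevance, groundedness, and answer relevance, and that each feedback function returns a score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: a feedback function maps (reference, candidate) text to a score in [0, 1].
FeedbackFn = Callable[[str, str], float]


def token_overlap(reference: str, candidate: str) -> float:
    """Toy scorer: fraction of candidate tokens that also appear in the reference."""
    ref_tokens = set(reference.lower().split())
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    hits = sum(1 for tok in cand_tokens if tok in ref_tokens)
    return hits / len(cand_tokens)


@dataclass
class RagRecord:
    """One logged interaction of a retrieval-augmented application (hypothetical schema)."""
    query: str
    context: str
    answer: str


def rag_triad(record: RagRecord, scorer: FeedbackFn = token_overlap) -> Dict[str, float]:
    """Score one record on the three assumed RAG Triad dimensions."""
    return {
        # Does the retrieved context cover the terms of the query?
        "context_relevance": scorer(record.context, record.query),
        # Is the answer supported by the retrieved context?
        "groundedness": scorer(record.context, record.answer),
        # Does the answer stay on the topic of the query?
        "answer_relevance": scorer(record.query, record.answer),
    }


if __name__ == "__main__":
    record = RagRecord(
        query="capital of france",
        context="paris is the capital of france",
        answer="paris is the capital",
    )
    for name, score in rag_triad(record).items():
        print(f"{name}: {score:.2f}")
```

Passing the scorer as a parameter keeps the three checks independent of how a score is computed, so the toy overlap function could be swapped for a stronger scorer without changing the rest of the sketch.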