judgeval
Basic Information
Judgeval is an open-source SDK and toolkit designed to capture, evaluate, and manage runtime data from autonomous, stateful agents. It instruments agent-environment interactions and tool calls to produce traces that support monitoring, post-training workflows, and continuous improvement. The repo provides a pip-installable client and documentation for both cloud-hosted and self-hosted deployments, along with cookbooks and examples showing how to capture full environment responses with minimal code. Its primary purpose is to supply the data and evaluation signals needed to run evaluators, export datasets, track metrics, and integrate with developer workflows so teams can analyze agent behavior, run regressions, and iterate on agent policies and configurations.