TheoremExplainAgent

Basic Information

This repository contains the full codebase, data pointers, and scripts for TheoremExplainAgent, an AI system that generates long-form Manim videos visually explaining mathematical theorems. It implements the methods described in the associated ACL/arXiv paper and provides generation and evaluation pipelines, example commands, environment setup instructions, and an accompanying benchmark dataset, TheoremExplainBench. The code supports multimodal generation, including voiceover, Manim scene rendering, and optional retrieval-augmented generation (RAG). The repo also hosts utilities for configuring models via a .env file, downloading the pretrained TTS models used in the project, and producing batch or single-topic video outputs. The project is intended for research use and includes instructions to reproduce the baseline videos used in the paper.
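
The model configuration mentioned above lives in a .env file. As a minimal sketch of how such a setup is typically loaded, the snippet below reads provider credentials with python-dotenv; the variable names are illustrative assumptions, not necessarily the exact keys the repository expects.

    # Minimal sketch of .env-based provider configuration.
    # The key names are assumptions; see the repo's .env template for the
    # variables each backend actually requires.
    import os

    from dotenv import load_dotenv  # pip install python-dotenv

    load_dotenv()  # reads .env from the current working directory

    openai_key = os.getenv("OPENAI_API_KEY")   # hypothetical provider key
    gemini_key = os.getenv("GEMINI_API_KEY")   # hypothetical provider key

    if not (openai_key or gemini_key):
        raise RuntimeError("No model provider credentials found in .env")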

App Details

Features
- End-to-end generation pipeline driven by generate_video.py, with single-topic, batch, and RAG-enabled modes and CLI options covering model selection, helper model, output directory, concurrency, and rendering flags (see the invocation sketch after this list).
- Manim-based visual scene generation with optional Kokoro TTS voiceover, including instructions for downloading the Kokoro model files.
- LiteLLM-style model naming and multiple provider backends configured via the .env file.
- RAG support that builds a Chroma vector database from local Manim documentation, context-learning options, and a visual-fix pass over generated code using vision-language models (VLMs); an indexing sketch follows this list.
- Evaluation tools via evaluate.py for automatic multimodal scoring with text, video, and image models, including combined and bulk evaluation.
- Requirements and setup guidance covering conda, LaTeX, and other system dependencies, plus TheoremExplainBench dataset metadata and example usage.
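
To make the generation pipeline concrete, the sketch below launches a single-topic run. The flag names used here (--model, --helper_model, --output_dir, --topic, --context) are assumptions inferred from the feature list above, not confirmed options; the script's own help output is the authoritative reference.

    # Hedged sketch of a single-topic generation run.  All flag names are
    # assumptions for illustration; check generate_video.py's help output
    # for the real CLI.
    import subprocess

    cmd = [
        "python", "generate_video.py",
        "--model", "openai/o3-mini",         # LiteLLM-style "provider/model" name
        "--helper_model", "openai/o3-mini",  # assumed flag for the helper model
        "--output_dir", "output/demo_run",   # assumed output location
        "--topic", "Pythagorean theorem",
        "--context", "a right-triangle proof by area rearrangement",
    ]
    subprocess.run(cmd, check=True)

The RAG mode builds a Chroma vector database from local Manim documentation. The following sketch shows the general shape of such an index-and-query step with the chromadb client; the paths, collection name, and one-file-per-document chunking are illustrative assumptions rather than the repository's actual RAG module.

    # Illustrative sketch of indexing local Manim docs into a Chroma
    # collection and querying it for snippets relevant to a prompt.
    # Paths and the collection name are assumptions.
    from pathlib import Path

    import chromadb

    client = chromadb.PersistentClient(path="data/rag/chroma_db")   # assumed location
    collection = client.get_or_create_collection(name="manim_docs")

    docs, ids = [], []
    for i, md_file in enumerate(sorted(Path("data/rag/manim_docs").glob("**/*.md"))):
        docs.append(md_file.read_text(encoding="utf-8"))
        ids.append(f"doc-{i}")

    if docs:
        collection.add(documents=docs, ids=ids)  # embeds with Chroma's default model

    # Retrieve documentation relevant to a code-generation question.
    hits = collection.query(query_texts=["How do I animate a right triangle?"], n_results=3)
    print(hits["documents"][0])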
Use Cases
The repository helps researchers and developers build, reproduce, and evaluate multimodal theorem-explanation videos, probing LLM understanding and exposing reasoning gaps that text alone may miss. It provides reproducible generation scripts, a benchmark dataset for sampling theorems, and released baseline video data, so users can compare new methods against the results reported in the paper. The included evaluation pipeline automates scoring of videos with configurable language and vision models, enabling systematic comparison across models and settings; a hedged invocation sketch follows below. RAG and context-learning options let users incorporate external Manim documentation to improve generated code and explanations. Detailed installation, configuration, and FAQ entries reduce setup friction when running experiments or producing and evaluating new multimodal theorem explanations.
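
As with generation, evaluation can be scripted in bulk. The sketch below shows one plausible invocation of evaluate.py with separate judge models for text, video, and image inputs; every flag name here is an assumption made for illustration, so consult the script's help output for the real interface.

    # Hedged sketch of a bulk evaluation run over generated videos.
    # All flag names are assumptions, not confirmed CLI options.
    import subprocess

    cmd = [
        "python", "evaluate.py",
        "--model_text", "gemini/gemini-1.5-pro",   # assumed text judge
        "--model_video", "gemini/gemini-1.5-pro",  # assumed video judge
        "--model_image", "gemini/gemini-1.5-pro",  # assumed image judge
        "--file_path", "output/demo_run",          # assumed folder of generated videos
        "--output_folder", "output/evaluation",    # assumed destination for scores
    ]
    subprocess.run(cmd, check=True)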
