AutoDidact
Basic Information
AutoDidact is a research codebase for building and training an autonomous research agent that bootstraps its own search and reasoning abilities. The repository demonstrates how a smaller open-source LLM, specifically Llama-8B, can generate question-answer pairs from a document corpus, use those QA pairs to practice retrieval and reasoning, and learn from its own assessment of whether each answer was correct. The project includes a fully local pipeline that covers data generation, embedding creation, semantic search, function calling, agentic loops, and reinforcement learning with Group Relative Policy Optimization (GRPO). It is intended for experiments and reproducible research in which the model iteratively generates, researches, verifies, and improves its answers on a custom dataset such as the included Apollo 13 mission report. The codebase is built on top of Unsloth's Efficient GRPO implementation and is designed to run on a single high-end GPU.
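The core idea behind Group Relative Policy Optimization is that each sampled answer is scored relative to the other answers sampled for the same question, so no separate value model is needed. The sketch below illustrates only that normalization step and is not taken from the repository: the function name, the 0/1 reward values, the epsilon, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one score per answer sampled for the same question.
    Using the population standard deviation is an assumption of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every answer gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical self-assessed rewards: 1.0 if the answer was judged correct, else 0.0
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers judged correct receive a positive advantage and the rest a negative one. When every answer in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.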