Basic Information

AutoDidact is a research codebase for building and training an autonomous research agent that bootstraps its own search and reasoning abilities. The repository demonstrates how a smaller open-source LLM, specifically Llama-8B, can generate question-answer pairs from a document corpus, use those pairs to practice retrieval and reasoning, and learn from self-assessed correctness. The project includes a fully local pipeline covering data generation, embedding creation, semantic search, function calling, agentic loops, and reinforcement learning with Group Relative Policy Optimization (GRPO). It is intended for experiments and reproducible research in which the model iteratively generates, researches, verifies, and improves its answers on a custom dataset such as the included Apollo 13 mission report. The codebase is built on top of Unsloth's Efficient GRPO implementation and is designed to run on a single high-end GPU.
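To make the order of those stages concrete, here is a minimal, self-contained sketch of how such a pipeline could be wired together. The function names, file path, chunking strategy, and placeholder QA generation below are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sketch of the pipeline stages (chunk corpus -> generate QA pairs);
# names and logic are assumptions, not AutoDidact's real functions.

def chunk_corpus(text: str, size: int = 500) -> list[str]:
    """Split a markdown document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def generate_qa_pairs(chunks: list[str]) -> list[dict]:
    """Placeholder for the LLM call that writes a question/answer per chunk."""
    return [{"question": f"What does passage {i} describe?",
             "answer": chunk[:100],
             "source": chunk}
            for i, chunk in enumerate(chunks)]

def main() -> None:
    # Hypothetical path to your markdown corpus, e.g. the Apollo 13 report.
    with open("mission_report.md", encoding="utf-8") as f:
        corpus = f.read()
    chunks = chunk_corpus(corpus)
    qa_pairs = generate_qa_pairs(chunks)
    # In the full pipeline the chunks would also be embedded and indexed for
    # semantic search, and the QA pairs would feed the GRPO training loop.
    print(f"{len(chunks)} chunks, {len(qa_pairs)} QA pairs")

if __name__ == "__main__":
    main()
```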

App Details

Features
The repository provides an end-to-end autonomous pipeline that includes QA pair generation, embedding and index creation, semantic search tooling, and a reinforcement learning training loop. It uses Llama-8B as the primary model and applies GRPO to refine the agent's search and reasoning policies. The agent performs autonomous self-verification by evaluating its own answers and turning the results into reward signals for reinforcement learning. Tooling files include generate_data.py for QA-pair and embedding generation, search_module.py for semantic retrieval, embeddings.py for vector creation, rl_helpers.py for agent interactions and reward logic, and the autodidact.ipynb notebook, which demonstrates the full training workflow. The project supports function calling and agentic loops and is designed to run locally, with demonstrated short-run training on a single RTX 4090.
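As an illustration of the kind of search tool the agent can call from its agentic loop, here is a toy, dependency-free retrieval sketch. The bag-of-words vectors and cosine similarity stand in for the learned embeddings and index the repository builds, and all function names here are hypothetical.

```python
# Toy retrieval tool: rank corpus chunks against a query by cosine similarity
# over bag-of-words counts. A stand-in for learned-embedding search.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a lowercase bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k corpus chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = ["The oxygen tank ruptured during the mission.",
          "The crew used the lunar module as a lifeboat.",
          "Re-entry procedures were rewritten on the ground."]
print(search("what happened to the oxygen tank", chunks, k=1))
```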
Use Cases
AutoDidact helps researchers and developers experiment with self-improving LLM agents by providing a reusable pipeline for generating training data, searching a document corpus, and applying reinforcement learning to improve retrieval-driven reasoning. It shows a practical workflow for customizing any markdown dataset: replace the sample Apollo 13 report and regenerate the embeddings and QA pairs. The repository documents a reproducible example in which 100 GRPO steps on one RTX 4090 improved validation accuracy from 23% to 59% on a 68-question set, illustrating measurable gains from the approach. The included scripts and notebook lower the barrier to exploring agentic search behaviors, function calling, and self-verification methods without requiring cloud services, enabling local research and iteration.
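To illustrate the self-verification idea in miniature, the sketch below turns an answer-versus-reference comparison into a binary reward of the kind a GRPO trainer could consume. The keyword-overlap judge is a stand-in for an LLM-based check; none of these names are taken from rl_helpers.py.

```python
# Hedged sketch: score an agent answer against the QA pair's reference answer
# and emit a binary reward for reinforcement learning.
def judge(generated: str, reference: str) -> bool:
    """Toy correctness check: does the answer cover most reference keywords?"""
    ref_tokens = set(reference.lower().split())
    gen_tokens = set(generated.lower().split())
    return len(ref_tokens & gen_tokens) >= 0.6 * len(ref_tokens)

def reward(generated: str, reference: str) -> float:
    """Binary reward the RL trainer would consume."""
    return 1.0 if judge(generated, reference) else 0.0

print(reward("The oxygen tank exploded in flight",
             "An oxygen tank exploded during the flight"))  # -> 1.0
```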
